System.Text.Encoding decoder/encoder does not follow standard (ISO/IEC 8859-7) for codepage 28597 - by JoukeNuman

Status: Fixed

This item has been fixed in the current or upcoming version of this product. A more detailed explanation for the resolution of this particular item may have been provided in the comments section.


ID: 790566
Status: Closed
Type: Bug
Repros: 0
Opened: 6/20/2013 7:33:11 AM
Access Restriction: Public

Description

According to ISO/IEC 8859-7 (see http://en.wikipedia.org/wiki/ISO/IEC_8859-7), the byte values 0xA1 and 0xA2 should decode to U+2018 and U+2019 respectively, but .NET decodes them to U+02BD and U+02BC respectively. Conversely, the characters U+2018 and U+2019 both encode to 0x27.

As this violates said standard, the .NET Framework should not claim conformance to that standard, e.g. on http://msdn.microsoft.com/en-us/library/system.text.encoding.aspx or in the EncodingInfo.Name property for that encoding. Alternatively, the implementation should be changed to match the standard.
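
As an illustration only, the mismatch described above can be checked against an independent implementation. The sketch below is in Python rather than .NET and assumes that Python's built-in `iso8859_7` codec follows the current ISO/IEC 8859-7 table. The two dictionaries simply restate the code points quoted in this report, and the helper names are hypothetical.

```python
# Code points quoted in the report: what the standard requires and what
# .NET code page 28597 was observed to return when decoding.
STANDARD = {0xA1: "\u2018", 0xA2: "\u2019"}
REPORTED_DOTNET = {0xA1: "\u02BD", 0xA2: "\u02BC"}


def decode_iso8859_7(byte_value: int) -> str:
    """Decode a single byte with Python's iso8859_7 codec (assumed reference)."""
    return bytes([byte_value]).decode("iso8859_7")


def mismatches(observed: dict) -> dict:
    """Return {byte: (observed_char, reference_char)} where the two disagree."""
    result = {}
    for byte_value, observed_char in observed.items():
        reference_char = decode_iso8859_7(byte_value)
        if observed_char != reference_char:
            result[byte_value] = (observed_char, reference_char)
    return result


if __name__ == "__main__":
    for byte_value, (got, want) in mismatches(REPORTED_DOTNET).items():
        print(f"0x{byte_value:02X}: reported U+{ord(got):04X}, "
              f"reference U+{ord(want):04X}")
```

The same comparison could be run in .NET by decoding the two bytes with the encoding obtained for code page 28597 and comparing the resulting code points.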
Posted by JoukeNuman on 7/11/2013 at 3:03 AM
From your answer, I deduce that you are not planning to support full 8859/7; sorry to hear that.
In addition to the documentation page you mention, you must also update at least the following pages (I don't understand your "I'll try to add some comment..." remark: are you going to update or not?):

http://msdn.microsoft.com/en-us/library/system.text.encoding.getencodings.aspx (it is referred directly from http://msdn.microsoft.com/en-us/library/ms404377.aspx, your comment on class doc page might be missed)
http://msdn.microsoft.com/en-us/library/system.text.encoding.bodyname.aspx (also returns 'iso-8859-7' for CP 1253, see below).
http://msdn.microsoft.com/en-us/library/system.text.encoding.headername.aspx
http://msdn.microsoft.com/en-us/library/system.text.encoding.webname.aspx
http://msdn.microsoft.com/en-us/library/System.Text.EncodingInfo.aspx
http://msdn.microsoft.com/en-us/library/system.text.encodinginfo.name.aspx

http://msdn.microsoft.com/en-us/library/windows/desktop/dd317756(v=vs.85).aspx

There are also pages in Office, Exchange and Outlook documentation that proclaim 28597 as 8859/7 compliant, just Bing for 8859/7 & 8859-7

When searching for doc pages to update, I also discovered that Encoding.BodyName for codepage 1253 is "iso-8859-7", which is patently not true (I count nine differences). Should I create a new bug for that?
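
For readers who want to enumerate such differences themselves, here is a minimal sketch in Python (not .NET). It assumes Python's built-in `cp1253` and `iso8859_7` codecs as stand-ins for the two tables; the helper names are hypothetical, and the number of differences it reports depends on the edition of each table and the byte range compared, so it need not match the count given above.

```python
def decode_byte(byte_value: int, codec: str) -> str:
    """Decode one byte; unassigned positions become U+FFFD instead of raising."""
    return bytes([byte_value]).decode(codec, errors="replace")


def table_differences(codec_a: str, codec_b: str,
                      start: int = 0xA0, stop: int = 0x100) -> dict:
    """Return {byte: (char_a, char_b)} for bytes the two codecs decode differently."""
    diffs = {}
    for byte_value in range(start, stop):
        char_a = decode_byte(byte_value, codec_a)
        char_b = decode_byte(byte_value, codec_b)
        if char_a != char_b:
            diffs[byte_value] = (char_a, char_b)
    return diffs


if __name__ == "__main__":
    for byte_value, (a, b) in table_differences("cp1253", "iso8859_7").items():
        print(f"0x{byte_value:02X}: cp1253 U+{ord(a):04X}, "
              f"iso8859_7 U+{ord(b):04X}")
```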
Posted by Microsoft on 7/8/2013 at 4:59 PM
Thanks again for your follow-up. We didn't change any documentation in response to this report. I'll try to add a comment about this codepage to the page http://msdn.microsoft.com/en-us/library/system.text.encoding.aspx describing the behavior of the characters A1 and A2.
Posted by JoukeNuman on 6/28/2013 at 10:16 AM
I don't question that the codepage works as documented, but rather that you claim conformance to the 8859/7 standard.
The Encoding.Name for codepage 28597 should not return the ISO name, and all documentation that claims conformance of 28597 to 8859/7 must be updated: the claim is not true and therefore misleading, and it can result in errors in apps that use this codepage on the assumption that it maps 8859/7.
If changing 28597 risks breaking existing apps, then a new codepage should be created with the proper mapping and a proper EncodingInfo.Name. That way, apps iterating across encodings and selecting on EncodingInfo.Name get the encoding they ask for.
Of course this still has a risk of breaking apps, but IMO standards should be followed as claimed; that is what developers expect.

I understand your recommendation to use utf whenever possible, but there are occasions when interchange using 8859 is required.

Note: http://msdn.microsoft.com/en-us/goglobal/bb964656 "Code pages supported by Windows - ISO code pages" does not list 8859/7 as a supported ISO code page. Was it perhaps removed in response to an earlier bug report on this codepage?
Posted by Microsoft on 6/27/2013 at 6:48 PM
Thanks for your feedback. The mapping of this codepage is documented at http://msdn.microsoft.com/en-US/goglobal/cc305173.aspx. You are right that this codepage does not conform to the standard at code points A1 and A2, but changing that could break other apps that depend on the current behavior. In general, we recommend that people use Unicode.
Posted by Microsoft on 6/21/2013 at 1:51 AM
Thank you for submitting feedback on Visual Studio and .NET Framework. Your issue has been routed to the appropriate VS development team for investigation. We will contact you if we require any additional information.
Posted by Macy [MSFT] on 6/20/2013 at 10:50 AM
Thank you for your feedback. We are currently reviewing the issue you have submitted. If this issue is urgent, please contact support directly (http://support.microsoft.com).