
Compile error with source file containing UTF8 strings (in CJK system locale) by wva


Status: Closed as Won't Fix


Type: Bug
ID: 341454
Opened: 5/1/2008 5:19:19 AM
Access Restriction: Public
Workarounds: 0
Users who can reproduce this bug: 4

Description

Compiling the MySQL source code fails on Japanese Windows, or whenever the system locale is set to one of the CJK locales.

The compile succeeds if the system locale is English. VS2003.NET also compiles the file without errors.

The file in question contains UTF-8 strings. Storing the file with a UTF-8 BOM works as a workaround, but it cannot be applied here, because the same file is compiled with different compiler and OS combinations.

For more information refer to mysql bug description here:
http://bugs.mysql.com/bug.php?id=36281

I will attach the file in question shortly, together with the preprocessed source.
Details
Posted by 叶剑飞 Victor on 11/12/2013 at 12:34 PM
Why doesn't Microsoft add a compiler option "/encoding" to cl.exe? If Microsoft added it, we could type the compile command as "cl /encoding utf-8 filename.cpp" and compile the source file as UTF-8.
Posted by Microsoft on 5/6/2008 at 11:17 AM
Hi: you are correct that the BOM is not part of the C++ Standard, but if you want non-ASCII characters then the "official" and portable way to get them is to use the \u (or \U) hex encoding (which is, I agree, just plain ugly and error-prone).
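
A minimal sketch of the \u and \U escapes mentioned above, not taken from the MySQL source. It spells the two characters U+65E5 and U+672C with universal character names. The names `kNihon`, `kSun`, and `kNihonLength` are made up for illustration. The sketch uses `u""` and `U''` literals (C++11) because their encoding is fixed as UTF-16 and UTF-32; in a plain narrow literal the resulting bytes depend on the compiler's execution character set.

```cpp
#include <cstddef>

// Universal character names: \u takes 4 hex digits, \U takes 8.
// The source file itself stays pure ASCII, so its encoding never matters.
const char16_t kNihon[] = u"\u65E5\u672C";  // two UTF-16 code units + terminator
const char32_t kSun = U'\U000065E5';        // one UTF-32 code unit

// Number of code units in kNihon, excluding the terminating zero.
constexpr std::size_t kNihonLength = sizeof(kNihon) / sizeof(kNihon[0]) - 1;
```

Since the literal contains only ASCII, the file should compile the same way regardless of system locale or BOM, at the cost of readability.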

When the compiler is faced with a source file that does not have a BOM, it reads ahead a certain distance into the file to see if it can detect any Unicode characters. It specifically looks for UTF-16 and UTF-16BE; if it doesn't find either, it assumes that it has MBCS. I suspect that in this case it falls back to MBCS, and that this is what is causing the problem.

Being explicit is really best and so while I know it is not a perfect solution I would suggest using the BOM.

Jonathan Caves
Visual C++ Compiler Team.
Posted by wva on 5/6/2008 at 10:33 AM
I set the status from "Won't Fix" back to "Active". I would appreciate a comment on why you think this is not a compiler error and why it is marked "Won't Fix". This issue is at least a regression: the file compiles fine on VS2003.
Posted by wva on 5/5/2008 at 4:21 PM
Jonathan,
thank you for the quick reply.

The workaround using a UTF-8 BOM works, as I wrote in the bug description,
but unfortunately I cannot use it. As I also wrote in the bug description,
this file is compiled on different platforms using different compilers.

I have not read the newest C++ specification, but my gut feeling is that the BOM is not
part of it. Even if I can work around my local problem with this compiler or VS2008,
it will fail on gcc and forte, and on VS2003 as well.

I'm trying to understand what the compiler needs to guess here. All the non-ASCII characters are inside strings, and I would argue that char foo[] = "bar" in C or C++ denotes a null-terminated array of bytes (since 1970 or so).

In this case, no conversion to or from another encoding is desired. I also have no intention of outputting them with printf or the like. Why not just preserve the bytes "as is"?

Even if the compiler for some reason always needs to convert my source file, I could live with another way to specify the encoding that does not fall back to a BOM, for the reasons outlined above.

E.g. using #pragma setlocale("english.65001"), which also did not work for me in this case.

In the worst case, I know, I can still convert all the bytes in this file to hex, but this is somewhat ugly. I would still like to see and edit the international strings in a UTF-8 capable editor, the Visual Studio IDE for example.
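
A minimal sketch of the "convert all bytes to hex" fallback described above, assuming the string is the two characters U+65E5 and U+672C (an arbitrary example, not a string from the MySQL file). The names `kNihonUtf8` and `nihon_byte_count` are hypothetical.

```cpp
#include <cstddef>
#include <cstring>

// UTF-8 bytes of U+65E5 U+672C written as hex escapes. The two adjacent
// literals are concatenated by the compiler; splitting them is a common way
// to keep a following hex digit from being absorbed into the previous escape.
const char kNihonUtf8[] = "\xE6\x97\xA5" "\xE6\x9C\xAC";

// Length in bytes, excluding the terminating null.
inline std::size_t nihon_byte_count() {
    return std::strlen(kNihonUtf8);
}
```

The bytes end up in the array exactly as written, which is the "as is" behavior asked for, but the text is no longer readable in an editor.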
Posted by Microsoft on 5/5/2008 at 3:35 PM
Hi: our suggestion for fixing this issue would be to use a BOM. This unambiguously lets the compiler know the encoding of the file; without it, the compiler has to resort to guesswork.

Jonathan Caves
Visual C++ Compiler Team
Posted by Microsoft on 5/2/2008 at 12:18 AM
Thanks for your feedback.

We are escalating this issue to the appropriate group within the Visual Studio Product Team for triage and resolution.
These specialized experts will follow up on your issue.

Thank you,
Visual Studio Product Team