
mbstowcs_s does not return an error when the current code page does not support all the characters in mbstr by jessica_2009


Status: Closed as By Design


Type: Bug
ID: 392573
Opened: 1/8/2009 5:41:46 PM
Access Restriction: Public

Description

When mbstr contains characters not supported by the current process code page, mbstowcs_s does not return an error and puts garbage characters in wcstr.

Example:

setlocale(LC_CTYPE, ".1252"); // set the process to use a locale with an English code page
    // You can also try not setting the locale. The default process LC_CTYPE locale is "C",
    // which means 7-bit ASCII.
const char* mbStr = "Test. 真的."; // a 9-character string containing 2 Chinese characters
size_t charCount;
wchar_t wcStr[50];
errno_t error = mbstowcs_s(&charCount, wcStr, 50, mbStr, _TRUNCATE);

After the call, no error is returned, charCount becomes 12, and wcStr contains "Test. ÕæµÄ." It seems charCount is the actual byte count of mbStr, including the null terminator. Each of the two Chinese characters takes two bytes in mbStr.

The function should fail to convert the Chinese characters and return an error because the code page does not support Chinese characters.

If I set the locale to ".936" (936 is a code page for Simplified Chinese), no error is returned, charCount becomes 10, and wcStr contains "Test. 真的.". Everything is correct.
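
The difference between the two counts (12 versus 10) can be reproduced without calling the CRT at all. The following is a minimal sketch, not the CRT's implementation. It assumes the source file was saved in code page 936, so the two Chinese characters are stored as the bytes D5 E6 and B5 C4 (which is consistent with the "ÕæµÄ" seen under code page 1252). The helper names count_chars_sbcs and count_chars_dbcs are hypothetical, and the lead-byte rule is a simplification of the real code page 936 rules.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "Test. 真的." as stored in code page 936 (assumed encoding):
   真 = D5 E6, 的 = B5 C4. */
static const char kSample936[] = "Test. \xD5\xE6\xB5\xC4.";

/* Single-byte view (such as code page 1252): every byte is one character. */
size_t count_chars_sbcs(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}

/* Double-byte view, simplified: a byte in 0x81..0xFE followed by a
   non-NUL byte is treated as one two-byte character. */
size_t count_chars_dbcs(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;

    assert(s != NULL);
    while (*p != 0) {
        if (*p >= 0x81 && *p <= 0xFE && p[1] != 0)
            p += 2;   /* lead byte + trail byte = one character */
        else
            p += 1;   /* ordinary single-byte character */
        ++n;
    }
    return n;
}
```

Adding one for the null terminator, count_chars_sbcs(kSample936) + 1 gives the 12 reported under ".1252", and count_chars_dbcs(kSample936) + 1 gives the 10 reported under ".936". On Windows, IsDBCSLeadByteEx would be the more reliable way to test for a lead byte.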

_mbstowcs_s_l has the same problem if you give it a locale that does not support all the characters in mbstr.

Details
Posted by Michael S. Kaplan on 8/24/2009 at 12:27 PM
As Ale Contenti notes above, this issue is essentially by design, although it seems like MB_ERR_INVALID_CHARS should cause this case to fail.

The underlying issue is one I discuss in this blog:

http://blogs.msdn.com/michkap/archive/2007/07/25/4037646.aspx

which notes that the flag does not affect all unmapped characters, only some of them. Unfortunately, 0x8F is not defined in cp1252 but is still handled there, and it is legal in cp932.

Michael Kaplan
Microsoft
Posted by Microsoft on 8/24/2009 at 12:01 PM
Hi Jessica,

I tried your last repro, and we simply end up calling the Windows function MultiByteToWideChar (check crt\src\mbstowcs.c):

            if ( (count = MultiByteToWideChar( _loc_update.GetLocaleT()->locinfo->lc_codepage,
                                             MB_PRECOMPOSED |
                                                MB_ERR_INVALID_CHARS,
                                             s,
                                             -1,
                                             pwcs,
                                             (int)n )) != 0 )


I'm resolving this as By Design for now. I'm also following up with Michael Kaplan (from http://blogs.msdn.com/michkap/default.aspx) and will ask him to reply on MSDN Connect as well.

HTH,

Ale Contenti
VC++ Dev Lead
Posted by Wrenashe on 8/6/2009 at 10:51 PM
The problem I see is in _mbstowcs_l_helper(), which is called by _mbstowcs_l(), which in turn is called by mbstowcs() in Visual Studio 2008:

if ( (count = MultiByteToWideChar( _loc_update.GetLocaleT()->locinfo->lc_codepage,
                                             MB_PRECOMPOSED |
                                                MB_ERR_INVALID_CHARS,
                                             s,
                                             -1,
                                             pwcs,
                                             (int)n )) != 0 )
                return count - 1; /* don't count NUL */

Why not use _loc_update.GetLocaleT()->mbcinfo->lc_codepage instead of _loc_update.GetLocaleT()->locinfo->lc_codepage? Here we need to convert multibyte characters to wide characters, so the default locale should not be involved; mbcinfo->lc_codepage is what this context requires.
Posted by Wrenashe on 8/6/2009 at 7:17 AM
Visual Studio 2008 has the same problem with jessica_2009's case. So is this a bug?
Posted by jessica_2009 on 2/11/2009 at 10:43 AM
Pat,

Thank you very much for taking the time to look into this. I was able to see the narrow string correctly in the VS editor and the debugger when I tested it (with the Windows "language for non-Unicode programs" setting set to Chinese (PRC)).

But anyway, I thought about it further. I agree this probably should not be considered a bug. I told mbstowcs_s to use code page 1252, but gave it a narrow string encoded in code page 936. mbstowcs_s would just faithfully try to interpret the string using code page 1252. But I do have a question, if you don't mind. When mbstowcs_s encounters a character that is undefined in the code page it is told to use, what is it supposed to do? I would expect it to report an error.

Let's try this:
---------------------------------------------------------
setlocale(LC_CTYPE, ".1252");
char mbStr[5];
wchar_t wcStr[5];
size_t charCount;

// initialize mbStr directly
mbStr[0]=(char)0x8F;
mbStr[1]=(char)0x9F;
mbStr[2]=0x23; // character '#'
mbStr[3]=0;    // null terminator
// in code page 936, 0x8F9F is one character: '彑' (http://www.microsoft.com/globaldev/reference/dbcs/936/936_8F.mspx)
// in code page 1252, 0x8F is undefined, 0x9F is 'Ÿ'    (http://www.microsoft.com/globaldev/reference/sbcs/1252.mspx)

int error = mbstowcs_s(&charCount, wcStr, 5, mbStr, _TRUNCATE);
---------------------------------------------------------


After the mbstowcs_s call, I got:
        charCount    4    unsigned int
-        mbStr    0x0012fe00 "彑#"    char [5]
        [0]    -113    char
        [1]    -97    char
        [2]    35 '#'    char
        [3]    0    char
        [4]    -52    char

        error    0    int
-        wcStr    0x0012fdec "Ÿ#"    wchar_t [5]
        [0]    143 L''    wchar_t
        [1]    376 L'Ÿ'    wchar_t
        [2]    35 L'#'    wchar_t
        [3]    0    wchar_t
        [4]    65021 L'﷽'    wchar_t
--------------------

So mbstowcs_s accepted 0x8F even though it is not defined in code page 1252, and converted it to the wide character with value 143 (U+008F).
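
Since the conversion itself does not reject such bytes, one possible workaround is to check the narrow string before converting it. The following is a minimal sketch, assuming the only goal is to reject the five byte values that have no assigned character in code page 1252 (0x81, 0x8D, 0x8F, 0x90 and 0x9D). The function names are hypothetical and are not part of the CRT or the Windows API.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the byte has no assigned character in code page 1252. */
int is_undefined_in_cp1252(unsigned char b)
{
    return b == 0x81 || b == 0x8D || b == 0x8F || b == 0x90 || b == 0x9D;
}

/* Returns the index of the first byte that is undefined in code page
   1252, or -1 if every byte in the NUL-terminated string is defined. */
ptrdiff_t find_undefined_cp1252(const char *s)
{
    size_t i;

    assert(s != NULL);
    for (i = 0; s[i] != '\0'; ++i) {
        if (is_undefined_in_cp1252((unsigned char)s[i]))
            return (ptrdiff_t)i;
    }
    return -1;
}
```

For the three-byte test string above (0x8F, 0x9F, '#'), find_undefined_cp1252 returns 0, so the caller could report an error instead of calling mbstowcs_s. A check like this only covers code page 1252. It also does not detect text that is merely encoded in a different code page, such as the original "Test. 真的." example, whose bytes are all defined in 1252.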
Posted by Microsoft on 2/10/2009 at 4:23 PM
Hello Jessica,

Thanks for the report. I've investigated and I'm not convinced that this is a bug in the CRT, or at least I cannot find a bug using your scenario. I've tried saving the source file in every available Chinese code page and when I examine the contents of mbStr in the debugger it is never the same as the string shown in the VS editor itself. After the conversion, the wide string is always the same as the narrow string. Could you please check the scenario? If you have a scenario where the mbStr is correctly initialized and the conversion does not work, please send us that scenario. Thanks!

Pat Brenner
Visual C++ Libraries Development
Posted by Microsoft on 1/13/2009 at 5:48 AM
Thanks for your feedback. We are escalating this bug to the product unit who works on that specific feature area. The team will review this issue and make a decision on whether they will fix it or not for the next release.

Thank you,
Visual Studio Product Team
Posted by Microsoft on 1/11/2009 at 10:33 PM
Thank you for your feedback. We are currently reviewing the issue you have submitted. If this issue is urgent, please contact support directly (http://support.microsoft.com).

If at any time your issue is closed unsatisfactorily, you may edit your issue via Connect and change the status to “Active.”

Thank you,
Visual Studio Product Team