
Code optimizer fails to eliminate memcpy/memset when "number of bytes" is zero by Dmitry Me


Status: Fixed

Type: Bug
ID: 687527
Opened: 9/7/2011 8:14:03 AM
Access Restriction: Public
Moderator Decision: Sent to Engineering Team for consideration
User(s) can reproduce this bug


This behavior is observed in version 10.0.30319.1 RTMRel

I compile the following two snippets with /O2, set a breakpoint, run, and then open the Disassembly window to inspect the machine code.

Snippet 1:

int _tmain(int argc, _TCHAR* argv[])
{
    char dummy1 = 0;
    char dummy2 = 0;
    memcpy( &dummy1, &dummy2, 0 );
    memset( &dummy1, 0, 0 );
    if( dummy1 || dummy2 ) {
        Sleep( 0 );
    }
    return 0;
}


45: int _tmain(int argc, _TCHAR* argv[])
    46: {
00401000 push        ebp
00401001 mov         ebp,esp
00401003 push        ecx
00401004 push        ebx
    47:     char dummy1 = 0;
00401005 xor         ebx,ebx
    48:     char dummy2 = 0;
    49:     memcpy( &dummy1, &dummy2, 0 );
00401007 push        ebx
00401008 lea         eax,[dummy2]
0040100B push        eax
0040100C lea         eax,[dummy1]
0040100F push        eax
00401010 mov         byte ptr [dummy1],bl
00401013 mov         byte ptr [dummy2],bl
00401016 call        memcpy (401844h)
    50:     memset( &dummy1, 0, 0 );
0040101B push        ebx
0040101C lea         eax,[dummy1]
0040101F push        ebx
00401020 push        eax
00401021 call        memset (40183Eh)
00401026 add         esp,18h
    51:     if( dummy1 || dummy2 ) {
00401029 cmp         byte ptr [dummy1],bl
0040102C je         wmain+35h (401035h)
    52:         Sleep( 0 );
0040102E push        ebx
0040102F call        dword ptr [__imp__Sleep@4 (402000h)]
    53:     }
    54:     return 0;
00401035 xor         eax,eax
00401037 pop         ebx
    55: }
00401038 leave
00401039 ret


Snippet 2:

int _tmain(int argc, _TCHAR* argv[])
{
    char dummy1 = 0;
    char dummy2 = 0;
    memcpy( &dummy1, &dummy2, 1 );
    memset( &dummy1, 0, 1 );
    if( dummy1 || dummy2 ) {
        Sleep( 0 );
    }
    return 0;
}

Snippet 2 is the same as snippet 1, but "number of bytes" is now 1 instead of 0. It yields:

45: int _tmain(int argc, _TCHAR* argv[])
    46: {
00401000 push        ebp
00401001 mov         ebp,esp
00401003 push        ecx
00401004 push        edi
    47:     char dummy1 = 0;
    48:     char dummy2 = 0;
    49:     memcpy( &dummy1, &dummy2, 1 );
    50:     memset( &dummy1, 0, 1 );
00401005 xor         eax,eax
00401007 lea         edi,[dummy1]
0040100A stos        byte ptr es:[edi]
0040100B pop         edi
    51:     if( dummy1 || dummy2 ) {
0040100C cmp         byte ptr [dummy1],al
0040100F je         wmain+18h (401018h)
    52:         Sleep( 0 );
00401011 push        eax
00401012 call        dword ptr [__imp__Sleep@4 (402000h)]
    53:     }
    54:     return 0;
00401018 xor         eax,eax
    55: }
0040101A leave
0040101B ret

The machine code is massively different: in the second snippet the compiler knows what memcpy() and memset() do and expands them inline, while in the first snippet it emits real calls to both functions.

The problem is that in the first snippet the compiler fails to see that memset()/memcpy() with "number of bytes" equal to zero is a no-op. This leads to lots of inefficient machine code for the first snippet that actually does nothing useful.
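To make the no-op claim concrete: the C standard library functions memcpy() and memset() touch no bytes when the count is zero, so both calls in snippet 1 could be dropped without changing the program's behavior. A minimal sketch that checks this at run time (the helper name zero_length_calls_change_nothing is hypothetical and not part of the submitted project):

```c
#include <string.h>

/* Hypothetical helper: returns 1 if zero-length memcpy/memset leave
   both variables untouched, 0 otherwise. */
int zero_length_calls_change_nothing(void)
{
    char dummy1 = 'a';
    char dummy2 = 'b';

    memcpy(&dummy1, &dummy2, 0);  /* copies zero bytes */
    memset(&dummy1, 0, 0);        /* fills zero bytes */

    return dummy1 == 'a' && dummy2 == 'b';
}
```

Because the result is the same whether or not the two calls are emitted, an optimizer is free to remove them, which is the behavior this report asks for.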
Posted by Microsoft on 9/13/2011 at 11:53 AM

I’m also having trouble reproducing the exact behavior that you describe in your post. Using the project file that you submitted, I see that you’re compiling with /Os in addition to /O2. This may be part of the problem, as there is a known issue in VC++ when compiling for size with a zero-length memset. This issue will be addressed in future releases of VC++. You can work around it by compiling the function for speed rather than size.
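One possible way to apply that workaround to a single function, instead of changing the project-wide /Os setting, is the MSVC optimize pragma. This is only a sketch: clear_prefix is a hypothetical function, and whether "t" avoids the zero-length memset issue on a particular compiler version is an assumption that would need to be confirmed in the Disassembly window. The pragmas are guarded so the file still builds with other compilers.

```c
#include <string.h>

#ifdef _MSC_VER
#pragma optimize("t", on)   /* favor fast code for the functions that follow */
#endif

/* Hypothetical function that may be called with n == 0. */
void clear_prefix(char *buf, size_t n)
{
    memset(buf, 0, n);
}

#ifdef _MSC_VER
#pragma optimize("", on)    /* restore the optimizations given on the command line */
#endif
```

The pragma has to appear outside a function body and applies from the next function definition onward, which is why it is switched back right after clear_prefix.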

Looking at your original sample code I see that the compiler has inserted calls to memset and memcpy functions in the CRT. This indicates to me that perhaps you’re compiling without /Oi (which is included in the macro switch /O2) or there is a #pragma function(memset) in your code.
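For illustration, here is roughly how the function and intrinsic pragmas switch between the two forms in MSVC. The function names fill_with_call and fill_with_intrinsic are hypothetical, and the pragmas are guarded so the sketch also compiles elsewhere. Whether the second function is actually expanded inline still depends on the optimization switches in effect.

```c
#include <string.h>

#ifdef _MSC_VER
#pragma function(memset)    /* request real calls to the CRT memset */
#endif

/* Hypothetical: with the pragma above, MSVC emits "call memset" here. */
void fill_with_call(char *buf, size_t n)
{
    memset(buf, 0, n);
}

#ifdef _MSC_VER
#pragma intrinsic(memset)   /* allow inline expansion again */
#endif

/* Hypothetical: here MSVC may expand memset inline when optimizing. */
void fill_with_intrinsic(char *buf, size_t n)
{
    memset(buf, 0, n);
}
```

Both functions behave identically at run time; the difference only shows up in the generated machine code, so a stray #pragma function(memset) in a shared header would be consistent with the disassembly in the original report.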

You can read more about optimization switches, including the macro switches /O1, /O2, and /Ox here:

A description of the function pragma directive can be found here:

Ian Bearman
VC++ Code Generation and Optimization Team
Posted by Microsoft on 9/12/2011 at 12:46 AM
Thanks for your feedback.
We are routing this issue to the appropriate group within the Visual Studio Product Team for triage and resolution. These specialized experts will follow-up with your issue.
Posted by Dmitry Me on 9/9/2011 at 3:00 AM
Attached the full project.
Posted by Microsoft on 9/8/2011 at 11:47 PM
Hi Dmitry Me,

I have tested this with VS2010 RTM and VS2010 SP1, and neither of them behaves as described.
Could you please provide us with some more details about your project's configuration? A zipped sample project would be even better.

Thanks again for your efforts and we look forward to hearing from you.
Posted by MS-Moderator09 [Feedback Moderator] on 9/8/2011 at 5:09 AM
Thank you for submitting feedback on Visual Studio 2010 and .NET Framework. Your issue has been routed to the appropriate VS development team for review. We will contact you if we require any additional information.
Posted by Dmitry Me on 9/7/2011 at 10:49 PM
If SP1 behaves better - great.
Posted by Mike Danes on 9/7/2011 at 10:38 AM
There's something fishy going on here. My VC++ 2010 SP1 produces the following code for your first example:

00401000 xor         eax,eax

Yes, that's right, just a single instruction for return 0. Now, they might have fixed some bugs between RTM and SP1, but this is not the first time this has happened with your examples. Are you sure your compiler installation and the options you use are OK?
Posted by MS-Moderator01 on 9/7/2011 at 8:41 AM
Thank you for your feedback. We are currently reviewing the issue you have submitted. If this issue is urgent, please contact support directly (http://support.microsoft.com).
File Name Submitted By Submitted On File Size  
AnalyzeTest-687527.zip (restricted) 9/9/2011 -