VS2013 C compiler ignores loop stride and writes the wrong memory locations - by bruening

Status : 

 


4
0
Sign in
to vote
ID 884275 Comments
Status Closed Workarounds
Type Bug Repros 2
Opened 5/30/2014 10:06:31 AM
Access Restriction Public

Description

The VS2013 Update 2 C compiler (18.00.30501) contains a bug not present in
prior versions.  It produces incorrect code for a loop with a stride not equal to 1.

Consider the loop at line 12 of this program which reproduces the bug:

--------------------------------------------------
1:#include <stdlib.h>
2:#include <stdio.h>
3:#include <string.h>
5:bool lcs(const char * s, const char * s2, int * l1, int * l2) {
6:    int i, j;
7:    int m = strlen(s);
8:    int n = strlen(s2);
9:    char *c = (char *) malloc((m + 1) * (n + 1));
10:    if (!c)
11:        return false;
12:    for (i = 1; i <= m; i++) c[i*(n+1)] = 0;
13:    for (j = 0; j <= n; j++) c[j] = 0;
14:    for (i = 1; i <= m; i++) {
15:        if (c[i*(n+1)] != 0)
16:            printf("c[%d] is not 0!\n", i*(n+1));
17:    }
18:    free(c);
19:    *l1 = m;
20:    *l2 = n;
21:    return true;
22:}
24:int
25:main(int argc, char *argv[])
26:{
27:    int l1, l2;
28:    if (argc < 3) {
29:        fprintf(stderr, "Usage: %s <string1> <string2>\n", argv[0]);
30:        return 1;
31:    }
32:    if (lcs(argv[1], argv[2], &l1, &l2))
33:        printf("got back %d %d\n", l1, l2);
34:    return 0;
35:}
--------------------------------------------------

That loop should set every (n+1)-th byte, m bytes overall, in the array c
to 0.  However, when compiling with /O1 or with /O2, the compiler produces
code that instead sets m contiguous bytes to 0, which is incorrect:

--------------------------------------------------
C:\...bug>cl /O2 /MT repro.cpp
Microsoft (R) C/C++ Optimizing Compiler Version 18.00.30501 for x86
Copyright (C) Microsoft Corporation.  All rights reserved.

repro.cpp
Microsoft (R) Incremental Linker Version 12.00.30501.0
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:repro.exe
repro.obj

C:\...\bug>repro.exe "ello" "elliot"
c[14] is not 0!
c[28] is not 0!
got back 4 6
--------------------------------------------------

Building at /Od produces correct code.

Looking at the generated machine code for line 12 we see that it uses
2 string loops, one of 4-byte size and then one of 1-byte in case m is not
aligned to 4, to set the memory to 0.  That is clearly not correct.

repro!lcs+0x4c [c:\...\bug\repro.cpp @ 12]:
   12 010d106d 83fe01          cmp     esi,1
   12 010d1070 7c19            jl      repro!lcs+0x6b (010d108b)

repro!lcs+0x52 [c:\...\bug\repro.cpp @ 12]:
   12 010d1072 8d3c2a          lea     edi,[edx+ebp]
   12 010d1075 8bce            mov     ecx,esi
   12 010d1077 8bd1            mov     edx,ecx
   12 010d1079 33c0            xor     eax,eax
   12 010d107b c1e902          shr     ecx,2
   12 010d107e f3ab            rep stos dword ptr es:[edi]
   12 010d1080 8bca            mov     ecx,edx
   12 010d1082 8b542410        mov     edx,dword ptr [esp+10h]
   12 010d1086 83e103          and     ecx,3
   12 010d1089 f3aa            rep stos byte ptr es:[edi]

The /Od code uses a correct loop with stride of (n+1):

repro!lcs+0x4c [c:\...\bug\repro.cpp @ 12]:
   12 0012b5fc c745fc01000000  mov     dword ptr [ebp-4],1
   12 0012b603 eb09            jmp     repro!lcs+0x5e (0012b60e)

repro!lcs+0x55 [c:\...\bug\repro.cpp @ 12]:
   12 0012b605 8b4dfc          mov     ecx,dword ptr [ebp-4]
   12 0012b608 83c101          add     ecx,1
   12 0012b60b 894dfc          mov     dword ptr [ebp-4],ecx

repro!lcs+0x5e [c:\...\bug\repro.cpp @ 12]:
   12 0012b60e 8b55fc          mov     edx,dword ptr [ebp-4]
   12 0012b611 3b55ec          cmp     edx,dword ptr [ebp-14h]
   12 0012b614 7f13            jg      repro!lcs+0x79 (0012b629)

repro!lcs+0x66 [c:\...\bug\repro.cpp @ 12]:
   12 0012b616 8b45f8          mov     eax,dword ptr [ebp-8]
   12 0012b619 83c001          add     eax,1
   12 0012b61c 0faf45fc        imul    eax,dword ptr [ebp-4]
   12 0012b620 8b4df4          mov     ecx,dword ptr [ebp-0Ch]
   12 0012b623 c6040100        mov     byte ptr [ecx+eax],0
   12 0012b627 ebdc            jmp     repro!lcs+0x55 (0012b605)
Sign in to post a comment.
Posted by Microsoft on 6/10/2014 at 1:20 PM
<This is another instance of the bug here: http://connect.microsoft.com/VisualStudio/feedback/details/877557/c-for-loop-miscompile-since-visual-studio-2013-update-2 but I will respond with the same information.>

Hi, thanks for the great bug report. This is a bug in the autovectorizer analysis, where we are incorrectly ascertaining information about the memory reference you have identified. We will fix this bug in VS2013 Update 3. Expect to see the fix in the RC release.

In the meantime If you need a source code workaround, please use #pragma loop(no_vector) around the affected loops. Unfortunately, it's difficult to identify the precise array references that are affected. But the loops will be vectorized (use /Qvec-report:1 to identify those), and will have an array reference of the form A[i*(x+<integer>)].

If this issue is severe, causing critical business situations or blocking your product development or deployment, please go to http://support.microsoft.com or call 1-800-MICROSOFT for assistance. For Microsoft premier customers, please contact your administrator, your Technical Account Manager, or your Microsoft premier account representative.

I am closing this MSConnect item. Feel free to respond if you need anything else.

Thanks,
Eric Brumer
Microsoft Visual C++ Team
Posted by UnitUniverse on 6/4/2014 at 11:42 PM
The issue may still in the VS2014CTP x64. However the code sample isn't portable to the x64 either. If you changed the 'int' to 'size_t' then everything would be fine (so far so good).
Posted by UnitUniverse on 5/30/2014 at 9:19 PM
please fix it as quick...
Posted by Microsoft on 5/30/2014 at 10:52 AM
Thank you for your feedback, we are currently reviewing the issue you have submitted. If you require immediate assistance with this issue, please contact product support at http://support.microsoft.com/ph/1117.