Home Dashboard Directory Help
Search

Code-gen bug: incorrect devirtualization and inlining when building with LTCG by Bruce Dawson


Status: 

Closed
 as Fixed Help for as Fixed


2
0
Sign in
to vote
Type: Bug
ID: 812124
Opened: 12/20/2013 11:40:07 AM
Access Restriction: Public
0
Workaround(s)
view
0
User(s) can reproduce this bug

Description

As discussed in http://www.youtube.com/watch?feature=player_detailpage&v=3MRxucTXPdw#t=2215 the VS 2013 compiler will speculatively devirtualize and inline function calls. We have a DLL where this produces a 100% repro crash. The v-table entry is correct but the compiler compares it to just one function address before deciding that it must be a different function, which the compiler then inlines. The inlined code is incorrect because the function pointer has many possible values beyond the two that the optimizer allows for.

For now we can avoid this bug by disabling LTCG, but we don't want to disable LTCG.

The buggy code that is generated is shown here:

000000010FAC7C27 48 8D 05 22 7E 00 00 lea         rax,[CSchemaUtlVectorManipulator<CUtlVector<TestStruct_1::Range_t,CUtlMemory<TestStruct_1::Range_t,int> >,CUtlMemory<CUtlVector<TestStruct_1::Range_t,CUtlMemory<TestStruct_1::Range_t,int> >,int> >::SetCount (010FACFA50h)]
000000010FAC7C2E 48 39 41 28         cmp         qword ptr [rcx+28h],rax
000000010FAC7C32 75 26                jne         ConvertAtomicFromScriptTable+1AAh (010FAC7C5Ah)
                    pManipulator->SetCount( pData, nCount );
000000010FAC7C34 8B 4F 10             mov         ecx,dword ptr [rdi+10h]
000000010FAC7C37 2B D1                sub         edx,ecx
000000010FAC7C39 85 D2                test        edx,edx
000000010FAC7C3B 7E 0F                jle         ConvertAtomicFromScriptTable+19Ch (010FAC7C4Ch)
000000010FAC7C3D 44 8B C2             mov         r8d,edx
000000010FAC7C40 8B D1                mov         edx,ecx
000000010FAC7C42 48 8B CF             mov         rcx,rdi
000000010FAC7C45 E8 06 87 00 00     call        CUtlVectorBase<CUtlVector<TestStruct_1::Range_t,CUtlMemory<TestStruct_1::Range_t,int> >,CUtlMemory<CUtlVector<TestStruct_1::Range_t,CUtlMemory<TestStruct_1::Range_t,int> >,int> >::InsertMultipleBefore (010FAD0350h)
000000010FAC7C4A EB 16                jmp         ConvertAtomicFromScriptTable+1B2h (010FAC7C62h)
000000010FAC7C4C 79 14                jns         ConvertAtomicFromScriptTable+1B2h (010FAC7C62h)
000000010FAC7C4E F7 DA                neg         edx
000000010FAC7C50 48 8B CF             mov         rcx,rdi
000000010FAC7C53 E8 08 89 00 00     call        CUtlVectorBase<CUtlVector<TestStruct_1::Range_t,CUtlMemory<TestStruct_1::Range_t,int> >,CUtlMemory<CUtlVector<TestStruct_1::Range_t,CUtlMemory<TestStruct_1::Range_t,int> >,int> >::RemoveMultipleFromTail (010FAD0560h)
000000010FAC7C58 EB 08                jmp         ConvertAtomicFromScriptTable+1B2h (010FAC7C62h)
000000010FAC7C5A 48 8B CF             mov         rcx,rdi
000000010FAC7C5D E8 7E 8C 00 00     call        CUtlVectorBase<TestStruct_1::Range_t,CUtlMemory<TestStruct_1::Range_t,int> >::SetCount (010FAD08E0h)


The code that is pointed to by the v-table pointer is shown here:

    virtual void SetCount( void *pMember, int nCount )
    {
0000000101175370 48 83 EC 28         sub         rsp,28h
0000000101175374 48 8B C2             mov         rax,rdx
        CUtlVector< TUtlVectorElementType, TUtlVectorAllocator > *pVector = (CUtlVector< TUtlVectorElementType, TUtlVectorAllocator >*)pMember;
        pVector->SetCount( nCount );
0000000101175377 8B 52 10             mov         edx,dword ptr [rdx+10h]
000000010117537A 44 2B C2             sub         r8d,edx
000000010117537D 45 85 C0             test        r8d,r8d
0000000101175380 7E 0C                jle         CSchemaUtlVectorManipulator<CUtlString,CUtlMemory<CUtlString,int> >::SetCount+1Eh (010117538Eh)
0000000101175382 48 8B C8             mov         rcx,rax
    }
0000000101175385 48 83 C4 28         add         rsp,28h
0000000101175389 E9 A2 F0 FF FF     jmp         CUtlVectorBase<CUtlString,CUtlMemory<CUtlString,int> >::InsertMultipleBefore (0101174430h)
        CUtlVector< TUtlVectorElementType, TUtlVectorAllocator > *pVector = (CUtlVector< TUtlVectorElementType, TUtlVectorAllocator >*)pMember;
        pVector->SetCount( nCount );
000000010117538E 79 0E                jns         CSchemaUtlVectorManipulator<CUtlString,CUtlMemory<CUtlString,int> >::SetCount+2Eh (010117539Eh)
0000000101175390 41 F7 D8             neg         r8d
0000000101175393 48 8B C8             mov         rcx,rax
0000000101175396 41 8B D0             mov         edx,r8d
0000000101175399 E8 C2 EC FF FF     call        CUtlVectorBase<CUtlString,CUtlMemory<CUtlString,int> >::RemoveMultipleFromTail (0101174060h)
    }
000000010117539E 48 83 C4 28         add         rsp,28h
00000001011753A2 C3                 ret
Details
Sign in to post a comment.
Posted by Bruce Dawson on 6/3/2014 at 10:19 AM
I am investigating this incorrect devirtualization bug right now. It might be useful to be able to share findings and ask questions. I've attached a file with my e-mail address.
Posted by Microsoft on 6/2/2014 at 3:00 PM
Hi Bruce,

We are making a push to fix bugs for the VS summer update. I would appreciate it if you could investigate and respond promptly.

Thanks,

VC++ Code Generation Team
Posted by Microsoft on 5/30/2014 at 3:28 PM
Hi Bruce,

We call the safety analysis "Invasion Analysis" because its purpose is to detect cases where an un-known derived class of a known base class invades the application from a static library or a DLL. It is largely a type based analysis and very little tracking of actual value flow is done. For the static library case the analysis looks at the types of globals defined in the library and pointer/reference function parameter/return types of functions defined in it. The rationale for the latter is that a pointer/reference to an aggregate could be passed to a library function which then updates one of its pointer type fields with an "invading" object. Likewise such a function could return a pointer/reference to an invading object directly or transitively.

For dynamically linked libraries we look at similar sources of invasion: dllexport/dllimport variables and parameter types of dllimport/dllexport functions and return types of dllimports. Another possible source of invasion are exception objects thrown from outside the binary and caught within it.

Again, the analysis is type based and we have to be conservative in the presence of casting (e.g. casts to/from "void *") and unsafe downcasts. When investigating bugs in this analysis found building our own source bases I have usually had to examine the source code to determine the root cause of invasion. Since that is not possible here we are going to have to rely on you.

Thanks,

VC++ Code Generation Team
Posted by Bruce Dawson on 5/29/2014 at 8:09 PM
Thank you for all of this analysis. I will try to collect the information that you are looking for. My guess is that the derived type comes from a different DLL -- do you know how the safety analysis deals with that possibility?
Posted by Microsoft on 5/29/2014 at 2:15 PM
Hi Bruce,

It appears that the de-virtualization optimization thinks that base class SchemaCollectionManipulator has only two derived classes. The safety analysis that is part of the optimization also decides that no unknown sub-classes of the base class can "invade" the application. This is the reason that it does not generate the fall-back virtual call instruction. The most likely scenario is that an instance of an un-known (to the compiler) derived class of the base reaches the call-site and hits your repro. Which would point to a bug in the safety analysis.

Could you please verify if this is the case ? It is virtually impossible for us to do so without source access and being able to run the binary.

Thanks,

VC++ Code Generation Team
Posted by Microsoft on 5/29/2014 at 2:15 PM
Hi Bruce,

It appears that the de-virtualization optimization thinks that base class SchemaCollectionManipulator has only two derived classes. The safety analysis that is part of the optimization also decides that no unknown sub-classes of the base class can "invade" the application. This is the reason that it does not generate the fall-back virtual call instruction. The most likely scenario is that an instance of an un-known (to the compiler) derived class of the base reaches the call-site and hits your repro. Which would point to a bug in the safety analysis.

Could you please verify if this is the case ? It is virtually impossible for us to do so without source access and being able to run the binary.

Thanks,

VC++ Code Generation Team
Posted by Microsoft on 5/28/2014 at 5:06 PM
For starters, /d2:-notypeopt disables devirtualization (globally) - and you can do the usual disable optimizations around the problematic function.

If the compiler believes that a virutal call target can only be one of two possible values, it'll generate the call sequence shown. The problem is in this case it appears to be incorrectly concluding there are only two possible targets; the question is then who is the third (correct) target, and how did it's pointer get to the devirt site without the compiler noticing.

Do you still have a repro locally that you can build from source? There are one of two possible situations here:
1. The object is a legally constructed valid C++ object, and the compiler missed it. We would need to know where the type came from (how it propagated to the callsite from it's construction location) and fix the bug in the compiler to identify it as an open type
2. The object is not valid; ie the constructor was never invoked, or other undefined behavior was involved. In this case the compiler might be allowed to assume that type is not a valid vcall target.

We'll take a look at the link repro, but it may come down to knowing exactly how that type got created and got to the callsite.
Posted by Microsoft on 12/30/2013 at 1:36 AM
Thank you for submitting feedback on Visual Studio and .NET Framework. Your issue has been routed to the appropriate VS development team for investigation. We will contact you if we require any additional information.
Posted by Bruce Dawson on 12/29/2013 at 3:44 PM
I've added a text file (Microsoft visible only) that gives directions to an FTP site that contains an 84 MB linker repro. That should be sufficient to reproduce the problem.
Posted by Bruce Dawson on 12/25/2013 at 5:28 PM
I'll get a link repro after Christmas.
Posted by Microsoft on 12/24/2013 at 10:56 PM
Hello again. We wanted to give you a quick reminder that to efficiently investigate and reproduce your issue, we need you submit the additional information we requested.
Posted by Microsoft on 12/24/2013 at 12:21 AM
Hello again. We wanted to give you a quick reminder that to efficiently investigate and reproduce your issue, we need you submit the additional information we requested.
Posted by Microsoft on 12/22/2013 at 11:23 PM
Thank you for your feedback. In order to efficiently investigate and reproduce this issue, we are requesting additional information outlined below.

Would you please provide us more information so that we can conduct further research?

We look forward to hearing from you with this information.
Posted by Bruce Dawson on 12/20/2013 at 12:13 PM
Here's a summary. We wrote code something like this:

    p->SomeVirtualFunction();

The optimization is supposed to translate it to something like this:

if (p->SomeVirtualFunction == ClassA::SomeVirtualFunction)
    p->ClassA::SomeVirtualFunction();
else if (p->SomeVirtualFunction == ClassB::SomeVirtualFunction)
    p->ClassB::SomeVirtualFunction();
else
    p->SomeVirtualFunction();

The bug is that the compiler actually generated code like this:

if (p->SomeVirtualFunction == ClassA::SomeVirtualFunction)
    p->ClassA::SomeVirtualFunction();
else
    p->ClassB::SomeVirtualFunction();

I can share out a link repro or something like that, if that is the best way to report the bug.
Posted by Bruce Dawson on 12/20/2013 at 11:56 AM
It would be nice if we could disable some individual optimizer features. Auto-vectorization and speculative de-virtualization and inlining are two new optimization features that are probably more likely than most to have bugs, and they also make significant size/speed tradeoffs. Potentially buggy features need switches to control them, and the current /O1 versus /02 switches seem way too granular for the complex range of possibilities that they hide. In many cases it probably makes sense to optimize for speed but without going completely crazy on code size (two pages of checks for overlap when vectorizing).

I know, I know, PGO, but I'm not sure it's practical for our workflow, and it doesn't deal with the compiler bug issues.

In short, right now we can avoid this bug by turning off LTCG or turning optimizations off or switching to optimize-for-size on individual functions, but it would be nice to have other options.
Posted by Microsoft on 12/20/2013 at 11:53 AM
Thank you for your feedback, we are currently reviewing the issue you have submitted. If this issue is urgent, please contact support directly(http://support.microsoft.com)
Sign in to post a workaround.
File Name Submitted By Submitted On File Size  
readme.txt (restricted) 12/29/2013 -
contactme.txt (restricted) 6/3/2014 -