cl.exe in VS2015 much slower to compile some C++ code than VS2010 (template-heavy, 5x slower with /O1 or /O2) - by Aras Pranckevičius

ID: 3141368
Status: Active
Type: Bug
Repros: 0
Opened: 9/22/2017 2:12:38 AM
Access Restriction: Public


We are finally trying to upgrade a large codebase (the Unity game engine) from VS2010 to VS2015, and have noticed that some .cpp files are much slower to compile with optimizations on. The files tend to be very template-heavy (our "SIMD math library" code, which implements full HLSL-style swizzle support, etc.; a bit on the "crazy templates" side).

The most extreme difference is in one file that takes 20 seconds to compile with VS2010 and 115 seconds with VS2015.
Posted by Microsoft on 9/26/2017 at 8:52 AM
I don't know of a way to turn off the vectorizer specifically. I know the new SSA optimizer has a switch, /d2SSAOptimizer-. It actually uses more time than the vectorizer. But neither of these approaches will impact inlining, and I really think all the forceinline functions are the root cause here. Everything else is just trying to deal with the massive functions that are produced.

I agree with you on the warnings: I actually would like to see a set of warnings around throughput. They'd have to be off by default, of course, and there are a lot of issues to be worked out (how do we word them in such a way that they are actionable). Look for that in an upcoming release as well.
Posted by Aras Pranckevičius on 9/26/2017 at 6:19 AM
Thanks, that's useful information! Will see what we can do on our side to massage the codebase.

Is it possible to turn some of these features off (e.g. the vectorizer; at least in our case of explicit intrinsics usage we suspect it might not be very relevant)? I've found the "/favor:ATOM" trick that seems to be suggested on the internet, but that feels like a bit of a hack.

A compiler diagnostics switch that would show stats like you quoted (e.g. "this looks like way too many inlined stuff!", or "this function uses force inline, but is pretty large") would be helpful.
Posted by Microsoft on 9/25/2017 at 2:11 PM
The basic root cause here is there are a few functions (one is void __cdecl Suitevec_transform_testskUnitTestCategory::Testinverse_WorksFor_SingularAffineX::RunImpl(void)const __ptr64, but there are similar ones that appear to be stamped out by the same set of macros) which have deep, deep inline trees. Around 8000 individual inline instances; the majority of which come from __forceinline functions.

That's the first problem: all this inlining takes time. It results in a couple of massive functions that take about 30-40 seconds to compile, making up the majority of the compile time. All of that time is spent optimizing these massive functions. We added a new feature in the VS 2015 timeframe (the vectorizer), and even more features in VS 2017 that make it worse (the SSA optimizer). There are also other features in VS 2017, like inline function caching, that speed things up.

We'll see what we can do to improve our optimizers, but in the meantime my suggestion is to use __forceinline less liberally. If I take your preprocessed file and remove all __forceinline directives, the file compiles in under a second on my machine.
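
For readers who want to try the "less __forceinline" suggestion without editing every function, one possible approach is to route the keyword through a single macro that can be switched off per build or per file. The sketch below is only an illustration, not code from the Unity codebase: the names MATH_FORCEINLINE, MATH_DISABLE_FORCEINLINE, float3, dot and length_sq are all hypothetical.

```cpp
#include <cassert>

// Hypothetical switch: define MATH_DISABLE_FORCEINLINE (for example only for
// unit-test translation units) to fall back to a plain 'inline' hint and let
// the optimizer decide what is worth inlining.
#if defined(MATH_DISABLE_FORCEINLINE)
    #define MATH_FORCEINLINE inline
#elif defined(_MSC_VER)
    #define MATH_FORCEINLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
    #define MATH_FORCEINLINE inline __attribute__((always_inline))
#else
    #define MATH_FORCEINLINE inline
#endif

// Hypothetical stand-in for a math library type.
struct float3 { float x, y, z; };

// Small leaf function: with the macro enabled, every call site gets a copy.
MATH_FORCEINLINE float dot(const float3& a, const float3& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Calls another forced-inline function, so the inline tree grows one level
// deeper here; real libraries nest many more levels than this.
MATH_FORCEINLINE float length_sq(const float3& v)
{
    return dot(v, v);
}
```

With a setup like this, the heavy files could be compiled with /DMATH_DISABLE_FORCEINLINE on the cl.exe command line to compare compile times with and without forced inlining, while the rest of the build keeps the original behavior. Whether the generated code stays fast enough without forced inlining would need to be measured separately.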
Posted by Microsoft on 9/25/2017 at 12:27 AM
Thank you for your feedback, we are currently reviewing the issue you have submitted.

Microsoft Visual Studio Connect Support Team
Posted by Aras Pranckevičius on 9/24/2017 at 11:01 PM
I attached the files (.cpp with repro; as well as WPR trace), and the UI said all is fine, the files might take an hour or two to show up. Now three days later, it still shows "no attachments". Should I try attaching again?