C# and JIT optimizers do not eliminate local variables - by Edward D. Brey

ID: 739851
Status: Active
Type: Bug
Repros: 1
Opened: 4/30/2012 4:22:41 AM
Access Restriction: Public
Moderator Decision: Sent to Engineering Team for consideration


Breaking a single expression into two expressions with an intermediate local variable causes the optimized code to be significantly slower (40% or more). Correspondingly, additional assembly instructions are generated. The problem occurs whether compiling to x64 or x86.

While this isn't a code correctness issue, I submitted this as a bug (not a suggestion) because this is a fundamental optimization. I could see the frustration in trying to profile a complex algorithm that had problems like this scattered throughout its routines. You'd work through the hotspots but have no hint of the broader warm region that could be vastly improved by hand-tweaking what we normally take for granted from the compiler.

A related question on stackoverflow is here:
Posted by Edward D. Brey on 5/17/2012 at 9:12 AM
The repro deals with two optimization opportunities:

1. Collapsing local variables: Making { var t=f(x); return g(t); } compile to the same code as { return g(f(x)); } where f and g represent any code within the method.

2. Loop top alignment: This is the one that causes different performance based on JIT ordering.

The first is the impetus behind this bug. The second is a tag-along issue that was discovered while trying to understand the first. To avoid confusion, I created a separate bug report for the second: https://connect.microsoft.com/VisualStudio/feedback/details/742527/jit-optimizer-does-not-perform-loop-top-alignment

I appreciate your response. Assuming I understand it correctly, it deals with the second issue and should be applied to that bug report. Any comment on adding the optimization to collapse local variables?
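
For readers without the repro at hand, the first opportunity (collapsing a local variable) can be illustrated with a minimal sketch. This is not the repro attached to this report. The class name, the `MultiLineTest` method name, the `iterations` parameter, and the loop bound are assumptions for illustration. Only `SingleLineTest` and `isMultipleOf16` are names that appear in this thread, and the loop bodies are inferred from the comments here.

```csharp
using System;
using System.Diagnostics;

// Hypothetical sketch; names and loop bound are assumptions, not the original repro.
public static class LocalVariableSketch
{
    // Single expression: the comparison feeds the conditional directly.
    public static int SingleLineTest(uint iterations)
    {
        int count = 0;
        for (uint i = 0; i < iterations; ++i)
        {
            count += i % 16 == 0 ? 1 : 0;
        }
        return count;
    }

    // Same computation, split in two with an intermediate local variable.
    public static int MultiLineTest(uint iterations)
    {
        int count = 0;
        for (uint i = 0; i < iterations; ++i)
        {
            bool isMultipleOf16 = i % 16 == 0;
            count += isMultipleOf16 ? 1 : 0;
        }
        return count;
    }

    public static void Main()
    {
        const uint iterations = 100000000; // assumed bound; adjust as needed

        Stopwatch sw = Stopwatch.StartNew();
        int single = SingleLineTest(iterations);
        sw.Stop();
        Console.WriteLine("SingleLineTest: {0} ms (count = {1})", sw.ElapsedMilliseconds, single);

        sw = Stopwatch.StartNew();
        int multi = MultiLineTest(iterations);
        sw.Stop();
        Console.WriteLine("MultiLineTest:  {0} ms (count = {1})", sw.ElapsedMilliseconds, multi);
    }
}
```

Both methods return the same count, so any timing difference comes from the generated code rather than from the work done. Timings should be taken from an optimized build run without a debugger attached, and they will vary with hardware and runtime version.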
Posted by Microsoft on 5/17/2012 at 8:25 AM
Thanks for the repro. Yes, for micro-benchmarks loop top alignment can have a dramatic impact, depending on the hardware you're running on (the Core i-series does pretty well, generally, but Core2's and K8's are both highly alignment sensitive). We're not going to be able to address issues of this nature in this product cycle (we're coming into the home stretch), but I've moved it to the Post-Dev11 code quality list, so we don't lose it.
Posted by Edward D. Brey on 5/2/2012 at 8:05 PM
@Mike: I updated the repro to include tests that avoid the use of conditionals within the loop. For x64, they still behave differently depending on whether there is a local variable (in this case, the local variable makes the code run faster). So the behavior occurs with or without conditionals in the loop.
Posted by Edward D. Brey on 5/2/2012 at 7:58 PM
@Mike Your alignment theory seems right. In this case, I believe it is related to JIT compilation order, since the order affects where in memory the machine code for the loop gets written to, which affects alignment.
Posted by Mike Danes on 5/2/2012 at 1:09 AM
That's unlikely to have anything to do with JIT ordering. You can take only SingleLineTest, make 4 copies of the code inside of it, and you'll notice similar time patterns. Since it is only one function, there's no JIT ordering that can affect timings.

I did a similar test in C++ and the same thing happens. The difference is smaller (for example 150 ms instead of 350 ms) but it's there. If you change the loop expression to something like count += (i % 16); then the differences disappear in both C# and C++. Smells more like a branch prediction (and maybe code alignment) issue.
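
A minimal sketch of the branch-free variant described above, for anyone who wants to try it. The class name, method name, `iterations` parameter, and loop bound are assumptions for illustration and are not taken from the repro. The accumulator is a `long` because C# does not allow `int += uint` without a cast.

```csharp
using System;
using System.Diagnostics;

// Hypothetical sketch; names and loop bound are assumptions.
public static class BranchFreeSketch
{
    // The loop body has no conditional: the remainder is added directly.
    public static long SumOfRemainders(uint iterations)
    {
        long count = 0;
        for (uint i = 0; i < iterations; ++i)
        {
            count += i % 16;
        }
        return count;
    }

    public static void Main()
    {
        const uint iterations = 100000000; // assumed bound; adjust as needed

        Stopwatch sw = Stopwatch.StartNew();
        long result = SumOfRemainders(iterations);
        sw.Stop();
        Console.WriteLine("SumOfRemainders: {0} ms (count = {1})", sw.ElapsedMilliseconds, result);
    }
}
```

Copying the loop several times into one method, as suggested above, and timing each copy separately is one way to check whether the differences track code placement rather than the source-level change.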
Posted by Edward D. Brey on 5/1/2012 at 7:51 PM
The repro on github has been updated to show an additional anomaly, which is that for x86, performance differs depending on the order that methods are JIT compiled. See: http://stackoverflow.com/questions/10406796/why-does-jit-order-affect-performance
Posted by MS-Moderator10 [Feedback Moderator] on 4/30/2012 at 9:39 PM
Thanks for your feedback.

We are rerouting this issue to the appropriate group within the Visual Studio Product Team for triage and resolution. These specialized experts will follow up on your issue.
Posted by Mike Danes on 4/30/2012 at 7:14 AM
This has nothing to do with local variable elimination. If you look at the generated code you'll see that only registers are used anyway; no spill to the stack happens. The problem is that when you add a variable you also add another conditional, and the JIT compiler is not smart enough to eliminate it. Ideally the compiler should figure out that a bool is either 0 or 1 and use it directly instead of doing the isMultipleOf16 ? 1 : 0 thing.
Posted by MS-Moderator01 on 4/30/2012 at 4:43 AM
Thank you for your feedback. We are currently reviewing the issue you have submitted. If this issue is urgent, please contact support directly (http://support.microsoft.com).