OpenMP & C++ optimization crashes - incorrect program transform/optimization? - by Uncle Joe 1985

Status: Fixed

This item has been fixed in the current or upcoming version of this product. A more detailed explanation for the resolution of this particular item may have been provided in the comments section.

ID 634223 Comments
Status Closed Workarounds
Type Bug Repros 0
Opened 1/1/2011 2:04:13 AM
Access Restriction Public


I think I found a bug in the Visual C++ optimizer, or in some combination of the compiler and the OpenMP runtime. The attached program, when compiled with OpenMP enabled, crashes inside what appears to be compiler-generated OpenMP prologue code.

I've run several tests and found that the problem only happens when building for 32-bit with optimizations on (I'm not sure which ones, but it doesn't take many to break it). Disabling optimizations or building for 64-bit gets around the problem, but I need the code to work for both 32-bit and 64-bit.

Please tell me what the problem is and how long it will take to fix. Thanks.
Posted by Uncle Joe 1985 on 1/19/2011 at 5:18 PM
When will the service pack be released? If it's within a month or 2, that should be OK.
Posted by Mark [MSFT] on 1/18/2011 at 5:47 PM
Actually, the problem in this case is specific to our x86 compiler and should not show up with the x64 compiler.

Additionally, we will be able to make the fix available in the VS2010 service pack.

Mark Levine
Visual C++
Posted by Uncle Joe 1985 on 1/8/2011 at 1:43 PM
Good to hear it has been fixed. Sorry for thinking it hasn't. I've developed a few LLVM backends 2 years ago in school and know the complexities involved. With such precise reasoning required, I know such a grievous error wouldn't exist for long.

I assume the bug is also in the x64 compiler, even though I didn't encounter it (fewer loads due to less register pressure?).

I guess a workaround with more peace of mind would be to use only aligned loads, with _mm_alignr_epi8() to extract the required data. You're right about the conditions being rare. I admit, this is probably my 2nd piece of code using neighborhood operations.

I'll ask customer support anyway to get it fixed in 10.0.

Posted by Mark [MSFT] on 1/7/2011 at 4:29 PM
Hi, Joe.

Yes, the optimizer is trying to fuse instructions and winds up eliminating the unaligned load that was originally created from the _mm_loadu_ps intrinsic and folding it into the single subps instruction.

The problem is not related to /openmp; using /openmp in this case happens to create the conditions necessary for the error to occur. (In order to parallelize the call to MyGradient_SIMD, the compiler effectively takes this code and creates a separate function for it:

for (y = 1; y < nHeight - 1; ++y)
    MyGradient_SIMD(Get2D(pGradientX, 0, y, nGradientPitch),
                    Get2D(pGradientY, 0, y, nGradientPitch),
                    Get2D(pData, 0, y, nDataPitch), nDataPitch, nWidth);

The function created for the extracted code happens to trigger the problem. I was able to modify your test case to do this and was able to reproduce your problem with and without /openmp. The conditions required for this problem to appear do not appear to be very common, however.)
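
As a rough illustration of the outlining described above (this is not the compiler's actual output), the sketch below shows a parallel loop next to a hand-written equivalent in which the loop body has been moved into its own function. ProcessRow, OutlinedArgs, OutlinedBody and ProcessImageOutlined are hypothetical names, and ProcessRow is only a stand-in for the per-row kernel from the thread.

```cpp
// Hypothetical stand-in for the per-row kernel (MyGradient_SIMD in the thread).
static void ProcessRow(float* dst, const float* src, int width) {
    for (int x = 0; x < width; ++x)
        dst[x] = 2.0f * src[x];
}

// What the programmer writes: interior rows are processed in parallel.
// Without OpenMP enabled the pragma is ignored and the loop runs serially.
void ProcessImage(float* dst, const float* src, int width, int height) {
    #pragma omp parallel for
    for (int y = 1; y < height - 1; ++y)
        ProcessRow(dst + y * width, src + y * width, width);
}

// Roughly what outlining produces: the shared variables are packed into a
// struct and the loop body becomes a separate function that works on a
// [yBegin, yEnd) slice of the iteration space.
struct OutlinedArgs {
    float* dst;
    const float* src;
    int width;
};

static void OutlinedBody(const OutlinedArgs& a, int yBegin, int yEnd) {
    for (int y = yBegin; y < yEnd; ++y)
        ProcessRow(a.dst + y * a.width, a.src + y * a.width, a.width);
}

void ProcessImageOutlined(float* dst, const float* src, int width, int height) {
    OutlinedArgs a{dst, src, width};
    // A real OpenMP runtime would hand different slices to worker threads;
    // this sketch covers the whole range with a single call.
    OutlinedBody(a, 1, height - 1);
}
```

Because the kernel call now sits inside a separately compiled function with its own inlining and register allocation, the optimizer can see a different instruction sequence than in the serial build, which is consistent with the bug showing up once the code was extracted.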

While we did discover and fix this problem while working on the next major release, applying fixes for a future release to other releases already in progress (such as the VS2010 service pack) is not always feasible.

If additional support is needed sooner, please contact Microsoft Customer Support. The experts there will work with you (and with us) to find an appropriate solution.

Getting Help from Microsoft Support (Visual Studio):
Assisted Support Options:

Mark Levine
Visual C++
Posted by Uncle Joe 1985 on 1/5/2011 at 6:27 PM
Thanks Mark

I tried the workaround and it works, but I'm quite worried because I have lots of other SIMD code that hasn't been tested. I suppose the good news is it is a fail-deadly error instead of a silent error. If other code needs to be changed, it would be really unacceptable, and I'll have to use the 2008 tools.

"What the compiler should be doing is generating an unaligned move"

Why would the compiler generate a load when I already use an unaligned load in the code? Am I correct to say the optimizer is trying to fuse the load with the subtract into 1 x86 instruction?

Why does this error only happen when OpenMP is enabled? Is it the compiler's fault, or some combination of OpenMP & the optimizer?

Why is it going to take so long to fix it? My company pays good American money $ for MSDN subscriptions.
Posted by Mark [MSFT] on 1/5/2011 at 4:52 PM
Hello. The problem you have encountered is a case where the compiler is incorrectly generating a "subps xmm0,xmmword ptr [edi+ecx]" instruction where the address at [edi+ecx] is not known to be aligned on a 16-byte memory boundary.
What the compiler should be doing is generating an unaligned move from that address into an xmm register and doing the subps on two xmm registers.
This problem has been identified and will be fixed in the next major release of the product.
The best workaround I have been able to find so far is to modify the source to try to force the operands of the problematic _mm_sub_ps call to be in memory. For example, change:
     __m128 v4f_BottomNeighbors = _mm_loadu_ps(Get2D(pData, x, -1, nPitch)),
            v4f_TopNeighbors = _mm_loadu_ps(Get2D(pData, x, 1, nPitch));
     v4f_dy = _mm_mul_ps(_mm_sub_ps(v4f_TopNeighbors, v4f_BottomNeighbors), v4f_OneHalf);

to:

     __m128 v4f_BottomNeighbors = _mm_loadu_ps(Get2D(pData, x, -1, nPitch));
     __m128 v4f_TopNeighbors = _mm_loadu_ps(Get2D(pData, x, 1, nPitch));
     __m128 *tmp1 = &v4f_TopNeighbors;
     __m128 *tmp2 = &v4f_BottomNeighbors;
     v4f_dy = _mm_mul_ps(_mm_sub_ps(*tmp1, *tmp2), v4f_OneHalf);

Unfortunately, I don't think that the fix will be in the service pack release for VS 2010. If working around the issue in the source or by turning off optimizations locally is insufficient, please contact Microsoft Customer Support to discuss obtaining a fix for the problem before the next major release.

Mark Levine
Visual C++
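
To make the alignment issue in the post above concrete, here is a minimal sketch (not the code from the attached project). IsAligned16 and HalfDifference4 are hypothetical names, and the SSE path is only compiled where SSE is available, with a scalar fallback otherwise. A legacy SSE arithmetic instruction with a memory operand, such as the subps quoted above, requires that operand to be 16-byte aligned, whereas _mm_loadu_ps is valid for any address. Folding the load into the subtract is therefore only safe when the address is known to be aligned.

```cpp
#include <cstdint>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE 1
#else
#define SKETCH_HAVE_SSE 0
#endif

// True when p sits on a 16-byte boundary, which is what a legacy SSE
// instruction with a memory operand (e.g. subps xmm0, [mem]) needs.
inline bool IsAligned16(const void* p) {
    return (reinterpret_cast<std::uintptr_t>(p) & 15u) == 0;
}

// Hypothetical helper: out[i] = 0.5 * (top[i] - bottom[i]) for 4 floats.
// top, bottom and out may be unaligned, so every access goes through the
// unaligned load/store intrinsics and the subtract works on registers.
inline void HalfDifference4(const float* top, const float* bottom, float* out) {
#if SKETCH_HAVE_SSE
    __m128 t = _mm_loadu_ps(top);
    __m128 b = _mm_loadu_ps(bottom);
    _mm_storeu_ps(out, _mm_mul_ps(_mm_sub_ps(t, b), _mm_set1_ps(0.5f)));
#else
    // Scalar fallback for targets without SSE.
    for (int i = 0; i < 4; ++i)
        out[i] = 0.5f * (top[i] - bottom[i]);
#endif
}
```

In a neighborhood operation, a pointer that is offset by one float from a 16-byte aligned row start fails the IsAligned16 check, which is the situation where an incorrectly folded load would fault.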

Posted by Microsoft on 1/3/2011 at 11:38 PM
Thanks for your feedback.

We are rerouting this issue to the appropriate group within the Visual Studio Product Team for triage and resolution. These specialized experts will follow up on your issue.

Posted by Uncle Joe 1985 on 1/3/2011 at 10:19 AM
Correction: I'm using Visual C++ Professional 10.0.30319.1
Posted by Uncle Joe 1985 on 1/3/2011 at 1:24 AM
Also, I'm using Visual C++ 10.0 Ultimate downloaded from MSDN
Posted by Uncle Joe 1985 on 1/3/2011 at 1:20 AM
OK, I've uploaded my Visual C++ project file (I don't think it's that hard to reproduce). I also included a screenshot of the crash when using the debugger. The crash happens under Release Win32.
Posted by Microsoft on 1/2/2011 at 11:47 PM
Please give us a demo project to demonstrate this issue. Thanks!
Posted by Microsoft on 1/2/2011 at 10:51 PM
Thanks for reporting this issue. In order to fix the issue, we must first reproduce the issue in our labs. We are unable to reproduce the issue with the steps you provided.

Please give us a video of this issue so that we can conduct further research.

It would be greatly appreciated if you could provide us with that information as quickly as possible. If we do not hear back from you within 7 days, we will close this issue.

Thanks again for your efforts and we look forward to hearing from you.

Microsoft Visual Studio Connect Support Team
Posted by Microsoft on 1/1/2011 at 2:21 AM
Thank you for your feedback, we are currently reviewing the issue you have submitted. If this issue is urgent, please contact support directly(