Intrinsics problem - by wDrewniak

Status: Fixed

This item has been fixed in the current or upcoming version of this product. A more detailed explanation for the resolution of this particular item may have been provided in the comments section.

ID: 727519
Status: Closed
Type: Bug
Repros: 0
Opened: 3/1/2012 5:12:43 AM
Access Restriction: Public


Recently I wanted to optimize a few functions. I decided to use intrinsics in some calculations where the compiler cannot vectorize on its own. My problem is that the compiled code orders the results incorrectly: the code works in the debug build, but the release build gives wrong values. It looks like the compiler does not track the high half of the XMM register when I use _mm_mul_sd. More information is in the details.
Posted by Microsoft on 4/29/2014 at 12:24 PM
Thank you for reporting this issue. This issue has been fixed in Visual Studio 2013. You can install a trial version of Visual Studio 2013 with the fix from:
Posted by Microsoft on 3/9/2012 at 9:38 AM
I suggest searching MS Connect and/or other websites for issues involving intrinsic functions and see if anything is relevant to your situation.

ian Bearman
VC++ Code Generation & Optimization
Posted by wDrewniak on 3/8/2012 at 2:11 AM
Thanks for the confirmation. Are there any other problems with intrinsic functions that I should know about? It would be easier if I knew which kinds of operations to avoid.

Best Regards
Wiktor Drewniak
Posted by Microsoft on 3/2/2012 at 11:27 AM
Thanks for reporting this issue and providing an excellent repro case.

I can confirm that, as you suspected, the compiler was not correctly tracking or maintaining the upper portion of the vector register through the scalar SSE2 (mulsd, etc) instructions. I can also confirm that this issue has already been fixed and that fix is available in the latest Visual Studio 11 Beta (Consumer Preview). I verified that the test case you provided works with this compiler release.
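
For readers who want to check their own toolchain, here is a minimal sketch of the property described above: `_mm_mul_sd` multiplies only the low lanes and must pass the high lane of its first operand through unchanged. The function name `mul_sd_keeps_high` is made up for this illustration and is not part of the original repro. A check this small may well pass even on an affected compiler, since the bug showed up in a longer instruction sequence in an optimized build.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: returns 1 if _mm_mul_sd multiplied only the low
   lanes and kept the high lane of the first operand, 0 otherwise. */
int mul_sd_keeps_high(double a_lo, double a_hi, double b_lo, double b_hi)
{
    __m128d a = _mm_set_pd(a_hi, a_lo);    /* _mm_set_pd takes (high, low) */
    __m128d b = _mm_set_pd(b_hi, b_lo);
    double out[2];

    _mm_storeu_pd(out, _mm_mul_sd(a, b));  /* out[0] = low, out[1] = high */
    return out[0] == a_lo * b_lo && out[1] == a_hi;
}
```

Calling it with small exactly representable values, for example `mul_sd_keeps_high(2.0, 3.0, 5.0, 7.0)`, avoids any rounding concerns in the comparison.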

It sounds like you already have a workaround. Let me know if there’s any other information or assistance I can provide.

ian Bearman
VC++ Code Generation & Optimization
Posted by wDrewniak on 3/2/2012 at 1:22 AM
While explaining it I found a small trick that helped me get correct results.
I replaced
HL = HH;
with
HL = _mm_shuffle_pd(HH, HH, _MM_SHUFFLE2(0, 0));
Now it works, but I think this is only a temporary solution... I couldn't find any other errors...
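
As a small illustration of what the workaround changes, the sketch below compares a plain copy of HH with the shuffled version. The helper name `copy_vs_broadcast` and its parameters are invented for this example. With `_MM_SHUFFLE2(0, 0)` both lanes of the result come from the low lane of HH, so the low lane (the only one the later scalar operations on HL read) is the same in both cases, while the high lane differs.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: stores a plain copy of HH and the shuffled
   "copy" from the workaround so the lanes can be compared. */
void copy_vs_broadcast(double hh_lo, double hh_hi,
                       double copy[2], double bcast[2])
{
    __m128d HH    = _mm_set_pd(hh_hi, hh_lo);  /* (high, low) */
    __m128d plain = HH;                        /* original: HL = HH */
    /* workaround: selector 0 picks the low lane for both halves */
    __m128d shuf  = _mm_shuffle_pd(HH, HH, _MM_SHUFFLE2(0, 0));

    _mm_storeu_pd(copy, plain);   /* {hh_lo, hh_hi} */
    _mm_storeu_pd(bcast, shuf);   /* {hh_lo, hh_lo} */
}
```

One possible reading of why this hides the problem, assuming the compiler still leaves the final product in HL's register: the high lane of HL then already holds the old low value of HH, which is exactly what the high lane of the correct result should contain.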
Posted by wDrewniak on 3/2/2012 at 1:17 AM
I've added the sources that I could share without breaking my company's policies. Let me explain the problem better:

HH = _mm_div_pd(HH, MA1);                            // my HH contains two important values
HH = _mm_mul_sd(HH, HL);                             // I use one of them (low); the second (high) stays untouched
MA2 = _mm_shuffle_pd(MA2, MA2, _MM_SHUFFLE2(0, 1));  // this shuffle is not important for the problem
X = _mm_shuffle_pd(X, X, _MM_SHUFFLE2(0, 1));
HL = HH;                                             // I copy HH to HL
HL = _mm_mul_sd(HL, MA2);                            // modifying the HL low value
HH = _mm_shuffle_pd(HH, HH, _MM_SHUFFLE2(0, 1));     // swapping HH to use the high value in a scalar operation; the old low value is kept in the high lane
HL = _mm_add_sd(HL, X);                              // still modifying the HL low value
HH = _mm_mul_sd(HH, HL);                             // multiply HH (old high) by the new HL (modified old HH low); the result goes to HH so that its high lane (old HH low value) is preserved
X = HH;
_mm_storeu_pd(&x[k], X);                             // save results to memory

It was compiled to:

004013E0 divpd     xmm0,xmm3         // xmm0 now holds our HH value
004013E4 mulsd     xmm0,xmm4         // modifies xmm0 low
004013E8 movapd    xmm4,xmm0         // copy to HL
004013EC shufpd    xmm1,xmm1,1
004013F1 mulsd     xmm4,xmm1         // modifying HL low
004013F5 shufpd    xmm2,xmm2,1
004013FA shufpd    xmm0,xmm0,1
004013FF addsd     xmm4,xmm2
00401403 mulsd     xmm4,xmm0         // and finally... multiplies the modified old HH low by the old HH high, but stores the result in HL, so the high lane keeps HL's value instead of HH's
00401407 movupd    xmmword ptr [esp+eax+138h],xmm4 // stores to memory values other than the ones I expected
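
To make the difference concrete, here is a scalar sketch that models both the intrinsic source and the listed assembly on plain doubles. It assumes the register mapping read off the listing (xmm0 = HH, xmm3 = MA1, xmm4 = HL, xmm1 = MA2, xmm2 = X); the type `vec2` and all function names are invented for this illustration.

```c
/* Scalar model of a two-lane vector: lo = element 0, hi = element 1. */
typedef struct { double lo, hi; } vec2;

/* Models _mm_shuffle_pd(v, v, _MM_SHUFFLE2(0, 1)): swaps the lanes. */
static vec2 swap_lanes(vec2 v)
{
    vec2 r = { v.hi, v.lo };
    return r;
}

/* Steps shared by both versions, up to (not including) the final multiply.
   On return *HH is the swapped HH and *HL has its modified low lane. */
static void common_steps(vec2 *HH, vec2 *HL, vec2 MA1, vec2 MA2, vec2 X)
{
    HH->lo /= MA1.lo;  HH->hi /= MA1.hi;  /* divpd:  packed divide        */
    HH->lo *= HL->lo;                     /* mulsd:  low lane only        */
    MA2 = swap_lanes(MA2);                /* shufpd                       */
    X   = swap_lanes(X);                  /* shufpd                       */
    *HL = *HH;                            /* movapd: HL = HH              */
    HL->lo *= MA2.lo;                     /* mulsd                        */
    *HH = swap_lanes(*HH);                /* shufpd: old high to low lane */
    HL->lo += X.lo;                       /* addsd                        */
}

/* What the intrinsic source asks for: HH = _mm_mul_sd(HH, HL). */
vec2 expected_result(vec2 HH, vec2 HL, vec2 MA1, vec2 MA2, vec2 X)
{
    common_steps(&HH, &HL, MA1, MA2, X);
    HH.lo *= HL.lo;      /* product lands in HH; high lane = old HH low */
    return HH;
}

/* What the listing does: mulsd xmm4,xmm0 writes the product to HL. */
vec2 listed_asm_result(vec2 HH, vec2 HL, vec2 MA1, vec2 MA2, vec2 X)
{
    common_steps(&HH, &HL, MA1, MA2, X);
    HL.lo *= HH.lo;      /* same product, but the high lane is HL's */
    return HL;
}
```

In this model the two functions always agree in the low lane and can differ only in the high lane, which matches the symptom reported above.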

To me this problem looks quite bad... I hope you will find a solution soon.

Best regards
Posted by MS-Moderator08 [Feedback Moderator] on 3/1/2012 at 10:02 PM
Thank you for reporting this issue. Could you please attach a sample project to help us reproduce this issue?
Posted by MS-Moderator01 on 3/1/2012 at 5:23 PM
Thank you for your feedback, we are currently reviewing the issue you have submitted. If this issue is urgent, please contact support directly(