MOVSS problems - by Michal Řeháček

Status : 

 


0
0
Sign in
to vote
ID 282486 Comments
Status Active Workarounds
Type Bug Repros 0
Opened 6/12/2007 7:49:56 AM
Access Restriction Public

Description

The compiler generates wrong code when using _mm_move_ss and when certain optimizations are turned on. This is what happens:

1. _mm_move_ss generates a MOVSS instruction from memory to XMM register. The register holds the expected result, i.e. the value being loaded in the first element and zeroes in the remaining elements.
2. Some other code uses different XMM registrers.
3. another MOVSS instruction is generated, this time it moves the low-order doubleword from the register used in step 1 to a different register that was trashed in step 2. So this new register now contains the expected floating point value in the low-order doubleword and trash in the remaining elements.
4. This new register with three of the four elements containing trash values are used in calculation.

The sample code that demonstrates this problem is fairly complex; a lot of things seem to have an effect on this, including some asserts, the exact type of optimizations, the fact that some functions are defined in a different source module than the one they are called from, and so on.

Once I post this bug report, I will attach a ZIP file containing a visual studio project with all the necessary files (about 10KB of headers and CPP files in total). The ZIP file also contains the generated COD files from my machine.

The problem occurs in Orcas as well, but be warned - when upgrading the project file, it turned off optimizations, so if you're gonna try this in Orcas, you will need to set Optimization back to Maximize Speed.

If you need more details about the problem or a better explanation of what's going on, let me know.
Sign in to post a comment.
Posted by Michal Řeháček on 11/20/2007 at 5:51 AM
I am also not aware of any other occurences than in _mm_load_ss or _mm_load_sd.

And thank you for the fix!
Posted by Microsoft on 11/16/2007 at 11:16 AM
Hello,

The fix for this bug has been checked into the sources for VS 2008 Service Pack 1 and beyond. I fixed both _mm_load_ss and _mm_load_sd which had the same problem. I looked for other intrinsics that had the same problem but didn't find any more than these two, so if I missed yet another occurance of the bug please report it.
Posted by Microsoft on 11/5/2007 at 7:30 PM
Hello,

Just an update to let you know that I wrote a fix for this bug (a couple weeks ago now) and that the bug report has now been re-triaged and it is now planned to be included in VS 2008 Service Pack 1.
Posted by Microsoft on 10/3/2007 at 1:57 PM
Since customer has satisfactory workaround and this is a narrow scenario this bugfix has been rejected for VS 2008, it will be re-triaged for SP1.
Posted by Michal Řeháček on 10/1/2007 at 8:26 AM
Hello,

Thank you for getting back to this problem.

While I had already found a workaround for this particular issue (in that particular case where I needed to load 1,0,0,0, which isn't that hard even without using _mm_load_ss), I tried to switch back to _mm_load_ss and _mm_set_ss and I can confirm that using _mm_set_ss works well: the destination register is always cleared before moving data to it as it's supposed to be.

I also checked the rest of my codebase and I found one more place where I was using _mm_load_ss. I changed that to _mm_set_ss as well. This second case works well now too.

I was playing a bit with the test case I sent you and when I switch to _mm_set_ss, I was not able to reproduce the bad behavior as well.

In other words, using _mm_set_ss to work around the bug worked for me in all the scenarios I had tried.
Posted by Microsoft on 9/28/2007 at 7:22 PM
Hello,
I have confirmed that we are generating bad code for the _mm_load_ss() intrinsic. This intrinsic always expands to a movss instruction. It looks like the bug has been in the code since the intrinsic was first implemented. The problem is that during register allocation we may or may not choose to enregister the source operand of the intrinsic. The destination will always be a register. If we happen to choose to enregister the source the movss instruction does not clear the rest of the destination register. There is no way a C-language-level programmer can control whether the source is enregistered.

The _mm_load_ss() intrinsic is clearly documented on msdn to clear the other 96 bits of the destination, and so this intrinsic may not function as expected, depending on whether the source value is enregistered or not.

Fortunately, a simple workaround for this bug exists. There is another intrinsic, _mm_set_ss(), that is very similar to _mm_load_ss(). The two differences are that _mm_set_ss() takes a float rather than a pointer to a float, and it expands to code that will always clear the destination (usually using xorps) before the movss.

So for your particular case, I believe you can work around the bug by replacing the _mm_load_ss() intrinsic call in your vector_1000() function this way:

return _mm_set_ss( constants.numbers.m128_f32[ 0 ] ); // change load to set and &constants to constants

You will need to inspect all other uses of this intrinsic and consider changing them as well.

We are in the process of testing a potential fix and evaluating risk and considering whether to ship it with VS2008 or not. At this point we don't plan to go back and fix prior releases since an easy workaround exists.

I realize that since it has been 3 months since you originally reported this you may have moved on to other things, but if you can try the workaround and let us know if it works for you we'd appreciate having that information (particularly if it doesn't work for you for some reason).
Posted by Microsoft on 9/28/2007 at 7:22 PM
Hello,
I have confirmed that we are generating bad code for the _mm_load_ss() intrinsic. This intrinsic always expands to a movss instruction. It looks like the bug has been in the code since the intrinsic was first implemented. The problem is that during register allocation we may or may not choose to enregister the source operand of the intrinsic. The destination will always be a register. If we happen to choose to enregister the source the movss instruction does not clear the rest of the destination register. There is no way a C-language-level programmer can control whether the source is enregistered.

The _mm_load_ss() intrinsic is clearly documented on msdn to clear the other 96 bits of the destination, and so this intrinsic may not function as expected, depending on whether the source value is enregistered or not.

Fortunately, a simple workaround for this bug exists. There is another intrinsic, _mm_set_ss(), that is very similar to _mm_load_ss(). The two differences are that _mm_set_ss() takes a float rather than a pointer to a float, and it expands to code that will always clear the destination (usually using xorps) before the movss.

So for your particular case, I believe you can work around the bug by replacing the _mm_load_ss() intrinsic call in your vector_1000() function this way:

return _mm_set_ss( constants.numbers.m128_f32[ 0 ] ); // change load to set and &constants to constants

You will need to inspect all other uses of this intrinsic and consider changing them as well.

We are in the process of testing a potential fix and evaluating risk and considering whether to ship it with VS2008 or not. At this point we don't plan to go back and fix prior releases since an easy workaround exists.

I realize that since it has been 3 months since you originally reported this you may have moved on to other things, but if you can try the workaround and let us know if it works for you we'd appreciate having that information (particularly if it doesn't work for you for some reason).
Posted by Microsoft on 9/26/2007 at 2:59 PM
Hello, this email is about a bug you filed on msconnect over 3 months ago. Unfortunately an error was made on our end in handling this. It looks like what happened is that the person in the customer service group who originally looked at the bug, wrote you that they couldn't find the repro, and then closed the bug (as "no repro") within about 8 hours of sending you that note. Your response telling where the repro was came in an hour later but wasn't noticed until a post-audit of all bugs closed 'no repro' was conducted. So we've wasted 3 months, for which I apologize, but I've reopened this bug and we now have it here in the development group.

Fortunately the zip file was still on the web site where you put it. I've collected that and attached it to the bug record in our database. I've also confirmed that a recent build of Orcas still generates the same code, so I've given this a high priority to investigate further.

Again, I'm very sorry this happened, and I want to thank you for reporting the bug, because we obviously haven't found it ourselves.
Posted by Michal Řeháček on 6/13/2007 at 2:41 AM
Hello,

I sent you the zipped project and source files by e-mail. They are also available at

http://hobit.altar.cz/microsoft/movss.zip
Posted by Microsoft on 6/12/2007 at 6:00 PM
Hi,

Thanks for your feedback. but I can't find the project files you attached. Maybe something unexpected happend. Could you please have another try ? Alternatively you may email me the zipped project files.(v-kyming@microsoft.com)

Thanks,
Kylin
Posted by Microsoft on 6/12/2007 at 5:04 PM
Thank you for your feedback. We are currently investigating. If this issue is urgent, please call support directly (see http://support.microsoft.com).

Thank you,
Visual Studio Product Team