The compiler generates incorrect code for _mm_move_ss when certain optimizations are turned on. This is what happens:
1. _mm_move_ss generates a MOVSS instruction from memory to an XMM register. The register holds the expected result, i.e. the loaded value in the first element and zeroes in the remaining elements.
2. Some other code uses different XMM registers.
3. Another MOVSS instruction is generated. This time it moves the low-order doubleword from the register used in step 1 to a different register that was trashed in step 2. The register-to-register form of MOVSS leaves the upper elements of the destination unchanged, so this new register now contains the expected floating-point value in the low-order doubleword and trash in the remaining elements.
4. This new register, with three of its four elements containing trash values, is then used in a calculation.
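For reference, the difference between the two forms of MOVSS described in steps 1 and 3 can be sketched with intrinsics. This is only an illustration of the instruction semantics, not the repro from the attached project: the helper names (upper_elements_are_zero, load_scalar, copy_low, checked_load_scalar) are made up for this example, and _mm_load_ss is used here as an assumption to show the memory form of the load.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics: _mm_load_ss, _mm_move_ss, _mm_storeu_ps

// Hypothetical helper (not part of the attached project): returns true when
// elements 1..3 of v all compare equal to zero.
inline bool upper_elements_are_zero(__m128 v) {
    float out[4];
    _mm_storeu_ps(out, v);
    return out[1] == 0.0f && out[2] == 0.0f && out[3] == 0.0f;
}

// Memory form of MOVSS (step 1): loads one float into element 0 and
// zeroes elements 1..3.
inline __m128 load_scalar(const float* p) {
    return _mm_load_ss(p);
}

// Register form of MOVSS (step 3): takes element 0 from src and keeps
// elements 1..3 of dst exactly as they were.
inline __m128 copy_low(__m128 dst, __m128 src) {
    return _mm_move_ss(dst, src);
}

// Hypothetical debug guard: checks the upper elements right after the load.
// The assert is compiled out when NDEBUG is defined.
inline __m128 checked_load_scalar(const float* p) {
    __m128 v = load_scalar(p);
    assert(upper_elements_are_zero(v));
    return v;
}
```

If dst already holds unrelated data, copy_low keeps that data in elements 1 to 3. That is the situation in step 3: the result is only correct if the upper elements of the destination register were zeroed beforehand. Note that adding a check like checked_load_scalar may itself change the generated code, so it may not catch the problem in an optimized build.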
The sample code that demonstrates this problem is fairly complex; a lot of things seem to have an effect on this, including some asserts, the exact type of optimizations, the fact that some functions are defined in a different source module than the one they are called from, and so on.
Once I post this bug report, I will attach a ZIP file containing a Visual Studio project with all the necessary files (about 10 KB of headers and CPP files in total). The ZIP file also contains the COD files generated on my machine.
The problem occurs in Orcas as well, but be warned: upgrading the project file turned off optimizations, so if you try this in Orcas, you will need to set Optimization back to Maximize Speed.
If you need more details about the problem or a better explanation of what's going on, let me know.