I upgraded to the new VS2010 C++ compiler and found that my existing code no longer works. I traced the problem to the way the compiler handles the following example: I want to load a 3-vector (1, 2, 3) and set the 4th component to 1 using SSE4 intrinsics. See the Steps section for the code.
I expect the output "Result: 1, 2, 3, 1", and that is what I get in a Debug build. In a Release build I get "Result: 1, 2, 3, 0" instead. The assembly generated in Release is the following:
15: __m128 xy = _mm_loadl_pi(ALL_ONES, (const __m64*)&x.x);
16: __m128 z = _mm_load_ss(&x.z);
movss xmm0,dword ptr [eax+8]
movlps xmm1,qword ptr [eax]
17: return _mm_insert_ps(xy, z, 0x20);
The ALL_ONES value that should occupy the high part of xmm1 is never loaded, so the high half of xmm1, and with it the 4th component of the result, is undefined. There should be a movaps xmm1, xmmword ptr [ALL_ONES] before the movlps.
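To make the expected result checkable independently of any compiler, here is a minimal scalar sketch of what the three intrinsics are documented to do. It is not the code from the Steps section: Vec3, Lanes, the model_* helpers and expected_result are hypothetical names made up for this illustration, and the models cover only the lane movement, not alignment or exception behaviour.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-in for the 3-vector type used in the report.
struct Vec3 { float x, y, z; };

// Four float lanes of an XMM register; index 0 is the lowest lane.
using Lanes = std::array<float, 4>;

// Models _mm_loadl_pi(a, p): the low two lanes come from memory (m0, m1),
// the high two lanes are carried over from a.
Lanes model_loadl_pi(const Lanes& a, float m0, float m1) {
    return Lanes{{m0, m1, a[2], a[3]}};
}

// Models _mm_load_ss(p): lane 0 comes from memory, the upper lanes are zeroed.
Lanes model_load_ss(float m0) {
    return Lanes{{m0, 0.0f, 0.0f, 0.0f}};
}

// Models _mm_insert_ps(a, b, imm8):
//   bits 7:6 pick the source lane of b,
//   bits 5:4 pick the destination lane in a,
//   bits 3:0 are a mask of result lanes to force to zero.
Lanes model_insert_ps(Lanes a, const Lanes& b, unsigned imm8) {
    assert(imm8 <= 0xFFu);
    a[(imm8 >> 4) & 3u] = b[(imm8 >> 6) & 3u];
    for (unsigned i = 0; i < 4; ++i) {
        if (imm8 & (1u << i)) a[i] = 0.0f;
    }
    return a;
}

// The sequence from the listing: 0x20 copies lane 0 of z into lane 2 of xy
// and zeroes nothing, so lane 3 must still hold the 1 from ALL_ONES.
Lanes expected_result(const Vec3& v) {
    const Lanes all_ones = Lanes{{1.0f, 1.0f, 1.0f, 1.0f}};
    const Lanes xy = model_loadl_pi(all_ones, v.x, v.y);
    const Lanes z = model_load_ss(v.z);
    return model_insert_ps(xy, z, 0x20);
}
```

Under this model, expected_result(Vec3{1, 2, 3}) gives lanes 1, 2, 3, 1, which matches the Debug output. A 0 in the 4th lane can only come from the high half of xy not holding ALL_ONES.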