I was trying to implement a simple vector subtraction with SSE:
    typedef unsigned short ushort;
    typedef unsigned int uint;

    void print(__m128i i)
    {
        auto& arr = i.m128i_u16;
        printf("[%d %d %d %d %d %d %d %d]\n", arr[0], arr[1], arr[2], arr[3], arr[4], arr[5], arr[6], arr[7]);
    }

    int _tmain(int argc, _TCHAR* argv[])
    {
        const int lineSize = 912;
        ushort input[lineSize]; // left uninitialized on purpose
        ushort vals[lineSize];  // left uninitialized on purpose
        // printf("%X %X\n", input, vals); // note this one
        for (uint i=0; i<lineSize; i+=8)
        {
            __m128i vecinput = _mm_loadu_si128((__m128i*) &input[i]);
            __m128i vecvals = _mm_loadu_si128((__m128i*) &vals[i]);
            __m128i output = _mm_subs_epu16(vecinput, vecvals);
            print(output);
        }
        return 0;
    }
The VC++ optimizer eliminates vals and incorrectly treats it as if it were the same array as input, so the result is always 0. Note that the arrays are intentionally left uninitialized to get "random" values.
If that printf is uncommented, correct code is generated.
See http://stackoverflow.com/questions/14600413/is-this-a-bug-in-the-vc-optimizer-or-in-my-code for the assembler output.
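For comparison, here is a minimal sketch of what the loop should compute when both arrays hold defined values. It is not the code from the report: subtract_saturating is a hypothetical helper, it assumes an x86 target with SSE2, and it uses std::vector and _mm_storeu_si128 instead of the MSVC-specific m128i_u16 member. Each 16-bit lane of _mm_subs_epu16 is an unsigned saturating difference, so a lane is 0 only where the first operand is not greater than the second.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the original post): lane-wise unsigned
// saturating subtraction a[i] - b[i], clamped at 0, over 16-bit values.
// Assumes a.size() == b.size().
std::vector<std::uint16_t> subtract_saturating(const std::vector<std::uint16_t>& a,
                                               const std::vector<std::uint16_t>& b)
{
    std::vector<std::uint16_t> out(a.size());
    std::size_t i = 0;

    // Process eight 16-bit lanes per iteration using unaligned loads/stores.
    for (; i + 8 <= a.size(); i += 8) {
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(&a[i]));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(&b[i]));
        __m128i diff = _mm_subs_epu16(va, vb);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(&out[i]), diff);
    }

    // Scalar tail for sizes that are not a multiple of 8.
    for (; i < a.size(); ++i) {
        out[i] = a[i] > b[i] ? static_cast<std::uint16_t>(a[i] - b[i])
                             : static_cast<std::uint16_t>(0);
    }
    return out;
}
```

Calling it with two 912-element vectors filled with known values gives a result that can be checked element by element, which makes an all-zero output easy to tell apart from a legitimate one.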