The attached sample code and project emit a 'prefetchw' instruction when compiled with full optimizations for x64. This is an AMD-specific instruction. On most Intel chips it is a no-op, but we have found at least one chip on which it produces a fault.
According to this document, http://developer.amd.com/assets/CrossVendorMigration.pdf (the "Prefetch Instruction" section), there are at least 6 known Intel models on which this instruction generates an undefined-instruction fault.
Since this is a processor-specific instruction, it seems like it shouldn't be emitted unless specifically requested. The only way I've found to get rid of it is to disable optimizations, either around the specific PushTo function or for the project as a whole.
If you compare the ASM for PushTo with that of the unoptimized PushTo2, you'll see a very large difference in the number of instructions. The unoptimized version is not acceptable for us, so we wrote our own version in assembly that simply omits the 'prefetchw' instruction.