_mm256_castps128_ps256 does unaligned read - by Gabest

Status: Fixed

This item has been fixed in the current or upcoming version of this product. A more detailed explanation for the resolution of this particular item may have been provided in the comments section.

ID: 653771
Status: Closed
Type: Bug
Repros: 1
Opened: 3/27/2011 4:53:01 AM
Access Restriction: Public


There are two ways to put __m128 into __m256 directly.

1. _mm256_castps128_ps256, which is (mostly) free, since it only changes the register reference from xmm to ymm.

2. _mm256_insertf128_ps, which incurs a costly RAW dependency, since it has to merge half of the register with the existing value.

So casting is generally preferred.
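
As a minimal sketch of the two options above (not taken from the original report), the snippet below widens the same four floats both ways. The function name `widen_both_ways` and the `AVX_FN` macro are made up for illustration; the macro only exists so that GCC and Clang accept AVX intrinsics without a global `-mavx` switch. The upper half of the cast result is undefined, so the sketch deliberately reads back only its lower half.

```cpp
#include <immintrin.h>

// Hypothetical helper macro: enables AVX per function on GCC/Clang.
// MSVC accepts the intrinsics without it.
#if defined(__GNUC__) || defined(__clang__)
#define AVX_FN __attribute__((target("avx")))
#else
#define AVX_FN
#endif

// lo4:      4 input floats (the __m128 value)
// base8:    8 floats used as the existing ymm value for the insert
// cast_lo4: receives lanes 0-3 of the cast result (lanes 4-7 are undefined)
// insert8:  receives all 8 lanes of the insert result
AVX_FN void widen_both_ways(const float* lo4, const float* base8,
                            float* cast_lo4, float* insert8) {
    __m128 lo = _mm_loadu_ps(lo4);
    __m256 base = _mm256_loadu_ps(base8);

    // Way 1: reinterpret the xmm value as ymm; no merge with another value.
    __m256 by_cast = _mm256_castps128_ps256(lo);

    // Way 2: merge lo into the lower half of base; depends on base.
    __m256 by_insert = _mm256_insertf128_ps(base, lo, 0);

    _mm_storeu_ps(cast_lo4, _mm256_castps256_ps128(by_cast));
    _mm256_storeu_ps(insert8, by_insert);
}
```

With the insert, lanes 4-7 come from `base8`, which is exactly the merge that creates the dependency on the previous register value.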

The real showstopper here is that the compiler may spill those __m128 variables, and _mm256_castps128_ps256 then gets compiled to "vmovaps reg, m256", where m256 is only aligned to 16 bytes because it is the address of a __m128 variable, so the aligned 256-bit load can fault.
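
To illustrate the alignment gap in plain C++ (no AVX needed), here is a small sketch. `is_aligned` and `FrameModel` are hypothetical names; `FrameModel` only models a stack frame whose base is 32-byte aligned and whose slots are handed out in 16-byte steps, which is an assumption about the layout, not the compiler's actual frame.

```cpp
#include <cstddef>
#include <cstdint>

// True if p is a multiple of n bytes.
inline bool is_aligned(const void* p, std::size_t n) {
    return reinterpret_cast<std::uintptr_t>(p) % n == 0;
}

// Hypothetical model of a frame: 32-byte aligned base, 4 floats per 16-byte slot.
struct FrameModel {
    alignas(32) float slots[16];
};
```

In this model, `&frame.slots[4]` sits 16 bytes past the base: it satisfies the 16-byte alignment a __m128 variable needs, but not the 32-byte alignment that vmovaps with a ymm operand requires.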
Posted by A Fog on 4/6/2012 at 5:11 AM
Please reopen this issue (or open a new one). A similar bug is found in VS 11.0.50214.1 beta.

In my case, the 256-bit argument for _mm256_extractf128_ps is stored temporarily on the stack across a function call. The temporary variable is aligned by 16, not by 32. It is saved with vmovups (correct), but restored with vmovaps (=crash).

It's a big project so I don't care to send it all in case this is a known bug.
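
A hedged sketch of what a consistent save/restore pair looks like when written by hand: both the store and the load use the unaligned forms, so a slot that is only 16-byte aligned is safe. This does not change what the compiler emits for its own spills; `spill_and_reload` and `AVX_FN` are hypothetical names used only for illustration.

```cpp
#include <immintrin.h>

// Hypothetical helper macro: enables AVX per function on GCC/Clang.
#if defined(__GNUC__) || defined(__clang__)
#define AVX_FN __attribute__((target("avx")))
#else
#define AVX_FN
#endif

// Saves 8 floats to slot8 and restores them into out8.
// slot8 may have any alignment, including 16 but not 32 bytes.
AVX_FN void spill_and_reload(const float* in8, float* slot8, float* out8) {
    __m256 v = _mm256_loadu_ps(in8);

    // Unaligned store (vmovups): no alignment requirement on slot8.
    _mm256_storeu_ps(slot8, v);

    // Unaligned load (vmovups). Using _mm256_load_ps (vmovaps) here
    // would require slot8 to be 32-byte aligned.
    __m256 restored = _mm256_loadu_ps(slot8);

    _mm256_storeu_ps(out8, restored);
}
```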

Posted by Bill [MSFT] on 12/2/2011 at 11:44 AM
We were able to reproduce this issue. The fix will appear in a future release of Visual Studio.

Microsoft Visual Studio Product Team
Posted by Gabest on 7/15/2011 at 7:11 PM
There is one more thing to watch out for. Simply making sure the address is 32-byte aligned is not enough. The v* instructions on xmm registers zero out the high part if you refer to them as ymm later, but if the value is spilled and reloaded from a __m128 variable as 256 bits, then the high part will contain junk data. So currently it is incorrect even if the address is 32-byte aligned by chance.
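
If code needs the high part to be zero regardless of how the value was spilled, one option is to request it explicitly instead of relying on the cast. The sketch below is an assumption about how one might write that, not something proposed in this thread; `zero_extend` and `AVX_FN` are hypothetical names. It pays the insert cost in exchange for a defined upper half.

```cpp
#include <immintrin.h>

// Hypothetical helper macro: enables AVX per function on GCC/Clang.
#if defined(__GNUC__) || defined(__clang__)
#define AVX_FN __attribute__((target("avx")))
#else
#define AVX_FN
#endif

// Widens 4 floats to 8, with lanes 4-7 explicitly set to zero.
AVX_FN void zero_extend(const float* lo4, float* out8) {
    __m128 lo = _mm_loadu_ps(lo4);

    // Insert into an all-zero ymm value, so the upper half is defined.
    __m256 wide = _mm256_insertf128_ps(_mm256_setzero_ps(), lo, 0);

    _mm256_storeu_ps(out8, wide);
}
```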
Posted by Natalia [MSFT] on 7/14/2011 at 12:47 PM
Thank you for the quick response.
The bug was marked as 'fixed' by accident; it was not possible to reproduce it without sample code.
avxbug2.asm is attached to the bug now and we can see it (I don't know whether it should be visible to you or not). We'll continue to review the issue you submitted.
Thank you again.

Posted by Gabest on 7/14/2011 at 12:43 AM
I meant avxbug2.asm, it was late... But I cannot see the attached file I uploaded. Is that normal?
Posted by Gabest on 7/13/2011 at 8:24 PM
See asmbug2.asm, it's full of YMMWORD PTR _m$[esp+N] where N is only a multiple of 16.
Posted by Gabest on 7/13/2011 at 8:17 PM
I can give you a repro, though it was already set to "fixed".
Posted by Natalia [MSFT] on 7/13/2011 at 5:21 PM
Thank you for your report. Is it possible for you to share the code on which this issue reproduces?
Posted by Microsoft on 3/27/2011 at 10:41 PM
Thank you for submitting feedback on Visual Studio 2010 and .NET Framework. Your issue has been routed to the appropriate VS development team for review. We will contact you if we require any additional information.
Posted by Microsoft on 3/27/2011 at 5:16 AM
Thank you for your feedback, we are currently reviewing the issue you have submitted. If this issue is urgent, please contact support directly (http://support.microsoft.com).