
_mm256_castps128_ps256 does unaligned read by Gabest


Status: Fixed

Type: Bug
ID: 653771
Opened: 3/27/2011 4:53:01 AM
Access Restriction: Public
User(s) can reproduce this bug


There are two ways to put __m128 into __m256 directly.

1. _mm256_castps128_ps256, a (mostly) free operation that just changes the register reference to ymm.

2. _mm256_insertf128_ps, which incurs a costly RAW (read-after-write) dependency, since it has to merge half of the register with the existing value.

So casting is generally preferred.
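
For illustration, the two approaches might look like the sketch below when building a __m256 from two __m128 halves. The function names, the AVX_FN macro and the use of float arrays for input and output are assumptions made for this example, not code from the report.

```cpp
#include <immintrin.h>

// Hypothetical helper macro: lets GCC/Clang compile the AVX code below
// even when AVX is not enabled globally. MSVC needs no attribute.
#if defined(__GNUC__) || defined(__clang__)
#define AVX_FN __attribute__((target("avx")))
#else
#define AVX_FN
#endif

// Way 1: cast the low half (normally no instruction), then insert the high half.
AVX_FN void combine_with_cast(const float* lo4, const float* hi4, float* out8) {
    __m128 lo = _mm_loadu_ps(lo4);
    __m128 hi = _mm_loadu_ps(hi4);
    __m256 wide = _mm256_castps128_ps256(lo);   // upper 128 bits undefined here
    wide = _mm256_insertf128_ps(wide, hi, 1);   // upper 128 bits now defined
    _mm256_storeu_ps(out8, wide);
}

// Way 2: insert both halves into an existing 256-bit value.
// Each insert has to merge with the previous register contents.
AVX_FN void combine_with_inserts(const float* lo4, const float* hi4, float* out8) {
    __m128 lo = _mm_loadu_ps(lo4);
    __m128 hi = _mm_loadu_ps(hi4);
    __m256 wide = _mm256_setzero_ps();
    wide = _mm256_insertf128_ps(wide, lo, 0);
    wide = _mm256_insertf128_ps(wide, hi, 1);
    _mm256_storeu_ps(out8, wide);
}
```

Both functions should produce the same eight floats; the difference is only in how many merging inserts the compiler has to emit.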

The real showstopper here is that the compiler may spill those __m128 variables, and _mm256_castps128_ps256 then gets compiled to "vmovaps reg, m256", where m256 is only aligned to 16 bytes, since it is the address of a __m128 variable.
Comments
Posted by A Fog on 4/6/2012 at 5:11 AM
Please reopen this issue (or open a new one). A similar bug is found in VS 11.0.50214.1 beta.

In my case, the 256-bit argument for _mm256_extractf128_ps is stored temporarily on the stack across a function call. The temporary variable is aligned by 16, not by 32. It is saved with vmovups (correct), but restored with vmovaps (=crash).

It's a big project so I don't care to send it all in case this is a known bug.

Posted by Microsoft on 12/2/2011 at 11:44 AM
We were able to reproduce this issue. The fix will appear in a future release of Visual Studio.

Microsoft Visual Studio Product Team
Posted by Gabest on 7/15/2011 at 7:11 PM
There is one more thing to watch out for. Simply making sure the address is 32-byte aligned is not enough. The v* instructions on xmm registers zero out the high part if you refer to them as ymm later, but if you spill the value and reload it from a __m128 variable, the high part will contain junk data. So currently the result is incorrect even if the address is 32-byte aligned by chance.
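
If code needs the high half to be zero regardless of what the compiler does with spills, one option is to request it explicitly rather than rely on the cast, whose upper 128 bits are documented as undefined. This is a sketch under that assumption; the function name and the AVX_FN macro are hypothetical, and it pays the merge cost of _mm256_insertf128_ps described in the report. Whether it avoids the misaligned vmovaps in the affected compiler versions is not established in this thread.

```cpp
#include <immintrin.h>

// Hypothetical helper macro: enables AVX per function on GCC/Clang.
#if defined(__GNUC__) || defined(__clang__)
#define AVX_FN __attribute__((target("avx")))
#else
#define AVX_FN
#endif

// Widen four floats to eight with a guaranteed-zero upper half.
AVX_FN void widen_zero_upper(const float* in4, float* out8) {
    __m128 lo = _mm_loadu_ps(in4);
    // Insert into an all-zero vector so the upper 128 bits are defined.
    __m256 wide = _mm256_insertf128_ps(_mm256_setzero_ps(), lo, 0);
    _mm256_storeu_ps(out8, wide);  // unaligned store: no alignment assumption
}
```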
Posted by Microsoft on 7/14/2011 at 12:47 PM
Thank you for the quick response.
The bug was marked as 'fixed' by accident; it was not possible to reproduce it without sample code.
avxbug2.asm is attached to the bug now and we can see it (we don't know whether it should be visible to you or not). We'll continue to review the issue you submitted.
Thank you again.

Posted by Gabest on 7/14/2011 at 12:43 AM
I meant avxbug2.asm, it was late... But I cannot see the file I uploaded. Is that normal?
Posted by Gabest on 7/13/2011 at 8:24 PM
See asmbug2.asm, it's full of YMMWORD PTR _m$[esp+N] where N is only aligned mod 16.
Posted by Gabest on 7/13/2011 at 8:17 PM
I can give you a repro, though it was already set to "fixed".
Posted by Microsoft on 7/13/2011 at 5:21 PM
Thank you for your report. Is it possible for you to share the code on which this issue reproduces?
Posted by Microsoft on 3/27/2011 at 10:41 PM
Thank you for submitting feedback on Visual Studio 2010 and .NET Framework. Your issue has been routed to the appropriate VS development team for review. We will contact you if we require any additional information.
Posted by Microsoft on 3/27/2011 at 5:16 AM
Thank you for your feedback, we are currently reviewing the issue you have submitted. If this issue is urgent, please contact support directly (http://support.microsoft.com).
Workarounds
Posted by Vegan Fanatic on 6/20/2011 at 3:40 PM
I have used SSE2 quite extensively for some time.

The best route I have found is to use a class that manages the registers automatically.

Keep in mind they are packed differently depending on the basic type, such as 32-bit float, 64-bit float, integers, etc.

I have a set of classes to manage all of the types SSE2 uses.
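
The poster's classes are not shown, so the following is only a guess at what such wrappers might look like: one small type per packing, here four 32-bit floats and two 64-bit doubles. The names Vec4f and Vec2d are hypothetical, and unaligned loads and stores are used so that nothing depends on the caller's alignment.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Four packed 32-bit floats in one xmm register.
struct Vec4f {
    __m128 v;
    explicit Vec4f(__m128 x) : v(x) {}
    explicit Vec4f(const float* p) : v(_mm_loadu_ps(p)) {}
    Vec4f operator+(Vec4f o) const { return Vec4f(_mm_add_ps(v, o.v)); }
    void store(float* p) const { _mm_storeu_ps(p, v); }
};

// Two packed 64-bit doubles in one xmm register.
struct Vec2d {
    __m128d v;
    explicit Vec2d(__m128d x) : v(x) {}
    explicit Vec2d(const double* p) : v(_mm_loadu_pd(p)) {}
    Vec2d operator+(Vec2d o) const { return Vec2d(_mm_add_pd(v, o.v)); }
    void store(double* p) const { _mm_storeu_pd(p, v); }
};
```

Note that this wraps 128-bit SSE2 types only; it does not by itself address the 256-bit spill problem reported above.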
Attachments

File Name     Submitted By   Submitted On   File Size
avxbug2.zip                  7/13/2011      5 KB
avxbug2.zip                  7/14/2011      5 KB