std::atomic load implementation is absurdly slow - by CornedBee

Status: Deferred

The product team has reviewed this issue and has deferred it for consideration at a later time. A more detailed explanation for the resolution of this particular item may have been provided in the comments section.


ID: 770885
Status: Closed
Type: Bug
Repros: 2
Opened: 11/13/2012 5:10:38 AM
Access Restriction: Public

Description

The x64 (and probably x86) implementation of std::atomic::load in VS 2012 is so slow as to be practically useless. Given a std::atomic<std::size_t> ai, the expression ai.load(std::memory_order_relaxed) ultimately ends up as a call to the intrinsic _InterlockedOr64(&x, 0). This intrinsic, in turn, emits a cmpxchg loop: it loads the memory location, then uses a compare-exchange to check the loaded value against the very location it was just loaded from, and repeats until the two match.
For reference, the correct code to emit for a relaxed load is "mov register, [memory location]".
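To make the contrast concrete, here is a minimal sketch of the two operations being compared. The function names are made up for illustration, and load_via_or is only an approximation of the behavior described above (an atomic OR with 0), not the actual VS 2012 library source.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

std::atomic<std::size_t> ai{42};

// What the report asks for: a relaxed load, which on x64 needs
// nothing more than a plain mov from memory.
std::size_t load_relaxed() {
    return ai.load(std::memory_order_relaxed);
}

// Approximation of the reported behavior: a read-modify-write that ORs
// in 0. It returns the same value but is a full locked operation.
std::size_t load_via_or() {
    return ai.fetch_or(0, std::memory_order_seq_cst);
}

// Hypothetical helper: both paths observe the same value (single-threaded check).
void self_check() {
    assert(load_relaxed() == load_via_or());
}
```

Both functions return the stored value; the difference is purely in cost. The read-modify-write form needs exclusive ownership of the cache line, which is the kind of contention the report blames for the slowdown.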

It seems that all atomic load operations are treated the same no matter what memory ordering is specified; all operations have sequential consistency. This is, strictly speaking, permitted behavior. It's just completely useless. Atomics are used in performance-critical lock-free data structures, and the abysmal implementation slows our lock-free hash map down by a factor of 10 or even 100 compared to the Intel atomics implementation or Boost.Atomic. The atomics become a major bottleneck and scalability issue in applications using them.
Posted by Jonathan Potter on 12/9/2013 at 5:06 PM
I've done some simple benchmarking with atomic<DWORD> and it seems that in VS 2013 std::atomic is now the same speed as boost::atomic in a release build. In a debug build boost is still about twice as fast.
Posted by Microsoft on 3/22/2013 at 4:44 PM
Hi again,

We've fixed this bug, and the fix will be available in VC12. See the attached meow.zip for an example of the new codegen on x64:

mov    rax, QWORD PTR ?g_i@@3U?$atomic@_K@std@@A ; g_i
ret    0
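For readers who do not have the attachment, source along the following lines would be consistent with that listing. This is a guess, not the contents of meow.zip: the mangled name suggests g_i is a global std::atomic<unsigned long long>, while the function name and the memory order are assumptions (on x64 a sequentially consistent load can also be a plain mov, so the listing alone does not reveal which order was used).

```cpp
#include <atomic>
#include <cassert>

// Assumed from the mangled symbol: a global atomic of unsigned 64-bit type.
std::atomic<unsigned long long> g_i{0};

// Hypothetical function whose body could compile to "mov rax, [g_i]; ret".
unsigned long long read_g_i() {
    return g_i.load(std::memory_order_relaxed);
}

// Hypothetical helper: store a value, then confirm the load observes it
// (single-threaded sanity check only).
void check_round_trip() {
    g_i.store(7, std::memory_order_relaxed);
    assert(read_g_i() == 7);
}
```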

Wenlei He from our compiler back-end team contributed a major rewrite of <atomic>'s implementation, improving performance for x86/x64/ARM.

Note: Connect doesn't notify me about comments. If you have any further questions, please E-mail me.

Stephan T. Lavavej
Senior Developer - Visual C++ Libraries
stl@microsoft.com
Posted by Microsoft on 2/21/2013 at 1:52 PM
Hi,

Thanks for reporting this bug. I wanted to let you know what's happening with it. I'm still keeping track of it, but it's been resolved as "Deferred" because we may not have time to fix it in VC12. (Note: VC8 = VS 2005, VC9 = VS 2008, VC10 = VS 2010, VC11 = VS 2012.)

Note: Connect doesn't notify me about comments. If you have any further questions, please E-mail me.

Stephan T. Lavavej
Senior Developer - Visual C++ Libraries
stl@microsoft.com
Posted by JustManowar on 1/28/2013 at 2:13 AM
Agree with the bug submitter.
Unfortunately, I can't use the std::atomic<> implementation at all because of this issue.
Posted by Microsoft on 11/21/2012 at 2:03 AM
Thank you for submitting feedback on Visual Studio and .NET Framework. Your issue has been routed to the appropriate VS development team for investigation. We will contact you if we require any additional information.
Posted by Microsoft on 11/13/2012 at 6:19 PM
Thank you for your feedback. We are currently reviewing the issue you have submitted. If this issue is urgent, please contact support directly (http://support.microsoft.com).