The compiler does not apply named return value optimization (NRVO) to PODs returned by value if
(1) the POD's size exceeds 64 bits,
(2) the POD contains floating-point members.
In these cases, return values are always spilled to the stack. This degrades performance: in some of our inner loops, 20-30% of the instructions are needless MOVs, which cause many load-hit-stores.
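A minimal sketch of the kind of code affected, for illustration only. The type `Vec3` and the functions `add` and `sum` are hypothetical names, not taken from our code base; the struct is 96 bits and holds floats, so it falls under the conditions listed above.

```cpp
// Hypothetical POD: three floats (96 bits), no constructors,
// so it can still be initialized with a brace list.
struct Vec3 {
    float x, y, z;
};

// The result is built in a named local and returned by value.
// This is the pattern where NRVO would normally construct `r`
// directly in the caller's storage.
inline Vec3 add(const Vec3& a, const Vec3& b) {
    Vec3 r;
    r.x = a.x + b.x;
    r.y = a.y + b.y;
    r.z = a.z + b.z;
    return r;
}

// Hypothetical inner loop: every iteration returns a Vec3 by value,
// so any extra copy through the stack is paid once per iteration.
inline Vec3 sum(const Vec3* v, int n) {
    Vec3 acc = {0.0f, 0.0f, 0.0f};
    for (int i = 0; i < n; ++i) {
        acc = add(acc, v[i]);
    }
    return acc;
}
```

Inspecting the disassembly of a loop like `sum` in an optimized build is how the extra stores and reloads of the returned value become visible.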
The problem can be worked around by
(1) adding an empty default constructor,
(2) using macros instead of functions.
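(1) is not an option for us because our data types rely on static initialization and a lot of code is built on initializer lists (std::initializer_list is not yet available in Visual C++). Once a type has a user-declared constructor it is no longer an aggregate and cannot be initialized with a brace list.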
(2) would introduce serious maintainability issues.
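The two workarounds can be sketched as follows. All names here (`Vec3`, `Vec3Ctor`, `add_ctor`, `VEC3_ADD`) are hypothetical and only illustrate the trade-offs described above; the sketch does not claim anything about the code the compiler generates for them.

```cpp
// Original POD: an aggregate, so brace initialization works.
struct Vec3 {
    float x, y, z;
};

// Workaround (1): the same layout with an empty default constructor.
// The type is no longer an aggregate.
struct Vec3Ctor {
    float x, y, z;
    Vec3Ctor() {}  // intentionally empty; members stay uninitialized
};

// Vec3Ctor v = {1.0f, 2.0f, 3.0f};  // would not compile: not an aggregate

inline Vec3Ctor add_ctor(const Vec3Ctor& a, const Vec3Ctor& b) {
    Vec3Ctor r;
    r.x = a.x + b.x;
    r.y = a.y + b.y;
    r.z = a.z + b.z;
    return r;
}

// Workaround (2): a macro that writes into an existing object, so no
// value is returned at all. Each argument is evaluated several times
// and there is no type checking, which is the maintainability cost.
#define VEC3_ADD(out, a, b)          \
    do {                             \
        (out).x = (a).x + (b).x;     \
        (out).y = (a).y + (b).y;     \
        (out).z = (a).z + (b).z;     \
    } while (0)
```

With the macro, a call site changes from `r = add(a, b);` to `VEC3_ADD(r, a, b);`, and expressions can no longer be nested the way function calls can.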
We think this is an optimizer bug.