SELECT SUM(...) is non-deterministic when adding the column-values of datatype float - by Holger Schmeling

Status : 

  By Design<br /><br />
		The product team believes this item works according to its intended design.<br /><br />
		A more detailed explanation for the resolution of this particular item may have been provided in the comments section.

Sign in
to vote
ID 465147 Comments
Status Closed Workarounds
Type Bug Repros 23
Opened 6/8/2009 3:18:36 AM
Access Restriction Public


SELECT SUM(x) ... on a table, where x is of datatype float returns different results for different tries with the same table data. The results differ by about 20% in value, even if only 50 rows are selected and the total value is about 200000.
Sign in to post a comment.
Posted by Holger Schmeling on 11/13/2009 at 10:54 PM
Thank you for your answer, Miles. I'm very happy that you took the time to investigate the problem and finally fond an explanation. Thanks again for this.
Of course you left out an answer to the important question whether MSFT will solve this problem generally. Reducing the number of threads is unfortunately not appropriate. I'm running a Reporting application against a DWH and the least thing I'd like to do is thwarting our quite expensive hardware by not using all available computing power. Because inside the database are a lot of floats, I would have to do this generally, meaning for all connections/queries. Certainly not what I want. Another - more obvious - solution would of course be just changing the data types from float to decimal. I'd definitely do this, but the folks from the cube development team told me that analysis services will create write back columns generally by using the data type float. So I can't do this either.
Any other suggestions besides MAXDOP?
Posted by Microsoft on 11/13/2009 at 12:57 PM
Hi CaptainKirk,

This is a very good question, at first I was puzzled by the behavior you reported to us. After investigating I can answer why this is happening, and how to prevent each execution of the query from returning a different result.

The first point is to agree that the order of arithmetic operations on floating point matters. This can be shown by the following example where the first sum is 0 and the second is 507904 even though the values added are identical in both cases. Only the order in which the operations are performed differs.

DECLARE @fl FLOAT = 100000000000000000000
WHILE (@i < 100)
    SET @fl = @fl + CONVERT(float, 5000)
    SET @i = @i + 1
SET @fl = @fl - 100000000000000000000

WHILE (@i < 100)
    SET @fl = @fl + CONVERT(float, 5000)
    SET @i = @i + 1
SET @fl = @fl + 100000000000000000000
SET @fl = @fl - 100000000000000000000

The second point is to understand why the order in which the sums are performed changes in your case. Afterall you don't have this loop trick in your query, so the plan should add values in the same order every time. Actually it's not the case because you probably have a parallel query plan. On my machine when I execute your query I get a degree of parallelism of 4, so 4 threads scan the table, each feeding values to the floating point sum aggregator. Each time the query executes, the lifetime of the threads is slightly different, based on how busy the CPU's are at that moment, when the operating system decides to do context switches ...

The good news is that when a stable query result matters to your application, you can force the order to be the same by preventing parallelism with OPTION (MAXDOP 1). In your case you would do

with test(SumFloat, SumDecimal) as (
    select sum(x), sum(cast(x as decimal(30,4)))
    from t1
    where cSel1=200908
    and cSel2=30077500
    and x is not null

select SumFloat, SumDecimal, SumFloat-SumDecimal as TotalDiff, SumFloat/SumDecimal as PercentDiff
from test
option (maxdop 1)

This will make the floating point sum the same everytime. It won't make it the equal to the decimal(30,4) sum though because floating point arithmetic is not precise unlike fixed point.

I hope this helps.

Thank you,
Miles Trochesset - Microsoft SQL Server Engine Team
Posted by Microsoft on 11/12/2009 at 10:28 PM
Thank you for reporting the issue and apologize for the delayed reply! We are actively working on this issue right now and will reply with our findings in a few days.

Wei Yu
SQL Server Engine

Posted by Holger Schmeling on 9/16/2009 at 12:50 AM
Relax? What's that? I've not constructed the problem just to bother you folks at MSFT. It is not artifically or so. The behaviour was first communicated to me from end users of a real life application, when those users detected wrong numbers in their reports. Not much reason to relax, he?
Posted by Adam Machanic on 7/7/2009 at 9:11 AM
Relax. It may be a bug or it may not be a bug. There are plenty of other confirmed bugs, and it may take weeks before you get a response here. And it may not be the response you want to hear.
Posted by Flair on 6/29/2009 at 3:27 PM
I was verifiying the problem with the provided database backup --> the diff was about 0.8 up to 3.0 percent.
I have a 10.000 times loop running with a CTE and also with a static view
Posted by Holger Schmeling on 6/17/2009 at 11:37 AM
What is it with you folks at MSFT? Do you need any further information? How else may I assist you with the validation?
Posted by Holger Schmeling on 6/11/2009 at 11:11 AM
I've just checked the same database/query on a 32 Bit system (Vista). The behaviour is identical.
MS, what's up with you? Could somebody answer, please. This *is* a bug.
Posted by Holger Schmeling on 6/9/2009 at 7:53 AM
I know the problem is very odd. I can only reproduce the behaviour with *exact* the table structure I provided in the "Steps to reproduce" section. Whenever I create a table with only the three columns (cSel1, cSel2 and x) and insert identical data, the problem does not appear.
The compressed database backup is about 5 MB in size. If you'd like to have it, just let me know a place where to put it or, may be, an email address.
Posted by AaronBertrand on 6/9/2009 at 7:07 AM
I tried building quite similar data and even at much higher sums I get results like this:

SumFloat    SumDecimal    TotalDiff    PercentDiff    
177349722.359998    177349722.3600    -2.35438346862793E-06    0.999999999999987    

The result was the same every single time.

If you post your backup somewhere public, I will gladly try to reproduce. As it stands I cannot access the backup because your attachments are only visible to MSFT folks.