Problem getting correct result - by SwePeso

Status : 

  By Design<br /><br />
		The product team believes this item works according to its intended design.<br /><br />
		A more detailed explanation for the resolution of this particular item may have been provided in the comments section.


1
2
Sign in
to vote
ID 781673 Comments
Status Closed Workarounds
Type Bug Repros 0
Opened 3/19/2013 7:30:09 AM
Access Restriction Public

Description

I am aiding Kathi Kellenberger with her new article. I wrote a solution that work fine for the small sample set in the article. Today I added the sample set with 176,000 rows.

This gives me strange results! The table is a heap. The code is attached.
When I run the code without OPTION(MAXDOP 1), I get a count ranging from 16112 and 16160. I have no idea why.
When I turn on OPTION(MAXDOP 1), I get the correct result of 16008 in almost all cases. In some cases I get a count of 16009.

Why is the count not stable? It doesn't matter if I put a clustered index on the table.
Sign in to post a comment.
Posted by Martin Smith on 3/19/2013 at 1:59 PM
Or using `DENSE_RANK` instead of `ROW_NUMBER` gives a consistent 16008
Posted by Martin Smith on 3/19/2013 at 1:49 PM
You have duplicate data.

e.g. 8 rows match StaffMember = A0207, EventDate = 07/01/2013, EventTime = 08:00:00, EventType = Enter.

Within that group of 8 rows it is undeterminsitic what the result of both

ROW_NUMBER() OVER (PARTITION BY StaffMember ORDER BY EventDate, EventTime) and

ROW_NUMBER() OVER (PARTITION BY StaffMember, EventType ORDER BY EventDate, EventTime)

should be.

I think you are assuming that the row that is assigned the earliest value from the first `ROW_NUMBER` expression will also be assigned the earliest value in the second `ROW_NUMBER` but this may or may not be the case.

If it is the case then these will all be given the same "IslandCandidate" value if it isn't then they won't.

If you add %%physloc%% in as a deterministic tie breaker (for demo purposes) then the issue disappears

        ROW_NUMBER() OVER (PARTITION BY StaffMember ORDER BY EventDate, EventTime,%%physloc%%) -
        ROW_NUMBER() OVER (PARTITION BY StaffMember, EventType ORDER BY EventDate, EventTime,%%physloc%%) AS IslandCandidate
    
            
Posted by SwePeso on 3/19/2013 at 7:36 AM
It doesn't matter if you remove the two where-clauses in the query. The count value still will vary.

However, if I first execute cteIsland and store the data in a temporary table, and use that temporary table in cteGrouped, I always get the correct result of 16008, no matter what.