Home Dashboard Directory Help
Search

Poor Performance with Parallelism and Top by Paul White NZ


Status: 

Resolved
 as Won't Fix Help for as Won't Fix


37
0
Sign in
to vote
Type: Bug
ID: 740234
Opened: 5/2/2012 11:37:10 AM
Access Restriction: Public
0
Workaround(s)
view
9
User(s) can reproduce this bug

Description

In SQL Server 2008 onward, a parallel plan that includes a final TOP in a serial zone may exhibit very poor performance at DOP > 2.

The early end to processing signalled by the TOP may not be honoured by one or more threads in the parallel branches of the query. These threads run to completion instead of stopping when the TOP has received enough rows. If the branches perform significant processing, the performance of the query will be unexpectedly poor.

This problem does not occur in SQL Server 2005, but reproduces on SQL Server 2008 build 5768, SQL Server 2008 R2 build 2811, and SQL Server 2012 build 2316. Repro:

WITH
    N1 (n) AS
    (
        -- 10 rows
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
        SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
        SELECT 1
    ),
    N2 AS (SELECT L.n FROM N1 AS L CROSS JOIN N1 AS R), -- 100 rows
    N3 AS (SELECT L.n FROM N2 AS L CROSS JOIN N2 AS R), -- 10,000 rows
    N4 AS (SELECT L.n FROM N3 AS L CROSS JOIN N3 AS R), -- 100,000,000 rows
    N5 AS (SELECT L.n FROM N4 AS L CROSS JOIN N1 AS R), -- 1,000,000,000 rows
    N6 AS
    (
        SELECT TOP (1000000) n
        FROM N5
    )
SELECT
    COUNT_BIG(*)
FROM N6
OPTION (RECOMPILE, MAXDOP 0);

When the issue occurs, one or more threads process their entire share of the one billion-row potential result set. Execution should stop after a total 1,000,000 rows have been processed. The effect is much more likely at higher DOP, but can be seen from time to time at as low as DOP 3. In one case at DOP 3, execution time for this query degraded from 40ms to 57 seconds.

The problem does not manifest on every run at lower DOP, suggesting a race condition. When it occurs, a very long wait is shown in the waiting tasks DMV with a typical wait description of "exchangeEvent id=Port2f8c52200 WaitType=e_waitPortClose nodeId=2", where node 2 is the start parallelism exchange below the TOP.
Details
Sign in to post a comment.
Posted by Microsoft on 4/26/2013 at 10:25 AM
Hello,

Thank you for submitting this feedback. After carefully evaluating all of the bugs in our pipeline, we are closing bugs that we will not fix in the current or future versions of SQL Server. Thank you for submitting this feedback. After carefully evaluating all of the bugs in our pipeline, we are closing bugs that we will not fix in the current or future versions of SQL Server. The reasons for closing this bug is because the scenarios reported in the bug are not common enough and so unfortunately it does not meet the bar for the current release.

If you have more supporting information and details or feel strongly that we should reconsider, please do resubmit this item for review.

Thanks again for reporting the product issue and continued support in improving our product.
Gus Apostol, SQL Server Program Manager
Posted by Robert Heinig II on 11/9/2012 at 7:24 AM
Repro'ed on SQL Server 2008 R2 SP2 (Build 4000).

DL380G4 (2-socket,Xeon 3.6 = Nocona F.4.1.E0, SQL sees 4 cpu's through HT): 1 of 100 Tests with MAXDOP 3 exhibited the behaviour. Interestingly, of the "good" results a rather large (maybe 10-20%) fraction showed a large imbalance in row distribution over threads (e.g. 317k-222k-464k) - perhaps as a consequence of hypertreading?

On a moderately recent 24-way box (I believe an 3850Mk3 with 2 6-core Xeons), 10 out of 10 tests at MAXDOP 12 had an average of 9.8 runaway threads. On the same box, 2 of 20 Tests at MAXDOP 3 ran amok.

I'm a bit disappointed at the wording: 'release' should have been at least 'service pack'...
Posted by Adam Machanic on 8/15/2012 at 2:18 PM
Eric: I just hit this issue today with a production query that runs in 13 seconds if a serial plan is generated, and *68 minutes* if a parallel plan is. The query has no filter predicates, so should terminate as soon as the row goal is met. I hope that the team will be able to devote some time to this issue. It's clearly quite nasty when it manifests itself.

Thanks,
Adam
Posted by Microsoft on 5/11/2012 at 9:59 AM
Paul,

This is very interesting and useful feedback. We'll consider this for a future release.

Best regards,
Eric Hanson
Program Manager
SQL Server
Sign in to post a workaround.