Search

Incorrect / Inconsistent results for CROSS APPLY (SELECT TOP X PERCENT ... ) by Martin Smith

Closed
as By Design Help for as By Design

4
0
Sign in
to vote
Type: Bug
ID: 778200
Opened: 2/1/2013 4:45:34 AM
Access Restriction: Public
1
Workaround(s)
0
User(s) can reproduce this bug
For the query

SELECT x.*
FROM T1 org
     CROSS APPLY (SELECT TOP (50) PERCENT *
                    FROM T2
                    WHERE OrganizationID = org.OrganizationID) x;

the results differ dependant upon whether the plan has a table spool or an index spool.

The table T2 has 4 rows matching OrganizationID = 123 and 100 matching OrganizationID = 456

When the execution plan shows an Index spool the results have 4 rows for OrganizationID = 123 and 52 matching OrganizationID = 456 indicating that it is doing `TOP (0.5 * total row count)`

When the execution plan shows a Table spool the results have 2 rows for OrganizationID = 123 and 50 matching OrganizationID = 456 so seems to be doing `TOP (0.5 * filtered row count)`

The second behaviour is the one that seems correct to me.
Details (expand)

Product Language

English

Version

SQL Server 2012 - Developer Edition

Category

SQL Engine

Operating System

Not Applicable

Operating System Language

Not Applicable

Steps to Reproduce

/*Create the tables*/

CREATE TABLE T1
(
[OrganizationID] INT,
Name VARCHAR(50),
[OrderID] VARCHAR(4)
);

CREATE TABLE T2
(
[OrganizationID] INT,
[CustomerID] VARCHAR(3),
[OrderID] VARCHAR(4)
);

DECLARE @T1 TABLE (
[OrganizationID] INT,
Name VARCHAR(50),
[OrderID] VARCHAR(4));

DECLARE @T2 TABLE (
[OrganizationID] INT,
[CustomerID] VARCHAR(3),
[OrderID] VARCHAR(4));

/*Populate them*/

INSERT INTO T1
([OrganizationID],Name)
OUTPUT inserted.* INTO @T1
VALUES
(123, 'Org 1'),
(456, 'Org 2'),
(789, 'Org 3');



INSERT INTO T2
([OrganizationID],
[CustomerID],
[OrderID])
OUTPUT inserted.* INTO @T2
SELECT 123, 'abc', 'p456' UNION ALL
SELECT 123, 'def', 'p457' UNION ALL
SELECT 123, 'ghi', 'p458' UNION ALL
SELECT 123, 'jkl', 'p459' UNION ALL
SELECT TOP (100) 456, LEFT(NEWID(), 3), LEFT(NEWID(), 4)
FROM sys.all_columns


/*Query both*/

SELECT x.*
FROM @T1 org
CROSS APPLY (SELECT TOP (50) PERCENT *
FROM @T2
WHERE OrganizationID = org.OrganizationID) x;

SELECT x.*
FROM T1 org
CROSS APPLY (SELECT TOP (50) PERCENT *
FROM T2
WHERE OrganizationID = org.OrganizationID) x;

DROP TABLE T1, T2

Actual Results

The query on T1 and T2 shows an index spool and uses TOP (52) for both values of OrganizationID


The query on @T1 and @T2 shows a table spool and uses TOP (2) for OrganizationID = 123 and TOP (50) for OrganizationID = 456

Expected Results

I would expect the query on T1 and T2 to return the same results as the query on @T1 and @T2 .

Platform

X64

Virtualization

 
File Attachments
0 attachments
Sign in to post a comment.
Posted by Martin Smith on 2/13/2013 at 1:36 PM
Or at least it doesn't change anything when doing "ORDER BY OrganizationID".

Doing "ORDER BY CustomerID" happens to change both plans so it doesn't use any spools at all and returns the 2 rows / 50 rows as per the table spool.

The non deterministic aspect of TOP (N) without an ORDER BY clause surely refers to the N rows returned not the value of N itself.
Posted by Martin Smith on 2/13/2013 at 1:29 PM
Not sure why you couldn't have tried adding an ORDER BY clause yourself but no it doesn't change anything.
Posted by Microsoft on 2/13/2013 at 12:18 PM
Martin- A TOP without an ORDERBY Clause is nondeterministic. Can you use an Order by clause and see if that helps?

Vishal
Posted by Microsoft on 2/13/2013 at 12:16 PM
Thanks, we are looking into this request
Sign in to post a workaround.
Posted by Martin Smith on 2/2/2013 at 4:50 AM
This is easily worked around with Window Functions. NTILE(2) could be used here but I tend to avoid NTILE for this type of query due to its behaviour when the number of rows is not evenly divisible by number of buckets.

SELECT x.OrganizationID,
     x.CustomerID,
     x.OrderID
FROM T1 org
     CROSS APPLY (SELECT *,
                         100.0 * ROW_NUMBER() OVER (ORDER BY (SELECT 0)) / COUNT(*) OVER () AS Pct
                    FROM T2
                    WHERE OrganizationID = org.OrganizationID) x
WHERE Pct <= 50;