Home Dashboard Directory Help
Search

Improve syntax for TOP .. ORDER BY, and perhaps plan the deprecation of the current syntax by Steve Kass


Status: 

Active


4
3
Sign in
to vote
Type: Suggestion
ID: 337299
Opened: 4/7/2008 9:46:45 PM
Access Restriction: Public
Primary Feedback Item: 254390
0
Workaround(s)
view

Description

T-SQL provides TOP, which can be used alone (and is nondeterministic) or with ORDER BY (deterministic if the order by list is a key) as an extention to standard SQL. Unfortunately, this causes a lot of confusion. People (and some tools) put TOP 100 PERCENT .. ORDER BY in views, and more importantly, people use TOP without ORDER BY and expect things they don't get. There are further confusions when TOP is used in a UNION query. Since only one ORDER BY clause is allowed. TOP .. ORDER BY is not available, or it is only with some messy subqueries or WITH clauses.

Finally, an ORDER BY clause is ambiguous. If there is no matching TOP clause, it orders the results presented to the client. If there is a TOP clause, the results are not guaranteed to be presented on order, which is difficult to explain.
Details
Sign in to post a comment.
Posted by Microsoft on 3/10/2011 at 5:31 PM
Hello Steve,

I have resolved yours as duplicate of:

http://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=254390

--
Umachandar, SQL Programmability Team
Posted by Steve Kass on 11/19/2008 at 7:39 PM
There is yet another opportunity here, and that is to allow partitioning. To find the top 3 orders by price for each customer, this could be the syntax:

SELECT TOP (3 ROWS) OVER (PARTITION BY CustomerID ORDER BY PRICE DESC) * FROM Orders;

The presence of PARTITION BY would mean this returns 3 ROWS *per* CustomerID group.

This would be syntactic sugar for first ranking/numbering with an OVER clause, then selecting from the top of the numbering (for TIES use RANK, for no ties, use ROW_NUMBER, and for an option not now available with TOP, use DENSE_RANK)

WITH Ranked AS (
SELECT *,
ROW_NUMBER() OVER (
    PARTITION BY ...
    ORDER BY
) as rk
)
SELECT ...
FROM Ranked
WHERE rk <= 3

It will be important to find out why this is not part of ANSI before doing it, but it's likely better than the current TOP syntax...
Posted by Microsoft on 4/10/2008 at 7:44 PM
Thanks for the feedback and votes. Please keep them coming. We agree that the syntax about union all and orderby and top needs more careful re-evaluation (note that subqueries have other binding preferences). Since this is however too late for SQL Server 2008, we are taking this into account for our planning for an upcoming version.

Thanks
Michael
Posted by Steve Kass on 4/8/2008 at 7:50 AM
DoButton,

I'm not sure what you find unclear about the example you quote. It is a UNION ALL of two queries. Each of the two queries returns one row because of the TOP clause. Therefore two rows are returned. I'm not suggesting it is useful, but it returns the first row in customer ID order along with the first row in Order Date order (starting with most recent).

You are correct that the ORDER BY in a UNION applies to the whole query. Unfortunately, it is not easy to explain that to someone who writes something like this, trying to compare one customer's prices with the highest price on record.

SELECT CustomerID, Price
FROM Orders
WHERE CustomerID = 'abc'
UNION ALL
SELECT TOP 1 'BestPrice', Price
FROM Orders
ORDER BY Price DESC

Unexpectedly, this returns 'abc's prices along with a random price, not the highest on record.

Can you give an example of a query that would be confusing, unclear, or impossible with my suggested syntax? I would allow
SELECT (TOP 10 ROWS) OVER () for a completely nondeterministic TOP 10
SELECT (TOP 10 PERCENT WITH TIES) OVER (ORDER BY Profit DESC), and so on.

(While I haven't thought it through, maybe this could be generalized to a simpler syntax than we have now for "top 3 for each employee" like
SELECT (TOP 3 ROWS) OVER (PARTITION BY EmployeeID ORDER BY Total DESC)
EmployeeID,
Total,
OrderDate
FROM Orders
)
Posted by DB007 on 4/8/2008 at 1:03 AM
Absolutly horible syntax, much prefer it as it is.

The query is not clear at all, what would you get back 1 or 2 rows or stop the following from happening?
SELECT (TOP 1 ROWS) OVER (ORDER BY CustomerID)
OrderID, CustomerID
FROM Orders
UNION ALL
SELECT (TOP 1 ROWS) OVER (ORDER BY OrderDate DESC)
OrderID, CustomerID
FROM Orders
ORDER BY OrderDate DESC -- applies to entire UNION

: This is not clear which amount of rows come back.


There are further confusions when TOP is used in a UNION query. Since only one ORDER BY clause is allowed. -- Prehaps suggest a clearer documentation in BOL. The current syntax is a nice short robust piece of code - makes perfect sense, you have a record set, you only see N rows.

Finally, an ORDER BY clause is ambiguous -- I do not believe this at all, the bottom order by clause in a union applys to the whole query.

The additional inclusion of order bys for every top only will cause even more confusion, and unclear implementation.


This is something, I really do not support.
Sign in to post a workaround.