Top 1 is not considered as a factor for query optimization - by John-Huang

Status : 

  By Design<br /><br />
		The product team believes this item works according to its intended design.<br /><br />
		A more detailed explanation for the resolution of this particular item may have been provided in the comments section.

Sign in
to vote
ID 781990 Comments
Status Closed Workarounds
Type Bug Repros 2
Opened 3/24/2013 10:04:55 AM
Access Restriction Public


query optimizer does not take top 1 clause into account while estimating cost
Sign in to post a comment.
Posted by Paul White NZ on 3/28/2013 at 4:16 PM

The estimated plan for (key + lookup) is more expensive than (scan + top) if the optimizer predicts it will have to scan very few rows. This is the expected behaviour according to the cost model, which includes an assumption that data will be uniformly distributed (from the point of view of the clustered index in this case).

One could make the argument that the optimizer should choose the plan with more predictable performance (scan + top will not perform well with an unfortunate data distribution) but as it is, you have an "out of model" query for which the workaround is to use a hint or plan guide.

Perhaps you should open a new Connect item with your suggestion for improvement. As it is, this one simply states, "query optimizer does not take top 1 clause into account while estimating cost" - which is not true.
Posted by david_wei on 3/28/2013 at 12:00 PM
This is a real case, user wants find any row if column1 (indexed) = 'something', more than 50 columns needs to be returned , so create a covered index is not an option. So we choose top 1 without order by, hoping SQL will do one column1 index seek then do a bookmark lookup, this will only need several IO reads.
Unfortunately, if the statistics shows too many rows is 'something', it do a table scan, and result nearly 200K reads! completely not accepatble.

(BTW, if only few rows = 'something', SQL indeed do a sek + lookup)

SQL server should be smart enough even many rows = 'something', we just need any 1.
Posted by david_wei on 3/28/2013 at 11:48 AM
aaronbertrand's comments does not make sense.

We intentionally skip the order by clause to avoid table scan if order by column is not indexed.
So SQL can jus choose the index based on the where clause to satisfy the query.
Posted by John-Huang on 3/27/2013 at 3:17 PM
Expected behavior does not mean right behavior. Back to SQL Server 2000, when you write "select * from big_table where 1=0", it took ages to complete. I knew it was an expected behavior. It did make a lot of sense from many perspectives. But the fact is it was fixed.
Posted by Microsoft on 3/27/2013 at 2:30 PM
Hello John,

This is expected behaviour; this link may also provide more information

Posted by AaronBertrand on 3/25/2013 at 11:01 AM
Sorry but you're using SELECT * and no ORDER BY. You're basically telling SQL Server "give me all the stuff" and letting *it* decide how to optimize the top. You don't get to complain about what SQL Server determined to be the cheapest path, and you also shouldn't make the assumption that a scan is always worse than a seek. When you use SELECT * instead of the columns covered by the index, the seek suddenly becomes less attractive because it will add the requirement of a lookup to get the rest of SELECT *. The optimizer is pretty smart and if you want to poke holes in it I'd suggest starting with much more complex examples.