Home Dashboard Directory Help
Search

SQL Denali: add total rows counter to SELECT statement by Alexey Rokhin


Status: 

Closed
 as Won't Fix Help for as Won't Fix


5
1
Sign in
to vote
Type: Suggestion
ID: 622246
Opened: 11/16/2010 6:08:56 AM
Access Restriction: Public
0
Workaround(s)
view

Description

SQL Server Denali now has paging functionality implemented as extension for ORDER BY clause. It is possible to select rows portion starting arbitrary row without using row_number() function.
But this functionality is not complete. Usually user need total pages/rows count. Currently user need to submit additional query which counts rows (or select all rows). It is acceptable if data is relatively static and rows count can be cached anywhere. But if row count is frequently changed then user have to submit row count query every time it select "page". It leads to additional IO.
Details
Sign in to post a comment.
Posted by Microsoft on 11/29/2010 at 11:50 AM
I would try the COUNT(*) OVER() approach first in terms of performance and the query approach next. The COUNT(*) approach however might incur cost in terms of creating/writing to a worktable.

A better approach would be to avoid getting the total count and perhaps provide separate option to get that if requested by the user. If the data in the underlying table can change considerably between query executions (first page, next page and so on) then the total count is probably not much value. Depending on the isolation level, you cannot even guarantee if the results will be stable. So there are lot of pros & cons for doing paging on the server unlike web search results.

--
Umachandar
Posted by Alexey Rokhin on 11/25/2010 at 1:44 AM
Thank you Umachandar.
Have you some recommendations about when to use count(*) over() within same query and when to use additional query with count(*)? I mean using them in Denali with OFFSET and FETCH not older versions of MS SQL.

About breaking application: of course when rowcount extension is omitted then resultset has to remain unchanged.

Alexey.
Posted by Microsoft on 11/24/2010 at 11:34 AM
Hi Alexey,

The requirement that OFFSET/FETCH requires ORDER BY is a restriction in this release. In the ANSI SQL standard (SQL:2011) where the new OFFSET/FETCH clauses are proposed ORDER BY is optional. The restriction in SQL Server has to do with limitation in our parser technology that cannot handle the optional syntax without making OFFSET a reserved keyword. We may remove it in the future.

Now with respect to OFFSET/FETCH, we optimize it using existing top and top N sort operators. Even in case of table that has no index, we have to scan the entire table but we use a top N sort operator that doesn't have to sort the entire result set nor consume much memory. So we are able to optimize the actual sorting. You can see this in the example below by looking at the actual rows being sent from one operator to another.

create table dbo.PT1( i int not null );
create table dbo.PT2( i int not null primary key );
insert into dbo.PT1 (i)
output inserted.i into dbo.PT2
values(1), (2), (3), (4), (5);
go
set statistics profile on;
go
select *
from dbo.PT1
order by i
offset 1 rows
fetch next 2 rows only;

select *
from dbo.PT2
order by i
offset 1 rows
fetch next 2 rows only;
go
set statistics profile off;
go

drop table dbo.PT1, dbo.PT2;
go

In case of index on the column, the top is more efficient. Craig's blog postings at http://blogs.msdn.com/b/craigfr/archive/2007/08/01/more-on-top.aspx has some details.

So as I said before, if you want to count the rows use the window function approach or temporary table or re-running the query. We have to do pretty much the same thing and there is no value in adding such special syntax to the core SELECT statement which changes behavior / affects performance. Additionally, adding special column or additional resulset based on a setting will break lot of applications. The query shape has to be deterministic for lot of operations.

--
Umachandar, SQL Programmability Team
Posted by Alexey Rokhin on 11/24/2010 at 12:04 AM
Thank you Umachandar for noticing count(*) over(). I have forget about it. But over clause is too universal and is not suitable for paging because performance reasons.

Temporary table or table variable is old common way to implement paging with total row counting. It has disadvantage: you have to store possibly large dataset that increases tempdb load.
Another old common way is to submit 2 SELECTs. First SELECT selects required number of rows (using 2 TOP clause or row_number() in current days). Second SELECT has same search conditions as first one and select total row count. 2 statements may be combined together using cross join to achive same behaviour as count(*) over(). Then selects may use single CTE.

ORDER BY with OFFSET and FETCH NEXT removes row_number() from select and may be improve performance.

You notice that engine must run query completely to determine actual total rows (and can not find actual row count using estimation). But I believe that there is some thing that could make row counting very simple in SQL Denali. It is requirement that OFFSET and FETCH NEXT could be used only with ORDER BY. In common case engine MUST store whole resultset in internal worktable to SORT it. And at this moment (and certainly after sorting) engine ALREADY knows about total row count in resultset. Engine don't need to submit additional query to count rows and simply can put row count number in additional column or additional resultset (with optional setting BEFORE or AFTER main result set) or scalar variable (better if all 3 cases would be implemented to satisfy different developer's requirements). This SQL exntesion could improve query performance in many situations.
Posted by Microsoft on 11/23/2010 at 10:19 AM
Hi Alexey,
Thanks for your feedback. You can get the total rows that will be returned by the query using count(*) window function. See example below:

select t.schema_id, t.name, c.name, count(*) over() as total_rows
from sys.tables as t
join sys.columns as c
    on c.object_id = t.object_id
order by t.name, c.name
offset 1 rows
fetch next 10 rows only;

You can also rewrite above by dumping the rows into a temporary table and then using @@ROWCOUNT like below:

select t.schema_id, t.name, c.name as colname
into #t
from sys.tables as t
join sys.columns as c
    on c.object_id = t.object_id;
declare @total_rows int = @@ROWCOUNT;
select *, @total_rows as total_rows
from #t as t
order by t.name, t.colname
offset 1 rows
fetch next 10 rows only;
drop table #t;

Basically, the engine cannot to anything special to determine the total rows that will be returned by a query without running it completely. We could do estimates but that will not provide the correct results so you need to use the existing mechanims like in above examples.
So I have resolved and closed your feedback item as won't fix.

--
Umachandar, SQL Programmability Team
Sign in to post a workaround.