SQL Server Home
Non-recursive CTEs inefficiently recompute self-joins and ROW_NUMBER instead of using worktable
10/27/2008 2:47:34 PM
Primary Feedback Item:
User(s) can reproduce this bug
Non-recursive CTEs which contain self-joins or ROW_NUMBER() are unusuably slow for large data sets, beucase the CTE query is re-evaluated many times instead of being spooled into a worktable and only evaluated once. This makes CTEs useless for large data sets when self-joins or ROW_NUMBER is used.
Is this fixed in SQL 2008?
Tony Rogerson describes the problem thoroughly in these two blog posts.
#1 - http://sqlblogcasts.com/blogs/tonyrogerson/archive/2008/05/17/non-recursive-common-table-expressions-performance-sucks-1-cte-self-join-cte-sub-query-inline-expansion.aspx
#2 - http://sqlblogcasts.com/blogs/tonyrogerson/archive/2008/05/17/non-recursive-common-table-expressions-performance-sucks-2-row-number-is-executed-number-of-cte-references-x-number-of-rows-from-the-anchor.aspx
SQL Server 2005 - Enterprise Edition
Win2003 Enterprise Server (SP2)
Operating System Language
Steps to Reproduce
See the blog posts below for repro steps and sample code
CTE query executed many times. See blog posts above for details about query plans, statistics, etc.
CTE spooled into a worktable or otherwise cached so query only needs to be executed once.
to post a comment.
Please enter a comment.
on 10/31/2008 at 10:51 AM
Thanks again for your feedback. We are tracking variants of this DCR under "Feedback ID: 218968".
Campbell Fraser, SQL Development.
on 10/30/2008 at 10:46 AM
Thanks for your feedback. We haven't improved the situation in SQL Server 2008. Non-recursive CTEs are processed as inline view definitions. When used later in a query, they are expanded just like views. We'd like to improve this at some point and will consider your suggestion for a future release. For now, if performance is a problem for you, we recommend materializing intermediate results yourself using temp tables, scalar variables, or table variables. Temp tables give you statistics so query plans that use them will tend to be better than plans that use table variables. That's because table variables don't have statistics. If you use scalar variables in queries and compile time is not an issue for you, use OPTION(RECOMPILE) on the query. This does full contant folding of the expression containing the variable and will tend to give you a better query plan.
to post a workaround.
Please enter a workaround.
© 2014 Microsoft