Home Dashboard Directory Help
Search

Provide a hint to force intermediate materialization of CTEs or derived tables by Adam Machanic


Status: 

Active


94
1
Sign in
to vote
Type: Suggestion
ID: 218968
Opened: 10/5/2006 12:33:16 PM
Access Restriction: Public
Duplicates: 377976 483181
0
Workaround(s)
view

Description

When working with SQL Server CTEs or derived tables, the query optimizer is free to in-line the inner queries, thereby changing the physical processing characteristics of the query. This is generally a good thing, but in some cases we (database developers working with the product) can make a better decision that the query optimizer and need to override it. In many cases this is handled (join and query hints), but one place that we currently have no control is intermediate materialization of derived tables. There are many cases in which materializing the inner set first produces a much better plan, with greatly reduced I/Os. Example, from AdventureWorks:

---
select
x.employeeid,
y.thecount
from humanresources.employee x
join
(
select p.managerid, count(*)
from humanresources.employee p
group by p.managerid
) y (managerid, thecount) on y.managerid = x.managerid
---

... This query produces 767 logical reads. Now I can force intermediate materialization using TOP + ORDER BY:

---
select
x.employeeid,
y.thecount
from humanresources.employee x
join
(
select top (2147483647)
p.managerid, count(*)
from humanresources.employee p
group by p.managerid
order by p.managerid
) y (managerid, thecount) on y.managerid = x.managerid
---

... and now I get 4 logical reads.
Details
Sign in to post a comment.
Posted by Adamantish on 3/25/2014 at 6:51 AM
It's a shame MS haven't decided to go with this. You could go with Paul White's argument that the places for materialisation are TVFs and temp tables but that condemns us to a world in which Views can never go beyond a certain complexity without fracturing their definition into a mess of over-verbose TVFs.

Views, with their server centrality and set based efficiencies are potentially a great place to program a lot of detailed business logic that's often done at the app layer at the moment, but refusing to allow key tuning options like this sells them short.
Posted by Stephen P on 4/12/2013 at 1:23 AM
For example; I have a table with bill details in. About 200,000,000 records. I loaded a new bill into it. The bill had 2,000,000. After I load it, I check that the correct dollar value is loaded by running something like this

select sum(Dollars) from BigTable where BillID = 123456

This actually performs quite well, it correctly uses the index on BillID to identify the appropriate records even though the stats have no information about the Bill that I just loaded.

But then a few days later, after the stat refresh job that I run has been around and done its work, the same query performs appallingly. The query plan knows that there are 2,000,000 records to return and decides to scan the primary key which is on a whole different unrelated field. This takes FOREVER. Now I have to persuade SQL Server to use the index on BillID for a very simple query that looks for a single BillID.

(One way to do this is to parameterise with a variable @BillID in the where clause then the optimiser does not know which BillID it is getting and optimises for an average bill. The average bill is much smaller than 2,000,000 records).

A simple rule based optimiser would use the index on BillID when searching for a single bill. Every time! No queries sometimes taking 10 seconds and sometimes taking 20 minutes.
Posted by Stephen P on 4/11/2013 at 10:13 PM
I spend so much time working around the optimiser when it is doing the wrong thing that I would like a --RULE BASED-- hint. When applied, the optimiser would attack the query based on the order of the tables, joins and predicates. Oracle used to have such a thing and it was almost always preferable to the Cost based optimiser because it led to predictable query plans that wouldn't suddenly stop performing as your data changed or grew.

I know that MS has gone to the trouble of writing a cost based optimiser for people who can't write sensible SQL but it thinks it is so smart that when it does the wrong thing, it can be very hard to persuade it to change its mind.
Posted by crokusek3 on 4/4/2013 at 4:46 PM
Please consider the ability to specify the materialized table's PK with the materialization hint else I'm still out of luck for some cases. I am constantly using temp tables to work around this issue (e.g. no good for use within a view). Thanks!
Posted by DaveH0ward on 12/27/2012 at 1:55 PM
I ran into this issue today and was able to use the "TOP (2147483647)" trick to reduce the run time of a complex query from hours down to seconds. I could not have used a temp table or table variable here - this needed to be a single query due to an application restriction.
I believe this hint would be extremely useful in a limited but significant number of situations. I agree with Paul that it would likely be abused/misused by some people (as many hints are), but I don't think that's a good reason to not implement it.
+1
Posted by SAinCA on 11/15/2012 at 5:54 PM
I found this "bug" when using various grouping sets over aggregated data from a base set, all within CTEs. What should have been a few seconds became 40 minutes when I added the 3rd level CTE. In my case, 100,000 rows aggregated to just 120 rows within the 2nd-level CTE and 8 rows in the final CTE. The final CTE was used twice in the SELECT, once as the left side of an OUTER Join, then once more for a reason I don't quite recollect. Rewrote the CTE's into Table-variables and seconds later was done...

From inspection of the Explain Plan, the base query was being executed repeatedly, despite the fact that nothing warranted it. The optimizer should be smart enough to see when each CTE needs reevaluation or can be executed, then materialized internally and presented as a stable set of data to the next level CTE.

Definitely +1.
Posted by TechVsLife2 on 7/27/2012 at 9:22 PM
In my experience, some sort of intermediate materialization (I generally force it by using temp tables), has been the trick to solving executions plans generated by sql server -- usually the problem arises with multiple ctes or table functions in joins with each other. (One query took over 48 hrs, but with intermediate materialization, took 18 seconds.)
    But perhaps it could be solved by better optimization (or an option to let sql server take a longer time to optimize? as in computer chess)
Posted by justingrant on 1/2/2012 at 3:59 PM
Having this kind of hint would be very helpful for tuning complex CTE queries (especially self-joins on a CTE) where SQL Server today always chooses sub-optimal plans with huge I/O cost.

@SQL_Kiwi - the temp table workaround is usually much slower (in my experience, at least 30% slower) than an inner query that's been materialized. Table variables may be faster, but they prevent SQL Server from using a parallel query plan which negates any speed benefit.

Of course, in addition to a hint, it'd be ideal if SQL's optimizer also got smarter about the I/O cost involved in repetitive evaluation of an inner query, to ensure that there were fewer cases where the hint was needed.
Posted by Paul White NZ on 2/4/2010 at 8:40 AM
SELECT    x.EmployeeID,
        (
        SELECT    COUNT_BIG(*)
        FROM    HumanResources.Employee AS e
        WHERE    e.ManagerId = x.ManagerID
        ) AS theCount
FROM    HumanResources.Employee AS x
WHERE    x.ManagerID IS NOT NULL;

is also 4 logical reads - no materialization needed.

In general, I'm not so sure about the need for this. If I truly need to materialize an intermediate result (which is typically quite small) I'm happy using a table variable or temporary table. A materialized CTE or derived table feature would probably just be abused, and some would no doubt go on to decry the lack of statistics or indexability...

I quite like the CTE/derived table as a sort of inline view, with the optimizer free to do clever stuff with the overall plan. While this facility might result in better plans on a few occasions, we have a workaround, and the change might affect the wider quality of plans produced.

I may well be wrong though ;c)
Posted by Microsoft on 10/12/2006 at 2:21 PM
Dear Adam,

Thanks for your feedback. I think this idea has quite a bit of merit. You would like a way to tell the query processor to evaluate query subexpressions in a constrained order. There is no perfect workaround right now. You could use a multi-statement TVF, but that is hard to program and hard to read. Your workaround requires an extra sort. We'll consider this as an improvement for a future release.

Regards,
Eric
Sign in to post a workaround.