Search

No statistics management options fit nightly load large ETL system scenario by Custom22

Closed
as Fixed Help for as Fixed

1
0
Sign in
to vote
Type: Suggestion
ID: 668354
Opened: 5/13/2011 6:38:21 AM
Access Restriction: Public
0
Workaround(s)
See http://social.msdn.microsoft.com/Forums/en/sqldatabaseengine/thread/d32bdad8-eb84-4a64-a061-e657744083cf

I've copied my post on that forum here;

I use SQL server 2008 R1, Enterprise.

We have nightly ETL processes which insert incrementally into large databases (2-6TB). The main tables are all sequentially ID'd by our loading software (i.e. it controls the incrementing primary keys, not SQL server)

I'm trying to figure out ways to make our statistics more efficient.

Auto create stats is on, Auto update stats is on, but Auto update stats async is off.

However, what we find is that even though auto update stats is on, it is necessary for us to run sp_updatestats as part of the nightly load. If we don't, SQL server comes up with inaccurate explain plans, especially around the newly inserted data. I've read that auto update stats won't do anything if SQL server estimates that it would be quicker just to run the query, but I can't reproduce that so far on a vanilla test system. I also don't know how SQL server decides it would be quicker just to run the query. In any case, for newly inserted data, you can see from the Explain Plan that the estimated number of rows is tiny (maybe 1) where it will in fact be hundreds of thousands.

I'm essentially thinking about developing an alternative version of sp_updatestats, because the existing one is taking too long to run (more than an hour on some systems).

My thinking is that there are two types of statistics;

1. Stats on non-primary key columns. The histogram distributions in these statistics don't need to be recalculated very much, since the nature of the added data is relatively uniform over time. However, the stats don't store % distributions, they store numbers of rows for bins. I'd like to just incrementally multiply these numbers, but I don't think I can manipulate the statistics data?

2. Stats on the incrementing primary key columns. These don't necessarily need to be recalculated very much either. As well as the incremental bin adjustment described above, I'd just need to correct the max value.

Before I waffle on too much, any thoughts on the above?

I've read about trace flags (2389, 2390?) which can be set when starting SQL server which seem related to this, but I'm not so keen on using trace flags on a production system
Details (expand)

Product Language

English

Category

SQL Engine

Proposed Solution

A change to the stats to store percent distributions, rather than number of rows, for histograms, would it help? (I'm not sure if this makes any difference to the plans SQL server would generate)

Ability to integrate newly added rows into existing statistics based on incrementing primary key, rather than rescanning statistics from scratch. Note primary key may not always be incremented by SQL server (it may not be identity)

Ability to manipulate the data stored in statistics objects to cater for scenarios not anticipated.


Primary Benefit

Improved Performance

Other Benefits

 
File Attachments
0 attachments
Sign in to post a comment.
Posted by Microsoft on 1/25/2012 at 4:42 PM
We are about to finish a project that, among other things, fixes cardinality estimates for filters outside the bounds of the histogram. This will be in a post-SQL-Server-2012 release. I can't tell you the timeline.

Thank you,
Eric Hanson
Program Manager
SQL Server Query Processing
Posted by Microsoft on 5/18/2011 at 1:34 PM
Thanks for the feedback. We'll consider this as an enhancement for a release after Denali. We've heard this feedback from others as well. It appears that your loads are not big enough to cross the 20% threshold that causes auto-update to kick in, and that is why you have to update statistics manaually every night. Take a look at this white paper for better insight into how the statistics system works: http://msdn.microsoft.com/en-us/library/dd535534(SQL.100).aspx.

Best regards,
Eric Hanson
Program Manager
SQL Server Query Processing
Sign in to post a workaround.