Our databases are configured with AUTO_CREATE_STATISTICS enabled, and AUTO_UPDATE_STATISTICS_ASYNC disabled. We're using SQL 2008 and 2008 R2 Developer and Standard editions in test, and SQL 2008 and 2008 R2 Standard Edition in production.
We've got a table that contains JPEG image data similar to the below:
CREATE TABLE Pictures ([Key] INT NOT NULL IDENTITY(1,1) PRIMARY KEY, [Image] VARBINARY(MAX))
The following query performs very slowly when a statistics creation or update is triggered.
SELECT COUNT(*), SUM(CAST(DATALENGTH([Image]) AS bigint)) FROM Pictures WHERE [Image] IS NOT NULL
The same query without the WHERE predicate takes only a few seconds, even on a 20GB table, and the 'bytes read' performance counter indicates that sqlservr.exe is not reading all the blob data.
Examining sys.dm_exec_requests while the statistics update is occuring, we see:
SELECT StatMan([SC0], [LC0]) FROM (SELECT TOP 100 PERCENT CONVERT([varbinary](200), SUBSTRING ([Image], 1, 100)++substring([Image], case when LEN([Image])<=200 then 101 else LEN([Image])-99 end, 100)) AS [SC0], datalength([Image]) AS [LC0] FROM [dbo].[RequestFormImages] WITH (READUNCOMMITTED) ORDER BY [SC0] ) AS _MS_UPDSTATS_TBL
Investigation shows this query creates the 'string summary' for the index, as described in http://msdn.microsoft.com/en-us/library/ms190397%28v=SQL.90%29.aspx. In this and other cases in our database, the string summary will never produce an improved plan.
The article above states that string summaries will be discarded if they provide no useful information - is it possible to add an option to CREATE STATISTICS to control whether the string summary is computed in the first place?