Add support for Multi-Column Cross-Table Statistics - by MauriD

Status : 

  Won't Fix<br /><br />
		Due to several factors the product team decided to focus its efforts on other items.<br /><br />
		A more detailed explanation for the resolution of this particular item may have been provided in the comments section.


19
0
Sign in
to vote
ID 731859 Comments
Status Closed Workarounds
Type Suggestion Repros 0
Opened 3/18/2012 2:10:02 PM
Access Restriction Public

Description

One of the biggest problems I always find when helping customers to optimize SQL Server performance is the lack of statistics that spans more than one table.

In complex environment where there are a lot of tables that needs to be joined and where data doesn't have an homogeneous distrbution, the fact that SQL Server is not able to understand how two values from two different tables are related to each other in terms of cardinality estimation is becoming a problem bigger and bigger every day.

Here's the sample script to sistematically generate a bad plan. In this particular case, that I build after seeing the problem on a quite big production server, the wrong query plan also prevented parallelism to kick in, producing really poor performances (minutes vs seconds):

/*

	Setup Environment

*/
create database StatsDemo
go

use StatsDemo
go

if ( object_id('dbo.Table0') is not null ) drop table dbo.Table0;
if ( object_id('dbo.TableA') is not null ) drop table dbo.TableA;
if ( object_id('dbo.TableB') is not null ) drop table dbo.TableB;
go

create table dbo.Table0
(
	table_0_id int identity(100000, 1) not null constraint pk__Table0 primary key,
	table_a_id int not null,
)
go

create table dbo.TableA
(
	table_a_id int identity not null constraint pk__TableA primary key,
	table_b_id int not null,
)
go

create table dbo.TableB
(
	table_b_id int not null constraint pk__TableB primary key,
	alternate_b_key char(2) not null,
)
go

alter table dbo.TableA 
add constraint fk__TableA__TableB
foreign key (table_b_id) references dbo.TableB(table_b_id)
go

alter table dbo.Table0
add constraint fk__Table0__TableA
foreign key (table_a_id) references dbo.TableA(table_a_id)
go

/*

	Add some test data in order to
	create the correct data environment to reproduce the problem

*/
set nocount on
go

declare @k char(36) = '1234567890qazwsxedcrfvtgbyhnujmikolp';
declare @r int = 1;
declare @j int = 1;
while (@j <= 36)
begin
	declare @i int = 1
	
	while (@i <= 36)
	begin 
		insert into dbo.TableB (table_b_id, alternate_b_key)
		values (@r, substring(@k, @i, 1) + substring(@k, @j, 1));
		
		set @i += 1;
		set @r += 1;
	end

	set @j += 1;
end
go

declare @i int = 0
while (@i < 10000)
begin 
	insert into dbo.TableA (table_b_id)
	values (1);
	
	set @i += 1;
end
go

declare @i int = 0
while (@i < 10000)
begin 
	insert into dbo.TableA (table_b_id)
	values ((@i % (36*36)) + 1);
	
	set @i += 1;
end
go

declare @i int = 1;
while (@i <= 10000)
begin 
	insert into dbo.Table0 (table_a_id)
	values (@i);
	
	set @i += 1;
end
go

create nonclustered index ix1 on dbo.Table0(table_a_id)
go

create nonclustered index ix1 on dbo.TableA(table_b_id)
go

/*
	
	Display test data

*/
select * from dbo.TableB
select * from dbo.TableA
select * from dbo.Table0
go

/*

	The problem

*/

-- When searching into table "B" using an alternate key
-- only one value is found.
select table_b_id from dbo.TableB where alternate_b_key = '11'

-- Table "A" contains 10K rows related to the "table_b_id" value found above
select * from dbo.TableA where table_b_id = 1

-- All other values from table_b_id, are connected with very few rows.
select table_b_id, count(*) from dbo.TableA group by table_b_id

-- Table "B" will be joined with table "A" using PK/FK relationship
-- which is built on the surrogate "table_b_id" key.
-- Due to the distribution of data, SQL Server thinks that on average, for any given valoue of table_b_id, there are 15 rows in table "A"
dbcc show_statistics ('dbo.TableA', 'ix1')
go
select 20000 * 0.0007716049
go

-- Unfortunately this is not true and this wrong assumption leads to wrong plans.
-- This is due to the fact that SQL Server cannot correlate the alternate_b_key column in table "B" with values in table "A".
select
	*
from
	dbo.TableA a 
inner join
	dbo.TableB b on a.table_b_id = b.table_b_id
where
	b.alternate_b_key = '11'
	
-- That get worse when you join additional tables. In many case the plan is also considered so cheap that parallelism is prevented, while, of course, 
-- this is not true, and parallelism and maybe an hash join, should be used.
select
	*
from
	dbo.Table0 t
inner join
	dbo.TableA a on a.table_a_id = t.table_a_id
inner join
	dbo.TableB b on a.table_b_id = b.table_b_id
where
	b.alternate_b_key = '11'

-- The problem would be solved if there would be the possibility to create multi-column cross-table statistics, doing something like:
--
-- create statistics MyCrossTableStatistics on dbo.TableB (alternate_b_key, table_b_id), dbo.TableA(table_a_id)


Sign in to post a comment.
Posted by Fabiano Neves Amorim on 12/26/2012 at 12:57 PM
I totally agree that this is a very demanded feature... I hope one day see this in the product.
A statistic in a view works partially, there are a lot of problems on maintaining the view because it requires to be materialized (lot of restrictions and cost to maintain). Also, even I create a indexed view and a statistic on the column, it will only be used if I use the noexpand hint, so I've to change the query to select the view instead of the tables...
Posted by Microsoft on 7/19/2012 at 10:16 AM
Thanks for your feedback Mauri. Can you try seeing if creating an Index view on to of the tables and then seeing if it resolves the problem.?

Also since this is a new feature request we will triage and determine if it meets the bar for future releases.