Home Dashboard Directory Help
Search

Add support for Multi-Column Cross-Table Statistics by MauriD


Status: 

Closed
 as Won't Fix Help for as Won't Fix


19
0
Sign in
to vote
Type: Suggestion
ID: 731859
Opened: 3/18/2012 2:10:02 PM
Access Restriction: Public
0
Workaround(s)
view

Description

One of the biggest problems I always find when helping customers to optimize SQL Server performance is the lack of statistics that spans more than one table.

In complex environment where there are a lot of tables that needs to be joined and where data doesn't have an homogeneous distrbution, the fact that SQL Server is not able to understand how two values from two different tables are related to each other in terms of cardinality estimation is becoming a problem bigger and bigger every day.

Here's the sample script to sistematically generate a bad plan. In this particular case, that I build after seeing the problem on a quite big production server, the wrong query plan also prevented parallelism to kick in, producing really poor performances (minutes vs seconds):

/*

    Setup Environment

*/
create database StatsDemo
go

use StatsDemo
go

if ( object_id('dbo.Table0') is not null ) drop table dbo.Table0;
if ( object_id('dbo.TableA') is not null ) drop table dbo.TableA;
if ( object_id('dbo.TableB') is not null ) drop table dbo.TableB;
go

create table dbo.Table0
(
    table_0_id int identity(100000, 1) not null constraint pk__Table0 primary key,
    table_a_id int not null,
)
go

create table dbo.TableA
(
    table_a_id int identity not null constraint pk__TableA primary key,
    table_b_id int not null,
)
go

create table dbo.TableB
(
    table_b_id int not null constraint pk__TableB primary key,
    alternate_b_key char(2) not null,
)
go

alter table dbo.TableA
add constraint fk__TableA__TableB
foreign key (table_b_id) references dbo.TableB(table_b_id)
go

alter table dbo.Table0
add constraint fk__Table0__TableA
foreign key (table_a_id) references dbo.TableA(table_a_id)
go

/*

    Add some test data in order to
    create the correct data environment to reproduce the problem

*/
set nocount on
go

declare @k char(36) = '1234567890qazwsxedcrfvtgbyhnujmikolp';
declare @r int = 1;
declare @j int = 1;
while (@j <= 36)
begin
    declare @i int = 1
    
    while (@i <= 36)
    begin
        insert into dbo.TableB (table_b_id, alternate_b_key)
        values (@r, substring(@k, @i, 1) + substring(@k, @j, 1));
        
        set @i += 1;
        set @r += 1;
    end

    set @j += 1;
end
go

declare @i int = 0
while (@i < 10000)
begin
    insert into dbo.TableA (table_b_id)
    values (1);
    
    set @i += 1;
end
go

declare @i int = 0
while (@i < 10000)
begin
    insert into dbo.TableA (table_b_id)
    values ((@i % (36*36)) + 1);
    
    set @i += 1;
end
go

declare @i int = 1;
while (@i <= 10000)
begin
    insert into dbo.Table0 (table_a_id)
    values (@i);
    
    set @i += 1;
end
go

create nonclustered index ix1 on dbo.Table0(table_a_id)
go

create nonclustered index ix1 on dbo.TableA(table_b_id)
go

/*
    
    Display test data

*/
select * from dbo.TableB
select * from dbo.TableA
select * from dbo.Table0
go

/*

    The problem

*/

-- When searching into table "B" using an alternate key
-- only one value is found.
select table_b_id from dbo.TableB where alternate_b_key = '11'

-- Table "A" contains 10K rows related to the "table_b_id" value found above
select * from dbo.TableA where table_b_id = 1

-- All other values from table_b_id, are connected with very few rows.
select table_b_id, count(*) from dbo.TableA group by table_b_id

-- Table "B" will be joined with table "A" using PK/FK relationship
-- which is built on the surrogate "table_b_id" key.
-- Due to the distribution of data, SQL Server thinks that on average, for any given valoue of table_b_id, there are 15 rows in table "A"
dbcc show_statistics ('dbo.TableA', 'ix1')
go
select 20000 * 0.0007716049
go

-- Unfortunately this is not true and this wrong assumption leads to wrong plans.
-- This is due to the fact that SQL Server cannot correlate the alternate_b_key column in table "B" with values in table "A".
select
    *
from
    dbo.TableA a
inner join
    dbo.TableB b on a.table_b_id = b.table_b_id
where
    b.alternate_b_key = '11'
    
-- That get worse when you join additional tables. In many case the plan is also considered so cheap that parallelism is prevented, while, of course,
-- this is not true, and parallelism and maybe an hash join, should be used.
select
    *
from
    dbo.Table0 t
inner join
    dbo.TableA a on a.table_a_id = t.table_a_id
inner join
    dbo.TableB b on a.table_b_id = b.table_b_id
where
    b.alternate_b_key = '11'

-- The problem would be solved if there would be the possibility to create multi-column cross-table statistics, doing something like:
--
-- create statistics MyCrossTableStatistics on dbo.TableB (alternate_b_key, table_b_id), dbo.TableA(table_a_id)


Details
Sign in to post a comment.
Posted by Fabiano Neves Amorim on 12/26/2012 at 12:57 PM
I totally agree that this is a very demanded feature... I hope one day see this in the product.
A statistic in a view works partially, there are a lot of problems on maintaining the view because it requires to be materialized (lot of restrictions and cost to maintain). Also, even I create a indexed view and a statistic on the column, it will only be used if I use the noexpand hint, so I've to change the query to select the view instead of the tables...
Posted by Microsoft on 7/19/2012 at 10:16 AM
Thanks for your feedback Mauri. Can you try seeing if creating an Index view on to of the tables and then seeing if it resolves the problem.?

Also since this is a new feature request we will triage and determine if it meets the bar for future releases.

Sign in to post a workaround.