MERGE statement bypasses Referential Integrity - by Itzik Ben-Gan

Status : 

  Fixed<br /><br />
		This item has been fixed in the current or upcoming version of this product.<br /><br />
		A more detailed explanation for the resolution of this particular item may have been provided in the comments section.


31
1
Sign in
to vote
ID 357419 Comments
Status Closed Workarounds
Type Bug Repros 15
Opened 7/23/2008 2:51:23 AM
Access Restriction Public

Description

The bug is that in certain cases an update applied by the MERGE statement 
seems to bypass referential integrity, while an UPDATE statement doesn't (as 
it should).
Sign in to post a comment.
Posted by Microsoft on 8/21/2008 at 6:26 PM
Thank you Itzik for reporting this problem, and thank you all for voting on the issue. We truly appreciate your efforts. We have prepared Critical On Demand (COD) hotfix 1750 for this bug and PSS now has the fix available. The KB article describing the issue and the fix will be located here soon: http://support.microsoft.com/?id=956718

In short, we recommend that if you are updating a non-clustered unique key with MERGE, you should take this fix.

Via a link from the KB article, you can download the fix automatically, without the need to call PSS.

The fix will also be in the Sept 22nd cumulative update.

Posted by Microsoft on 8/6/2008 at 6:32 PM
We are preparing a hotfix for this and will push it out as soon as possible. The fix is coded and we are testing it now. -Eric
Posted by Microsoft on 7/24/2008 at 7:00 PM
If you try the UPDATE statement with "set showplan_text on", you'll see the following operators for validating the foreign key constraint:


     |--Nested Loops(Left Semi Join, PASSTHRU:([c2_OLD] = [tempdb].[dbo].[t1].[c2])...)
         ...
            |--Clustered Index Scan(OBJECT:([tempdb].[dbo].[t2].[PK__t2]), WHERE:([tempdb].[dbo].[t2].[c2]=[c2_OLD]))


The PASSTHRU in the first operator says we shouldn't worry about validating the constraint if we're not actually changing the referenced column (Rajeev's scenario); the predicate in the index scan says we should be looking for rows in the foreign table with the *old* value of c2.

Now try the same UPDATE statement with traceflag 8790 turned on. That will force us to generate a "wide" plan -- instead of maintaining the nonclustered index on c2 together with the clustered index on c1, we maintain it using a separate operator. That introduces new operators to

- Split the updates into a sequence of delete and inserts
- Sort the changes so that the deletes come before the inserts
- Collapse a delete and insert of the same c2 value into an update of the remaining columns.

As a side effect of this split-sort-collapse pattern, we also change the way we validate the foreign key constraint:


     |--Nested Loops(Left Semi Join, PASSTHRU:([Act1027]<>(3))...)
         ...
            |--Clustered Index Scan(OBJECT:([tempdb].[dbo].[t2].[PK__t2]), WHERE:([tempdb].[dbo].[t2].[c2]=[tempdb].[dbo].[t1].[c2]))


The PASSTHRU now says that we should only worry about changes that come out of the collapse as a delete (Act1027 will be 1 for update, 3 for delete, or 4 for insert). The column [t1].[c2] referenced in the index scan predicate has different semantics depending on the action -- for inserts and updates it's the new value of the column while for deletes it's the old value. Since we're only worrying about deletes, this has the same net effect as using [c2_OLD].

As a third experiment, turn off the traceflag and try

create table #new_values(c1 int primary key, c2 int);
insert #new_values values(1,2);
go

update t1
set t1.c2 = d.c2
from t1 join #new_values as d on t1.c1=d.c1
go

You should get the same plan as the original update got with the traceflag. By default, the optimizer generates a split-sort-collapse plan whenever you modify a unique column. However, we disable that pattern when we know that there is only one value being assigned to the column (either because we can determine that there is only one row being updated or because we can determine that the right side of the assignment is a constant). When you join with a genuine table instead of using a predicate on the primary key, the optimizer no longer knows that you're only modifying one row, so we will maintain the index on c2 separately.


The MERGE statement's overall design (without regard to foreign key validation) is similar to the split-sort-collapse but not quite exactly the same. Both problems you mention (allowing an update which we shouldn't and vice versa) occur because we are generating a foreign key validation plan that quietly assumes that there is actually a split. In particular, the join predicate uses [t1].[c2] instead of [c2_OLD]. Since there is no split, the action is always update and [t1].[c2] refers to the new value, not the old one.

You can work around the bug by forcing us to include a split in the plan. Some possible tricks are:

- Use TF 8790. Note that this will force us to maintain *all* nonclustered indices separately, which could result in bloated plans.
- Use a temp table in your USING clause (making sure to modify the assignment list to use a value from the temp table, not a hard coded constant or parameter).
- Modify your schema to make the referenced column part of the clustering key (we always split updates to the clustering key even when we know there's only one value being assigned to it).


Thank you for letting us know about this bug.

Andrew Richardson
Developer, SQL Server Query Optimizer.
Posted by Microsoft on 7/24/2008 at 8:55 AM
Thanks for the feedback Itzik! We are actively investigating this. -Eric
Posted by Itzik Ben-Gan on 7/23/2008 at 7:46 AM
I got the following from Rajeev Lahoty...

It seems like there's another bug with MERGE; not sure if it's related to the previous one or not.
When you update a referenced row key that has related referencing rows (prior to the update) without actually changing the key, an UPDATE statement succeeds, while a MERGE statement fails.
Here's the repro:

set nocount on;
use tempdb;
drop table t2, t1;
create table t1(c1 int not null primary key, c2 int not null unique);
create table t2(c1 int not null, c2 int not null references t1(c2), primary key(c1, c2));
go
insert into t1(c1, c2) values(1, 1);
insert into t2(c1, c2) values(1, 1);
go

-- Succeeds
update t1
set c2 = 1
where c1 = 1;

-- Fails
merge into t1
using (select 1 as c1) as d
on t1.c1 = d.c1
when matched then
update set t1.c2 = 1;

select * from t1;
select * from t2;