Home Dashboard Directory Help
Search

CTE's are handled inappropriately with specific UPDATE syntax by Aaron Alton


Status: 

Closed
 as Won't Fix Help for as Won't Fix


3
0
Sign in
to vote
Type: Bug
ID: 392152
Opened: 1/8/2009 12:24:02 PM
Access Restriction: Public
0
Workaround(s)
view
1
User(s) can reproduce this bug

Description

It seems that using a CTE with the following update syntax:

WITH CTE (ColumnList)
AS
(SelectStatement)

UPDATE CTE
SET Colx = b.Colx
FROM CTE
JOIN Tableb b ON CTE.JoinCol = b.JoinCol

Produces both an incorrect query plan and incorrect UPDATE results. Full repro below.


The following test data sets up an example wherein an UPDATE using a CTE and the syntax provided above produces incorrect results. Using this sample data:
---
CREATE TABLE t1 (    
c1 INT NOT NULL,    
c2 INT NOT NULL,    
c3 INT NULL    
)    
CREATE TABLE t2 (    
c1 INT NOT NULL,    
c2 INT NOT NULL    
)    
CREATE TABLE t3 (    
c1 INT NOT NULL,    
c2 INT NOT NULL    
)    
    
INSERT INTO t1 VALUES(1, 1, NULL)    
INSERT INTO t1 VALUES(2, 2, NULL)    
INSERT INTO t1 VALUES(3, 3, NULL)    
    
INSERT INTO t2 VALUES(1, 3)    
INSERT INTO t2 VALUES(2, 1)    
    
INSERT INTO t3 VALUES(1, 2)    
INSERT INTO t3 VALUES(2, 1)    
---
I can perform a simple update on the base table t1 directly, using the following:

---
UPDATE t1
SET c3 = b.[c2]    
FROM t1 AS a INNER JOIN t3 AS b ON a.c1 = b.[c1]    

SELECT * FROM t1
UPDATE t1 SET c3 = NULL
---
And the results are, as expected, correct. If you look at the execution plan, there are no missing predicate warnings.
Doing the same thing with a view yields the same result, and also produces no warnings in the execution plan:
---
CREATE VIEW t1View
AS
SELECT c1, c2, c3
FROM t1
GO

UPDATE t1View
SET c3 = b.c2
FROM t1View AS a INNER JOIN t3 AS b ON a.c1 = b.c1

SELECT * FROM t1
UPDATE t1 SET c3 = NULL
---
Now using the exact same query in the CTE that I used in the view (SELECT c1, c2, c3 FROM t1), and the exact same update syntax, I'm getting a missing predicate warning and incorrect results.
---
;WITH r_set    
AS    
(    
SELECT c1, c2, c3
FROM t1    
)    
UPDATE r_set    
SET c3 = b.[c2]    
FROM r_set AS a INNER JOIN t3 AS b ON a.c1 = b.[c1]    

SELECT * FROM t1
UPDATE t1 SET c3 = NULL
---
Here's the strange part. If I create a view using the CTE, like this:
---
CREATE VIEW t1CTE
AS
WITH r_set    
AS    
(    
SELECT c1, c2, c3
FROM t1    
)    
SELECT * FROM r_set
GO

UPDATE t1CTE
SET c3 = b.c2
FROM t1CTE AS a INNER JOIN t3 AS b ON a.c1 = b.c1

SELECT * FROM t1
UPDATE t1 SET c3 = NULL
--------------------------------
It works, and there's no missing predicate warning!

If I use a subquery to perform the update, or I use an alias in the update line (UPDATE a SET a.Col1 = b.Col1 FROM CTE AS a JOIN......), the statement also works, and the join predicate warning disappears.
Details
Sign in to post a comment.
Posted by Microsoft on 2/4/2009 at 1:42 PM
The BOL update is on its way so I am closing this now.

Thank you again for bringing this to our attention!

-Tobias
Posted by Microsoft on 1/15/2009 at 3:28 PM
Let's keep it open until BOL is updated.

Thanks
-Tobias
Posted by Aaron Alton on 1/14/2009 at 6:51 PM
Thanks, Tobias. I'm not too concerned about the fix as long as BOL is updated (forewarned is forearmed). I like the thought of removing ambiguity in the UPDATE syntax, so you have my vote!

Should I close the bug, or will you do that once BOL has been updated?
Posted by Microsoft on 1/14/2009 at 5:53 PM
Hi Aaron,

The behavior that you have found is known and although it is not desirable it isn’t a priority to fix given the easy work around that you of course found (just use the alias in the UPDATE). We will however update the documentation to warn about this behavior. We are considering disallowing this kind of ambiguous syntax in future versions of SQL Server. That is, if you use a FROM-clause in an update statement, the name that you are referring to in your update has to exist in the FROM/JOIN-clauses. This kind of change would need to be implemented over several versions because of back comp. issues.

Examples:
This would be legal:
UPDATE MyTable SET
    Col1 = o.Col1
FROM MyTable
INNER JOIN OtherTable AS o ON o.ID = MyTable.ID;

This would also be legal:
UPDATE m SET
    Col1 = o.Col1
FROM MyTable AS m
INNER JOIN OtherTable AS o ON o.ID = m.ID;

But not this:
UPDATE MyTable SET
    Col1 = o.Col1
FROM MyTable AS m
INNER JOIN OtherTable AS o ON o.ID = m.ID;

Again, thank you for your feedback and I am sorry that we don't have a better answer than this to give you!
/Tobias Ternstrom
Posted by Aaron Alton on 1/9/2009 at 1:03 PM
Hi Adam,

Thanks for having a look at this. I understand that the explicit alias is preferred, and that it does resolve the issue. My problem, and the reason why I would classify it as a bug is that if you use any other construct with exactly the same syntax (a table variable, a temp table, a permanent table, or a view), SQL Server understands an implicit relationship between the object in the UPDATE line and the object in the FROM line. The only place this doesn't happen is with a CTE. My examples walk through this as well.

My primary concern, and reason for filing this as a "bug" is that some people may be using the same syntax that I have used _until now_ , without realizing that it behaves differently if you substitute a CTE. What do you think - is this a valid concern?
Posted by Adam Machanic on 1/9/2009 at 12:30 PM
After thinking about this some more, and looking at the query plans, here's what I think is happening if you don't use the proper correlation name: T1 is joined to T3, then the results of that are used to go back and update T1 again, without considering that it has already been joined. So it's the equivalent of:

UPDATE t1
SET t1.c3 = x.c2
FROM
(
SELECT t3.c2
FROM t1
JOIN t3 ON t1.c1 = t3.c1
) x

In this case for each row, you're going to get effectively a random (or at least nondeterministic) value from the derived table. And I think this is actually correct/expected behavior, even if it seems to not make sense. SQL Server is doing exactly what you asked it to do... You just didn't ask for the right thing :-)
Posted by Adam Machanic on 1/9/2009 at 12:14 PM
I think your CTE query may be malformed, but I'm not entirely sure what it means as-is. If you change it slightly, it starts to work properly:

---
;WITH r_set    
AS    
(    
SELECT c1, c2, c3
FROM t1    
)    
UPDATE a    
SET c3 = b.[c2]    
FROM r_set AS a
INNER JOIN t3 AS b ON a.c1 = b.[c1]    
---

All I changed is "UPDATE r_set" to "UPDATE a". I'm not sure what you're actually updating if you don't use the correlation name; the FROM clause doesn't actually reference any table called "r_set" in that case, and so I wonder whether the JOIN condition is actually considered, or how. It seems much better to be consistent with the correlation name. This might actually be correct behavior...
Posted by Aaron Alton on 1/8/2009 at 6:37 PM
Forgot to add - we investigated this on the MSDN forums - have a look at this thread for more information:
http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/f120453b-ea18-4991-b891-4bef9400349d
Sign in to post a workaround.