Home Dashboard Directory Help
Search

Unnecessarily bad performance for coalesce(subquery) by Erland Sommarskog


Status: 

Active


43
0
Sign in
to vote
Type: Bug
ID: 336002
Opened: 3/30/2008 11:47:01 AM
Access Restriction: Public
0
Workaround(s)
view
17
User(s) can reproduce this bug

Description

It appears that the algebrizer transforms coalesce(a, b) to

CASE WHEN a IS NOT NULL THEN a ELSE b END

before optimizer gets a chance to see it. This is testified by the fact a
query like "SELECT coalesce(xyz, 0) FROM sys.objects" yields two error
messages for the missing column and not one.

When a is a subquery, this becomes

CASE WHEN (subquery) IS NOT NULL THEN (subquery) ELSE b END

This is unfortunate, because the optimizer does not seem to be able to match
the two instances of subquery, why it gets evaluated twice. This can easily be
worked around by using isnull instead, but coalesce is in the ANSI standard,
and coalesce is also more useful than isnull, since you can have more than
two parameters. And more importantly, few users would even suspect that
coalesce is bad.

In the repro, I use a SELECT query, because it was easier to compose such
a query. In real life, maybe the most important context is UPDATE statements
where you which to adhere to the ANSI standard. In this case, there are few choices
to coalesce.


Details
Sign in to post a comment.
Posted by Paul White NZ on 7/23/2010 at 11:03 PM
Hopefully the work necessary to incorporate the Covering Subexpression Manager (presented by Jingren Zhou and Per-Åke Larson from Microsoft Research three years ago) into the query optimizer will be included in SQL11.

As a minimum, this would avoid the performance problems demonstrated in this important Connect item - and would benefit a broad range of common queries, if more fully implemented.
Posted by Dmitry Mashkov [MSFT] on 1/19/2010 at 2:39 AM
"We assume that this does not affect many customers" - how did you assume this???
This has a great impact, on a lot of queries, we cannot rewrite all of them. It is a matter of common sense to not recalculate the same expression for comparisions.
Posted by Vladimir Moldovanenko on 12/7/2009 at 8:17 AM
Microsoft, I think your assumption is incorrect. "We assume that this does not affect many customers,...".

This is pretty common issue, particularly in procedures and functions, where the next number is retrieved.
I just spent time updating 40 or so procedures having this issue.
Code like this is very common in TSQL programming
    SET @conSequence = COALESCE(
    (
        SELECT MAX(con.conSequence)
        FROM dbo.Container con
        WHERE con.pdtID = @pdtID
        AND con.conDate = @conDate
    ), 0) + 1
Posted by Microsoft on 4/3/2008 at 4:10 PM
Erland,
You point out an interesting issue here. For the coalesce example, we have a single expression that we happen to evaluate twice. Although not a frequent scenario, it would be good to avoid that and we will keep it on our radar.
Regarding the update: this amounts to the problem of common sub-expression evaluation, which we will definitely tackle sooner or later in the query optimizer. It's a significant task though and just didn't meet the triage bar for 2008 anymore.
Thanks again and best regards,
Roman Schindlauer
Posted by Erland Sommarskog on 4/3/2008 at 1:14 PM
I can understand that this a too tall order for SQL 2008 RTM, nevertheless it is something that should be addressed in one way or another.

I can't say how many customers this affect, but permit me to point out that if you want to write ANSI-compatible UPDATEs, you can easily wind up with this.

The whole issue could of course be generalised, as the optimizer could make an effort to consolitated not only identical subqueries, but also similar. Consider:

UPDATE tbl
SET     col1 = (SELECT SUM(this) FROM tbl2 WHERE tbl2.keycol = tbl.keycol1),
             col2 = (SELECT SUM(that) FROM tbl2 WHERE tbl2.keycol = tbl.keycol1)

While adding row constructors would evade the need to write code like this,
there is still a lot of code like above out there, code that SQL 11 could all of
a sudden help to go faster.

But in this case, at least, the user can guss that this is not effcient. In the case
of coalesce(), this is hidden for him, which is why I have classified this as a bug.
Posted by AaronBertrand on 4/2/2008 at 6:33 PM
Erland, your repro steps should indicate that having Northwind installed is a prerequisite. :-)
Posted by Microsoft on 4/2/2008 at 5:15 PM
Erland,
Thanks for reporting this issue. It is by design that the subquery in a coalesce is not matched between the optimizer and the algebrizer. Enabling this would involve considerable efforts and does not meet the current triage bar for SQL 2008. We assume that this does not affect many customers, however, we will consider it again for the next major release.
A simple workaround that still uses coalesce would be to store the subquery's result in a variable and use that in the coalesce statement.
With best regards,
Roman Schindlauer
SQL Server Relational Engine
Sign in to post a workaround.