Aggregates Don't Follow the Semantics Of CASE - by Paul White NZ

Status : 

  Fixed<br /><br />
		This item has been fixed in the current or upcoming version of this product.<br /><br />
		A more detailed explanation for the resolution of this particular item may have been provided in the comments section.


8
0
Sign in
to vote
ID 691535 Comments
Status Closed Workarounds
Type Bug Repros 4
Opened 9/28/2011 9:40:40 PM
Access Restriction Public

Description

SQL Server developers often rely on the left-to-right order of evaluation in CASE expressions to avoid runtime errors, and particularly on the implication that later expressions in the CASE statement will *not be evaluated* if an earlier one returns true.

When an aggregate is used in the CASE expression, this semantic is not always followed, as this example shows (it fails with a divide-by-zero error at runtime):

WITH Data (value) AS
(
    SELECT 0 UNION ALL
    SELECT 1
)
SELECT
    CASE 
        WHEN MIN(value) <= 0 THEN 0 
        WHEN MAX(1/value) >= 100 THEN 1
    END
FROM Data

Both WHEN conditions are evaluated in the same aggregate operator.  The same logical query expressed using different syntax does not cause an error:

WITH Data (value) AS
(
    SELECT 0 UNION ALL
    SELECT 1
)
SELECT
    CASE 
        WHEN (SELECT MIN(value) FROM Data) <= 0 THEN 0 
        WHEN (SELECT MAX(1/value) FROM Data) >= 100 THEN 1
    END

The second example uses a pass-through on the join to avoid evaluating the second WHEN condition if the first one evaluated to true.

Does CASE guarantee left-to-right evaluation with short-circuiting, or are we reading more into the documentation (at http://msdn.microsoft.com/en-us/library/ms181765.aspx) than we should?

If the different behaviour with aggregates is by design, the documentation should be updated accordingly.  Otherwise, the optimizer should generate a query plan that guarantees to follow the semantic of CASE - perhaps by eager-spooling intermediate results and using pass-through predicates on joins as shown above, or by suppressing errors that do not affect the result (errors in expressions that are never used).

A final simplistic example that throws an unexpected error:

SELECT 
    CASE @@ERROR
        WHEN 0 THEN 0
        WHEN MIN(SQRT(-1)) THEN 0
    END
Sign in to post a comment.
Posted by TechVsLife2 on 7/25/2012 at 5:12 PM
I'd emphasize that aggregate expressions in the THEN part (not just the WHEN part) are also evaluated first (prior to evaluating the CASE expression), even though their respective WHEN conditions are never met. So,
WHEN 1=0 THEN AVG(1/0)
will trigger a divide by zero error, and regardless of the location of the WHEN clause relative to other WHEN clauses.
Posted by TechVsLife2 on 7/25/2012 at 5:06 PM
Note: All of the below is now in the most recent BOL for sql 2012:

The CASE statement evaluates its conditions sequentially and stops with the first condition whose condition is satisfied. In some situations, an expression is evaluated before a CASE statement receives the results of the expression as its input. Errors in evaluating these expressions are possible. Aggregate expressions that appear in WHEN arguments to a CASE statement are evaluated first, then provided to the CASE statement. For example, the following query produces a divide by zero error when producing the value of the MAX aggregate. This occurs prior to evaluating the CASE expression.

Transact-SQLView ColorizedCopy to ClipboardPrintWITH Data (value) AS
(
SELECT 0
UNION ALL
SELECT 1
)
SELECT
     CASE
        WHEN MIN(value) <= 0 THEN 0
        WHEN MAX(1/value) >= 100 THEN 1
     END
FROM Data ;

WITH Data (value) AS
(
SELECT 0
UNION ALL
SELECT 1
)
SELECT
CASE
     WHEN MIN(value) <= 0 THEN 0
     WHEN MAX(1/value) >= 100 THEN 1
END
FROM Data ;

You should only depend on order of evaluation of the WHEN conditions for scalar expressions (including non-correlated sub-queries that return scalars), not for aggregate expressions.

Posted by TechVsLife2 on 2/12/2012 at 4:42 PM
ok, I got bitten by this. Note that it also hurts you when you have aggregates in the THEN part (i.e. with an "unchosen" WHEN). I call this "extralong circuiting":

This fails with divide by zero though N is never 4.
DECLARE @TblTest TABLE (N INT);
INSERT INTO @TblTest
VALUES (1), (1);
--SELECT N FROM @tblTest;
SELECT
        CASE
            WHEN N=1 THEN AVG(1)
            WHEN N=4 THEN AVG (1 / 0) --N is never 4!
            --note short-circuiting appears to work when next line is substituted for prev line:
            --WHEN N=4 THEN (1/0)
        END AS
Result
FROM @TblTest
GROUP BY N;
Posted by Paul White NZ on 1/5/2012 at 8:14 AM
Can you say anything about when this change to Books Online will appear? None of the online versions are showing the updated Remarks section text yet (3 months). Not complaining, just curious.
Posted by Microsoft on 10/5/2011 at 8:17 AM
The following has been added to the Remarks section of the topic CASE (Transact-SQL) in Books Online.
The CASE statement evaluates its conditions sequentially and stops with the first condition whose condition is satisfied. In some situations, an expression is evaluated before a CASE statement receives the results of the expression as its input. Errors in evaluating these expressions are possible. Aggregate expressions that appear in WHEN arguments to a CASE statement are evaluated first, then provided to the CASE statement. For example, the following query produces a divide by zero error when producing the value of the MAX aggregate. This occurs prior to evaluating the CASE expression.

WITH Data (value) AS
(
SELECT 0 UNION ALL
SELECT 1
)
SELECT
CASE
     WHEN MIN(value) <= 0 THEN 0
     WHEN MAX(1/value) >= 100 THEN 1
END
FROM Data

You should only depend on order of evaluation of the WHEN conditions for scalar expressions (including non-correlated sub-queries that return scalars), not for aggregate expressions.

Thank you for reporting this.
Posted by Paul White NZ on 10/3/2011 at 11:50 AM
Thanks for the detailed explanation :-/
Posted by Microsoft on 9/30/2011 at 9:47 AM
Thanks for the feedback. This is by design. Aggregates need to be evaluated first before the CASE. I'll pass this to our technical writing team to have them evaluate the documentation for this.

Best regards,
Eric
Program Manager
SQL Server Query Processing
Posted by Paul White NZ on 9/29/2011 at 5:05 AM
Hi sql_head,

There are all sorts of ways the implementation could honour the CASE semantic without being less efficient. One would be to suppress errors in expressions that are never referenced. In the specific example, the aggregate defines [Expr1003] and [Expr1004]. The compute scalar references both, but only [Expr1003] is actually touched:

[Expr1005] = Scalar Operator(CASE WHEN [Expr1003]<=(0) THEN (0) ELSE CASE WHEN [Expr1004]>=(100) THEN (1) ELSE NULL END END)

This sort of 'lazy error reporting' is already done e.g. for inserts where IGNORE_DUP_KEY is active, and expressions are generally evaluated lazily anyway (on first reference, not when defined) so there's really no reason to be concerned about the performance implications. Nevertheless, it's a good comment, so I updated the main text accordingly.

Paul
Posted by sql_head on 9/29/2011 at 12:40 AM
Not to say that the BOL documentation on CASE couldn't be made clearer for this particular scenario, but I do not believe that this violates the in-sequence-stop-on-true evaluation of CASE expressions. What it does present is an excellent example of the order in which SQL Server processes parts of each query. It confirms that all aggregates referenced in the SELECT or HAVING clauses are fully processed before any output criteria is applied as it is far more efficient to do so. Evaluating each aggregate only as needed by the output stage could have terrible implications for query performance (not to mention the logical issues that could be generated).

To take this specific example's estimated execution plan, there is a single sequence of constant scan >> stream aggregate (build MIN and MAX) >> compute scalar (evaluate CASE) >> Select (output results). In order to behave in the alternative method you suggest, it would require a looping stream aggregate / computer scalar combo, with each loop based on the results of the prior loop, and an initial sorting step to put the post-join-and-WHERE-clause resultset in the order of the grouping fields.