The following Adventure Works query generates a plan with a partial aggregate in a serial execution plan (cost threshold for parallelism at the default value of 5):
FROM Production.TransactionHistory AS th
JOIN Production.Product AS p ON
th.ProductID = p.ProductID
OPTION (MAXDOP 1)
An estimated execution plan shows the correct cardinality estimation of 441 rows (from statistics) for the partial aggregate below the merge join. When the MAXDOP hint is changed to 2, the estimate doubles to 882 (the plan is still serial). At MAXDOP 3 it trebles...and so on up to half the logical processors available to SQL Server.
With no MAXDOP hint the estimate is 4 times the correct value on a machine with 8 cores available to SQL Server. With server max degree of parallelism set to 1, or Resource Governor configured for the same limit, or a machine with only one processing unit, or affinity set so SQL Server can only see one processor, the estimate is correct (441).
The problem arises when the optimizer generates a local/global aggregate pair (rule GenLGAgg) for a serial plan alternative, and the local aggregate is pushed below a join (rule LocalAggBelowJoin). The costing and cardinality estimation performed by the partial aggregate (and *only* the partial aggregate in the plan) is calculated as if the operator were running in parallel at the estimated available DOP - despite the plan being a serial one in all cases.
This error results in poor plan selection and large, avoidable cost and cardinality errors that propagate up the plan tree. Reproduces on SQL Server 2005 to SQL Server 2012 RC0.