Home Dashboard Directory Help
Search

OVER clause enhancement request - DISTINCT clause for aggregate functions by Itzik Ben-Gan


Status: 

Active


445
3
Sign in
to vote
Type: Suggestion
ID: 254393
Opened: 1/28/2007 11:55:01 AM
Access Restriction: Public
Duplicates: 350940
6
Workaround(s)
view

Description

SQL Server 2005 introduced only partial support for the OVER clause. It is our strong believe that a more complete implementation of the OVER clause should be prioritized highly in consideration for future enhancements in SQL Server. A more complete implementation of the OVER clause can help in solving many common business problems with simpler, more intuitive, and faster solutions than available today and also substantially reduce the need for cursors.
The following paper details the feature enhancement requests:
http://www.insidetsql.com/OVER_Clause_and_Ordered_Calculations.doc
The current feedback item addresses the OVER clause enhancement request - DISTINCT clause for aggregate functions, discussed in the paper in section 4 item vi.
Details
Sign in to post a comment.
Posted by theAnonGuy on 3/5/2013 at 7:04 AM
It's really frustrating that this feature is not included in SQL Server 2012 also !
Posted by shadybyte on 1/10/2013 at 12:47 PM
I really hope this feature is a part of SQL Server 2012 because I just tried to use it in 2008 and came here to find it's not supported. I'm using something like the code below to get the result.

SELECT     COUNT(DISTINCT IPAddress) AS DedupedUserCount
FROM        Referrals AS R
GROUP BY R.Website
HAVING     ReferralTime >= '2012-10-30' AND ReferralTime < '2012-10-31';
Posted by Iain Elder on 10/31/2012 at 10:23 AM
Say you are reporting on a price comparison engine.

In a given time period, you want a count of how many users clicked through to each website whose prices you compare.

On the assumption that users make at most one purchase per website per day, you want to dedupe the cases where users click through many times.

The Referrals table contains one row per click-through.

To get this deduped count for yesterday, you should be able to write

SELECT
COUNT(DISTINCT IPAddress) OVER (PARTITION BY Website) AS DedupedUserCount
FROM Referrals
WHERE
ReferralTime >= '2012-10-30' AND
ReferralTime < '2012-10-31'

This fails with the disappointing message

    Msg 10759, Level 15, State 1, Line 11
    Use of DISTINCT is not allowed with the OVER clause.

Plamen Ratchev provides a workaround (see Workarounds) using the DENSE_RANK function in a subquery and the MAX function.

But we should be able to do it as declaratively as the SQL grammar allows!
Posted by Professional The on 11/3/2011 at 12:38 PM
It would be nice to be able to use the result of an OVER clause in a where clause too. I'm always having to make wrappers just to filter the result of an OVER clause.
Posted by Erickson Winter on 5/6/2011 at 8:21 AM
I use the over() clause quite a bit and it's been very useful, however only now am I really wanting more functionality from it, so figured I'd bump this thread. I'm good with workarounds, and not entirely sure I, "need" additional functionality, however it would surely lend itself to more elegant solutions.

-- sql 2008 r2, mainly using over() for data warehousing, or tweaky ad hoc reports,
-- DBA, New Belgium Brewery
Posted by ErikEckhardt on 12/14/2009 at 6:05 PM
If we could nest ranking functions this would do the same thing (of course, we can't so this code doesn't work):

SELECT
    X,
    Y,
    Rnk = Max(Dense_Rank() OVER (PARTITION BY X ORDER BY Y)) OVER (PARTITION BY X)
FROM (
    SELECT 'a' x, 1 y
    UNION ALL SELECT 'a', 1
    UNION ALL SELECT 'a', 2
    UNION ALL SELECT 'b', 1
    UNION ALL SELECT 'b', 1
) Z
Posted by ErikEckhardt on 12/14/2009 at 5:53 PM
Count(DISTINCT Col1) OVER (PARTITION BY Col2) would solve my problem perfectly! I need to know if the number of items is different than the number of distinct items. Solving this another way will be tricky.
Posted by Jeremy Holovacs on 9/15/2009 at 7:08 AM
yes this would be very helpful. I'm sitting here frustrated over my inability to do this.
Posted by solleyman on 3/10/2009 at 4:37 PM
I am a big fan of the ranking functions and partition by clause. I used them extensively. It was a fantastic inclusion in 2005. However, I can't tell the number of times I wished DISTINCT worked with the count function. I can't believe it wasn't included in 2008!! So I've added my vote to get this in ASAP.
Posted by Microsoft on 12/13/2007 at 3:33 PM
Dear Itzik.

Thanks a lot for your feedback... and the strong voting support for this feature. Indeed, as you know, I am very much in favor of extending our functionality in this area. For a variety of reasons we did not get this into SQL Server 2008, but we are certainly looking into it for a future release.

Keep the votes and comments coming...
Michael

PS: My apologies for the late official reply...
Sign in to post a workaround.
Posted by jakubk on 3/19/2014 at 6:11 PM
Using the foo table declared below


select *, DENSE_RANK() over (partition by sales_rep order by client asc) + DENSE_RANK() over (partition by sales_rep order by client desc) - 1
from foo

the easiest to implement workaround/hack i've seen for this. Stolen from : http://stackoverflow.com/questions/11202878/partition-function-count-over-possible-using-distinct
Posted by Iain Elder on 10/31/2012 at 10:08 AM
Plamen Ratchev's workaround worked for me:

http://social.msdn.microsoft.com/Forums/br/transactsql/thread/8868da5f-873f-4584-8831-4fcc7a620e74

Query:

CREATE TABLE Foo (
fookey INT PRIMARY KEY,
company VARCHAR(30),
sales_rep VARCHAR(30),
client VARCHAR(30));

INSERT INTO Foo VALUES(1, 'ABC Corp.', 'Joe', 'Client1');
INSERT INTO Foo VALUES(2, 'ABC Corp.', 'Joe', 'Client2');
INSERT INTO Foo VALUES(3, 'ABC Corp.', 'Peter', 'Client2');
INSERT INTO Foo VALUES(4, 'DEF Corp.', 'Joe', 'Client1');
INSERT INTO Foo VALUES(5, 'DEF Corp.', 'Joe', 'Client3');

SELECT fookey, company, sales_rep, client,
MAX(rk1) OVER(PARTITION BY sales_rep) AS rep_distinct_client_cnt,
MAX(rk2) OVER(PARTITION BY company) AS company_distinct_client_cnt
FROM (
SELECT fookey, company, sales_rep, client,
DENSE_RANK() OVER(PARTITION BY sales_rep ORDER BY client) As rk1,
DENSE_RANK() OVER(PARTITION BY company ORDER BY client) As rk2
FROM Foo) AS F;

Result:

ookey company sales_rep client rep_distinct_client_cnt
company_distinct_client_cnt
----------- ---------- ----------- -------- -----------------------
---------------------------
1 ABC Corp. Joe Client1 3 2
2 ABC Corp. Joe Client2 3 2
3 ABC Corp. Peter Client2 1 2
4 DEF Corp. Joe Client1 3 2
5 DEF Corp. Joe Client3 3 2

Explanation:

In the subquery, assign each unique client per grouping column a new integer from an ascending sequence. In the outer query, take the maximum integer for each grouping column.

The effect is what we would like to see by writing two column expressions COUNT(DISTINCT client) OVER (PARTITION BY sales_rep), COUNT(DISTINCT client) OVER (PARTITION BY company)
Posted by DWalker on 5/10/2012 at 8:35 AM
Adam, I'll bet you thought your "3 more years" comment was being generous.

I just needed this functionality today, and not only did it not make the cut for SQL 2008, but I'm not sure if it's in SQL 2012.
Posted by tymk on 3/4/2010 at 9:30 AM
Even simpler, use a database that supports it.
Posted by ErikEckhardt on 12/14/2009 at 6:04 PM
Here's a slightly simpler way to do it. I modified Adam's query just to keep comparison easier. Sorry, I couldn't help formatting the query my preferred way.

SELECT
X,
Y,
DistinctY = Max(Rnk) OVER (PARTITION BY X)
FROM (
SELECT
     X,
     Y,
     Rnk = Dense_Rank() OVER (PARTITION BY X ORDER BY Y)
FROM (
     SELECT 'a' x, 1 y
     UNION ALL SELECT 'a', 1
     UNION ALL SELECT 'a', 2
     UNION ALL SELECT 'b', 1
     UNION ALL SELECT 'b', 1
) Z
) Z1
Posted by Adam Machanic on 9/12/2008 at 9:16 AM
In case anyone needs to do this before this fix gets made (3 more years, I suppose?), here's one way I've found to fake it. Following query gives the distinct count of Y partitioned by X:


select
    x,
    y,
    sum(case when r=1 then 1 else 0 end) over (partition by x) as distinct_y
from
(
    select
        x,
        y,
        row_number() over (partition by x, y order by y) r
    from
    (
        select 'a' x, 1 y
        union all
        select 'a', 1
        union all
        select 'a', 2
        union all
        select 'b', 1
        union all
        select 'b', 1
    ) z
) z1