Home Dashboard Directory Help
Search

COUNT(DISTINCT *) by mravikiran


Status: 

Closed
 as Won't Fix Help for as Won't Fix


1
3
Sign in
to vote
Type: Suggestion
ID: 624166
Opened: 11/23/2010 10:26:50 PM
Access Restriction: Public
2
Workaround(s)
view

Description

I find it very essential that SQL Server supports COUNT(DISTINCT *), something like:

SELECT COUNT(DISTINCT *) FROM TableName

The above query should give count of unique rows in the given table.
Details
Sign in to post a comment.
Posted by Microsoft on 12/5/2010 at 10:21 PM
Closed the request.

--
Umachandar
Posted by Microsoft on 12/5/2010 at 10:01 PM
Hi Ravi,
Thanks for your feedback. We follow the ANSI SQL syntax/behavior for the aggregate functions. There is no syntax like getting distinct of the column values with COUNT. So I don't see us adding T-SQL specific extension for COUNT(DISTINCT *). Doing this also has challenges if the data contains LOB columns. Anyway, there are many ways to get the same result and you can do it yourself as you showed. Optionally, for something faster with less chance of error you can use the hashbytes to compute a cryptographic checksum which can avoid collisions in even large data sets. For ex:

select HASHBYTES('md5', CAST(t.name as binary(128)) + cast(t.schema_id as binary(4))) from sys.tables as t;

This might be one approach to doing distinct count on the entire row values faster. Hope this helps.

--
Umachandar, SQL Programmability Team
Posted by mravikiran on 11/24/2010 at 10:54 PM
Partially True.

Duplicates can also be "logically" determined by a set of columns. And, dups can also occur in temp tables, table variables etc. COUNT(DISTINCT *) is meant for progammatic reasons, not for design reasons.
Posted by DB007 on 11/24/2010 at 8:58 AM
All rows in a table should be unique; exact duplicate rows should not be present in a table: support.microsoft.com/kb/139444 - How to remove duplicate rows (would be important here).

(so a select count(distinct *) from tablename - is pretty much a redundant command; will be used <0.1% of the time as it would not identify duplicate rows in the table either: if thats what you are trying to do)

Once the PK is present on a table, just substitue for select count(distinct PK_column) from tablename.

It generally will be a lot faster doing a PK Count; or even filtered count against a PK Index.

I cannot see any reason why a table should not have a primary key in any circumstances, and a distinct * is just lazy.
Sign in to post a workaround.
Posted by mravikiran on 11/23/2010 at 10:30 PM
The below query, for example, can also be used:

SELECT COUNT(*)
FROM (SELECT DISTINCT * FROM Employees) E
Posted by mravikiran on 11/23/2010 at 10:27 PM
1) SELECT DISTINCT into a temp table or table variable.
2) Take a COUNT(*) from the temp table or table variable.
3) Drop if it is a temp table.