COUNT(DISTINCT *) - by mravikiran

Status : 

  Won't Fix<br /><br />
		Due to several factors the product team decided to focus its efforts on other items.<br /><br />
		A more detailed explanation for the resolution of this particular item may have been provided in the comments section.


1
3
Sign in
to vote
ID 624166 Comments
Status Closed Workarounds
Type Suggestion Repros 0
Opened 11/23/2010 10:26:50 PM
Access Restriction Public

Description

I find it very essential that SQL Server supports COUNT(DISTINCT *), something like:

SELECT COUNT(DISTINCT *) FROM TableName

The above query should give count of unique rows in the given table.
Sign in to post a comment.
Posted by Microsoft on 12/5/2010 at 10:21 PM
Closed the request.

--
Umachandar
Posted by Microsoft on 12/5/2010 at 10:01 PM
Hi Ravi,
Thanks for your feedback. We follow the ANSI SQL syntax/behavior for the aggregate functions. There is no syntax like getting distinct of the column values with COUNT. So I don't see us adding T-SQL specific extension for COUNT(DISTINCT *). Doing this also has challenges if the data contains LOB columns. Anyway, there are many ways to get the same result and you can do it yourself as you showed. Optionally, for something faster with less chance of error you can use the hashbytes to compute a cryptographic checksum which can avoid collisions in even large data sets. For ex:

select HASHBYTES('md5', CAST(t.name as binary(128)) + cast(t.schema_id as binary(4))) from sys.tables as t;

This might be one approach to doing distinct count on the entire row values faster. Hope this helps.

--
Umachandar, SQL Programmability Team
Posted by mravikiran on 11/24/2010 at 10:54 PM
Partially True.

Duplicates can also be "logically" determined by a set of columns. And, dups can also occur in temp tables, table variables etc. COUNT(DISTINCT *) is meant for progammatic reasons, not for design reasons.
Posted by DB007 on 11/24/2010 at 8:58 AM
All rows in a table should be unique; exact duplicate rows should not be present in a table: support.microsoft.com/kb/139444 - How to remove duplicate rows (would be important here).

(so a select count(distinct *) from tablename - is pretty much a redundant command; will be used <0.1% of the time as it would not identify duplicate rows in the table either: if thats what you are trying to do)

Once the PK is present on a table, just substitue for select count(distinct PK_column) from tablename.

It generally will be a lot faster doing a PK Count; or even filtered count against a PK Index.

I cannot see any reason why a table should not have a primary key in any circumstances, and a distinct * is just lazy.