Please reconsider deprecating HASHBYTES algorithms MD5 & SHA1 - by David Lean

ID 2630638 Comments
Status Closed Workarounds
Type Suggestion Repros 1
Opened 4/24/2016 5:21:52 PM
Access Restriction Public


Please reverse the decision to deprecate HASHBYTES (MD5 & SHA1). They are a core requirement for Data Vault 2.0, and given its growing popularity as a data warehouse design methodology, the demand for that functionality is set to increase.

I noticed that we’ve deprecated support for many of the HASHBYTES algorithms (leaving only SHA2_256 & SHA2_512).

No doubt the logic behind the decision is that the earlier algorithms are cracked & inappropriate for crypto.

I’d suggest we rethink that decision. 

There is a growing interest in Data Vault 2.0 as a Data Warehouse design methodology. 
This approach depends heavily on hashed keys; MD5 & SHA1 are the algorithms most often recommended.
These hash keys are used as “almost” unique identifiers. They are applied to natural keys whose length is often too small to use SHA2 algorithms. 

They are recommended by DV2.0 because most platforms support them & they are computationally lighter weight. Thus, similar to a GUID, hashed keys can be generated by many heterogeneous systems in parallel & loaded into a single staging database with “minimal” probability of collision.
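
To illustrate the idea, here is a minimal sketch in Python (not T-SQL) of how two independent source systems could derive the same hash key from a natural key. The helper name `hash_key`, the `|` delimiter, and the trim/upper-case normalisation are my own assumptions for illustration; they are not taken from the DV2.0 specification or from HASHBYTES, and a real implementation would also need to agree on character encoding across platforms.

```python
import hashlib


def hash_key(*natural_key_parts: str, algorithm: str = "md5") -> str:
    """Return an upper-case hex hash key for a natural key (hypothetical helper)."""
    # Assumed normalisation: trim and upper-case each part, then join with a
    # delimiter, so every source system hashes exactly the same bytes.
    normalised = "|".join(part.strip().upper() for part in natural_key_parts)
    digest = hashlib.new(algorithm, normalised.encode("utf-8"))
    return digest.hexdigest().upper()


# Two systems holding the same customer with different formatting
# still arrive at the same key, with no central key generator involved.
key_from_crm = hash_key("cust-001 ", "AU")
key_from_erp = hash_key("CUST-001", "au")
assert key_from_crm == key_from_erp

# MD5 yields 16 bytes (32 hex characters); SHA1 yields 20 bytes (40 hex characters).
assert len(key_from_crm) == 32
assert len(hash_key("CUST-001", "AU", algorithm="sha1")) == 40
```

Because the key depends only on the normalised natural key, any platform with the same algorithm can compute it independently, which is why broad support for MD5 & SHA1 matters here.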

Deprecating these algorithms will create uncertainty for the DBAs who are considering using SQL Server as the platform for their Data Vault 2.0 data warehouse.


PS: We could debate the wisdom of such an approach given the risk of hash key collisions, but that is for the DV2.0 guys to argue. We just want to remain relevant.
Posted by David Lean on 5/18/2016 at 6:27 AM
With all due respect, I don't feel that you (or I) are in a position to decide what the main purpose of a feature is.
In the 2 decades I worked at MSFT, including assisting in product planning of SQL, we were constantly surprised by customers using our products in innovative ways that we had not originally envisioned.

In this case, while you may feel that HASHBYTES was added for security & CHECKSUM is for row comparison, a significant proportion of your customers will disagree with you. In their minds CHECKSUM is "deprecated", as BOL recommends against it. Most people now use HASHBYTES(MD5) to check for duplicate rows when loading large extracts or building SCD Type 2 dimensions.
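
As a rough sketch of that row-comparison pattern, written in Python rather than T-SQL: hash the tracked attribute columns of each incoming row and compare against the hash stored for the same key. The function names `row_hash` and `find_changed`, the `|` delimiter, and the sample data are all hypothetical and only illustrate the technique.

```python
import hashlib


def row_hash(row: dict, columns: list[str]) -> str:
    """Hash the tracked columns of a row in a fixed order (hypothetical helper)."""
    # A delimiter keeps ("ab", "c") distinct from ("a", "bc").
    # Note: in this sketch None and "" hash alike; real loads may need to tell them apart.
    payload = "|".join("" if row[c] is None else str(row[c]) for c in columns)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def find_changed(existing: dict[str, str], incoming: list[dict],
                 key: str, columns: list[str]) -> list[dict]:
    """Return incoming rows that are new or whose attribute hash differs."""
    return [row for row in incoming
            if existing.get(row[key]) != row_hash(row, columns)]


tracked = ["name", "city"]
stored = {"C1": row_hash({"name": "Ann", "city": "Perth"}, tracked)}
extract = [
    {"id": "C1", "name": "Ann", "city": "Perth"},   # unchanged, skipped
    {"id": "C1", "name": "Ann", "city": "Sydney"},  # changed, new SCD Type 2 version
    {"id": "C2", "name": "Bob", "city": "Hobart"},  # new key
]
changed = find_changed(stored, extract, key="id", columns=tracked)
assert [r["city"] for r in changed] == ["Sydney", "Hobart"]
```

Only one short hash per row has to be stored and compared, instead of every column. None of this depends on the hash being cryptographically strong.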

Raul, I have seen & enjoyed many of your presentations over the years. I know you eat, sleep & breathe Security. I love the improvements you have contributed to the product. So I totally get why you'd discount the MD5 & SHA1 algorithms. Please just consider changing BOL to recommend against using them for security, but leave them available so they can remain productive in this other role.

Ultimately it is your call. But please think beyond crypto.
Thanks for taking time to reply.
Posted by Microsoft on 5/16/2016 at 10:34 AM
Thank you for your request.

To complement Steven’s comment: we currently have no plans to completely remove support for the SHA-1 and MD5 algorithms in the HASHBYTES function. However, these algorithms are no longer considered sufficiently strong for cryptographic usage (which is the main purpose of this function), so we need to issue a deprecation warning to steer users away from them and toward better alternatives.
These algorithms will still be available for usage in HASHBYTES for backwards compatibility reasons, but the deprecation event will continue firing when they are used.

I hope this information helps,
-Raul Garcia
Posted by Microsoft on 5/2/2016 at 4:58 PM
Hi David,

We made a last-minute change in RTM to allow the use of SHA1 and MD5 for the HASHBYTES function. I will work to get the Books Online documentation updated.

Thank You,

-Steven Gott