Potentially Missing Compression Codecs on HDInsight platform. - by John Diss

Status : 

  External<br /><br />
		This item may be valid but belongs to an external system out of the direct control of this product team.<br /><br />
		A more detailed explanation for the resolution of this particular item may have been provided in the comments section.

Sign in
to vote
ID 783292 Comments
Status Closed Workarounds
Type Bug Repros 0
Opened 4/9/2013 11:03:52 AM
Access Restriction Public


We are supplied by a third party some text files compressed with gz. 
HDInsight cannot load some of them them unless we decompress them manually and then recompress (also as gz), but with no other changes. We have been using 7z to do the de/compression.

Small files work ok and without unpacking but over a certain size they break. As supplied to us they fall into 2 clusters < 1Mb (All work)and 50Mb+ (All break) so we don't have any real visibility on where the cutoff actually arises. 

These same files also have issue with the .NET System.IO.GZipStream class but work fine with SharpZipLib.

These files work without issue on the aws platform.

Edit: Unfortunately connect will not allow me to upload test files due to their size.
Sign in to post a comment.
Posted by Microsoft on 6/20/2013 at 2:53 PM
Thank you for reporting this issue. This might be due to a MD5 bug in windowsazure api which has been fixed in the latest version. Please let us know if you are still seeing this issue.