We are supplied by a third party some text files compressed with gz.
HDInsight cannot load some of them them unless we decompress them manually and then recompress (also as gz), but with no other changes. We have been using 7z to do the de/compression.
Small files work ok and without unpacking but over a certain size they break. As supplied to us they fall into 2 clusters < 1Mb (All work)and 50Mb+ (All break) so we don't have any real visibility on where the cutoff actually arises.
These same files also have issue with the .NET System.IO.GZipStream class but work fine with SharpZipLib.
These files work without issue on the aws platform.
Edit: Unfortunately connect will not allow me to upload test files due to their size.