System.IO.ZipArchive zipped only UTF-8 encoding - by kkamegawa

Status: Fixed

This item has been fixed in the current or upcoming version of this product.
A more detailed explanation for the resolution of this particular item may have been provided in the comments section.

ID: 711235
Status: Closed
Type: Bug
Repros: 3
Opened: 12/3/2011 8:55:42 AM
Access Restriction: Public
Moderator Decision: Sent to Engineering Team for consideration


On Japanese Windows, the Windows zipped-folder feature stores file names in MS932 encoding.
However, the System.IO.ZipArchive class writes file names only in UTF-8 encoding.

The zipped-folder feature on Japanese Windows stores file names in MS932 encoding.
Posted by Microsoft on 4/29/2014 at 12:24 PM
Thank you for reporting this issue. This issue has been fixed in Visual Studio 2013. You can install a trial version of Visual Studio 2013 with the fix from:
Posted by HomeCloset on 6/1/2012 at 6:12 PM
Wonderful. Thank you for getting things right!
Posted by Microsoft on 6/1/2012 at 10:22 AM

Glad to help any time! :)

Greg & the BCL Team.
Posted by kkamegawa on 6/1/2012 at 10:02 AM
Thank you BCL team!
I checked the Visual Studio 2012 RC/.NET Framework 4.5 RC documentation and found the new constructor.
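For readers following along: the constructor referred to here is presumably the ZipArchive overload that takes an entry-name Encoding as its fourth argument. Below is a minimal sketch of a round trip through that overload. It assumes a current .NET runtime, and the class and method names (EntryNameEncodingSketch, CreateZip, ReadFirstEntryName, Cp932) are made up for illustration.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

public static class EntryNameEncodingSketch
{
    // Code page 932 (Shift-JIS / MS932).
    // Assumption: on .NET Core / .NET 5+ the code-page provider must be registered first.
    // On .NET Framework the code page is built in and the RegisterProvider line can be dropped.
    public static Encoding Cp932()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(932);
    }

    // Builds an in-memory ZIP whose entry names are written with the given encoding.
    public static byte[] CreateZip(string entryName, Encoding entryNameEncoding)
    {
        using (var buffer = new MemoryStream())
        {
            // leaveOpen: true keeps the MemoryStream usable after the archive is disposed.
            using (var zip = new ZipArchive(buffer, ZipArchiveMode.Create, true, entryNameEncoding))
            {
                ZipArchiveEntry entry = zip.CreateEntry(entryName);
                using (var writer = new StreamWriter(entry.Open()))
                {
                    writer.Write("sample");
                }
            }
            return buffer.ToArray();
        }
    }

    // Reads the first entry name back, decoding it with the given encoding.
    public static string ReadFirstEntryName(byte[] zipBytes, Encoding entryNameEncoding)
    {
        using (var buffer = new MemoryStream(zipBytes))
        using (var zip = new ZipArchive(buffer, ZipArchiveMode.Read, false, entryNameEncoding))
        {
            return zip.Entries[0].FullName;
        }
    }
}
```

Calling CreateZip("あ.txt", Cp932()) and then ReadFirstEntryName with the same encoding should give the original name back. Reading with a different encoding is where the corruption discussed in this thread would appear.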
Posted by kkamegawa on 3/26/2012 at 8:37 AM
Thank you for your comment, Greg.

I hope this feature will be supported.
Posted by Microsoft on 3/25/2012 at 8:12 PM
Hi folks,

You know that we cannot make specific technical details public before things actually get released.
But my hunch is that you will find this particular issue adequately addressed when .NET 4.5 comes out.

Posted by HomeCloset on 3/24/2012 at 9:41 AM
Any update?
Posted by 鉴证奇迹 on 1/29/2012 at 3:04 PM
Posted by Microsoft on 1/27/2012 at 1:37 PM
Hi All,

Sorry for being quiet for a while. We were busy making sure our next release is awesome! :))
I think you will be pleased with the way we are following up on this particular concern.
I am not yet ready to share specific API details: We need to first make sure that not only our Japanese users, but users from all parts of the world get what they need, and that all other partners and consumers are looked after too. But I think we are landing on a good solution for the issue discussed on this thread.

Posted by kkamegawa on 1/14/2012 at 8:51 AM
Thank you HomeCloset.

I also agree with the approach of adding a constructor that lets the caller specify the encoding. I also hope that Windows Explorer will become able to handle zip files stored in UTF-8.

Posted by HomeCloset on 1/12/2012 at 11:52 AM
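For the sake of the bug opener, K. Kamegawa, here is my Japanese translation of Mike's post (shown here in English).

> Originally, a ZIP file could not store the code page of its file names. PKWare's specification says that IBM 437 was used historically. But according to this bug report, it is clear that software in the wild has used other code pages and character sets as well. UTF-8 is supposed to be identifiable from General Purpose Bit 11, but other code pages cannot be identified from the ZIP file specification. So the developer has no choice but to specify the code page.
> (An Extra Field code is reserved for the code page, but it is not defined.)
> I think the simple approach is to add overloaded constructors that let the caller specify the encoding used for file names. Then developers can read existing ZIP files created by other software, and can also create ZIP files that other software can read. For consistency within the .NET Framework, the default (the overloads with no encoding parameter) should stay UTF-8. If the developer specifies UTF8Encoding, General Purpose Bit 11 should be set unless it is specified without a preamble (BOM).
> However, this cannot handle cases such as a ZIP file created with one code page being updated with a different code page. But existing software does not handle such cases either (so there is probably no need to).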

Posted by HomeCloset on 1/12/2012 at 10:26 AM
If the developer likes Encoding.Default, just allow him to specify it in a certain way. Currently there is no way. That is the problem here, I believe.
Posted by Atsushi Eno on 1/12/2012 at 8:24 AM

While you guys (Mike and HomeCloset) are going to agree on using UTF-8, if I were to design this stuff, I'd rather have used another encoding: Encoding.Default.

The Zip archive format is not designed to be encoding-agnostic or global-ready, hence we have a problem anyway with *any* default encoding that ZipArchive uses (or will use). It's also a known problem that Windows people and Mac people can't share zip archives that contain such Japanese file names (there are some archive utilities that handle path names in MS932 or "shift_jis" in Mac OS land). So, the world is divided anyway.

*Then*, IMO the most common encoding for each group of "locale-specific" zip archive users had better be preserved. I'm fairly sure that those who consider globalization carefully wouldn't use (or have used) non-ASCII path names for the files in their zip archives.

The situation is similar to the default text encoding for the C# compiler, which used to be (or maybe even now is?) a locale-specific encoding. Before VS 2010, many Japanese people wrote their code (comments) in MS932, which was the default encoding in VS<2010.
I'm fairly sure that similar practices were followed everywhere in the world. For example, we, the Mono team, have been using Latin1 in our source code (which was what most of the team and the community used).
While we were improving our C# compiler, we faced the issue of how to determine its default encoding. It used to be Latin1, which surprised me, and we now use Encoding.Default for behavior that mostly makes sense.

While text file encoding can be determined by BOMs (hence the VS 2010 default encoding migration was almost harmless), it's worse in Zip archive land: there is no known way to determine it.

Still, I sort of agree with the UTF-8 based idea. Those who want to use Zip archives in their locale-specific manner could resort to other zip libraries. Encoding.Default does not exist in Silverlight and WP7, so it cannot be a universal solution anyway.

And yet, I'm all for providing developers a way to specify an Encoding for resolving paths. There are some Japanese Encoding implementations that can be used in Silverlight or WP7, so it's practical when/if this class becomes available in those worlds.
Posted by HomeCloset on 1/11/2012 at 9:43 AM
Technical tactics aside, I'd agree to the strategy of Mike's suggestion.
- Default UTF-8 for globalization.
- Optional encoding(s) for compatibility. (Right, this compatibility will work only within the same code page. But that's definitely what the legacy code page is for.)
Posted by Mike Dimmick on 1/10/2012 at 8:37 AM
The underlying problem is that the ZIP file format doesn't indicate which code page the file names are in. While PKWare's specification at says it was traditionally IBM 437, it's clear from this report that other code pages/charsets have been used to encode the file names in other software. UTF-8 is indicated with General Purpose Bit 11 but other code pages can't be distinguished from the format itself. The programmer will have to supply this information.

(There is a reserved Extra Field code for specifying the code page but it is not defined.)

I think the simplest solution is to add new overloaded constructors with an Encoding parameter to indicate the encoding to be used for file names in the file. That allows the programmer to read existing ZIP files created with other software, and allows the programmer to create ZIP files that other software can consume. For consistency with other Framework behaviour, the default, for the overloads without an Encoding parameter, should remain UTF-8. If the programmer specifies UTF8Encoding, then the General Purpose Bit 11 should be set, as long as it's set not to emit the preamble ('BOM').

This approach won't be enough if we need to support archives that were created with one code page, then updated from another, but other legacy software will not work correctly in that situation either.

The programmer would obviously be able to supply Encoding.Unicode or Encoding.UTF32, which would give results incompatible with other software, but I can't see how to avoid that in a general way (obviously you could check whether the Encoding is a UnicodeEncoding or UTF32Encoding, but you can't generally check the granularity of what an Encoding will emit).
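The two checks described in this post can be sketched roughly as follows. This illustrates the proposal only and is not a description of what the shipped ZipArchive does. The class and method names (EntryNameEncodingPolicy, IsWideUnicode, ShouldSetUtf8Flag) are hypothetical.

```csharp
using System.Text;

public static class EntryNameEncodingPolicy
{
    // True for the UTF-16 and UTF-32 encodings, which would produce entry names
    // that other ZIP software is unlikely to read.
    public static bool IsWideUnicode(Encoding encoding)
    {
        return encoding is UnicodeEncoding || encoding is UTF32Encoding;
    }

    // The rule proposed in the post: set General Purpose Bit 11 when the caller
    // supplies a UTF8Encoding that is configured not to emit a preamble (BOM).
    public static bool ShouldSetUtf8Flag(Encoding encoding)
    {
        return encoding is UTF8Encoding && encoding.GetPreamble().Length == 0;
    }
}
```

Note that under this rule Encoding.UTF8 itself would not qualify, because that instance is configured to emit a preamble, while new UTF8Encoding(false) would.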
Posted by HomeCloset on 12/29/2011 at 4:34 PM
> [Background]
> Most Japanese users extract Windows Explorer to extract ZIP files.
Most Japanese users employ Windows Explorer to extract ZIP files.
Posted by HomeCloset on 12/29/2011 at 4:11 PM
Here is what Mr./Ms. Kamegawa wrote back (with some edits by me). Hope it helps.
I think you've already answered most of the issue, but could you please recap it and/or add any comment?
(Personally I would agree with him/her. Whatever the ZIP file specification is, it seems like the de facto standard in Japan has been MBCS ZIP files. For a very long time. It is great that the .NET API uses UTF-8 for globalization, but I'm afraid the lack of a viable option to create MBCS ZIP files would make the .NET API practically useless in Japan.)

System.IO.Compression.ZipArchive stores MBCS file names in UTF-8.
Windows Explorer can't handle UTF-8 ZIP files.
So the ZIP files compressed by System.IO.Compression.ZipArchive are not extracted correctly by Windows Explorer.
Windows Explorer should support extracting UTF-8 ZIP files. Otherwise System.IO.Compression.ZipArchive should support storing MBCS file names. And the latter seems more practical.

Most Japanese users extract Windows Explorer to extract ZIP files. Japanese file names are used so frequently in Japan that the incompatibility between Windows Explorer and the ZIP files compressed by System.IO.Compression.ZipArchive is unacceptable.

[Repro Steps]
1. Create Japanese named files in c:\temp\.
2. Compress them using the code below.
    --- Begin ---
    using (var zip = new ZipArchive(@"c:\temp\あ\", ZipArchiveMode.Create)){
        var files = new DirectoryInfo(@"c:\temp").GetFiles("*.*");
        // Add each file to the archive under its own file name.
        Array.ForEach(files, x => zip.CreateEntryFromFile(x.FullName, x.Name));
    }
    --- End ---
3. Extract using Windows Explorer.
-> The Japanese file names are corrupted as Greg depicted before.

[Expected Behavior]
1. Make Windows Explorer support extracting UTF-8 ZIP files.
2. Make System.IO.Compression.ZipArchive support compressing ZIP files using MBCS or the system locale.

The best would be 1. But I don't think it is possible to make all the widespread Windows versions (i.e. XP/Vista/7/2003/2008/2008 R2) support UTF-8 ZIP files.

So I would like to ask for 2. It will provide the maximum compatibility with Windows including the legacy-but-widely-used versions of Windows.
Posted by kkamegawa on 12/29/2011 at 10:07 AM



using (var zip = new ZipArchive(@"c:\temp\あ\", ZipArchiveMode.Create)){
    var files = new DirectoryInfo(@"c:\temp").GetFiles("*.*");
    // Add each file to the archive under its own file name.
    Array.ForEach(files, x => zip.CreateEntryFromFile(x.FullName, x.Name));
}



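I think option 1 is the best scenario, but I do not expect every Windows version (XP/Vista/7/2003/2008/2008 R2) to support UTF-8 zip files any time soon.

Therefore, if the System.IO.Compression.ZipArchive class could also compress zip files using the system locale, the .NET Framework could easily create zip files that extract correctly even on Windows installations that have not been updated, which I think would make problems less likely.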

Posted by HomeCloset on 12/27/2011 at 6:14 PM
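The first part of this post was written in Japanese; an English summary follows it.


Kamegawa-san,
I am Japanese, but I still cannot tell:
- which current behavior of .NET and of Explorer, in compression and in extraction, is causing you trouble, and
- how you would like .NET and Explorer each to compress and extract in order to resolve it.

First, could you write out the following three points in Japanese, choosing the specific option inside each pair of quotation marks?

① The current ".NET or Explorer", when it "compresses or extracts" a ZIP file, handles Japanese file names (only?) in "CP932 or UTF-8". This behavior is causing trouble.

② The reason it causes trouble is that ".NET, Explorer, or some other application", when it "compresses or extracts" a ZIP file, handles Japanese file names (only?) in "CP932 or UTF-8". As a result, the behavior in ① is practically unusable.

③ To resolve this problem, I would like ".NET or Explorer", when it "compresses or extracts" a ZIP file, to (also?) handle Japanese file names in "CP932 or UTF-8".


    ① The current .NET handles Japanese file names only in UTF-8 when it compresses a ZIP file. This behavior is causing trouble.
    ② The reason it causes trouble is that Explorer and HogeHogeApp, which is popular in Japan, can handle Japanese file names only in CP932 when extracting a ZIP file. As a result, the behavior in ① is practically unusable.
    ③ To resolve this problem, I would like .NET to be able to store Japanese file names in CP932 as well when it compresses a ZIP file.




    1. On Windows 7 x86 SP1 English, set the system locale to Japanese.
    2. Create a ZIP file with the code below.
     hoge = hogehoge( hogehogehoge );
    → Only UTF-8 file names are stored in the ZIP file.

    1. On Windows 7 x86 SP1 English, set the system locale to Japanese.
    2. Right-click the ZIP file created in ① in Explorer.
    3. Select [Extract All].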


As a Japanese speaker, I still don't understand exactly what your problem is and exactly what you expect. That's why I can't decide whether to agree or disagree with you.
Can you elaborate on the following three points in the Japanese language?
1. The current behavior.
2. The fact that conflicts with the current behavior.
3. The expected behavior.
And then please provide the repro steps for 1 and 2, also in the Japanese language.
Posted by macrogreg on 12/27/2011 at 3:48 PM
Many thanks for the additional info.

My difficulty was not so much the creation of the file (your 5 steps below), but the reproduction of an error.
By following your steps I am creating a ZIP file "あ.zip" that contains one single text file "あ.txt".
Then I use the following program to unpack it:

ZipArchive zip = ZipFile.OpenRead(@"E:\temp\ttt\あ.zip");
foreach (var entry in zip.Entries) {
    using (FileStream outs = File.Create(@"E:\temp\ttt\UnzippedUsingZipArchiveAPIs\" + entry.FullName))
    using (var ins = entry.Open()) {
        ins.CopyTo(outs);  // copy the decompressed entry to the output file
    }
}

This works correctly. Please see the screenshot I attached to this bug (macrogreg20111227.UnzippedUsingZipArchiveAPIs.jpg)

Further, I take that same ZIP file and unpack it by using the Windows shell tool (right-click > Extract All…).
Again, this works correctly (screenshot attached: macrogreg20111227.UnzippedUsingShellTool.jpg)

** So what I am after is a detailed description of how you use the file produced in your 5-step recipe to produce an error or an unexpected result.

Please note that the Windows Shell ZIP tool is owned by the Windows team which is another team in Microsoft.
I have contacted them to highlight the issue, but I do not know whether they will be able to attend to this particular matter and, if so, when.
My goal is to make sure that everyone has a good experience with the .NET APIs.
If you would like to provide some feedback to the Windows team directly, check out this community site:
Posted by kkamegawa on 12/27/2011 at 8:26 AM
Thank you for your reply, and thank you, HomeCloset.

Greg: Sorry, I can reproduce it on Windows 7 en-us.

1. Click "Region and Language" in Control Panel.
2. Click the "Administrative" tab.
3. Under "Language for non-Unicode programs", click "Change System Locale".
4. Change from English to Japanese (Japan).
(There is no need to change the display language to Japanese.)
5. Send a file with a multi-byte name (あ.txt) to a zipped folder.

I have attached a screen capture to this report: "20111228-01.png".

Thank you for your review.

I expect System.IO.Compression.ZipArchive to UNPACK non-Unicode zip files using the system locale's file name encoding. Ideally, the Windows Shell would also support UTF-8 zip files.
Posted by macrogreg on 12/26/2011 at 3:08 PM
Thank you for your replies. I really appreciate your valuable feedback.

Kamegawa: Unfortunately, I do not understand what the Japanese error message says. Could you please help me translate that?

HomeCloset: I translated your second paragraph that contains Japanese sentences using an automatic translation program. The translation was a bit ambiguous, but I think we are on the same page:

What works:
System.IO.Compression.ZipArchive WILL WORK CORRECTLY when UNCOMPRESSING files created with Windows' Send to Compressed Folder tool, iff the compression happens on a machine with the same system locale. System.IO.Compression.ZipArchive will detect that Unicode was not used, fall back to the local code page and this will co-operate well with the Windows shell.

What doesn't work:
Windows shell function Extract All Files WILL NOT WORK CORRECTLY when UNCOMPRESSING files created with System.IO.Compression.ZipArchive iff they contain non-ASCII characters. This is because System.IO.Compression.ZipArchive will use Unicode to make sure it creates a standard-compliant ZIP file that can be uncompressed on any machine.

In other words, System.IO.Compression.ZipArchive ALWAYS UNPACKS CORRECTLY, but it does NOT ALWAYS PACK files that can be uncompressed with the Windows shell.

It is crucial that we establish whether or not this correctly describes the problem from the perspective of the Japanese users.


(.NET Base Class Libraries team)
Posted by HomeCloset on 12/26/2011 at 12:26 PM
I can compress あ.txt using Windows Explorer on English Windows.

1. Install Windows 7 RTM x86 English. While installing, accept the default language settings.
2. Log on. Change the system locale to [Japanese (Japan)]. Restart Windows.
3. Create "c:\temp\あ.txt".
4. Open c:\temp folder using Explorer.
5. Right-click "あ.txt".
6. [Send to] - [Compressed (zipped) folder].
-> あ.zip is created in c:\temp. The name of あ.txt in あ.zip is encoded in CP932.

So, I'd suggest that you focus on your core issue.
I'm not familiar with .NET at all, but I guess the kernel of the issue you are trying to describe is simple: "When compressing files with Japanese names, System.IO.ZipArchive encodes only in UTF-8. Meanwhile, common applications, starting with Windows Explorer, encode in CP932 and also expect CP932 when decoding." Isn't it? Please correct me if I'm wrong.

(Presuming my understanding is correct...)

However, I couldn't figure out what you expect. Which one do you expect?
- System.IO.ZipArchive should use the system locale? (CP932 in this case.)
- Windows Explorer should be able to expand the UTF-8 named files?
- System.IO.ZipArchive should have an additional parameter where the developers can specify the encoding?

And why do you expect it?
Posted by kkamegawa on 12/26/2011 at 8:43 AM
Thank you for your comment.
I tried to create a Japanese zip file on Windows (en-us).
I changed the Windows system locale to ja-jp and tried to create a zipped folder, あ.zip, from あ.txt.
(Control Panel > Region and Language > Administrative Tab > Change System Locale)

However, I could not create あ.zip from Explorer. The error message is attached to this issue (20111227-01.png).
I think this issue occurs only on Windows ja-jp and is due to Windows Explorer's design.

I would like Explorer on Windows ja-jp to support UTF-8 encoding.

Thank you.
Posted by Microsoft on 12/23/2011 at 2:13 AM
Hi there,

I tried many combinations, but I keep failing to reproduce the issue.

To simulate being on a Japanese system, I go to:
Control Panel > Region and Language > Administrative Tab > Change System Locale
and select “Japanese (Japan)”

Is this what you have as well?
This gives me the default system encoding “Japanese (Shift-JIS)” with the windows code page 932.

Our ZipArchive APIs actually inspect a bit in the entry header to determine whether UTF-8 was used for compression.
Since Windows does not do that, we assume the default system codepage – in this case 932. Everything should work.
Indeed, I tried unzipping the files you provided and some more files I created locally and everything seems to work as long as my localization settings are set as above.

The only issue I expect to occur with our current design is: “Zip using our APIs – UnZip using Windows Shell”.
This is because when we see a Japanese character in the file name we use UTF-8 and the Windows tool cannot deal with it.
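To see the write-side behavior described above, one can create an archive with the default settings and look at General Purpose Bit 11 in the first local file header. The following is a minimal sketch, assuming the standard local-header layout (4-byte signature, 2-byte version, then the 2-byte little-endian flag word at offset 6). The class and method names (Utf8FlagSketch, CreateZipWithDefaultEncoding, FirstEntryHasUtf8Flag) are invented for this example.

```csharp
using System.IO;
using System.IO.Compression;

public static class Utf8FlagSketch
{
    // General Purpose Bit 11: entry name and comment are UTF-8.
    private const int Utf8Flag = 0x0800;

    // Creates an in-memory ZIP with a single entry, without specifying an encoding.
    public static byte[] CreateZipWithDefaultEncoding(string entryName)
    {
        using (var buffer = new MemoryStream())
        {
            using (var zip = new ZipArchive(buffer, ZipArchiveMode.Create, true))
            {
                using (Stream entryStream = zip.CreateEntry(entryName).Open())
                {
                    entryStream.WriteByte(0x41); // one byte of content
                }
            }
            return buffer.ToArray();
        }
    }

    // Reads the flag word of the local file header at offset 0 and tests bit 11.
    public static bool FirstEntryHasUtf8Flag(byte[] zipBytes)
    {
        bool hasSignature = zipBytes.Length >= 8
            && zipBytes[0] == 0x50 && zipBytes[1] == 0x4B   // "PK"
            && zipBytes[2] == 0x03 && zipBytes[3] == 0x04;
        if (!hasSignature)
        {
            throw new InvalidDataException("No local file header at offset 0.");
        }
        int flags = zipBytes[6] | (zipBytes[7] << 8);
        return (flags & Utf8Flag) != 0;
    }
}
```

Passing the bytes from CreateZipWithDefaultEncoding("あ.txt") to FirstEntryHasUtf8Flag is a quick way to check whether a given runtime marks such an entry as UTF-8.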

We will think about approaches for addressing the latter issue, but before we do that I want to understand why I cannot reproduce the failure as you describe it.
Could you please provide more info about your localization settings and verify your failure scenario in detail?
Please provide a step by step description starting from zero ending in a specific unexpected behavior to make sure we are talking about the same thing.

Thanks a lot!
Posted by kkamegawa on 12/14/2011 at 7:19 AM

1. Yes, correct.
2. Yes.

Thank you for your suggestions. If Windows Zipped Folder supported UTF-8 encoding, I think all of these problems would be resolved.

Posted by Microsoft on 12/13/2011 at 3:54 PM
Hi there,

It would take me a while to get a localized Windows version, so let me see if I can get this straight regardless:

1. You create a ZIP archive using the Windows Explorer functionality to right-click on a file or folder and then choose “Send To > Compressed Folder”.
Is this correct?

2. You then Unzip this file by right-clicking again and selecting “Extract All”. It works and you get what you expected. Right?

3. Now you try to Unzip that same file using the new .NET 4.5 ZipArchive APIs and you get messed up file names. Correct?

If all of the above is correct, then please try creating another ZIP file using the new ZipArchive APIs and then unzipping that file using the new APIs.
I expect this will work. Can you confirm this?

I presume that the Windows shell archiver uses the local code page instead of UTF-8 for file names inside the archive. This is not according to the ZIP standard. The resulting issues do not only impact Japanese users, but everyone with substantially different code pages. For instance, what happens if you create a ZIP file on a Russian version and then try to unzip it in Vietnam? I will let the Windows team know about that, but I do not know whether this can be addressed.
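The cross-locale problem can be illustrated without any ZIP file at all: encode a name with one code page and decode the same bytes with another. The sketch below assumes a current .NET runtime with the code-page provider available. The names CodePageMismatchSketch and Reinterpret are hypothetical.

```csharp
using System.Text;

public static class CodePageMismatchSketch
{
    // Encodes a file name with the writer's code page and decodes the same bytes
    // with the reader's code page, which is what happens when an archive carries
    // no information about the encoding of its entry names.
    public static string Reinterpret(string name, int writerCodePage, int readerCodePage)
    {
        // Assumption: needed on .NET Core / .NET 5+; can be dropped on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        byte[] raw = Encoding.GetEncoding(writerCodePage).GetBytes(name);
        return Encoding.GetEncoding(readerCodePage).GetString(raw);
    }
}
```

For example, Reinterpret("あ.txt", 932, 1251) decodes Japanese (code page 932) bytes as Cyrillic (code page 1251), and Reinterpret("Привет.txt", 1251, 1258) models the Russian-to-Vietnamese case mentioned above. The name survives only when both code pages match.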

From the .NET perspective, we cannot support creating non-standard compliant archives, but since this obviously affects a large proportion of our customers, we will explore options to help reading and extracting ZIP archives that contain non-standard compliant entry name encodings. We may not be able to automatically detect the encoding, but one potential option is to allow users to specify it in one way or another. Would this be helpful?

I do not know at this stage whether or not we will be able to address this in a manner that is satisfactory to everyone, but we will certainly investigate potential approaches.


Posted by kkamegawa on 12/13/2011 at 6:48 AM
Thank you for your reply.
I have attached a sample ZIP file made by Windows Zipped Folder.

"新しいテキスト ドキュメント.zip" is the default text file ("新しいテキスト ドキュメント.txt") zipped by Windows Zipped Folder.
"New Text" stores CP932-encoded file names: "新しいテキスト ドキュメント.txt" and "New Text Document.txt".

Posted by Microsoft on 12/12/2011 at 3:06 PM
For us to further investigate how to approach this problem, could you please make a sample ZIP file available that creates problems for you and describe a usage scenario with expected and actual behaviour.
We will then investigate options to alleviate the problem.


Posted by yfakariya on 12/10/2011 at 10:22 AM
I agree with K. Kamegawa's opinion, and I will describe some extra background.

This issue could be a serious usability problem, whether it is a Windows Shell issue or an API specification design issue.
Most Japanese users, especially non-technical users, often use non-ASCII file names (for example, for business letters, family photos, personal notes, etc.). They will never understand charsets or the zip file spec, so they will just conclude that their Windows is buggy because it cannot display their file names in the compressed folder at all. Additionally, there are millions of zip files in the real world that contain CP932-encoded file names, so re-encoding them as UTF-8 is not feasible. We will not use such an implementation, because the file name encoding problem means it can never satisfy any customer's requirements.
Posted by kkamegawa on 12/10/2011 at 9:41 AM
Thank you for your reply. This issue is very important in Japan.

Windows Explorer's zipped folder stores file names in CP932 on Japanese Windows (XP/Vista/Windows 7).
I think over 99% of zip files are made with the Windows zipped folder.

System.IO.ZipArchive only supports UTF-8, so Japanese developers will never use it.
Ideally, Windows zipped folders should support UTF-8, but not all users can update.
Posted by Microsoft on 12/9/2011 at 2:17 PM
Hi there !

Thanks for bringing up this interesting issue. We are always grateful when customers point towards potential concerns - this helps us ensure the quality of the .NET Framework and drive the product in the right direction.

However, in this case the system works correctly.

According to the ZIP file standard specification, only CP 437 and UTF-8 encodings are supported for names of ZIP file entries.
Thus, to make sure that you can use your ZIP files created with our libraries with any other ZIP archive tools, we cannot support any other encodings.
However, if you feel that you have correctly encoded your entry names using UTF-8 and something did not work as expected, we would like to fix that issue.

In that case, please re-open the bug and provide the following info:

- What you did;
- What you expected to get;
- What you got instead;
- A UTF-8 encoded text file with the list of file names you wanted to ZIP;
- The ZIP file you got (that you believe to be incorrect) or a full error message / exception stack trace if you did not get any file at all.

Thanks a lot!

(Software Engineer on the .NET Base Class Libraries team)

Posted by MS-Moderator10 [Feedback Moderator] on 12/6/2011 at 12:43 AM
Thank you for submitting feedback on Visual Studio 2010 and .NET Framework. Your issue has been routed to the appropriate VS development team for investigation. We will contact you if we require any additional information.
Posted by MS-Moderator01 on 12/4/2011 at 11:40 AM
Thank you for your feedback, we are currently reviewing the issue you have submitted. If this issue is urgent, please contact support directly(