The problem is that, by default, the Zip specification allows only characters from the original IBM PC character set (Code Page 437, which is ASCII in its lower half) in zip entry names. More specifically (source: APPENDIX D):
APPENDIX D.1 The ZIP format has historically supported only the original IBM PC character
encoding set, commonly referred to as IBM Code Page 437. This limits storing
file name characters to only those within the original MS-DOS range of values
and does not properly support file names in other character encodings, or
languages. To address this limitation, this specification will support the
following change.
Support for Unicode names was added later. It is signaled by a special bit, general purpose bit 11, also called the Language encoding flag (EFS):
Section 4.4.4 - General purpose bit flag - Bit 11 - Language encoding flag (EFS). If this bit is set, the filename and comment fields for this file MUST be encoded using UTF-8.
APPENDIX D.2 If general purpose bit 11 is unset, the file name and comment should conform
to the original ZIP character encoding. If general purpose bit 11 is set, the
filename and comment must support The Unicode Standard, Version 4.1.0 or
greater using the character encoding form defined by the UTF-8 storage
specification. The Unicode Standard is published by the The Unicode
Consortium (www.unicode.org). UTF-8 encoded data stored within ZIP files
is expected to not include a byte order mark (BOM).
The general purpose bit flag is present and supported by Go: it is the Flags field of the FileHeader struct. Unfortunately, Go has no dedicated method to set this bit, and by default it is 0.
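As a minimal sketch of what "bit 11" means in terms of the Flags value, the check below masks a 16-bit flag word with 1 << 11 (0x800). The names utf8Flag and hasUTF8Flag are hypothetical helpers introduced only for this illustration; they are not part of archive/zip.

```go
package main

import "fmt"

// utf8Flag is general purpose bit 11, the language encoding flag (EFS).
// Hypothetical constant for illustration; archive/zip does not export it.
const utf8Flag uint16 = 1 << 11 // 0x800

// hasUTF8Flag reports whether bit 11 is set in a general purpose bit flag
// value, such as the Flags field of a zip.FileHeader.
func hasUTF8Flag(flags uint16) bool {
	return flags&utf8Flag != 0
}

func main() {
	fmt.Printf("%#x\n", utf8Flag)   // prints 0x800
	fmt.Println(hasUTF8Flag(0))     // prints false
	fmt.Println(hasUTF8Flag(0x800)) // prints true
	fmt.Println(hasUTF8Flag(0x808)) // prints true (other bits do not matter)
}
```

The same test can be applied to the Flags field of any entry read from an existing archive to see whether its name is declared as UTF-8.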
So the easiest way to add support for Unicode names is to simply set bit 11 to one. Instead of

    FileName, _ := Zip.Create(Path)

start your zip entry with:

    // 0x800 is bit 11: it marks the name and comment as UTF-8.
    h := &zip.FileHeader{Name: Path, Method: zip.Deflate, Flags: 0x800}
    FileName, _ := Zip.CreateHeader(h)
The first statement creates a FileHeader whose Flags field has the value 0x800 (bit 11) set, which declares that the file name is encoded using UTF-8 (which is what Go produces when it writes a string to an io.Writer, because the bytes of the string are written as-is and Go strings normally hold UTF-8).
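To see the effect end to end, here is a hedged sketch that writes one entry with bit 11 set into an in-memory archive and then reads the flags back. The helper names writeUTF8Zip and readEntryFlags and the sample file name are assumptions made for this example only. Newer Go releases may already set this flag on their own for non-ASCII names; if so, setting it explicitly is redundant but should be harmless, so check the behavior of your Go version.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"log"
)

// writeUTF8Zip builds an in-memory archive containing a single entry
// whose header has general purpose bit 11 (0x800) set.
func writeUTF8Zip(name string, content []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)

	h := &zip.FileHeader{Name: name, Method: zip.Deflate, Flags: 0x800}
	w, err := zw.CreateHeader(h)
	if err != nil {
		return nil, err
	}
	if _, err := w.Write(content); err != nil {
		return nil, err
	}
	// Close flushes the central directory; the archive is invalid without it.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// readEntryFlags maps each entry name in the archive to its
// general purpose bit flag value.
func readEntryFlags(data []byte) (map[string]uint16, error) {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return nil, err
	}
	flags := make(map[string]uint16)
	for _, f := range zr.File {
		flags[f.Name] = f.Flags
	}
	return flags, nil
}

func main() {
	data, err := writeUTF8Zip("árvíztűrő.txt", []byte("hello"))
	if err != nil {
		log.Fatal(err)
	}
	flags, err := readEntryFlags(data)
	if err != nil {
		log.Fatal(err)
	}
	for name, fl := range flags {
		fmt.Printf("%s: bit 11 set = %v\n", name, fl&0x800 != 0)
	}
}
```

The sketch only checks bit 11 rather than the whole flag word, because the writer may set other bits of its own in the same field.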
Note:
By doing this, UTF-8 file names will be preserved in the archive, but not every zip reader or extractor supports the flag. For example, on Windows the built-in file handler (Windows Explorer) will not decode the names as UTF-8, whereas a more capable Zip tool (e.g. SecureZip) will recognize the UTF-8 file names and extract them properly (using UTF-8 decoding).