archive/zip: need new api to support local file name encoding #10741
Comments
I think we just need to export the FlagEncodingUtf8
constant and let users choose what they need.
The (*Writer).Create method could default to UTF-8 if
the name is valid UTF-8 but not pure ASCII.
@minux If a filename is GBK and also valid UTF-8, and I want to use GBK (not UTF-8),
@chai2010, use CreateHeader directly, and not Create.
@bradfitz If we force UTF-8 encoding for a gbkFilename that is also valid UTF-8 (as @minux suggests),
if err := writeHeader(w.cw, fh); err != nil {
	return nil, err
}
we can't clear the flag's bit 11 before
But the filename is GBK, NOT a UTF-8 string (though
I didn't suggest that CreateHeader change fh.Flags.
Only Create will set fh.Flags to a sane default based
on the name.
If you use the CreateHeader method, then you need to
set the Flags field yourself.
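To make the Create vs. CreateHeader distinction concrete, here is a minimal sketch of setting bit 11 by hand before calling CreateHeader. The constant name flagUTF8 and the helper functions are hypothetical (archive/zip does not export such a constant); only the value 0x800 comes from the ZIP specification's general-purpose bit 11.

```go
package main

import (
	"archive/zip"
	"bytes"
	"errors"
	"fmt"
	"log"
)

// flagUTF8 is general-purpose bit 11 of the ZIP header flags.
// The name is hypothetical; archive/zip does not export it.
const flagUTF8 = 0x800

// writeWithUTF8Flag builds a one-entry archive in memory and sets bit 11
// on the header explicitly before handing it to CreateHeader.
func writeWithUTF8Flag(name string, body []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	hdr := &zip.FileHeader{Name: name, Method: zip.Deflate}
	hdr.Flags |= flagUTF8 // the caller owns Flags when using CreateHeader
	w, err := zw.CreateHeader(hdr)
	if err != nil {
		return nil, err
	}
	if _, err := w.Write(body); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// readFlags reopens the archive and returns the Flags of its first entry.
func readFlags(data []byte) (uint16, error) {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return 0, err
	}
	if len(zr.File) == 0 {
		return 0, errors.New("archive has no entries")
	}
	return zr.File[0].Flags, nil
}

func main() {
	data, err := writeWithUTF8Flag("サンプル.txt", []byte("hello world"))
	if err != nil {
		log.Fatal(err)
	}
	flags, err := readFlags(data)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("bit 11 set: %v\n", flags&flagUTF8 != 0)
}
```

Other flag bits may also be set by the writer, so the check masks bit 11 rather than comparing the whole Flags value.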
@minux I agree with you. Only
What's the status of this issue? At least we should document bit 11. I had to read through the APPNOTE.TXT of the ZIP spec to find out how to store UTF-8 names in the zip file. The best (IMHO) would be to document it and make Create set that bit: in Go, every valid string is UTF-8, and anyone who wants something else should use CreateHeader.
The fact that many zip writers use the local character encoding is a hack, since the specification only seems to support two encodings: CP-437 or UTF-8. CL/39570 (exposed in Go1.9) changed it so that the UTF-8 flag was set automatically if the string looked like valid UTF-8, but it seems that the check was too liberal. It seems to me that the practical thing to do here is to give the user a way to ensure bit 11 (which controls UTF-8) is cleared. The automatic detection scheme added in Go1.9 is problematic, so we can either revert it for Go1.10 or explicitly add API to control clearing it. If we revert CL/39570, users were always able to set UTF8 by manually setting bit 11 on
\cc @mattn
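For readers on Go 1.10 or later, archive/zip's FileHeader has a NonUTF8 field that asks the writer to leave bit 11 cleared. The following is a minimal sketch under that assumption; the helper names are hypothetical, and the name is "日本語.txt" hard-coded as Shift_JIS bytes so the example needs only the standard library.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"log"
)

// sjisName is "日本語.txt" encoded as Shift_JIS, spelled out as raw bytes
// (an illustrative value, not taken from the thread).
const sjisName = "\x93\xfa\x96\x7b\x8c\xea.txt"

// writeLocalName stores one entry whose name is in a local encoding and
// marks the header NonUTF8 so that bit 11 stays cleared.
func writeLocalName(name string, body []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	hdr := &zip.FileHeader{Name: name, Method: zip.Store, NonUTF8: true}
	w, err := zw.CreateHeader(hdr)
	if err != nil {
		return nil, err
	}
	if _, err := w.Write(body); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// firstEntry reopens the archive and returns its first file entry.
func firstEntry(data []byte) (*zip.File, error) {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return nil, err
	}
	if len(zr.File) == 0 {
		return nil, fmt.Errorf("archive has no entries")
	}
	return zr.File[0], nil
}

func main() {
	data, err := writeLocalName(sjisName, []byte("hello world"))
	if err != nil {
		log.Fatal(err)
	}
	f, err := firstEntry(data)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("name bytes: % x\n", f.Name)
	fmt.Printf("bit 11 set: %v\n", f.Flags&0x800 != 0)
}
```

The name is written byte for byte; the archive itself still carries no indication of which local encoding was used, so readers must know it out of band.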
When a UTF-8 string is set as the filename, it works fine.

package main

import (
	"archive/zip"
	"io"
	"log"
	"os"
)

// init creates the sample input file that main adds to the archive.
func init() {
	f, err := os.Create("sample.txt")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	if _, err := f.Write([]byte("hello world")); err != nil {
		panic(err)
	}
}

func main() {
	zf, err := os.Create("sample.zip")
	if err != nil {
		log.Fatal(err)
	}
	defer zf.Close()
	zw := zip.NewWriter(zf)
	defer zw.Close()
	info, err := os.Stat("sample.txt")
	if err != nil {
		log.Fatal(err)
	}
	hdr, err := zip.FileInfoHeader(info)
	if err != nil {
		log.Fatal(err)
	}
	// Override the entry name with a UTF-8 (Japanese) filename.
	hdr.Name = "サンプル.txt"
	zh, err := zw.CreateHeader(hdr)
	if err != nil {
		log.Fatal(err)
	}
	f, err := os.Open("sample.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	if _, err := io.Copy(zh, f); err != nil {
		log.Fatal(err)
	}
}

Windows command line. Linux. Also Windows explorer. Sorry, I don't know how this works on OSX.

When a non-UTF-8 multi-byte string is set, users must handle it themselves. This zip archive contains a filename in Shift_JIS. Windows Explorer (Japanese environment) works well, but the zip command line doesn't, since the archive doesn't carry an encoding name. You must convert the filename manually.

package main

import (
	"archive/zip"
	"io"
	"log"
	"os"

	"golang.org/x/text/encoding/japanese"
)

func main() {
	zr, err := zip.OpenReader("sample.zip")
	if err != nil {
		log.Fatal(err)
	}
	defer zr.Close()
	// The stored name is raw Shift_JIS bytes; decode it to UTF-8 by hand.
	name, err := japanese.ShiftJIS.NewDecoder().String(zr.File[0].Name)
	if err != nil {
		log.Fatal(err)
	}
	f, err := os.Create(name)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	r, err := zr.File[0].Open()
	if err != nil {
		log.Fatal(err)
	}
	defer r.Close()
	if _, err := io.Copy(f, r); err != nil {
		log.Fatal(err)
	}
}
Hi! If I call zip.CreateHeader with a name in a local encoding, any non-ASCII characters are recognized as valid UTF-8 by the hasValidUTF8 function.

package main

import (
	"fmt"
	"log"
	"unicode/utf8"

	"golang.org/x/text/encoding/japanese"
	"golang.org/x/text/transform"
)

func main() {
	// Encode the name as Shift_JIS; the result is no longer valid UTF-8.
	name, _, err := transform.String(japanese.ShiftJIS.NewEncoder(), "日本語")
	if err != nil {
		log.Fatal(err)
	}
	// Ranging over a string yields utf8.RuneError (U+FFFD) for invalid
	// UTF-8, and utf8.ValidRune reports true for U+FFFD.
	for _, s := range name {
		b := utf8.ValidRune(s)
		fmt.Printf("%x = %v\n", s, b)
	}
}

output is

So zip.CreateHeader sets bit 11, which results in a corrupted zip file. And Windows 7's Explorer without an optional hotfix can't handle UTF-8 encoded zip files.
I created it with 7-zip. I don't think it is necessary for Go to be able to create zip files with non-UTF-8 names. It's enough that Go can read and write UTF-8 names and read non-UTF-8 filenames as bytes.
I can agree. I think using UTF-8 encoding with bit 11 is the standard way.
Or another idea: add a new field to ZipReader/ZipWriter like below.
This translates the filename and comments in the zip file. NameEncoder can be set to japanese.ShiftJIS.NewEncoder().String, and NameDecoder can be set to japanese.ShiftJIS.NewDecoder().String. @khiro, @dsnet What do you think?
Small mistake: Reader.File cannot provide a way to do it, so only NameEncoder for Writer can be useful.
Change https://golang.org/cl/72410 mentions this issue:
👍 for CL/72410
Change https://golang.org/cl/72430 mentions this issue:
I posted another CL. Which do you like?
CL/72410 CL/72430 I think most people are fine with UTF-8 encoding.
Change https://golang.org/cl/72791 mentions this issue:
CL 39570 added support for automatically setting flag bit 11 to indicate that the filename and comment fields are encoded in UTF-8, which is (conventionally) the encoding used for most Go strings.

However, the detection added is too loose for two reasons:
* We need to ensure both fields are at least possibly UTF-8. That is, if any field is definitely not UTF-8, then we can't set the bit.
* The utf8.ValidRune returns true for utf8.RuneError, which iterating over a Go string automatically returns for invalid UTF-8. Thus, we manually check for that value.

Updates #22367
Updates #10741

Change-Id: Ie8aae388432e546e44c6bebd06a00434373ca99e
Reviewed-on: https://go-review.googlesource.com/72791
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Change https://golang.org/cl/75592 mentions this issue:
The archive/zip package can't support GBK filenames on Chinese Windows. This is a simple test (test-gbk-zip.go):
So I created CL9381 to try to fix this problem.
But CL9381 can only support UTF-8 encoding;
it can't create a zip with local GBK encoding.
I think we need a new API for user-defined filename encoding:
If the decoder or encoder is nil, the local encoding is utf8.
The func (w *Writer) Create(name string) (io.Writer, error) forces UTF-8 encoding.