Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

archive/zip: does not allow local filename encoding #22367

Closed
k-hiro opened this issue Oct 20, 2017 · 3 comments
Closed

archive/zip: does not allow local filename encoding #22367

k-hiro opened this issue Oct 20, 2017 · 3 comments
Labels
FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Milestone

Comments

@k-hiro
Copy link

k-hiro commented Oct 20, 2017

I making zip file with local filename encoding for compatibility on japanese Windows.
Below codes was working fine with go 1.8.4 windows/amd64.

package main

import (
  "archive/zip"
  "fmt"
  "os"

  "golang.org/x/text/encoding/japanese"
  "golang.org/x/text/transform"
)

func main() {
  f, err := os.Create("test.zip")
  if err != nil {
    panic(err)
  }

  myZip := zip.NewWriter(f)
  defer myZip.Close()

  jpnName, _, err := transform.String(japanese.ShiftJIS.NewEncoder(), "日本語.txt")
  if err != nil {
    panic(err)
  }

  header := zip.FileHeader{
    Name:   jpnName,
    Method: zip.Deflate,
  }

  writer, err := myZip.CreateHeader(&header)
  if err != nil {
    panic(err)
  }

  fmt.Fprintln(writer, "hello")
}

With go 1.9.1 windows/amd64, CreateHeader() overwrites utf8 flag to 1. So there is no way to create zip with local encoding.

Even today, Windows 7 can't extract utf-8 encoded zip by default.
User needs to manually download and install HOTFIX to extract correctly.

Windows 8 and later can extract utf-8 encoded zip without problems.
But Windows zip tool doesn't make any utf-8 encoded zip file. it always uses local encoding.
If you try to compress files that includes unicode only characters like emoji in filename, error dialog popups and it says like "you can't use this filename, please change it."
Also 7zip for Windows uses local encoding by default. you can make utf-8 encoded zip with -mcu=on option.

I think current implementation is intended to set utf8 flag only if filename is valid utf-8 string. But hasValidUTF8() function returns true for non ascii string even if its not valid utf-8.

func hasValidUTF8(s string) bool {
n := 0
for _, r := range s {
// By default, ZIP uses CP437, which is only identical to ASCII for the printable characters.
if r < 0x20 || r >= 0x7f {
if !utf8.ValidRune(r) {
return false
}
n++
}
}
return n > 0
}

see also #10741

@ianlancetaylor ianlancetaylor added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Oct 20, 2017
@ianlancetaylor ianlancetaylor added this to the Go1.10 milestone Oct 20, 2017
@ianlancetaylor
Copy link
Contributor

CC @mattn

@dsnet
Copy link
Member

dsnet commented Oct 20, 2017

Closing as duplicate of #10741. I have finished all planned improvements to archive/tar. I'll look to improving and cleaning up archive/zip. Probably won't be able to get much done in the remainder of this cycle, though.

Please continue discussion on #10741

@gopherbot
Copy link

Change https://golang.org/cl/72791 mentions this issue: archive/zip: restrict UTF8 detection for comment and name fields

gopherbot pushed a commit that referenced this issue Oct 25, 2017
CL 39570 added support for automatically setting flag bit 11 to
indicate that the filename and comment fields are encoded in UTF-8,
which is (conventionally) the encoding using for most Go strings.

However, the detection added is too lose for two reasons:
* We need to ensure both fields are at least possibly UTF-8.
That is, if any field is definitely not UTF-8, then we can't set the bit.
* The utf8.ValidRune returns true for utf8.RuneError, which iterating
over a Go string automatically returns for invalid UTF-8.
Thus, we manually check for that value.

Updates #22367
Updates #10741

Change-Id: Ie8aae388432e546e44c6bebd06a00434373ca99e
Reviewed-on: https://go-review.googlesource.com/72791
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@golang golang locked and limited conversation to collaborators Oct 23, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Projects
None yet
Development

No branches or pull requests

4 participants