archive/zip: Create() does not overwrite duplicate filenames, leading to unnecessary bloating of the resulting ZIP file #66810

Open
wjkoh opened this issue Apr 13, 2024 · 3 comments
Labels
Documentation, help wanted, NeedsDecision (Feedback is required from experts, contributors, and/or the community before a change can be made.)

wjkoh commented Apr 13, 2024

Go version

go version go1.22.2 darwin/arm64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/wjkoh/Library/Caches/go-build'
GOENV='/Users/wjkoh/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/wjkoh/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/wjkoh/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/opt/homebrew/Cellar/go/1.22.2/libexec'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/opt/homebrew/Cellar/go/1.22.2/libexec/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.22.2'
GCCGO='gccgo'
AR='ar'
CC='cc'
CXX='c++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/y1/jsmqrm1s6y55qh9s2rmw39lh0000gn/T/go-build548572318=/tmp/go-build -gno-record-gcc-switches -fno-common'

What did you do?

I called Create() with the same filename multiple times. https://go.dev/play/p/hu1QDLK7BnT
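
For readers who cannot open the playground link, here is a minimal sketch of the kind of program that triggers this. It is not the linked code; the helper names `buildArchive` and `countEntries`, the entry name `a.txt`, and the 1 KiB payload are assumptions made for illustration. It only uses `zip.NewWriter`, `(*zip.Writer).Create`, and `zip.NewReader` from the standard library.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
)

// buildArchive (hypothetical helper) writes `copies` entries that all share
// the name "a.txt" and returns the finished archive bytes.
func buildArchive(copies int) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	payload := bytes.Repeat([]byte("x"), 1024)
	for i := 0; i < copies; i++ {
		// Create does not check whether the name was used before.
		w, err := zw.Create("a.txt")
		if err != nil {
			return nil, err
		}
		if _, err := w.Write(payload); err != nil {
			return nil, err
		}
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// countEntries (hypothetical helper) reports how many entries in the archive
// carry the given name.
func countEntries(data []byte, name string) (int, error) {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return 0, err
	}
	n := 0
	for _, f := range zr.File {
		if f.Name == name {
			n++
		}
	}
	return n, nil
}

func main() {
	for _, copies := range []int{1, 3} {
		data, err := buildArchive(copies)
		if err != nil {
			panic(err)
		}
		n, err := countEntries(data, "a.txt")
		if err != nil {
			panic(err)
		}
		fmt.Printf("copies=%d entries=%d bytes=%d\n", copies, n, len(data))
	}
}
```

Running it should show the entry count and the archive size both increasing with the number of `Create()` calls, even though the name never changes.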

What did you see happen?

I have noticed that when Create() is called with the same filename multiple times, every call appends another entry, so the resulting ZIP file keeps growing. This becomes a problem when extracting the files with common utilities such as the default archiver on macOS or the unzip command. These utilities cannot handle duplicate filenames and output only a single file even when the ZIP file contains several entries with the same name. This is both confusing and inefficient.

What did you expect to see?

In my opinion, there are two potential solutions to this issue. First, Create() could reject repeated calls with the same filename. Alternatively, it could overwrite the previously added file of that name with the new content rather than simply appending another entry. However, I believe overwriting may cause unnecessary overhead. If neither change is made, adding a caution to the documentation of Create() would be beneficial for users.

@ianlancetaylor
Contributor

I think we should just document this.

@cherrymui
Member

cc @dsnet @bradfitz

cherrymui added the NeedsDecision label Apr 15, 2024
cherrymui added this to the Backlog milestone Apr 15, 2024

jfrech commented Apr 28, 2024

Disallowing said "unnecessary bloating" would be backwards-incompatible and would also break symmetry with archive/tar.
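
To illustrate the symmetry point, here is a small sketch showing that `tar.Writer` likewise accepts repeated headers with the same name. The helper name `tarNames` and the entry name `a.txt` are assumptions made for this example only.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
)

// tarNames (hypothetical helper) writes `copies` entries named "a.txt" into
// a tar stream, then reads the stream back and returns the names it finds.
func tarNames(copies int) ([]string, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	body := []byte("hello")
	for i := 0; i < copies; i++ {
		hdr := &tar.Header{
			Typeflag: tar.TypeReg,
			Name:     "a.txt",
			Mode:     0o644,
			Size:     int64(len(body)),
		}
		// WriteHeader does not check for previously used names.
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write(body); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}

	var names []string
	tr := tar.NewReader(&buf)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		names = append(names, hdr.Name)
	}
	return names, nil
}

func main() {
	names, err := tarNames(2)
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```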
