Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compress/gzip: ignore trailing garbage data #47809

Closed
kunsonx opened this issue Aug 19, 2021 · 12 comments
Closed

compress/gzip: ignore trailing garbage data #47809

kunsonx opened this issue Aug 19, 2021 · 12 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@kunsonx
Copy link

kunsonx commented Aug 19, 2021

What version of Go are you using (go version)?

$ go version
go version go1.17 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOOS="darwin"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.17.0/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.17.0/libexec/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.17"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/h1/qnz0z93n4_7fymcgs9wd8ls00000gn/T/go-build2980224056=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

I trying to use go compress/gzip package to unzip this file.
But program pop me an error and then use gzip -d -k server.data I can get the actually data.

gzip -d -k server.data.gz 
gzip: stdin: decompression OK, trailing garbage ignored.

Then I looking into what is happened then I found this compress file include extra data.
Look back in go standard package code I did not see any code handle compress file extra data instead trying to use remaining data for next compress file (when MultipleStream is True).

Gzip file format documentation reference (from stackoverflow): https://stackoverflow.com/questions/4928560/how-can-i-work-with-gzip-files-which-contain-extra-data

Fully example code here: https://github.com/kunsonx/go-reproduce-data/blob/master/test-gzip-without-handle-extra-data.go

What did you expect to see?

No errors pop. compress file extra data handled.

What did you see instead?

error : "gzip: invalid header"

@mknyszek mknyszek changed the title compress/gzip : decompress without handle extract data. compress/gzip: ignore trailing garbage data Aug 19, 2021
@mknyszek mknyszek added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Aug 19, 2021
@mknyszek mknyszek added this to the Backlog milestone Aug 19, 2021
@mknyszek
Copy link
Contributor

According to the documentation for compress/gzip.Reader:

In general, a gzip file can be a concatenation of gzip files, each with its own header. Reads from the Reader return the concatenation of the uncompressed data of each. Only the first header is recorded in the Reader fields.

That probably explains why it says there's an invalid header. This lines up with what the Stack Overflow post suggests for Python.

It may be that this is a bug, at least according to the Stack Overflow post, but I'm not sure. The link in that post re: the spec is broken. Looking at RFC 1952 which the package claims to follow, I don't see mention of this behavior. If you could provide a reference, that would be great.

CC @dsnet via https://dev.golang.org/owners ?

I guess also CC @mdempsky who is the secondary owner on a bunch of other compress packages.

@dsnet
Copy link
Member

dsnet commented Aug 19, 2021

This is working as intended.

The grammar in RFC 1952 does not allow for arbitrary junk at the end since a gzip file must be a series of valid members (section 2.2). It explicitly says that the "members simply appear one after another in the file, with no additional information before, between, or after them." This implies that it is not legal to have trailing junk.

If you would like the ability to read a valid gzip member and specially handle non-gzip data after the member, then you can use the gzip.Reader.Multistream feature to read the file one member at a time.

@kunsonx
Copy link
Author

kunsonx commented Aug 19, 2021

Just test this compress from gzip/7zip other decompress software it can decompress whole file succeed.

@mknyszek I will looking for it when available.

@dsnet
Copy link
Member

dsnet commented Aug 19, 2021

Regardless of what other popular software do, their behavior is clearly non-compliant with the specification.

@kunsonx
Copy link
Author

kunsonx commented Aug 19, 2021

OK. Good to know that. It according RFC 1952 may be better. Thanks!

@kunsonx
Copy link
Author

kunsonx commented Aug 19, 2021

It have any plan for support other compress/decompress algorithm?

@dsnet
Copy link
Member

dsnet commented Aug 19, 2021

It have any plan for support other compress/decompress algorithm?

Can you explain what you mean?

https://pkg.go.dev/compress shows the list of the compression formats that the Go standard library supports. The bar for a new format to be added is very high. It would have to be very popular format and someone needs to be willing to own it for the long term. The formats I can potentially see being added are Brotli and Zstandard, but I think their popularity still has a ways to go before it would considered for being added. Even then, there's still the question of who's going to maintain it.

@kunsonx
Copy link
Author

kunsonx commented Aug 19, 2021

It have any plan for support other compress/decompress algorithm?

Can you explain what you mean?

https://pkg.go.dev/compress shows the list of the compression formats that the Go standard library supports. The bar for a new format to be added is very high. It would have to be very popular format and someone needs to be willing to own it for the long term. The formats I can potentially see being added are Brotli and Zstandard, but I think their popularity still has a ways to go before it would considered for being added. Even then, there's still the question of who's going to maintain it.

For extract question without this issue. I'm looking for LZ4 or ZStandard algorithms. It's any possible to become contributor with those compress packages?

@dsnet
Copy link
Member

dsnet commented Aug 19, 2021

Any new formats to be added needs to be approved as a Go proposal. I doubt lz4 would ever be added. Zstandard is still in my opinion too young.

@kunsonx
Copy link
Author

kunsonx commented Aug 19, 2021

It have any plan for support other compress/decompress algorithm?

Can you explain what you mean?

https://pkg.go.dev/compress shows the list of the compression formats that the Go standard library supports. The bar for a new format to be added is very high. It would have to be very popular format and someone needs to be willing to own it for the long term. The formats I can potentially see being added are Brotli and Zstandard, but I think their popularity still has a ways to go before it would considered for being added. Even then, there's still the question of who's going to maintain it.

And compare with other language (Like Java/Python). The also support rich compress algorithm on standard library supports. In my opinion.

@bjorndm
Copy link

bjorndm commented Sep 3, 2021

@kunsonx

Both compressions already exist as pure Go libraries:
https://github.com/pierrec/lz4
https://github.com/KillingSpark/sparkzstd

So I think this is https://golang.org/doc/faq#x_in_std.

@seankhliao
Copy link
Member

Closing as this is working as intended

Please file a proposals if you think something should be added to the stdlib

@golang golang locked and limited conversation to collaborators Sep 3, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

6 participants