Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/build/pargzip: Writer should output single-stream gzip files #19052

Closed
nvx opened this issue Feb 13, 2017 · 15 comments
Closed

x/build/pargzip: Writer should output single-stream gzip files #19052

nvx opened this issue Feb 13, 2017 · 15 comments

Comments

@nvx
Copy link

nvx commented Feb 13, 2017

The last 4 bytes of a gzipped file (the file footer) contain the uncompressed data size in bytes.

Until "go1.7.2.linux-amd64.tar.gz", this value was correct for the golang distributed linux binaries, for example:

$ for FILE in go1.7.*.linux-amd64.tar.gz; do echo Size of $FILE from header "\t" $(od -A none -t u4 --skip-bytes=$(expr $(wc -c "$FILE" | cut -f1 -d' ') - 4) "$FILE"); echo gunzipped size of $FILE "\t" $(gzip -dc "$FILE" | wc -c) ; done
Size of go1.7.1.linux-amd64.tar.gz from header   248978432
gunzipped size of go1.7.1.linux-amd64.tar.gz     248978432

Size of go1.7.2.linux-amd64.tar.gz from header   510464
gunzipped size of go1.7.2.linux-amd64.tar.gz     249022976

Size of go1.7.3.linux-amd64.tar.gz from header   492032
gunzipped size of go1.7.3.linux-amd64.tar.gz     249004544

Size of go1.7.4.linux-amd64.tar.gz from header   961536
gunzipped size of go1.7.4.linux-amd64.tar.gz     250522624

Size of go1.7.5.linux-amd64.tar.gz from header   637440
gunzipped size of go1.7.5.linux-amd64.tar.gz     251247104

Notice 1.7.1 the header size agrees with the actual decompressed size, while all versions after that the value does not match.

The issue also seems to affect 1.6.4 release, but not 1.6.3

Size of go1.6.3.linux-amd64.tar.gz from header   323064832
gunzipped size of go1.6.3.linux-amd64.tar.gz     323064832

Size of go1.6.4.linux-amd64.tar.gz from header   651264
gunzipped size of go1.6.4.linux-amd64.tar.gz     324661248

The files sha256 hashes have been verified to eliminate download corruption (except for 1.7.2 which is not listed on the download page?)

43ad621c9b014cde8db17393dc108378d37bc853aa351a6c74bf6432c1bbd182  go1.7.1.linux-amd64.tar.gz
9985da3f056093ff7a7e0814e14168112ac66c3585c56969c1d1ab2b72c36fdd  go1.7.2.linux-amd64.tar.gz
508028aac0654e993564b6e2014bf2d4a9751e3b286661b0b0040046cf18028e  go1.7.3.linux-amd64.tar.gz
47fda42e46b4c3ec93fa5d4d4cc6a748aa3f9411a2a2b7e08e3a6d80d753ec8b  go1.7.4.linux-amd64.tar.gz
2e4dd6c44f0693bef4e7b46cc701513d74c3cc44f2419bf519d7868b12931ac3  go1.7.5.linux-amd64.tar.gz

While some gzip tools do not rely on this value (for example GNU gzip), other tools do (for example JZlib) resulting in failure to decompress these gzipped files (as per https://issues.jenkins-ci.org/browse/JENKINS-39515).

Looking at the hexed values for 1.7.2 for example, the header value is 0x0007CA00, while the actual value is 0x0ED7CA00, this looks like the most significant 3 nibbles (12 bits) are being incorrectly set to 0.

@bradfitz bradfitz added this to the Go1.8Maybe milestone Feb 13, 2017
@bradfitz
Copy link
Contributor

@dsnet, can you take a look?

@nvx
Copy link
Author

nvx commented Feb 13, 2017

Of interested, this appears to have been affecting darwin builds even earlier than this.

go1.4.2.darwin-amd64-osx10.8.tar.gz and earlier appears to be fine, while go1.4.3.darwin-amd64.tar.gz is not.
go1.5.1.darwin-amd64.tar.gz (and 1.5) is also fine, while go1.5.1.darwin-amd64.tar.gz and later are not.

@josharian
Copy link
Contributor

@nvx just to set expectations, any fix here will only take place going forward. The existing binaries will not be modified, as doing so causes a lot of pain for distro managers who hardcode expected hashes.

@nvx
Copy link
Author

nvx commented Feb 13, 2017

Yeah that's understood and expected. The additional information about the darwin builds was only to assist in tracking down the source of the issue if there was a particular change at that time that may have been the culprit.

It would be nice if once fixed a new build could be made (even if no changes since 1.7.5 other than the fixed gzip archive) unless 1.8 is due to come out not long after. Not really concerned about old versions as long as the latest works. :)

@dsnet
Copy link
Member

dsnet commented Feb 13, 2017

This is working as intending. I believe the file is generated by golang.org/build/pargzip, which breaks the input into 1MiB chunks and compresses each chunk independently as a gzip file and then concatenates the outputs together.

This is an entirely valid use of gzip as specified according to RFC 1952, section 2.2:

A gzip file consists of a series of "members" (compressed data
sets). The format of each member is specified in the following
section. The members simply appear one after another in the file,
with no additional information before, between, or after them.

@dsnet
Copy link
Member

dsnet commented Feb 13, 2017

That being said, it may be worth changing pargzip to do chunking at the DEFLATE layer, rather than the GZIP layer. This can guarantee that there is only one gzip file, which may be helpful to Go users.

@bradfitz, do you want me to make the suggested change to pargzip?

@dsnet dsnet changed the title golang 1.7.2+ linux distribution gzip footer has incorrect size x/build/pargzip: Writer should output single-stream gzip files Feb 13, 2017
@dsnet
Copy link
Member

dsnet commented Feb 13, 2017

@nvx, I should also note that this is clearly a bug in JZlib if they are not able to handle multistream gzip files.

@minux
Copy link
Member

minux commented Feb 13, 2017 via email

@rsc
Copy link
Contributor

rsc commented Feb 13, 2017

@dsnet, chunking in DEFLATE has its subtleties too. You're going to need some way to synchronize the blocks to byte offsets. The obvious way is a Z_SYNC_FLUSH but those don't usually appear in gzipped files either, so it will still look a little odd.

I tend to agree with Minux: the risk of incompatibility seems much higher here than the benefit. Even Go hasn't always handled these kinds of details correctly.

@bradfitz
Copy link
Contributor

I agree with @minux too. It was never my intention that pargzip files would make it to end users. It was just meant to speed up builds.

@bradfitz bradfitz assigned bradfitz and unassigned dsnet Feb 13, 2017
@gopherbot
Copy link

CL https://golang.org/cl/36910 mentions this issue.

@gopherbot
Copy link

CL https://golang.org/cl/36913 mentions this issue.

gopherbot pushed a commit to golang/build that referenced this issue Feb 13, 2017
Updates golang/go#19052

Change-Id: I453dd1cc6016bf137f25f4e5b7a772b2c1888279
Reviewed-on: https://go-review.googlesource.com/36913
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
@MicahLC
Copy link

MicahLC commented Feb 24, 2017

@josharian : I understand not updating existing binaries in place, but would it be possible for there to be updated binaries hosted somewhere official so that applications that encountered this issue can use them, if they need older binaries?

@dsnet
Copy link
Member

dsnet commented Feb 24, 2017

@MicahLC, is this a real issue or hypothetical? The latest Go1.8 release does not have this problem. So there's only an issue if someone must use an older version and their package manager can't handle multistream gzip files.

@MicahLC
Copy link

MicahLC commented Feb 25, 2017

@dsnet, since Go1.8 doesn't have the issue, most likely hypothetical. There were numerous users of the Jenkins Go plugin that were blocked by this issue when trying to use the previous binaries, but assuming they can all move forwards to Go1.8 (which, given Go's good backwards compatibility, they should be able to do), they will be unblocked.

@golang golang locked and limited conversation to collaborators Feb 25, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

8 participants