cmd/distpack: offer Zstandard-compressed archives in addition to gzip #62446

Open
cespare opened this issue Sep 4, 2023 · 13 comments

Comments

@cespare
Contributor

cespare commented Sep 4, 2023

This is inspired by #62445, where @dsnet proposes using zopfli to create ~6% smaller .gz files for Go release downloads.

As he writes in that issue:

> zstd is well-positioned to take over as the de facto compression format, but that probably won't happen for another decade.

This proposal is to help usher in that future by offering zstd downloads in addition to gzip.

Here's a very quick'n'dirty comparison of compression performance on the same go1.21.0.linux-amd64.tar.gz archive Joe looked at:

| file | size | cmp ratio | vs. orig | CPU time |
|---|---|---|---|---|
| .tar | 223.0 MB | | | |
| orig .gz | 66.7 MB | 29.9% | | 5-7s [1] |
| gzip -9 | 65.6 MB | 29.4% | -1.6% | 20s |
| zopfli .gz | 62.6 MB | 28.1% | -6.1% | 15min on Joe's machine |
| zstd 3 | 63.6 MB | 28.5% | -4.6% | 800ms |
| zstd 7 | 58.4 MB | 26.2% | -12.4% | 2.4s |
| zstd 12 | 52.0 MB | 23.3% | -22.0% | 7.7s |
| zstd 19 | 44.5 MB | 20.0% | -33.3% | 64s |
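
For readers who want to check the two percentage columns, here is a small sketch of the arithmetic. It assumes "cmp ratio" is the compressed size divided by the 223.0 MB .tar and "vs. orig" is the change relative to the 66.7 MB original .gz; the function names `ratio` and `vsOrig` are made up for this illustration.

```go
package main

import "fmt"

// ratio returns the compressed size as a percentage of the uncompressed tar.
func ratio(compressedMB, tarMB float64) float64 {
	return compressedMB / tarMB * 100
}

// vsOrig returns the size change relative to the original .gz, in percent.
// Negative values mean the file is smaller than the original .gz.
func vsOrig(compressedMB, origGzMB float64) float64 {
	return (compressedMB/origGzMB - 1) * 100
}

func main() {
	const tarMB, origGzMB = 223.0, 66.7
	rows := []struct {
		name string
		mb   float64
	}{
		{"gzip -9", 65.6},
		{"zopfli .gz", 62.6},
		{"zstd 3", 63.6},
		{"zstd 7", 58.4},
		{"zstd 12", 52.0},
		{"zstd 19", 44.5},
	}
	for _, r := range rows {
		fmt.Printf("%-10s %5.1f%% %+6.1f%%\n", r.name, ratio(r.mb, tarMB), vsOrig(r.mb, origGzMB))
	}
}
```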

Also, decompressing the .zst archives takes about 4x less CPU time than decompressing the .gz archives on my machine.

If we offered .gz and .zst, people who care at all about size and speed can just use .zst and get a much bigger benefit than if we had zopfli-encoded .gzs.

Footnotes

  1. This is an estimate based on the fact that the file size falls between gzip -5 and gzip -6. I think that the actual release process uses compress/gzip which is quite a bit slower.

@ianlancetaylor
Contributor

CC @golang/release

@heschi
Contributor

heschi commented Sep 5, 2023

Those are pretty compelling numbers. At least on my machine, with tar 1.34, tar -xf works just as well on .tar.zst, so I don't see any downsides to doing this other than some UI clutter on go.dev/dl.

The implementation is not so trivial. Creating release archives is now the responsibility of https://cs.opensource.google/go/go/+/master:src/cmd/distpack/pack.go, and we want them to be completely deterministic, which means using a compression algorithm that we can hold constant for the lifetime of a Go release. (See the associated blog post.) We'd need to pull a zstd implementation into the distribution, either as a standard library package (unlikely), an internal package we own (time-consuming to write, unless someone wants to contribute it), or a vendored package that looks solid (seems fine?).
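
To illustrate what "completely deterministic" asks of the packing code, here is a minimal sketch using only the standard library's archive/tar and compress/gzip. It is not how cmd/distpack is written; the helper names `buildArchive` and `digest`, the fixed timestamp, and the file mode are assumptions chosen for the example. The point is that the output bytes depend only on file names and contents, so two runs produce the same hash as long as the compressor itself does not change.

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"crypto/sha256"
	"fmt"
	"sort"
	"time"
)

// buildArchive packs files into a .tar.gz whose bytes depend only on the
// file names and contents: entries are sorted and metadata is fixed.
func buildArchive(files map[string][]byte) ([]byte, error) {
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	sort.Strings(names) // map iteration order is random, so sort first

	var buf bytes.Buffer
	zw, err := gzip.NewWriterLevel(&buf, gzip.BestCompression)
	if err != nil {
		return nil, err
	}
	tw := tar.NewWriter(zw)
	for _, name := range names {
		hdr := &tar.Header{
			Name:    name,
			Mode:    0o644,                   // fixed mode (assumption for the sketch)
			Size:    int64(len(files[name])),
			ModTime: time.Unix(0, 0),         // fixed timestamp (assumption for the sketch)
		}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write(files[name]); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// digest returns the hex SHA-256 of the archive built from files.
func digest(files map[string][]byte) (string, error) {
	data, err := buildArchive(files)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%x", sha256.Sum256(data)), nil
}

func main() {
	files := map[string][]byte{
		"go/VERSION":   []byte("go1.21.0\n"),
		"go/README.md": []byte("example contents\n"),
	}
	for i := 0; i < 2; i++ {
		d, err := digest(files)
		if err != nil {
			panic(err)
		}
		fmt.Println(d) // the two lines printed should be identical
	}
}
```

The same property would have to hold for any zstd encoder used alongside gzip, which is why the encoder version needs to be pinned.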

Overall I'm in favor of this; it seems like a moderate amount of effort and pretty much a pure win for users.

heschi changed the title from "proposal: x/build/cmd/relui: offer Zstandard-compressed archives in addition to gzip" to "proposal: cmd/distpack: offer Zstandard-compressed archives in addition to gzip" on Sep 5, 2023
@dsnet
Member

dsnet commented Sep 5, 2023

> We'd need to pull a zstd implementation into the distribution, either as a standard library package (unlikely), an internal package we own

Alternatively, rather than freezing it at the Go package layer, you could rely on os/exec and freeze it at the binary level by pinning which zstd (or zopfli, for #62445) binary you use.
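
A rough sketch of what pinning at the binary level could look like: hash the external binary, compare against a recorded digest, and only then build the command. The helper names `checkBinary` and `zstdCommand` and the placeholder digest are hypothetical; `-19`, `-c` (write to stdout), and `-q` (quiet) are standard zstd CLI flags. Note that later in this thread the approach of shelling out to a tool not versioned in the Go repo is ruled out.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"os/exec"
	"strconv"
)

// checkBinary returns an error unless the file at path hashes to wantSHA256.
func checkBinary(path, wantSHA256 string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return err
	}
	if got := hex.EncodeToString(h.Sum(nil)); got != wantSHA256 {
		return fmt.Errorf("%s: sha256 %s does not match pinned %s", path, got, wantSHA256)
	}
	return nil
}

// zstdCommand builds (but does not run) a command that compresses
// stdin to stdout at the given level.
func zstdCommand(path string, level int) *exec.Cmd {
	return exec.Command(path, "-"+strconv.Itoa(level), "-c", "-q")
}

func main() {
	path, err := exec.LookPath("zstd")
	if err != nil {
		fmt.Println("zstd not found on PATH; nothing to do")
		return
	}
	// Placeholder only: a real process would record the digest of a vetted binary.
	const pinnedSHA256 = "<hex digest of the vetted zstd binary>"
	if err := checkBinary(path, pinnedSHA256); err != nil {
		fmt.Println("refusing to run:", err)
		return
	}
	fmt.Println("would run:", zstdCommand(path, 19).Args)
}
```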

@ianlancetaylor
Contributor

@heschi Just a note that there is a package that we could vendor if we go that route: github.com/klauspost/compress/zstd.

@klauspost
Contributor

FWIW github.com/klauspost/compress/zstd compresses it to 43873902 bytes with the best compression setting. That is 43.87MB in ~8.3s.

But to be fair, it does use a bigger window size. Without the bigger window it is 49.83MB, but there isn't much reason to use the small window; if you are that resource-constrained, just use gz.

@rsc
Contributor

rsc commented Sep 11, 2023

As Heschi notes, the relevant code needs to live or be vendored into the Go tree so that we can reproduce the archives bit-for-bit even far into the future. We could do that, but it increases the cost. Shelling out to a separate tool that isn't versioned in the Go repo is not an option. We'd also have to update gorebuild to verify zstd as well.

In the long term we may end up with zstd vendored anyway, or perhaps even added to the standard library. I'm OK with vendoring it for use in cmd/dist.

That said, it will require work on the release team's part, and we may not have bandwidth for reviewing and deploying such a change in the near future. But in the abstract it sounds reasonable to me.

@heschi
Contributor

heschi commented Sep 11, 2023

If someone's interested in moving this forward, I think the steps are to vendor a zstd implementation, add support to cmd/distpack, and update our release automation to also publish the new files. If someone does the first two pieces I think the release team can find the time to do the latter.

There are two other kinds of artifacts not covered by this proposal: Windows distribution archives and toolchain module files, both .zip files. Wikipedia says that zip standardized zstd support a few years ago, so it's theoretically possible to make this change to both.

For Windows, it would be interesting to survey implementations and see how usable a more advanced compression would be.

For the toolchain module files, we'd need to teach the Go command to understand them, and (per discussion with Russ) probably start publishing a second series of archives, v0.0.2 rather than v0.0.1. Since toolchain upgrades will increasingly be done via the Go command, these are arguably the most important to optimize. But perhaps we should start by getting experience with the release archives.

@klauspost
Contributor

klauspost commented Sep 14, 2023

> Wikipedia says that zip standardized zstd support a few years ago, so it's theoretically possible to make this change to both.

Yeah, no. Using the Windows 11 built-in extraction tool on a ZIP file that uses zstd just gives "Error 0x80004005: Unspecified error". 90% of users will use that tool for extraction.
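
As a sketch of how one might check a ZIP file for entries that older extractors may reject, the following uses archive/zip to list entries whose compression method is neither Store nor Deflate. Method ID 93 is the Zstandard code in the ZIP format specification. The names `sampleZip` and `nonPortableEntries` are made up for this example, and the "compressor" registered for method 93 is a stand-in that just copies bytes (not real zstd), since only the header field is inspected here.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
)

// methodZstd is the ZIP compression method ID assigned to Zstandard.
const methodZstd uint16 = 93

// storeOnly is a stand-in compressor that copies bytes unchanged.
// It is NOT zstd; it only lets the example write entries tagged as method 93.
type storeOnly struct{ io.Writer }

func (storeOnly) Close() error { return nil }

// sampleZip builds a small archive with one Deflate entry and one entry
// tagged with the zstd method ID.
func sampleZip() ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	zw.RegisterCompressor(methodZstd, func(w io.Writer) (io.WriteCloser, error) {
		return storeOnly{w}, nil
	})
	entries := []struct {
		name   string
		method uint16
	}{
		{"a.txt", zip.Deflate},
		{"b.txt", methodZstd},
	}
	for _, e := range entries {
		w, err := zw.CreateHeader(&zip.FileHeader{Name: e.name, Method: e.method})
		if err != nil {
			return nil, err
		}
		if _, err := io.WriteString(w, "hello\n"); err != nil {
			return nil, err
		}
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// nonPortableEntries returns the names of entries whose compression method
// is something other than Store or Deflate.
func nonPortableEntries(data []byte) ([]string, error) {
	r, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return nil, err
	}
	var names []string
	for _, f := range r.File {
		if f.Method != zip.Store && f.Method != zip.Deflate {
			names = append(names, f.Name)
		}
	}
	return names, nil
}

func main() {
	data, err := sampleZip()
	if err != nil {
		panic(err)
	}
	names, err := nonPortableEntries(data)
	if err != nil {
		panic(err)
	}
	fmt.Println("entries using a method other than Store/Deflate:", names)
}
```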

@rsc
Contributor

rsc commented Dec 4, 2023

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc
Contributor

rsc commented Dec 6, 2023

Are there any objections to adding this?

@rsc
Contributor

rsc commented Dec 14, 2023

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

Add .tar.zst archives anywhere we generate .tar.gz archives in cmd/distpack.
We would not add zstd-enabled zip files because Windows zip readers can’t handle them.

In the longer term, this could be a step toward zstd-compressed modules,
but that would require changing many more moving parts and is not in scope
for this specific proposal.

@rsc
Contributor

rsc commented Dec 21, 2023

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

Add .tar.zst archives anywhere we generate .tar.gz archives in cmd/distpack.
We would not add zstd-enabled zip files because Windows zip readers can’t handle them.

In the longer term, this could be a step toward zstd-compressed modules,
but that would require changing many more moving parts and is not in scope
for this specific proposal.

rsc changed the title from "proposal: cmd/distpack: offer Zstandard-compressed archives in addition to gzip" to "cmd/distpack: offer Zstandard-compressed archives in addition to gzip" on Dec 21, 2023
rsc modified the milestones: Proposal, Backlog on Dec 21, 2023
@mvdan
Member

mvdan commented Feb 13, 2024

> In the longer term, this could be a step toward zstd-compressed modules,
> but that would require changing many more moving parts and is not in scope
> for this specific proposal.

Out of curiosity, would the thinking there be to keep the module archives as ZIP, but swap the compression algorithm to zstd, or to switch to something else entirely like .tar.zst?

The latter is more standard in terms of zstd compression, and will give a better compression ratio since all files are compressed together, but we would lose the ability to seek through files without decompressing. I suspect that's not a problem, given that GOPROXY serves go.mod files separately, and GOMODCACHE already extracts the entire module archives for use in cmd/go.
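
The ratio difference between per-file compression (as in ZIP) and compressing everything as one stream (as in .tar.zst) can be seen with a small experiment. This sketch uses compress/flate as a stand-in for zstd, since the standard library has no zstd encoder; the synthetic files and the helper names `deflatedSize`, `compare`, and `sampleFiles` are assumptions for illustration only.

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
)

// deflatedSize returns the number of bytes data occupies after DEFLATE.
func deflatedSize(data []byte) (int, error) {
	var buf bytes.Buffer
	w, err := flate.NewWriter(&buf, flate.BestCompression)
	if err != nil {
		return 0, err
	}
	if _, err := w.Write(data); err != nil {
		return 0, err
	}
	if err := w.Close(); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}

// compare returns the total size when each file is compressed on its own
// (ZIP-like) and the size when all files are compressed as one stream
// (tar-like).
func compare(files [][]byte) (int, int, error) {
	perFile := 0
	var all []byte
	for _, f := range files {
		n, err := deflatedSize(f)
		if err != nil {
			return 0, 0, err
		}
		perFile += n
		all = append(all, f...)
	}
	solid, err := deflatedSize(all)
	if err != nil {
		return 0, 0, err
	}
	return perFile, solid, nil
}

// sampleFiles generates n small, similar source files (synthetic data).
func sampleFiles(n int) [][]byte {
	files := make([][]byte, n)
	for i := range files {
		files[i] = []byte(fmt.Sprintf(
			"// Code generated for package p%d. DO NOT EDIT.\npackage p%d\n\nfunc Value() int { return %d }\n",
			i, i, i))
	}
	return files
}

func main() {
	perFile, solid, err := compare(sampleFiles(50))
	if err != nil {
		panic(err)
	}
	fmt.Printf("compressed per file: %d bytes, compressed together: %d bytes\n", perFile, solid)
}
```

Compressing the files together lets the encoder reuse matches across file boundaries, which is where the better ratio comes from; the cost is that a single file can no longer be extracted without decompressing the data before it.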
