
compress/flate: decompression performance #23154

Open
flx42 opened this issue Dec 16, 2017 · 12 comments

flx42 commented Dec 16, 2017

What version of Go are you using (go version)?

go version go1.9.2 linux/amd64

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/fabecassis/go"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build767750319=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"

What did you do?

I noticed that decompression with "compress/gzip" is close to 2x slower than the gunzip(1) command-line tool. I've tested multiple methods of performing the decompression in this simple repository:
https://github.com/flx42/go-gunzip-bench
In summary, on a 333 MB gzipped file, for the read-decompress-write process:

  • The idiomatic Go implementation takes 7.8s.
  • Piping to gunzip(1) takes 4.4s.
  • Using cgzip, a cgo wrapper on top of zlib, takes 3.4s.

Is there any room for improvement here for the standard implementation? The use case is decompressing downloaded Docker layers in Go.
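For reference, a minimal sketch of the idiomatic approach measured above (file names are placeholders; error handling trimmed to log.Fatal for brevity):

```go
package main

import (
	"compress/gzip"
	"io"
	"log"
	"os"
)

func main() {
	in, err := os.Open("layer.tar.gz") // placeholder input path
	if err != nil {
		log.Fatal(err)
	}
	defer in.Close()

	zr, err := gzip.NewReader(in)
	if err != nil {
		log.Fatal(err)
	}
	defer zr.Close()

	out, err := os.Create("layer.tar") // placeholder output path
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	// io.Copy drives the decompressor; essentially all of the time
	// measured above is spent inside this call.
	if _, err := io.Copy(out, zr); err != nil {
		log.Fatal(err)
	}
}
```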

flx42 commented Dec 16, 2017

@klauspost FYI

bradfitz added this to the Unplanned milestone Dec 16, 2017
bradfitz added the help wanted and NeedsInvestigation labels Dec 16, 2017
@agnivade (Contributor)

/cc @dsnet

@dongweigogo

I see some complaints about it elsewhere.

dsnet commented Dec 16, 2017

There's a large amount of room for improvement. I re-implemented the decompressor in github.com/dsnet/compress/flate. I have yet to merge my changes into the standard library, but it's about 1.6x to 2.3x faster. You can use that package if you want.
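Usage is a near drop-in for the stdlib. A minimal sketch, assuming the NewReader(r, *ReaderConfig) constructor shape used elsewhere in dsnet/compress (check the package godoc for the exact signature):

```go
import (
	"io"

	"github.com/dsnet/compress/flate"
)

// inflate decompresses raw DEFLATE data from src into dst using the
// github.com/dsnet/compress implementation instead of compress/flate.
func inflate(dst io.Writer, src io.Reader) error {
	zr, err := flate.NewReader(src, nil) // nil: default ReaderConfig
	if err != nil {
		return err
	}
	defer zr.Close()
	_, err = io.Copy(dst, zr)
	return err
}
```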

dsnet commented Dec 16, 2017

The biggest bang for the buck is to get around the io.ByteReader interface that flate.Reader promises to uphold. This is problematic since it limits the input bandwidth to about 200 MB/s in the best case.

The approach I took was to special-case the fact that most io.Readers don't satisfy io.ByteReader and so will be automatically wrapped in a bufio.Reader. Thus, I type-assert the input reader to check whether it has the Peek and Discard methods and use those to process as much of the known input as possible. This allows you to take advantage of the new math/bits package to parallelize certain operations across multiple bytes. My implementation was written before math/bits, so I think there are yet more speed benefits to be had beyond its current state.
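A minimal sketch of that special case (names are illustrative, not the real decoder's):

```go
import (
	"bufio"
	"io"
)

// peeker is the subset of *bufio.Reader's API the fast path needs.
// Anything with Peek and Discard qualifies, not just bufio.Reader.
type peeker interface {
	Peek(n int) ([]byte, error)
	Discard(n int) (int, error)
}

func newDecoderInput(r io.Reader) peeker {
	if p, ok := r.(peeker); ok {
		return p // caller already gave us bulk access
	}
	// Most readers land here and get wrapped exactly once; the decoder
	// then uses Peek to look at a large chunk, decodes as many symbols
	// as fit, and Discards precisely the bytes consumed -- instead of
	// paying an interface call per byte through io.ByteReader.
	return bufio.NewReader(r)
}
```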

Unfortunately, bytes.Buffer, bytes.Reader, and strings.Reader also implement io.ByteReader, so they will also need some wrapping to be performant.

Some of the relevant performance changes:

flx42 commented Dec 16, 2017

Thanks @dsnet, do you have any ETA yet on when you will start to integrate your optimizations into the standard library?

I know that more optimized packages exist, but projects might be reluctant to add an external dependency for this task.

dsnet changed the title from "compress/gzip: decompression performance" to "compress/flate: decompression performance" Dec 16, 2017

dsnet commented Dec 16, 2017

The biggest blocker is finding time, both for myself and for a reviewer who's also familiar with RFC 1951. It's possible to get a new implementation in for Go 1.11, but who knows. Compression needs to be reviewed carefully.

/cc @mdempsky @nigeltao @klauspost. Any takers as reviewers?

dsnet removed the NeedsInvestigation label Dec 16, 2017

klauspost commented Dec 17, 2017

I have added a few minor comments to the commits listed above.

You could consider supporting the io.WriterTo interface; it avoids a copy in the Read path.
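For illustration, the shape of such a WriteTo on a decompressor. io.Copy checks for io.WriterTo and calls it when present, so decompressed data goes straight to the destination; decodeBlock here is a hypothetical internal standing in for the real decode loop:

```go
// WriteTo lets io.Copy stream decompressed data straight into w,
// skipping the caller-supplied buffer that Read would have to fill.
func (r *Reader) WriteTo(w io.Writer) (int64, error) {
	var n int64
	for {
		buf, err := r.decodeBlock() // hypothetical: decode into the window
		if len(buf) > 0 {
			nw, werr := w.Write(buf)
			n += int64(nw)
			if werr != nil {
				return n, werr
			}
		}
		if err == io.EOF {
			return n, nil // clean end of stream
		}
		if err != nil {
			return n, err
		}
	}
}
```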

I assume you have already fuzzed it :)

@nigeltao (Contributor)

It's not production-ready yet, and won't be any time soon, but I have been working on https://github.com/google/wuffs, a new, memory-safe programming language that generates C code today and could generate Go code (using package unsafe) in the future. The kicker is that despite being memory-safe (e.g. all array accesses are bounds-checked), its DEFLATE implementation benchmarks more or less as fast as the C zlib library (see https://github.com/google/wuffs/blob/master/doc/benchmarks.md).

A rough estimate (different measurement frameworks, different test data) is that it is therefore about 4x faster at decoding than Go's compress/flate: on my desktop, Wuffs does 300 MB/s versus 75 MB/s for "go test -test.bench=. compress/flate". There are a couple of further optimization techniques (madler/zlib#292, which I proposed for C zlib but had to abandon because of API backwards-compatibility constraints) that I'd expect to give an extra 1.5x improvement to Wuffs' throughput.

It's still a work in progress, and won't help Go programs any time soon. But that's where I'm spending my limited time these days.


nigeltao commented Dec 21, 2017

As for APIs (io.ByteReader, io.WriterTo, etc.), Wuffs' API is similar to Go's golang.org/x/text/transform.Transformer. You wouldn't be able to just drop that interface (and its specific errors) into the stdlib per se, since the stdlib can't depend on x, but there may be some ideas in there worth considering.
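For context, this is the interface in question (from golang.org/x/text/transform); it is slice-based rather than stream-based, and signals flow control through dedicated errors such as ErrShortDst and ErrShortSrc:

```go
// Transformer transforms bytes from src into dst until either src is
// exhausted or dst is full, reporting how much of each was used.
type Transformer interface {
	Transform(dst, src []byte, atEOF bool) (nDst, nSrc int, err error)

	// Reset restores the Transformer to its initial state so it can
	// be reused for a new stream.
	Reset()
}
```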


flx42 commented Dec 21, 2017

@nigeltao This project looks amazing. Even if it probably won't happen soon, having a new implementation under golang.org/x/ would already be a big step toward convincing projects to adopt it, compared to non-official projects on GitHub.


flx42 commented Aug 2, 2018

FWIW, I'm seeing a nice performance improvement with Go 1.11 beta 2: down to 6.8s, from roughly 8s originally with 1.9.2.

I'm not sure whether this comes from optimizations in the deflate code or from general compiler improvements.

cyphar added a commit to cyphar/umoci that referenced this issue Aug 3, 2018
One of the largest resource hogs in umoci is compression-related, and it
turns out[1] that Go's ordinary gzip implementation is nowhere near as
fast as other modern gzip implementations. In order to help our users
get more efficient builds, switch to a "pure-Go" implementation which
has x64-specific optimisations. We cannot use zlib wrappers like [2]
because of issues with "os/user" and static compilation (so we wouldn't
be able to release binaries).

This very simplified benchmark shows the positive difference when
switching libraries (using "tensorflow/tensorflow:latest" as the image
base):

  % time umoci unpack --image tensorflow:latest bundle # before
  39.03user 7.58system 0:45.16elapsed
  % time umoci unpack --image tensorflow:latest bundle # after
  40.89user 7.99system 0:34.89elapsed

But the real benefit is when it comes to compression and repacking (in
this benchmark the changes were installing Firefox and doing a repo
refresh):

  % time umoci repack --image tensorflow:latest bundle # before
  78.54user 13.71system 1:26.66elapsed
  % time umoci repack --image tensorflow:latest bundle # after
  51.14user 3.25system 0:30.30elapsed

That's almost 3x faster, just by having a more optimised compression
library!

[1]: golang/go#23154
[2]: https://github.com/vitessio/vitess/tree/v2.2/go/cgzip

Signed-off-by: Aleksa Sarai <asarai@suse.de>
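Because the optimized pure-Go implementations mirror the stdlib API, this kind of switch is usually just an import swap. A sketch using github.com/klauspost/compress/gzip as one such library (shown for illustration; the commit above doesn't name the exact package it adopted):

```go
import (
	"io"

	// Drop-in replacement: same exported API as compress/gzip.
	gzip "github.com/klauspost/compress/gzip"
)

func gunzip(dst io.Writer, src io.Reader) error {
	zr, err := gzip.NewReader(src)
	if err != nil {
		return err
	}
	defer zr.Close()
	_, err = io.Copy(dst, zr)
	return err
}
```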