Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compress/flate: Add optimizations and re+balance. #14258

Closed
klauspost opened this issue Feb 8, 2016 · 5 comments
Closed

compress/flate: Add optimizations and re+balance. #14258

klauspost opened this issue Feb 8, 2016 · 5 comments

Comments

@klauspost
Copy link
Contributor

We should merge optimizations added to https://github.com/klauspost/compress

Typical speedup is around 1.7x for web content, when maintaining roughly the same compression.

With re-balanced compression settings, the default level will be 3.1x faster at a 7% compression loss. Mixed content (disk images) show a 3.1x speedup with a 3% compression loss.

Extensive benchmarks in this google spreadsheet.

@minux minux added this to the Go1.7 milestone Feb 8, 2016
@gopherbot
Copy link

CL https://golang.org/cl/19336 mentions this issue.

@gopherbot
Copy link

CL https://golang.org/cl/20513 mentions this issue.

gopherbot pushed a commit that referenced this issue Mar 11, 2016
Part 1 of optimizing the deflater. This optimizes the bitwriter by:

* Removing allocations.
* Storing compound values for bit codes instead of 2 separate tables.
* Accumulate 48 bits between writes instead of 24.
* Inline bit flushing.

This also contains code that will be used in later CL's
(writeBlockDynamic, writeBlockHuff).

Tests for Huffman bit writer encoding regressions has been added.

name                       old speed      new speed      delta
EncodeDigitsSpeed1e4-4     19.3MB/s ± 1%  21.6MB/s ± 1%  +11.77%
EncodeDigitsSpeed1e5-4     25.0MB/s ± 6%  30.7MB/s ± 1%  +22.70%
EncodeDigitsSpeed1e6-4     28.2MB/s ± 1%  32.3MB/s ± 1%  +14.64%
EncodeDigitsDefault1e4-4   13.3MB/s ± 0%  14.2MB/s ± 1%   +7.07%
EncodeDigitsDefault1e5-4   6.43MB/s ± 1%  6.64MB/s ± 1%   +3.27%
EncodeDigitsDefault1e6-4   5.81MB/s ± 0%  5.85MB/s ± 1%   +0.69%
EncodeDigitsCompress1e4-4  13.2MB/s ± 0%  14.4MB/s ± 0%   +9.10%
EncodeDigitsCompress1e5-4  6.40MB/s ± 1%  6.61MB/s ± 0%   +3.20%
EncodeDigitsCompress1e6-4  5.80MB/s ± 1%  5.90MB/s ± 1%   +1.64%
EncodeTwainSpeed1e4-4      18.4MB/s ± 1%  20.7MB/s ± 1%  +12.72%
EncodeTwainSpeed1e5-4      27.7MB/s ± 1%  31.0MB/s ± 1%  +11.78%
EncodeTwainSpeed1e6-4      29.1MB/s ± 0%  32.9MB/s ± 2%  +13.25%
EncodeTwainDefault1e4-4    12.4MB/s ± 0%  13.1MB/s ± 1%   +5.88%
EncodeTwainDefault1e5-4    7.52MB/s ± 1%  7.83MB/s ± 0%   +4.19%
EncodeTwainDefault1e6-4    7.08MB/s ± 1%  7.26MB/s ± 0%   +2.54%
EncodeTwainCompress1e4-4   12.0MB/s ± 1%  12.8MB/s ± 1%   +6.70%
EncodeTwainCompress1e5-4   5.96MB/s ± 1%  6.16MB/s ± 0%   +3.27%
EncodeTwainCompress1e6-4   5.37MB/s ± 0%  5.39MB/s ± 1%   +0.47%

>Allocations:

benchmark                              old allocs     new allocs     delta
BenchmarkEncodeDigitsSpeed1e4-4        50             0              -100.00%
BenchmarkEncodeDigitsSpeed1e5-4        110            0              -100.00%
BenchmarkEncodeDigitsSpeed1e6-4        1032           0              -100.00%
BenchmarkEncodeDigitsDefault1e4-4      56             0              -100.00%
BenchmarkEncodeDigitsDefault1e5-4      120            0              -100.00%
BenchmarkEncodeDigitsDefault1e6-4      966            0              -100.00%
BenchmarkEncodeDigitsCompress1e4-4     56             0              -100.00%
BenchmarkEncodeDigitsCompress1e5-4     120            0              -100.00%
BenchmarkEncodeDigitsCompress1e6-4     966            0              -100.00%
BenchmarkEncodeTwainSpeed1e4-4         58             0              -100.00%
BenchmarkEncodeTwainSpeed1e5-4         132            0              -100.00%
BenchmarkEncodeTwainSpeed1e6-4         1082           0              -100.00%
BenchmarkEncodeTwainDefault1e4-4       52             0              -100.00%
BenchmarkEncodeTwainDefault1e5-4       126            0              -100.00%
BenchmarkEncodeTwainDefault1e6-4       886            0              -100.00%
BenchmarkEncodeTwainCompress1e4-4      52             0              -100.00%
BenchmarkEncodeTwainCompress1e5-4      120            0              -100.00%
BenchmarkEncodeTwainCompress1e6-4      880            0              -100.00%

benchmark                              old bytes     new bytes     delta
BenchmarkEncodeDigitsSpeed1e4-4        4288          2             -99.95%
BenchmarkEncodeDigitsSpeed1e5-4        8896          15            -99.83%
BenchmarkEncodeDigitsSpeed1e6-4        84098         153           -99.82%
BenchmarkEncodeDigitsDefault1e4-4      4480          3             -99.93%
BenchmarkEncodeDigitsDefault1e5-4      9216          76            -99.18%
BenchmarkEncodeDigitsDefault1e6-4      73920         768           -98.96%
BenchmarkEncodeDigitsCompress1e4-4     4480          3             -99.93%
BenchmarkEncodeDigitsCompress1e5-4     9216          76            -99.18%
BenchmarkEncodeDigitsCompress1e6-4     73920         768           -98.96%
BenchmarkEncodeTwainSpeed1e4-4         4544          2             -99.96%
BenchmarkEncodeTwainSpeed1e5-4         9600          15            -99.84%
BenchmarkEncodeTwainSpeed1e6-4         77633         153           -99.80%
BenchmarkEncodeTwainDefault1e4-4       4352          3             -99.93%
BenchmarkEncodeTwainDefault1e5-4       9408          76            -99.19%
BenchmarkEncodeTwainDefault1e6-4       65984         768           -98.84%
BenchmarkEncodeTwainCompress1e4-4      4352          3             -99.93%
BenchmarkEncodeTwainCompress1e5-4      9216          76            -99.18%
BenchmarkEncodeTwainCompress1e6-4      65792         768           -98.83%

Updates #14258

Change-Id: Ibaa97b9619743ad623094727228eb2ada1ec7f1f
Reviewed-on: https://go-review.googlesource.com/19336
Reviewed-by: Nigel Tao <nigeltao@golang.org>
Reviewed-by: Joe Tsai <joetsai@digital-static.net>
Run-TryBot: Joe Tsai <joetsai@digital-static.net>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@rsc
Copy link
Contributor

rsc commented May 18, 2016

@nigeltao, @klauspost, what is left here? Did the rebalancing happen?

@nigeltao
Copy link
Contributor

nigeltao commented May 18, 2016

Some optimizations landed. Others didn't make the code freeze deadline. @klauspost would probably know better than me what's out that he'd still like to bring over from https://github.com/klauspost/compress

Re-balancing, in terms of tweaking what parameter settings correspond to "compression level == 6", didn't happen, but when compared to Go 1.6, I believe that compression is still generally faster on GOARCH=amd64 at the default setting. I don't have any numbers on me right now.

Also, "compression level == 1" aka "best speed" now uses the Snappy algorithm, for higher throughput but larger encoded size. That encoding size can come down with further algorithmic changes, but there was a correctness bug found in the fancier algorithm just before code freeze, so that will have to wait for Go 1.8.

Issue #15622 is that compress/flate's minimum LZW-style match length increased from 3 bytes to 4 bytes, as 32-bit ops can be a lot faster. However, compressing RGB images can often want 3-byte matches, corresponding to an RGB tuple. We're still investigating how big a regression this is: whether it's "TIFF and PNG images are consistently e.g. 10% to 20% larger" or if it's "some are larger, some are smaller, and it's mostly a wash, although you can always find examples where it's significantly worse", or if it's somewhere in between. The current proposal is to re-introduce 3-byte matches for higher levels of the compression size-vs-throughput knob, although that will be more (as-yet-unwritten-and-untested) code.

I don't think there are any open CLs. Please correct me if I'm wrong.

@klauspost
Copy link
Contributor Author

I think the overall goal has been achieved, with some details left.

Did the rebalancing happen?

No - I haven't tested the current build, but I suspect it could need a re-balance and that "standard" is quite close to "best" in terms of both compression and speed.

There are a few things that I would like to look at in the future, mainly:

  • Split lazy/non-lazy matching, and add 3-length matching to the lazy matcher as an elegant fix for compress/flate: add 3-byte matching for higher compression levels #15622. Not sure if Nigel agrees through.
  • Re-balance. If lazy is made capable of 3 byte matches, that will be even more relevant, and "lazy" belongs to level 7-9 or thereabout.
  • The token storage can be made to avoid bounds checks. This is trivial and can be quite nicely implmented, but I couldn't make the 1.7 deadline.

I don't think there are any open CLs. Please correct me if I'm wrong.

Correct, AFAIK.

@golang golang locked and limited conversation to collaborators May 18, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants