New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
compress/flate: Add optimizations and re+balance. #14258
Comments
CL https://golang.org/cl/19336 mentions this issue. |
CL https://golang.org/cl/20513 mentions this issue. |
Part 1 of optimizing the deflater. This optimizes the bitwriter by: * Removing allocations. * Storing compound values for bit codes instead of 2 separate tables. * Accumulate 48 bits between writes instead of 24. * Inline bit flushing. This also contains code that will be used in later CL's (writeBlockDynamic, writeBlockHuff). Tests for Huffman bit writer encoding regressions has been added. name old speed new speed delta EncodeDigitsSpeed1e4-4 19.3MB/s ± 1% 21.6MB/s ± 1% +11.77% EncodeDigitsSpeed1e5-4 25.0MB/s ± 6% 30.7MB/s ± 1% +22.70% EncodeDigitsSpeed1e6-4 28.2MB/s ± 1% 32.3MB/s ± 1% +14.64% EncodeDigitsDefault1e4-4 13.3MB/s ± 0% 14.2MB/s ± 1% +7.07% EncodeDigitsDefault1e5-4 6.43MB/s ± 1% 6.64MB/s ± 1% +3.27% EncodeDigitsDefault1e6-4 5.81MB/s ± 0% 5.85MB/s ± 1% +0.69% EncodeDigitsCompress1e4-4 13.2MB/s ± 0% 14.4MB/s ± 0% +9.10% EncodeDigitsCompress1e5-4 6.40MB/s ± 1% 6.61MB/s ± 0% +3.20% EncodeDigitsCompress1e6-4 5.80MB/s ± 1% 5.90MB/s ± 1% +1.64% EncodeTwainSpeed1e4-4 18.4MB/s ± 1% 20.7MB/s ± 1% +12.72% EncodeTwainSpeed1e5-4 27.7MB/s ± 1% 31.0MB/s ± 1% +11.78% EncodeTwainSpeed1e6-4 29.1MB/s ± 0% 32.9MB/s ± 2% +13.25% EncodeTwainDefault1e4-4 12.4MB/s ± 0% 13.1MB/s ± 1% +5.88% EncodeTwainDefault1e5-4 7.52MB/s ± 1% 7.83MB/s ± 0% +4.19% EncodeTwainDefault1e6-4 7.08MB/s ± 1% 7.26MB/s ± 0% +2.54% EncodeTwainCompress1e4-4 12.0MB/s ± 1% 12.8MB/s ± 1% +6.70% EncodeTwainCompress1e5-4 5.96MB/s ± 1% 6.16MB/s ± 0% +3.27% EncodeTwainCompress1e6-4 5.37MB/s ± 0% 5.39MB/s ± 1% +0.47% >Allocations: benchmark old allocs new allocs delta BenchmarkEncodeDigitsSpeed1e4-4 50 0 -100.00% BenchmarkEncodeDigitsSpeed1e5-4 110 0 -100.00% BenchmarkEncodeDigitsSpeed1e6-4 1032 0 -100.00% BenchmarkEncodeDigitsDefault1e4-4 56 0 -100.00% BenchmarkEncodeDigitsDefault1e5-4 120 0 -100.00% BenchmarkEncodeDigitsDefault1e6-4 966 0 -100.00% BenchmarkEncodeDigitsCompress1e4-4 56 0 -100.00% BenchmarkEncodeDigitsCompress1e5-4 120 0 -100.00% BenchmarkEncodeDigitsCompress1e6-4 966 0 -100.00% BenchmarkEncodeTwainSpeed1e4-4 58 0 -100.00% BenchmarkEncodeTwainSpeed1e5-4 132 0 -100.00% BenchmarkEncodeTwainSpeed1e6-4 1082 0 -100.00% BenchmarkEncodeTwainDefault1e4-4 52 0 -100.00% BenchmarkEncodeTwainDefault1e5-4 126 0 -100.00% BenchmarkEncodeTwainDefault1e6-4 886 0 -100.00% BenchmarkEncodeTwainCompress1e4-4 52 0 -100.00% BenchmarkEncodeTwainCompress1e5-4 120 0 -100.00% BenchmarkEncodeTwainCompress1e6-4 880 0 -100.00% benchmark old bytes new bytes delta BenchmarkEncodeDigitsSpeed1e4-4 4288 2 -99.95% BenchmarkEncodeDigitsSpeed1e5-4 8896 15 -99.83% BenchmarkEncodeDigitsSpeed1e6-4 84098 153 -99.82% BenchmarkEncodeDigitsDefault1e4-4 4480 3 -99.93% BenchmarkEncodeDigitsDefault1e5-4 9216 76 -99.18% BenchmarkEncodeDigitsDefault1e6-4 73920 768 -98.96% BenchmarkEncodeDigitsCompress1e4-4 4480 3 -99.93% BenchmarkEncodeDigitsCompress1e5-4 9216 76 -99.18% BenchmarkEncodeDigitsCompress1e6-4 73920 768 -98.96% BenchmarkEncodeTwainSpeed1e4-4 4544 2 -99.96% BenchmarkEncodeTwainSpeed1e5-4 9600 15 -99.84% BenchmarkEncodeTwainSpeed1e6-4 77633 153 -99.80% BenchmarkEncodeTwainDefault1e4-4 4352 3 -99.93% BenchmarkEncodeTwainDefault1e5-4 9408 76 -99.19% BenchmarkEncodeTwainDefault1e6-4 65984 768 -98.84% BenchmarkEncodeTwainCompress1e4-4 4352 3 -99.93% BenchmarkEncodeTwainCompress1e5-4 9216 76 -99.18% BenchmarkEncodeTwainCompress1e6-4 65792 768 -98.83% Updates #14258 Change-Id: Ibaa97b9619743ad623094727228eb2ada1ec7f1f Reviewed-on: https://go-review.googlesource.com/19336 Reviewed-by: Nigel Tao <nigeltao@golang.org> Reviewed-by: Joe Tsai <joetsai@digital-static.net> Run-TryBot: Joe Tsai <joetsai@digital-static.net> TryBot-Result: Gobot Gobot <gobot@golang.org>
@nigeltao, @klauspost, what is left here? Did the rebalancing happen? |
Some optimizations landed. Others didn't make the code freeze deadline. @klauspost would probably know better than me what's out that he'd still like to bring over from https://github.com/klauspost/compress Re-balancing, in terms of tweaking what parameter settings correspond to "compression level == 6", didn't happen, but when compared to Go 1.6, I believe that compression is still generally faster on GOARCH=amd64 at the default setting. I don't have any numbers on me right now. Also, "compression level == 1" aka "best speed" now uses the Snappy algorithm, for higher throughput but larger encoded size. That encoding size can come down with further algorithmic changes, but there was a correctness bug found in the fancier algorithm just before code freeze, so that will have to wait for Go 1.8. Issue #15622 is that compress/flate's minimum LZW-style match length increased from 3 bytes to 4 bytes, as 32-bit ops can be a lot faster. However, compressing RGB images can often want 3-byte matches, corresponding to an RGB tuple. We're still investigating how big a regression this is: whether it's "TIFF and PNG images are consistently e.g. 10% to 20% larger" or if it's "some are larger, some are smaller, and it's mostly a wash, although you can always find examples where it's significantly worse", or if it's somewhere in between. The current proposal is to re-introduce 3-byte matches for higher levels of the compression size-vs-throughput knob, although that will be more (as-yet-unwritten-and-untested) code. I don't think there are any open CLs. Please correct me if I'm wrong. |
I think the overall goal has been achieved, with some details left.
No - I haven't tested the current build, but I suspect it could need a re-balance and that "standard" is quite close to "best" in terms of both compression and speed. There are a few things that I would like to look at in the future, mainly:
Correct, AFAIK. |
We should merge optimizations added to https://github.com/klauspost/compress
Typical speedup is around 1.7x for web content, when maintaining roughly the same compression.
With re-balanced compression settings, the default level will be 3.1x faster at a 7% compression loss. Mixed content (disk images) show a 3.1x speedup with a 3% compression loss.
Extensive benchmarks in this google spreadsheet.
The text was updated successfully, but these errors were encountered: