compress/gzip: use bufio to improve compress speed #56449
Comments
It's straightforward for your code to wrap
I'm not sure I understand why it's flushing every 240 bytes. The compressor already buffers a relatively large window before flushing, since there is a built-in buffer of at least 32KiB of uncompressed input. Of course, that's for the uncompressed input, not the compressed output. Is the input so highly compressible that a large chunk of data comes out as only a small amount of output?
Assuming this issue is because the data is highly compressible, I don't think we should add an output buffer automatically since memory use of DEFLATE compressors is something that matters for servers with many concurrent writers.
FYI, I think go/src/compress/flate/huffman_bit_writer.go Lines 25 to 29 in 537c435
Thank you for the reply! I'm confused that in
The buffer size is of course rather arbitrary, but I think, as with most other packages, it is reasonable for users to apply the buffering that matches their use case. If you are writing to a Deflate is already rather memory intensive. I think leaving the option to add a buffer to the user is the correct approach. The buffer is big enough that writes to memory output aren't affected.
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
go env Output
What did you do?
Use gzip to compress, for example:
In my case, the target tar file dst is on a device which has high I/O latency.
What did you expect to see?
TarGz() and the Linux command tar -czf .... take about the same time.
What did you see instead?
The result shows that, compared with tar -czf ..., TarGz() is 8x slower.
Tracing the code, I found that gzip writes to the file every 240 bytes, which is inefficient in my case.
Thus, I recommend using a bufio writer as a buffered writer to avoid issuing too many I/O requests while compressing. It is just like what is done in compress/lzw:
With bufio, the dst file is written every 4096 bytes (compared to 240 bytes), which reduces the number of I/O requests and gives a significant improvement in my case.
Furthermore, it makes the size of the I/O requests stable and independent of the implementation of the given compression algorithm.
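One way to check the write pattern described above is to compress into a writer that only counts calls. This is a rough sketch, not the reporter's TarGz() code: countingWriter, gzipInto, and sampleData are hypothetical helpers, and the input is synthetic, moderately compressible data rather than a real tar stream.

```go
package main

import (
	"bufio"
	"compress/gzip"
	"fmt"
	"io"
	"math/rand"
)

// countingWriter discards data but records how many Write calls it received,
// the total number of bytes, and the largest single write.
type countingWriter struct {
	calls, total, largest int
}

func (c *countingWriter) Write(p []byte) (int, error) {
	c.calls++
	c.total += len(p)
	if len(p) > c.largest {
		c.largest = len(p)
	}
	return len(p), nil
}

// gzipInto compresses data into w, optionally through a bufio.Writer
// with the default buffer size (4096 bytes).
func gzipInto(w io.Writer, data []byte, buffered bool) error {
	var bw *bufio.Writer
	if buffered {
		bw = bufio.NewWriter(w)
		w = bw
	}
	zw := gzip.NewWriter(w)
	if _, err := zw.Write(data); err != nil {
		return err
	}
	if err := zw.Close(); err != nil {
		return err
	}
	if bw != nil {
		return bw.Flush()
	}
	return nil
}

// sampleData returns n deterministic bytes drawn from a 16-letter alphabet,
// which compresses moderately well. This is an assumption standing in for real input.
func sampleData(n int) []byte {
	r := rand.New(rand.NewSource(1))
	b := make([]byte, n)
	for i := range b {
		b[i] = byte('a' + r.Intn(16))
	}
	return b
}

func main() {
	data := sampleData(1 << 20)
	var direct, buffered countingWriter
	if err := gzipInto(&direct, data, false); err != nil {
		panic(err)
	}
	if err := gzipInto(&buffered, data, true); err != nil {
		panic(err)
	}
	fmt.Printf("direct:   %d writes, largest %d bytes\n", direct.calls, direct.largest)
	fmt.Printf("buffered: %d writes, largest %d bytes\n", buffered.calls, buffered.largest)
}
```

Both runs emit the same compressed bytes; only the number and size of the writes reaching the destination differ. The exact counts depend on the input and on the Go version, so they are worth measuring on the real workload.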