compress/bzip2: Slow performance #6754

Closed
tomc603 opened this issue Nov 13, 2013 · 16 comments

Comments

@tomc603

tomc603 commented Nov 13, 2013

When decompressing file data from a bz2 compressed file, Go is much slower than other
popular languages' implementations.

For the simple sample program at http://play.golang.org/p/e0N9J8fsvz, I've included pprof
output for Go 1.1 and 1.2rc2. The program walks a directory of bzip2-compressed text log
files, opens each file, and passes the reader to a function that performs an io.Copy() to
ioutil.Discard. The results are almost exactly the same when processing each text line
from the reader with bufio.NewScanner().

Using large bufio readers or no bufio handling at all does not impact the overall
performance of the bzip2 functionality.

This test was conducted on 64bit Ubuntu Linux 13.10 using 64bit Go binaries. Go 1.2rc2
was downloaded directly from the Go site, whereas 1.1 was installed as an Ubuntu package.

Attachments:

  1. bztest-1.2rc2-discard.txt (5862 bytes)
  2. bztest-1.1-discard.txt (5859 bytes)
@robpike
Contributor

robpike commented Nov 20, 2013

Comment 1:

Can you quantify 'much slower'?

Status changed to WaitingForReply.

@tomc603
Author

tomc603 commented Nov 25, 2013

Comment 2:

Sorry for the long delay. Work got in the way of writing a comparison between Go and
Python.

I have two test scenarios. The first is a 1GB file of data from /dev/zero, bzip2
compressed. The second is a 1GB file of data from /dev/urandom, also bzip2 compressed.
The first should be the best case, since all of the data is run-length encoded and the
compressed file is a few hundred bytes. The second should be the worst case, where the
data is not generally compressible and the compressed file is larger than the source.

Results:

Decompressing /home/tcameron/tmp/decompress/zeros.data.bz2
Go 1.1 Decompress time: 3.212 sec
Py 2.7 Decompress time: 3.070 sec

Decompressing /home/tcameron/tmp/decompress/random.data.bz2
Go 1.1 Decompress time: 528.765 sec
Py 2.7 Decompress time: 104.724 sec

Let's call the zeros.data.bz2 test even. A difference of milliseconds on this file does
not really interest me. It is worth noting that Python's version is faster, but by less
than a quarter of a second. This could be down to lots of things, and I'm not necessarily
interested in tracking them down.

The random.data.bz2 test is much more enlightening. Go being slower by a factor of more
than 5 is surprising to me, and its time equates to roughly 1.9MB/sec. I understand there
hasn't been much effort to optimize the bzip2 library for speed, so I figured my
real-world experience could be used to help the project in some way.
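
The throughput and slowdown figures can be checked with a few lines of arithmetic. This is only a sketch: it assumes "1GB" means 2^30 bytes of decompressed data and "MB" means 2^20 bytes, and the function name `throughputMBps` is made up for the example.

```go
package main

import "fmt"

// throughputMBps returns decompressed data per second in units of 2^20 bytes.
// (Hypothetical helper; the unit convention is an assumption.)
func throughputMBps(bytes int64, seconds float64) float64 {
	return float64(bytes) / (1 << 20) / seconds
}

func main() {
	const size = 1 << 30 // assumption: the 1GB test file is 2^30 bytes

	// Timings reported for random.data.bz2 in this comment.
	goSec, pySec := 528.765, 104.724

	fmt.Printf("Go 1.1: %.2f MB/sec\n", throughputMBps(size, goSec))
	fmt.Printf("Py 2.7: %.2f MB/sec\n", throughputMBps(size, pySec))
	fmt.Printf("Go/Python time ratio: %.2f\n", goSec/pySec)
}
```

Under those assumptions Go works out to a little over 1.9 MB/sec, Python to a little under 10 MB/sec, and the ratio to just over 5.
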
My actual use case is a syslog file parser, which I've been writing to replace a Python
script I previously wrote and to drive the lessons of Go into my brain. I see very
similar results with text file processing, but since I cannot offer the text files
themselves for others to test with, I've tried something a bit more reproducible.

These tests are being performed on a Lenovo T430 with an SSD, an Intel Core i5-3320M CPU
@ 2.60GHz, and 8GB RAM while plugged into an AC power source. The operating system is
Ubuntu 13.10 with kernel 3.11.0-13-generic, x86_64 architecture.

To review the source of each test application, please see my GitHub repos:
https://github.com/tomc603/pycompresstest
https://github.com/tomc603/gocompresstest

@remyoudompheng
Contributor

Comment 3:

Go 1.2 is about 30% faster (revision cf3ee583c568). We're not there yet, but it's already
better. Can you also have a look at it?

@remyoudompheng
Contributor

Comment 4:

Can you also describe how you produced random.data.bz2? Thanks.

@tomc603
Author

tomc603 commented Nov 25, 2013

Comment 5:

To produce random.data.bz2:
dd if=/dev/urandom of=random.data; bzip2 random.data
I will check out a newer revision of 1.2 and test again, but from previous
results discussed in the gonuts mailing list, there was a small difference.
Thanks all!

@tomc603
Author

tomc603 commented Nov 26, 2013

Comment 6:

After running the same tests with Go 1.2rc5 a couple of times just to confirm I'm not
crazy (still a possibility though), it seems that run-length-encoded data actually takes
about twice as long to decompress as it does with Go 1.1. For these particular tests, I'm
not seeing a 30% increase in speed, though I'm exercising the two most extreme cases.
Results:
Decompressing /home/tcameron/tmp/decompress/zeros.data.bz2
Go 1.1    Decompress time: 3.000 sec
Go 1.2rc5 Decompress time: 6.612 sec
Decompressing /home/tcameron/tmp/decompress/random.data.bz2
Go 1.1    Decompress time: 534.020 sec
Go 1.2rc5 Decompress time: 499.078 sec

@rsc
Contributor

rsc commented Nov 27, 2013

Comment 7:

Labels changed: added go1.3maybe.

@dsymonds
Contributor

Comment 8:

Labels changed: added performance.

@rsc
Contributor

rsc commented Dec 4, 2013

Comment 9:

Labels changed: added release-none, removed go1.3maybe.

@rsc
Contributor

rsc commented Dec 4, 2013

Comment 10:

Labels changed: added repo-main.

@davecheney
Contributor

Comment 11:

Status changed to Accepted.

@gopherbot

Comment 12:

CL https://golang.org/cl/131840043 mentions this issue.

@jeffallen
Contributor

Comment 13:

11% faster is not insignificant, but there's probably more performance to be squeezed...
looking (casually) for it now.

@gopherbot

Comment 14:

CL https://golang.org/cl/131470043 mentions this issue.

@gopherbot

CL https://golang.org/cl/13852 mentions this issue.

@gopherbot

CL https://golang.org/cl/13853 mentions this issue.

@mikioh mikioh modified the milestones: Go1.6, Unplanned Aug 29, 2015
@golang golang locked and limited conversation to collaborators Sep 4, 2016