Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: buffered channels are slower than a slice and a mutex #6394

Closed
davecheney opened this issue Sep 16, 2013 · 5 comments
Closed

runtime: buffered channels are slower than a slice and a mutex #6394

davecheney opened this issue Sep 16, 2013 · 5 comments

Comments

@davecheney
Copy link
Contributor

What steps will reproduce the problem?

Comparing the performance of this CL which replaces the hand rolled cache with the
buffered channel idiom found in net/http and other places.

https://golang.org/cl/13348058/

linux/386 (linux/amd64 shows less of a regression, but still regresses)

benchmark                      old ns/op    new ns/op    delta
BenchmarkSprintfEmpty                744          996  +33.87%
BenchmarkSprintfString              2576         3122  +21.20%
BenchmarkSprintfInt                 2079         2349  +12.99%
BenchmarkSprintfIntInt              3059         3376  +10.36%
BenchmarkSprintfPrefixedInt         3621         3769   +4.09%
BenchmarkSprintfFloat               5522         5362   -2.90%
BenchmarkManyArgs                  10838        11195   +3.29%
BenchmarkScanInts                8353897      8468101   +1.37%
BenchmarkScanRecursiveInt       10387658     13175570  +26.84%


What is the expected output? What do you see instead?

Expected: no performance decrease, or possibly an improvement

What I got: 15-30% worst case decrease across fmt benchmarks.

Please use labels and text to provide additional information.
@davecheney
Copy link
Contributor Author

Comment 1:

Slightly less awful on linux/arm
localhost(~/go/src/pkg/fmt) % ~/go/misc/benchcmp {old,new}.txt
benchmark                      old ns/op    new ns/op    delta
BenchmarkSprintfEmpty                751          889  +18.38%
BenchmarkSprintfString              1702         1999  +17.45%
BenchmarkSprintfInt                 1405         1701  +21.07%
BenchmarkSprintfIntInt              1981         2234  +12.77%
BenchmarkSprintfPrefixedInt         1978         2237  +13.09%
BenchmarkSprintfFloat               4079         4397   +7.80%
BenchmarkManyArgs                   7054         7197   +2.03%
BenchmarkScanInts                5443710      5773931   +6.07%
BenchmarkScanRecursiveInt        7618105      8175157   +7.31%

@robpike
Copy link
Contributor

robpike commented Sep 16, 2013

Comment 2:

Actually I'm pleased it's so close. But yeah. It used to be a channel, but we made this
change for a reason.

@rsc
Copy link
Contributor

rsc commented Sep 18, 2013

Comment 3:

A buffered channel is a slice, a mutex, and extra stuff to make select work. It would be
embarrassing for the slice and mutex if that combination were somehow faster than a
slice+mutex alone. So "buffered channels are slower than a slice and a mutex" is working
as intended.
There is a possibly bigger difference in your CL, though. The slice+mutex in fmt is a
stack, and you are replacing it with a queue. If there are five entries in the cache,
the stack will just keep reusing one of them, while the queue will cycle through all
five. You'll touch less memory using the stack, which is part of why it's there.

Status changed to WorkingAsIntended.

@davecheney
Copy link
Contributor Author

Comment 4:

Np. Thanks for the explanation.

@alexbrainman
Copy link
Member

Comment 5:

I am just making a mental picture of "embarrassed slice and mutex combination" ... No,
no such luck. :-)
Alex

rod-hynes added a commit to rod-hynes/psiphon-tunnel-core that referenced this issue Dec 29, 2014
…'default'

select case which always ran. As this case was important to prevent callers to
recordStat() from blocking forever, and a robust fix appeared complex, I have
simply removed the processStats goro and associated channel. A buffered channel
requires a mutex lock in any case [1], and the mutex-protected logic, now
inline in recordStat, will execute in fractions of a microsecond. I believe that
the goro/channel is not worth the extra complexity at this time.

[1] golang/go#6394
@golang golang locked and limited conversation to collaborators Jun 25, 2016
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants