runtime: buffered channels are slower than a slice and a mutex #6394

davecheney · 2013-09-16T02:20:28Z

What steps will reproduce the problem?

Comparing the performance of this CL which replaces the hand rolled cache with the
buffered channel idiom found in net/http and other places.

https://golang.org/cl/13348058/

linux/386 (linux/amd64 shows less of a regression, but still regresses)

benchmark                      old ns/op    new ns/op    delta
BenchmarkSprintfEmpty                744          996  +33.87%
BenchmarkSprintfString              2576         3122  +21.20%
BenchmarkSprintfInt                 2079         2349  +12.99%
BenchmarkSprintfIntInt              3059         3376  +10.36%
BenchmarkSprintfPrefixedInt         3621         3769   +4.09%
BenchmarkSprintfFloat               5522         5362   -2.90%
BenchmarkManyArgs                  10838        11195   +3.29%
BenchmarkScanInts                8353897      8468101   +1.37%
BenchmarkScanRecursiveInt       10387658     13175570  +26.84%


What is the expected output? What do you see instead?

Expected: no performance decrease, or possibly an improvement

What I got: 15-30% worst case decrease across fmt benchmarks.

Please use labels and text to provide additional information.

davecheney · 2013-09-16T02:25:00Z

Comment 1:

Slightly less awful on linux/arm
localhost(~/go/src/pkg/fmt) % ~/go/misc/benchcmp {old,new}.txt
benchmark                      old ns/op    new ns/op    delta
BenchmarkSprintfEmpty                751          889  +18.38%
BenchmarkSprintfString              1702         1999  +17.45%
BenchmarkSprintfInt                 1405         1701  +21.07%
BenchmarkSprintfIntInt              1981         2234  +12.77%
BenchmarkSprintfPrefixedInt         1978         2237  +13.09%
BenchmarkSprintfFloat               4079         4397   +7.80%
BenchmarkManyArgs                   7054         7197   +2.03%
BenchmarkScanInts                5443710      5773931   +6.07%
BenchmarkScanRecursiveInt        7618105      8175157   +7.31%

robpike · 2013-09-16T03:43:03Z

Comment 2:

Actually I'm pleased it's so close. But yeah. It used to be a channel, but we made this
change for a reason.

rsc · 2013-09-18T22:46:51Z

Comment 3:

A buffered channel is a slice, a mutex, and extra stuff to make select work. It would be
embarrassing for the slice and mutex if that combination were somehow faster than a
slice+mutex alone. So "buffered channels are slower than a slice and a mutex" is working
as intended.
There is a possibly bigger difference in your CL, though. The slice+mutex in fmt is a
stack, and you are replacing it with a queue. If there are five entries in the cache,
the stack will just keep reusing one of them, while the queue will cycle through all
five. You'll touch less memory using the stack, which is part of why it's there.

Status changed to WorkingAsIntended.

davecheney · 2013-09-18T22:49:26Z

Comment 4:

Np. Thanks for the explanation.

alexbrainman · 2013-09-18T23:19:53Z

Comment 5:

I am just making a mental picture of "embarrassed slice and mutex combination" ... No,
no such luck. :-)
Alex

…'default' select case which always ran. As this case was important to prevent callers to recordStat() from blocking forever, and a robust fix appeared complex, I have simply removed the processStats goro and associated channel. A buffered channel requires a mutex lock in any case [1], and the mutex-protected logic, now inline in recordStat, will execute in fractions of a microsecond. I believe that the goro/channel is not worth the extra complexity at this time. [1] golang/go#6394

davecheney added workingasintended Performance labels Sep 18, 2013

rod-hynes mentioned this issue Dec 29, 2014

Fix: processStats goro was pinning the CPU. Add flag to enable CPU profiling. Psiphon-Labs/psiphon-tunnel-core#51

Merged

golang locked and limited conversation to collaborators Jun 25, 2016

gopherbot added the FrozenDueToAge label Jun 25, 2016

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

runtime: buffered channels are slower than a slice and a mutex #6394

runtime: buffered channels are slower than a slice and a mutex #6394

davecheney commented Sep 16, 2013

davecheney commented Sep 16, 2013

robpike commented Sep 16, 2013

rsc commented Sep 18, 2013

davecheney commented Sep 18, 2013

alexbrainman commented Sep 18, 2013

runtime: buffered channels are slower than a slice and a mutex #6394

runtime: buffered channels are slower than a slice and a mutex #6394

Comments

davecheney commented Sep 16, 2013

davecheney commented Sep 16, 2013

robpike commented Sep 16, 2013

rsc commented Sep 18, 2013

davecheney commented Sep 18, 2013

alexbrainman commented Sep 18, 2013