Summary: An idiomatic serve loop that reads from a file (as opposed to a socket) pauses for 40us before the request goroutine starts. We isolated this from a FUSE filesystem into a small benchmark.
We compared the idiomatic loop, which runs process(req) on a new goroutine for every request, against a no-concurrency serial server:
    for {
        req := accept()
        process(req)
    }
We measured the overhead, defined as the time from the end of accept() to the beginning of process(). Median of 100k runs for each strategy:
Serial: 441ns
Goroutine: 38us
For context, the rest of our logic can run in ~10us, so the goroutine scheduling overhead would be roughly 400% of the useful work (38us / 10us).
We tried a channel:
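    // Worker goroutine: processes requests as they arrive on ch.
    go func() {
        for {
            process(<-ch)
        }
    }()
    // Accept loop: hands each request to the worker.
    for {
        req := accept()
        ch <- req
    }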
The channel strategy performs comparably to the goroutine strategy:
Serial: 441ns
Goroutine: 38us
Channel: 40us
Most servers read/accept from a socket, which the Go runtime implements using polling via the netpoller. FUSE reads from a file, which uses a blocking syscall. AIUI, the Go runtime will let the thread make the syscall, and then another thread of the runtime will notice it's blocked after ~20us and reschedule work. That interval is of the same order as the ~40us delay we measured, which makes me think they're related.
Our workaround: service one request (both accept and process) on one goroutine, and hand off accepting the next request to a new goroutine. Strategy "handoff" looks like:
    func serve() {
        req := accept()
        go serve()
        process(req)
    }
Build the benchmark into a binary named fileping. To run:
    rm a b; mkfifo a; mkfifo b; ./bin/fileping -print -strategy handoff < b > a& ./bin/fileping -strategy serial < a > b& echo 000000000000000 > a
(Modify the -strategy argument in the first command to try different strategies.)
The measurement ignores the time connections spend in the kernel queue
before Go picks them up.
If you measure request latency from the client, I doubt the handoff-based
solution helps much (it just moves the latency from measured time to
unmeasured time, so the measured latency might appear to be lower, but the
latency is actually still there).
Optimizing the delay before a new goroutine starts running will have a
negative impact on system throughput, so we must find a balance here (as
you've discovered, the obvious solution is for the runtime to immediately
preempt the current goroutine and execute the newly created one, but then
the accept goroutine must migrate to another thread, and that will hurt
future connections.)
The benchmark that I showed has two servers/clients talking to each other. With strategy goroutine, the program takes 19s. With strategy handoff it takes 1.9s, so there is an improvement in the RTT.
Would it help if I made a benchmark that captured the latency from the client side?
jasonbs10 added a commit to twitter-forks/fuse that referenced this issue on Apr 12, 2016
The idiomatic serve loop (cf. Serve in src/net/http/server.go and Serve in bazil.org/fuse/fs/serve.go) processes each request on a new goroutine; that is the "Goroutine" strategy measured above.
With the handoff strategy included, the medians are:
Serial: 441ns
Goroutine: 38us
Channel: 40us
Handoff: 671ns
I'm seeing this with Go 1.6 on darwin/amd64.
Is this a known issue? Does it affect other platforms? Is our rewriting of the serve loop a known workaround?
Our full benchmark: http://play.golang.org/p/wc6hPnN778