net/http: HTTP/1 server got 0.5μs slower in Go 1.8 #18964

Closed

OneOfOne opened this issue Feb 6, 2017 · 35 comments
Labels: FrozenDueToAge, help wanted, NeedsInvestigation, Performance

Comments

OneOfOne (Contributor) commented Feb 6, 2017

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

Go 1.7.5, 1.8rc3, and git (tip)

What operating system and processor architecture are you using (go env)?

Arch Linux 64bit

What did you do?

1. go run the server from https://gist.github.com/OneOfOne/4d7e13977886ddab825870bc3422a901
2. Switch terminals and run wrk -c 20 -d 30s http://localhost:8081
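
The gist itself isn't reproduced here; the target is presumably a minimal hello-world server along these lines (a hypothetical sketch, not the gist's actual code; only the :8081 port is taken from the wrk command above):

package main

import (
	"io"
	"log"
	"net/http"
)

func main() {
	// Minimal handler, so the benchmark measures net/http itself,
	// not application work.
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "Hello, world!\n")
	})
	log.Fatal(http.ListenAndServe(":8081", nil))
}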

What did you expect to see?

Same throughput as 1.7 or faster.

What did you see instead?

go tip fd37b8c

Running 30s test @ http://localhost:8081
  2 threads and 20 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   195.30us  470.12us  16.30ms   95.11%
    Req/Sec    85.62k     6.00k   95.21k    80.83%
  5110713 requests in 30.01s, 721.35MB read
Requests/sec: 170322.67
Transfer/sec:     24.04MB

go 1.8rc3 758a728

Running 30s test @ http://localhost:8081
  2 threads and 20 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   192.49us  451.74us  15.14ms   95.02%
    Req/Sec    85.91k     6.37k   97.60k    83.50%
  5130079 requests in 30.01s, 724.08MB read
Requests/sec: 170941.23
Transfer/sec:     24.13MB

go 1.7.5

Running 30s test @ http://localhost:8081
  2 threads and 20 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   210.16us  528.53us  14.78ms   94.13%
    Req/Sec    94.34k     4.31k  103.56k    73.83%
  5631803 requests in 30.01s, 794.89MB read
Requests/sec: 187673.03
Transfer/sec:     26.49MB
bradfitz added this to the Go1.9 milestone Feb 6, 2017
bradfitz (Contributor) commented Feb 6, 2017

Too late for Go 1.8, but we can look into performance during Go 1.9.

bradfitz (Contributor) commented Feb 6, 2017

Anybody want to investigate the difference? What does a Go 1.7 vs Go 1.8 CPU profile say?

bradfitz added the NeedsInvestigation label Feb 6, 2017
OneOfOne (Contributor, Author) commented Feb 6, 2017

Updated play with pprof: https://play.golang.org/p/GZ4zQOg1Wf

I ran go tool pprof http://localhost:6060/debug/pprof/profile, switched terminals, and ran wrk -c 20 -d 30s http://localhost:6060/.
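
For reference, exposing those pprof endpoints is normally just a blank import of net/http/pprof on the default mux; a sketch of the usual wiring (an assumption about the playground code above, with one server handling both / and /debug/pprof/ on :6060):

package main

import (
	"io"
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "Hello, world!\n")
	})
	log.Fatal(http.ListenAndServe(":6060", nil))
}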

go tip

(pprof) top
36600ms of 80000ms total (45.75%)
Dropped 297 nodes (cum <= 400ms)
Showing top 10 nodes out of 135 (cum >= 970ms)
      flat  flat%   sum%        cum   cum%
   26360ms 32.95% 32.95%    27340ms 34.17%  syscall.Syscall
    2280ms  2.85% 35.80%     5740ms  7.17%  runtime.mallocgc
    1310ms  1.64% 37.44%     1310ms  1.64%  runtime.heapBitsSetType
    1260ms  1.57% 39.01%     1260ms  1.57%  runtime._ExternalCode
    1030ms  1.29% 40.30%     7280ms  9.10%  net/http.(*chunkWriter).writeHeader
     970ms  1.21% 41.51%      970ms  1.21%  runtime.epollwait
     900ms  1.12% 42.64%      920ms  1.15%  runtime.mapiternext
     880ms  1.10% 43.74%      880ms  1.10%  runtime.usleep
     820ms  1.03% 44.76%     1490ms  1.86%  runtime.deferreturn
     790ms  0.99% 45.75%      970ms  1.21%  runtime.mapaccess1_faststr
(pprof) top -cum
27.89s of 80s total (34.86%)
Dropped 297 nodes (cum <= 0.40s)
Showing top 10 nodes out of 135 (cum >= 23.44s)
      flat  flat%   sum%        cum   cum%
     0.01s 0.013% 0.013%     73.46s 91.83%  runtime.goexit
     0.55s  0.69%   0.7%     69.55s 86.94%  net/http.(*conn).serve
     0.30s  0.38%  1.07%     35.91s 44.89%  net/http.(*response).finishRequest
     0.15s  0.19%  1.26%     32.10s 40.12%  bufio.(*Writer).Flush
    26.36s 32.95% 34.21%     27.34s 34.17%  syscall.Syscall
     0.10s  0.12% 34.34%     24.56s 30.70%  net/http.checkConnErrorWriter.Write
         0     0% 34.34%     24.44s 30.55%  net.(*conn).Write
     0.23s  0.29% 34.62%     24.44s 30.55%  net.(*netFD).Write
     0.06s 0.075% 34.70%     23.50s 29.38%  syscall.Write
     0.13s  0.16% 34.86%     23.44s 29.30%  syscall.write

go 1.7.5

(pprof) top
40520ms of 82240ms total (49.27%)
Dropped 281 nodes (cum <= 411.20ms)
Showing top 10 nodes out of 128 (cum >= 860ms)
      flat  flat%   sum%        cum   cum%
   29480ms 35.85% 35.85%    30920ms 37.60%  syscall.Syscall
    2550ms  3.10% 38.95%     5710ms  6.94%  runtime.mallocgc
    1560ms  1.90% 40.84%     1590ms  1.93%  runtime.heapBitsSetType
    1220ms  1.48% 42.33%     1220ms  1.48%  runtime.epollwait
    1050ms  1.28% 43.60%     2750ms  3.34%  runtime.mapassign1
    1050ms  1.28% 44.88%     1080ms  1.31%  runtime.mapiternext
    1000ms  1.22% 46.10%     7890ms  9.59%  net/http.(*chunkWriter).writeHeader
     880ms  1.07% 47.17%     2910ms  3.54%  net/http.DetectContentType
     870ms  1.06% 48.22%     1010ms  1.23%  runtime.mapaccess1_faststr
     860ms  1.05% 49.27%      860ms  1.05%  runtime.futex
(pprof) top -cum
31.67s of 82.24s total (38.51%)
Dropped 281 nodes (cum <= 0.41s)
Showing top 10 nodes out of 128 (cum >= 27.69s)
      flat  flat%   sum%        cum   cum%
         0     0%     0%     75.77s 92.13%  runtime.goexit
     0.44s  0.54%  0.54%     74.26s 90.30%  net/http.(*conn).serve
     0.27s  0.33%  0.86%     37.08s 45.09%  net/http.(*response).finishRequest
     0.18s  0.22%  1.08%     36.44s 44.31%  bufio.(*Writer).Flush
     0.25s   0.3%  1.39%     36.26s 44.09%  bufio.(*Writer).flush
    29.48s 35.85% 37.23%     30.92s 37.60%  syscall.Syscall
     0.12s  0.15% 37.38%     27.99s 34.03%  net/http.checkConnErrorWriter.Write
     0.69s  0.84% 38.22%     27.85s 33.86%  net/http.(*conn).readRequest
     0.08s 0.097% 38.31%     27.77s 33.77%  net.(*conn).Write
     0.16s  0.19% 38.51%     27.69s 33.67%  net.(*netFD).Write

Let me know if you want me to do anything specific.

FrozenPrincess commented Feb 7, 2017

Ouch. I'm seeing the same thing: about a 20% slowdown on my webserver benchmark with 1.8rc3, and a slightly bigger binary size too.

OneOfOne (Contributor, Author) commented Feb 7, 2017

I realize it's late in the release cycle, but a 20% regression on HTTP is huge for a lot of people.

I'm willing to help debug if you tell me what's needed @bradfitz.

bradfitz (Contributor) commented Feb 7, 2017

I'm willing to help debug if you tell me what's needed @bradfitz.

Step 1 is debugging why it got slower. If you find any clues, let me know.

dlsniper (Contributor) commented Feb 8, 2017

@OneOfOne how does this behave for non-hello-world applications? Do you see any issues there? Maybe 20% for hello-world apps is acceptable until 1.9?

ghost commented Feb 8, 2017

@bradfitz I think the new Shutdown feature is causing lower performance for these kinds of 'hello world' apps/tests.

select {
case <-srv.getDoneChan():
	return ErrServerClosed
// ...
}

Look here.

In 1.7 it was just a for loop.
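
For context, a simplified side-by-side of the two accept loops (paraphrased, with temporary-error backoff and other details elided; not the exact sources):

// Go 1.7 (simplified):
for {
	rw, e := l.Accept()
	if e != nil {
		return e
	}
	go srv.newConn(rw).serve(ctx)
}

// Go 1.8 (simplified): the error path now also checks whether
// Shutdown or Close was called.
for {
	rw, e := l.Accept()
	if e != nil {
		select {
		case <-srv.getDoneChan():
			return ErrServerClosed
		default:
		}
		return e
	}
	go srv.newConn(rw).serve(ctx)
}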

Can somebody confirm my assumption?

Thanks,
kataras

mholt commented Feb 8, 2017

I am not able to reproduce the OP's results. I'm on a Mac, and I'm using slightly older versions of Go.

$ wrk -c 20 -d 30s http://localhost:8081  # go 1.8 RC2
Running 30s test @ http://localhost:8081
  2 threads and 20 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     0.87ms    6.99ms 151.45ms   99.38%
    Req/Sec    25.22k     2.34k   33.28k    66.94%
  1510655 requests in 30.10s, 213.22MB read
Requests/sec:  50188.34
Transfer/sec:      7.08MB

$ wrk -c 20 -d 30s http://localhost:8081  # go 1.7.4
Running 30s test @ http://localhost:8081
  2 threads and 20 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   840.73us    6.85ms 151.46ms   99.41%
    Req/Sec    26.05k     3.67k   33.43k    60.96%
  1560770 requests in 30.10s, 220.29MB read
Requests/sec:  51853.50
Transfer/sec:      7.32MB

There's a slight difference but it's smaller than 20%, almost negligible here.

bradfitz (Contributor) commented Feb 8, 2017

@kataras, do you have evidence in favor of that theory?

You can confirm it yourself: delete those lines and measure again.

I would be surprised, though.

cstockton commented Feb 8, 2017

@kataras That looks like it's probably it. In addition to the select, there's a mutex acquire and an unlock done in a defer (defers recently got faster, but they're still a bit slower than a direct unlock):

func (s *Server) getDoneChan() <-chan struct{} {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.getDoneChanLocked()
}

Since the accept loop is such a hot path, it may be worth moving the shutdown select into a separate goroutine and using sync/atomic to signal shutdown in the accept loop.
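
A hypothetical sketch of that idea; the names and fields are illustrative, not net/http's actual internals:

package server

import (
	"errors"
	"net"
	"sync/atomic"
)

var ErrServerClosed = errors.New("server closed")

type Server struct {
	inShutdown int32 // set atomically to 1 once Shutdown is called
}

func (s *Server) Shutdown() {
	atomic.StoreInt32(&s.inShutdown, 1)
	// ... close listeners, wait for idle connections, etc.
}

func (s *Server) Serve(l net.Listener) error {
	for {
		rw, err := l.Accept()
		if err != nil {
			// One atomic load on the error path instead of a
			// mutex-guarded channel plus a select.
			if atomic.LoadInt32(&s.inShutdown) != 0 {
				return ErrServerClosed
			}
			return err
		}
		go handleConn(rw)
	}
}

func handleConn(c net.Conn) {
	defer c.Close()
	// ... serve the connection ...
}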

EDIT:
I don't think it's just this; I just tried tip without the select altogether, and it still adds about 7µs (~5%).

renannprado commented:

I understand it's too late for 1.8, but why not fix this in a future 1.8.1 release?

mholt commented Feb 8, 2017

Has anyone else even been able to reproduce these results?

ghost commented Feb 8, 2017

-11.2% on Debian 8, amd64 (1.7.5 vs rc3)

go 1.7.5

Running 30s test @ http://localhost:8081
  2 threads and 20 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   457.65us    1.26ms  46.74ms   93.23%
    Req/Sec    65.08k     7.84k   99.65k    73.83%
  3891443 requests in 30.07s, 549.25MB read
Requests/sec: 129397.13
Transfer/sec:     18.26MB

go 1.8rc3

Running 30s test @ http://localhost:8081
  2 threads and 20 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   350.93us    0.92ms  39.50ms   94.72%
    Req/Sec    57.70k     5.34k   77.31k    74.50%
  3447358 requests in 30.03s, 486.57MB read
Requests/sec: 114800.24
Transfer/sec:     16.20MB

cstockton commented:

@kataras The implementation is correct, which means the author was careful. It's not that big of a deal; bugs happen in software.

That said, I was wrong anyway. I looked again just now and realized the select only happens when an error occurs during accept. What's odd is that my program consistently slowed down when I removed it from the accept path. Maybe removing it disqualifies that block from some optimization passes or inlining? There may be a problem somewhere causing a slowdown, but it's somewhere else and needs a different setup than mine.

codido commented Feb 8, 2017

@bradfitz Briefly bisected this, and it seems commit faf882d is causing at least some of it. I'm seeing a performance hit of roughly 10% on this benchmark with that commit, compared to the commit preceding it.

bradfitz (Contributor) commented Feb 8, 2017

@codido, thanks for the bisect! Yes, and that's the commit I would suspect too. Good to see confirmation. That was one of the biggest architectural changes in the net/http.Server in quite some time.

I never did any benchmarking (or optimizations) after that change.

We can look into it for Go 1.9.

reimertz commented Feb 8, 2017

Hi @bradfitz

Just a quick question from an outsider, so please bear with me if I missed something obvious.
Why would it be too late to fix this issue? Isn't this the whole reason for release candidates? To find final major issues before releasing the next major version?

If not, please educate me.

Also, thanks for working on this project, I know a lot of people who love this language and the community around it. 🎉

tinco commented Feb 8, 2017

Why are people responding and voting as if this is a major issue? This change seems to have a worst case performance impact of ~40us per request. That sounds very low to me, is there a real world scenario where this would matter?

(edit: I was wrong, it's ~0.5us per request, so even lower)

matzhouse commented:

I think there may need to be more benchmarks around this to see the effect on things other than hello world. Hello-world servers put a lot of benchmark pressure on the internals and as such can magnify the effects of slowdowns. In a real-world application there's a boatload more code running per request, which in theory makes the effect of changes like this far smaller: it becomes a smaller percentage of each request, so the impact is reduced.
Just my 2 cents.
(I'm going to look at precisely this later on if I have time.)
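
One way to test that theory is to benchmark a trivial handler against one that does real per-request work. A rough sketch (a hypothetical benchmark in a _test.go file, run with go test -bench=.; the 100000-iteration loop is an arbitrary stand-in for application work):

package main

import (
	"fmt"
	"io"
	"io/ioutil"
	"net/http"
	"net/http/httptest"
	"testing"
)

func benchServer(b *testing.B, h http.HandlerFunc) {
	srv := httptest.NewServer(h)
	defer srv.Close()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		resp, err := http.Get(srv.URL)
		if err != nil {
			b.Fatal(err)
		}
		io.Copy(ioutil.Discard, resp.Body)
		resp.Body.Close()
	}
}

// Server internals dominate: any regression there shows up directly.
func BenchmarkHello(b *testing.B) {
	benchServer(b, func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "Hello, world!\n")
	})
}

// Simulated application work per request: the same regression should
// shrink as a percentage of total request time.
func BenchmarkBusy(b *testing.B) {
	benchServer(b, func(w http.ResponseWriter, r *http.Request) {
		sum := 0
		for i := 0; i < 100000; i++ {
			sum += i
		}
		fmt.Fprintf(w, "Hello, world! %d\n", sum)
	})
}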

kmlx commented Feb 8, 2017

@reimertz I remember reading about the release cycle here: https://github.com/golang/go/wiki/Go-Release-Cycle

Once a release candidate is issued, only documentation changes and changes to address critical bugs should be made. In general the bar for bug fixes at this point is even slightly higher than the bar for bug fixes in a minor release. We may prefer to issue a release with a known but very rare crash than to issue a release with a new but not production-tested fix.

One of the criteria for issuing a release candidate is that Google be using that version of the code for new production builds by default: if we at Google are not willing to run it for production use, we shouldn't be asking others to.

reimertz commented Feb 8, 2017

@kmlx thanks for the link! I only went here: https://golang.org/doc/contribute.html and searched for 'release' and couldn't find anything. My bad.

Also, wow! If Google will run this version on their production servers and is okay with a worst-case performance impact of ~40us per request (quoting @tinco), I guess the rest of the world can be as well. 🙂

ALTree (Member) commented Feb 8, 2017

This is why we ask people to test beta releases: so that if they believe they've found an issue, they can complain early.

go1.8beta1 was released on Dec 1, 2016, more than 2 months ago:

https://groups.google.com/d/topic/golang-nuts/QYuo0fai6YE/discussion

Quoting the announcement:

It is important that we find bugs before issuing a release candidate.
The release candidate is planned for the first week of January.
Your help in testing this beta is invaluable.

tinco commented Feb 8, 2017

Sorry, I looked at the numbers wrong. So from @OneOfOne's test, go tip made 5110713 requests in 30 seconds; that's 5.87us per request. Go 1.7.5 did 5631803 requests in 30 seconds: 5.33us per request. Comparing those to each other, that's about an 11% performance decrease. But if you look at it from an absolute perspective, that's a performance hit of just half a microsecond per request. I can't even imagine an HTTP service where this would be relevant.

Vorror commented Feb 8, 2017

@tinco I agree, it's a very minor regression when put into perspective. However, it's still very important to figure out why it regressed; otherwise we risk a situation like #6853, where Go 1.9 comes around with another 11% decrease.

That being said, I'm not sure why this couldn't be fixed with a patch release (vs a minor release).

funny-falcon (Contributor) commented:

@tinco, how many cores does that computer have? Multiply 0.5us by the number of cores.

bradfitz changed the title from "net/http: throughput regression for a simple hello world" to "net/http: HTTP/1 server got 0.5μs slower in Go 1.8" Feb 8, 2017
minux (Member) commented Feb 8, 2017 via email

tinco commented Feb 8, 2017

@minux the 5 million requests are handled in 30 seconds by N cores, so the real CPU time spent per request is (30s / 5M) × N. N is likely rather small, probably less than 10, so it's not really relevant.

rsc (Contributor) commented Feb 9, 2017

Minor releases are for critical, unavoidable bugs. "Your program has a 1% chance of randomly crashing after a day of execution" is the kind of bug we fix in a point release, with the simplest, safest, most trivial fix possible. Minor releases are not for fixing "your program runs 0.5us slower on an operation that probably takes far longer than that overall anyway".

supermarin commented Feb 10, 2017

I guess you're referring to patch releases (1.8.x) vs minor (1.x.0).
IMO the whole point of patch releases is to address regressions without changing features or any behavior. While this one doesn't look like a big one, I see no reason not to fix it in a 1.8.x patch release.

rsc (Contributor) commented Feb 10, 2017

In the Go project, we refer to 1.x as a major release and 1.x.y as a minor (or sometimes point) release: Go 1.8 is a major release; Go 1.8.2 is a minor or point release. It doesn't make sense for us to call Go 1.8 a minor release.

The Go project release policy is documented at https://golang.org/doc/devel/release.html#policy:

Each major Go release obsoletes and ends support for the previous one. For example, if Go 1.5 has been released, then it is the current release and Go 1.4 and earlier are no longer supported. We fix critical problems in the current release as needed by issuing minor revisions (for example, Go 1.5.1, Go 1.5.2, and so on).

Go 1.5 is backwards compatible with Go 1.4 with some caveats: your code has to not depend on undocumented behavior (for example, the ordering of equal elements chosen by sort.Sort), your code has to use keyed struct literals so it doesn't break if new struct fields are added, and so on.
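
A tiny illustration of the keyed-literal caveat, using a made-up type (an unkeyed struct literal must supply a value for every field, so it stops compiling when a field is added):

type Options struct {
	Name string
	Port int
	// A later release adding a field here breaks the unkeyed
	// literal below at compile time; the keyed one keeps working.
}

var fragile = Options{"srv", 8081}          // unkeyed: positional
var safe = Options{Name: "srv", Port: 8081} // keyed: safe across additions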

To the fullest extent possible, Go 1.5.1 is backwards compatible with Go 1.5 without any such caveats: we aim for updating from Go 1.5 to Go 1.5.1 to be a guaranteed-safe operation, a non-event, as you say "without changing features or any behavior".

Accepting that making mistakes is an inevitable part of writing software and that any time you make a change there's a risk it will introduce a bug that isn't caught by testing before the release, the best way we know to reduce that risk is to disallow non-critical changes in point releases.

Fixing a 0.5us slowdown that only appears in microbenchmarks is a non-critical change.

rashidul0405 commented:

@tinco if we write off this "so-called minor" performance issue, and we ignore it every time, we'll ultimately end up with "a slower Go". IMO no one wants that to happen.

rsc (Contributor) commented Feb 10, 2017

@rashidul0405 This issue is still open; no one is ignoring it. I was explaining why it doesn't merit a rushed fix into Go 1.8 or Go 1.8.1.

I would gladly take a 1% slower Go instead of a 1% crashier Go.

bradfitz (Contributor) commented:

Let's keep discussion on mailing lists and Twitter etc.

Please only comment here if you're working on this issue.

bradfitz changed the milestone from Go1.9 to Go1.10 Jul 5, 2017
rsc (Contributor) commented Nov 22, 2017

Since no one appears to want to work on this, I'm going to close it. We can't chase down every possible performance detail, and this one is getting old.

rsc closed this as completed Nov 22, 2017
golang locked and limited conversation to collaborators Nov 22, 2018