
net/http: pool GZIP readers to amortise initialisation costs #46181

Open
neetle opened this issue May 15, 2021 · 10 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@neetle

neetle commented May 15, 2021

What version of Go are you using (go version)?

$ go version
1.16.3

Does this issue reproduce with the latest release?

Yes

What did you do?

Ran an HTTP server that returns a mix of small and large responses.

What did you expect to see?

Less of a memory spike when returning results

What did you see instead?

The GZIP Reader and Writer taking up ~50% of the total allocations for my server.

I propose we use a sync.Pool (or similar) to pool the GZIP readers and writers in the underlying HTTP client and server. This should be able to reduce GC churn and make memory allocations for clients and servers more in line with the size of the body associated with the request.

Alternatively, we could do this at the compress/flate layer, extending the reduction in GC churn to anything that uses DEFLATE's Huffman coding.

Happy to submit a PR for this. It's burnt me enough times to justify fixing it once upstream rather than re-fixing it in every HTTP service I review or write.

@gopherbot gopherbot added this to the Proposal milestone May 15, 2021
@networkimprov

cc @fraenkel @neild @bradfitz

@ianlancetaylor
Contributor

This is not an API change so it doesn't have to go through the approval process. This is an optimization suggestion.

@ianlancetaylor ianlancetaylor changed the title Proposal: net/http: Pool GZIP writers and readers to amortise initialisation costs net/http: pool GZIP writers and readers to amortise initialisation costs May 17, 2021
@ianlancetaylor ianlancetaylor added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. and removed Proposal labels May 17, 2021
@ianlancetaylor ianlancetaylor modified the milestones: Proposal, Backlog May 17, 2021
@fraenkel
Contributor

Do you have any data to show the improvement?

@bradfitz
Contributor

SGTM. I thought we actually already did this. :)

@neetle
Author

neetle commented May 17, 2021

SGTM. I thought we actually already did this. :)

We pool the connections in HTTP, but not any of the GZIP-level components.

Side note: never file issues before coffee. It looks like only the stdlib HTTP client is affected if we change net/http; net/http's server never encodes responses as GZIP unless application code does so. My bad 🤦‍♂️

@neetle neetle changed the title net/http: pool GZIP writers and readers to amortise initialisation costs net/http: pool GZIP readers to amortise initialisation costs May 17, 2021
@neetle
Author

neetle commented May 20, 2021

Do you have any data to show the improvement?

It's employer code at the moment; I'm working on getting the OK to contribute it as outside open-source work. The benchmark is as follows:

BenchmarkHTTPClient/without_pooling-12             10647            101213 ns/op        50353 B/op         68 allocs/op
BenchmarkHTTPClient/with_pooling-12                14616            142666 ns/op         9859 B/op         66 allocs/op

The speed numbers likely aren't accurate, as my laptop was busy doing a bunch of other things at the time. The benchmarked solution also has to clone the request, which stdlib doesn't, so there is some overhead there to lose.

@fraenkel
Contributor

Interesting. I did a simple benchmark.
If the gzipped bytes are small (240 bytes), my numbers are close to yours.
Once I gzip 1M of data, there is almost no difference.

@neetle
Author

neetle commented May 20, 2021

In the grand scheme of things, we are only talking about an allocation of roughly what, 50 KB? Once you're measuring body sizes of ~1 MB, that may as well be a rounding error 😛

Edit: thinking about it more, I'd expect a variance of about ~5% per request that uses a pooled reader. Is this (roughly) what you found?

@fraenkel
Contributor

These are 50k, 100k, and 1024k files.

name                old time/op    new time/op    delta
ClientServer50-8       198µs ± 1%     139µs ± 1%  -30.01%  (p=0.000 n=8+9)
ClientServer100-8      243µs ±21%     217µs ± 1%  -10.98%  (p=0.000 n=10+10)
ClientServer1024-8    1.30ms ± 1%    1.29ms ± 1%   -0.58%  (p=0.011 n=10+10)

name                old alloc/op   new alloc/op   delta
ClientServer50-8       262kB ± 0%     224kB ± 0%  -14.36%  (p=0.000 n=10+8)
ClientServer100-8      558kB ± 0%     525kB ± 0%   -5.97%  (p=0.000 n=10+9)
ClientServer1024-8    5.92MB ± 0%    5.90MB ± 0%   -0.32%  (p=0.000 n=10+9)

name                old allocs/op  new allocs/op  delta
ClientServer50-8        93.0 ± 0%      88.0 ± 0%   -5.38%  (p=0.000 n=10+10)
ClientServer100-8       98.0 ± 0%      95.0 ± 0%   -3.06%  (p=0.000 n=10+10)
ClientServer1024-8       120 ± 0%       120 ± 0%     ~     (all equal)

@qieqieplus

I ran into the same issue when optimizing seaweedfs; pooling makes a big difference for sub-100 KB content.
P.S. my simple benchmark code.
