x/net/http2: frameScratchBuffer can consume lots of memory for idle conns #38049

Closed
zoic21 opened this issue Mar 24, 2020 · 12 comments
Labels: FrozenDueToAge, NeedsInvestigation (Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.)
Milestone: Unplanned

Comments

zoic21 commented Mar 24, 2020

What version of Go are you using (go version)?

$ go version
go version go1.14.1 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/root/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build656777711=/tmp/go-build -gno-record-gcc-switches"

What did you do?

To begin with, I'm new to the Go community, so this may be an issue on my side. I use the project https://github.com/mmatczuk/go-http-tunnel to build an HTTP/2 reverse proxy. I tested with about 700 connected clients (but I get the same issue with 0). Memory grows with every request; after some analysis with pprof I got:

(pprof) top
Showing nodes accounting for 449.13MB, 98.24% of 457.15MB total
Dropped 42 nodes (cum <= 2.29MB)
Showing top 10 nodes out of 14
      flat  flat%   sum%        cum   cum%
  446.12MB 97.59% 97.59%   446.12MB 97.59%  golang.org/x/net/http2.(*ClientConn).frameScratchBuffer
    2.51MB  0.55% 98.14%     2.51MB  0.55%  bufio.NewWriterSize (inline)
    0.50MB  0.11% 98.24%     3.01MB  0.66%  golang.org/x/net/http2.(*Transport).newClientConn
         0     0% 98.24%     2.51MB  0.55%  bufio.NewWriter (inline)
         0     0% 98.24%     5.01MB  1.10%  crypto/tls.(*Conn).Handshake
         0     0% 98.24%     5.01MB  1.10%  crypto/tls.(*Conn).serverHandshake
         0     0% 98.24%     5.01MB  1.10%  crypto/tls.(*serverHandshakeStateTLS13).handshake

And deeper :

(pprof) list frameScratchBuffer
Total: 457.15MB
ROUTINE ======================== golang.org/x/net/http2.(*ClientConn).frameScratchBuffer in /root/go/src/golang.org/x/net/http2/transport.go
  446.12MB   446.12MB (flat, cum) 97.59% of Total
         .          .    884:                   cc.mu.Unlock()
         .          .    885:                   return buf[:size]
         .          .    886:           }
         .          .    887:   }
         .          .    888:   cc.mu.Unlock()
  446.12MB   446.12MB    889:   return make([]byte, size)
         .          .    890:}
         .          .    891:
         .          .    892:func (cc *ClientConn) putFrameScratchBuffer(buf []byte) {
         .          .    893:   cc.mu.Lock()
         .          .    894:   defer cc.mu.Unlock()

The same query a few minutes later shows 517.31MB (I am continuously sending requests to the HTTP/2 server).
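For reference, a heap profile like the one above can be captured by exposing net/http/pprof in the process and pointing go tool pprof at it; a minimal sketch (the listen address is just an example):

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers on http.DefaultServeMux
)

func main() {
	// Serve the profiling endpoints on a side port (example address).
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// ... start the actual proxy / tunnel here ...
	select {}
}
```

With that in place, `go tool pprof http://localhost:6060/debug/pprof/heap` opens the interactive prompt where `top` and `list frameScratchBuffer` produce output like the listings shown here.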

What did you expect to see?

Stabilization of memory occupancy

What did you see instead?

Memory grows without stopping

I'm still discovering the Go universe and I hope this is just a bad analysis on my part, but if someone could help me I would appreciate it.

Thanks in advance

@fraenkel (Contributor)

Every new connection has its own set of scratch buffers which can be at most 4 x 512KB.
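For context, here is a self-contained sketch of that caching scheme, mirroring the frameScratchBuffer/putFrameScratchBuffer code shown in the profile above (the type below is illustrative, not the library's ClientConn; the constant names maxAllocFrameSize and maxBufs are the ones mentioned later in this thread):

```go
package scratch

import "sync"

const (
	maxAllocFrameSize = 512 << 10 // each scratch buffer is capped at 512 KiB
	maxBufs           = 4         // at most 4 buffers are cached per connection
)

// conn stands in for http2.ClientConn for illustration purposes.
type conn struct {
	mu           sync.Mutex
	maxFrameSize uint32   // peer's SETTINGS_MAX_FRAME_SIZE
	freeBuf      [][]byte // cached scratch buffers
}

// frameScratchBuffer returns a buffer of min(maxFrameSize, maxAllocFrameSize)
// bytes, reusing a cached one when possible.
func (cc *conn) frameScratchBuffer() []byte {
	cc.mu.Lock()
	size := cc.maxFrameSize
	if size > maxAllocFrameSize {
		size = maxAllocFrameSize
	}
	for i, buf := range cc.freeBuf {
		if len(buf) >= int(size) {
			cc.freeBuf[i] = nil
			cc.mu.Unlock()
			return buf[:size]
		}
	}
	cc.mu.Unlock()
	return make([]byte, size) // the allocation visible in the profile
}

// putFrameScratchBuffer caches buf for reuse, keeping at most maxBufs buffers
// per connection; extra buffers are dropped and reclaimed by the GC.
func (cc *conn) putFrameScratchBuffer(buf []byte) {
	cc.mu.Lock()
	defer cc.mu.Unlock()
	if len(cc.freeBuf) < maxBufs {
		cc.freeBuf = append(cc.freeBuf, buf)
		return
	}
	for i, old := range cc.freeBuf {
		if old == nil {
			cc.freeBuf[i] = buf
			return
		}
	}
	// All slots are full; drop buf.
}
```

Since up to maxBufs buffers are retained per connection, one connection can hold on to as much as 4 × 512KB = 2MB of cached buffers.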

zoic21 commented Mar 25, 2020

OK, so if I understand correctly, the maximum memory should be around 2MB * number of clients? I will run some tests to verify that.

Would it be possible to change this buffer to 2 * 512KB, for example?

@andybons (Member)

Will await test from @zoic21

andybons added the WaitingForInfo label (Issue is not actionable because of missing required information, which needs to be provided.) Mar 25, 2020
andybons added this to the Unplanned milestone Mar 25, 2020
andybons changed the title from "net/http2 Possible memory leak in frameScratchBuffer" to "net/http2: possible memory leak in frameScratchBuffer" Mar 25, 2020

zoic21 commented Mar 26, 2020

After a one-day test with 700 clients making 5 requests in parallel continuously for 10 hours, memory did this:
[memory usage graph: rapid initial growth, then levelling off]

I think it's OK. We can see that memory grows very quickly at the beginning (presumably while the buffers fill up) and then calms down, stabilizing around 1.1GB.

But if someday someone adds an option to change the maximum buffer size I'll take it, or maybe a mechanism to remove old client buffers. For information, here is what I have now in go pprof:

Showing nodes accounting for 1205.87MB, 97.59% of 1235.60MB total
Dropped 81 nodes (cum <= 6.18MB)
Showing top 10 nodes out of 27
      flat  flat%   sum%        cum   cum%
 1198.35MB 96.98% 96.98%  1198.35MB 96.98%  golang.org/x/net/http2.(*ClientConn).frameScratchBuffer
    7.03MB  0.57% 97.55%     7.03MB  0.57%  golang.org/x/net/http2/hpack.(*headerFieldTable).addEntry
    0.50MB  0.04% 97.59%    10.54MB  0.85%  golang.org/x/net/http2.(*Transport).newClientConn
         0     0% 97.59%     7.01MB  0.57%  crypto/tls.(*Conn).Handshake
         0     0% 97.59%     7.01MB  0.57%  crypto/tls.(*Conn).serverHandshake
         0     0% 97.59%     7.01MB  0.57%  crypto/tls.(*serverHandshakeStateTLS13).handshake
         0     0% 97.59%     7.17MB  0.58%  github.com/jeedom/gotunnel.(*Server).ServeHTTP
         0     0% 97.59%    18.04MB  1.46%  github.com/jeedom/gotunnel.(*Server).handleClient
         0     0% 97.59%    10.54MB  0.85%  github.com/jeedom/gotunnel.(*connPool).AddConn
         0     0% 97.59%     6.67MB  0.54%  github.com/jeedom/gotunnel.transfer
(pprof) list frameScratchBuffer
Total: 1.21GB
ROUTINE ======================== golang.org/x/net/http2.(*ClientConn).frameScratchBuffer in /root/go/src/golang.org/x/net/http2/transport.go
    1.17GB     1.17GB (flat, cum) 96.98% of Total
         .          .    884:                   cc.mu.Unlock()
         .          .    885:                   return buf[:size]
         .          .    886:           }
         .          .    887:   }
         .          .    888:   cc.mu.Unlock()
    1.17GB     1.17GB    889:   return make([]byte, size)
         .          .    890:}
         .          .    891:
         .          .    892:func (cc *ClientConn) putFrameScratchBuffer(buf []byte) {
         .          .    893:   cc.mu.Lock()
         .          .    894:   defer cc.mu.Unlock()

So for me it's OK now that I understand how it works; we can close the issue (sorry for the inconvenience) if everybody agrees.

@fraenkel (Contributor)

Given go-http-tunnel has vendored the http2 code, you could experiment and adjust the value. I would however be curious how it affects throughput as well as memory.
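If anyone does experiment with a vendored copy, the values to adjust are the two constants named later in this thread (maxAllocFrameSize and maxBufs); a hypothetical tweak to the vendored x/net/http2/transport.go, with arbitrary smaller values:

```go
// In the vendored x/net/http2/transport.go (values below are just an example):
const maxAllocFrameSize = 256 << 10 // upstream value: 512 << 10

// ...and, wherever maxBufs is declared, shrink the per-connection cache:
const maxBufs = 2 // upstream value: 4
```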

zoic21 commented Mar 26, 2020

I have already replaced that with the latest version of net/http2; here is my go.mod:

module github.com/jeedom/gotunnel

go 1.14

require (
	github.com/calmh/luhn v2.0.0+incompatible
	github.com/felixge/tcpkeepalive v0.0.0-20160804073959-5bb0b2dea91e
	github.com/go-sql-driver/mysql v1.5.0
	github.com/inconshreveable/go-vhost v0.0.0-20160627193104-06d84117953b
	golang.org/x/net v0.0.0-20200324143707-d3edc9973b7e
	golang.org/x/text v0.1.1-0.20171102192421-88f656faf3f3 // indirect
	gopkg.in/yaml.v2 v2.2.2
	github.com/golang/mock v1.2.0
	github.com/cenkalti/backoff v2.1.1+incompatible
)

Same result

ribbybibby added a commit to utilitywarehouse/vault-kube-cloud-credentials that referenced this issue May 4, 2020
We were leaking memory steadily in
`golang.org/x/net/http2.(*ClientConn).frameScratchBuffer` every time we
retrieved a new set of credentials.

I believe this was because we were creating a new `*vault.Config` every time we
retrieved credentials from vault so that we would pick up any new changes to
`VAULT_CACERT`. This had the unfortunate side effect of creating, amongst other
things, a new pool of connections for each new config. Each connection has a set
of 4 scratch buffers with a maximum size of 512KB each. I believe that by
creating new connection pools we were proliferating these buffers.

See: golang/go#38049.

I haven't investigated and confirmed this theory fully but in any case, creating
one `*vault.Config` and simply reading config values from the environment with
`config.ReadEnvironment()` stops the leak.

I've also added the UW operational endpoint so that we can profile these issues
live in future.
ribbybibby added a commit to utilitywarehouse/vault-kube-cloud-credentials that referenced this issue May 5, 2020
We were leaking memory steadily in
`golang.org/x/net/http2.(*ClientConn).frameScratchBuffer` every time we
retrieved a new set of credentials.

I believe this is because we were never closing the response body when we were
retrieving a lease from vault. From my testing, it seems that the rate at which
memory was leaking was also exacerbated by the fact that we were creating a new
`*vault.Config` every time we retrieved credentials from vault. Presumably this
was because we were creating a new pool of connections (and therefore a new set
of scratch buffers) for each config, although I haven't fully validated this hypothesis.

See: golang/go#38049.

Ensuring that the response body is closed after decoding the json from it and
using one `*vault.Config` seems to stop the leak.

I've also added the UW operational endpoint so that we can profile these issues
live in future.
neild (Contributor) commented May 21, 2021

Every http2.ClientConn keeps a cache of up to four scratch buffers, with each scratch buffer being up to 512KiB in size. Each in-flight request requires a scratch buffer. Scratch buffers are allocated as needed, so the steady-state per-ClientConn memory usage is a function of the maximum number of simultaneous requests made.

The graph in #38049 (comment) looks like what I'd expect from this implementation: Initial growth as buffers are allocated and retained, eventually leveling off once every conn has allocated all its buffers.

So I don't believe there is a memory leak here. (If you leak ClientConns, then you're potentially leaking a lot of memory in scratch buffers, though, so definitely don't do that.)

The per-ClientConn buffer freelist will waste a lot of memory on a server with many connections that are usually idle but occasionally bursty. Perhaps we should have a global sync.Pool instead, although this gets tricky because the maximum buffer size varies by ClientConn.

The variable buffer size is something else that might be worth looking at. The buffer is sized to min(peer's max frame size, 512KiB), which means a ClientConn to a peer with the spec default of 16KiB will consume vastly less memory than one to a peer with a default of >= 512KiB.

There are a few different approaches we could take and I don't have a good sense yet for which one I favor, but the current state where idle ClientConns consume 64-2048KiB of memory doesn't seem like a good one.
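To put numbers on that 64-2048KiB range: the worst-case idle footprint per connection is maxBufs × min(peer max frame size, 512KiB). A back-of-the-envelope helper (purely illustrative, not library code):

```go
package scratch

// idleScratchBytes returns the worst-case number of bytes an idle ClientConn
// can hold in cached scratch buffers, given the peer's advertised
// SETTINGS_MAX_FRAME_SIZE. Purely illustrative arithmetic.
func idleScratchBytes(peerMaxFrameSize int) int {
	const (
		maxAllocFrameSize = 512 << 10 // per-buffer cap
		maxBufs           = 4         // cached buffers per connection
	)
	size := peerMaxFrameSize
	if size > maxAllocFrameSize {
		size = maxAllocFrameSize
	}
	return maxBufs * size
}

// idleScratchBytes(16<<10) == 64<<10   (64KiB, peers using the 16KiB spec default)
// idleScratchBytes(1<<20)  == 2048<<10 (2MiB, peers advertising >= 512KiB,
//                                       e.g. Go's http2 server default of 1MiB)
```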

neild changed the title from "net/http2: possible memory leak in frameScratchBuffer" to "net/http2: frameScratchBuffer can consume lots of memory for idle conns" May 21, 2021
neild removed the WaitingForInfo label May 21, 2021
dmitshur added the NeedsInvestigation label (Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.) May 21, 2021
dmitshur changed the title from "net/http2: frameScratchBuffer can consume lots of memory for idle conns" to "x/net/http2: frameScratchBuffer can consume lots of memory for idle conns" May 21, 2021

DeepAQ commented May 26, 2021

In the current HTTP/2 server implementation, the default value of MaxReadFrameSize is 1MiB, and the server sends this value as MAX_FRAME_SIZE to clients in its SETTINGS frame.

As a result, the frame buffer size is always 512KiB on the client side, which means the total buffer size per client connection can reach 2MiB (there can be 4 cached buffers). This is not very friendly to devices with small amounts of RAM.

Can we make the two constants (maxAllocFrameSize and maxBufs) adjustable for http/2 clients?
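Until something like that lands, servers under your control can at least advertise a smaller frame size through the existing http2.Server.MaxReadFrameSize field, which bounds the scratch buffers clients allocate for those connections. A sketch (addresses and cert/key paths are placeholders):

```go
package main

import (
	"log"
	"net/http"

	"golang.org/x/net/http2"
)

func main() {
	srv := &http.Server{Addr: ":8443", Handler: http.DefaultServeMux}

	// Advertise a smaller SETTINGS_MAX_FRAME_SIZE so peers size their frame
	// buffers accordingly (16KiB is the HTTP/2 spec default; this package's
	// default is 1MiB, as noted above).
	if err := http2.ConfigureServer(srv, &http2.Server{
		MaxReadFrameSize: 16 << 10,
	}); err != nil {
		log.Fatal(err)
	}

	// cert.pem / key.pem are placeholder paths.
	log.Fatal(srv.ListenAndServeTLS("cert.pem", "key.pem"))
}
```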

neild self-assigned this Jun 2, 2021
neild (Contributor) commented Jun 2, 2021

I think we should just have reasonable behavior here without requiring users to fiddle around with tuning knobs to avoid wasting a bunch of idle memory.

This probably means having a single shared buffer pool rather than maintaining a per-ClientConn pool. We can't safely just use a sync.Pool, however, because buffer sizes vary. Something along the lines of @dsnet's proposal for a pool of heterogeneously-sized buffers seems like it would work.
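Purely as an illustration of that direction (not the change that eventually landed), a shared pool bucketed by size class could look something like this:

```go
package scratch

import "sync"

// A shared, size-bucketed buffer pool: one sync.Pool per size class, so
// connections with different max frame sizes can share storage. This is only
// a sketch of the idea discussed above, not the code from CL 325169.
var (
	bufSizes = []int{16 << 10, 32 << 10, 64 << 10, 128 << 10, 256 << 10, 512 << 10}
	bufPools = make([]sync.Pool, len(bufSizes))
)

// poolIndex picks the smallest size class that can hold size bytes.
func poolIndex(size int) int {
	for i, s := range bufSizes {
		if size <= s {
			return i
		}
	}
	return len(bufSizes) - 1
}

// getScratch returns a buffer of at least size bytes (capped at 512KiB),
// reusing a pooled one when available.
func getScratch(size int) []byte {
	if size > bufSizes[len(bufSizes)-1] {
		size = bufSizes[len(bufSizes)-1]
	}
	i := poolIndex(size)
	if v, ok := bufPools[i].Get().(*[]byte); ok {
		return (*v)[:size]
	}
	buf := make([]byte, bufSizes[i])
	return buf[:size]
}

// putScratch returns buf to the pool for its size class. Callers are expected
// to pass back buffers obtained from getScratch, so cap(buf) is a bucket size.
func putScratch(buf []byte) {
	buf = buf[:cap(buf)]
	bufPools[poolIndex(len(buf))].Put(&buf)
}
```

The awkward part is exactly this bucketing: a request only needs min(maxFrameSize, 512KiB), so a single unbucketed sync.Pool would hand 512KiB buffers to connections that only need 16KiB.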

@gopherbot

Change https://golang.org/cl/325169 mentions this issue: http2: remove the frameScratchBuffer

adranwit commented Jul 8, 2021

I believe there is a memory leak somewhere in the HTTP/2 client: my staging boxes run out of memory after 2 hours, with frameScratchBuffer amassing 4GB+ of memory. I tried various things and failed to see what's going on; once I switch to HTTP/1.0, the application does not exhibit any memory issue.
Could you please reopen the issue and help?
For reference:
client implementation:
https://github.com/viant/mly/tree/main/shared/client
server implementation:
https://github.com/viant/mly/tree/main/service/endpoint

dteh pushed a commit to dteh/fhttp that referenced this issue Jun 22, 2022
Use a sync.Pool, not per ClientConn.

Co-authored with dneil@google.com

Fixes golang/go#38049

Change-Id: I93f06b19857ab495ffcf15d7ed2aaa2a6cb31515
Reviewed-on: https://go-review.googlesource.com/c/net/+/325169
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Trust: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Damien Neil <dneil@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
rsc unassigned neild Jun 23, 2022
golang locked and limited conversation to collaborators Jun 23, 2023