Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/net/http2: window updates on randomWriteScheduler after stream closed cause memory leaks #33812

Closed
jared2501 opened this issue Aug 24, 2019 · 11 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@jared2501
Copy link

jared2501 commented Aug 24, 2019

What version of Go are you using (go version)?

go version go1.12.4 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What did you do? What did you expect to see? What did you see instead?

I've been tracing down some issues in a creeping response times in a gRPC server that uses HTTP/2.

image

The big jump downwards is when I restart the server (purple=median, blue=p95, yellow=max response times).

After much poking and logging, I've traced it down to the entries in the randomWriteScheduler.sq growing, and not being garbage collected. What I believe is happening is that RST_STREAM frames are being sent on closed streams, which causes the randomWriteScheduler to allocate new entries in the randomWriteScheduler.sq map (see code here). The RST_STREAM frames will get queued and emitted by WriteScheduler.Pop() eventually, however since the stream has already been closed, they will never get deleted from the randomWriteScheduler.sq map since randomWriteScheduler.CloseStream will never be called, which causes this map to grow unbounded.

The priorityWriteScheduler handles this case by not allocating new nodes, but instead attaching RST_STREAM frames for closed streams to it's root node (in fact, it has a nice comment explaining this logic here).

I can look at providing a minimum repro of the issue if required, but it's a bit tricky to isolate since it requires delay in the stream to reproduce. Hopefully, the difference in behaviour between the randomWriteScheduler and the priorityWriteScheduler, and reading the code, is enough to identify & fix the issue here.

@odeke-em odeke-em changed the title http2: window updates on randomWriteScheduler after stream closed cause memory leaks x/net/http2: window updates on randomWriteScheduler after stream closed cause memory leaks Aug 24, 2019
@odeke-em odeke-em added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Aug 24, 2019
@jared2501
Copy link
Author

jared2501 commented Aug 26, 2019

I've run the gRPC server through a full traffic cycle with the priorityWriteScheduler enabled and it seems to have fixed the issue!

image

Here's a suggested fix to the leak:

diff --git a/http2/writesched_random.go b/http2/writesched_random.go
index 36d7919..4c34683 100644
--- a/http2/writesched_random.go
+++ b/http2/writesched_random.go
@@ -51,8 +51,14 @@ func (ws *randomWriteScheduler) Push(wr FrameWriteRequest) {
        }
        q, ok := ws.sq[id]
        if !ok {
-               q = ws.queuePool.get()
-               ws.sq[id] = q
+               // id is an idle or closed stream. wr should not be a HEADERS or
+               // DATA frame. However, wr can be a RST_STREAM. In this case, we
+               // push wr onto the zero write queue, rather than creating a new
+               // write queue, since RST_STREAM is tiny.
+               if wr.DataSize() > 0 {
+                       panic("add DATA on non-open stream")
+               }
+               q = &ws.zero
        }
        q.push(wr)
 }

@jared2501
Copy link
Author

jared2501 commented Sep 6, 2019

Quick ping @bradfitz @odeke-em - any chance of landing this in 1.14?

@jared2501
Copy link
Author

Sorry to harp on this issue, but quick ping on this @bradfitz (now that you're back from leave - congrats btw!)

@bradfitz bradfitz self-assigned this Oct 1, 2019
@bradfitz bradfitz added the NeedsFix The path to resolution is known, but the work has not been done. label Oct 1, 2019
@bradfitz bradfitz added this to the Go1.14 milestone Oct 1, 2019
@gopherbot gopherbot removed the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Oct 1, 2019
@bradfitz
Copy link
Contributor

bradfitz commented Oct 1, 2019

@jared2501, thanks for the great bug report & diagnosis!

@gopherbot, please backport to Go 1.13.

/cc @tombergan

@gopherbot
Copy link

Backport issue(s) opened: #34636 (for 1.13).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

@gopherbot
Copy link

Change https://golang.org/cl/198462 mentions this issue: http2: fix memory leak in random write scheduler

@bradfitz
Copy link
Contributor

bradfitz commented Oct 2, 2019

@jared2501, can you try out https://golang.org/cl/198462 and report back about whether it helps or not? And any performance numbers if that's easy?

Also, code reviews welcome from anybody.

@gopherbot
Copy link

Change https://golang.org/cl/198617 mentions this issue: [release-branch.go1.13] http2: fix memory leak in random write scheduler

gopherbot pushed a commit to golang/net that referenced this issue Oct 4, 2019
In certain shutdown cases (from the client and/or server), the http2
Server can Push stream-specific frames on closed streams. This caused
memory leaks in the random write scheduler.

As a conservative fix for backporting, just clear the map element
whenever its queue value is empty. The map entry is re-created as
needed anyway. This isn't perfectly ideal (it adds a map+delete and
free queue put+get) in the case where a stream is open & actively
writing, but it's an easy fix for now. A future CL can optimize all
this code. It looks like there are some other good optimization
opportunities in related code anyway. But I'd rather that happen on
master and not be done in a backported change.

Updates golang/go#34636 (needs bundle to std before fixed)
Updates golang/go#33812

Change-Id: I21508ba2ebc361e8b8532d0d1cebf882e82c473c
Reviewed-on: https://go-review.googlesource.com/c/net/+/198462
Reviewed-by: Bryan C. Mills <bcmills@google.com>
(cherry picked from commit d98b1b4)
Reviewed-on: https://go-review.googlesource.com/c/net/+/198617
Reviewed-by: Andrew Bonventre <andybons@golang.org>
Run-TryBot: Andrew Bonventre <andybons@golang.org>
@gopherbot
Copy link

Change https://golang.org/cl/198897 mentions this issue: [release-branch.go1.13] net/http: update bundled http2 for memory leak fix

gopherbot pushed a commit that referenced this issue Oct 4, 2019
…k fix

This re-runs go generate with x/net checked out at CL 198617 on the
release-branch.go1.13 branch for:

   [release-branch.go1.13] http2: fix memory leak in random write scheduler

Fixes #34636
Updates #33812

Change-Id: Ibce630c6c7ffe43ff760d2ad40b44731c07ba870
Reviewed-on: https://go-review.googlesource.com/c/go/+/198897
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
@jared2501
Copy link
Author

@bradfitz change looks like a reasonable fix to me. Thanks! Will try and find a chance to run it some time next week.

I'll see what I can do re perf numbers, but I suspect we don't have good enough instrumentation to capture any difference here. Will include them if there is :)

@gopherbot
Copy link

Change https://golang.org/cl/200102 mentions this issue: net/http: update bundled x/net/http2

gopherbot pushed a commit that referenced this issue Oct 9, 2019
Updates x/net/http2 to git rev d66e71096ffb9f08f36d9aefcae80ce319de6d68

    http2: end stream eagerly after sending the request body
    https://golang.org/cl/181157 (fixes #32254)

    all: fix typos
    https://golang.org/cl/193799

    http2: fix memory leak in random write scheduler
    https://golang.org/cl/198462 (fixes #33812)

    http2: do not sniff body if Content-Encoding is set
    https://golang.org/cl/199841 (updates #31753)

Also unskips tests from CL 199799.

Change-Id: I241c0b1cd18cad5041485be92809137a973e33bd
Reviewed-on: https://go-review.googlesource.com/c/go/+/200102
Run-TryBot: Emmanuel Odeke <emm.odeke@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@golang golang locked and limited conversation to collaborators Oct 8, 2020
dteh pushed a commit to dteh/fhttp that referenced this issue Jun 22, 2022
In certain shutdown cases (from the client and/or server), the http2
Server can Push stream-specific frames on closed streams. This caused
memory leaks in the random write scheduler.

As a conservative fix for backporting, just clear the map element
whenever its queue value is empty. The map entry is re-created as
needed anyway. This isn't perfectly ideal (it adds a map+delete and
free queue put+get) in the case where a stream is open & actively
writing, but it's an easy fix for now. A future CL can optimize all
this code. It looks like there are some other good optimization
opportunities in related code anyway. But I'd rather that happen on
master and not be done in a backported change.

Fixes golang/go#33812

Change-Id: I21508ba2ebc361e8b8532d0d1cebf882e82c473c
Reviewed-on: https://go-review.googlesource.com/c/net/+/198462
Reviewed-by: Bryan C. Mills <bcmills@google.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

4 participants