x/net,net/http/httputil: ReverseProxy attempts to log each http2 stream cancellation, causing concurrent streams to grow unbounded due to logger buildup #53257
Labels: NeedsInvestigation
Summary: ReverseProxy backed up on a mutex in the Go logger while cleaning up cancelled HTTP/2 streams.
What version of Go are you using (go version)?

Does this issue reproduce with the latest release?
Yes. We weren't able to try this, but the httputil code in question has not changed.
What operating system and processor architecture are you using (go env)?

go env Output

What did you do?
I have a service that uses the httputil ReverseProxy as a load balancer, proxying requests to upstream services. My service averages ~10k requests per second, nearly all of which are streams. The streams are short-lived: the P95 stream duration (the time a finished stream was alive) is 400ms.
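The service's own code is not included in the report. For context, a minimal sketch of this kind of setup follows; the upstream URL, listen port, and single-host wiring are illustrative, not the author's configuration (the author tracks x/net directly, so their transport setup likely differs):

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// Single illustrative upstream; a real load balancer would pick among many.
	upstream, err := url.Parse("https://backend.internal:8443")
	if err != nil {
		log.Fatal(err)
	}

	// With a nil Transport, ReverseProxy uses http.DefaultTransport, which
	// negotiates HTTP/2 to the upstream when available, so each proxied
	// request becomes an HTTP/2 stream on the backend connection.
	proxy := httputil.NewSingleHostReverseProxy(upstream)

	log.Fatal(http.ListenAndServe(":8080", proxy))
}
```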
I upgraded the x/net version from v0.0.0-20210929193557-e81a3d93ecf6 to v0.0.0-20220531201128-c960675eff93. I confirmed that the change directly after the old version, "http2: refactor request write flow", is the first version that introduces the following symptoms.

What did you expect to see?
No noticeable behavior change to my service.
What did you see instead?
My service's number of currently open streams began growing unbounded. When healthy, my service has ~60 concurrent streams; with the upgrade, it reached 300k open streams within 3 minutes and was continuing to grow.
I took a goroutine dump and it showed nearly all goroutines waiting on the Go logger's mutex to log the stream cancellation. Example log: (the message is always identical, except for the stream ID)
From the pprof:

I do not set a custom ReverseProxy ErrorLog. My theory is that, due to the cleanupWriteRequest introduced in the transport.go change, a CANCELed stream will now cause http2's Transport to return an error from RoundTrip, which is then consumed by ReverseProxy. ReverseProxy attempts to log each instance of this. Since the service has a high workload (rate of stream requests proxied), the logger falls behind and causes the buildup of concurrent streams.

It looks like ReverseProxy attempts to filter out other request cancellation errors, io.EOF and context.Canceled, and not log in those cases. It seems like it should also filter out http2 stream cancellation?
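The kind of filtering suggested here can also be approximated by the caller today. The sketch below is not a proposed stdlib patch; it assumes the proxy's Transport is an explicit golang.org/x/net http2.Transport, so the cancellation surfaces as x/net's http2.StreamError (with net/http's bundled HTTP/2 the concrete error type differs and this check would not match). The helper name and upstream are hypothetical:

```go
package main

import (
	"context"
	"errors"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"

	"golang.org/x/net/http2"
)

// newFilteringProxy wires a ReverseProxy with an ErrorHandler that skips
// logging for routine cancellations, including HTTP/2 stream cancellation.
func newFilteringProxy(upstream *url.URL) *httputil.ReverseProxy {
	proxy := httputil.NewSingleHostReverseProxy(upstream)
	proxy.ErrorHandler = func(w http.ResponseWriter, r *http.Request, err error) {
		// Treat client context cancellation and an HTTP/2 RST_STREAM(CANCEL)
		// surfaced by the x/net transport as routine, not worth logging.
		var streamErr http2.StreamError
		cancelled := errors.Is(err, context.Canceled) ||
			(errors.As(err, &streamErr) && streamErr.Code == http2.ErrCodeCancel)

		if !cancelled {
			log.Printf("http: proxy error: %v", err)
		}
		// Mirror the default error handler's response status.
		w.WriteHeader(http.StatusBadGateway)
	}
	return proxy
}
```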
To test this theory, I set the ReverseProxy ErrorLog to a custom Go Logger with a no-op io.Writer:
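The original snippet is not shown above; this is a minimal sketch of the configuration described, with a hypothetical helper name and a single upstream for illustration:

```go
package main

import (
	"io"
	"log"
	"net/http/httputil"
	"net/url"
)

// newQuietProxy returns a ReverseProxy whose ErrorLog discards every message,
// so error logging cannot become a throughput bottleneck under load.
func newQuietProxy(upstream *url.URL) *httputil.ReverseProxy {
	proxy := httputil.NewSingleHostReverseProxy(upstream)
	// A dedicated logger writing to io.Discard: messages are still formatted
	// but go nowhere, and the shared default logger (and its mutex) is
	// taken out of the request path entirely.
	proxy.ErrorLog = log.New(io.Discard, "", 0)
	return proxy
}
```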
With this change, using x/net v0.0.0-20220531201128-c960675eff93, the service performed as expected: the concurrent streams no longer grew unbounded.