
x/net/http2: max connections overflow #39389

Closed
stevenh opened this issue Jun 3, 2020 · 7 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@stevenh
Contributor

stevenh commented Jun 3, 2020

What version of Go are you using (go version)?

go version go1.14.4 freebsd/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
GO111MODULE=""
GOARCH="amd64"
GOBIN="/data/go/bin"
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="freebsd"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="freebsd"
GOPATH="/data/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/freebsd_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/data/go/src/github.com/multiplay/go/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build080411694=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Highly concurrent requests to an HTTP/2 endpoint.

Sample reproduction code:

package main

import (
        "flag"
        "log"
        "net"
        "net/http"
        "sync"
        "sync/atomic"
        "time"
)

var (
        client    *http.Client
        successes int32
        errors    int32
        max       int
        url       string
)

func worker(wg *sync.WaitGroup, run <-chan struct{}) {
        defer wg.Done()

        req, err := http.NewRequest(http.MethodGet, url, nil)
        if err != nil {
                log.Println(err)
                atomic.AddInt32(&errors, 1)
                return
        }

        <-run
        resp, err := client.Do(req)
        if err != nil {
                log.Println(err)
                atomic.AddInt32(&errors, 1)
                return
        }

        // Close the body so the underlying connection can be reused.
        resp.Body.Close()
        atomic.AddInt32(&successes, 1)
}

func main() {
        flag.IntVar(&max, "max", 10, "maximum concurrent requests")
        flag.StringVar(&url, "url", "https://example.com", "url to test against")
        flag.Parse()

        client = &http.Client{
                Transport: &http.Transport{
                        Proxy: http.ProxyFromEnvironment,
                        DialContext: (&net.Dialer{
                                Timeout:   30 * time.Second,
                                KeepAlive: 30 * time.Second,
                                DualStack: true,
                        }).DialContext,
                        ForceAttemptHTTP2:     true,
                        MaxIdleConns:          max,
                        IdleConnTimeout:       90 * time.Second,
                        TLSHandshakeTimeout:   10 * time.Second,
                        ExpectContinueTimeout: 1 * time.Second,
                },
        }

        var wg sync.WaitGroup
        wg.Add(max)
        run := make(chan struct{})
        for i := 0; i < max; i++ {
                go worker(&wg, run)
        }
        close(run)
        wg.Wait()
        log.Printf("successes: %d errors: %d\n", successes, errors)
}

In our tests we ran the above with -max=1000 to trigger the issue, as the server has MAX_CONCURRENT_STREAMS=256.

What did you expect to see?

All requests succeed.

What did you see instead?

http2: server sent GOAWAY and closed the connection; LastStreamID=533, ErrCode=PROTOCOL_ERROR

Debugging so far

After some debugging, the issue comes down to the fact that the transport configures maxConcurrentStreams to 1000. With lots of very quick requests, as generated by the code above, a connection ends up with more streams than the server's maximum before the client processes the SETTINGS frame from the server, which includes MAX_CONCURRENT_STREAMS=256.

The result of this is that all streams > 256 get refused with a mixture of:

  1. RST_STREAM ErrCode=REFUSED_STREAM
  2. GOAWAY ErrCode=PROTOCOL_ERROR

The former is handled by http2canRetryError, but GOAWAY isn't, as there are two distinct GOAWAY error types, http2errClientConnGotGoAway and http2GoAwayError, and it's http2GoAwayError that I'm seeing.

This raises the question: should http2GoAwayError also be retryable? If so, the fix is simply:

@@ -7028,10 +7047,14 @@ func http2canRetryError(err error) bool {
        if err == http2errClientConnUnusable || err == http2errClientConnGotGoAway {
                return true
        }
-       if se, ok := err.(http2StreamError); ok {
-               return se.Code == http2ErrCodeRefusedStream
+       switch err := err.(type) {
+       case http2GoAwayError:
+               return err.ErrCode == http2ErrCodeProtocol
+       case http2StreamError:
+               return err.Code == http2ErrCodeRefusedStream
+       default:
+               return false
        }
-       return false
 }

 func (t *http2Transport) dialClientConn(addr string, singleUse bool) (*http2ClientConn, error) {

With this in place, retries no longer get GOAWAY errors, but they start getting unexpected EOF instead.

Finally, if I change maxConcurrentStreams to 256, as desired by the server, everything works perfectly.

Reading the HTTP/2 spec, it seems valid for the client to send additional frames after its connection preface, and in particular before it has received the server's connection preface, so setting up additional streams is valid.

Given all this, I wonder if a lower SETTINGS_MAX_CONCURRENT_STREAMS, such as the spec's recommended minimum of 100, should be used until a SETTINGS frame has been received, at which point it could be raised to either the value sent or the current default if no SETTINGS_MAX_CONCURRENT_STREAMS is present. This should have minimal impact on normal operation, but in highly concurrent situations it would avoid the significant overhead of trying to set up streams that will never be accepted.
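The proposed behavior could be sketched as follows; the constants come from the discussion (100 as the spec-recommended minimum, 1000 as the current default), but the helper itself is hypothetical:

```go
package main

import "fmt"

const (
	initialMaxConcurrentStreams = 100  // spec-recommended minimum, used until SETTINGS arrives
	defaultMaxConcurrentStreams = 1000 // fallback if the server never advertises a limit
)

// effectiveLimit returns the stream limit a connection should enforce.
// seenSettings reports whether the initial SETTINGS frame has been processed;
// serverLimit is the advertised SETTINGS_MAX_CONCURRENT_STREAMS (0 = absent).
func effectiveLimit(seenSettings bool, serverLimit uint32) uint32 {
	if !seenSettings {
		return initialMaxConcurrentStreams
	}
	if serverLimit > 0 {
		return serverLimit
	}
	return defaultMaxConcurrentStreams
}

func main() {
	fmt.Println(effectiveLimit(false, 0))  // 100: before SETTINGS
	fmt.Println(effectiveLimit(true, 256)) // 256: server's advertised limit
	fmt.Println(effectiveLimit(true, 0))   // 1000: no limit advertised
}
```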

To put this into context, I've seen the Go client hit 500 streams before it processes the SETTINGS frame, so in our case, talking to a Cloudflare endpoint, up to half the streams need to be retried on different connections.
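As a client-side stopgap, bounding in-flight requests with a counting semaphore sized to the server's known limit avoids racing past it. This is a hypothetical workaround sketch, not the proposed transport fix:

```go
package main

import (
	"fmt"
	"sync"
)

// semaphore is a simple counting semaphore that bounds concurrency so the
// client never exceeds the server's known MAX_CONCURRENT_STREAMS.
type semaphore chan struct{}

func newSemaphore(n int) semaphore { return make(semaphore, n) }
func (s semaphore) acquire()       { s <- struct{}{} }
func (s semaphore) release()       { <-s }

func main() {
	const serverLimit = 256 // the limit observed from the server above
	sem := newSemaphore(serverLimit)

	var mu sync.Mutex
	inFlight, peak := 0, 0

	var wg sync.WaitGroup
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem.acquire()
			defer sem.release()

			mu.Lock()
			inFlight++
			if inFlight > peak {
				peak = inFlight
			}
			mu.Unlock()

			// ... perform the request here ...

			mu.Lock()
			inFlight--
			mu.Unlock()
		}()
	}
	wg.Wait()
	fmt.Println(peak <= serverLimit) // true: concurrency never exceeds the limit
}
```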

Thoughts?

@dmitshur dmitshur added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jun 4, 2020
@dmitshur dmitshur added this to the Backlog milestone Jun 4, 2020
@dmitshur
Contributor

dmitshur commented Jun 4, 2020

/cc @bradfitz @tombergan

stevenh added a commit to stevenh/net that referenced this issue Jun 4, 2020
Prevent the client from trying to establish more streams than the server is
willing to accept during the initial lifetime of a connection by limiting
maxConcurrentStreams to 100, the HTTP/2 specification's recommended
minimum, until we've received the initial SETTINGS frame from the server.

After a SETTINGS frame has been received, use the server's
MAX_CONCURRENT_STREAMS if present, otherwise use 1000 as a reasonable
value.

For normal consumers this will have very little impact, allowing a decent
level of concurrency from the start; for highly concurrent consumers or
large bursts it will prevent a significant number of rejected streams
being attempted, actually increasing performance.

Fixes golang/go#39389
@gopherbot

Change https://golang.org/cl/236497 mentions this issue: http2: limit client initial MAX_CONCURRENT_STREAMS

@networkimprov

cc @fraenkel

@dmitshur, note that Michael is a good reference for http2 issues.

@fraenkel
Contributor

fraenkel commented Jun 7, 2020

My one concern is the number of connections opened prior to upgrading the MAX_CONCURRENT_STREAMS. You described a situation whereby 250+ requests were processed before the initial SETTINGS. This would create at least 4 additional connections prior to adjusting the count.

I am looking at other implementations both client and server to get a better idea of what defaults are being used. Envoy is 1 << 29, nginx is 128, Java went from 100 -> Integer.MAX_VALUE.

There are more examples of using a larger default over a smaller default.

@stevenh
Contributor Author

stevenh commented Jun 7, 2020

I think our example is very much a worst-case scenario: a load test specifically designed to make as many requests at startup as configured, which was 10k, resulting in 100-2000 GOAWAYs from Cloudflare due to this. Enforcing a limit may even help the client process SETTINGS sooner; with debugging enabled it was clear the SETTINGS frame was received well before it was processed, likely due to lock contention from the new streams being opened.

Given this, ensuring we stay within the limit seems more important in real-world scenarios.

If you compare this to HTTP/1, every request would be an individual connection, so in the grand scheme of things 4 connections is not a huge number.

So I guess the question is: keep the minimum at 100, or go slightly higher to 128 to match nginx? Thoughts?

@ofavre

ofavre commented Mar 24, 2021

In my use case, I suddenly have thousands of requests that I must perform at once.

We've lowered the default value to 1 to avoid seeing REFUSED_STREAM and PROTOCOL_ERROR, at the expense of creating a second connection until the first connection receives MAX_CONCURRENT_STREAMS from the server.
As connections seem to be created only one at a time, the first connection receives its settings and can then accept more streams, so the overhead in the number of opened connections is low.

I have seen servers in the wild that default to MAX_CONCURRENT_STREAMS=50, or even a dynamically determined value, like Apple Push Service for instance.

I see in the code that REFUSED_STREAM errors are retried... provided that you used http.Request.GetBody instead of http.Request.Body, or you're lucky enough that the body has not already been read.

I see at least two possibilities: waiting for the SETTINGS frame before opening new connections, or defaulting MAX_CONCURRENT_STREAMS to 1 instead of 100 as proposed.
I don't know which is better, or whether there are better alternatives.

@nekomeowww

nekomeowww commented Sep 1, 2021

Are there any restrictions preventing a default value for clients in the Transport configuration?
Or should we implement it as it is now, and set MAX_CONCURRENT_STREAMS when we receive SETTINGS?

What if the origin server (or remote server) can handle the concurrency and never returns a SETTINGS frame specifying a MAX_CONCURRENT_STREAMS value, such as in a gRPC setup? In that case, the client and server are both high-concurrency ends. We aren't expecting unlimited or very low concurrency; we expect to change the value dynamically based on our needs (or configure it in configuration files).

What would be the best practice on that?

@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Sep 7, 2021
@dmitshur dmitshur modified the milestones: Backlog, Go1.18 Sep 7, 2021
dteh pushed a commit to dteh/fhttp that referenced this issue Jun 22, 2022

Change-Id: I35fecd501ca39cd059c7afd1d44090b023f16e1e
GitHub-Last-Rev: 0d1114d3a558cefed17008aba3e4a4d7b2ad3866
GitHub-Pull-Request: golang/net#73
Reviewed-on: https://go-review.googlesource.com/c/net/+/236497
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Trust: Brad Fitzpatrick <bradfitz@golang.org>
Trust: Joe Tsai <joetsai@digital-static.net>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
@golang golang locked and limited conversation to collaborators Sep 7, 2022