Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: context cancellation can leave HTTP client with deadlocked HTTP/1.1 connections in Go1.22 #65705

Closed
tibbes opened this issue Feb 14, 2024 · 9 comments · Fixed by git-lfs/git-lfs#5673
Labels
NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@tibbes
Copy link
Contributor

tibbes commented Feb 14, 2024

Go version

go version go1.22.0 darwin/arm64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/tibbes/Library/Caches/go-build'
GOENV='/Users/tibbes/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/tibbes/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/tibbes/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.22.0'
GCCGO='gccgo'
AR='ar'
CC='clang'
CXX='clang++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/18/_76dkntd3f76rb3fql9q5jrw0000gn/T/go-build1573940289=/tmp/go-build -gno-record-gcc-switches -fno-common'

What did you do?

There seems to be a bug in go 1.22.0 (not present in earlier versions), that can leave an HTTP client deadlocked when MaxConnsPerHost is used.

The following program reproduces the problem. It:

  • starts a local HTTP server
  • creates an HTTP client with MaxConnsPerHost, MaxIdleConns, and MaxIdleConnsPerHost all set to 1
  • sends 3 requests (with a 1s context timeout) to show the client is working normally
  • stresses the HTTP client by sending 10 requests sequentially with a 1 microsecond context timeout
  • attempts to send 3 requests (with a 1s context timeout)
package main

import (
	"bytes"
	"context"
	"fmt"
	"net/http"
	"time"
)

func main() {
	go serve()
	client := createHTTPClient()

	fmt.Println("=== Before ===")
	fmt.Println(download(client, 1*time.Second))
	fmt.Println(download(client, 1*time.Second))
	fmt.Println(download(client, 1*time.Second))

	// Try and put the connection in a bad state
	for i := 0; i < 10; i++ {
		download(client, time.Microsecond)
	}
	fmt.Println()

	fmt.Println("=== After ===")
	fmt.Println(download(client, 1*time.Second))
	fmt.Println(download(client, 1*time.Second))
	fmt.Println(download(client, 1*time.Second))
}

func download(client *http.Client, timeout time.Duration) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, "http://127.0.0.1:8888", nil)
	if err != nil {
		return "", err
	}

	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	buf := bytes.Buffer{}
	_, err = buf.ReadFrom(resp.Body)
	if err != nil {
		return "", err
	}
	return buf.String(), nil
}

func createHTTPClient() *http.Client {
	// Make sure that the same connection is reused.
	transport := &http.Transport{
		MaxIdleConns:        1,
		MaxIdleConnsPerHost: 1,
		MaxConnsPerHost:     1,
	}
	return &http.Client{Transport: transport}
}

func serve() {
	http.ListenAndServe("127.0.0.1:8888", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Println("Server received a request")
		w.Write([]byte("Hello, world!"))
	}))
}

What did you see happen?

With go 1.22.0, the HTTP client is not usable after stressing it with short timeouts:

=== Before ===
Server received a request
Hello, world! <nil>
Server received a request
Hello, world! <nil>
Server received a request
Hello, world! <nil>
Server received a request

=== After ===
 Get "http://127.0.0.1:8888": context deadline exceeded
 Get "http://127.0.0.1:8888": context deadline exceeded
 Get "http://127.0.0.1:8888": context deadline exceeded

What did you expect to see?

I would expect to see the same behaviour as go 1.21.7:

=== Before ===
Server received a request
Hello, world! <nil>
Server received a request
Hello, world! <nil>
Server received a request
Hello, world! <nil>

=== After ===
Server received a request
Server received a request
Hello, world! <nil>
Server received a request
Hello, world! <nil>
Server received a request
Hello, world! <nil>
@tibbes
Copy link
Contributor Author

tibbes commented Feb 14, 2024

Digging into this further, it looks like an accounting bug triggered by 38a0c5b.

In particular, the early return in Transport.dialConnFor (if ctx == nil) does not call t.decConnsPerHost(w.key):

	ctx := w.getCtxForDial()
	if ctx == nil {
		return
	}

However, there is a decrement lower down, which seems to indicate that the contract of this function was supposed to be to call t.decConnsPerHost(w.key) if a connection is not successfully created.

        if err != nil {
                t.decConnsPerHost(w.key)
        }

Certainly, the following patch seems to solve the problem:

diff --git a/src/net/http/transport.go b/src/net/http/transport.go
index 2a549a9576..411f6b2912 100644
--- a/src/net/http/transport.go
+++ b/src/net/http/transport.go
@@ -1478,6 +1478,7 @@ func (t *Transport) dialConnFor(w *wantConn) {
        defer w.afterDial()
        ctx := w.getCtxForDial()
        if ctx == nil {
+               t.decConnsPerHost(w.key)
                return
        }

/cc @AlexanderYastrebov, @neild, @bradfitz

@odeke-em odeke-em changed the title net/http: context cancellation can leave HTTP client with deadlocked HTTP/1.1 connections net/http: context cancellation can leave HTTP client with deadlocked HTTP/1.1 connections in Go1.22 Feb 14, 2024
@tibbes
Copy link
Contributor Author

tibbes commented Feb 14, 2024

I've added a unit-test that triggers this problem, along with the patch I mentioned in my previous comment, and opened https://go-review.googlesource.com/c/go/+/564036

@odeke-em odeke-em added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 14, 2024
@gopherbot
Copy link

Change https://go.dev/cl/564036 mentions this issue: net/http: add missing call to decConnsPerHost

@bcmills
Copy link
Contributor

bcmills commented Feb 15, 2024

(attn @neild)

@neild
Copy link
Contributor

neild commented Feb 16, 2024

Thanks for the nice analysis (and the fix!).

@gopherbot please open backport issue for go1.21. This is significant regression with no workaround.

@gopherbot
Copy link

Backport issue(s) opened: #65759 (for 1.21).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.

@tibbes
Copy link
Contributor Author

tibbes commented Feb 16, 2024

Thanks @neild, but I think you mean backport for 1.22!

bk2204 added a commit to bk2204/build-dockers that referenced this issue Feb 16, 2024
Let's update to the latest version of Go 1.21 to get the latest security
updates.  We're not using Go 1.22 because it has a bug
(golang/go#65705) that can cause problems for
HTTP clients in some situations.
@dmitshur dmitshur added this to the Go1.23 milestone Feb 16, 2024
@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Feb 20, 2024
@richardartoul
Copy link

Thank you for catching this! Saved me a lot of time debugging...

@gopherbot
Copy link

Change https://go.dev/cl/566536 mentions this issue: [release-branch.go1.22] net/http: add missing call to decConnsPerHost

gopherbot pushed a commit that referenced this issue Feb 28, 2024
A recent change to Transport.dialConnFor introduced an early return that
skipped dialing. This path did not call decConnsPerHost, which can cause
subsequent HTTP calls to hang if Transport.MaxConnsPerHost is set.

For #65705
Fixes #65759

Change-Id: I157591114b02a3a66488d3ead7f1e6dbd374a41c
Reviewed-on: https://go-review.googlesource.com/c/go/+/564036
Reviewed-by: Damien Neil <dneil@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Damien Neil <dneil@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
(cherry picked from commit 098a87f)
Reviewed-on: https://go-review.googlesource.com/c/go/+/566536
Reviewed-by: Carlos Amedee <carlos@golang.org>
valyala added a commit to VictoriaMetrics/VictoriaMetrics that referenced this issue Feb 29, 2024
Go1.22.0 has the bug golang/go#65705 ,
which prevents vmagent from normal operation.
valyala added a commit to VictoriaMetrics/VictoriaMetrics that referenced this issue Feb 29, 2024
Go1.22.0 has the bug golang/go#65705 ,
which prevents vmagent from normal operation.
valyala added a commit to VictoriaMetrics/VictoriaMetrics that referenced this issue Feb 29, 2024
Go1.22.0 contains the bug golang/go#65705 ,
which prevents vmagent from normal operation.
valyala added a commit to VictoriaMetrics/VictoriaMetrics that referenced this issue Feb 29, 2024
Go1.22.0 contains the bug golang/go#65705 ,
which prevents vmagent from normal operation.
bk2204 added a commit to bk2204/git-lfs that referenced this issue Mar 4, 2024
Due to the issue noted in golang/go#65705, we
don't want to build our release assets with 1.22.0, since this can cause
deadlock in the HTTP client.  Instead, let's keep testing with both 1.21
and 1.22 in our CI jobs, but build with 1.21 for release.

In the future, once that bug is fixed and an updated version is
released, we can move back to 1.22.
bradfitz pushed a commit to tailscale/go that referenced this issue Mar 5, 2024
A recent change to Transport.dialConnFor introduced an early return that
skipped dialing. This path did not call decConnsPerHost, which can cause
subsequent HTTP calls to hang if Transport.MaxConnsPerHost is set.

For golang#65705
Fixes golang#65759

Change-Id: I157591114b02a3a66488d3ead7f1e6dbd374a41c
Reviewed-on: https://go-review.googlesource.com/c/go/+/564036
Reviewed-by: Damien Neil <dneil@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Damien Neil <dneil@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
(cherry picked from commit 098a87f)
Reviewed-on: https://go-review.googlesource.com/c/go/+/566536
Reviewed-by: Carlos Amedee <carlos@golang.org>
romaindoumenc pushed a commit to TroutSoftware/go that referenced this issue Mar 6, 2024
A recent change to Transport.dialConnFor introduced an early return that
skipped dialing. This path did not call decConnsPerHost, which can cause
subsequent HTTP calls to hang if Transport.MaxConnsPerHost is set.

For golang#65705
Fixes golang#65759

Change-Id: I157591114b02a3a66488d3ead7f1e6dbd374a41c
Reviewed-on: https://go-review.googlesource.com/c/go/+/564036
Reviewed-by: Damien Neil <dneil@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Damien Neil <dneil@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
(cherry picked from commit 098a87f)
Reviewed-on: https://go-review.googlesource.com/c/go/+/566536
Reviewed-by: Carlos Amedee <carlos@golang.org>
bk2204 added a commit to bk2204/git-lfs that referenced this issue Mar 7, 2024
Now that Go 1.22.1 is out with a backport to fix
golang/go#65705, revert the change that forced
us to build with Go 1.21 to avoid the problem.

This reverts commit ba2889b.
chrisd8088 added a commit to chrisd8088/build-dockers that referenced this issue Mar 8, 2024
Now that the Go 1.22.1 has been released with a backported fix
(golang/go@16830ab, per golang/go#65759)
for the problem reported in golang/go#65705, we can upgrade to Go 1.22
to align with our CI and release workflows in the main Git LFS repository.
possibull pushed a commit to possibull/VictoriaMetrics that referenced this issue Mar 27, 2024
Go1.22.0 contains the bug golang/go#65705 ,
which prevents vmagent from normal operation.
possibull pushed a commit to possibull/VictoriaMetrics that referenced this issue Mar 27, 2024
Go1.22.0 contains the bug golang/go#65705 ,
which prevents vmagent from normal operation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants