
net/http: possible race on persistConn.roundtrip vs. persistConn.readloop in transport when using httptrace #59310

Open
domdom82 opened this issue Mar 29, 2023 · 2 comments
Labels: NeedsInvestigation
Milestone: Backlog
Comments

domdom82 commented Mar 29, 2023

What version of Go are you using (go version)?

go version go1.20.2 darwin/arm64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

go env Output
GO111MODULE=""
GOARCH="arm64"
GOBIN=""
GOCACHE="/Users/d068861/Library/Caches/go-build"
GOENV="/Users/d068861/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="arm64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/d068861/go/pkg/mod"
GONOPROXY="github.tools.sap"
GONOSUMDB="github.tools.sap"
GOOS="darwin"
GOPATH="/Users/d068861/go"
GOPRIVATE="github.tools.sap"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/opt/homebrew/Cellar/go/1.20.2/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/opt/homebrew/Cellar/go/1.20.2/libexec/pkg/tool/darwin_arm64"
GOVCS=""
GOVERSION="go1.20.2"
GCCGO="gccgo"
AR="ar"
CC="cc"
CXX="c++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/cr/bcds0pbj3g95vnwk20h4cvjw0000gn/T/go-build3474477496=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

Given a simple TCP server that immediately closes every incoming connection:

package main

import (
	"net"
)

func main() {
	listener, _ := net.Listen("tcp", ":8080")

	for {
		conn, _ := listener.Accept()
		conn.Close()
	}
}

Given a client that uses httptrace:

package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptrace"
	"strings"
)

func main() {
	trace := &httptrace.ClientTrace{
		GotConn: func(info httptrace.GotConnInfo) {
			fmt.Printf("GotConn: conn = src: %s tgt: %s reused: %v idle: %v idletime: %s\n",
				info.Conn.LocalAddr(), info.Conn.RemoteAddr(), info.Reused, info.WasIdle, info.IdleTime)
			//time.Sleep(100 * time.Microsecond)
		},
	}

	ctx := context.Background()
	ctx = httptrace.WithClientTrace(ctx, trace)

	url := "http://localhost:8080/"
	req, _ := http.NewRequestWithContext(ctx, http.MethodPost, url, strings.NewReader("some body"))

	client := &http.Client{}
	res, err := client.Do(req)

	if err != nil {
		fmt.Println("ERROR", err)
		return
	}

	res.Body.Close()
}

What did you expect to see?

http: server closed idle connection

What did you see instead?

EOF

Background:

The use case for this scenario is that we want to know whether we can safely retry POST requests. The hypothesis is that if GotConn or WroteHeaders in httptrace was not called during the request, we know that no data of the POST request actually reached the backend. In our case, the backend sits behind an L4 reverse proxy that always accepts connections first, but immediately closes them if it cannot forward the connection to its own backend. This has led to issues where the actual backend would not even accept connections, but we don't see that because, from our end, the connection was accepted by the reverse proxy. So we want some way of knowing whether we failed to send the HTTP headers (a sketch of this idea follows below). Unfortunately, the EOF error is very generic, and we can't really draw any conclusions from it alone, as it can occur anywhere during the lifecycle of a connection.
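
For illustration, here is a minimal sketch of that idea (doPOST is a hypothetical helper name, not part of net/http): record whether WroteHeaders fired during the attempt, and only treat the request as safely retryable when it did not.

package main

import (
	"fmt"
	"net/http"
	"net/http/httptrace"
	"strings"
	"sync/atomic"
)

// doPOST is a hypothetical helper: besides the response and error, it reports
// whether any request headers were written, so the caller can decide whether
// retrying the POST is safe under the hypothesis above.
func doPOST(client *http.Client, url, body string) (resp *http.Response, wroteHeaders bool, err error) {
	var wrote atomic.Bool
	trace := &httptrace.ClientTrace{
		WroteHeaders: func() { wrote.Store(true) },
	}
	req, err := http.NewRequest(http.MethodPost, url, strings.NewReader(body))
	if err != nil {
		return nil, false, err
	}
	req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))
	resp, err = client.Do(req)
	return resp, wrote.Load(), err
}

func main() {
	resp, wroteHeaders, err := doPOST(http.DefaultClient, "http://localhost:8080/", "some body")
	if err != nil {
		if !wroteHeaders {
			// No header bytes were written, so by the hypothesis above no data
			// reached the backend and the POST could be retried.
			fmt.Println("retryable:", err)
		} else {
			fmt.Println("not retryable:", err)
		}
		return
	}
	resp.Body.Close()
}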

Technical details:

When the GotConn callback adds a minimal delay of 0.1ms, the behavior is different and http: server closed idle connection is returned. The race I observed is this:

  1. When no delay is used, persistConn.roundTrip is called first, increasing numExpectedResponses to 1, which prevents persistConn.readLoopPeekFailLocked from being called later. readLoopPeekFailLocked peeks into the connection to see whether it is dead or alive; if EOF is encountered during the peek, errServerClosedIdle is returned. In this case that check never happens.
  2. When a tiny delay of just a couple of microseconds is added in the GotConn callback of httptrace, the peek in persistConn.readLoop happens first, while numExpectedResponses is still 0 (its zero value, NOT the result of being decremented after actually reading a response), which leads to readLoopPeekFailLocked being called and the final errServerClosedIdle being returned to the caller.
  3. When the delay in GotConn is below 100 microseconds, the returned error starts flapping between EOF and http: server closed idle connection. Above 500 microseconds we consistently see http: server closed idle connection; below 10 microseconds we consistently see EOF.
  4. I've noticed that the EOF in the no-delay case starts out as a nothingWrittenError wrapping a transportReadFromServerError; the transport unwraps both layers and only the inner EOF is returned to Transport.RoundTrip, unfortunately stripping all helpful information. A small diagnostic sketch of what the caller ends up seeing follows this list.
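
For illustration, a small diagnostic sketch (my own addition, not part of the repro) that walks the error chain the caller actually receives using errors.Unwrap; per the observation above, in the no-delay case the chain should end in a bare io.EOF, with the nothingWrittenError / transportReadFromServerError context already stripped inside the transport.

package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"strings"
)

func main() {
	// Assumes the TCP server from the repro above is listening on :8080.
	req, _ := http.NewRequest(http.MethodPost, "http://localhost:8080/", strings.NewReader("some body"))
	_, err := http.DefaultClient.Do(req)

	// Print every layer of the error chain the caller can still see.
	// In the no-delay case this should show a *url.Error and then a bare
	// io.EOF; the transport-internal wrappers are gone by this point.
	for e := err; e != nil; e = errors.Unwrap(e) {
		fmt.Printf("%T: %v\n", e, e)
	}
	fmt.Println("is io.EOF:", errors.Is(err, io.EOF))
}
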
domdom82 (Author) commented
Likely related issues: #16465 , #13667

bcmills (Contributor) commented Mar 31, 2023

(attn @neild)

@bcmills added the NeedsInvestigation label Mar 31, 2023
@bcmills added this to the Backlog milestone Mar 31, 2023