Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: readLoopPeekFailLocked: <nil> from client when server sends response immediately #31259

Open
logrusorgru opened this issue Apr 4, 2019 · 20 comments
Labels
help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@logrusorgru
Copy link

logrusorgru commented Apr 4, 2019

What version of Go are you using (go version)?

$ go version
go version go1.12.1 linux/amd64

Does this issue reproduce with the latest release?

Yep. 1.14.1

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/user/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/user/go"
GOPROXY=""
GORACE=""
GOROOT="/opt/go"
GOTMPDIR=""
GOTOOLDIR="/opt/go/pkg/tool/linux_amd64"
GCCGO="/usr/bin/gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build705707104=/tmp/go-build -gno-record-gcc-switches"

What did you do?

100% reproducible on Linux x64, and play.golang.org. The behavior is unpredictable and the code (see below) to reproduce should be executed few times to got the error and the log.

The error is:

Get http://127.0.0.1:9101/path: readLoopPeekFailLocked: <nil>

The log is:

transport.go:1829: Unsolicited response received on idle HTTP channel starting with "HTTP/1.1 502 Bad Gateway\r\n\r\n"; err=<nil>

What happens?

HTTP server sends response without reading request and without a delay. This approach is good for proxy servers where a client determined by his IP (i.e. using white-list). Thus, we haven't to read request, if the user is not white-listed.

Reproducing

(try few times to got error)

play.golang.org

package main

import (
	"fmt"
	"log"
	"net"
	"net/http"
	"net/url"
	"sync"
	"time"
)

func mustParseURL(uri string) *url.URL {
	x, err := url.Parse(uri)
	if err != nil {
		panic(err)
	}
	return x
}

func main() {

	l, err := net.Listen("tcp", "127.0.0.1:9100")
	if err != nil {
		log.Fatal(err)
	}
	go func(l net.Listener) {
		for {
			c, err := l.Accept()
			if err != nil {
				log.Fatal(err)
			}
			go func(c net.Conn) {
				defer c.Close()
				fmt.Fprint(c, "HTTP/1.1 502 Bad Gateway\r\n\r\n")
			}(c)
		}
	}(l)
	time.Sleep(100 * time.Millisecond)

	var cl http.Client
	cl.Transport = &http.Transport{
		Proxy:           http.ProxyURL(mustParseURL("http://127.0.0.1:9100/")),
		MaxConnsPerHost: 100,
	}

	const N = 100

	var wg sync.WaitGroup
	wg.Add(N)
	for i := 0; i < N; i++ {
		go func() {
			defer wg.Done()
			resp, err := cl.Get("http://127.0.0.1:9101/path")
			if err != nil {
				log.Print("error:", err)
				return
			}
			if resp.StatusCode != 502 {
				log.Println("status:", resp.StatusCode)
			}
			resp.Body.Close()
		}()
	}

	wg.Wait()

}

Also, playing with the N constant I've got this error for N=10, N=100 and N=1000. For N=1000 this error occurs every time. For N=10 rarely. Seems, it depends on goroutines scheduling.

What did you expect to see?

Got this error every time or never.

What did you see instead?

It's unpredictable. That makes testing a bit harder.

Also

Probably related issues #15253, #15446

@logrusorgru logrusorgru changed the title readLoopPeekFailLocked: <nil> or unpredictable behaviour of http client wherer server sends response immediately readLoopPeekFailLocked: <nil> or unpredictable behaviour of http client where server sends response immediately Apr 4, 2019
@logrusorgru logrusorgru changed the title readLoopPeekFailLocked: <nil> or unpredictable behaviour of http client where server sends response immediately readLoopPeekFailLocked: <nil> or, other words, unpredictable behaviour of http client where server sends response immediately Apr 4, 2019
@bcmills bcmills changed the title readLoopPeekFailLocked: <nil> or, other words, unpredictable behaviour of http client where server sends response immediately net/http: readLoopPeekFailLocked: <nil> from client when server sends response immediately Apr 12, 2019
@bcmills
Copy link
Contributor

bcmills commented Apr 12, 2019

CC @bradfitz

@bcmills bcmills added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Apr 12, 2019
@bcmills bcmills added this to the Go1.13 milestone Apr 12, 2019
@andybons andybons modified the milestones: Go1.13, Go1.14 Jul 8, 2019
@rsc rsc modified the milestones: Go1.14, Backlog Oct 9, 2019
@lamg
Copy link

lamg commented Oct 11, 2019

Go version

go version go1.13 linux/amd64

Behavior

Running the following code with go test -race fails most of the times, but not always. I've never seen it fail with go test.

package race

import (
	"bytes"
	"context"
	"net"
	h "net/http"
	ht "net/http/httptest"
	"testing"
	"time"
)

func TestDataRace(t *testing.T) {
	d := &dialer{content: "HTTP/1.1 502 Bad Gateway\r\n\r\n"}
	tr := &h.Transport{
		DialContext: d.dial,
	}
	req := ht.NewRequest(h.MethodGet, "http://example.com", nil)
	resp, e := tr.RoundTrip(req)
	if e != nil {
		t.Error(e)
	} else if resp.StatusCode != h.StatusBadGateway {
		t.Error("status:", resp.Status)
	}
}

type dialer struct {
	content string
}

func (d *dialer) dial(ctx context.Context, network,
	addr string) (n net.Conn, e error) {
	n = newMockConn(d.content)
	return
}

type mockConn struct {
	read  *bytes.Buffer
	write *bytes.Buffer
}

func newMockConn(content string) (m *mockConn) {
	m = &mockConn{
		read:  bytes.NewBufferString(content),
		write: new(bytes.Buffer),
	}
	return
}

func (m *mockConn) Read(p []byte) (n int, e error) {
	n, e = m.read.Read(p)
	return
}

func (m *mockConn) Write(p []byte) (n int, e error) {
	n, e = m.write.Write(p)
	return
}

func (m *mockConn) Close() (e error) { return }

func (m *mockConn) LocalAddr() (a net.Addr) { return }

func (m *mockConn) RemoteAddr() (a net.Addr) { return }

func (m *mockConn) SetDeadline(t time.Time) (e error) { return }

func (m *mockConn) SetReadDeadline(t time.Time) (e error) {
	return
}

func (m *mockConn) SetWriteDeadline(t time.Time) (e error) {
	return
}

Output when failing

2019/10/11 17:19:50 Unsolicited response received on idle HTTP channel starting with "HTTP/1.1 502 Bad Gateway\r\n\r\n"; err=<nil>
--- FAIL: TestDataRace (0.00s)
    race_test.go:21: readLoopPeekFailLocked: <nil>
FAIL
exit status 1

@lamg
Copy link

lamg commented Oct 11, 2019

Defining in the previous code mockConn, newMockConn, mockConn.Read and mockConn.Write the following way, the error never appeared so far, with go test or go test -race:

type mockConn struct {
	read    *bytes.Buffer
	write   *bytes.Buffer
	writeOk chan bool
}

func newMockConn(content string) (m *mockConn) {
	m = &mockConn{
		read:    bytes.NewBufferString(content),
		write:   new(bytes.Buffer),
		writeOk: make(chan bool, 0),
	}
	return
}

func (m *mockConn) Read(p []byte) (n int, e error) {
	<-m.writeOk
	n, e = m.read.Read(p)
	return
}

func (m *mockConn) Write(p []byte) (n int, e error) {
	n, e = m.write.Write(p)
	m.writeOk <- true
	return
}

@cuonglm
Copy link
Member

cuonglm commented Oct 12, 2019

Maybe related #34066

lamg added a commit to lamg/proxy that referenced this issue Oct 12, 2019
@sebglon
Copy link

sebglon commented Jan 21, 2020

I have exactly the same with golang 1.13 and a fake http server like that:
echo -e "HTTP/1.1 200 OK\n\n{}" | nc -l -p ${SERVER_PORT}

@cyx
Copy link

cyx commented Mar 25, 2020

@logrusorgru seeing this happen as well. Can you reproduce it on go 1.11 as well or just 1.12++ ?

@bcmills
Copy link
Contributor

bcmills commented Mar 25, 2020

@cyx, note that per the release policy, neither Go 1.11 nor 1.12 is supported at this point.

(@sebglon's report of a repro using Go 1.13 is helpful, though.)

@logrusorgru
Copy link
Author

@cyx, I just have checked out go version go1.11.13 linux/amd64. I can say that this is reproducible in the entire go version go1.11.13 linux/amd64 - go version go1.14.1 linux/amd64 range.

I don't think this issue is a real problem, but anyone is working with HTTP "manually" may run into it.

I just use (let's say it's app.go)

var isTesting bool

// [...]

// handler function
    if isTesting {
       time.Sleep(100*time.Microsecond)
    }

(let's say it's app_test.go)

var _ = func() struct{} {
    isTesting = true
    return struct{}{}
}()

To get my tests 100% stable.

Because of the 100µs, I think it's about goroutines scheduling.

@cyx
Copy link

cyx commented Mar 26, 2020 via email

@jsha
Copy link

jsha commented Apr 24, 2020

I can report that we're seeing this issue in production at Let's Encrypt, using https://github.com/letsencrypt/boulder/. Specifically, there are some "hasty" HTTP servers out there that will send an HTTP response as soon as the TCP connection completes. When we make multiple concurrent requests to such a server, we get Unsolicited response in our stdout logs.

The reason we'd like to see this fixed: In general we try to route all log output through our logging system, and this is the one place in net/http that regularly prints to stdout. I have a minimized repro case below that is a bit shorter than ones posted to this thread. Reproduces 100% on Go 1.14.2 (latest as of 2020-04-23).

package main

import (
	"log"
	"net"
	"net/http"
	"sync"
	"time"
)

const port = ":8081"

var wg sync.WaitGroup

func main() {
	go hastyHTTPServer()
	time.Sleep(100 * time.Millisecond)
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go fetchLoop()
	}
	wg.Wait()
}

func fetchLoop() {
	for i := 0; i < 1e3; i++ {
		resp, err := http.Get("http://localhost" + port)
		if err != nil {
			log.Println(err)
			continue
		}
		resp.Body.Close()
	}
	wg.Done()
}

// hastyHTTPServer runs an HTTP server that sends a response and closes the
// connection as soon as a client makes a TCP connection, rather than waiting
// for the client to actually make an HTTP request.
// A real-world example of this type of server can be seen at 8849777.com
func hastyHTTPServer() {
	ln, err := net.Listen("tcp", port)
	if err != nil {
		log.Fatal(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			log.Fatal(err)
		}
		go func() {
			conn.Write([]byte("HTTP/1.0 200 OK\r\n\r\nQuick response!"))
			conn.Close()
		}()
	}
}

The output is like below (the number of errors varies from run to run, but there's almost always at least one error):

$ go run hastyserver.go 
2020/04/23 16:26:37 Unsolicited response received on idle HTTP channel starting with "HTTP/1.0 200 OK\r\n\r\nQuick response!"; err=<nil>
2020/04/23 16:26:37 Get "http://localhost:8081": readLoopPeekFailLocked: <nil>
2020/04/23 16:26:37 Unsolicited response received on idle HTTP channel starting with "HTTP/1.0 200 OK\r\n\r\nQuick response!"; err=<nil>
2020/04/23 16:26:37 Get "http://localhost:8081": readLoopPeekFailLocked: <nil>
2020/04/23 16:26:37 Unsolicited response received on idle HTTP channel starting with "HTTP/1.0 200 OK\r\n\r\nQuick response!"; err=<nil>
2020/04/23 16:26:37 Get "http://localhost:8081": readLoopPeekFailLocked: <nil>

Expected behavior:

No errors output.

Additional detail:

This test case also reproduces when I set http.DisableKeepAlives, and when I set Connection:Close on the response from the hasty server. Also, each connection yields at most 1 HTTP response, so this is not really an issue about an unsolicited response on an idle channel as the message suggests, but seems like a race between roundTrip incrementing numExpectedResponses and readLoop checking whether numExpectedResponses is 0. I don't know this code very well but I wonder if it makes sense to treat every connection as starting with numExpectedResponses at 1?

I've tried to figure out whether hasty responses are allowed or disallowed by RFC 7230. The closest language I can find is:

HTTP only presumes a reliable
transport with in-order delivery of requests and the corresponding
in-order delivery of responses. The mapping of HTTP request and
response structures onto the data units of an underlying transport
protocol is outside the scope of this specification.

Which seems to implicitly allow them. At any rate it's not clear that it would be possible to consistently reject hasty responses, since there's an unavoidable network race between writing a request and reading a response. Given that, I think it probably makes sense to make the current behavior (accepting hasty responses) more consistent.

@jsha
Copy link

jsha commented Apr 24, 2020

I can confirm that this patch, which starts persistConns with numExpectedResponses = 1, causes the repro code above to run without errors.

0001-Start-persistConns-with-numExpectedResponses-1.patch.txt

@qingyunha
Copy link
Contributor

move <-pc.reqch to the front of Peek may solve this problem

--- net/http/transport.go.bk    2020-04-24 10:06:11.253338231 +0800
+++ net/http/transport.go       2020-04-24 10:07:03.676671394 +0800
@@ -1893,6 +1893,7 @@
 
        alive := true
        for alive {
+               rc := <-pc.reqch
                pc.readLimit = pc.maxHeaderResponseSize()
                _, err := pc.br.Peek(1)
 
@@ -1904,7 +1905,6 @@
                }
                pc.mu.Unlock()
 
-               rc := <-pc.reqch
                trace := httptrace.ContextClientTrace(rc.req.Context())
 
                var resp *Response

@Porkbutts
Copy link

hi i'm receiving the same error. is there a way I can try this patch out without having to rebuild a fork fo go?

@logrusorgru
Copy link
Author

Yep, it's still actual with go version go1.16.6 linux/amd64. I've improved the code to get detailed stat https://play.golang.org/p/V3YkMeJ__4F (result can't be given on the play.golang.org). For example, for 100 000 total requests, only 329 ends with the error.

@RussellLuo
Copy link

RussellLuo commented Sep 1, 2021

Specifically, there are some "hasty" HTTP servers out there that will send an HTTP response as soon as the TCP connection completes.

... but seems like a race between roundTrip incrementing numExpectedResponses and readLoop checking whether numExpectedResponses is 0.

@jsha Thanks for your repro code!

As far as I understand, I think the race you mentioned above may also happen for "non-hasty" HTTP servers, in which case pc.numExpectedResponses++ does not necessarily happen before if pc.numExpectedResponses == 0 {...} either.

However, I can not reproduce the same error by replacing hastyHTTPServer() with normalHTTPServer() as below:

package main

import (
	"fmt"
	"log"
	"net"
	"net/http"
	"sync"
	"time"
)

const port = ":8081"

var wg sync.WaitGroup

func init() {
	tr := http.DefaultTransport.(*http.Transport)
	tr.DisableKeepAlives = true
}

func main() {
	//go hastyHTTPServer()
	go normalHTTPServer()
	time.Sleep(100 * time.Millisecond)
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go fetchLoop()
	}
	wg.Wait()
}

func fetchLoop() {
	for i := 0; i < 1e3; i++ {
		resp, err := http.Get("http://localhost" + port)
		if err != nil {
			log.Println(err)
			continue
		}
		resp.Body.Close()
	}
	wg.Done()
}

// hastyHTTPServer runs an HTTP server that sends a response and closes the
// connection as soon as a client makes a TCP connection, rather than waiting
// for the client to actually make an HTTP request.
// A real-world example of this type of server can be seen at 8849777.com
func hastyHTTPServer() {
	ln, err := net.Listen("tcp", port)
	if err != nil {
		log.Fatal(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			log.Fatal(err)
		}
		go func() {
			conn.Write([]byte("HTTP/1.0 200 OK\r\n\r\nQuick response!"))
			conn.Close()
		}()
	}
}

func normalHTTPServer() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "HTTP/1.0 200 OK\r\n\r\nQuick response!")
	})
	log.Fatal(http.ListenAndServe(port, nil))
}

UPDATE: Sorry for my misunderstanding!

In fact, the following line will block until the HTTP response is available (or until the Client.Timeout is exceeded):

_, err := pc.br.Peek(1)

Therefore, in a typical HTTP interaction (i.e. the response follows the request), pc.numExpectedResponses++ will always happen before if pc.numExpectedResponses == 0 {...}.

@RussellLuo
Copy link

For those who are interested, I have finally figured out the reason for the readLoopPeekFailLocked: <nil> errors occurred in the production environment.

TL;DR: An abnormal 204 response, which includes a body, will cause the readLoopPeekFailLocked: <nil> error.

Background

We have deployed two types of backends behind Caddy (a web server in Go):

  1. Client -> Caddy -> Go Backend
  2. Client -> Caddy -> Erlang Backend (Cowboy based)

But only HTTP requests in the 2nd case (i.e. Caddy <-> Erlang Backend) are encountering the readLoopPeekFailLocked: <nil> errors occasionally.

Root Cause

The Erlang Backend is using Cowboy 2.7.0, which unfortunately allows developers to construct a 204 response with a body. And the fact is that our Erlang developer does do this!

After correcting these abnormal 204 responses, all readLoopPeekFailLocked: <nil> errors disappear.

NOTE: Cowboy 2.8.0 has fixed this issue (commit).

Reproducing Code

package main

import (
	"log"
	"net/http"
	"time"
)

const port = ":8081"

func main() {
	go abnormalHTTPServer()
	time.Sleep(100 * time.Millisecond)

	fetchLoop()
}

func fetchLoop() {
	client := &http.Client{Timeout: 2 * time.Second}
	for i := 0; i < 2; i++ {
		resp, err := client.Get("http://localhost" + port)
		if err != nil {
			log.Println(err)
			continue
		}
		resp.Body.Close()
	}
}

// abnormalHTTPServer runs an HTTP server that sends a 204 response with a body.
func abnormalHTTPServer() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		hj, ok := w.(http.Hijacker)
		if !ok {
			panic("impossible for HTTP/1.x")
		}

		conn, _, err := hj.Hijack()
		if err != nil {
			panic(err)
		}

		// Mimic the abnormal 204 response from Cowboy 2.7.0.
		conn.Write([]byte("HTTP/1.1 204 No Content\r\n\r\n{}"))
		//conn.Close()
	})
	log.Fatal(http.ListenAndServe(port, nil))
}

@misha-ridge
Copy link

misha-ridge commented Feb 2, 2023

The same error still can be reproduced with #31259 (comment) on go1.19.5 linux/amd64

maxmoehl added a commit to sap-contributions/gorouter that referenced this issue Apr 13, 2023
The delay causes a race condition in the go transport that results in a
502 Bad Gateway with:
  `endpoint_failure (readLoopPeekFailLocked: %!w(<nil>))`.

This happens because the transport peeks the first few bytes on the
connection and gets some data even though it doesn't expect any. This
causes it to go into an error state even though there is no error
resulting in the formatting directive to break.

This commit removes the delay and adds a note why we can't do this for
now. This will reduce the amount of requests we can retry because the
client will send data before we know that the connection is good. After
we sent _some_ data we can't be sure that the server hasn't started
processing, hence no retry in such cases.

See: https://cloudfoundry.slack.com/archives/C033ALST37V/p1680888356483179
See: golang/go#31259
Resolves: cloudfoundry/routing-release#316
geofffranks pushed a commit to cloudfoundry/gorouter that referenced this issue Apr 17, 2023
The delay causes a race condition in the go transport that results in a
502 Bad Gateway with:
  `endpoint_failure (readLoopPeekFailLocked: %!w(<nil>))`.

This happens because the transport peeks the first few bytes on the
connection and gets some data even though it doesn't expect any. This
causes it to go into an error state even though there is no error
resulting in the formatting directive to break.

This commit removes the delay and adds a note why we can't do this for
now. This will reduce the amount of requests we can retry because the
client will send data before we know that the connection is good. After
we sent _some_ data we can't be sure that the server hasn't started
processing, hence no retry in such cases.

See: https://cloudfoundry.slack.com/archives/C033ALST37V/p1680888356483179
See: golang/go#31259
Resolves: cloudfoundry/routing-release#316
@domdom82
Copy link

domdom82 commented Apr 19, 2023

Related to #59310 error still exists in Go 1.20

@domdom82
Copy link

FYI, a new RFC 9112 was published in June 2022 adding more clarity to unsolicited responses. This spec supersedes RFC 7230 in that regard:

If a client receives data on a connection that doesn't have outstanding requests, the client MUST NOT consider that data to be a valid response; the client SHOULD close the connection, since message delimitation is now ambiguous, unless the data consists only of one or more CRLF

@jsha
Copy link

jsha commented Jun 4, 2023

That's very interesting @domdom82 ! Thanks for sharing.

Strangely, even though RFC 9112 seems to make this more clear, I think it winds up being less clear - and to some extent, introduces a race condition into protocol handling.

Traditional single-threaded HTTP implementations would do something like:

  • connect
  • write the request
  • read the socket for a response

If you do that, a response from a "hasty" HTTP server will look normal - it will look as if it was a response to the request.

However, my understanding is that Go winds up in a different situation:

  • connect
  • fire off a goroutine to read from the socket and process results
  • write the request

So depending on the speed of the server and the speed of the network, a "hasty" response could arrive before or after the request is written. So the RFC 9112 requirement to reject responses on idle channels seems to require the current racy behavior - but only if the client happens to be reading concurrently.

Perhaps a more useful fix on the HTTP RFC side would have been to specify that servers MUST NOT be hasty. That would make it somewhat easier to say that the client is free to do whatever it likes (including logging an error to stdout) when a hasty server is encountered.

Go0p added a commit to yaklang/yaklang that referenced this issue Jul 19, 2023
VillanCh pushed a commit to yaklang/yaklang that referenced this issue Jul 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests