Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: broken https connections reuse on arm64 #39114

Open
DarthRamone opened this issue May 17, 2020 · 5 comments
Open

net/http: broken https connections reuse on arm64 #39114

DarthRamone opened this issue May 17, 2020 · 5 comments
Labels
arch-arm64 NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@DarthRamone
Copy link

What version of Go are you using (go version)?

$ go version
go1.14.3 linux/arm64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

Ubuntu 18.04 LTS ARM64

go env Output
$ go env
GO111MODULE=""
GOARCH="arm64"
GOBIN=""
GOCACHE="/home/ubuntu/.cache/go-build"
GOENV="/home/ubuntu/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="arm64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/ubuntu/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/snap/go/5770"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/snap/go/5770/pkg/tool/linux_arm64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build987108063=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I wrote telegram bot with one of go-based telegram framework, and tested it on my MacBook. Everything works just fine, until I deployed it to my Raspberry Pi 4. On rpi it works fine, until I do several calls to telegram API, after third call bot stops responding, because of it can't do request and gets EOF.

I've spent a week trying to debug and find an issue. I thought it was related with Russian telegram ban, but I've routed all the telegram traffic through VPN, I've secured my DNS as much as possible, and issue still reproduces. Same network, same codebase but different processor arch - works good, move to arm64 - everything is broken.

I do not know much how secure connections works at "low levels", and started to debug it with tcpdump and netstat, and noticed, that bot in idle state opens just one connection to API server for polling, but if I do any request - a second connect is appearing and stays open. I thought it was a problem, but then I googled a little, and figured out that it is totally fine and it's common practice to reuse opened connections. It was confirmed with another tests on MacBook - same second connection and everything works fine.

Okay, what if I'll try somehow not to reuse these connections? I've set response.Close flag to true - nothing happens. Then I tried to call CloseIdleConnections() after every request to API, and it works! No more EOFs. Then I've found another option - set DisableKeepAlives to false to http client transport, and now it works fine(without manual CloseIdleConnections and resp.Close true flag). But it is not good I think, it opens and closes connections on every request and doesn't reuse them.

What did you expect to see?

Correct reusing opened connections.

What did you see instead?

EOF

@odeke-em
Copy link
Member

Thank you for filing this issue @DarthRamone and welcome to the Go project!

Could you perhaps please try to isolate a code snippet that illustrates the problem? Also could you check if the problem exists for Go1.13? You can get different Go versions using the tool dl

I shall also alert some experts about this issue, as we await a repro, cc @ianlancetaylor @cherrymui

@odeke-em odeke-em changed the title broken(?) https connections reuse on arm64 net/http: broken https connections reuse on arm64 May 18, 2020
@odeke-em odeke-em added arch-arm64 NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. labels May 18, 2020
@DarthRamone
Copy link
Author

Well, I rebuilt container with golang:1.13.11 image and played a little, and it worked well, but after some steps I've got EOF anyway. Plus I realised, that I forgot to comment DisableKeepAlives=true, so this flag is not even workaround, I think. Only difference - it works much longer and require slightly more intense requests to get EOF.

So issue exists at 1.13 as well, and DisableKeepAlives=true just helps to stay alive little longer, but doesn't fix anything. So maybe it's not about reusing connections at all, but the truth is somewhere near.

Providing code snippet maybe difficult, because I'm not sure how to exactly reproduce it. Even in worst case, when DisableKeepAlives set to false by default, I'm getting EOF not on every third call, but on every third certain call to certain endpoint. It may looks like API problem, but for some reasons it reproduces only on my RPi. I'm not sure I can figure out all conditions that causes this behaviour.

@odeke-em
Copy link
Member

Gotcha, thanks @DarthRamone! To seed a reproduction, please take a look at this short program which creates an HTTPS server serving HTTP/1.1 traffic at https://play.golang.org/p/TaaJx_MJ2ad or inlined below

package main

import (
	"crypto/tls"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"time"
)

func main() {
	cst := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		conn, _, err := w.(http.Hijacker).Hijack()
		if err != nil {
			panic(err)
		}
		tcpConn := conn.(*tls.Conn)
		if true {
			<-time.After(100 * time.Millisecond)
		}
		tcpConn.Write([]byte("HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nab\r\n\r\n"))
		tcpConn.Close()
	}))
	cst.StartTLS()
	defer cst.Close()

	println("Server address:", cst.URL)

	client := cst.Client()
	tr := client.Transport.(*http.Transport)
	tr.IdleConnTimeout = 50 * time.Millisecond
	for i := 0; i < 5; i++ {
		println("Request:", i, "*****")
		req, _ := http.NewRequest("GET", cst.URL, nil)
		res, err := client.Do(req)
		if err != nil {
			panic(err)
		}
		blob, err := httputil.DumpResponse(res, true)
		res.Body.Close()
		if err != nil {
			panic(err)
		}
		println(string(blob), "\n******\n\n")
	}
}

I intentionally have idleTimeouts and then invoke tcpConn.Close(), to kill/make some idle connections. Perhaps you can tweak this code to try to behave as your server is behaving?
If you need to just get back an EOF, please comment the line that sends back the HTTP response in the server and you'll be served back a straight up EOF.

One other tip, is that if you know the Telegram server is serving HTTP/2 traffic, you can use this environment flag GODEBUG=http2debug=2 which will print out for you logs e.g.

GODEBUG=http2debug=2 go run main.go

Kindly /cc-ing to your radar @bradfitz @fraenkel

@DarthRamone
Copy link
Author

DarthRamone commented May 18, 2020

I've changed your snipped a little to reproduce all what happens in bot library:
https://play.golang.org/p/tk0YnYy8gBv
But couldn't get EOF yet, only weird:
"Unsolicited response received on idle HTTP channel starting with "\r\n\r\n"; err=nil"
What's this btw?

I'll continue play with this snippet, maybe I will reproduce it somehow

UPD:
Fast google says that this weird error(weird for me) is somehow related to idle connections, https and all current subjects. There are even couple issues in golang repo about it

@bleggett
Copy link

bleggett commented Jun 10, 2020

I believe this is related to #5312 - which was never fixed, just closed.

I've seen similar EOF behavior in my own code, which makes parallel requests using goroutines - it appears there's a race condition where Go will incorrectly reuse a terminated connection before that connection can be properly marked as "terminated" by the runtime.

Even though you're not using goroutines, I suspect you're hitting some aspect of that race condition.

It seems to be something that can crop up more frequently in resource-constrained environments - see this comment for a hypothesis as to what's happening: #5312 (comment)

@joedian joedian removed the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Aug 16, 2022
@seankhliao seankhliao added this to the Unplanned milestone Aug 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arch-arm64 NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

5 participants