Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/net/http2: RST_STREAM ErrCode=CANCEL sent to a closed stream #41570

Open
martin-g opened this issue Sep 23, 2020 · 12 comments
Open

x/net/http2: RST_STREAM ErrCode=CANCEL sent to a closed stream #41570

martin-g opened this issue Sep 23, 2020 · 12 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@martin-g
Copy link

What version of Go are you using (go version)?

$ go version
go version go1.15.2 linux/amd64

$ vegeta --version                                                                                                                                                      
Version: 12.8.3
Commit: d9b795aec8585a0fb435072f68d842d596c332de
Runtime: go1.14 linux/amd64
Date: 2020-03-25T11:03:54Z"

Does this issue reproduce with the latest release?

Yes!

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/martin/.cache/go-build"
GOENV="/home/martin/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/martin/golang-path/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/martin/golang-path"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build611792435=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I am load testing Apache Tomcat web server with Vegeta
Together with Tomcat's developers we have found that Golang sends RST_STREAM + CANCEL after reading the DATA frame of a response.
Initially I thought it is a problem in Vegeta (issue) but after some investigation I've
found #27208 and its fix.

I've exported GODEBUG="http2debug=2" and re-run Vegeta. It doesn't fail always but just for some HTTP2 streams. The debug information for a problematic stream looks like:

/tmp/vegeta-debug.txt:410:2020/09/23 11:11:36 http2: Framer 0xc000360000: wrote HEADERS flags=END_STREAM|END_HEADERS stream=443 len=10
/tmp/vegeta-debug.txt:520:2020/09/23 11:11:36 http2: Framer 0xc000360000: read HEADERS flags=END_HEADERS stream=443 len=35
/tmp/vegeta-debug.txt:525:2020/09/23 11:11:36 http2: Transport received HEADERS flags=END_HEADERS stream=443 len=35
/tmp/vegeta-debug.txt:526:2020/09/23 11:11:36 http2: Framer 0xc000360000: read DATA flags=END_STREAM stream=443 len=12 data="Hello world!"
/tmp/vegeta-debug.txt:527:2020/09/23 11:11:36 http2: Transport received DATA flags=END_STREAM stream=443 len=12 data="Hello world!"
/tmp/vegeta-debug.txt:536:2020/09/23 11:11:36 http2: Framer 0xc000360000: wrote RST_STREAM stream=443 len=4 ErrCode=CANCEL

What did you expect to see?

I expect to see error message explaining what kind of problem there is with the DATA frame.
I do not expect Golang HTTP2 library to send more frames to a closed stream. The server sends DATA with END_STREAM then Golang http2 fails for some reason and tries to send new frames to the server

What did you see instead?

I see Golang trying to send new frames to a closed stream and errors in the server logs explaining that the client is trying to use a closed stream.

Steps to Reproduce

I guess the following steps won't be very friendly to a Golang developer (or any non-Java developer) but I wasn't able to find a way to make it smaller.
There is a lenghtly discussion at https://lists.apache.org/thread.html/r6f9ce1521346d5ced72c82a4b5d882a624514c9a7b308ad54f064ba1%40%3Cusers.tomcat.apache.org%3E where I explained that I have tried with other HTTP2 servers (Golang, Rust, Node.js and Netty) - there is no such issue with them!
So I guess Tomcat produces problematic DATA frames in some cases.
I'd be very grateful if someone helps me to find out what exactly causes Golang net library to decide to send the RST_STREAM + CANCEL frame, for example by adding more debug statements to http2debug=2.

Steps:

  1. Download Tomcat from https://tomcat.apache.org/download-90.cgi
  2. Unzip/Untar it
  3. Edit /path/to/tomcat/conf/server.conf and remove the comments around this connector: https://github.com/apache/tomcat/blob/27c4109a7a686b71454205faab909e5e1f041b18/conf/server.xml#L102-L113
  4. Generate self-signed certificate and update the paths for certificateKeyFile and certificateFile attributes. The certificateChainFile attribute could be removed.
  5. Start Tomcat with: cd /path/to/tomcat/, ./bin/startup.sh
  6. Check ./logs/catalina.out that there are no errors
  7. Run Vegeta against Tomcat:
echo -e '{"method": "GET", "url": "https://localhost:8443/examples/index.html"}' | vegeta attack -format=json -http2 -rate=0 -max-workers=100 -insecure -duration=5s | vegeta encode > /tmp/http2.json; and vegeta report -type=json /tmp/http2.json | jq . 
@networkimprov
Copy link

cc @fraenkel

@cagedmantis
Copy link
Contributor

/cc @tombergan

@cagedmantis cagedmantis added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Sep 28, 2020
@cagedmantis cagedmantis added this to the Backlog milestone Sep 28, 2020
@fraenkel
Copy link
Contributor

So I won't comment on the setup but its not exactly trivial and the above well, wasn't enough.

What I ended up seeing were 2 types of reasons for the RST_STREAM:

  1. context cancelled
  2. missing stream end when the response body was closed

@martin-g
Copy link
Author

@fraenkel Could I provide you more debug information that will help you say why the CANCEL is being send ?

Here

/tmp/vegeta-debug.txt:527:2020/09/23 11:11:36 http2: Transport received DATA flags=END_STREAM stream=443 len=12 data="Hello world!"

I see all required data: the stream id, the END_STREAM flag and the body/data. But for some reason Golang net::Transport marks it as insufficient/incorrect and sends back the RST_STREAM+CANCEL.

@fraenkel
Copy link
Contributor

I told you why. I added debug to see why cancel was sent. I can give you the line numbers. Again, the client is cancelling the context or the response body is closed before we finishing reading the body.

@martin-g
Copy link
Author

Sorry, I am not so familiar with Golang net internals and I didn't (I still don't) understand whether the problem is in the client (Golang) or in the server (Tomcat).
Please just let me know if you want me to retest with a nightly build of Golang or to change something at the server side!

@networkimprov
Copy link

networkimprov commented Sep 29, 2020

@martin-g could you cc the Vegeta folks on this, since their client is apparently the cause?

@martin-g
Copy link
Author

Actually I first created an issue at Vegeta but then I've found #27208 and decided this is a problem in the library.
@tsenart Ping!

@tsenart
Copy link

tsenart commented Sep 30, 2020

@martin-g: It's possible your requests are timing out, since you're sending at max rate. There's a default timeout of 30s. Do you see the same thing when you set -timeout=0?

@martin-g
Copy link
Author

The default timeout is 30s but the load test runs for just 5s. I think this excludes this possibility. But I will test it anyway!

@martin-g
Copy link
Author

Confirmed! Same behavior with -timeout=0.

@ivandeex
Copy link

ivandeex commented Oct 11, 2020

This same issue (RST_STREAM + CANCEL sent by HTTP2 client to server after reading the DATA frame of a response) was reported with regard to exchange between restic backup and rclone utility: rclone/rclone#2598

The root cause in that case was that restic (client) read exactly the number of response bytes it requested but did not wait for stream EOF. Not waiting for EOF after reading full response body never posed a problem with HTTP/1.1 but results in sporadic RST_STREAM issues with HTTP/2.

@fraenkel
Can this difference between HTTP/1.1 and HTTP/2 implementations be considered a BUG??
If not, is it documented?

The issue was investigated here: https://github.com/ivandeex/rclone/tree/stream-reset/stream-reset#premature-stream-reset-between-restic-and-rclone
and can be reproduced by running a simple test https://github.com/ivandeex/rclone/blob/stream-reset/stream-reset/http2test.go

go run http2test.go http1 nodrain 10

runs successfully

go run http2test.go http2 nodrain 10

prints error messages

go run http2test.go http2 drain 10

is successful again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

6 participants