Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/net/http2: flow control desync when receiving data for canceled stream #56558

Closed
mrdg opened this issue Nov 3, 2022 · 5 comments
Closed

x/net/http2: flow control desync when receiving data for canceled stream #56558

mrdg opened this issue Nov 3, 2022 · 5 comments
Assignees
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@mrdg
Copy link

mrdg commented Nov 3, 2022

What version of Go are you using (go version)?

$ go version
go version go1.19 darwin/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env

What did you do?

We have a proxy server in Go that proxies over HTTP/2 to a third-party server. We're seeing that the third-party server sometimes runs out of flow control tokens and can't send anymore data back to the client. There's no clear reproduction path, but it seems to happen when the proxy is under heavy load and seeing an increase in canceled requests, due to higher latency.

Looking for potential causes in the http2 package, I noticed that in the case where the transport reads a data frame for a stream that was already canceled, it returns the flow control tokens, but it also increments the flow control counter for the connection:

https://github.com/golang/net/blob/a1278a7f7ee0c218caeda793b867e0568bbe1e77/http2/transport.go#L2557-L2559

As far as I can tell, the tokens haven't been subtracted at this point, so we shouldn't be adding to the window? I ran the test case below, with some logging added to the lines linked above, e.g.:

cc.inflow.add(int32(f.Length))
cc.logf("http2: Transport received DATA frame for canceled stream (window: %d)", cc.inflow.available())
func TestFlowControlDesync(t *testing.T) {
	ct := newClientTester(t)
	sync := make(chan struct{})

	ct.client = func() error {
		ctx, cancel := context.WithCancel(context.Background())
		req, _ := http.NewRequest("GET", "https://dummy.tld/", nil)
		req = req.WithContext(ctx)
		_, err := ct.tr.RoundTrip(req)
		if err != nil {
			t.Fatal(err)
		}
		<-sync
		cancel()
		return nil
	}
	ct.server = func() error {
		ct.greet()
		window := initialWindowSize

		f, err := ct.fr.ReadFrame()
		if err != nil {
			t.Fatal(err)
		}
		wuf := f.(*WindowUpdateFrame)
		window += int(wuf.Increment)

		f, err = ct.fr.ReadFrame()
		if err != nil {
			t.Fatal(err)
		}
		hf := f.(*HeadersFrame)

		var buf bytes.Buffer
		enc := hpack.NewEncoder(&buf)
		enc.WriteField(hpack.HeaderField{Name: ":status", Value: "200"})
		ct.fr.WriteHeaders(HeadersFrameParam{
			StreamID:      hf.StreamID,
			EndHeaders:    true,
			EndStream:     false,
			BlockFragment: buf.Bytes(),
		})
		close(sync)
		for {
			data := make([]byte, maxFrameSize)
			window -= len(data)
			ct.fr.WriteData(hf.StreamID, false, data)
			f, err := ct.fr.ReadFrame()
			if err != nil {
				t.Fatal(err)
			}
			switch fr := f.(type) {
			case *WindowUpdateFrame:
				window += int(fr.Increment)
				t.Logf("server window: %v", window)
			}
			time.Sleep(time.Second)
		}
	}
	ct.run()
}

What did you expect to see?

I'd expect flow control to be returned to the server, but the transport's counter shouldn't increase in this case.

What did you see instead?

The flow control counter keeps increasing. I think will lead to flow control desync between client and server.

@mdempsky mdempsky added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Nov 3, 2022
@mdempsky
Copy link
Member

mdempsky commented Nov 3, 2022

/cc @neild

@mdempsky mdempsky added this to the Go1.20 milestone Nov 3, 2022
@neild neild self-assigned this Nov 3, 2022
@neild
Copy link
Contributor

neild commented Nov 3, 2022

cc.inflow.add(int32(f.Length))

cc.inflow.add is increasing the flow control window, and the subsequent lines send the increase to the server in a WINDOW_UPDATE frame. There might be a flow control bug here somewhere, but I don't think these lines are it unless I'm misreading something.

@mrdg
Copy link
Author

mrdg commented Nov 4, 2022

The reason those lines seemed wrong to me is that in the general case, when data is received for an open stream, the window is reduced first (both stream and connection):

if cs.inflow.available() >= int32(f.Length) {
  cs.inflow.take(int32(f.Length))
}

It then refunds for padding right away, and later increases the window again when the caller reads the body:

if v := cc.inflow.available(); v < transportDefaultConnFlow/2 {
  connAdd = transportDefaultConnFlow - v
  cc.inflow.add(connAdd)
}

In the case of a canceled stream, it looks like the window is only ever increased.

@neild
Copy link
Contributor

neild commented Nov 4, 2022

Ah, you're right. That does look wrong; we're returning flow control tokens to cs.inflow that were never taken. (I thought we consumed the connection-level flow earlier, but we don't; connection and stream flow are consumed at the same time.) Over time, that'll lead to the transport thinking the server has more tokens than it does, and it'll stop sending window updates.

Thanks for the analysis.

@gopherbot
Copy link

Change https://go.dev/cl/448155 mentions this issue: http2: refactor inbound flow control tracking

@dmitshur dmitshur modified the milestones: Go1.20, Unreleased Jan 6, 2023
@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Jan 6, 2023
@golang golang locked and limited conversation to collaborators Jan 7, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

5 participants