Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: panic in net/http.(*chunkWriter).Write via bufio.Writer #61172

Closed
krkeshav opened this issue Jul 5, 2023 · 8 comments
Closed

net/http: panic in net/http.(*chunkWriter).Write via bufio.Writer #61172

krkeshav opened this issue Jul 5, 2023 · 8 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.

Comments

@krkeshav
Copy link

krkeshav commented Jul 5, 2023

What version of Go are you using (go version)?

$ go version
go version go1.20.5 linux/amd64

Does this issue reproduce with the latest release?

Not reproducable

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOOS="linux"
GOARCH="amd64"

What did you do?

Recently we did 2 things in our production application.

  1. We upgraded to Go 1.20.5 using golang:1.20 docker image
  2. We started heavily using io.Copy() function which has definition
func Copy(dst Writer, src Reader) (written int64, err error) {
	return copyBuffer(dst, src, nil)
}

Our use case for this function is basically we are getting response from some api.
It is a json response and the size of json can be very big (30MB to 50MB).
So instead of doing io.ReadAll and then json.Unmarshal and then sending bytes through writer we are
directly copying the response.Body to writer. This helps in saving CPU.

Here is the panic. Please note the panic was copied line by line from
AWS Cloudwatch so ordering may not be accurate for stacktrace.

http: panic serving 127.0.0.1:43680: runtime **error**: slice bounds out of range [4097:4096]
goroutine 324311 [running]:
net/http.(*conn).serve.func1()
    /usr/local/go/src/net/http/server.go:1854 +0xbf
panic({0x1693f60, 0xc001607db8})
    /usr/local/go/src/runtime/panic.go:890 +0x263
bufio.(*Writer).Write(0xc009525ec0, {0xc009fbab00?, 0x4?, 0xc003be9af8?})
    /usr/local/go/src/bufio/bufio.go:670 +0x1c8
fmt.Fprintf({0x1c35980, 0xc009525ec0}, {0x180c732, 0x4}, {0xc003be9af8, 0x1, 0x1})
    /usr/local/go/src/fmt/print.go:225 +0x9b
net/http.(*chunkWriter).Write(0xc003409620, {0xc003882000, 0x800, 0x800})
    /usr/local/go/src/net/http/server.go:382 +0xdb
bufio.(*Writer).Flush(0xc00a5bdc00)
    /usr/local/go/src/bufio/bufio.go:628 +0x62
net/http.(*response).finishRequest(0xc0034095e0)
    /usr/local/go/src/net/http/server.go:1649 +0x49
net/http.(*conn).serve(0xc0035ca5a0, {0x1c44ab0, 0xc0025acc00})
    /usr/local/go/src/net/http/server.go:2001 +0x65a
created by net/http.(*Server).Serve
    /usr/local/go/src/net/http/server.go:3089 +0x5ed
panic: runtime error: slice bounds out of range [4097:4096] [recovered]

panic: runtime error: slice bounds out of range [:4097] with capacity 4096
goroutine 324311 [running]:
bufio.(*Writer).Flush(0xc00242f7a0?)
    /usr/local/go/src/bufio/bufio.go:628 +0x185
net/http.(*conn).finalFlush(0xc0035ca5a0)
    /usr/local/go/src/net/http/server.go:1723 +0x73
net/http.(*conn).close(0xc0035ca5a0)
    /usr/local/go/src/net/http/server.go:1733 +0x1e
net/http.(*conn).serve.func1()
    /usr/local/go/src/net/http/server.go:1865 +0x1cc
panic({0x1693f60, 0xc001607db8})
    /usr/local/go/src/runtime/panic.go:890 +0x263
bufio.(*Writer).Write(0xc009525ec0, {0xc009fbab00?, 0x4?, 0xc003be9af8?})
    /usr/local/go/src/bufio/bufio.go:670 +0x1c8
fmt.Fprintf({0x1c35980, 0xc009525ec0}, {0x180c732, 0x4}, {0xc003be9af8, 0x1, 0x1})
    /usr/local/go/src/fmt/print.go:225 +0x9b
net/http.(*chunkWriter).Write(0xc003409620, {0xc003882000, 0x800, 0x800})
    /usr/local/go/src/net/http/server.go:382 +0xdb
bufio.(*Writer).Flush(0xc00a5bdc00)
    /usr/local/go/src/bufio/bufio.go:628 +0x62
net/http.(*response).finishRequest(0xc0034095e0)
    /usr/local/go/src/net/http/server.go:1649 +0x49
net/http.(*conn).serve(0xc0035ca5a0, {0x1c44ab0, 0xc0025acc00})
    /usr/local/go/src/net/http/server.go:2001 +0x65a
created by net/http.(*Server).Serve
/usr/local/go/src/net/http/server.go:3089 +0x5ed

What did you expect to see?

We expected this panic to be caught so that application wont be down.
This is the first time we have seen this panic.

What did you see instead?

This panic is not caught and hence not recovered and our application crashed.

@krkeshav krkeshav changed the title affected/package: panic in net/http package due to slice bounds out of range net/http: panic in net/http package due to slice bounds out of range Jul 5, 2023
@seankhliao seankhliao added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Jul 5, 2023
@krkeshav
Copy link
Author

krkeshav commented Jul 5, 2023

Adding more context here.
We use github.com/gin-gonic/gin for running our http web server. The routes which we have defined are in following way:

// Router is defined here
func AddUrl(group *gin.RouterGroup) {
        group.GET("/endpoint", endpointHandlerChain()...)
}

// handler chain is called
func endpointHandlerChain() gin.HandlersChain {
        return gin.HandlersChain{
                getController(someApiHandler),
        }
}

// This calls handler and then calls Flush through gin context.
func getController(handler func(http.ResponseWriter, *http.Request)) gin.HandlerFunc {
	return func(c *gin.Context) {
		handler(c.Writer, c.Request) // This is the handler which is called and it calls io.Copy()
		c.Writer.Flush()
	}
}

// This is the actual handler being called
func SomeApiHandler(w http.ResponseWriter, _ *http.Request) {
        resp := getResp()
        io.Copy(w, resp.Body) // io.Copy used
}

Please tell if more information is required.

@krkeshav
Copy link
Author

krkeshav commented Jul 5, 2023

@seankhliao Added more context. Please tell if more information is needed.

@krkeshav
Copy link
Author

krkeshav commented Jul 5, 2023

UPDATE:
This issue is becoming frequent now. This time the panic had different slice length. Any help will be great. :)

panic: runtime error: slice bounds out of range [4586:4096] [recovered]
panic: runtime error: slice bounds out of range [:4586] with capacity 4096

@bcmills
Copy link
Contributor

bcmills commented Jul 5, 2023

According to that stack trace, the panic is happening in bufio.(*Writer).Write.

net/http.(*conn).serve.func1 is just printing the panic:
https://cs.opensource.google/go/go/+/refs/tags/go1.20.5:src/net/http/server.go;l=1854-1855;drc=ea4631cc0cf301c824bd665a7980c13289ab5c9d

@krkeshav
Copy link
Author

krkeshav commented Jul 5, 2023

@krkeshav krkeshav changed the title net/http: panic in net/http package due to slice bounds out of range bufio: panic in bufio package due to slice bounds out of range Jul 5, 2023
@bcmills
Copy link
Contributor

bcmills commented Jul 5, 2023

The recover is only best-effort. Really the problem here is that the panic is happening in the first place.

The sites of the two panics are both places involving b.buf and b.n. Smells like a data race to me: it looks like there are somehow two goroutines writing to the same bufio.Writer concurrently, which is definitely not allowed.

I suggest building the binary with -race and sending the same traffic pattern to it to see what that turns up. (Perhaps you have some goroutine writing to the http.ResponseWriter after the handler has already returned?)

@bcmills bcmills added WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. and removed WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. labels Jul 5, 2023
@bcmills bcmills changed the title bufio: panic in bufio package due to slice bounds out of range net/http: panic in net/http.(*chunkWriter).Write via bufio.Writer Jul 5, 2023
@krkeshav
Copy link
Author

krkeshav commented Jul 5, 2023

The recover is only best-effort. Really the problem here is that the panic is happening in the first place.

About recover. We already have panic handlers running but from stacktrace I can see there is no mention of our actual application files. Since application is crashing then its probably in a different routine. So I think maybe we cant use recover here.

The sites of the two panics are both places involving b.buf and b.n. Smells like a data race to me: it looks like there are somehow two goroutines writing to the same bufio.Writer concurrently, which is definitely not allowed.

I suggest building the binary with -race and sending the same traffic pattern to it to see what that turns up. (Perhaps you have some goroutine writing to the http.ResponseWriter after the handler has already returned?)

Will add this and check. Thanks!

@seankhliao seankhliao added WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. and removed WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. labels Jul 6, 2023
@krkeshav
Copy link
Author

krkeshav commented Jul 8, 2023

Hi @bcmills

It seems it was a race condition. At some place of code we were using fmt.Fprint(w io.Writer, a ...any) concurrently.
After adding Mutex, we are not seeing those panics anymore.

Thanks for the insights!

@seankhliao seankhliao closed this as not planned Won't fix, can't repro, duplicate, stale Jul 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Projects
None yet
Development

No branches or pull requests

3 participants