internal/poll: panic: runtime error: slice bounds out of range [xx:30] #66755

calmh · 2024-04-10T07:00:18Z

Go version

go1.22.2 darwin/arm64

Output of `go env` in your module/workspace:

Many different builders over years.

What did you do?

Syncthing gathers crash reports from users, when the binary panics or otherwise exits abnormally for some reason. There's a lot of very mysterious stuff in there, which we'll often attribute to system/hardware problems of some kind if it's not systematic. However, there is one thing that is systematic and has existed for years now (at least back to Go 1.19) with the same backtrace.

I'm posting this here not because I have an obvious smoking gun, but to see if it makes sense to anyone familiar with the internals in question...

What did you see happen?

Essentially, this:

panic: runtime error: slice bounds out of range [76:30]

goroutine 14953 [running]:
internal/poll.(*FD).Write(0x14000b1fd00, {0x140038405c0, 0x1e, 0x1e})
	internal/poll/fd_unix.go:380 +0x3ac
net.(*netFD).Write(0x14000b1fd00, {0x140038405c0?, 0x0?, 0x14001566168?})
	net/fd_posix.go:96 +0x28
net.(*conn).Write(0x140005c2f40, {0x140038405c0?, 0x0?, 0xc?})
	net/net.go:191 +0x34
crypto/tls.(*Conn).write(0x14000120388, {0x140038405c0?, 0x5?, 0x5?})
	crypto/tls/conn.go:945 +0x100
crypto/tls.(*Conn).writeRecordLocked(0x14000120388, 0x17, {0x14003febb90, 0x8, 0x8})
	crypto/tls/conn.go:1026 +0x504
crypto/tls.(*Conn).Write(0x14000120388, {0x14003febb90, 0x8, 0x8})
	crypto/tls/conn.go:1233 +0x34c
github.com/syncthing/syncthing/lib/connections.(*limitedWriter).Write(0x1400094bce8?, {0x14003febb90?, 0x140001343f0?, 0x1400094bdf8?})
	github.com/syncthing/syncthing/lib/connections/limiter.go:249 +0x98
github.com/syncthing/syncthing/lib/protocol.(*countingWriter).Write(0x14004d8ab10, {0x14003febb90?, 0x10599fde0?, 0x14000d95370?})
	github.com/syncthing/syncthing/lib/protocol/counting.go:46 +0x34
github.com/syncthing/syncthing/lib/protocol.(*rawConnection).writeMessage(0x14002fe58c0, {0x1052221c0, 0x10599fde0})
	github.com/syncthing/syncthing/lib/protocol/protocol.go:839 +0x550
github.com/syncthing/syncthing/lib/protocol.(*rawConnection).writerLoop(0x14002fe58c0)
	github.com/syncthing/syncthing/lib/protocol/protocol.go:774 +0x1a0
github.com/syncthing/syncthing/lib/protocol.(*rawConnection).Start.func3()
	github.com/syncthing/syncthing/lib/protocol/protocol.go:332 +0x24
created by github.com/syncthing/syncthing/lib/protocol.(*rawConnection).Start in goroutine 75
	github.com/syncthing/syncthing/lib/protocol/protocol.go:331 +0x184

Things I note here:

Thousands of separate users are reporting this crash
The end index of the slice is almost always 30, but not precisely always. The start index is always higher than 30.
It's all darwin all the time, but both amd64 and arm64. This has never been reported for other platforms, even though darwin is a small minority of users.

Examples of other indexes:

What did you expect to see?

The slice passed should be fine... Especially, since the slice with the impossible indexes is after TLS has done it's thing. My gut feeling is there are shenanigans in the TLS stack on Darwin...?

The text was updated successfully, but these errors were encountered:

jrick · 2024-04-10T14:52:33Z

I see that the byte slice being written is allocated by a buffer pool.

https://github.com/syncthing/syncthing/blob/main/lib/protocol/bufferpool.go

Without digging super deep into this code, I suspect that it's not concurrent safe.

Does the race detector report anything?

calmh · 2024-04-10T15:11:26Z

This has been stress tested with the race detector quite extensively over the years, without anything popping up. I also hammered it as much as I could before filing as I had the same suspicion. :) As mentioned it never shows up on other platforms than darwin. And in any case the slice that gets written after passing through TLS should be somewhat different from the slice passed into TLS?

randall77 · 2024-04-10T15:16:31Z

This is probably a dup of #61060. In particular, it feels like the system write is returning an invalid result (one longer than the amount we asked it to write).
If we could find a way to reproduce, we could figure out how to fix, which may involve fixing Darwin somehow.

calmh · 2024-04-10T15:19:31Z

Ah yes, looks like the same issue, thanks.

gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Apr 10, 2024

seankhliao added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Apr 10, 2024

seankhliao removed the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Apr 10, 2024

randall77 closed this as completed Apr 10, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

internal/poll: panic: runtime error: slice bounds out of range [xx:30] #66755

internal/poll: panic: runtime error: slice bounds out of range [xx:30] #66755

calmh commented Apr 10, 2024 •

edited

jrick commented Apr 10, 2024

calmh commented Apr 10, 2024 •

edited

randall77 commented Apr 10, 2024

calmh commented Apr 10, 2024

internal/poll: panic: runtime error: slice bounds out of range [xx:30] #66755

internal/poll: panic: runtime error: slice bounds out of range [xx:30] #66755

Comments

calmh commented Apr 10, 2024 • edited

Go version

Output of go env in your module/workspace:

What did you do?

What did you see happen?

What did you expect to see?

jrick commented Apr 10, 2024

calmh commented Apr 10, 2024 • edited

randall77 commented Apr 10, 2024

calmh commented Apr 10, 2024

calmh commented Apr 10, 2024 •

edited

Output of `go env` in your module/workspace:

calmh commented Apr 10, 2024 •

edited