Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

internal/poll: panic: runtime error: slice bounds out of range [xx:30] #66755

Closed
calmh opened this issue Apr 10, 2024 · 4 comments
Closed

internal/poll: panic: runtime error: slice bounds out of range [xx:30] #66755

calmh opened this issue Apr 10, 2024 · 4 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime.

Comments

@calmh
Copy link
Contributor

calmh commented Apr 10, 2024

Go version

go1.22.2 darwin/arm64

Output of go env in your module/workspace:

Many different builders over years.

What did you do?

Syncthing gathers crash reports from users, when the binary panics or otherwise exits abnormally for some reason. There's a lot of very mysterious stuff in there, which we'll often attribute to system/hardware problems of some kind if it's not systematic. However, there is one thing that is systematic and has existed for years now (at least back to Go 1.19) with the same backtrace.

I'm posting this here not because I have an obvious smoking gun, but to see if it makes sense to anyone familiar with the internals in question...

What did you see happen?

Essentially, this:

panic: runtime error: slice bounds out of range [76:30]

goroutine 14953 [running]:
internal/poll.(*FD).Write(0x14000b1fd00, {0x140038405c0, 0x1e, 0x1e})
	internal/poll/fd_unix.go:380 +0x3ac
net.(*netFD).Write(0x14000b1fd00, {0x140038405c0?, 0x0?, 0x14001566168?})
	net/fd_posix.go:96 +0x28
net.(*conn).Write(0x140005c2f40, {0x140038405c0?, 0x0?, 0xc?})
	net/net.go:191 +0x34
crypto/tls.(*Conn).write(0x14000120388, {0x140038405c0?, 0x5?, 0x5?})
	crypto/tls/conn.go:945 +0x100
crypto/tls.(*Conn).writeRecordLocked(0x14000120388, 0x17, {0x14003febb90, 0x8, 0x8})
	crypto/tls/conn.go:1026 +0x504
crypto/tls.(*Conn).Write(0x14000120388, {0x14003febb90, 0x8, 0x8})
	crypto/tls/conn.go:1233 +0x34c
github.com/syncthing/syncthing/lib/connections.(*limitedWriter).Write(0x1400094bce8?, {0x14003febb90?, 0x140001343f0?, 0x1400094bdf8?})
	github.com/syncthing/syncthing/lib/connections/limiter.go:249 +0x98
github.com/syncthing/syncthing/lib/protocol.(*countingWriter).Write(0x14004d8ab10, {0x14003febb90?, 0x10599fde0?, 0x14000d95370?})
	github.com/syncthing/syncthing/lib/protocol/counting.go:46 +0x34
github.com/syncthing/syncthing/lib/protocol.(*rawConnection).writeMessage(0x14002fe58c0, {0x1052221c0, 0x10599fde0})
	github.com/syncthing/syncthing/lib/protocol/protocol.go:839 +0x550
github.com/syncthing/syncthing/lib/protocol.(*rawConnection).writerLoop(0x14002fe58c0)
	github.com/syncthing/syncthing/lib/protocol/protocol.go:774 +0x1a0
github.com/syncthing/syncthing/lib/protocol.(*rawConnection).Start.func3()
	github.com/syncthing/syncthing/lib/protocol/protocol.go:332 +0x24
created by github.com/syncthing/syncthing/lib/protocol.(*rawConnection).Start in goroutine 75
	github.com/syncthing/syncthing/lib/protocol/protocol.go:331 +0x184

Things I note here:

  • Thousands of separate users are reporting this crash
  • The end index of the slice is almost always 30, but not precisely always. The start index is always higher than 30.
  • It's all darwin all the time, but both amd64 and arm64. This has never been reported for other platforms, even though darwin is a small minority of users.

Examples of other indexes:

Screenshot 2024-04-10 at 08 57 36

What did you expect to see?

The slice passed should be fine... Especially, since the slice with the impossible indexes is after TLS has done it's thing. My gut feeling is there are shenanigans in the TLS stack on Darwin...?

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Apr 10, 2024
@jrick
Copy link
Contributor

jrick commented Apr 10, 2024

I see that the byte slice being written is allocated by a buffer pool.

https://github.com/syncthing/syncthing/blob/main/lib/protocol/bufferpool.go

Without digging super deep into this code, I suspect that it's not concurrent safe.

Does the race detector report anything?

@seankhliao seankhliao added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Apr 10, 2024
@calmh
Copy link
Contributor Author

calmh commented Apr 10, 2024

This has been stress tested with the race detector quite extensively over the years, without anything popping up. I also hammered it as much as I could before filing as I had the same suspicion. :) As mentioned it never shows up on other platforms than darwin. And in any case the slice that gets written after passing through TLS should be somewhat different from the slice passed into TLS?

@seankhliao seankhliao removed the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Apr 10, 2024
@randall77
Copy link
Contributor

This is probably a dup of #61060. In particular, it feels like the system write is returning an invalid result (one longer than the amount we asked it to write).
If we could find a way to reproduce, we could figure out how to fix, which may involve fixing Darwin somehow.

@calmh
Copy link
Contributor Author

calmh commented Apr 10, 2024

Ah yes, looks like the same issue, thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime.
Projects
None yet
Development

No branches or pull requests

5 participants