-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/compile: BoltDB with SSA crashes with: unexpected fault address 0xc420a00000 #16772
Comments
Looking at this. I did notice an appearance of the u-word on github.com/boltdb/bolt/node.go:205 |
Changing the depth from |
After playing around for a bit, I believe I've found the problem. BoltDB allocates byte slices for some structs, and uses a marker field on the end of the page type to indicate where the key/value pairs for that page start being stored. In the case that the page has no key/value pairs, it allocates less space than for the entire struct. Later, we construct a slice pointing at that last marker field. Since there were no key/value pairs, the slice is unused, but under the SSA code, something attempts to do a read anyway. When the allocated memory happens to end right at some unmapped memory or something, a SIGBUS happens. It seems like the main take away is that before it wasn't a problem to create a slice pointing past some allocation, but now it is. No idea if that's intended behavior, or whatever. |
Can you provide a minimal test case, or at least a minimal representative sketch of what you think the problem is? (I think I understand from your description, but it'll be less ambiguous if we have actual Go code to discuss.) |
The playground link above is fairly minimal. Most of it is to exercise making as many page writes as possible. The slice is constructed and read from every time, but it will only crash when the bytes after the allocation will cause a fault. If there's a debug flag that puts guard pages around memory allocations that would make this much more deterministic. I'll do my best with an ascii diagram.
We construct a slice who's backing array starts at p.ptr, which is outside of the allocation. The slice ends up unused because the for loop has zero iterations, but for some reason, the SSA code dereferences the first byte of the slice (I think. asm output for both here: https://play.golang.org/p/QNYzVjU1ls) I've added code to Bolt to avoid constructing these slices when it's known the size of the slice will be zero and the crashes go away. |
This program demonstrates what the problem is: package main
import "unsafe"
func main() {
b := (*[10]byte)(unsafe.Pointer(uintptr(0xdeadbeef)))[0:3]
_ = b
} |
I see, thanks! (Sorry, I thought the playground link was just a sample application that demonstrated an issue within BoltDB itself.) So when slicing a pointer-to-array like that, we're supposed to check for and panic on nil. We used to implement that with an explicit nil comparison; and if we found nil, we would conditional store to it to trigger a SIGSEGV. It looks like with SSA we've switched to a more implicit nil check by unconditionally trying to load from the pointed-to address, which causes the SIGSEGV now also on non-nil-but-still-invalid pointers. /cc @randall77 |
@mdempsky: Yes, that is my understanding as well. |
#16727 is essentially the same problem. So I guess the question is, can we do a nil check by doing a test load from the pointer to-be-nilchecked? This is a change in behavior since 1.6, but as you can only trigger it by using unsafe, maybe that's ok. But nevertheless we might want to consider going back to the old way. Opinions? |
Upon reading the rules outlined on https://golang.org/pkg/unsafe/ section 3 for pointer arithemetic states that: "the result must continue to point into the original allocated object. Unlike in C, it is not valid to advance a pointer just beyond the end of its original allocation". That section is specifically talking about doing pointer arithmetic with unintptr conversions, but it seems that creating these slice headers is implicitly creating a pointer that is not pointing into any allocated object, and so I'd argue that the section still applies and code is invalid. |
@zeebo That section starts with "If p points into an allocated object", which is referring to GC managed Go variables. It's meant to allow pointer arithmetic within a variable, but not to limit us from adopting a moving garbage collector in the future. (For example, you can't assume the distance between two different variables will remain the same.) I don't think it's applicable here because in this case p points to unmapped memory. That said, I do agree that this appears to be unspecified behavior. I think the first option we should consider is how hard it is for applications to avoid relying on it. It seems like slicing unmapped memory should be a relatively uncommon operation, whereas a simpler nil check benefits much more code. Could BoltDB be reasonably modified to avoid creating these problematic slices? |
In this case, we are starting with GC managed Go variables. A byte slice is created and we use unsafe to create a pointer to a struct inside of it, but the struct extends past the end of the allocation. We then use field access to essentially do pointer arithmetic to find where we should create the slice, and it is created past the end of the allocation.
|
This commit fixes a bug where page end-of-header pointers were being converted to byte slices even when the pointer did not point to allocated memory. This occurs with pages that have a `page.count` of zero. Note: This was not an issue in Go 1.6 but the new Go 1.7 SSA backend handles `nil` checks differently. See golang/go#16772
I posted a fix for BoltDB here: boltdb/bolt#584 It now checks for pages which don't extend past the end of the header before building the slice header. |
I'm going to close this issue. It is fixed on the BoltDB side. |
go version
)?go version go1.7 darwin/amd64
go version go1.7 linux/amd64
go env
)?GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH=""
GORACE=""
GOROOT="/usr/local/Cellar/go/1.7/libexec"
GOTOOLDIR="/usr/local/Cellar/go/1.7/libexec/pkg/tool/darwin_amd64"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/lm/g9hbhwd10gj8g_t64zzm63pm0000gn/T/go-build065044530=/tmp/go-build -gno-record-gcc-switches -fno-common"
CXX="clang++"
CGO_ENABLED="1"
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH=""
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build785786852=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
When creating a number of nested buckets, my app will intermittently crash. Here is the relevant part of the stack trace. The location of the failure is quite consistent under both Mac OS and Linux.
unexpected fault address 0xc420a00000
fatal error: fault
[signal SIGBUS: bus error code=0x2 addr=0xc420a00000 pc=0xef856]
goroutine 7 [running]:
runtime.throw(0x27f950, 0x5)
/usr/local/Cellar/go/1.7/libexec/src/runtime/panic.go:566 +0x95 fp=0xc42004f210 sp=0xc42004f1f0
runtime.sigpanic()
/usr/local/Cellar/go/1.7/libexec/src/runtime/sigpanic_unix.go:21 +0x1d0 fp=0xc42004f268 sp=0xc42004f210
github.com/boltdb/bolt.(_node).write(0xc420098070, 0xc4209ffff0)
/Users/ebusto/Perforce/mq-watch/src/github.com/boltdb/bolt/node.go:205 +0x86 fp=0xc42004f3a8 sp=0xc42004f268
github.com/boltdb/bolt.(_Bucket).write(0xc42004f4c0, 0xc420098070, 0x9, 0x0)
/Users/ebusto/Perforce/mq-watch/src/github.com/boltdb/bolt/bucket.go:598 +0xb1 fp=0xc42004f408 sp=0xc42004f3a8
github.com/boltdb/bolt.(_Bucket).CreateBucket(0xc4200bec40, 0xc420910600, 0x9, 0x9, 0x9, 0x9, 0x0)
/Users/ebusto/Perforce/mq-watch/src/github.com/boltdb/bolt/bucket.go:181 +0x2bf fp=0xc42004f508 sp=0xc42004f408
pkg/journal.(_TX).set(0xc420152028, 0xc4200bec40, 0xc4209105f6, 0x9, 0x9, 0x237f80, 0xc4205a85a0, 0x15)
Under Go 1.6, and Go 1.7 with SSA disabled (go build -gcflags="-ssa=0"), Bolt will happily create buckets without fail.
SIGBUS on Mac OS, and SIGSEGV on Linux.
The text was updated successfully, but these errors were encountered: