Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: BoltDB with SSA crashes with: unexpected fault address 0xc420a00000 #16772

Closed
ebusto opened this issue Aug 17, 2016 · 17 comments
Closed

Comments

@ebusto
Copy link

ebusto commented Aug 17, 2016

  1. What version of Go are you using (go version)?
    go version go1.7 darwin/amd64
    go version go1.7 linux/amd64
  2. What operating system and processor architecture are you using (go env)?
    GOARCH="amd64"
    GOBIN=""
    GOEXE=""
    GOHOSTARCH="amd64"
    GOHOSTOS="darwin"
    GOOS="darwin"
    GOPATH=""
    GORACE=""
    GOROOT="/usr/local/Cellar/go/1.7/libexec"
    GOTOOLDIR="/usr/local/Cellar/go/1.7/libexec/pkg/tool/darwin_amd64"
    CC="clang"
    GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/lm/g9hbhwd10gj8g_t64zzm63pm0000gn/T/go-build065044530=/tmp/go-build -gno-record-gcc-switches -fno-common"
    CXX="clang++"
    CGO_ENABLED="1"

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH=""
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build785786852=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"

  1. What did you do?
    When creating a number of nested buckets, my app will intermittently crash. Here is the relevant part of the stack trace. The location of the failure is quite consistent under both Mac OS and Linux.

unexpected fault address 0xc420a00000
fatal error: fault
[signal SIGBUS: bus error code=0x2 addr=0xc420a00000 pc=0xef856]

goroutine 7 [running]:
runtime.throw(0x27f950, 0x5)
/usr/local/Cellar/go/1.7/libexec/src/runtime/panic.go:566 +0x95 fp=0xc42004f210 sp=0xc42004f1f0
runtime.sigpanic()
/usr/local/Cellar/go/1.7/libexec/src/runtime/sigpanic_unix.go:21 +0x1d0 fp=0xc42004f268 sp=0xc42004f210
github.com/boltdb/bolt.(_node).write(0xc420098070, 0xc4209ffff0)
/Users/ebusto/Perforce/mq-watch/src/github.com/boltdb/bolt/node.go:205 +0x86 fp=0xc42004f3a8 sp=0xc42004f268
github.com/boltdb/bolt.(_Bucket).write(0xc42004f4c0, 0xc420098070, 0x9, 0x0)
/Users/ebusto/Perforce/mq-watch/src/github.com/boltdb/bolt/bucket.go:598 +0xb1 fp=0xc42004f408 sp=0xc42004f3a8
github.com/boltdb/bolt.(_Bucket).CreateBucket(0xc4200bec40, 0xc420910600, 0x9, 0x9, 0x9, 0x9, 0x0)
/Users/ebusto/Perforce/mq-watch/src/github.com/boltdb/bolt/bucket.go:181 +0x2bf fp=0xc42004f508 sp=0xc42004f408
pkg/journal.(_TX).set(0xc420152028, 0xc4200bec40, 0xc4209105f6, 0x9, 0x9, 0x237f80, 0xc4205a85a0, 0x15)

  1. What did you expect to see?
    Under Go 1.6, and Go 1.7 with SSA disabled (go build -gcflags="-ssa=0"), Bolt will happily create buckets without fail.
  2. What did you see instead?
    SIGBUS on Mac OS, and SIGSEGV on Linux.
@ebusto
Copy link
Author

ebusto commented Aug 17, 2016

A helpful fellow in the Gopher slack channel, @zeebo, produced this test case.

@dr2chase
Copy link
Contributor

Looking at this. I did notice an appearance of the u-word on github.com/boltdb/bolt/node.go:205

@zeebo
Copy link
Contributor

zeebo commented Aug 17, 2016

Changing the depth from rand.Intn(10) to 100 seems to cause it to more consistently and quickly crash for me. Since I'm basically shooting blind, YMMV. :)

@mdempsky mdempsky changed the title BoltDB with SSA crashes with: unexpected fault address 0xc420a00000 cmd/compile: BoltDB with SSA crashes with: unexpected fault address 0xc420a00000 Aug 17, 2016
@zeebo
Copy link
Contributor

zeebo commented Aug 17, 2016

After playing around for a bit, I believe I've found the problem.

BoltDB allocates byte slices for some structs, and uses a marker field on the end of the page type to indicate where the key/value pairs for that page start being stored. In the case that the page has no key/value pairs, it allocates less space than for the entire struct. Later, we construct a slice pointing at that last marker field. Since there were no key/value pairs, the slice is unused, but under the SSA code, something attempts to do a read anyway. When the allocated memory happens to end right at some unmapped memory or something, a SIGBUS happens.

It seems like the main take away is that before it wasn't a problem to create a slice pointing past some allocation, but now it is. No idea if that's intended behavior, or whatever.

@mdempsky
Copy link
Member

mdempsky commented Aug 17, 2016

Can you provide a minimal test case, or at least a minimal representative sketch of what you think the problem is? (I think I understand from your description, but it'll be less ambiguous if we have actual Go code to discuss.)

@zeebo
Copy link
Contributor

zeebo commented Aug 17, 2016

The playground link above is fairly minimal. Most of it is to exercise making as many page writes as possible. The slice is constructed and read from every time, but it will only crash when the bytes after the allocation will cause a fault. If there's a debug flag that puts guard pages around memory allocations that would make this much more deterministic.

I'll do my best with an ascii diagram.

+---------------------------------------+
|Full allocation                        |
+--------------------+------------------+------|
|Some bucket header  |The page struct   |p.ptr |
|--------------------|------------------|------|
|blah fields here    |fields here       |      |
+--------------------+------------------+------+

We construct a slice who's backing array starts at p.ptr, which is outside of the allocation. The slice ends up unused because the for loop has zero iterations, but for some reason, the SSA code dereferences the first byte of the slice (I think. asm output for both here: https://play.golang.org/p/QNYzVjU1ls)

I've added code to Bolt to avoid constructing these slices when it's known the size of the slice will be zero and the crashes go away.

@ulikunitz
Copy link
Contributor

This program demonstrates what the problem is:

package main

import "unsafe"

func main() {
    b := (*[10]byte)(unsafe.Pointer(uintptr(0xdeadbeef)))[0:3]
    _ = b
}

@mdempsky
Copy link
Member

I see, thanks! (Sorry, I thought the playground link was just a sample application that demonstrated an issue within BoltDB itself.)

So when slicing a pointer-to-array like that, we're supposed to check for and panic on nil. We used to implement that with an explicit nil comparison; and if we found nil, we would conditional store to it to trigger a SIGSEGV.

It looks like with SSA we've switched to a more implicit nil check by unconditionally trying to load from the pointed-to address, which causes the SIGSEGV now also on non-nil-but-still-invalid pointers.

/cc @randall77

@ulikunitz
Copy link
Contributor

@mdempsky: Yes, that is my understanding as well.

@randall77
Copy link
Contributor

#16727 is essentially the same problem. So I guess the question is, can we do a nil check by doing a test load from the pointer to-be-nilchecked? This is a change in behavior since 1.6, but as you can only trigger it by using unsafe, maybe that's ok. But nevertheless we might want to consider going back to the old way. Opinions?

@zeebo
Copy link
Contributor

zeebo commented Aug 17, 2016

Upon reading the rules outlined on https://golang.org/pkg/unsafe/ section 3 for pointer arithemetic states that: "the result must continue to point into the original allocated object. Unlike in C, it is not valid to advance a pointer just beyond the end of its original allocation".

That section is specifically talking about doing pointer arithmetic with unintptr conversions, but it seems that creating these slice headers is implicitly creating a pointer that is not pointing into any allocated object, and so I'd argue that the section still applies and code is invalid.

@mdempsky
Copy link
Member

@zeebo That section starts with "If p points into an allocated object", which is referring to GC managed Go variables. It's meant to allow pointer arithmetic within a variable, but not to limit us from adopting a moving garbage collector in the future. (For example, you can't assume the distance between two different variables will remain the same.) I don't think it's applicable here because in this case p points to unmapped memory.

That said, I do agree that this appears to be unspecified behavior. I think the first option we should consider is how hard it is for applications to avoid relying on it. It seems like slicing unmapped memory should be a relatively uncommon operation, whereas a simpler nil check benefits much more code.

Could BoltDB be reasonably modified to avoid creating these problematic slices?

@dgryski
Copy link
Contributor

dgryski commented Aug 18, 2016

@benbjohnson

@benbjohnson
Copy link

@mdempsky I was chatting with @zeebo yesterday and I'm going to make a patch today. I'm happy to fix it on my side. Thanks for all the investigation everybody! 👍

@zeebo
Copy link
Contributor

zeebo commented Aug 18, 2016

In this case, we are starting with GC managed Go variables. A byte slice is created and we use unsafe to create a pointer to a struct inside of it, but the struct extends past the end of the allocation. We then use field access to essentially do pointer arithmetic to find where we should create the slice, and it is created past the end of the allocation.

On Aug 18, 2016, at 1:10 AM, Matthew Dempsky notifications@github.com wrote:

@zeebo That section starts with "If p points into an allocated object", which is referring to GC managed Go variables. It's meant to allow pointer arithmetic within a variable, but not to limit us from adopting a moving garbage collector in the future. (For example, you can't assume the distance between two different variables will remain the same.) I don't think it's applicable here because in this case p points to unmapped memory.

That said, I do agree that this appears to be unspecified behavior. I think the first option we should consider is how hard it is for applications to avoid relying on it. It seems like slicing unmapped memory should be a relatively uncommon operation, whereas a simpler nil check benefits much more code.

Could BoltDB be reasonably modified to avoid creating these problematic slices?


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.

benbjohnson added a commit to benbjohnson/bolt that referenced this issue Aug 18, 2016
This commit fixes a bug where page end-of-header pointers were being
converted to byte slices even when the pointer did not point to
allocated memory. This occurs with pages that have a `page.count`
of zero.

Note: This was not an issue in Go 1.6 but the new Go 1.7 SSA backend
handles `nil` checks differently.

See golang/go#16772
@benbjohnson
Copy link

I posted a fix for BoltDB here: boltdb/bolt#584

It now checks for pages which don't extend past the end of the header before building the slice header.

@randall77
Copy link
Contributor

I'm going to close this issue. It is fixed on the BoltDB side.

samdfonseca pushed a commit to samdfonseca/flipadelphia that referenced this issue Dec 12, 2016
@golang golang locked and limited conversation to collaborators Aug 23, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

9 participants