Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: SIGBUS during runtime.init when running an executable stored on GCS via gcsfuse #48997

Open
tetsuok opened this issue Oct 15, 2021 · 12 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@tetsuok
Copy link
Contributor

tetsuok commented Oct 15, 2021

What version of Go are you using (go version)?

$ go version
go version go1.17.2 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/tetsuo.kiso/.cache/go-build"
GOENV="/home/tetsuo.kiso/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/tetsuo.kiso/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/tetsuo.kiso/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.17.2"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build1844747333=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Ran a Go executable stored on GCS via gcsfuse (v0.35.1) after clearing the cache on Linux.
The following is the repro step. Assumes there is a GCS bucket (my_bucket in this case).

$ mkdir gcs
$ gcsfuse -o rw -file-mode=777 -dir-mode=777 --foreground my_bucket ./gcs &
$ cat hello.go
package main

import (
        "fmt"
)

func main() {
        fmt.Println("hello, world!")
}
$ go build -gcflags=all='-N -l' -o ./gcs/hello hello.go

# Clear cache on Linux by unmounting and mounting the bucket.
$ fusermount -u gcs
$ gcsfuse -o rw -file-mode=777 -dir-mode=777 --foreground my_bucket ./gcs &

$ ./gcs/hello

What did you expect to see?

It runs successfully.

What did you see instead?

unexpected fault address 0x4d21dd
fatal error: fault
[signal SIGBUS: bus error code=0x2 addr=0x4d21dd pc=0x44f0a9]

goroutine 1 [running, locked to thread]:
runtime.throw({0x4aac19, 0x0})
        /usr/local/go/src/runtime/panic.go:1198 +0x71 fp=0xc000056558 sp=0xc000056528 pc=0x432d11
runtime.sigpanic()
        /usr/local/go/src/runtime/signal_unix.go:732 +0xb6 fp=0xc000056588 sp=0xc000056558 pc=0x446eb6
runtime.step({0x4d21dd, 0x7331, 0x0}, 0xc000056608, 0x44de3d, 0x0)
        /usr/local/go/src/runtime/symtab.go:1030 +0x29 fp=0xc0000565d0 sp=0xc000056588 pc=0x44f0a9
runtime.funcMaxSPDelta({0x4fc4f8, 0x51d740})
        /usr/local/go/src/runtime/symtab.go:973 +0xb4 fp=0xc000056628 sp=0xc0000565d0 pc=0x44ec54
runtime.init.6()
        /usr/local/go/src/runtime/preempt.go:321 +0x56 fp=0xc000056650 sp=0xc000056628 pc=0x433df6
runtime.doInit(0x51abc0)
        /usr/local/go/src/runtime/proc.go:6498 +0x127 fp=0xc000056780 sp=0xc000056650 pc=0x4421c7
runtime.main()
        /usr/local/go/src/runtime/proc.go:204 +0xf3 fp=0xc0000567e0 sp=0xc000056780 pc=0x435173
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1581 +0x1 fp=0xc0000567e8 sp=0xc0000567e0 pc=0x45f721

Note that the crash doesn't happen if the file is being cached.

@randall77
Copy link
Contributor

What's at address 0x4d21dd? Can you show us the binary segments/sections?
Nothing looks out of place in the traceback. The code is trying to read some data from the Go symbol table. SIGBUS is a very weird error there - like the fuse adapter in Linux is failing to get a portion of the file somehow.

If after this error occurs, and you the cp the file out of the gcsfuse file system onto local disk and try to run it, does it still fail?

I ask because I'm curious if this is a problem writing the binary out during go build, or reading it back in when executing it.

It is possible one of the fuse limitations listed here is to blame. I think we mmap binaries when writing them, so the mmap portion might be relevant.

@tetsuok
Copy link
Contributor Author

tetsuok commented Oct 15, 2021

What's at address 0x4d21dd? Can you show us the binary segments/sections?

Sure. Here's an output of objdump at the address. Let me know if you need additional info!

$ objdump -d -s --start-address=0x4d21dd ./gcs/hello | head

./gcs/hello:     file format elf64-x86-64

Contents of section .gopclntab:
 4d21dd 020430 502f0100 a2015500 de040e02 0d ..0P/....U......
 4d21ed 040a02 0e040c04 0c020a00 002c0429 00 ............,.).
 4d21fd 020e40 6e3f0140 493f0a00 a201d001 00 ..@n?.@I?.......
 4d220d fc0418 02180209 0218020b 040e0209 1c ................
 4d221d 0a0532 0217250a 00000401 06020e01 b3 ..2..%..........
 4d222d 010205 00002602 4d010a02 49010a00 02 .....&.M...I....

If after this error occurs, and you the cp the file out of the gcsfuse file system onto local disk and try to run it, does it still fail?

No, it doesn't.

@tetsuok
Copy link
Contributor Author

tetsuok commented Oct 15, 2021

The same crash happens when we build the Go program in local disk, copy it to the gcsfuse file system, clear the cache, and then run the program on the gcsfuse file system:

$ go build -gcflags=all='-N -l' hello.go
$ gcsfuse -o rw -file-mode=777 -dir-mode=777 --foreground my_bucket ./gcs &
$ cp hello ./gcs/
# Clear cache on Linux by unmounting and mounting the bucket.
$ fusermount -u gcs
$ gcsfuse -o rw -file-mode=777 -dir-mode=777 --foreground my_bucket ./gcs &
$ ./gcs/hello
unexpected fault address 0x4d21dd
fatal error: fault
[signal SIGBUS: bus error code=0x2 addr=0x4d21dd pc=0x44f0a9]

goroutine 1 [running, locked to thread]:
runtime.throw({0x4aac19, 0x0})
        /usr/local/go/src/runtime/panic.go:1198 +0x71 fp=0xc000056558 sp=0xc000056528 pc=0x432d11
runtime.sigpanic()
        /usr/local/go/src/runtime/signal_unix.go:732 +0xb6 fp=0xc000056588 sp=0xc000056558 pc=0x446eb6
runtime.step({0x4d21dd, 0x7331, 0x0}, 0xc000056608, 0x44de3d, 0x0)
        /usr/local/go/src/runtime/symtab.go:1030 +0x29 fp=0xc0000565d0 sp=0xc000056588 pc=0x44f0a9
runtime.funcMaxSPDelta({0x4fc4f8, 0x51d740})
        /usr/local/go/src/runtime/symtab.go:973 +0xb4 fp=0xc000056628 sp=0xc0000565d0 pc=0x44ec54
runtime.init.6()
        /usr/local/go/src/runtime/preempt.go:321 +0x56 fp=0xc000056650 sp=0xc000056628 pc=0x433df6
runtime.doInit(0x51abc0)
        /usr/local/go/src/runtime/proc.go:6498 +0x127 fp=0xc000056780 sp=0xc000056650 pc=0x4421c7
runtime.main()
        /usr/local/go/src/runtime/proc.go:204 +0xf3 fp=0xc0000567e0 sp=0xc000056780 pc=0x435173
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1581 +0x1 fp=0xc0000567e8 sp=0xc0000567e0 pc=0x45f721

UPDATE: Corrected the stack trace after initially posted the comment.

@randall77
Copy link
Contributor

If after this error occurs, and you the cp the file out of the gcsfuse file system onto local disk and try to run it, does it still fail?

No, it doesn't.

That kind of eliminates the write-wasn't-flushed-correctly theory.

I'm going to guess that this is some sort of failure in the fuse filesystem attempting to get the pages of the binary.

@randall77
Copy link
Contributor

I was able to reproduce on a Google Cloud instance.

$ gcsfuse -o rw -file-mode=777 -dir-mode=777 --foreground issue48997 ./gcs &
$ go/bin/go build -gcflags=all='-N -l' -o ./gcs/hello hello.go
$ fusermount -u gcs
$ gcsfuse -o rw -file-mode=777 -dir-mode=777 --foreground issue48997 ./gcs &
$ ./gcs/hello
2021/10/15 06:42:56.786478 Not retrying Read("hello", 1634280018008310) after error of type *url.Error ("Get \"https://www.googleapis.com:443/download/storage/v1/b/issue48997/o/hello?alt=media&generation=1634280018008310\": net/http: request canceled"): &url.Error{Op:"Get", URL:"https://www.googleapis.com:443/download/storage/v1/b/issue48997/o/hello?alt=media&generation=1634280018008310", Err:(*errors.errorString)(0xc00009b3a0)}
2021/10/15 06:42:56.786556 ReadFile: operation canceled, fh.reader.ReadAt: readFull: Get "https://www.googleapis.com:443/download/storage/v1/b/issue48997/o/hello?alt=media&generation=1634280018008310": net/http: request canceled
fuse: 2021/10/15 06:42:56.786587 *fuseops.ReadFileOp error: operation canceled
2021/10/15 06:42:56.787348 Not retrying Read("hello", 1634280018008310) after error of type *url.Error ("Get \"https://www.googleapis.com:443/download/storage/v1/b/issue48997/o/hello?alt=media&generation=1634280018008310\": net/http: request canceled"): &url.Error{Op:"Get", URL:"https://www.googleapis.com:443/download/storage/v1/b/issue48997/o/hello?alt=media&generation=1634280018008310", Err:(*errors.errorString)(0xc00009b3a0)}
2021/10/15 06:42:56.787387 ReadFile: operation canceled, fh.reader.ReadAt: readFull: Get "https://www.googleapis.com:443/download/storage/v1/b/issue48997/o/hello?alt=media&generation=1634280018008310": net/http: request canceled
fuse: 2021/10/15 06:42:56.787405 *fuseops.ReadFileOp error: operation canceled
unexpected fault address 0x4d283d
fatal error: fault
[signal SIGBUS: bus error code=0x2 addr=0x4d283d pc=0x44f0a9]

goroutine 1 [running, locked to thread]:
2021/10/15 06:42:56.790418 ReadFile: operation canceled, fh.reader.ReadAt: readFull: Get "https://www.googleapis.com:443/download/storage/v1/b/issue48997/o/hello?alt=media&generation=1634280018008310": net/http: request canceled
fuse: 2021/10/15 06:42:56.790449 *fuseops.ReadFileOp error: operation canceled
fatal error: unexpected signal during runtime execution
panic during panic
[signal SIGBUS: bus error code=0x2 addr=0x4d241a pc=0x44f0a9]

runtime stack:
2021/10/15 06:42:56.790615 ReadFile: operation canceled, fh.reader.ReadAt: readFull: Get "https://www.googleapis.com:443/download/storage/v1/b/issue48997/o/hello?alt=media&generation=1634280018008310": net/http: request canceled
fuse: 2021/10/15 06:42:56.790629 *fuseops.ReadFileOp error: operation canceled
fatal error: unexpected signal during runtime execution
stack trace unavailable

Those are a lot of scary messages from gcsfuse. I find it very likely they point to the cause of the failure.
Do you see such messages?

@tetsuok
Copy link
Contributor Author

tetsuok commented Oct 15, 2021

@randall77 Thanks for trying it out! I'm glad to hear that. Yes, I see very similar message.

@randall77
Copy link
Contributor

I tried putting other binaries in the gcs filesystem (e.g. /usr/bin/python3.7), and they work.
I tried making the Go binary locally and copying it into the gcs filesystem using cp, and that still crashed.
So I think there's something particular about Go binaries that is causing this failure. But I don't think this is a Go bug - I think gcsfuse can't handle them for some strange reason. Not sure why.

Executing other binaries gets warnings like these:

2021/10/15 06:58:11.672289 Not retrying StatObject("pyvenv.cfg/") after error of type *gcs.NotFoundError ("gcs.NotFoundError: googleapi: Error 404: No such object: issue48997/pyvenv.cfg/, notFound"): &gcs.NotFoundError{Err:(*googleapi.Error)(0xc00028e780)}

Which just makes it look like it was looking for supporting files and not finding them.

Executing Go binaries gets warnings like these:

2021/10/15 07:08:28.404458 Not retrying Read("hello", 1634280018008310) after error of type *url.Error ("Get \"https://www.googleapis.com:443/download/storage/v1/b/issue48997/o/hello?alt=media&generation=1634280018008310\": net/http: request canceled"): &url.Error{Op:"Get", URL:"https://www.googleapis.com:443/download/storage/v1/b/issue48997/o/hello?alt=media&generation=1634280018008310", Err:(*errors.errorString)(0xc00009b3a0)}
2021/10/15 07:08:28.404509 ReadFile: operation canceled, fh.reader.ReadAt: readFull: Get "https://www.googleapis.com:443/download/storage/v1/b/issue48997/o/hello?alt=media&generation=1634280018008310": net/http: request canceled
fuse: 2021/10/15 07:08:28.404526 *fuseops.ReadFileOp error: operation canceled

which look much more dire.

@tetsuok
Copy link
Contributor Author

tetsuok commented Oct 15, 2021

Thank you very much for your help!

I tried putting other binaries in the gcs filesystem (e.g. /usr/bin/python3.7), and they work.

Yes. I confirmed it as well.

Executing Go binaries gets warnings like these:
2021/10/15 07:08:28.404458 Not retrying Read("hello", 1634280018008310) after error of type *url.Error ("Get "https://www.googleapis.com:443/download/storage/v1/b/issue48997/o/hello?alt=media&generation=1634280018008310\": net/http: request canceled"): &url.Error{Op:"Get", URL:"https://www.googleapis.com:443/download/storage/v1/b/issue48997/o/hello?alt=media&generation=1634280018008310", Err:(*errors.errorString)(0xc00009b3a0)}
2021/10/15 07:08:28.404509 ReadFile: operation canceled, fh.reader.ReadAt: readFull: Get "https://www.googleapis.com:443/download/storage/v1/b/issue48997/o/hello?alt=media&generation=1634280018008310": net/http: request canceled
fuse: 2021/10/15 07:08:28.404526 *fuseops.ReadFileOp error: operation canceled
which look much more dire.

Sounds like I should file a new issue in gcsfuse, and then refer to this issue.

@cherrymui cherrymui added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Oct 18, 2021
@cherrymui
Copy link
Member

Thanks for the investigation.

Sounds like I should file a new issue in gcsfuse, and then refer to this issue.

Yeah, that would be great. Thanks.

@tetsuok
Copy link
Contributor Author

tetsuok commented Oct 19, 2021

@cherrymui Sure. I'll file a issue in gcsfuse shortly.

I confirmed that

  • this issue is caused by the asynchronous preemption (runtime: non-cooperative goroutine preemption #24543) introduced in go1.14. The program did not crash when the asynchronous preemption was turned off by setting GODEBUG=asyncpreemptoff=1. Also, when I made the Go binary locally with go1.13.15, and copying it into the gcs filesystem using cp, the program did not crash.
  • gcsfuse cancels the HTTP request to GCS as soon as the interrupt request is received by the kernel. (I suspect the interrupt request is caused by the SIGURG signal generated by the go runtime.)

@DemiMarie
Copy link

Could Go register all of its handlers with SA_RESTART, if it does not already?

@ianlancetaylor
Copy link
Contributor

Go does already register all signal handlers with SA_RESTART.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 7, 2022
@seankhliao seankhliao added this to the Unplanned milestone Aug 20, 2022
@seankhliao seankhliao removed the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Aug 20, 2022
@heschi heschi added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Aug 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
Status: Triage Backlog
Development

No branches or pull requests

8 participants