
runtime: segfault during conservative scan of asynchronously preempted goroutine #39499

Open
jamesl33 opened this issue Jun 10, 2020 · 7 comments
@jamesl33
What version of Go are you using (go version)?

$ go version
go version go1.14.1 linux/amd64

Does this issue reproduce with the latest release?

Yes (but not consistently). We have reproductions up to 1.14.3, and have just
updated to 1.14.4, but no tests have been run on it as of yet.

What operating system and processor architecture are you using?

CentOS 7 amd64 - E5-2630 v2 (24 vCPU)

What issue are we seeing?

From a brief look at the stack trace and the runtime, it looks like we are
seeing a segfault during the conservative scan of an asynchronously preempted
goroutine. While we have only seen this issue since updating to 1.14.1 (we
skipped 1.14), we do rely on a couple of libraries that make use of 'unsafe',
so we wouldn't be surprised if this was due to misuse of 'unsafe' rather than
an issue with the runtime itself.
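
To illustrate what we mean by misuse of 'unsafe' (a purely hypothetical sketch, not code from our dependencies): the pattern we're worried about is a uintptr round-trip whose result no longer points into the original allocation, which -d=checkptr is designed to flag and which could plausibly corrupt heap metadata.

package main

import (
	"fmt"
	"unsafe"
)

func main() {
	buf := make([]byte, 8)

	// Deliberately invalid: converting through uintptr and adding an offset
	// that lands past the end of the 8-byte allocation. Built with
	// -gcflags=all=-d=checkptr this is reported as a fatal checkptr error;
	// without the instrumentation the out-of-bounds pointer is silently accepted.
	p := unsafe.Pointer(uintptr(unsafe.Pointer(&buf[0])) + 16)

	fmt.Println(*(*byte)(p))
}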

I've included the full stack trace below, along with a snippet from the same
backtrace showing what I've described above. Any help debugging this issue
would be greatly appreciated, whether that's tips on which GODEBUG settings to
use so we can get more information about why this is happening, or steps we
could take to debug the issue and provide you with extra information.

fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x2 addr=0x7f2f9047b8ef pc=0x42f616]

runtime stack:
runtime.throw(0xbfd246, 0x2a)
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.1/go/src/runtime/panic.go:1114 +0x72
runtime.sigpanic()
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.1/go/src/runtime/signal_unix.go:679 +0x46a
runtime.(*mspan).isFree(...)
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.1/go/src/runtime/mbitmap.go:255
runtime.scanConservative(0xc002b9fbd8, 0x178, 0x0, 0xc00004a698, 0x7f2f65cb3348)
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.1/go/src/runtime/mgcmark.go:1368 +0xf6
runtime.scanframeworker(0x7f2f65cb3238, 0x7f2f65cb3348, 0xc00004a698)
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.1/go/src/runtime/mgcmark.go:875 +0x29d
runtime.scanstack.func1(0x7f2f65cb3238, 0x0, 0x13c7920)
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.1/go/src/runtime/mgcmark.go:736 +0x3d
runtime.gentraceback(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc0006f1500, 0x0, 0x0, 0x7fffffff, 0x7f2f65cb3330, 0x0, 0x0, ...)
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.1/go/src/runtime/traceback.go:334 +0x110e
runtime.scanstack(0xc0006f1500, 0xc00004a698)
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.1/go/src/runtime/mgcmark.go:739 +0x15e
runtime.markroot.func1()
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.1/go/src/runtime/mgcmark.go:226 +0xbf
runtime.markroot(0xc00004a698, 0x153)
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.1/go/src/runtime/mgcmark.go:199 +0x2f3
runtime.gcDrainN(0xc00004a698, 0x10000, 0x10000)
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.1/go/src/runtime/mgcmark.go:1119 +0xff
runtime.gcAssistAlloc1(0xc000c84d80, 0x10000)
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.1/go/src/runtime/mgcmark.go:531 +0xf3
runtime.gcAssistAlloc.func1()
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.1/go/src/runtime/mgcmark.go:442 +0x33
runtime.systemstack(0x0)
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.1/go/src/runtime/asm_amd64.s:370 +0x66
runtime.mstart()
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.1/go/src/runtime/proc.go:1041

stack_trace.txt

@ALTree ALTree changed the title Segfault during conservative scan of asynchronously preempted goroutine runtime: segfault during conservative scan of asynchronously preempted goroutine Jun 10, 2020
@ALTree ALTree added the NeedsInvestigation label Jun 10, 2020
@ALTree
Member

ALTree commented Jun 10, 2020

cc @aclements @mknyszek

@randall77
Contributor

This does look like corruption of internal runtime data structures.

@aclements
Member

You mentioned possible misuse of unsafe. Could you compile with -gcflags=all=-d=checkptr? (Note that this is implied if you're already using -race or -msan.)

Can you reproduce this with GOTRACEBACK=system or GOTRACEBACK=crash? (The latter is even more verbose, and will send around lots of signals when it crashes, which can in principle have bad side effects.) These will include SP values in the traceback, which should let us match up exactly which call frame is being scanned and where it's stopped.
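
For reference, that corresponds to roughly the following (the binary name is just a placeholder):

$ go build -gcflags=all=-d=checkptr ./...
$ GOTRACEBACK=crash ./yourprogram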

@jamesl33
Author

Thank you for the information. I've got a build running with the requested gcflags and the GOTRACEBACK environment variable set to crash. I'll update the issue once I've been able to reproduce the segfault.

@jamesl33
Author

jamesl33 commented Jul 6, 2020

We've been able to reproduce this issue on 1.14.4 with the gcflags set as requested; however, our Jenkins environment didn't have the GOTRACEBACK variable set as we'd intended (we run multiple Jenkins instances and this reproduction wasn't running on our own instance). I've set the environment up again on the performance machine and will keep it running until I can provide an extended stack trace and, hopefully, a core dump.

For the time being I've attached the complete stack traces for the reproductions we have so far. stack_trace-1.txt is noteworthy because we appear to fail in a different location in the runtime.

stack_trace-1.txt, stack_trace-2.txt, stack_trace-3.txt

@jamesl33
Author

jamesl33 commented Jul 8, 2020

I've got a reproduction of the segfault with the requested gcflags and the GOTRACEBACK environment variable set to include runtime-created goroutines. However, as in stack_trace-1.txt, we appear to have failed in a different location than in the other stack traces. If there's anything else you require to debug the issue, please let me know and I'll do my best to provide it.

I'll leave the existing environment set up and update the issue whenever we come across any more reproductions. It's also worth noting that we have since downgraded a branch to Go 1.13.12 (and continued running tests) and we haven't yet encountered this issue on it.

fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x2 addr=0x7f940c242512 pc=0x42fa4a]

runtime stack:
runtime.throw(0xc26236, 0x2a)
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.4/go/src/runtime/panic.go:1116 +0x72 fp=0x7f93d37fbe68 sp=0x7f93d37fbe38 pc=0x443502
runtime.sigpanic()
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.4/go/src/runtime/signal_unix.go:679 +0x46a fp=0x7f93d37fbe98 sp=0x7f93d37fbe68 pc=0x459cfa
runtime.markBits.setMarked(...)
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.4/go/src/runtime/mbitmap.go:295
runtime.greyobject(0xc0011a5100, 0xc0031efcd8, 0x1baf0, 0x7f93e00b5db8, 0xc000031698, 0x99b10)
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.4/go/src/runtime/mgcmark.go:1439 +0x22a fp=0x7f93d37fbec8 sp=0x7f93d37fbe98 pc=0x42fa4a
runtime.scanConservative(0xc0031efcd8, 0x20068, 0x0, 0xc000031698, 0x7f93d37fc350)
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.4/go/src/runtime/mgcmark.go:1374 +0x15d fp=0x7f93d37fbf18 sp=0x7f93d37fbec8 pc=0x42f69d
runtime.scanframeworker(0x7f93d37fc240, 0x7f93d37fc350, 0xc000031698)
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.4/go/src/runtime/mgcmark.go:875 +0x29d fp=0x7f93d37fbfa8 sp=0x7f93d37fbf18 pc=0x42e99d
runtime.scanstack.func1(0x7f93d37fc240, 0x0, 0x1413ec0)
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.4/go/src/runtime/mgcmark.go:736 +0x3d fp=0x7f93d37fbfd0 sp=0x7f93d37fbfa8 pc=0x46eebd
runtime.gentraceback(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc00011ad80, 0x0, 0x0, 0x7fffffff, 0x7f93d37fc338, 0x0, 0x0, ...)
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.4/go/src/runtime/traceback.go:334 +0x110e fp=0x7f93d37fc2a8 sp=0x7f93d37fbfd0 pc=0x467d7e
runtime.scanstack(0xc00011ad80, 0xc000031698)
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.4/go/src/runtime/mgcmark.go:739 +0x15e fp=0x7f93d37fc4b0 sp=0x7f93d37fc2a8 pc=0x42e07e
runtime.markroot.func1()
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.4/go/src/runtime/mgcmark.go:226 +0xbf fp=0x7f93d37fc500 sp=0x7f93d37fc4b0 pc=0x46ed5f
runtime.markroot(0xc000031698, 0xc3)
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.4/go/src/runtime/mgcmark.go:199 +0x2f3 fp=0x7f93d37fc580 sp=0x7f93d37fc500 pc=0x42d113
runtime.gcDrain(0xc000031698, 0x3)
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.4/go/src/runtime/mgcmark.go:999 +0x107 fp=0x7f93d37fc5d8 sp=0x7f93d37fc580 pc=0x42ead7
runtime.gcBgMarkWorker.func2()
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.4/go/src/runtime/mgc.go:1940 +0x80 fp=0x7f93d37fc618 sp=0x7f93d37fc5d8 pc=0x46eb70
runtime.systemstack(0x0)
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.4/go/src/runtime/asm_amd64.s:370 +0x66 fp=0x7f93d37fc620 sp=0x7f93d37fc618 pc=0x471816
runtime.mstart()
	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.14.4/go/src/runtime/proc.go:1041 fp=0x7f93d37fc628 sp=0x7f93d37fc620 pc=0x4482e0

stack_trace-4.txt

@jamesl33
Author

Just giving an update: we've now upgraded to Go 1.15 and will continue running our performance tests to see if we can reproduce the issue further (the performance cluster we were reproducing on recently had some issues/downtime). With that said, is there anything else we can provide to help debug the issue?

@gopherbot gopherbot added the compiler/runtime label Jul 7, 2022
@seankhliao seankhliao added this to the Unplanned milestone Aug 27, 2022