Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: segfault in _tsan_enter_func from systemstack when called in race mode from a generic function #60439

Closed
mknyszek opened this issue May 25, 2023 · 8 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. help wanted NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@mknyszek
Copy link
Contributor

mknyszek commented May 25, 2023

This came when trying to write some runtime test code that called systemstack from a generic function. It's possible this is just not allowed for some reason, but it was surprising to me.

Here's a patch for reproducing the issue.

diff --git a/src/runtime/export_test.go b/src/runtime/export_test.go
index 22039a495c..6220eb59bc 100644
--- a/src/runtime/export_test.go
+++ b/src/runtime/export_test.go
@@ -1923,3 +1923,11 @@ func SetPinnerLeakPanic(f func()) {
 func GetPinnerLeakPanic() func() {
        return pinnerLeakPanic
 }
+
+var testUintptr uintptr
+
+func MyGenericFunc[T any]() {
+       systemstack(func() {
+               testUintptr = 4
+       })
+}
diff --git a/src/runtime/gc_test.go b/src/runtime/gc_test.go
index bd01e36103..0c21cd43cd 100644
--- a/src/runtime/gc_test.go
+++ b/src/runtime/gc_test.go
@@ -929,3 +929,7 @@ func TestMemoryLimitNoGCPercent(t *testing.T) {
                t.Fatalf("expected %q, but got %q", want, got)
        }
 }
+
+func TestMyGenericFunc(t *testing.T) {
+       runtime.MyGenericFunc[int]()
+}

I can produce the failure by running this command, with the above patch applied.

go test -run="MyGenericFunc" -race runtime

Here's the failure:

SIGSEGV: segmentation violation
PC=0x41f700 m=5 sigcode=1

goroutine 0 [idle]:

goroutine 6 [running]:
runtime.systemstack_switch()
	/usr/local/google/home/mknyszek/work/go-2/src/runtime/asm_amd64.s:474 +0x8 fp=0xc00011ae58 sp=0xc00011ae48 pc=0x4b32c8
runtime.MyGenericFunc[...]()
	/usr/local/google/home/mknyszek/work/go-2/src/runtime/export_test.go:1930 +0x47 fp=0xc00011ae80 sp=0xc00011ae58 pc=0x9a4f87
runtime_test.TestMyGenericFunc(0x0?)
	/usr/local/google/home/mknyszek/work/go-2/src/runtime/gc_test.go:934 +0x25 fp=0xc00011ae98 sp=0xc00011ae80 pc=0x8d4e85
testing.tRunner(0xc0001c6340, 0xa99ee8)
	/usr/local/google/home/mknyszek/work/go-2/src/testing/testing.go:1595 +0x239 fp=0xc00011afb0 sp=0xc00011ae98 pc=0x591ef9
testing.(*T).Run.func1()
	/usr/local/google/home/mknyszek/work/go-2/src/testing/testing.go:1648 +0x45 fp=0xc00011afe0 sp=0xc00011afb0 pc=0x5939e5
runtime.goexit()
	/usr/local/google/home/mknyszek/work/go-2/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00011afe8 sp=0xc00011afe0 pc=0x4b52a1
created by testing.(*T).Run in goroutine 1
	/usr/local/google/home/mknyszek/work/go-2/src/testing/testing.go:1648 +0x82b

goroutine 1 [chan receive, locked to thread]:
runtime.gopark(0x0?, 0x0?, 0x50?, 0x21?, 0x18?)
	/usr/local/google/home/mknyszek/work/go-2/src/runtime/proc.go:398 +0xce fp=0xc00018f5f8 sp=0xc00018f5d8 pc=0x47720e
runtime.chanrecv(0xc0000dc1c0, 0xc00018f6df, 0x1)
	/usr/local/google/home/mknyszek/work/go-2/src/runtime/chan.go:583 +0x385 fp=0xc00018f670 sp=0xc00018f5f8 pc=0x43fc25
runtime.chanrecv1(0xa66f20?, 0x9de1c0?)
	/usr/local/google/home/mknyszek/work/go-2/src/runtime/chan.go:442 +0x12 fp=0xc00018f698 sp=0xc00018f670 pc=0x43f872
testing.(*T).Run(0xc0001c61a0, {0xa741f6, 0x11}, 0xa99ee8)
	/usr/local/google/home/mknyszek/work/go-2/src/testing/testing.go:1649 +0x856 fp=0xc00018f7b8 sp=0xc00018f698 pc=0x593776
testing.runTests.func1(0x0?)
	/usr/local/google/home/mknyszek/work/go-2/src/testing/testing.go:2054 +0x85 fp=0xc00018f810 sp=0xc00018f7b8 pc=0x5972a5
testing.tRunner(0xc0001c61a0, 0xc00018fa58)
	/usr/local/google/home/mknyszek/work/go-2/src/testing/testing.go:1595 +0x239 fp=0xc00018f928 sp=0xc00018f810 pc=0x591ef9
testing.runTests(0xc000182780?, {0xddeb00, 0x1ba, 0x1ba}, {0x4b82e9?, 0xc00018fb68?, 0xde2c60?})
	/usr/local/google/home/mknyszek/work/go-2/src/testing/testing.go:2052 +0x897 fp=0xc00018fa88 sp=0xc00018f928 pc=0x597117
testing.(*M).Run(0xc000182780)
	/usr/local/google/home/mknyszek/work/go-2/src/testing/testing.go:1925 +0xb58 fp=0xc00018fe08 sp=0xc00018fa88 pc=0x594938
runtime_test.TestMain(0xc8912813b0409898?)
	/usr/local/google/home/mknyszek/work/go-2/src/runtime/crash_test.go:27 +0x2f fp=0xc00018fe58 sp=0xc00018fe08 pc=0x8bb2cf
main.main()
	_testmain.go:1539 +0x308 fp=0xc00018ff40 sp=0xc00018fe58 pc=0x9b04e8
runtime.main()
	/usr/local/google/home/mknyszek/work/go-2/src/runtime/proc.go:267 +0x2bb fp=0xc00018ffe0 sp=0xc00018ff40 pc=0x476d9b
runtime.goexit()
	/usr/local/google/home/mknyszek/work/go-2/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00018ffe8 sp=0xc00018ffe0 pc=0x4b52a1

goroutine 2 [force gc (idle)]:
runtime.gopark(0xdb3e00?, 0xde3640?, 0x0?, 0x0?, 0x0?)
	/usr/local/google/home/mknyszek/work/go-2/src/runtime/proc.go:398 +0xce fp=0xc0000b07a8 sp=0xc0000b0788 pc=0x47720e
runtime.goparkunlock(...)
	/usr/local/google/home/mknyszek/work/go-2/src/runtime/proc.go:404
runtime.forcegchelper()
	/usr/local/google/home/mknyszek/work/go-2/src/runtime/proc.go:322 +0xb3 fp=0xc0000b07e0 sp=0xc0000b07a8 pc=0x477073
runtime.goexit()
	/usr/local/google/home/mknyszek/work/go-2/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000b07e8 sp=0xc0000b07e0 pc=0x4b52a1
created by runtime.init.6 in goroutine 1
	/usr/local/google/home/mknyszek/work/go-2/src/runtime/proc.go:310 +0x1a

goroutine 3 [GC sweep wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/usr/local/google/home/mknyszek/work/go-2/src/runtime/proc.go:398 +0xce fp=0xc0000b0f78 sp=0xc0000b0f58 pc=0x47720e
runtime.goparkunlock(...)
	/usr/local/google/home/mknyszek/work/go-2/src/runtime/proc.go:404
1875 }
runtime.bgsweep(0x0?)
	/usr/local/google/home/mknyszek/work/go-2/src/runtime/mgcsweep.go:280 +0x94 fp=0xc0000b0fc8 sp=0xc0000b0f78 pc=0x460074
runtime.gcenable.func1()
	/usr/local/google/home/mknyszek/work/go-2/src/runtime/mgc.go:200 +0x25 fp=0xc0000b0fe0 sp=0xc0000b0fc8 pc=0x454c45
runtime.goexit()
	/usr/local/google/home/mknyszek/work/go-2/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000b0fe8 sp=0xc0000b0fe0 pc=0x4b52a1
created by runtime.gcenable in goroutine 1
	/usr/local/google/home/mknyszek/work/go-2/src/runtime/mgc.go:200 +0x66

goroutine 4 [GC scavenge wait]:
runtime.gopark(0xc0000dc000?, 0xb483a0?, 0x1?, 0x0?, 0xc0000071e0?)
	/usr/local/google/home/mknyszek/work/go-2/src/runtime/proc.go:398 +0xce fp=0xc0000c6f70 sp=0xc0000c6f50 pc=0x47720e
runtime.goparkunlock(...)
	/usr/local/google/home/mknyszek/work/go-2/src/runtime/proc.go:404
runtime.(*scavengerState).park(0xde2ce0)
	/usr/local/google/home/mknyszek/work/go-2/src/runtime/mgcscavenge.go:425 +0x49 fp=0xc0000c6fa0 sp=0xc0000c6f70 pc=0x45d8c9
runtime.bgscavenge(0x0?)
	/usr/local/google/home/mknyszek/work/go-2/src/runtime/mgcscavenge.go:653 +0x3c fp=0xc0000c6fc8 sp=0xc0000c6fa0 pc=0x45de3c
runtime.gcenable.func2()
	/usr/local/google/home/mknyszek/work/go-2/src/runtime/mgc.go:201 +0x25 fp=0xc0000c6fe0 sp=0xc0000c6fc8 pc=0x454be5
runtime.goexit()
	/usr/local/google/home/mknyszek/work/go-2/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000c6fe8 sp=0xc0000c6fe0 pc=0x4b52a1
created by runtime.gcenable in goroutine 1
	/usr/local/google/home/mknyszek/work/go-2/src/runtime/mgc.go:201 +0xa5

goroutine 5 [finalizer wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/usr/local/google/home/mknyszek/work/go-2/src/runtime/proc.go:398 +0xce fp=0xc0000c0e28 sp=0xc0000c0e08 pc=0x47720e
runtime.runfinq()
	/usr/local/google/home/mknyszek/work/go-2/src/runtime/mfinal.go:193 +0x13b fp=0xc0000c0fe0 sp=0xc0000c0e28 pc=0x453c3b
runtime.goexit()
	/usr/local/google/home/mknyszek/work/go-2/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000c0fe8 sp=0xc0000c0fe0 pc=0x4b52a1
created by runtime.createfing in goroutine 1
	/usr/local/google/home/mknyszek/work/go-2/src/runtime/mfinal.go:163 +0x3d

rax    0x41f700
rbx    0xc00011ae60
rcx    0xc00011ae60
rdx    0xc00011ae60
rdi    0x0
rsi    0x4b332a
rbp    0x7f9deed86e28
rsp    0x7f9deed86e08
r8     0x21800038c7f4
r9     0x0
r10    0xc0001021a0
r11    0x4b332a
r12    0x7f9deed86e10
r13    0xc000100000
r14    0xc0001021a0
r15    0x8
rip    0x41f700
rflags 0x10202
cs     0x33
fs     0x0
gs     0x0
FAIL	runtime	0.023s
FAIL

When I poke at the faulting thread in gdb, it segfaults when calling _tsan_enter_func at entry to the function passed to systemstack.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label May 25, 2023
@mknyszek
Copy link
Contributor Author

CC @golang/compiler

I marked this as cmd/compile because it seems like it might be a race instrumentation issue? I'm not familiar with the race detector's invariants, so I'm not quite sure what's going on here.

@mknyszek mknyszek added this to the Backlog milestone May 25, 2023
@mknyszek
Copy link
Contributor Author

The test notably does not fail if:

  • I make MyGenericFunc not generic.
  • I make the function passed to systemstack do nothing. (I suspect it's just optimized away.)

@mdempsky
Copy link
Member

The problem here is when runtime.MyGenericFunc[int] is instantiated in package runtime_test, the closure gets duplicated in the runtime_test compilation unit. However, the compiler fails to recognize that it originally came from package runtime, and thus shouldn't be race instrumented.

I can look into this, but it only affects the runtime, so it seems relatively low priority unless it causes problems for testing.

You should be able to workaround the issue for now by making sure any instantiations happen within package runtime.

@mknyszek
Copy link
Contributor Author

+1 to very low priority. Thanks for taking the time to look into it!

@mdempsky
Copy link
Member

I think the issue here is in walk.instrument.

We're deciding whether to instrument a function based on base.Flag.Race and base.Compiling, which look at the current compilation unit's flags. But for functions instantiated across compilation unit boundaries (e.g., for generics, though theoretically this could affect other generated functions like equality and hash functions too), this isn't necessarily correct.

The fix should be to ensure IR builders set the Norace pragma as appropriate.

@mknyszek mknyszek added NeedsFix The path to resolution is known, but the work has not been done. help wanted labels May 30, 2023
@cuonglm
Copy link
Member

cuonglm commented Jun 13, 2023

The fix should be to ensure IR builders set the Norace pragma as appropriate.

How about making walk.instrument skips runtime functions? Something like this:

diff --git a/src/cmd/compile/internal/walk/race.go b/src/cmd/compile/internal/walk/race.go
index 859e5c57f0..f8b77f8131 100644
--- a/src/cmd/compile/internal/walk/race.go
+++ b/src/cmd/compile/internal/walk/race.go
@@ -15,6 +15,9 @@ func instrument(fn *ir.Func) {
        if fn.Pragma&ir.Norace != 0 || (fn.Linksym() != nil && fn.Linksym().ABIWrapper()) {
                return
        }
+       if types.IsRuntimePkg(fn.Sym().Pkg) {
+               return
+       }
 
        if !base.Flag.Race || !base.Compiling(base.NoRacePkgs) {
                fn.SetInstrumentBody(true)

@gopherbot
Copy link

Change https://go.dev/cl/502895 mentions this issue: cmd/compile: skip instrumenting functions from package runtime

@mdempsky
Copy link
Member

@cuonglm We need to also prevent runtime functions from getting inlined into other packages, otherwise they could still get instrumented that way.

@cuonglm cuonglm self-assigned this Jun 28, 2023
@cuonglm cuonglm modified the milestones: Backlog, Go1.22 Jun 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. help wanted NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

4 participants