Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os: fatal error: "attempt to execute system stack code on user stack" during TestProgWideChdir #30591

Closed
ALTree opened this issue Mar 5, 2019 · 5 comments
Assignees
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@ALTree
Copy link
Member

ALTree commented Mar 5, 2019

The linux/arm builder just failed (at tip) on the os test suite, with the following error:

fatal error: attempt to execute system stack code on user stack

goroutine 87 [running]:
runtime.throw(0x195855, 0x32)
	/workdir/go/src/runtime/panic.go:627 +0x5c fp=0x4cef44 sp=0x4cef30 pc=0x3ce28
runtime.morestackc()
	/workdir/go/src/runtime/stack.go:1310 +0x18 fp=0x4cef50 sp=0x4cef44 pc=0x57580
runtime.stackfree(0x45c000, 0x45e000)
	/workdir/go/src/runtime/stack.go:414 +0x2dc fp=0x4cef50 sp=0x4cef50 pc=0x552a0
runtime.newm(0x49e9a0, 0x4267c)
	/workdir/go/src/runtime/proc.go:1812 +0x24 fp=0x4cef64 sp=0x4cef50 pc=0x42404
runtime: unexpected return pc for runtime.newm called from 0x49c000
stack: frame={sp:0x4cef64, fp:0x4cef78} stack=[0x4ce800,0x4cf000)
004ceee4:  00000000  00000000  00000000  00000000 
004ceef4:  00000000  00000000  00000000  00000000 
004cef04:  00000000  00000000  00000000  00000000 
004cef14:  0003cfc8 <runtime.fatalthrow+68>  0003ce28 <runtime.throw+92>  004cef20  00068358 <runtime.fatalthrow.func1+0> 
004cef24:  0049e9a0  0003ce28 <runtime.throw+92>  004cef30  00057580 <runtime.morestackc+24> 
004cef34:  004cef38  000682ec <runtime.throw.func1+0>  00195855  00000032 
004cef44:  000552a0 <runtime.stackfree+732>  00195855  00000032  00042404 <runtime.newm+36> 
004cef54:  0045c000  0045e000  00000000  00000000 
004cef64: <0049c000  0049e9a0  0004267c <runtime.startTemplateThread+76>  00000000 
004cef74:  001972ec >00000000  00000000  00046afc <runtime.LockOSThread+100> 
004cef84:  001972ec  00000000  00000001  00046a01 <runtime.gfpurge+81> 
004cef94:  0013ca64 <os_test.TestProgWideChdir.func1+292>  002c3920  00000000  0006b650 <runtime.goexit+4> 
004cefa4:  00000000  00000000  00000000  00000000 
004cefb4:  00000000  00000000  00000000  00000000 
004cefc4:  00000000  00000000  00000000  00000000 
004cefd4:  00000000  00000000  00000000  004eee80 
004cefe4:  004f10e0  004eeec0  00000009  00000000 
004ceff4:  00000000 
runtime.newm(0x0, 0x46afc)
	/workdir/go/src/runtime/proc.go:1812 +0x24 fp=0x4cef78 sp=0x4cef64 pc=0x42404
created by os_test.TestProgWideChdir
	/workdir/go/src/os/os_test.go:1255 +0x80

goroutine 1 [chan receive]:
testing.(*T).Run(0x4f10e0, 0x18cee8, 0x11, 0x196f40, 0x1)
	/workdir/go/src/testing/testing.go:917 +0x2d8
testing.runTests.func1(0x496000)
	/workdir/go/src/testing/testing.go:1157 +0x68
testing.tRunner(0x496000, 0x480f14)
	/workdir/go/src/testing/testing.go:865 +0xa4
testing.runTests(0x40c0a0, 0x2b3478, 0x77, 0x77, 0x0)
	/workdir/go/src/testing/testing.go:1155 +0x240
testing.(*M).Run(0x446100, 0x0)
	/workdir/go/src/testing/testing.go:1072 +0x138
main.main()
	_testmain.go:308 +0x130

goroutine 5 [syscall]:
os/signal.signal_recv(0x0)
	/workdir/go/src/runtime/sigqueue.go:139 +0x130
os/signal.loop()
	/workdir/go/src/os/signal/signal_unix.go:23 +0x14
created by os/signal.init.0
	/workdir/go/src/os/signal/signal_unix.go:29 +0x30

goroutine 77 [runnable]:
os_test.TestProgWideChdir(0x4f10e0)
	/workdir/go/src/os/os_test.go:1300 +0x35c
testing.tRunner(0x4f10e0, 0x196f40)
	/workdir/go/src/testing/testing.go:865 +0xa4
created by testing.(*T).Run
	/workdir/go/src/testing/testing.go:916 +0x2b8

goroutine 78 [runnable]:
os_test.TestProgWideChdir.func1(0x4eee80, 0x4f10e0, 0x4eeec0, 0x0)
	/workdir/go/src/os/os_test.go:1255
created by os_test.TestProgWideChdir
	/workdir/go/src/os/os_test.go:1255 +0x80

goroutine 79 [chan send, locked to thread]:
os_test.TestProgWideChdir.func1(0x4eee80, 0x4f10e0, 0x4eeec0, 0x1)
	/workdir/go/src/os/os_test.go:1272 +0x11c
created by os_test.TestProgWideChdir
	/workdir/go/src/os/os_test.go:1255 +0x80

goroutine 80 [chan send]:
os_test.TestProgWideChdir.func1(0x4eee80, 0x4f10e0, 0x4eeec0, 0x2)
	/workdir/go/src/os/os_test.go:1272 +0x11c
created by os_test.TestProgWideChdir
	/workdir/go/src/os/os_test.go:1255 +0x80

goroutine 81 [chan send, locked to thread]:
os_test.TestProgWideChdir.func1(0x4eee80, 0x4f10e0, 0x4eeec0, 0x3)
	/workdir/go/src/os/os_test.go:1272 +0x11c
created by os_test.TestProgWideChdir
	/workdir/go/src/os/os_test.go:1255 +0x80

goroutine 82 [chan send]:
os_test.TestProgWideChdir.func1(0x4eee80, 0x4f10e0, 0x4eeec0, 0x4)
	/workdir/go/src/os/os_test.go:1272 +0x11c
created by os_test.TestProgWideChdir
	/workdir/go/src/os/os_test.go:1255 +0x80

goroutine 85 [runnable]:
os_test.TestProgWideChdir.func1(0x4eee80, 0x4f10e0, 0x4eeec0, 0x7)
	/workdir/go/src/os/os_test.go:1255
created by os_test.TestProgWideChdir
	/workdir/go/src/os/os_test.go:1255 +0x80

goroutine 86 [runnable]:
os_test.TestProgWideChdir.func1(0x4eee80, 0x4f10e0, 0x4eeec0, 0x8)
	/workdir/go/src/os/os_test.go:1255
created by os_test.TestProgWideChdir
	/workdir/go/src/os/os_test.go:1255 +0x80
FAIL	os	0.178s
ok  	os/exec	3.672s
2019/03/05 04:45:47 Failed: exit status 1

Full log: https://build.golang.org/log/9592acb86aec954a29903bf39b780f3af5b7d30e

That builder is usually not flaky at all, so this probably warrants an investigation.

cc @ianlancetaylor

@ALTree ALTree added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Mar 5, 2019
@ALTree ALTree added this to the Go1.13 milestone Mar 5, 2019
@ALTree ALTree changed the title os/exec: fatal error: attempt to execute system stack code on user stack (on linux/arm builder) os: fatal error: attempt to execute system stack code on user stack (on linux/arm builder) Mar 5, 2019
@ianlancetaylor
Copy link
Contributor

Looks like maybe LockOSThread called startTemplateThread called newm. But I don't see how we got from there to stackfree.

CC @aclements

@bcmills
Copy link
Contributor

bcmills commented Jun 20, 2019

Also seen in a linux-amd64 TryBot run (https://storage.googleapis.com/go-build-log/fa07c28c/linux-amd64_1e846603.log).

Curiously, TestProgWideChdir is again on the stack.

fatal error: attempt to execute system stack code on user stack

goroutine 95 [running]:
runtime.throw(0x5b8e24, 0x32)
	/workdir/go/src/runtime/panic.go:758 +0x72 fp=0xc0000e3688 sp=0xc0000e3658 pc=0x42df72
runtime.morestackc()
	/workdir/go/src/runtime/stack.go:1308 +0x27 fp=0xc0000e36a8 sp=0xc0000e3688 pc=0x447e57
runtime.stackfree(0xc00003c000, 0xc00003e000)
	/workdir/go/src/runtime/stack.go:414 +0x33a fp=0xc0000e36b0 sp=0xc0000e36a8 pc=0x44589a
runtime.allocm(0x0, 0x5baf90, 0x0)
	/workdir/go/src/runtime/proc.go:1480 +0xdd fp=0xc0000e36f0 sp=0xc0000e36b0 pc=0x432ebd
runtime.newm(0x5baf90, 0x0)
	/workdir/go/src/runtime/proc.go:1804 +0x39 fp=0xc0000e3720 sp=0xc0000e36f0 pc=0x433689
runtime.startTemplateThread(...)
	/workdir/go/src/runtime/proc.go:1868
runtime.LockOSThread()
	/workdir/go/src/runtime/proc.go:3537 +0x98 fp=0xc0000e3740 sp=0xc0000e3720 pc=0x4380e8
os_test.TestProgWideChdir.func1(0xc0001922a0, 0xc00015cc00, 0xc000192300, 0x9)
	/workdir/go/src/os/os_test.go:1281 +0x170 fp=0xc0000e37c0 sp=0xc0000e3740 pc=0x54bbe0
runtime.goexit()
	/workdir/go/src/runtime/asm_amd64.s:1357 +0x1 fp=0xc0000e37c8 sp=0xc0000e37c0 pc=0x45d631
created by os_test.TestProgWideChdir
	/workdir/go/src/os/os_test.go:1272 +0xc3

goroutine 1 [chan receive]:
testing.(*T).Run(0xc00015cc00, 0x5aff10, 0x11, 0x5ba800, 0x47bb01)
	/workdir/go/src/testing/testing.go:961 +0x377
testing.runTests.func1(0xc0000a0000)
	/workdir/go/src/testing/testing.go:1207 +0x78
testing.tRunner(0xc0000a0000, 0xc00007adc0)
	/workdir/go/src/testing/testing.go:909 +0xc9
testing.runTests(0xc00000c140, 0x6e7580, 0x7b, 0x7b, 0x0)
	/workdir/go/src/testing/testing.go:1205 +0x2a7
testing.(*M).Run(0xc00009c000, 0x0)
	/workdir/go/src/testing/testing.go:1122 +0x176
main.main()
	_testmain.go:318 +0x135

goroutine 6 [syscall]:
os/signal.signal_recv(0xc00002e787)
	/workdir/go/src/runtime/sigqueue.go:139 +0x9c
os/signal.loop()
	/workdir/go/src/os/signal/signal_unix.go:23 +0x22
created by os/signal.init.0
	/workdir/go/src/os/signal/signal_unix.go:29 +0x41

goroutine 85 [runnable]:
os_test.TestProgWideChdir(0xc00015cc00)
	/workdir/go/src/os/os_test.go:1317 +0x45b
testing.tRunner(0xc00015cc00, 0x5ba800)
	/workdir/go/src/testing/testing.go:909 +0xc9
created by testing.(*T).Run
	/workdir/go/src/testing/testing.go:960 +0x350

goroutine 88 [runnable]:
os_test.TestProgWideChdir.func1(0xc0001922a0, 0xc00015cc00, 0xc000192300, 0x2)
	/workdir/go/src/os/os_test.go:1272
created by os_test.TestProgWideChdir
	/workdir/go/src/os/os_test.go:1272 +0xc3

goroutine 89 [runnable]:
os_test.TestProgWideChdir.func1(0xc0001922a0, 0xc00015cc00, 0xc000192300, 0x3)
	/workdir/go/src/os/os_test.go:1272
created by os_test.TestProgWideChdir
	/workdir/go/src/os/os_test.go:1272 +0xc3

goroutine 91 [syscall, locked to thread]:
syscall.Syscall(0x4f, 0xc0001c8e18, 0x1000, 0x0, 0x0, 0x0, 0x0)
	/workdir/go/src/syscall/asm_linux_amd64.s:18 +0x5
syscall.Getcwd(0xc0001c8e18, 0x1000, 0x1000, 0x0, 0x0, 0x0)
	/workdir/go/src/syscall/zsyscall_linux_amd64.go:182 +0x59
syscall.Getwd(0xc000125e64, 0x12, 0x5e7300, 0xc0001b40d0)
	/workdir/go/src/syscall/syscall_linux.go:231 +0x77
os.Getwd(0xc0001922a0, 0x0, 0x0, 0x0)
	/workdir/go/src/os/getwd.go:48 +0xab
os_test.TestProgWideChdir.func1(0xc0001922a0, 0xc00015cc00, 0xc000192300, 0x5)
	/workdir/go/src/os/os_test.go:1284 +0x6d
created by os_test.TestProgWideChdir
	/workdir/go/src/os/os_test.go:1272 +0xc3

goroutine 92 [runnable]:
sync.runtime_SemacquireMutex(0x6e9b94, 0x436d00, 0x1)
	/workdir/go/src/runtime/sema.go:71 +0x47
sync.(*Mutex).lockSlow(0x6e9b90)
	/workdir/go/src/sync/mutex.go:138 +0xfc
sync.(*Mutex).Lock(...)
	/workdir/go/src/sync/mutex.go:81
testing/internal/testdeps.(*testLog).add(0x6e9b90, 0x5ad15a, 0x6, 0x5acb3b, 0x3)
	/workdir/go/src/testing/internal/testdeps/deps.go:89 +0x18b
testing/internal/testdeps.(*testLog).Getenv(0x6e9b90, 0x5acb3b, 0x3)
	/workdir/go/src/testing/internal/testdeps/deps.go:68 +0x54
internal/testlog.Getenv(0x5acb3b, 0x3)
	/workdir/go/src/internal/testlog/log.go:53 +0x92
os.Getenv(0x5acb3b, 0x3, 0x5e7300, 0xc0001b0000)
	/workdir/go/src/os/env.go:102 +0x35
os.Getwd(0xc0001922a0, 0x0, 0x0, 0x0)
	/workdir/go/src/os/getwd.go:37 +0x8e
os_test.TestProgWideChdir.func1(0xc0001922a0, 0xc00015cc00, 0xc000192300, 0x6)
	/workdir/go/src/os/os_test.go:1284 +0x6d
created by os_test.TestProgWideChdir
	/workdir/go/src/os/os_test.go:1272 +0xc3

goroutine 93 [chan send, locked to thread]:
os_test.TestProgWideChdir.func1(0xc0001922a0, 0xc00015cc00, 0xc000192300, 0x7)
	/workdir/go/src/os/os_test.go:1289 +0x161
created by os_test.TestProgWideChdir
	/workdir/go/src/os/os_test.go:1272 +0xc3

goroutine 94 [runnable]:
os_test.TestProgWideChdir.func1(0xc0001922a0, 0xc00015cc00, 0xc000192300, 0x8)
	/workdir/go/src/os/os_test.go:1272
created by os_test.TestProgWideChdir
	/workdir/go/src/os/os_test.go:1272 +0xc3
FAIL	os	0.066s

@bcmills bcmills changed the title os: fatal error: attempt to execute system stack code on user stack (on linux/arm builder) os: fatal error: "attempt to execute system stack code on user stack" during TestProgWideChdir Jun 20, 2019
@andybons andybons modified the milestones: Go1.13, Go1.14 Jul 8, 2019
@rsc rsc modified the milestones: Go1.14, Backlog Oct 9, 2019
@prattmic
Copy link
Member

I also got this from manual runs of https://go-review.googlesource.com/c/go/+/232978/2/src/runtime/testdata/testprog/lockosthread.go, without the core runtime patch from the CL applied, with a much cleaner stack trace.

fatal error: attempt to execute system stack code on user stack                   
                                         
goroutine 8 [running]:
runtime.throw(0x50fbd9, 0x32)
        /usr/local/google/home/mpratt/src/go/src/runtime/panic.go:1116 +0x72 fp=0xc00004b6c0 sp=0xc00004b690 pc=0x4337f2                                            
runtime.morestackc()
        /usr/local/google/home/mpratt/src/go/src/runtime/stack.go:1320 +0x27 fp=0xc00004b6e0 sp=0xc00004b6c0 pc=0x462ba7                                            
runtime.stackfree(0xc000056000, 0xc000058000)                                     
        /usr/local/google/home/mpratt/src/go/src/runtime/stack.go:427 +0x36b fp=0xc00004b6e8 sp=0xc00004b6e0 pc=0x44a80b                                            
runtime.allocm(0x0, 0x511228, 0x45ff40)
        /usr/local/google/home/mpratt/src/go/src/runtime/proc.go:1440 +0xd7 fp=0xc00004b728 sp=0xc00004b6e8 pc=0x439397                                             
runtime.newm(0x511228, 0x0)
        /usr/local/google/home/mpratt/src/go/src/runtime/proc.go:1761 +0x39 fp=0xc00004b758 sp=0xc00004b728 pc=0x439b99                                             
runtime.startTemplateThread(...)         
        /usr/local/google/home/mpratt/src/go/src/runtime/proc.go:1830
runtime.LockOSThread()
        /usr/local/google/home/mpratt/src/go/src/runtime/proc.go:3773 +0x98 fp=0xc00004b778 sp=0xc00004b758 pc=0x43ee58
main.LockOSThreadTemplateThreadRace.func2(0xbfa7c99b86868dd1, 0x9c1163, 0x5ccae0, 0xc0000140d0)
        /usr/local/google/home/mpratt/src/go/src/runtime/testdata/testprog/lockosthread.go:233 +0x6d fp=0xc00004b7c0 sp=0xc00004b778 pc=0x4d786d
runtime.goexit()
        /usr/local/google/home/mpratt/src/go/src/runtime/asm_amd64.s:1374 +0x1 fp=0xc00004b7c8 sp=0xc00004b7c0 pc=0x465f81
created by main.LockOSThreadTemplateThreadRace
        /usr/local/google/home/mpratt/src/go/src/runtime/testdata/testprog/lockosthread.go:225 +0xdf

It seems that newm simply must be called on the system stack, which is not done consistently (stoplockedm -> handoffp -> startm is another case).

cc @mknyszek @aclements

@prattmic prattmic added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Linux labels Jul 24, 2020
@prattmic prattmic modified the milestones: Backlog, Go1.16 Jul 24, 2020
@prattmic prattmic self-assigned this Jul 24, 2020
@odeke-em odeke-em modified the milestones: Go1.16, Go1.17 Feb 6, 2021
@ianlancetaylor ianlancetaylor modified the milestones: Go1.17, Backlog Apr 19, 2021
@prattmic
Copy link
Member

I believe this is fixed by http://golang.org/cl/267257. That CL explicitly moves stackfree() to the system stack, though the rationale is incorrect. The CL states that allocm is called off the system stack for the first time with that CL, while clearly it was happening before with startTemplateThread.

@golang golang locked and limited conversation to collaborators Apr 20, 2022
@prattmic prattmic self-assigned this Jun 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

8 participants