Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: hang in StopTrace in runtime.test #39004

Closed
prattmic opened this issue May 11, 2020 · 8 comments
Closed

runtime: hang in StopTrace in runtime.test #39004

prattmic opened this issue May 11, 2020 · 8 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@prattmic
Copy link
Member

Passing -trace to go test runtime often (but not always) makes the test hang on exit, stuck in runtime.StopTrace / runtime.ReadTrace.

The immediate issue seems to be LockOSThread in init (https://github.com/golang/go/blob/master/src/runtime/runtime_linux_test.go#L24), as the issue goes away if that is removed.

../bin/go test -run=TestIntString -v -trace=/tmp/trace.out runtime                                                                         
=== RUN   TestIntString                                    
--- PASS: TestIntString (0.00s)                                     
=== RUN   TestIntStringAllocs
--- PASS: TestIntStringAllocs (0.00s)                                                                                                        
PASS             

SIGQUIT: quit                                                         
PC=0x4783e1 m=0 sigcode=0               
                                                                      
goroutine 0 [idle]:                                                                                                                          
runtime.futex(0x833c88, 0x80, 0x0, 0x0, 0x0, 0x0, 0xc000000000, 0xc00000001, 0x7ffd4808d368, 0x40e2bf, ...)                                  
        /usr/local/google/home/mpratt/src/go/src/runtime/sys_linux_amd64.s:567 +0x21 fp=0x7ffd4808d2f0 sp=0x7ffd4808d2e8 pc=0x4783e1         
runtime.futexsleep(0x833c88, 0x7ffd00000000, 0xffffffffffffffff)
        /usr/local/google/home/mpratt/src/go/src/runtime/os_linux.go:45 +0x46 fp=0x7ffd4808d340 sp=0x7ffd4808d2f0 pc=0x4359e6                
runtime.notesleep(0x833c88)                                         
        /usr/local/google/home/mpratt/src/go/src/runtime/lock_futex.go:159 +0x9f fp=0x7ffd4808d378 sp=0x7ffd4808d340 pc=0x40e2bf
runtime.stoplockedm()                                                 
        /usr/local/google/home/mpratt/src/go/src/runtime/proc.go:2031 +0x8d fp=0x7ffd4808d3b8 sp=0x7ffd4808d378 pc=0x44090d                  
runtime.schedule()                                                    
        /usr/local/google/home/mpratt/src/go/src/runtime/proc.go:2552 +0x48c fp=0x7ffd4808d420 sp=0x7ffd4808d3b8 pc=0x44276c                 
runtime.park_m(0xc000000180)                   
        /usr/local/google/home/mpratt/src/go/src/runtime/proc.go:2786 +0x9d fp=0x7ffd4808d450 sp=0x7ffd4808d420 pc=0x442b5d                  
runtime.mcall(0x0)                                                    
        /usr/local/google/home/mpratt/src/go/src/runtime/asm_amd64.s:318 +0x5b fp=0x7ffd4808d460 sp=0x7ffd4808d450 pc=0x47475b               
                                                                      
goroutine 1 [semacquire, locked to thread]:                                                                                                  
runtime.gopark(0x6a98b8, 0x839f40, 0xc000011912, 0x4)
        /usr/local/google/home/mpratt/src/go/src/runtime/proc.go:306 +0xe5 fp=0xc000141b10 sp=0xc000141af0 pc=0x43c605                       
runtime.goparkunlock(...)                 
        /usr/local/google/home/mpratt/src/go/src/runtime/proc.go:312                                                                         
runtime.semacquire1(0x83a794, 0x0, 0x0, 0x0)                          
        /usr/local/google/home/mpratt/src/go/src/runtime/sema.go:144 +0x1c5 fp=0xc000141b78 sp=0xc000141b10 pc=0x44d0e5                      
runtime.semacquire(...)                                               
        /usr/local/google/home/mpratt/src/go/src/runtime/sema.go:95                                                                                                                                                                                                                       
runtime.StopTrace()                                                                                                                                                                                                                                                                       
        /usr/local/google/home/mpratt/src/go/src/runtime/trace.go:327 +0x1b7 fp=0xc000141bb0 sp=0xc000141b78 pc=0x45d1b7
runtime/trace.Stop()                                             
        /usr/local/google/home/mpratt/src/go/src/runtime/trace/trace.go:147 +0x74 fp=0xc000141be0 sp=0xc000141bb0 pc=0x4edab4
testing.(*M).writeProfiles(0xc000136000)                          
        /usr/local/google/home/mpratt/src/go/src/testing/testing.go:1472 +0x7af fp=0xc000141cd8 sp=0xc000141be0 pc=0x4fa5cf                  
testing.(*M).after.func1()                               
        /usr/local/google/home/mpratt/src/go/src/testing/testing.go:1453 +0x2a fp=0xc000141cf0 sp=0xc000141cd8 pc=0x4fc98a                   
sync.(*Once).doSlow(0xc000136060, 0xc000141d50)                       
        /usr/local/google/home/mpratt/src/go/src/sync/once.go:66 +0xec fp=0xc000141d40 sp=0xc000141cf0 pc=0x47fbcc                           
sync.(*Once).Do(...)                            
        /usr/local/google/home/mpratt/src/go/src/sync/once.go:57                                                                             
testing.(*M).after(0xc000136000)                                      
        /usr/local/google/home/mpratt/src/go/src/testing/testing.go:1452 +0x65 fp=0xc000141d70 sp=0xc000141d40 pc=0x4f9e05                   
testing.(*M).Run(0xc000136000, 0x0)                        
        /usr/local/google/home/mpratt/src/go/src/testing/testing.go:1295 +0x41f fp=0xc000141e80 sp=0xc000141d70 pc=0x4f863f                  
runtime_test.TestMain(0xc000136000)                                   
        /usr/local/google/home/mpratt/src/go/src/runtime/crash_test.go:28 +0x2f fp=0xc000141ed0 sp=0xc000141e80 pc=0x58e3af                  
main.main()                                                                                                                                                                                                                                                                               
        _testmain.go:1225 +0x165 fp=0xc000141f88 sp=0xc000141ed0 pc=0x61d285                                                                 
runtime.main()                       
        /usr/local/google/home/mpratt/src/go/src/runtime/proc.go:204 +0x223 fp=0xc000141fe0 sp=0xc000141f88 pc=0x43c1e3                      
runtime.goexit()                                                      
        /usr/local/google/home/mpratt/src/go/src/runtime/asm_amd64.s:1374 +0x1 fp=0xc000141fe8 sp=0xc000141fe0 pc=0x476561               

goroutine 2 [force gc (idle)]:                                        
runtime.gopark(0x6a98b8, 0x832800, 0x1411, 0x1)                       
        /usr/local/google/home/mpratt/src/go/src/runtime/proc.go:306 +0xe5 fp=0xc00004afb0 sp=0xc00004af90 pc=0x43c605                       
runtime.goparkunlock(...)                                  
        /usr/local/google/home/mpratt/src/go/src/runtime/proc.go:312
runtime.forcegchelper()      
        /usr/local/google/home/mpratt/src/go/src/runtime/proc.go:255 +0xc5 fp=0xc00004afe0 sp=0xc00004afb0 pc=0x43c4a5                       
runtime.goexit()                  
        /usr/local/google/home/mpratt/src/go/src/runtime/asm_amd64.s:1374 +0x1 fp=0xc00004afe8 sp=0xc00004afe0 pc=0x476561                   
created by runtime.init.6                                             
        /usr/local/google/home/mpratt/src/go/src/runtime/proc.go:243 +0x35

goroutine 3 [GC sweep wait]:
runtime.gopark(0x6a98b8, 0x832c20, 0x140c, 0x1)
        /usr/local/google/home/mpratt/src/go/src/runtime/proc.go:306 +0xe5 fp=0xc00004b7a8 sp=0xc00004b788 pc=0x43c605
runtime.goparkunlock(...)
        /usr/local/google/home/mpratt/src/go/src/runtime/proc.go:312
runtime.bgsweep(0xc000072000)
        /usr/local/google/home/mpratt/src/go/src/runtime/mgcsweep.go:163 +0x9e fp=0xc00004b7d8 sp=0xc00004b7a8 pc=0x4274be
runtime.goexit()
        /usr/local/google/home/mpratt/src/go/src/runtime/asm_amd64.s:1374 +0x1 fp=0xc00004b7e0 sp=0xc00004b7d8 pc=0x476561
created by runtime.gcenable
        /usr/local/google/home/mpratt/src/go/src/runtime/mgc.go:217 +0x5c

goroutine 4 [GC scavenge wait]:
runtime.gopark(0x6a98b8, 0x832de0, 0x140d, 0x1)
        /usr/local/google/home/mpratt/src/go/src/runtime/proc.go:306 +0xe5 fp=0xc00004bf78 sp=0xc00004bf58 pc=0x43c605
runtime.goparkunlock(...)
        /usr/local/google/home/mpratt/src/go/src/runtime/proc.go:312
runtime.bgscavenge(0xc000072000)
        /usr/local/google/home/mpratt/src/go/src/runtime/mgcscavenge.go:265 +0xd2 fp=0xc00004bfd8 sp=0xc00004bf78 pc=0x4254f2
runtime.goexit()
        /usr/local/google/home/mpratt/src/go/src/runtime/asm_amd64.s:1374 +0x1 fp=0xc00004bfe0 sp=0xc00004bfd8 pc=0x476561
created by runtime.gcenable
        /usr/local/google/home/mpratt/src/go/src/runtime/mgc.go:218 +0x7e

goroutine 5 [finalizer wait]:
runtime.gopark(0x6a98b8, 0x8635e8, 0x1410, 0x1)
        /usr/local/google/home/mpratt/src/go/src/runtime/proc.go:306 +0xe5 fp=0xc00004c758 sp=0xc00004c738 pc=0x43c605
runtime.goparkunlock(...)
        /usr/local/google/home/mpratt/src/go/src/runtime/proc.go:312
runtime.runfinq()
        /usr/local/google/home/mpratt/src/go/src/runtime/mfinal.go:175 +0xa9 fp=0xc00004c7e0 sp=0xc00004c758 pc=0x41c349
runtime.goexit()
        /usr/local/google/home/mpratt/src/go/src/runtime/asm_amd64.s:1374 +0x1 fp=0xc00004c7e8 sp=0xc00004c7e0 pc=0x476561
created by runtime.createfing
        /usr/local/google/home/mpratt/src/go/src/runtime/mfinal.go:156 +0x65

goroutine 6 [trace reader (blocked)]:
runtime.gopark(0x6a98b8, 0x83a780, 0x1416, 0x2)
        /usr/local/google/home/mpratt/src/go/src/runtime/proc.go:306 +0xe5 fp=0xc00004a710 sp=0xc00004a6f0 pc=0x43c605
runtime.goparkunlock(...)
        /usr/local/google/home/mpratt/src/go/src/runtime/proc.go:312
runtime.ReadTrace(0xc000010028, 0x7f2f65e24418, 0x21)
        /usr/local/google/home/mpratt/src/go/src/runtime/trace.go:398 +0x4a5 fp=0xc00004a788 sp=0xc00004a710 pc=0x45d825
runtime/trace.Start.func1(0x6e79c0, 0xc000010028)
        /usr/local/google/home/mpratt/src/go/src/runtime/trace/trace.go:129 +0x47 fp=0xc00004a7d0 sp=0xc00004a788 pc=0x4edb67
runtime.goexit()
        /usr/local/google/home/mpratt/src/go/src/runtime/asm_amd64.s:1374 +0x1 fp=0xc00004a7d8 sp=0xc00004a7d0 pc=0x476561
created by runtime/trace.Start
        /usr/local/google/home/mpratt/src/go/src/runtime/trace/trace.go:127 +0xdb

rax    0xca
rbx    0x833b40
rcx    0x4783e3
rdx    0x0
rdi    0x833c88
rsi    0x80
rbp    0x7ffd4808d330
rsp    0x7ffd4808d2e8
r8     0x0
r9     0x0
r10    0x0
r11    0x286
r12    0x2
r13    0x817fe0
r14    0x0
r15    0x476560
rip    0x4783e1
rflags 0x286
cs     0x33
fs     0x0
gs     0x0
FAIL    runtime 41.181s
FAIL
@prattmic prattmic added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label May 11, 2020
@prattmic prattmic added this to the Go1.15 milestone May 11, 2020
@prattmic
Copy link
Member Author

cc @mknyszek

@prattmic
Copy link
Member Author

The issue is that the M that stops tracing may be the only one awake. If it is locked, then it will call stoplockedm before checking that it needs to wake the trace reader: https://github.com/golang/go/blob/master/src/runtime/proc.go#L2551-L2597

A rudimentary trace I made:

--- PASS: TestIntStringAllocs (0.00s)
[g=0, m=5, p=0] try wake tracer goid=6
[g=0, m=5, p=0] wakep
[g=6, m=5, p=0] woken for data
[g=6, m=5, p=0] wait for data
[g=0, m=0, p=0] stoplockedm
PASS
[g=1, m=0, p=0] wait for shutdown
[g=0, m=0, p=0] stoplockedm

The reader is woken once, but when it is time to shutdown ("wait for shutdown"), the reader is never woken.

It may make sense to move the trace check (and perhaps GC worker as well?) before stoplockedm, or perhaps we should make some more fundamental change. (Why can't StopTrace call wakep directly?)

cc @aclements

@gopherbot
Copy link

Change https://golang.org/cl/233417 mentions this issue: runtime: directly goready() in StopTrace

@ianlancetaylor
Copy link
Contributor

Rolling forward to 1.16.

@ianlancetaylor ianlancetaylor modified the milestones: Go1.15, Go1.16 Jun 26, 2020
@aclements
Copy link
Member

Not a new issue and we don't really have a fix, so pushing to 1.17.

@ianlancetaylor
Copy link
Contributor

Shall we just move this to Backlog?

@prattmic prattmic modified the milestones: Go1.17, Backlog Apr 27, 2021
@gopherbot
Copy link

Change https://go.dev/cl/393880 mentions this issue: runtime: move scheduling decisions by schedule into findrunnable

@mknyszek
Copy link
Contributor

Forgot to mention, but I can't reproduce the original problem after https://go.dev/cl/393880.

@golang golang locked and limited conversation to collaborators Apr 26, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

5 participants