Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

misc/cgo/testsanitizers: occasional hangs in TestTSAN/tsan12 #52998

Closed
dmitshur opened this issue May 19, 2022 · 17 comments
Closed

misc/cgo/testsanitizers: occasional hangs in TestTSAN/tsan12 #52998

dmitshur opened this issue May 19, 2022 · 17 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. release-blocker Testing An issue that has been verified to require only test changes, not just a test failure.
Milestone

Comments

@dmitshur
Copy link
Contributor

(This is mostly a continuation of #33249, with some new information.)

There's evidence of a problem where this test can sometimes hang indefinitely, though in some environments there is an external timeout, causing it to be eventually retried and then succeed. The hang delays success by however long the timeout for the retry was.

Additional background

I've observed an occurrence this week in a SlowBot run in https://go.dev/406897 (linux-amd64-longtest took almost 2 hrs), another one last week. There are mentions of it in #42699 (comment) and in #36629 (comment). So I think it's been around for a while.

Issue #33249 was closed because there weren't failures since late 2019, but that is likely due to an unintentional behavior change in the coordinator in Nov 2019 leading to #42699, rather than due to a change in testsanitizers test.

I was able to reproduce the problem locally (using tip commit c6965ad) by running the test in a loop up to a 1000 times (it reproduced on iteration 189 in the first run, on iteration 42 in another, on iteration 53 in yet another):

$ for i in {1..1000}; do printf "\nattempt $i\n"; time go tool dist test testsanitizers; sleep 1; done
[...]
attempt 189

##### Test execution environment.
# GOARCH: amd64
# CPU: Intel(R) Xeon(R) CPU @ 2.20GHz
# GOOS: linux
# OS Version: Linux 5.16.18-1rodete2-amd64 #1 SMP PREEMPT Debian 5.16.18-1rodete2 (2022-04-07) x86_64

##### ../misc/cgo/testsanitizers
(hang indefinitely here)

Sending a SIQQUIT during the hang produced the following output:

Output
[...]

attempt 188

##### Test execution environment.
# GOARCH: amd64
# CPU: Intel(R) Xeon(R) CPU @ 2.20GHz
# GOOS: linux
# OS Version: Linux 5.16.18-1rodete2-amd64 #1 SMP PREEMPT Debian 5.16.18-1rodete2 (2022-04-07) x86_64

##### ../misc/cgo/testsanitizers
PASS

ALL TESTS PASSED (some were excluded)

real	0m6.893s
user	0m25.925s
sys	0m12.214s

attempt 189

##### Test execution environment.
# GOARCH: amd64
# CPU: Intel(R) Xeon(R) CPU @ 2.20GHz
# GOOS: linux
# OS Version: Linux 5.16.18-1rodete2-amd64 #1 SMP PREEMPT Debian 5.16.18-1rodete2 (2022-04-07) x86_64

##### ../misc/cgo/testsanitizers
^\SIGQUIT: quit
PC=0x463d20 m=0 sigcode=128

goroutine 0 [idle]:
runtime.epollwait()
	$HOME/gotip/src/runtime/sys_linux_amd64.sSIGQUIT: quit:
706PC= +0x46a6a10x20 m= fp=00x7fff1ab80948 sigcode= sp=1280x7fff1ab80940
 pc=
0x463d20
goroutine 0 [runtime.netpollidle(]:
0xc000036f00?)
	$HOME/gotip/src/runtime/netpoll_epoll.go:126 +0xdc fp=0x7fff1ab80fa8 sp=0x7fff1ab80948 pc=0x42f8fc
runtime.findRunnable()
	$HOME/gotip/src/runtime/proc.go:2829 +0x6a5 fp=0x7fff1ab81098 sp=0x7fff1ab80fa8 pc=0x43b3c5
runtime.futex()
	$HOME/gotip/src/runtime/sys_linux_amd64.s:559 +0x21 fp=0x7ffccfc29b50runtime.schedule sp=(0x7ffccfc29b48)
 pc=	0x46a6a1$HOME/gotip/src/runtime/proc.go
:3206 +runtime.futexsleep0xbe( fp=0x7fff1ab810d0 sp=0x7fff1ab81098 pc=0x43c51e
0x10000000000runtime.park_m?(, 0xc0000061a00x0??)
, 	0x7ffccfc29bd8$HOME/gotip/src/runtime/proc.go?:)
3355	 +$HOME/gotip/src/runtime/os_linux.go0x14d: fp=690x7fff1ab81100 + sp=0x360x7fff1ab810d0 fp= pc=0x7ffccfc29ba00x43ca4d sp=
0x7ffccfc29b50 pc=runtime.mcall0x432536(
)
	$HOME/gotip/src/runtime/asm_amd64.sruntime.notesleep:(4480x672e68 +)
0x43	 fp=$HOME/gotip/src/runtime/lock_futex.go0x7fff1ab81110: sp=1600x7fff1ab81100 + pc=0x870x45fb43 fp=
0x7ffccfc29bd8 sp=
0x7ffccfc29ba0goroutine  pc=10x40b627 [
syscall]:
runtime.mPark(...)
	$HOME/gotip/src/runtime/proc.go:1456
runtime.stopm()
	$HOME/gotip/src/runtime/proc.go:syscall.Syscall62238( +0x20x8c? fp=, 0x7ffccfc29c080x2 sp=?0x7ffccfc29bd8,  pc=0x430x43ce0c?
, 0x2f7300c0000c60e0?, 0x43?runtime.findRunnable, (0x3)
?	, $HOME/gotip/src/runtime/proc.go0x3c:?2866)
 +	0x9f2$HOME/gotip/src/syscall/syscall_linux.go fp=:0x7ffccfc29cf889 sp= +0x7ffccfc29c080x36 pc= fp=0x43e4b20xc00008d718
 sp=0xc00008d690runtime.schedule pc=(0x48c796)

	$HOME/gotip/src/runtime/proc.go:3206 +0xbe fp=0x7ffccfc29d30 sp=0x7ffccfc29cf8 pc=0x43f2be
runtime.park_m(0xc000007380?)
	$HOME/gotip/src/runtime/proc.go:3355 +0x14d fp=0x7ffccfc29d60 sp=0x7ffccfc29d30os.(*Process).blockUntilWaitable pc=(0x43f7ed0xc0000cc390
)
	runtime.mcall$HOME/gotip/src/os/wait_waitid.go(:)
32	 +$HOME/gotip/src/runtime/asm_amd64.s0x87: fp=4480xc00008d7f0 + sp=0x430xc00008d718 fp= pc=0x7ffccfc29d700x4a78c7 sp=
0x7ffccfc29d60 pc=os.(*Process).wait0x4666a3(
0xc0000cc390)
	$HOME/gotip/src/os/exec_unix.go:22 +0x28 fp=0xc00008d850 sp=0xc00008d7f0 pc=0x4a2608

goroutine 1 [chan receive]:
runtime.gopark(0xc00009a000?, 0x1?, 0xe0?, 0x71?, 0x421325?)
	$HOME/gotip/src/runtime/proc.goos.(*Process).Wait:(...)
363	 +$HOME/gotip/src/os/exec.go0xd6: fp=1320xc0001069b0
 sp=os/exec.(*Cmd).Wait0xc000106990( pc=0xc0000c84200x439016)

	$HOME/gotip/src/os/exec/exec.go:594 +0x4b fp=runtime.chanrecv0xc00008d8c8( sp=0xc0001a80000xc00008d850,  pc=0x00x4cca8b, 
0x1)
	$HOME/gotip/src/runtime/chan.go:583os/exec.(*Cmd).Run +(0x49b0xc0000b4200 fp=?0xc000106a40)
 sp=	0xc0001069b0$HOME/gotip/src/os/exec/exec.go pc=:0x4066fb432
 +0x39 fp=0xc00008d8e8 sp=0xc00008d8c8 pc=0x4cbc99
runtime.chanrecv1(0xc0001001e0?, 0xc0001a8058?)
	$HOME/gotip/src/runtime/chan.go:442 +0x18 fp=0xc000106a68 sp=0xc000106a40 pc=0x406238
testing.tRunner.func1(main.(*tester).runHostTest)
(	0x53c274$HOME/gotip/src/testing/testing.go?:, 1412{ +0x54163f0x4a5,  fp=0x170xc000106bd0} sp=, 0xc000106a68{ pc=0x53bc800x4d4585, 
0x1}testing.tRunner)
(	0xc0001164e0$HOME/gotip/src/cmd/dist/test.go, :0xc000106cd81184)
 +	0x4de$HOME/gotip/src/testing/testing.go fp=:0xc00008da701452 sp= +0xc00008d8e80x144 pc= fp=0x50a03e0xc000106c20
 sp=0xc000106bd0main.(*tester).registerHostTest.func1 pc=(0x4d40640xc000238000
?)
	testing.runTests$HOME/gotip/src/cmd/dist/test.go(:0xc0001320a01153? +, 0x10d{ fp=0x66d9000xc00008db18,  sp=0x40xc00008da70,  pc=0x40x509a8d}
, {0x7f07a87b9a68?main.(*tester).run, (0x400xc000238000?)
, 	0x0$HOME/gotip/src/cmd/dist/test.go?:}236)
 +	0x1247$HOME/gotip/src/testing/testing.go fp=:0xc00008de401844 sp= +0xc00008db180x456 pc= fp=0x4fee070xc000106d08
 sp=0xc000106c20 pc=main.cmdtest0x4d6bb6(
)
	$HOME/gotip/src/cmd/dist/test.gotesting.(*M).Run:(500xc0001320a0 +)
0x33c	 fp=$HOME/gotip/src/testing/testing.go0xc00008de98: sp=17260xc00008de40 + pc=0x5d90x4fdb9c fp=
0xc000106f00 sp=main.xmain0xc000106d08( pc=)
0x4d5699	
$HOME/gotip/src/cmd/dist/main.go:main.main186( +)
0x111	 fp=_testmain.go0xc00008df00: sp=530xc00008de98 + pc=0x1aa0x4fd731 fp=
0xc000106f80 sp=0xc000106f00main.main pc=(0x52f44a)

	$HOME/gotip/src/cmd/dist/main.goruntime.main:(169)
 +	0xb7e$HOME/gotip/src/runtime/proc.go fp=:0xc00008df80250 sp= +0xc00008df000x212 pc= fp=0x4fd5de0xc000106fe0
 sp=0xc000106f80runtime.main pc=(0x438c52)

	$HOME/gotip/src/runtime/proc.goruntime.goexit:(250)
 +	0x212$HOME/gotip/src/runtime/asm_amd64.s fp=:0xc00008dfe01594 sp= +0xc00008df800x1 pc= fp=0x435eb20xc000106fe8
 sp=0xc000106fe0runtime.goexit pc=(0x4687a1)

	$HOME/gotip/src/runtime/asm_amd64.s
:goroutine 15942 + [0x1force gc (idle) fp=]:
0xc00008dfe8runtime.gopark(0x0 sp=?0xc00008dfe0,  pc=0x00x461c41?
, 
0x0goroutine ?2,  [0x0force gc (idle)?]:
, 0x0runtime.gopark?()
0x0	?$HOME/gotip/src/runtime/proc.go, :0x0363? +, 0xd60x0 fp=?0xc000050fb0,  sp=0x00xc000050f90? pc=, 0x4390160x0
?)
runtime.goparkunlock	(...)
$HOME/gotip/src/runtime/proc.go	:$HOME/gotip/src/runtime/proc.go363: +3690xd6
 fp=runtime.forcegchelper0xc000050fb0( sp=)
0xc000050f90	 pc=$HOME/gotip/src/runtime/proc.go0x436276:
302 +0xad fp=0xc000050fe0 sp=0xc000050fb0 pc=0x438ead
runtime.goexit()
	$HOME/gotip/src/runtime/asm_amd64.s:1594runtime.goparkunlock +(...)
0x1	 fp=$HOME/gotip/src/runtime/proc.go0xc000050fe8: sp=3690xc000050fe0
 pc=0x4687a1runtime.forcegchelper
(created by )
runtime.init.6	
$HOME/gotip/src/runtime/proc.go	:$HOME/gotip/src/runtime/proc.go302: +2900xad + fp=0x250xc000050fe0
 sp=0xc000050fb0
 pc=goroutine 0x43610d17
 [runtime.goexitGC sweep wait(]:
)
	runtime.gopark$HOME/gotip/src/runtime/asm_amd64.s(:0x01594? +, 0x10x0 fp=?0xc000050fe8,  sp=0x00xc000050fe0? pc=, 0x461c410x0
?created by , runtime.init.60x0
?)
		$HOME/gotip/src/runtime/proc.go$HOME/gotip/src/runtime/proc.go::290363 + +0x250xd6
 fp=0xc00004c790
 sp=goroutine 0xc00004c7703 pc= [0x439016GC sweep wait
]:
runtime.goparkunlock(...)
runtime.gopark	($HOME/gotip/src/runtime/proc.go0x0:?369, 
0x0?, 0x0?, 0x0?, runtime.bgsweep0x0(?0x0)
?	)
$HOME/gotip/src/runtime/proc.go	:$HOME/gotip/src/runtime/mgcsweep.go363: +2780xd6 + fp=0x8e0xc000051790 fp= sp=0xc00004c7c80xc000051770 sp= pc=0xc00004c7900x436276 pc=
0x42404e
runtime.goparkunlockruntime.gcenable.func1(...)
(	)
$HOME/gotip/src/runtime/proc.go	:$HOME/gotip/src/runtime/mgc.go369:
178 +runtime.bgsweep0x26( fp=0x00xc00004c7e0? sp=)
0xc00004c7c8	 pc=$HOME/gotip/src/runtime/mgcsweep.go0x4191a6:
278runtime.goexit +(0x8e)
 fp=	0xc0000517c8$HOME/gotip/src/runtime/asm_amd64.s sp=:0xc0000517901594 pc= +0x4233ae0x1
 fp=runtime.gcenable.func10xc00004c7e8( sp=)
0xc00004c7e0	 pc=$HOME/gotip/src/runtime/mgc.go0x4687a1:
178created by  +runtime.gcenable0x26
 fp=	0xc0000517e0$HOME/gotip/src/runtime/mgc.go sp=:0xc0000517c8178 pc= +0x4187060x6b

runtime.goexit
(goroutine )
18	 [$HOME/gotip/src/runtime/asm_amd64.s:1594 +0x1 fp=GC scavenge wait0xc0000517e8]:
 sp=0xc0000517e0runtime.gopark pc=(0x461c410xc000094000
?created by , runtime.gcenable0x59a710
?	, $HOME/gotip/src/runtime/mgc.go0x1:?178,  +0x00x6b?
, 
0x0goroutine ?4)
 [	GC scavenge wait$HOME/gotip/src/runtime/proc.go]:
:363runtime.gopark +(0xd60xc00006e000 fp=?0xc00004cf70,  sp=0x56f3300xc00004cf50? pc=, 0x4390160x1
?, runtime.goparkunlock0x0(...)
?	, $HOME/gotip/src/runtime/proc.go0x0:?369)

	$HOME/gotip/src/runtime/proc.goruntime.(*scavengerState).park:(3630x672560 +)
0xd6	 fp=$HOME/gotip/src/runtime/mgcscavenge.go0xc000051f70: sp=3890xc000051f50 + pc=0x530x436276 fp=
0xc00004cfa0 sp=runtime.goparkunlock0xc00004cf70(...)
 pc=	0x422133$HOME/gotip/src/runtime/proc.go
:369runtime.bgscavenge
(runtime.(*scavengerState).park0x0(?0x62bc60)
)
		$HOME/gotip/src/runtime/mgcscavenge.go$HOME/gotip/src/runtime/mgcscavenge.go::617389 + +0x450x53 fp= fp=0xc00004cfc80xc000051fa0 sp= sp=0xc00004cfa00xc000051f70 pc= pc=0x4227050x421493

runtime.gcenable.func2runtime.bgscavenge(()
0x0	?$HOME/gotip/src/runtime/mgc.go)
:	179$HOME/gotip/src/runtime/mgcscavenge.go +:0x26617 fp= +0xc00004cfe00x45 sp= fp=0xc00004cfc80xc000051fc8 pc= sp=0x4191460xc000051fa0
 pc=runtime.goexit0x421a65(
)
runtime.gcenable.func2	($HOME/gotip/src/runtime/asm_amd64.s)
:	1594$HOME/gotip/src/runtime/mgc.go +:0x1179 fp= +0xc00004cfe80x26 sp= fp=0xc00004cfe00xc000051fe0 pc= sp=0x4687a10xc000051fc8
 pc=created by 0x4186a6runtime.gcenable

runtime.goexit	($HOME/gotip/src/runtime/mgc.go)
:	179$HOME/gotip/src/runtime/asm_amd64.s +:0xaa1594
 +
0x1goroutine  fp=330xc000051fe8 [ sp=finalizer wait0xc000051fe0]:
 pc=0x461c41runtime.gopark
(created by 0x0runtime.gcenable?
, 	0x0$HOME/gotip/src/runtime/mgc.go?:, 1790x0 +?0xaa, 
0x0?
, goroutine 0xc00005077017? [)
finalizer wait	]:
$HOME/gotip/src/runtime/proc.go:runtime.gopark363( +0x00xd6? fp=, 0xc0000506280xc000050670 sp=?0xc000050608,  pc=0xc50x439016?
, runtime.goparkunlock0x5b(...)
?	, $HOME/gotip/src/runtime/proc.go0xc000050770:?369)
	$HOME/gotip/src/runtime/proc.go:
363 +runtime.runfinq0xd6( fp=)
0xc000050628	 sp=$HOME/gotip/src/runtime/mfinal.go0xc000050608: pc=1800x436276 +
0x10f fp=runtime.goparkunlock0xc0000507e0(...)
 sp=	0xc000050628$HOME/gotip/src/runtime/proc.go pc=:0x4182af369

runtime.goexit(runtime.runfinq)
(	)
$HOME/gotip/src/runtime/asm_amd64.s	:$HOME/gotip/src/runtime/mfinal.go1594: +1800x1 + fp=0x10f0xc0000507e8 sp= fp=0xc0000507e00xc0000507e0 pc= sp=0x4687a10xc000050628
 pc=0x41780fcreated by 
runtime.createfingruntime.goexit
(	)
$HOME/gotip/src/runtime/mfinal.go	:$HOME/gotip/src/runtime/asm_amd64.s157: +15940x45 +
0x1 fp=
0xc0000507e8goroutine  sp=60xc0000507e0 [ pc=chan receive0x461c41]:

runtime.goparkcreated by (runtime.createfing0xc00009a0c0
?	, $HOME/gotip/src/runtime/mfinal.go0x0:?157,  +0x400x45?
, 
0x0goroutine ?33,  [0xc0001001d8select?]:
)
	runtime.gopark$HOME/gotip/src/runtime/proc.go(:0xc00004cfa8363? +, 0xd60x2 fp=?0xc00008fd50,  sp=0xaf0xc00008fd30? pc=, 0x4390160xf6
?, runtime.chanrecv0xc00004cf84(?0xc0004204d0)
, 	0x0$HOME/gotip/src/runtime/proc.go, :0x1363)
 +	0xd6$HOME/gotip/src/runtime/chan.go fp=:0xc00004ce08583 sp= +0xc00004cde80x49b pc= fp=0x4362760xc00008fde0
 sp=0xc00008fd50 pc=0x4066fb
runtime.chanrecv1(0xc00002a060runtime.selectgo?(, 0xc00004cfa80x565aea, ?0xc00004cf80)
, 	0x0$HOME/gotip/src/runtime/chan.go?:, 4420x0 +, 0x180xc0000ae040 fp=?0xc00008fe08,  sp=0x10xc00008fde0)
 pc=	0x406238$HOME/gotip/src/runtime/select.go
:testing.tRunner.func1328( +)
0x7bc	 fp=$HOME/gotip/src/testing/testing.go0xc00004cf48: sp=14120xc00004ce08 + pc=0x4a50x4454fc fp=
0xc00008ff70 sp=main.bghelper0xc00008fe08( pc=)
0x4d4585	
$HOME/gotip/src/cmd/dist/util.go:testing.tRunner142( +0xc0000071e00xcb,  fp=0x5717400xc00004cfe0)
 sp=	0xc00004cf48$HOME/gotip/src/testing/testing.go pc=:0x51062b1452
 +runtime.goexit0x144( fp=)
0xc00008ffc0	 sp=$HOME/gotip/src/runtime/asm_amd64.s0xc00008ff70: pc=15940x4d4064 +
0x1 fp=testing.(*T).Run.func10xc00004cfe8( sp=)
0xc00004cfe0	 pc=$HOME/gotip/src/testing/testing.go0x461c41:
1493created by  +main.bginit0x2a
 fp=	0xc00008ffe0$HOME/gotip/src/cmd/dist/util.go sp=:0xc00008ffc0135 pc= +0x4d4eca0x3c

runtime.goexit
(goroutine )
34	 [$HOME/gotip/src/runtime/asm_amd64.sselect:]:
1594runtime.gopark +(0x10xc0002147a8 fp=?0xc00008ffe8,  sp=0x20xc00008ffe0? pc=, 0x4687a10x0
?created by , testing.(*T).Run0x0
?	, $HOME/gotip/src/testing/testing.go0xc000214784:?1493)
 +	0x35f$HOME/gotip/src/runtime/proc.go
:363 +0xd6 fp=0xc000214608 sp=0xc0002145e8 pc=0x436276
runtime.selectgo
(goroutine 0xc0002147a841,  [0xc000214780runnable, ]:
0x0?, 0x0, 0x0?, 0x1)
	syscall.Syscall$HOME/gotip/src/runtime/select.go(:3280x48c265 +?0x7bc,  fp=0x00xc000214748? sp=, 0xc0002146080x48c432 pc=?0x4454fc, 
0x7ffff800000?main.bghelper)
(	)
$HOME/gotip/src/syscall/syscall_linux.go	:$HOME/gotip/src/cmd/dist/util.go67: +1420x27 + fp=0xcb0xc00014e540 fp= sp=0xc0002147e00xc00014e4d0 sp= pc=0xc0002147480x475207 pc=
0x51062b
syscall.read(runtime.goexit0xc00054a3c0(?, )
{	0xc0001cec94$HOME/gotip/src/runtime/asm_amd64.s?:, 15940x1 +?0x1,  fp=0xc00014e6100xc0002147e8? sp=}0xc0002147e0)
 pc=	0x461c41$HOME/gotip/src/syscall/zsyscall_linux_amd64.go
:created by 696main.bginit +
0x45	 fp=$HOME/gotip/src/cmd/dist/util.go0xc00014e580: sp=1350xc00014e540 + pc=0x3c0x4745e5


goroutine 35 [select]:
syscall.Readruntime.gopark(...)
(	0xc000214fa8$HOME/gotip/src/syscall/syscall_unix.go?:, 1830x2
?, internal/poll.ignoringEINTRIO0x0(...)
?	, $HOME/gotip/src/internal/poll/fd_unix.go0x0:?794, 
0xc000214f84internal/poll.(*FD).Read?()
0xc00054a3c0	?$HOME/gotip/src/runtime/proc.go, :{3630xc0001cec94 +?0xd6,  fp=0x36c0xc000214e08? sp=, 0xc000214de80x36c pc=?0x436276}
)
	runtime.selectgo$HOME/gotip/src/internal/poll/fd_unix.go(:0xc000214fa8163,  +0xc000214f800x285,  fp=0x00xc00014e600? sp=, 0xc00014e5800x0 pc=, 0x48cd250x0
?, os.(*File).read0x1(...)
)
		$HOME/gotip/src/os/file_posix.go$HOME/gotip/src/runtime/select.go::31328
 +os.(*File).Read0x7bc( fp=0xc0002302f80xc000214f48,  sp={0xc000214e080xc0001cec94 pc=?0x4454fc, 
0x7f078114ce40?main.bghelper, (0xc00014e6a0)
?	}$HOME/gotip/src/cmd/dist/util.go)
:	142$HOME/gotip/src/os/file.go +:0xcb118 fp= +0xc000214fe00x5e sp= fp=0xc000214f480xc00014e658 pc= sp=0x51062b0xc00014e600
 pc=0x49005eruntime.goexit
()
	$HOME/gotip/src/runtime/asm_amd64.sbytes.(*Buffer).ReadFrom:(15940xc00022ce10 +0x1 fp=0xc000214fe8,  sp={0xc000214fe00x59b320 pc=, 0x461c410xc0002302f8
}created by )
main.bginit	
$HOME/gotip/src/bytes/buffer.go	:$HOME/gotip/src/cmd/dist/util.go202: +1350x98 + fp=0x3c0xc00014e6b0
 sp=
0xc00014e658goroutine  pc=360x4c3c98 [
select]:
runtime.gopark(0xc0002157a8?, io.copyBuffer0x2(?{, 0x59b1400x0, ?0xc00022ce10, }0x0, ?{, 0x59b3200xc000215784, ?0xc0002302f8)
}	, $HOME/gotip/src/runtime/proc.go{:0x0363,  +0x00xd6,  fp=0x00xc000215608} sp=)
0xc0002155e8	 pc=$HOME/gotip/src/io/io.go0x436276:
412 +runtime.selectgo0x14b( fp=0xc0002157a80xc00014e730,  sp=0xc0002157800xc00014e6b0,  pc=0x00x489ccb?
, io.Copy0x0(...)
, 	0x0$HOME/gotip/src/io/io.go?:, 3850x1
)
os/exec.(*Cmd).writerDescriptor.func1	($HOME/gotip/src/runtime/select.go)
:	328$HOME/gotip/src/os/exec/exec.go +:0x7bc402 fp= +0xc0002157480x3a sp= fp=0xc0002156080xc00014e790 pc= sp=0x4454fc0xc00014e730
 pc=0x4df83amain.bghelper
()
os/exec.(*Cmd).Start.func1	($HOME/gotip/src/cmd/dist/util.go0x0:?142)
 +	0xcb$HOME/gotip/src/os/exec/exec.go fp=:0xc0002157e0539 sp= +0xc0002157480x25 pc= fp=0x51062b0xc00014e7c8
 sp=runtime.goexit0xc00014e790( pc=)
0x4e05e5
	os/exec.(*Cmd).Start.func2$HOME/gotip/src/runtime/asm_amd64.s(:)
1594	 +$HOME/gotip/src/os/exec/exec.go0x1: fp=5400xc0002157e8 + sp=0x2a0xc0002157e0 fp= pc=0xc00014e7e00x461c41 sp=
0xc00014e7c8created by  pc=main.bginit0x4e058a

	runtime.goexit$HOME/gotip/src/cmd/dist/util.go(:)
135	 +$HOME/gotip/src/runtime/asm_amd64.s0x3c:
1594
 +rax    0x10xfffffffffffffffc fp=
0xc00014e7e8rbx     sp=0x00xc00014e7e0
 pc=rcx    0x4687a10x463d20

created by rdx    os/exec.(*Cmd).Start0x80

	rdi    $HOME/gotip/src/os/exec/exec.go0x4:
538rsi     +0x7fff1ab809980x73e

rbp    0x7fff1ab80f98
rsp    0x7fff1ab80940
r8     0x0

goroutine r9     930x8 [
syscallr10    ]:
0xffffffff
r11    0x246
r12    0x7fff1ab809c8syscall.Syscall6
(r13    0x18
r14    0x62be80
r15    0x7fa9a2e45d130x1
?rip    , 0x463d200x1
?rflags , 0x2460x0
?cs     , 0x330xc00043ec58
?fs     , 0x00x40ce1f
?gs     , 0x00xc00043ec68
?, 0x30?)
	$HOME/gotip/src/syscall/syscall_linux.go:89 +0x36 fp=0xc00043ebc8 sp=0xc00043eb40 pc=0x475296
os.(*Process).blockUntilWaitable(0xc000134450)
	$HOME/gotip/src/os/wait_waitid.go:32 +0x87 fp=0xc00043eca0 sp=0xc00043ebc8 pc=0x494487
os.(*Process).wait(0xc000134450)
	$HOME/gotip/src/os/exec_unix.go:22 +0x28 fp=0xc00043ed00 sp=0xc00043eca0 pc=0x48fac8
os.(*Process).Wait(...)
	$HOME/gotip/src/os/exec.go:132
os/exec.(*Cmd).Wait(0xc0002678c0)
	$HOME/gotip/src/os/exec/exec.go:594 +0x4b fp=0xc00043ed78 sp=0xc00043ed00 pc=0x4e06ab
os/exec.(*Cmd).Run(0x4d37a0?)
	$HOME/gotip/src/os/exec/exec.go:432 +0x39 fp=0xc00043ed98 sp=0xc00043ed78 pc=0x4df8b9
os/exec.(*Cmd).CombinedOutput(0xc0002678c0)
	$HOME/gotip/src/os/exec/exec.go:702 +0xa8 fp=0xc00043edc0 sp=0xc00043ed98 pc=0x4e0f28
misc/cgo/testsanitizers_test.mustRun(0xc0004364e0, 0xc0002678c0)
	$HOME/gotip/misc/cgo/testsanitizers/cc_test.go:93 +0x4a fp=0xc00043ee58 sp=0xc00043edc0 pc=0x52a0aa
misc/cgo/testsanitizers_test.TestTSAN.func1(0xc0004364e0)
	$HOME/gotip/misc/cgo/testsanitizers/tsan_test.go:53 +0x2b8 fp=0xc00043ef70 sp=0xc00043ee58 pc=0x52ee78
testing.tRunner(0xc0004364e0, 0xc000418300)
	$HOME/gotip/src/testing/testing.go:1446 +0x10b fp=0xc00043efc0 sp=0xc00043ef70 pc=0x4d402b
testing.(*T).Run.func1()
	$HOME/gotip/src/testing/testing.go:1493 +0x2a fp=0xc00043efe0 sp=0xc00043efc0 pc=0x4d4eca
runtime.goexit()
	$HOME/gotip/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc00043efe8 sp=0xc00043efe0 pc=0x4687a1
created by testing.(*T).Run
	$HOME/gotip/src/testing/testing.go:1493 +0x35f

rax    0xca
rbx    0x0
rcx    0x46a6a3
rdx    0x0
rdi    0x672e68
rsi    0x80
rbp    0x7ffccfc29b90
rsp    0x7ffccfc29b48
r8     0x0
r9     0x0
r10    0x0
r11    0x286
r12    0xc000036f00
r13    0x2
r14    0x672780
r15    0xffffffffffffffff
rip    0x46a6a1
rflags 0x286
cs     0x33
fs     0x0
gs     0x0

real	70m15.451s
user	0m3.040s
sys	0m1.621s

attempt 190

[...]

This is the tracking issue for improving the test not to hang indefinitely, if possible.

@dmitshur dmitshur added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label May 19, 2022
@dmitshur dmitshur added this to the Backlog milestone May 19, 2022
@bcmills
Copy link
Contributor

bcmills commented May 19, 2022

Unfortunately in the above trace the test is blocked on running a subprocess in a table-driven test:
https://cs.opensource.google/go/go/+/master:misc/cgo/testsanitizers/tsan_test.go;l=53;drc=cbdf9ade5604917593af3ce5ce576a465edfca83

We probably need to send SIGQUIT to that subprocess to figure out what's going on (#50436), although just having information about which subtest stalled would help somewhat (#39038).

It's odd that the test itself doesn't time out, though.

@ianlancetaylor
Copy link
Contributor

cmd/dist runs the test by first running go test -c and then running the generated test binary. When it runs the test binary it does not pass a -test.timeout option. The default timeout when you run a test binary is to have no timeout. So, there is no timeout.

I will send a CL for that.

@gopherbot
Copy link

Change https://go.dev/cl/407374 mentions this issue: cmd/dist: pass a -test.timeout option to a host test

@dmitshur
Copy link
Contributor Author

Thanks Ian.

For the record, my upcoming coordinator change was going to include a 5 minute timeout on the x/build side via DistTestsExecTimeout:

 // DistTestsExecTimeout returns how long the coordinator should wait
 // for a cmd/dist test execution to run the provided dist test names.
 func (c *BuildConfig) DistTestsExecTimeout(distTests []string) time.Duration {
+	if len(distTests) == 1 && distTests[0] == "testsanitizers" {
+		// This test is known to complete in under a minute normally,
+		// but can sometimes stall indefinitely; see go.dev/issue/52998.
+		// This cap may help a bit until that issue is resolved.
+		return 5 * time.Minute
+	}
+
 	...

Since CL 407374 does the same in the test, that won't be necessary anymore (I had it just in case progress on this issue wasn't going to be as quick).

gopherbot pushed a commit that referenced this issue May 19, 2022
For a host test we build the test using "go test -c" and then run the
test binary. A test binary run in this way has no default timeout.
This CL gives it a timeout of 5 minutes, scaled for the target.
We can adjust the timeout if necessary.

For #52998

Change-Id: Ib759142f3e71cbb37ec858182998fc5d4fba7ab6
Reviewed-on: https://go-review.googlesource.com/c/go/+/407374
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
@gopherbot
Copy link

Change https://go.dev/cl/407954 mentions this issue: misc/cgo/testsanitizers: terminate commands with SIGQUIT if hung

@bcmills
Copy link
Contributor

bcmills commented May 23, 2022

I modified the test itself to use the testing.T deadline (https://go.dev/cl/407954) and ran it with some short timeouts, and came up with two occurrences of this stack trace from TestTSAN/tsan12:

--- FAIL: TestTSAN (0.47s)
    --- FAIL: TestTSAN/tsan12 (2.37s)
        tsan_test.go:53: /tmp/TestTSAN2162129093/tsan12 exited with exit status 2
            SIGQUIT: quit
            PC=0x52bfc3 m=0 sigcode=0

            goroutine 0 [idle]:
            runtime.futex()
                /usr/local/google/home/bcmills/go/src/runtime/sys_linux_amd64.s:560 +0x23 fp=0x7fff9b494540 sp=0x7fff9b494538 pc=0x52bfc3
            runtime.asmcgocall(0x80, 0x7fff9b494570)
                /usr/local/google/home/bcmills/go/src/runtime/asm_amd64.s:874 +0xa1 fp=0x7fff9b494548 sp=0x7fff9b494540 pc=0x529e21

            goroutine 1 [chan receive]:
            runtime.gopark(0x55c0001821c4?, 0x55c0001a6000?, 0xe0?, 0xee?, 0x56660b?)
                /usr/local/google/home/bcmills/go/src/runtime/proc.go:363 +0xd6 fp=0x55c00005ee58 sp=0x55c00005ee38 pc=0x4fefd6
            runtime.chanrecv(0x55c00008e0c0, 0x0, 0x1)
                /usr/local/google/home/bcmills/go/src/runtime/chan.go:583 +0x49b fp=0x55c00005eee8 sp=0x55c00005ee58 pc=0x4cfefb
            runtime.chanrecv1(0x27?, 0x0?)
                /usr/local/google/home/bcmills/go/src/runtime/chan.go:442 +0x18 fp=0x55c00005ef10 sp=0x55c00005eee8 pc=0x4cfa38
            main.main()
                /usr/local/google/home/bcmills/go/misc/cgo/testsanitizers/testdata/tsan12.go:34 +0x113 fp=0x55c00005ef80 sp=0x55c00005ef10 pc=0x568353
            runtime.main()
                /usr/local/google/home/bcmills/go/src/runtime/proc.go:250 +0x212 fp=0x55c00005efe0 sp=0x55c00005ef80 pc=0x4fec12
            runtime.goexit()
                /usr/local/google/home/bcmills/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0x55c00005efe8 sp=0x55c00005efe0 pc=0x52a121

            goroutine 2 [force gc (idle)]:
            runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
                /usr/local/google/home/bcmills/go/src/runtime/proc.go:363 +0xd6 fp=0x55c000050fb0 sp=0x55c000050f90 pc=0x4fefd6
            runtime.goparkunlock(...)
                /usr/local/google/home/bcmills/go/src/runtime/proc.go:369
            runtime.forcegchelper()
                /usr/local/google/home/bcmills/go/src/runtime/proc.go:302 +0xad fp=0x55c000050fe0 sp=0x55c000050fb0 pc=0x4fee6d
            runtime.goexit()
                /usr/local/google/home/bcmills/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0x55c000050fe8 sp=0x55c000050fe0 pc=0x52a121
            created by runtime.init.6
                /usr/local/google/home/bcmills/go/src/runtime/proc.go:290 +0x25

            goroutine 3 [GC sweep wait]:
            runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
                /usr/local/google/home/bcmills/go/src/runtime/proc.go:363 +0xd6 fp=0x55c000051790 sp=0x55c000051770 pc=0x4fefd6
            runtime.goparkunlock(...)
                /usr/local/google/home/bcmills/go/src/runtime/proc.go:369
            runtime.bgsweep(0x0?)
                /usr/local/google/home/bcmills/go/src/runtime/mgcsweep.go:278 +0x8e fp=0x55c0000517c8 sp=0x55c000051790 pc=0x4ec14e
            runtime.gcenable.func1()
                /usr/local/google/home/bcmills/go/src/runtime/mgc.go:178 +0x26 fp=0x55c0000517e0 sp=0x55c0000517c8 pc=0x4e14a6
            runtime.goexit()
                /usr/local/google/home/bcmills/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0x55c0000517e8 sp=0x55c0000517e0 pc=0x52a121
            created by runtime.gcenable
                /usr/local/google/home/bcmills/go/src/runtime/mgc.go:178 +0x6b

            goroutine 4 [GC scavenge wait]:
            runtime.gopark(0x55c000100000?, 0x5b8778?, 0x1?, 0x0?, 0x0?)
                /usr/local/google/home/bcmills/go/src/runtime/proc.go:363 +0xd6 fp=0x55c000051f70 sp=0x55c000051f50 pc=0x4fefd6
            runtime.goparkunlock(...)
                /usr/local/google/home/bcmills/go/src/runtime/proc.go:369
            runtime.(*scavengerState).park(0x1091e60)
                /usr/local/google/home/bcmills/go/src/runtime/mgcscavenge.go:389 +0x53 fp=0x55c000051fa0 sp=0x55c000051f70 pc=0x4ea233
            runtime.bgscavenge(0x0?)
                /usr/local/google/home/bcmills/go/src/runtime/mgcscavenge.go:617 +0x45 fp=0x55c000051fc8 sp=0x55c000051fa0 pc=0x4ea805
            runtime.gcenable.func2()
                /usr/local/google/home/bcmills/go/src/runtime/mgc.go:179 +0x26 fp=0x55c000051fe0 sp=0x55c000051fc8 pc=0x4e1446
            runtime.goexit()
                /usr/local/google/home/bcmills/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0x55c000051fe8 sp=0x55c000051fe0 pc=0x52a121
            created by runtime.gcenable
                /usr/local/google/home/bcmills/go/src/runtime/mgc.go:179 +0xaa

            goroutine 18 [finalizer wait]:
            runtime.gopark(0x1092540?, 0x55c0000824e0?, 0x0?, 0x0?, 0x55c000050770?)
                /usr/local/google/home/bcmills/go/src/runtime/proc.go:363 +0xd6 fp=0x55c000050628 sp=0x55c000050608 pc=0x4fefd6
            runtime.goparkunlock(...)
                /usr/local/google/home/bcmills/go/src/runtime/proc.go:369
            runtime.runfinq()
                /usr/local/google/home/bcmills/go/src/runtime/mfinal.go:180 +0x10f fp=0x55c0000507e0 sp=0x55c000050628 pc=0x4e05af
            runtime.goexit()
                /usr/local/google/home/bcmills/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0x55c0000507e8 sp=0x55c0000507e0 pc=0x52a121
            created by runtime.createfing
                /usr/local/google/home/bcmills/go/src/runtime/mfinal.go:157 +0x45

            goroutine 19 [select, locked to thread]:
            runtime.gopark(0x55c00004c7a8?, 0x2?, 0x57?, 0xf3?, 0x55c00004c7a4?)
                /usr/local/google/home/bcmills/go/src/runtime/proc.go:363 +0xd6 fp=0x55c00004c618 sp=0x55c00004c5f8 pc=0x4fefd6
            runtime.selectgo(0x55c00004c7a8, 0x55c00004c7a0, 0x0?, 0x0, 0x0?, 0x1)
                /usr/local/google/home/bcmills/go/src/runtime/select.go:328 +0x7bc fp=0x55c00004c758 sp=0x55c00004c618 pc=0x50e2dc
            runtime.ensureSigM.func1()
                /usr/local/google/home/bcmills/go/src/runtime/signal_unix.go:979 +0x1b0 fp=0x55c00004c7e0 sp=0x55c00004c758 pc=0x512350
            runtime.goexit()
                /usr/local/google/home/bcmills/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0x55c00004c7e8 sp=0x55c00004c7e0 pc=0x52a121
            created by runtime.ensureSigM
                /usr/local/google/home/bcmills/go/src/runtime/signal_unix.go:962 +0xbd

            goroutine 5 [syscall]:
            runtime.notetsleepg(0xffffffffffffffff?, 0x55c000052728?)
                /usr/local/google/home/bcmills/go/src/runtime/lock_futex.go:236 +0x34 fp=0x55c0000527a0 sp=0x55c000052768 pc=0x4d4e34
            os/signal.signal_recv()
                /usr/local/google/home/bcmills/go/src/runtime/sigqueue.go:152 +0x2f fp=0x55c0000527c0 sp=0x55c0000527a0 pc=0x526dcf
            os/signal.loop()
                /usr/local/google/home/bcmills/go/src/os/signal/signal_unix.go:23 +0x19 fp=0x55c0000527e0 sp=0x55c0000527c0 pc=0x5681b9
            runtime.goexit()
                /usr/local/google/home/bcmills/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0x55c0000527e8 sp=0x55c0000527e0 pc=0x52a121
            created by os/signal.Notify.func1.1
                /usr/local/google/home/bcmills/go/src/os/signal/signal.go:151 +0x2a

            rax    0xfffffffffffffffc
            rbx    0x0
            rcx    0x52bfc3
            rdx    0x0
            rdi    0x1092688
            rsi    0x80
            rbp    0x7fff9b494580
            rsp    0x7fff9b494538
            r8     0x0
            r9     0x0
            r10    0x7fff9b494570
            r11    0x202
            r12    0x505660
            r13    0x58
            r14    0x1091fa0
            r15    0x7f85b45f450f
            rip    0x52bfc3
            rflags 0x202
            cs     0x33
            fs     0x0
            gs     0x0
FAIL
FAIL    misc/cgo/testsanitizers 2.853s
FAIL

That test program is the regression test for https://go.dev/cl/50251. It calls signal.Notify with SIGUSR1, then execs the true command in a subprocess, then sends itself SIGUSR1 and waits for the resulting signal to be delivered.

I don't see any runnable goroutines in the stack dump, so this smells like a runtime deadlock to me (attn @golang/runtime).

@cherrymui
Copy link
Member

cherrymui commented May 23, 2022

TSAN intercepts signals and invokes the signal handler from TSAN at some point(?). Maybe that never happens? Not sure if we actively trigger TSAN to deliver the signal in this case.

@bcmills
Copy link
Contributor

bcmills commented May 23, 2022

Generally in TSAN mode we call cgo_yield from the runtime to trigger signal delivery. (But maybe that's the runtime.asmcgocall in this trace?)

@cherrymui
Copy link
Member

But maybe that's the runtime.asmcgocall in this trace?

Possible.

            runtime.futex()
                /usr/local/google/home/bcmills/go/src/runtime/sys_linux_amd64.s:560 +0x23 fp=0x7fff9b494540 sp=0x7fff9b494538 pc=0x52bfc3
            runtime.asmcgocall(0x80, 0x7fff9b494570)
                /usr/local/google/home/bcmills/go/src/runtime/asm_amd64.s:874 +0xa1 fp=0x7fff9b494548 sp=0x7fff9b494540 pc=0x529e21

This stack is weird. I don't think runtime.futex is ever called from asmcgocall.

gopherbot pushed a commit that referenced this issue May 23, 2022
If the test hangs due to a deadlock in a subprocess, we want a
goroutine dump of that process to figure out the nature of the
deadlock. SIGQUIT causes the Go runtime to produce exactly
such a dump (unless the runtime itself is badly deadlocked).

For #52998.

Change-Id: Id9b3ba89d8f705e14f6cd789353fc2b7f4774ad3
Reviewed-on: https://go-review.googlesource.com/c/go/+/407954
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Bryan Mills <bcmills@google.com>
Auto-Submit: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
@bcmills
Copy link
Contributor

bcmills commented May 23, 2022

I upped the timeout and ran more iterations of tsan12, and caught another failure. Same stack trace, much longer hang. This looks like our culprit test, although the failure rate is low enough that I'm not sure it's feasible to git bisect.

--- FAIL: TestTSAN (0.49s)
    --- FAIL: TestTSAN/tsan12 (53.64s)
        tsan_test.go:53: /tmp/TestTSAN3009813241/tsan12 exited with exit status 2
            SIGQUIT: quit
            PC=0x52bfc3 m=0 sigcode=0

            goroutine 0 [idle]:
            runtime.futex()
                /usr/local/google/home/bcmills/go/src/runtime/sys_linux_amd64.s:560 +0x23 fp=0x7ffc7746c660 sp=0x7ffc7746c658 pc=0x52bfc3
            runtime.asmcgocall(0x80, 0x7ffc7746c690)
                /usr/local/google/home/bcmills/go/src/runtime/asm_amd64.s:874 +0xa1 fp=0x7ffc7746c668 sp=0x7ffc7746c660 pc=0x529e21

            goroutine 1 [chan receive]:
            runtime.gopark(0x55c0000203d4?, 0x55c00019e000?, 0xe0?, 0x3e?, 0x56660b?)
                /usr/local/google/home/bcmills/go/src/runtime/proc.go:363 +0xd6 fp=0x55c000063e58 sp=0x55c000063e38 pc=0x4fefd6
            runtime.chanrecv(0x55c00002a0c0, 0x0, 0x1)
                /usr/local/google/home/bcmills/go/src/runtime/chan.go:583 +0x49b fp=0x55c000063ee8 sp=0x55c000063e58 pc=0x4cfefb
            runtime.chanrecv1(0x27?, 0x0?)
                /usr/local/google/home/bcmills/go/src/runtime/chan.go:442 +0x18 fp=0x55c000063f10 sp=0x55c000063ee8 pc=0x4cfa38
            main.main()
                /usr/local/google/home/bcmills/go/misc/cgo/testsanitizers/testdata/tsan12.go:34 +0x113 fp=0x55c000063f80 sp=0x55c000063f10 pc=0x568353
            runtime.main()
                /usr/local/google/home/bcmills/go/src/runtime/proc.go:250 +0x212 fp=0x55c000063fe0 sp=0x55c000063f80 pc=0x4fec12
            runtime.goexit()
                /usr/local/google/home/bcmills/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0x55c000063fe8 sp=0x55c000063fe0 pc=0x52a121

            goroutine 2 [force gc (idle)]:
            runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
                /usr/local/google/home/bcmills/go/src/runtime/proc.go:363 +0xd6 fp=0x55c000050fb0 sp=0x55c000050f90 pc=0x4fefd6
            runtime.goparkunlock(...)
                /usr/local/google/home/bcmills/go/src/runtime/proc.go:369
            runtime.forcegchelper()
                /usr/local/google/home/bcmills/go/src/runtime/proc.go:302 +0xad fp=0x55c000050fe0 sp=0x55c000050fb0 pc=0x4fee6d
            runtime.goexit()
                /usr/local/google/home/bcmills/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0x55c000050fe8 sp=0x55c000050fe0 pc=0x52a121
            created by runtime.init.6
                /usr/local/google/home/bcmills/go/src/runtime/proc.go:290 +0x25

            goroutine 18 [GC sweep wait]:
            runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
                /usr/local/google/home/bcmills/go/src/runtime/proc.go:363 +0xd6 fp=0x55c00004c790 sp=0x55c00004c770 pc=0x4fefd6
            runtime.goparkunlock(...)
                /usr/local/google/home/bcmills/go/src/runtime/proc.go:369
            runtime.bgsweep(0x0?)
                /usr/local/google/home/bcmills/go/src/runtime/mgcsweep.go:278 +0x8e fp=0x55c00004c7c8 sp=0x55c00004c790 pc=0x4ec14e
            runtime.gcenable.func1()
                /usr/local/google/home/bcmills/go/src/runtime/mgc.go:178 +0x26 fp=0x55c00004c7e0 sp=0x55c00004c7c8 pc=0x4e14a6
            runtime.goexit()
                /usr/local/google/home/bcmills/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0x55c00004c7e8 sp=0x55c00004c7e0 pc=0x52a121
            created by runtime.gcenable
                /usr/local/google/home/bcmills/go/src/runtime/mgc.go:178 +0x6b

            goroutine 19 [GC scavenge wait]:
            runtime.gopark(0x55c00008e000?, 0x5b8778?, 0x1?, 0x0?, 0x0?)
                /usr/local/google/home/bcmills/go/src/runtime/proc.go:363 +0xd6 fp=0x55c00004cf70 sp=0x55c00004cf50 pc=0x4fefd6
            runtime.goparkunlock(...)
                /usr/local/google/home/bcmills/go/src/runtime/proc.go:369
            runtime.(*scavengerState).park(0x1091e60)
                /usr/local/google/home/bcmills/go/src/runtime/mgcscavenge.go:389 +0x53 fp=0x55c00004cfa0 sp=0x55c00004cf70 pc=0x4ea233
            runtime.bgscavenge(0x0?)
                /usr/local/google/home/bcmills/go/src/runtime/mgcscavenge.go:617 +0x45 fp=0x55c00004cfc8 sp=0x55c00004cfa0 pc=0x4ea805
            runtime.gcenable.func2()
                /usr/local/google/home/bcmills/go/src/runtime/mgc.go:179 +0x26 fp=0x55c00004cfe0 sp=0x55c00004cfc8 pc=0x4e1446
            runtime.goexit()
                /usr/local/google/home/bcmills/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0x55c00004cfe8 sp=0x55c00004cfe0 pc=0x52a121
            created by runtime.gcenable
                /usr/local/google/home/bcmills/go/src/runtime/mgc.go:179 +0xaa

            goroutine 3 [finalizer wait]:
            runtime.gopark(0x1092540?, 0x55c0001041a0?, 0x0?, 0x0?, 0x55c000050770?)
                /usr/local/google/home/bcmills/go/src/runtime/proc.go:363 +0xd6 fp=0x55c000050628 sp=0x55c000050608 pc=0x4fefd6
            runtime.goparkunlock(...)
                /usr/local/google/home/bcmills/go/src/runtime/proc.go:369
            runtime.runfinq()
                /usr/local/google/home/bcmills/go/src/runtime/mfinal.go:180 +0x10f fp=0x55c0000507e0 sp=0x55c000050628 pc=0x4e05af
            runtime.goexit()
                /usr/local/google/home/bcmills/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0x55c0000507e8 sp=0x55c0000507e0 pc=0x52a121
            created by runtime.createfing
                /usr/local/google/home/bcmills/go/src/runtime/mfinal.go:157 +0x45

            goroutine 4 [select, locked to thread]:
            runtime.gopark(0x55c0000517a8?, 0x2?, 0x57?, 0xf3?, 0x55c0000517a4?)
                /usr/local/google/home/bcmills/go/src/runtime/proc.go:363 +0xd6 fp=0x55c000051618 sp=0x55c0000515f8 pc=0x4fefd6
            runtime.selectgo(0x55c0000517a8, 0x55c0000517a0, 0x0?, 0x0, 0x0?, 0x1)
                /usr/local/google/home/bcmills/go/src/runtime/select.go:328 +0x7bc fp=0x55c000051758 sp=0x55c000051618 pc=0x50e2dc
            runtime.ensureSigM.func1()
                /usr/local/google/home/bcmills/go/src/runtime/signal_unix.go:979 +0x1b0 fp=0x55c0000517e0 sp=0x55c000051758 pc=0x512350
            runtime.goexit()
                /usr/local/google/home/bcmills/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0x55c0000517e8 sp=0x55c0000517e0 pc=0x52a121
            created by runtime.ensureSigM
                /usr/local/google/home/bcmills/go/src/runtime/signal_unix.go:962 +0xbd

            goroutine 5 [syscall]:
            runtime.notetsleepg(0xffffffffffffffff?, 0x55c000051f28?)
                /usr/local/google/home/bcmills/go/src/runtime/lock_futex.go:236 +0x34 fp=0x55c000051fa0 sp=0x55c000051f68 pc=0x4d4e34
            os/signal.signal_recv()
                /usr/local/google/home/bcmills/go/src/runtime/sigqueue.go:152 +0x2f fp=0x55c000051fc0 sp=0x55c000051fa0 pc=0x526dcf
            os/signal.loop()
                /usr/local/google/home/bcmills/go/src/os/signal/signal_unix.go:23 +0x19 fp=0x55c000051fe0 sp=0x55c000051fc0 pc=0x5681b9
            runtime.goexit()
                /usr/local/google/home/bcmills/go/src/runtime/asm_amd64.s:1594 +0x1 fp=0x55c000051fe8 sp=0x55c000051fe0 pc=0x52a121
            created by os/signal.Notify.func1.1
                /usr/local/google/home/bcmills/go/src/os/signal/signal.go:151 +0x2a

            rax    0xfffffffffffffffc
            rbx    0x0
            rcx    0x52bfc3
            rdx    0x0
            rdi    0x1092688
            rsi    0x80
            rbp    0x7ffc7746c6a0
            rsp    0x7ffc7746c658
            r8     0x0
            r9     0x0
            r10    0x7ffc7746c690
            r11    0x202
            r12    0x505660
            r13    0x58
            r14    0x1091fa0
            r15    0x7f16b7ff3a06
            rip    0x52bfc3
            rflags 0x202
            cs     0x33
            fs     0x0
            gs     0x0
FAIL
FAIL    misc/cgo/testsanitizers 54.147s
FAIL

@bcmills bcmills changed the title misc/cgo/testsanitizers: occasionally hangs indefinitely misc/cgo/testsanitizers: occasional hangs in TestTSAN/tsan12 May 23, 2022
@bcmills
Copy link
Contributor

bcmills commented May 23, 2022

This failure reproduces fairly readily (well within a day's CPU time) on linux/amd64 (which is a first class port), and the failure rate without the retry-masking from #36629 is fairly high, so I think this is a release-blocker for Go 1.19.

However, given that we have evidence that it may have existed for a long time (masked by #42699), I think it would be ok to unblock the release by adding a skip to the tsan12 subtest and moving this issue to either the Go 1.20 milestone or the backlog. (I will leave that up to @golang/runtime to decide.)

greplogs -l -e 'panic: test timed out (?:.*\n)*misc/cgo/testsanitizers'
2022-05-20T23:05:38-c3470ca/linux-amd64-wsl
2022-05-20T21:32:57-7ec6ef4/linux-amd64-nounified

@bcmills bcmills modified the milestones: Backlog, Go1.19 May 23, 2022
@cherrymui
Copy link
Member

I'll look into it.

@gopherbot
Copy link

Change https://go.dev/cl/407888 mentions this issue: misc/cgo/testsanitizers: use buffered channel in tsan12.go

@bcmills
Copy link
Contributor

bcmills commented May 24, 2022

@gopherbot, please backport to Go 1.18 and 1.17. This is a test-only fix for a test that occasionally deadlocks, which can cause TryBot stalls and spurious failures on release branches.

@gopherbot
Copy link

Backport issue(s) opened: #53042 (for 1.17), #53043 (for 1.18).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.

@gopherbot
Copy link

Change https://go.dev/cl/408114 mentions this issue: [release-branch.go1.18] misc/cgo/testsanitizers: use buffered channel in tsan12.go

@gopherbot
Copy link

Change https://go.dev/cl/408115 mentions this issue: [release-branch.go1.17] misc/cgo/testsanitizers: use buffered channel in tsan12.go

@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. Testing An issue that has been verified to require only test changes, not just a test failure. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels May 24, 2022
gopherbot pushed a commit that referenced this issue May 25, 2022
… in tsan12.go

os/signal.Notify requires that "the caller must ensure that c has
sufficient buffer space to keep up with the expected signal rate"
as it does a nonblocking send when it receives a signal. The test
currently using a unbuffered channel, which means it may miss the
signal if the signal arrives before the channel receive operation.

Fixes #53043.
Updates #52998.

Change-Id: Icdcab9396d735506480ef880fb45a4669fa7cc8f
Reviewed-on: https://go-review.googlesource.com/c/go/+/407888
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
(cherry picked from commit 62e1302)
Reviewed-on: https://go-review.googlesource.com/c/go/+/408114
gopherbot pushed a commit that referenced this issue May 25, 2022
… in tsan12.go

os/signal.Notify requires that "the caller must ensure that c has
sufficient buffer space to keep up with the expected signal rate"
as it does a nonblocking send when it receives a signal. The test
currently using a unbuffered channel, which means it may miss the
signal if the signal arrives before the channel receive operation.

Fixes #53042.
Updates #52998.

Change-Id: Icdcab9396d735506480ef880fb45a4669fa7cc8f
Reviewed-on: https://go-review.googlesource.com/c/go/+/407888
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
(cherry picked from commit 62e1302)
Reviewed-on: https://go-review.googlesource.com/c/go/+/408115
@golang golang locked and limited conversation to collaborators May 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. release-blocker Testing An issue that has been verified to require only test changes, not just a test failure.
Projects
None yet
Development

No branches or pull requests

5 participants