Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

time: test timeout in TestConcurrentTimerReset{,Stop} on aix-ppc64 builder #37894

Closed
bcmills opened this issue Mar 17, 2020 · 8 comments
Closed
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker
Milestone

Comments

@bcmills
Copy link
Contributor

bcmills commented Mar 17, 2020

2020-03-17T00:45:15-2f54081/aix-ppc64

SIGQUIT: quit
PC=0x9000000005a2d7c m=7 sigcode=0

goroutine 0 [idle]:
runtime.osyield()
	/ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/src/runtime/os2_aix.go:619 +0x44 fp=0xa0001000005e708 sp=0xa0001000005e6c8 pc=0x1000355c4
runtime.deltimer(0xa000100000aa148, 0x100269d00)
	/ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/src/runtime/time.go:340 +0x254 fp=0xa0001000005e730 sp=0xa0001000005e708 pc=0x10005be34
time.stopTimer(0xa000100000aa148, 0x100000000000000)
	/ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/src/runtime/time.go:214 +0x2c fp=0xa0001000005e760 sp=0xa0001000005e730 pc=0x10005b71c
time.(*Timer).Stop(...)
	/ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/src/time/sleep.go:79
time_test.TestConcurrentTimerResetStop.func2(0xa00010000400080, 0xa000100000aa140, 0x2)
	/ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/src/time/time_test.go:1439 +0x74 fp=0xa0001000005e7a8 sp=0xa0001000005e760 pc=0x1001b47c4
runtime.goexit()
	/ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/src/runtime/asm_ppc64x.s:884 +0x4 fp=0xa0001000005e7a8 sp=0xa0001000005e7a8 pc=0x10006dbc4
created by time_test.TestConcurrentTimerResetStop
	/ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/src/time/time_test.go:1437 +0xc8

goroutine 197 [running]:
runtime.asmcgocall(0x110001280, 0xa0001000005e690)
	/ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/src/runtime/asm_ppc64x.s:594 +0x38 fp=0xa0001000005e650 sp=0xa0001000005e630 pc=0x10006da78
runtime.syscall0(0x11002f308, 0x0, 0x0)
	/ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/src/runtime/os2_aix.go:187 +0x8c fp=0xa0001000005e6c8 sp=0xa0001000005e650 pc=0x1000348dc
runtime.osyield()
	/ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/src/runtime/os2_aix.go:619 +0x44 fp=0xa0001000005e708 sp=0xa0001000005e6c8 pc=0x1000355c4
runtime.deltimer(0xa000100000aa148, 0x100269d00)
	/ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/src/runtime/time.go:340 +0x254 fp=0xa0001000005e730 sp=0xa0001000005e708 pc=0x10005be34
time.stopTimer(0xa000100000aa148, 0x100000000000000)
	/ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/src/runtime/time.go:214 +0x2c fp=0xa0001000005e760 sp=0xa0001000005e730 pc=0x10005b71c
time.(*Timer).Stop(...)
	/ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/src/time/sleep.go:79
time_test.TestConcurrentTimerResetStop.func2(0xa00010000400080, 0xa000100000aa140, 0x2)
	/ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/src/time/time_test.go:1439 +0x74 fp=0xa0001000005e7a8 sp=0xa0001000005e760 pc=0x1001b47c4
runtime.goexit()
	/ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/src/runtime/asm_ppc64x.s:884 +0x4 fp=0xa0001000005e7a8 sp=0xa0001000005e7a8 pc=0x10006dbc4
created by time_test.TestConcurrentTimerResetStop
	/ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/src/time/time_test.go:1437 +0xc8

CC @ianlancetaylor @cherrymui

@bcmills
Copy link
Contributor Author

bcmills commented Mar 17, 2020

It's not clear to me whether this indicates a bug in the timer code, a bug in the test, or a bug in the AIX kernel, and whether it applies only to AIX or just happened to be caught there first.

The test is new in Go 1.15 (CL 221077), so marking as release-blocker until we better understand the cause.

@bcmills bcmills added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker labels Mar 17, 2020
@bcmills bcmills added this to the Go1.15 milestone Mar 17, 2020
@bcmills bcmills changed the title time: test timeout in TestConcurrentTimerResetStop time: test timeout in TestConcurrentTimerResetStop on aix-ppc64 builder Mar 17, 2020
@bcmills
Copy link
Contributor Author

bcmills commented Mar 17, 2020

CC @trex58 @Helflym @mknyszek

@Helflym
Copy link
Contributor

Helflym commented Mar 17, 2020

First, TestConcurrentTimerReset is also failing.

SIGQUIT: quit
PC=0x900000000532d3c m=0 sigcode=8

goroutine 0 [idle]:
runtime.osyield()
        /opt/freeware/src/packages/BUILD/go-root/src/runtime/os2_aix.go:619 +0x44 fp=0xa000100003bee78 sp=0xa000100003bee38 pc=0x1000355c4
runtime.modtimer(0xa00010000164058, 0x15fd33f628503228, 0x0, 0x100230170, 0x1001d4100, 0xa0001000010fb00, 0x0)
        /opt/freeware/src/packages/BUILD/go-root/src/runtime/time.go:433 +0x35c fp=0xa000100003beeb0 sp=0xa000100003bee78 pc=0x10005c59c
runtime.resettimer(...)
        /opt/freeware/src/packages/BUILD/go-root/src/runtime/time.go:494
time.resetTimer(0xa00010000164058, 0x15fd33f628503228)
        /opt/freeware/src/packages/BUILD/go-root/src/runtime/time.go:223 +0x68 fp=0xa000100003bef08 sp=0xa000100003beeb0 pc=0x10005b7a8
time.(*Timer).Reset(0xa00010000164050, 0x34630b8a948, 0x100000000000000)
        /opt/freeware/src/packages/BUILD/go-root/src/time/sleep.go:127 +0x80 fp=0xa000100003bef50 sp=0xa000100003bef08 pc=0x100089820
time_test.TestConcurrentTimerReset.func1(0xa00010000114060, 0xa00010000164050, 0x4)
        /opt/freeware/src/packages/BUILD/go-root/src/time/time_test.go:1416 +0x7c fp=0xa000100003befa8 sp=0xa000100003bef50 pc=0x1001b464c
runtime.goexit()
        /opt/freeware/src/packages/BUILD/go-root/src/runtime/asm_ppc64x.s:884 +0x4 fp=0xa000100003befa8 sp=0xa000100003befa8 pc=0x10006dbc4
created by time_test.TestConcurrentTimerReset
        /opt/freeware/src/packages/BUILD/go-root/src/time/time_test.go:1413 +0x98

goroutine 180 [running]:
runtime.asmcgocall(0x110000d80, 0xa000100003bee00)
        /opt/freeware/src/packages/BUILD/go-root/src/runtime/asm_ppc64x.s:594 +0x38 fp=0xa000100003bedc0 sp=0xa000100003beda0 pc=0x10006da78
runtime.syscall0(0x11002ee08, 0x0, 0x0)
        /opt/freeware/src/packages/BUILD/go-root/src/runtime/os2_aix.go:187 +0x8c fp=0xa000100003bee38 sp=0xa000100003bedc0 pc=0x1000348dc
runtime.osyield()
        /opt/freeware/src/packages/BUILD/go-root/src/runtime/os2_aix.go:619 +0x44 fp=0xa000100003bee78 sp=0xa000100003bee38 pc=0x1000355c4
runtime.modtimer(0xa00010000164058, 0x15fd33f628503228, 0x0, 0x100230170, 0x1001d4100, 0xa0001000010fb00, 0x0)
        /opt/freeware/src/packages/BUILD/go-root/src/runtime/time.go:433 +0x35c fp=0xa000100003beeb0 sp=0xa000100003bee78 pc=0x10005c59c
runtime.resettimer(...)
        /opt/freeware/src/packages/BUILD/go-root/src/runtime/time.go:494
time.resetTimer(0xa00010000164058, 0x15fd33f628503228)
        /opt/freeware/src/packages/BUILD/go-root/src/runtime/time.go:223 +0x68 fp=0xa000100003bef08 sp=0xa000100003beeb0 pc=0x10005b7a8
time.(*Timer).Reset(0xa00010000164050, 0x34630b8a948, 0x100000000000000)
        /opt/freeware/src/packages/BUILD/go-root/src/time/sleep.go:127 +0x80 fp=0xa000100003bef50 sp=0xa000100003bef08 pc=0x100089820
time_test.TestConcurrentTimerReset.func1(0xa00010000114060, 0xa00010000164050, 0x4)
        /opt/freeware/src/packages/BUILD/go-root/src/time/time_test.go:1416 +0x7c fp=0xa000100003befa8 sp=0xa000100003bef50 pc=0x1001b464c
runtime.goexit()
        /opt/freeware/src/packages/BUILD/go-root/src/runtime/asm_ppc64x.s:884 +0x4 fp=0xa000100003befa8 sp=0xa000100003befa8 pc=0x10006dbc4
created by time_test.TestConcurrentTimerReset
        /opt/freeware/src/packages/BUILD/go-root/src/time/time_test.go:1413 +0x98

goroutine 181 [runnable]:
runtime.resettimer(...)
        /opt/freeware/src/packages/BUILD/go-root/src/runtime/time.go:494
time.resetTimer(0xa00010000164058, 0x15fd33f62852e800)
        /opt/freeware/src/packages/BUILD/go-root/src/runtime/time.go:223 +0x68
time.(*Timer).Reset(0xa00010000164050, 0x34630b8a000, 0x0)
        /opt/freeware/src/packages/BUILD/go-root/src/time/sleep.go:127 +0x80
time_test.TestConcurrentTimerReset.func1(0xa00010000114060, 0xa00010000164050, 0x5)
        /opt/freeware/src/packages/BUILD/go-root/src/time/time_test.go:1416 +0x7c
created by time_test.TestConcurrentTimerReset
        /opt/freeware/src/packages/BUILD/go-root/src/time/time_test.go:1413 +0x98

Secondly, I've managed to reproduce it when launching the whole time testsuite, but not when I'm launching only one of TestConcurrentTimerReset tests.

Finally, it has failed on AIX 7.2TL0 and 7.2TL2, but not yet on 7.2TL3, so it might indeed be a kernel bug.

@bcmills bcmills changed the title time: test timeout in TestConcurrentTimerResetStop on aix-ppc64 builder time: test timeout in TestConcurrentTimerReset{,Stop} on aix-ppc64 builder Mar 24, 2020
@bcmills
Copy link
Contributor Author

bcmills commented Mar 24, 2020

A TestConccurrentTimerReset in the builder:
2020-03-23T21:15:27-fb2a634/aix-ppc64

@gopherbot
Copy link

Change https://golang.org/cl/225497 mentions this issue: runtime: prevent preemption while timer is in timerModifying

gopherbot pushed a commit that referenced this issue Mar 25, 2020
Currently if a goroutine is preempted while owning a timer in the
timerModifying state, it could self-deadlock. When the goroutine is
preempted and calls into the scheduler, it could call checkTimers. If
checkTimers encounters the timerModifying timer and calls runtimer on
it, then runtimer will spin, waiting for that timer to leave the
timerModifying state, which it never will.

So far we got lucky that for the most part that there were no preemption
points while timerModifying is happening, however CL 221077 seems to
have introduced one, leading to sporadic self-deadlocks.

This change disables preemption explicitly while a goroutines holds a
timer in timerModifying. Since only checkTimers (and thus runtimer) is
called from the scheduler, this is sufficient to prevent
preemption-based self-deadlocks.

Fixes #38070.
Updates #37894.

Change-Id: Idbfac310889c92773023733ff7e2ff87e9896f0c
Reviewed-on: https://go-review.googlesource.com/c/go/+/225497
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@gopherbot
Copy link

Change https://golang.org/cl/225521 mentions this issue: runtime: prevent preemption while timer is in timerModifying

@dmitshur
Copy link
Contributor

The status on March 16th was:

It's not clear to me whether this indicates a bug in the timer code, a bug in the test, or a bug in the AIX kernel, and whether it applies only to AIX or just happened to be caught there first.

There've been some additional test failure instances, findings, and @mknyszek's CL 225497 mentioning this issue since then.

Has that helped narrow down where the problem may be, or does this issue still need more investigation?

@ianlancetaylor
Copy link
Contributor

This hasn't happened since March 23, and it was plausibly fixed by the CLs linked above, so I'm calling this fixed.

@golang golang locked and limited conversation to collaborators Apr 24, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker
Projects
None yet
Development

No branches or pull requests

5 participants