-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
x/build: android-amd64-emu builder timeouts in runtime tests #32109
Comments
I see no goroutines blocked for 1min+ (no substring "min" in either of those logs), and everything looks mostly in state running. Maybe these are just legit timeouts after 3 minutes? Crank up GO_TEST_TIMEOUT_SCALE to perhaps just 2 or 3 on this builder? |
I find it odd that the timeout is in minutes but the reported time after "FAIL" is seconds, sometimes less than 10 seconds. |
I managed to reproduce the runtime crash with the odd timings. I had the following change applied to my local GOROOT:
The crash was:
Building Go cmd/dist using /go1.4.
Building Go toolchain1 using /go1.4.
Building Go bootstrap cmd/go (go_bootstrap) using Go toolchain1.
Building Go toolchain2 using go_bootstrap and Go toolchain1.
Building Go toolchain3 using go_bootstrap and Go toolchain2.
Building packages and commands for host, linux/amd64.
Building packages and commands for target, android/amd64.
Testing packages.ok archive/tar 25.379s t0 2019-05-24 17:46:14.59233666 +0000 UTC m=+4.543144475 now 2019-05-24 17:46:39.971607355 +0000 UTC m=+29.922415187 goroutine 24266 [running]: goroutine 1 [chan receive, locked to thread]: goroutine 2 [force gc (idle)]: goroutine 3 [GC sweep wait]: goroutine 4 [runnable]: goroutine 5 [finalizer wait]: goroutine 6 [timer goroutine (idle)]: goroutine 22510 [runnable]: goroutine 19 [syscall]: goroutine 35 [timer goroutine (idle)]: goroutine 62 [timer goroutine (idle)]: goroutine 22695 [runnable]: goroutine 22696 [runnable]: goroutine 22744 [runnable]: goroutine 22718 [runnable]: goroutine 22717 [runnable]: goroutine 22544 [runnable]: goroutine 22505 [runnable]: goroutine 22029 [runnable]: goroutine 22023 [runnable]: goroutine 22517 [runnable]: goroutine 22030 [runnable]: goroutine 22508 [runnable]: goroutine 22507 [runnable]: goroutine 22513 [runnable]: goroutine 22022 [runnable]: goroutine 22028 [runnable]: goroutine 22511 [runnable]: goroutine 23225 [runnable]: goroutine 22026 [runnable]: goroutine 22543 [runnable]: goroutine 22509 [runnable]: goroutine 22032 [runnable]: goroutine 22025 [runnable]: goroutine 22518 [runnable]: goroutine 22515 [runnable]: goroutine 22519 [runnable]: goroutine 22516 [runnable]: goroutine 22024 [runnable]: goroutine 22514 [runnable]: goroutine 22033 [runnable]: goroutine 22716 [runnable]: goroutine 23226 [GC worker (idle)]: goroutine 24344 [runnable]: goroutine 24276 [GC worker (idle)]: Notice the first line from the panic:
As far as I can tell, time.AfterFunc was fired after ~10 seconds, way before the 9 minutes were up. That only 10 seconds in real time has passed is confirmed by the FAIL line printed by the go tool (the go tool run on the host on the android emu builders):
|
With the revised debug change:
I managed to reproduce a crash with bogus monotonic time readings: Building Go cmd/dist using /go1.4. Building Go toolchain1 using /go1.4. Building Go bootstrap cmd/go (go_bootstrap) using Go toolchain1. Building Go toolchain2 using go_bootstrap and Go toolchain1. Building Go toolchain3 using go_bootstrap and Go toolchain2. Building packages and commands for host, linux/amd64. Building packages and commands for target, android/amd64. Adding proper spacing, the interesting lines are:
It's interesting that both the timerProc and startAlarm readings produced similarly bogus values. |
/cc @aclements @cherrymui @ianlancetaylor for timer debugging help |
Note that the bogus values are curiously similar to the unix time since the 1970 epoch (currently 1558861037 seconds). |
I think I got it. The nanotime implementation for android/amd64 calls clock_gettime(CLOCK_MONOTONIC, ...) if it is found in the vDSO, and falls back to gettimeofday if not. However, there is a runtime test, TestClockVDSOAndFallbackPaths that tests the fallback by temporarily disabling clock_gettime. I believe what happens is that TestClockVDSOAndFallbackPaths disables clock_gettime just as That theory explains why the early timeouts only happen during the runtime tests and also explains the "bogus" nanotime readings: the readings are not wrong in themselves, but the switch of zero time basis invalidates any time deltas computed during the switch. I don't have an obvious fix other than disabling TestClockVDSOAndFallbackPaths. |
I agree and that is what I did in https://golang.org/cl/174679 which is part of my timer rewrite series. Perhaps we should submit just that CL for 1.13. |
I have a +2 for that CL, and I think it's the right thing to do, so I will submit it. |
Change https://golang.org/cl/174679 mentions this issue: |
I got a broken TryBot due to a timeout in android-amd64-emu during the runtime tests:
https://go-review.googlesource.com/c/go/+/177698/3#message-c7cd2a06d8df686d39e9f93c77a50bac3c59694e
https://storage.googleapis.com/go-build-log/a8250e6c/android-amd64-emu_bb36c881.log
I also found a couple recent examples in the dashboard, but nothing in the previous 2 weeks:
https://build.golang.org/log/a3c742fbd107d32e6166995d2821542b81659fca
https://build.golang.org/log/9699d7f03f77276702adb54ea17492fe3d8516ee
/cc @eliasnaur @bradfitz
The text was updated successfully, but these errors were encountered: