runtime/pprof: TestMorestack failures aix-ppc64 builder #38567
Are atomic memory operations implemented using syscalls on aix/ppc64? Almost every single sample collected has the same stack trace. It’s sitting in a semaphore syscall deep in the scheduler. I don’t really see how the pprof CL could have caused that, but I’ll revert it to see experimentally whether it fixes it.
I wonder whether we have somehow ended up with the profiling signals synchronizing with something else in the runtime, so that instead of getting a good random sample we are instead always sampling the scheduler.
https://golang.org/cl/228886 has been reverted. Let’s see whether that fixes the builders...
If you mean sysMap and similar functions, then yes, they use syscalls. However, these syscalls go through asmcgocall directly, without a lock, so I don't think any semaphores are involved.
@bcmills @josharian Can you verify if reverting the CL resolved this issue? Thanks!
No failures since 2020-04-24. I looked more closely at 2020-04-24T08:21:27-82f2989/aix-ppc64, which happened after CL 228886 was reverted. That failure might actually be different: almost all of the samples are in sysmon -> usleep, whereas in the previous failures before 228886 was reverted, most samples were in stopm -> sem_wait.
@josharian, since I'm not particularly familiar with the likely culprit CL, could you look at the 2020-04-24T08:21:27-82f2989/aix-ppc64 failure? Does it look like an unrelated flake?
@aclements I was mystified as to how the likely culprit CL caused the failures in the first place, so I am probably not the right person to ask. :)
Helflym, who helps maintain Go on AIX, is on vacation and will be back on the 6th of July.
Thanks. Since AIX isn't a first-class port, I'm going to drop the release-blocker tag on this.
No failures since April. findflakes estimates a 0.00% chance that this is still happening. I'm calling this one fixed.
2020-04-21T11:41:40-664d270/aix-ppc64
2020-04-21T08:08:49-17fbc81/aix-ppc64
2020-04-20T22:42:49-2edd351/aix-ppc64
This failure mode started very recently, so it could perhaps be due to CL 228886 (CC @josharian @hyangah). Tentatively marking as release-blocker because this appears to be a regression in 1.15.
#38316 may be related (CC @trex58, @Helflym).