-
Notifications
You must be signed in to change notification settings - Fork 18k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
net/http/pprof: TestDeltaProfile failures missing mutexHog2 on ARM architectures #50218
Comments
Change https://golang.org/cl/372794 mentions this issue: |
It is observed to be flaky on the only openbsd/arm builder. Skipping on that platform until someone can investigate. For #50218 Change-Id: Id3a6dc12b93b3cec67870d8d81bd608c4589c952 Reviewed-on: https://go-review.googlesource.com/c/go/+/372794 Trust: Bryan Mills <bcmills@google.com> Run-TryBot: Bryan Mills <bcmills@google.com> Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com> TryBot-Result: Gopher Robot <gobot@golang.org>
|
Given the |
Another
|
Change https://go.dev/cl/383997 mentions this issue: |
I suspect it may be possible to use something along the lines of the (I do not plan to follow up on that myself.) |
Marking as release-blocker for Go 1.19 due to the high failure rate on the builders (CC @golang/release). If we don't care about this test, we can merge CL 383997 (mailed ~3 weeks ago) to encode that decision as a test-skip. Otherwise, this test failure needs to be addressed.
2022-03-01T20:52:30-f4722d8/android-arm-corellium |
Given that we have seen failures with the same failure mode on both openbsd/arm and android/arm64, it seems likely that the underlying bug affects at least all ARM-based architectures. It appears that either these architectures are not able to sample at the frequency expected by the test, or the samples are for some reason being dropped. For #50218 Change-Id: I42a6c8ecda57448f8068e8facb42a4a2cecbbb37 Reviewed-on: https://go-review.googlesource.com/c/go/+/383997 Trust: Bryan Mills <bcmills@google.com> Run-TryBot: Bryan Mills <bcmills@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>
@golang/runtime needs to decide whether to fix and unskip this test, or to drop the test. |
We should decide what to do here. I'll take a look at this. |
@mknyszek Have you had a chance to revisit this decision? Do we need to make it before beta 1 (aiming for next week)? |
Sorry, I didn't get around to it. I'll do it right now. |
So, this hasn't happened since March. I have a feeling this is related to a few things that were fixed with respect to profiles recently. Curiously, the failures stopped happening on March 7th, and on March 8th https://go.dev/cl/384239 landed to fix #50996. There aren't any other pprof or profile related CLs around this one. The previous one landed in February, and the following one in May. That CL, I think, only affected proto output, but then again that's exactly what the This test is checking a mutex profile, but I imagine it uses the same stack frame information that anything else does. I think it's possible that what's happening in this test is call frames are dropped because these NOPs are actually (created when inlining for a "fake" PC for the inlined function) weren't handled correctly. However, I'm not totally clear on what the failure mode for #50996 is. Specifically, neither I'm inclined to call this fixed. I haven't fully figured out the details, but I think there is a plausible connection here. |
CC @rhysh @prattmic @cherrymui who were involved to varying degrees in https://go.dev/cl/384239. It might be clearer to y'all what exactly was the problem here and whether that CL fixed this particular issue. What's also puzzling me is, if there was a problem handling these NOPs, the failure should intuitively be more deterministic for a mutex profile, but I'm also not terribly familiar with how the mutex profile works. |
At the very least, I'm moving this into the backlog optimistically to keep release progress moving. |
In that case, shall we close the issue until / unless it recurs? |
I'm not opposed, I wanted to give others a chance to chime in, but it's probably less work to just close it and move on. :) If it happens again we can take some of this information forward. |
If I recall correctly, https://go.dev/cl/384239 shouldn't affect this failure. In that case, the failure was that you'd end up with multiple Locations for things that should be the same Location. e.g., one Location including all of the inlined frames, and 2 separate Locations with one containing the inlined frame and the other containing the parent frame. But all of the frames should still appear in the profile. In this case, the mutexHog2 frame is missing entirely, so this seems different. Still, I think it is OK to wait and see if this occurs again. |
https://go.dev/cl/383997 landed on March 8 to skip the test on arm and arm64. So the bug is probably not fixed. Strange that |
Ha, OK. I missed that when I was walking back over the commit log. Thanks @rhysh. I'm going to reopen this. Still not certain this should block the release at this point, but it looks like there's a real problem and it's currently being masked. |
greplogs --dashboard -md -l -e 'FAIL: TestDeltaProfile ' --since=2021-01-01
2021-12-16T00:34:10-7f23145/openbsd-arm-jsing
2021-11-25T00:02:52-b2a5a37/openbsd-arm-jsing
2021-06-04T17:33:24-831f937/openbsd-arm-jsing
(Forked from #38544, which was mitigated in CL 229498.)
CC @4a6f656c @hyangah
The text was updated successfully, but these errors were encountered: