New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime: TestPhysicalMemoryUtilization test failures on various builders #49411
Comments
This immediately makes me remember that timer granularity is really coarse on Windows. I bet that's the problem. Sigh. |
Actually, for this test, that shouldn't matter. Hm. Anyway, I'll look into it. |
Fascinating! Thanks for the additional context. So it looks like a 32-bit thing. I wonder if I can get this to fire in Linux with enough runs. |
It's not clear to me whether it's strictly 32-bit — there was one weird |
CC-ing @bufflig |
mknyszek@ do you need some windows-specific work here, like reproducing? |
A reproducer would be super helpful. If you want to take a stab at the problem, also, here's some context: This test is checking to make sure that when the heap grows, the runtime returns memory to the OS to try to make up for it. It constructs a heap with a bunch of multiple-of-8-KiB-page-sized holes, reads memstats, then triggers an allocation that grows the heap. Then it checks memstats again to see if it took back at least that much memory. This is most likely related to changes to the background scavenger in this cycle. I'm going to take a wild guess and say it'll bisect down to https://golang.org/cl/358675, but the problem is actually in https://golang.org/cl/353975. The former increases the background scavenger's rate, while the latter modifies it to try to do 1 ms of work before going to sleep. My earlier comment was alluding to poor time granularity on Windows, so that in fact the background scavenger could be doing 15 ms of work instead of 1 ms. An important detail in all this is that the scavenger runs over the heap linearly each "cycle" and maintains a pointer that represents its progress. Maybe that pointer gets stale? I'm not really sure where it goes from there. |
I'll try to see if I can reproduce, it appears it is more likely to happen on windows, so that's promising (if I'm to reproduce it, I mean :)). The timer, I'm not sure I get how 15ms of work would give this effect, I would have thought that the scavenger doing less work would cause this, not more? |
Ah, right - 32bit. Well, I wanted to set up a 32bit VM anyway. It might take a little time before I can attempt a reproduction though, but I'll give it a shot. |
This bug is also reproducing on FreeBSD 32-bit builders. It seems more strongly correlated with a smaller address space, despite that https://build.golang.org/log/30765367d142efba9a1fdeaaedfdfdf3e57d1638 |
So I've just been doing some debugging on this test to see what it could be doing differently at tip now and... well. It definitely ceased to be a useful test. I think as-is it was already fragile, but now it's just not great. Because of timing it basically relies on the background scavenger to do the work (AFAICT) and this would explain why the target gets missed on these 32-bit systems, especially if they don't have very many CPUs. |
I think we should strongly consider just marking it flaky for the moment while I try to figure something out here, again. |
OK! I have a fix. The Nth iteration of this test. Good news is that it's simpler and closer to the original intent. I was able to reproduce the issue on To follow up from my previous analysis, what seems to have happened here is that this test was pretty fragile to when GCs would run and if there were extra GC cycles. This new iteration is no longer fragile to that AFAICT and also has a tighter bound. |
Change https://golang.org/cl/362978 mentions this issue: |
Sigh. On it. At least it's the same builders as the FreeOSMemory failure. Probably something to do with these platforms having 64 KiB pages. |
Ugh, OK, the problem is clear to me now. The 64 KiB chunks that are supposed to be scavenged are all unaligned, so the scavenger glosses over them (runtime pages are 8 KiB, so they need not actually line up). I don't think this test can be engineered such that all the allocations are aligned to 64 KiB. I can make the chunks really big, I guess, so that if we miss a 64 KiB page it's not a huge deal. I'll experiment quickly. |
Change https://golang.org/cl/363415 mentions this issue: |
A trybot run:
https://storage.googleapis.com/go-build-log/25f64550/windows-386-2008_982c8967.log
But I also see a few of this on https://build.golang.org (windows/386 builder).
cc @mknyszek
The text was updated successfully, but these errors were encountered: