Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: more frequent TestFreeOSMemory failures starting 2021-11-05 #49478

Closed
bcmills opened this issue Nov 9, 2021 · 6 comments
Closed

runtime: more frequent TestFreeOSMemory failures starting 2021-11-05 #49478

bcmills opened this issue Nov 9, 2021 · 6 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. okay-after-beta1 Used by release team to mark a release-blocker issue as okay to resolve either before or after beta1 release-blocker
Milestone

Comments

@bcmills
Copy link
Contributor

bcmills commented Nov 9, 2021

greplogs --dashboard -md -l -e 'FAIL: TestFreeOSMemory'

2021-11-08T21:52:47-955f9f5/aix-ppc64
2021-11-08T21:30:25-0e39946/linux-s390x-ibm
2021-11-07T04:57:22-9e6ad46/linux-ppc64le-buildlet
2021-11-05T22:00:16-755ede0/linux-ppc64le-buildlet
2021-08-23T17:51:41-8486ced/linux-arm64-packet
2020-12-03T16:05:09-5246fa5/linux-arm64-packet

(@mknyszek: I'm guessing this is more fallout from enabling the new GC pacer?)

@bcmills bcmills added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker okay-after-beta1 Used by release team to mark a release-blocker issue as okay to resolve either before or after beta1 labels Nov 9, 2021
@bcmills bcmills added this to the Go1.18 milestone Nov 9, 2021
@mknyszek
Copy link
Contributor

mknyszek commented Nov 9, 2021

I think it's more likely this is scavenger related. Specifically, I think it's going to be related to #49411.

@mknyszek
Copy link
Contributor

mknyszek commented Nov 9, 2021

I'm somewhat worried about this. If debug.FreeOSMemory isn't working as expected, that can be a significant regression.

@mknyszek mknyszek removed the okay-after-beta1 Used by release team to mark a release-blocker issue as okay to resolve either before or after beta1 label Nov 9, 2021
@mknyszek
Copy link
Contributor

OK, I think I know the problem. It's fairly benign. Previously debug.FreeOSMemory would block the whole runtime from doing anything with allocation, but now it doesn't. So, the background scavenger is just getting to the memory before debug.FreeOSMemory is, because a GC is forced and it gets woken up. I think the fix here might be to just set GOMAXPROCS=1, so it doesn't really get a chance.

This test is more of a problem on mips64 and ppc64 because the page sizes are much bigger, so the scavenger can do a whole lot more bytes of work before going to sleep. Not really any issue on usually small page size architectures like amd64 and arm64.

As for a fix... I'm not sure. The goal here is to test debug.FreeOSMemory. Ideally, we could just turn off the background scavenger for this, but that's difficult to do from the runtime/debug tests. GOMAXPROCS=1 might be good enough? debug.FreeOSMemory is not preemptible. Raising the amount that's available to be scavenged would also help, because the test just checks if it gets something, not necessarily everything. It probably should check that, but I don't think it'll be able to do that from the runtime/debug package.

@mknyszek mknyszek added the okay-after-beta1 Used by release team to mark a release-blocker issue as okay to resolve either before or after beta1 label Nov 10, 2021
@mknyszek
Copy link
Contributor

Unfortunately GOMAXPROCS=1 didn't work. Moving the test into runtime isn't great, though, and I'd rather not add hooks to disable the background scavenger. Though, maybe a GODEBUG mode that disables the background scavenger would be reasonable?

@mknyszek
Copy link
Contributor

I got inspiration from my fix to #49411: we can make a test that's robust to the background scavenger if we recognize that we don't care if the test occasionally passes because the background scavenger stepped up. Currently this test is fragile because it can be perturbed by the background scavenger -- it does work the test expects that debug.FreeOSMemory does. We just care that we have a test that mostly fails if the tested mechanism isn't there.

So, my fix is to read HeapReleased before the allocation is freed. Then, check to make sure at least that much has been returned to the OS. Nothing else is going on in the test, so all the background mechanisms should only ever make HeapReleased go up. If it goes up more, who cares! It just needs to go up at least by that much.

CL incoming.

@gopherbot
Copy link

Change https://golang.org/cl/363414 mentions this issue: runtime/debug: make TestFreeOSMemory more robust

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. okay-after-beta1 Used by release team to mark a release-blocker issue as okay to resolve either before or after beta1 release-blocker
Projects
None yet
Development

No branches or pull requests

3 participants