
runtime: background scavenger can delay deadlock detection significantly #31966

Closed

bcmills opened this issue May 10, 2019 · 14 comments
Labels: FrozenDueToAge, NeedsFix (the path to resolution is known, but the work has not been done), release-blocker, Soon (this needs to be done soon: regressions, serious bugs, outages)
Milestone: Go1.13

Comments

@bcmills
Contributor

bcmills commented May 10, 2019

First failure on the dashboard is https://build.golang.org/log/b38cacf4803d9fa1c15e29e74600cf974369d26b, at CL 142960 (#30333).

The failure mode seems consistent.

CC @aclements @mwhudson @agnivade

--- FAIL: TestGoexitCrash (60.03s)
    crash_test.go:95: testprog GoexitExit exit status: exit status 2
    crash_test.go:258: output:
        SIGQUIT: quit
        PC=0x900000000216b38 m=1 sigcode=0
        
        goroutine 0 [idle]:
        runtime.sem_timedwait(...)
        	/ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/src/runtime/os2_aix.go:560
        runtime.semasleep(0xdf8475800, 0x0)
        	/ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/src/runtime/os_aix.go:65 +0x148
        runtime.notetsleep_internal(0x11001e430, 0xdf8475800, 0x0, 0x159d2af229f3dad8, 0xa00010000000480)
        	/ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/src/runtime/lock_sema.go:218 +0x198
        runtime.notetsleep(0x11001e430, 0xdf8475800, 0x1000343a0)
        	/ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/src/runtime/lock_sema.go:269 +0x64
        runtime.sysmon()
        	/ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/src/runtime/proc.go:4317 +0x4e4
        runtime.mstart1()
        	/ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/src/runtime/proc.go:1201 +0xf4
        runtime.mstart()
        	/ramdisk8GB/workdir-host-aix-ppc64-osuosl/go/src/runtime/proc.go:1167 +0x5c
        
        r0   0xffffffffffffffff	r1   0x1101602d0
        r2   0xffffffffffffffff	r3   0xffffffffffffffff
        r4   0xffffffffffffffff	r5   0xffffffffffffffff
        r6   0xffffffffffffffff	r7   0xffffffffffffffff
        r8   0xffffffffffffffff	r9   0xffffffffffffffff
        r10  0xffffffffffffffff	r11  0xffffffffffffffff
        r12  0xffffffffffffffff	r13  0x110168800
        r14  0x0	r15  0x0
        r16  0x0	r17  0x0
        r18  0x0	r19  0x0
        r20  0x0	r21  0x0
        r22  0x0	r23  0x0
        r24  0x0	r25  0x0
        r26  0x110160340	r27  0x0
        r28  0x0	r29  0x1101605f0
        r30  0x0	r31  0x110581410
        pc   0x900000000216b38	ctr  0xffffffff00000000
        link 0xffffffffffffffff	xer  0xffffffff
        ccr  0x0	trap 0x0
        
        
        want output containing: no goroutines (main called runtime.Goexit) - deadlock!
FAIL
FAIL	runtime	160.362s
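
For reference, the expected "deadlock" message comes from the main goroutine exiting via runtime.Goexit with no other goroutines left. A minimal standalone reproduction of that scenario (a simplification for illustration, not necessarily the exact testprog source) looks like:

```go
package main

import "runtime"

func main() {
	// With no other goroutines running, exiting the main goroutine via
	// Goexit should make the runtime report:
	//   fatal error: no goroutines (main called runtime.Goexit) - deadlock!
	runtime.Goexit()
}
```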
@bcmills bcmills added the NeedsInvestigation (someone must examine and confirm this is a valid issue and not a duplicate of an existing one), Soon (this needs to be done soon: regressions, serious bugs, outages), and OS-AIX labels May 10, 2019
@bcmills bcmills added this to the Go1.13 milestone May 10, 2019
@mknyszek
Contributor

git bisect points to CL 174657 (CC @cherrymui @aclements).

I don't know if that's related, since @cherrymui pointed out to me that it should produce identical binaries (though different object files).

I think it is my change: I introduced another runtime goroutine in the offending change (on the dashboard) and I suspect that it never quite parks, and instead falls into its exponential back-off loop (it thinks there's work to do, but when it goes to do it none is there). This can happen if heap_sys-heap_released > 1.1 * next_gc but there are no free and unscavenged spans left.

I think the right thing to do is to bound the exponential backoff loop at a reasonable time (e.g. 10 ms) and if there's no work to do after waiting all that time just park. Because new spans only get freed when a goroutine exits (a stack is freed) or when spans are getting swept, if we approach the end of a GC cycle with nothing to do, it's likely it'll stay that way.
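
A rough, self-contained sketch of that bounded back-off idea (this is not the runtime's actual scavenger code; the 100µs starting value and 10ms cap are illustrative):

```go
package main

import (
	"fmt"
	"time"
)

const maxBackoff = 10 * time.Millisecond

// backOffThenPark stands in for the scavenger's idle loop: double the sleep
// each time no work turns up, and once the back-off exceeds the cap, return
// (i.e. "park") instead of spinning until the next GC.
func backOffThenPark(haveWork func() bool, doWork func()) {
	backoff := 100 * time.Microsecond
	for {
		if haveWork() {
			doWork()
			backoff = 100 * time.Microsecond // reset after useful work
			continue
		}
		if backoff > maxBackoff {
			return // bounded: give up and park
		}
		time.Sleep(backoff)
		backoff *= 2
	}
}

func main() {
	backOffThenPark(func() bool { return false }, func() {})
	fmt.Println("no work appeared within the back-off bound; parking")
}
```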

But, this is assuming that's what the problem is. If that really is the issue it's surprising to me that this has only ever been observed on AIX and not elsewhere, but I'll dig into it now on a gomote and check my hypothesis.

@bcmills
Contributor Author

bcmills commented May 10, 2019

> I don't know if that's related

Yeah, I had commented on the wrong issue. CL 142960 is almost certainly the culprit for this one.

@ianlancetaylor
Contributor

CC @Helflym

@aclements
Member

The deadlock detector isn't supposed to consider system goroutines (those that were started by the runtime), but I don't think any others have this back-off behavior, so maybe that doesn't work quite right, or maybe the logic for detecting system goroutines isn't quite right on AIX.

Shouldn't the current logic eventually always park? Or are there cases where it can't achieve the scavenge goal and will just keep backing off (until the next GC)?

@mknyszek
Contributor

OK, running this test with GODEBUG=gctrace=1 in a gomote makes it very clear what's going on.

The arena size on AIX is 256 MiB, and since the heap size is ~0 for this test the scavenger sees aaaaaall this work to do. But, AIX (as far as the runtime knows) doesn't have huge pages, so it's doing its work at the slow pace of 1 physical page per ms, and it takes it some time to do so.

This test, however, is looking for a deadlock, but the scavenger doesn't park until it's done with its work. It does eventually park, given enough time, and the runtime finally notices the deadlock.
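
To put a number on "some time": with the 256 MiB arena and an assumed 4 KiB physical page size, the 1 page/ms pace works out to just over a minute, comfortably past the test's 60s timeout (rough arithmetic only):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	const (
		physPageSize = 4 << 10   // assumption: 4 KiB physical pages
		arenaSize    = 256 << 20 // 256 MiB arena, as mentioned above
	)
	pages := arenaSize / physPageSize // 65536 pages to scavenge
	// At the stated pace of 1 physical page per millisecond:
	fmt.Println(time.Duration(pages) * time.Millisecond) // prints 1m5.536s
}
```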

I'm not sure what the right solution is here. Perhaps we just... have the scavenger notice when it's the only goroutine left and park at the first opportunity? If it goes into a long sleep, though, we kind of have a problem. Something needs to wake it up.

Maybe the solution is much simpler. Maybe when it sleeps it shouldn't park with a "sleep" wait reason and instead should park with something else, so the runtime notices the deadlock.

WDYT? @aclements

@mknyszek
Contributor

For the record, the wait reason has nothing to do with deadlock detection.

So it turns out this is a bigger problem with how the background scavenger and deadlock detection interact. Changing deadlock detection to handle a special case isn't really on the table because of how subtle that code is.

The fundamental problem (thanks @aclements for getting me here) is that sleeping on the regular event-driven timer means it's really difficult for deadlock detection to notice that the runtime thinks there's no more work to do.

We can fix this by just changing how the scavenger sleeps.

@rsc suggested just having sysmon kick the scavenger awake, and this works. My concern is that it could wake up and do up to 10x more work than it needs (1 page / ms, max sleep time of sysmon is 10 ms). For other situations this wouldn't be so bad, except the scavenger must grab the heap lock, which means it could really hurt the tail latency of anything that grabs the heap lock (notably allocation in the slow path). Furthermore (and this isn't really a con, just a note) it's slightly more complex than just sysmon kicking it awake; if sysmon goes into its "P's are idle" long sleep then it needs to make sure it can always wake the scavenger if it goes to sleep. This means that the scavenger must wake sysmon before it goes to sleep.

@aclements suggested instead making notetsleepg the primary method through which the background scavenger parks and sleeps. The only difference (that would allow the deadlock detection mechanism to function correctly) would be to increment/decrement sched.nmsys before it goes to sleep and after it wakes up, so that the background scavenger's sleeping m isn't counted against deadlock detection. The downside is that we could now have one more m in any given application than we otherwise would have. The additional state is not great, but the lack of interaction with sysmon and the ability to sleep for exact time periods is a boon.
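
As a toy, user-space analogue of that accounting idea (this is not runtime code; `busy` is only a stand-in for the sched.nmsys-style bookkeeping): the background worker removes itself from the count before sleeping and adds itself back on wake, so a watcher that sees the count hit zero can conclude everything else is idle.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

var busy atomic.Int64 // stand-in for "threads that count against deadlock detection"

func backgroundWorker() {
	busy.Add(1)
	defer busy.Add(-1)
	for i := 0; i < 3; i++ {
		// ... do a slice of work ...
		busy.Add(-1) // don't count this worker while it sleeps
		time.Sleep(50 * time.Millisecond)
		busy.Add(1)
	}
}

func main() {
	go backgroundWorker()
	for i := 0; i < 5; i++ {
		time.Sleep(40 * time.Millisecond)
		fmt.Println("busy workers:", busy.Load())
	}
}
```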

I've implemented the former and I'll now be implementing the latter. I'll update this bug with what we end up going with. I'll also update the bug's title to better reflect what's really going on.

@mknyszek mknyszek changed the title runtime: TestGoexitCrash broken on AIX runtime: background scavenger can delay deadlock detection significantly May 10, 2019
@Helflym
Contributor

Helflym commented May 13, 2019

> The arena size on AIX is 256 MiB, and since the heap size is ~0 for this test the scavenger sees aaaaaall this work to do. But, AIX (as far as the runtime knows) doesn't have huge pages, so it's doing its work at the slow pace of 1 physical page per ms, and it takes it some time to do so.

Do all other OSes have huge pages? Or does the problem mainly come from the arena size? AIX isn't yet fully optimized, so it might be possible to hasten this part of the runtime.

@mknyszek
Contributor

> The arena size on AIX is 256 MiB, and since the heap size is ~0 for this test the scavenger sees aaaaaall this work to do. But, AIX (as far as the runtime knows) doesn't have huge pages, so it's doing its work at the slow pace of 1 physical page per ms, and it takes it some time to do so.
>
> Do all other OSes have huge pages? Or does the problem mainly come from the arena size? AIX isn't yet fully optimized, so it might be possible to hasten this part of the runtime.

The more fundamental problem is that this goroutine can block deadlock detection for a (probably bounded?) large amount of time, so it's definitely not specific to AIX. It just so happens to manifest as a failure on the trybots on AIX because arena sizes are big and there isn't any huge page awareness built into the runtime for that platform. If AIX has an equivalent of transparent huge pages (like Linux or FreeBSD) then we can at least fix the trybots in the super-short term, but the more fundamental problem still needs to be fixed ASAP.

@mknyszek
Contributor

> The more fundamental problem is that this goroutine can block deadlock detection for a (probably bounded?) large amount of time, so it's definitely not specific to AIX. It just so happens to manifest as a failure on the trybots on AIX because arena sizes are big and there isn't any huge page awareness built into the runtime for that platform. If AIX has an equivalent of transparent huge pages (like Linux or FreeBSD) then we can at least fix the trybots in the super-short term, but the more fundamental problem still needs to be fixed ASAP.

After discussing with @aclements, this issue is broader than that. Technically the background sweeper and the GC could also cause delays in deadlock detection, but in most cases it's not a problem. The background scavenger just really brings this into the spotlight because of how slowly it is paced by default.

Also, the issue appearing on AIX (specifically because of the runtime's large arena size on the platform) brings to mind an accounting problem: the background scavenger technically shouldn't even need to scavenge that 256 MiB arena because those pages haven't been faulted in yet. This is a result of the fact that newly-mapped heap memory is not marked as scavenged. We can change it such that it is marked as scavenged, which should make the problem go away on the trybots specifically (and should avoid any start-up time penalties caused by the scavenger doing unnecessary work). I opened #32012 to track this.

But, we should still make it so that the scavenger may be considered "dead" at the next sleep opportunity, to significantly reduce the time that the scavenger could delay deadlock detection. I have a working implementation based on @aclements' idea with notes (I ended up not using nmsys because of accounting issues, but with lockOSThread we can make this work) and I'll put it up shortly.

@gopherbot

Change https://golang.org/cl/176957 mentions this issue: runtime: count the scavenger's M as idle when sleeping

@mknyszek mknyszek added the NeedsFix label (the path to resolution is known, but the work has not been done) and removed the NeedsInvestigation label May 13, 2019
@bcmills
Contributor Author

bcmills commented May 14, 2019

> it's definitely not specific to AIX

Just to confirm that empirically, here's a failure on the linux-ppc64le-power9osu builder:
https://build.golang.org/log/5b237da5c942cb3db8240d94cb73cf4e8adc10c9

@mknyszek
Contributor

So it's probably best not to fix this with my hack in golang.org/cl/176957. @ianlancetaylor noted offline that the deadlock detector already doesn't work for many people in many situations, so it doesn't make much sense to bend over backwards to keep it from being delayed by things like sweeping, GC, or the scavenger.

With that said, we can solve the problem on the trybots by just fixing #32012, which should also fix the small heap use-case since the scavenger should no longer be doing much work for short-lived programs. So I'm going to fix that instead (which is a much simpler fix, and also a good one to have overall) and close up this bug. I've confirmed the fix lowers the time for the AIX builder (among other builders).

@gopherbot

Change https://golang.org/cl/177097 mentions this issue: runtime: mark newly-mapped memory as scavenged

@gopherbot

Change https://golang.org/cl/177557 mentions this issue: runtime: disable TestGoexitCrash
