
runtime: TestGcSys fails on linux/amd64 #28574

Closed

tmthrgd opened this issue Nov 3, 2018 · 24 comments
Labels: FrozenDueToAge, Testing (an issue that has been verified to require only test changes, not just a test failure)
Milestone: Unplanned

Comments

@tmthrgd (Contributor) commented Nov 3, 2018

What version of Go are you using (go version)?

$ go version
go version devel +1645dfa23f Fri Nov 2 23:22:57 2018 +0000 linux/amd64

Does this issue reproduce with the latest release?

It reproduces at master.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/tom/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/tom/go"
GOPROXY=""
GORACE=""
GOROOT="/home/tom/go/dev-go"
GOTMPDIR=""
GOTOOLDIR="/home/tom/go/dev-go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build842792331=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Tried to build Go at master with ./all.bash.

What did you expect to see?

Test passing with:

ok  	runtime XX.XXXs

What did you see instead?

Test failed with:

--- FAIL: TestGcSys (0.05s)
    gc_test.go:33: expected "OK\n", but got "using too much memory: 70877184 bytes\n"
FAIL
FAIL	runtime	18.959s

This is the same issue as #27636 and #27156, but for linux/amd64 under Fedora 29.

@agnivade (Contributor) commented Nov 3, 2018

I am unable to reproduce this. Does this happen to you consistently?

@tmthrgd (Contributor, Author) commented Nov 3, 2018

@agnivade It's only happened once, and I don't seem to be able to reproduce it.

@agnivade added the NeedsInvestigation and help wanted labels Nov 3, 2018
@agnivade added this to the Unplanned milestone Nov 3, 2018
@bcmills (Contributor) commented Dec 3, 2018

I just caught one in a local all.bash run on my workstation. Not much to go on, though.

--- FAIL: TestGcSys (0.03s)
    gc_test.go:33: expected "OK\n", but got "using too much memory: 70813696 bytes\n"
FAIL
FAIL    runtime 26.806s

@bcmills added the Testing label Dec 3, 2018
@timtadh commented Jan 18, 2019

I got this when running ./all.bash to build the tag go1.12beta2.

Here are my /proc/cpuinfo and /proc/meminfo.

It didn't happen again when I reran the tests.

@halturin commented:

Just tried to build the most recent version of Go (1.12.1) and got the same message:

--- FAIL: TestGcSys (0.04s)
    gc_test.go:33: expected "OK\n", but got "using too much memory: 70813696 bytes\n"
FAIL
FAIL	runtime	57.344s

with exit code "Failed: exit status 1"

cpuinfo:

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 45
model name	: Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz
stepping	: 7
microcode	: 0x70b
cpu MHz		: 3201.971
cache size	: 12288 KB
physical id	: 0
siblings	: 12
core id		: 0
cpu cores	: 6
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb kaiser tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips	: 6403.94
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

/proc/meminfo

MemTotal:       32890440 kB
MemFree:         1538792 kB
MemAvailable:    8796444 kB
Buffers:          348696 kB
Cached:         10537972 kB
SwapCached:         2516 kB
Active:         26935560 kB
Inactive:        2275984 kB
Active(anon):   21444808 kB
Inactive(anon):  1328884 kB
Active(file):    5490752 kB
Inactive(file):   947100 kB
Unevictable:        2880 kB
Mlocked:            2880 kB
SwapTotal:       6680572 kB
SwapFree:        6624272 kB
Dirty:             21648 kB
Writeback:             0 kB
AnonPages:      18325548 kB
Mapped:          5656576 kB
Shmem:           4448828 kB
Slab:            1469644 kB
SReclaimable:    1285528 kB
SUnreclaim:       184116 kB
KernelStack:       35728 kB
PageTables:       168184 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    23125792 kB
Committed_AS:   43038324 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:     1149916 kB
DirectMap2M:    23955456 kB
DirectMap1G:     8388608 kB

git log -1

commit 0380c9ad38843d523d9c9804fe300cb7edd7cd3c
Author: Andrew Bonventre <andybons@golang.org>
Date:   Thu Mar 14 14:15:58 2019 -0400

    [release-branch.go1.12] go1.12.1
    
    Change-Id: Id5f76204b8cd3fe67c21c5adfd3a4e456a8cad14
    Reviewed-on: https://go-review.googlesource.com/c/go/+/167704
    Run-TryBot: Andrew Bonventre <andybons@golang.org>
    TryBot-Result: Gobot Gobot <gobot@golang.org>
    Reviewed-by: Katie Hockman <katie@golang.org>
    Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>

@kmcfate commented Apr 12, 2019

Can we please just remove this test? It's failing on 32-bit platforms for me.

@bcmills (Contributor) commented Apr 12, 2019

@kmcfate, if you're observing a different (or consistent) failure mode on 32-bit platforms, please open a separate issue — it may be easier to diagnose and fix that failure mode than the intermittent amd64 one.

@thanm (Contributor) commented May 8, 2019

As of this afternoon, it seems to be failing for me with a high test count:

$ cd `go env GOROOT`/src/runtime
$ go test -count 100 -test.run=TestGcSys .
--- FAIL: TestGcSys (0.05s)
    gc_test.go:33: expected "OK\n", but got "using too much memory: 71141376 bytes\n"
--- FAIL: TestGcSys (0.05s)
    gc_test.go:33: expected "OK\n", but got "using too much memory: 71075840 bytes\n"
--- FAIL: TestGcSys (0.05s)
    gc_test.go:33: expected "OK\n", but got "using too much memory: 71141376 bytes\n"
FAIL
FAIL	runtime	4.407s

My work machine is linux/amd64 Xeon CPU E5-2690.

@thanm (Contributor) commented May 8, 2019

I should add that I am working off tip (19966e9).

@siebenmann commented:
I get (and have been getting) these failures sporadically on some of the 64-bit x86 Linux machines that I build Go on, and I can consistently reproduce the problem with @thanm's process. The pattern of what machines fail and don't fail for me is that machines with large amounts of memory fail and machines with smaller amounts don't. The usual machines I build Go on have 32 GB or 16 GB and they don't seem to fail even with large repeat counts (1000 or even 10,000). On machines with 64 GB, 96 GB, or 256 GB, I typically see a couple of failures in a -count 100 test run.

I'm using a couple of versions of the current git tip (I would say the same version, but it updated a couple of times while I was building on different machines).

@ianlancetaylor (Contributor) commented:
CC @mknyszek

@mknyszek (Contributor) commented:
It's unfortunate I didn't notice this issue earlier. For those experiencing the failures back to April and earlier, I expect it should be much better or even fixed with Go 1.12.5 and Go 1.13 (see #31616).

As for the recent failures on tip, I'm unable to reproduce on a linux/amd64 machine with 30 GiB of memory, but I'm trying now on a beefier machine (based on @siebenmann's comments). If I can reproduce I'll try and bisect.

@mknyszek self-assigned this May 13, 2019
@mknyszek (Contributor) commented:
OK so bisection didn't really work, but I also misunderstood what the test was checking.

I thought it was checking whether GCSys was increasing too much, but it's not. It checks to see if Sys is increasing, which... well, if you get unlucky with heap layout (or with a different number of GCs run at different times) you could end up mapping in another arena, which is why we see numbers like 70 MiB popping up in the failures so consistently.

Issues affecting #31616 definitely made this fail much more often, but I think that's no longer the fundamental issue. I think something weird is happening with GC pacing.
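(For reference: traces like the ones below come from the runtime's gctrace facility. Something along these lines should reproduce them, since the child process the test spawns inherits GODEBUG; this invocation is a paraphrase rather than the exact command used.)

$ GODEBUG=gctrace=1 go test -count=1 -run=TestGcSys runtime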

Consider the GC trace for this successful run (the vast majority of them):

gc 1 @0.000s 0%: 0.001+0.15+0.001 ms clock, 0.001+0/0.018/0.13+0.001 ms cpu, 0->0->0 MB, 4 MB goal, 1 P (forced)
gc 2 @0.000s 0%: 0+0.34+0.001 ms clock, 0+0.15/0/0+0.001 ms cpu, 4->5->1 MB, 5 MB goal, 1 P
gc 3 @0.001s 1%: 0+0.40+0.001 ms clock, 0+0.15/0/0+0.001 ms cpu, 4->5->1 MB, 5 MB goal, 1 P
gc 4 @0.002s 1%: 0+0.40+0.001 ms clock, 0+0.15/0/0+0.001 ms cpu, 4->5->1 MB, 5 MB goal, 1 P
gc 5 @0.003s 2%: 0+0.41+0.001 ms clock, 0+0.15/0/0+0.001 ms cpu, 4->5->1 MB, 5 MB goal, 1 P
gc 6 @0.004s 2%: 0+0.39+0.001 ms clock, 0+0.15/0/0+0.001 ms cpu, 4->5->1 MB, 5 MB goal, 1 P
gc 7 @0.005s 3%: 0+0.39+0 ms clock, 0+0.15/0/0+0 ms cpu, 4->5->1 MB, 5 MB goal, 1 P
gc 8 @0.006s 3%: 0+0.40+0.001 ms clock, 0+0.15/0/0+0.001 ms cpu, 4->5->1 MB, 5 MB goal, 1 P
gc 9 @0.007s 3%: 0+0.39+0.001 ms clock, 0+0.15/0/0+0.001 ms cpu, 4->5->1 MB, 5 MB goal, 1 P
gc 10 @0.008s 4%: 0+0.39+0 ms clock, 0+0.15/0/0+0 ms cpu, 4->5->1 MB, 5 MB goal, 1 P
gc 11 @0.008s 4%: 0+0.39+0.001 ms clock, 0+0.15/0/0+0.001 ms cpu, 4->5->1 MB, 5 MB goal, 1 P
gc 12 @0.009s 5%: 0+0.39+0 ms clock, 0+0.15/0/0+0 ms cpu, 4->5->1 MB, 5 MB goal, 1 P
gc 13 @0.010s 5%: 0+0.39+0.001 ms clock, 0+0.15/0/0+0.001 ms cpu, 4->5->1 MB, 5 MB goal, 1 P
gc 14 @0.011s 5%: 0+0.39+0 ms clock, 0+0.15/0/0+0 ms cpu, 4->5->1 MB, 5 MB goal, 1 P
gc 15 @0.012s 5%: 0+0.39+0 ms clock, 0+0.15/0/0+0 ms cpu, 4->5->1 MB, 5 MB goal, 1 P
gc 16 @0.013s 6%: 0+0.39+0 ms clock, 0+0.15/0/0+0 ms cpu, 4->5->1 MB, 5 MB goal, 1 P
gc 17 @0.014s 6%: 0+0.40+0.001 ms clock, 0+0.15/0/0+0.001 ms cpu, 4->5->1 MB, 5 MB goal, 1 P
gc 18 @0.015s 6%: 0+0.38+0 ms clock, 0+0.15/0/0+0 ms cpu, 4->5->1 MB, 5 MB goal, 1 P
gc 19 @0.015s 6%: 0+0.40+0.001 ms clock, 0+0.15/0/0+0.001 ms cpu, 4->5->1 MB, 5 MB goal, 1 P
gc 20 @0.016s 7%: 0+0.39+0 ms clock, 0+0.15/0/0+0 ms cpu, 4->5->1 MB, 5 MB goal, 1 P
gc 21 @0.017s 7%: 0+0.39+0 ms clock, 0+0.15/0/0+0 ms cpu, 4->5->1 MB, 5 MB goal, 1 P
gc 22 @0.018s 7%: 0+0.39+0 ms clock, 0+0.15/0/0+0 ms cpu, 4->5->1 MB, 5 MB goal, 1 P
gc 23 @0.019s 7%: 0+0.38+0 ms clock, 0+0.15/0/0+0 ms cpu, 4->5->1 MB, 5 MB goal, 1 P
gc 24 @0.020s 7%: 0+0.39+0 ms clock, 0+0.15/0/0+0 ms cpu, 4->5->1 MB, 5 MB goal, 1 P
gc 25 @0.021s 8%: 0+0.39+0 ms clock, 0+0.15/0/0+0 ms cpu, 4->5->1 MB, 5 MB goal, 1 P
gc 26 @0.021s 8%: 0+0.39+0 ms clock, 0+0.15/0/0+0 ms cpu, 4->5->1 MB, 5 MB goal, 1 P
gc 27 @0.022s 8%: 0+0.39+0.001 ms clock, 0+0.15/0/0+0.001 ms cpu, 4->5->1 MB, 5 MB goal, 1 P
gc 28 @0.023s 8%: 0+0.39+0 ms clock, 0+0.15/0/0+0 ms cpu, 4->5->1 MB, 5 MB goal, 1 P
gc 29 @0.024s 8%: 0+0.39+0.001 ms clock, 0+0.15/0/0+0.001 ms cpu, 4->5->1 MB, 5 MB goal, 1 P
gc 30 @0.025s 8%: 0+0.39+0 ms clock, 0+0.15/0/0+0 ms cpu, 4->5->1 MB, 5 MB goal, 1 P
gc 31 @0.026s 9%: 0+0.39+0.001 ms clock, 0+0.15/0/0+0.001 ms cpu, 4->5->1 MB, 5 MB goal, 1 P
gc 32 @0.027s 9%: 0+0.39+0 ms clock, 0+0.15/0/0+0 ms cpu, 4->5->1 MB, 5 MB goal, 1 P
OK

and this failing run:

gc 1 @0.000s 0%: 0.003+0.34+0.003 ms clock, 0.003+0/0.039/0.29+0.003 ms cpu, 0->0->0 MB, 4 MB goal, 1 P (forced)
gc 2 @0.001s 0%: 0.001+0.73+0.002 ms clock, 0.001+0.33/0/0+0.002 ms cpu, 4->5->1 MB, 5 MB goal, 1 P
gc 3 @0.003s 1%: 0.001+0.56+0.001 ms clock, 0.001+0.22/0/0+0.001 ms cpu, 4->5->1 MB, 5 MB goal, 1 P
gc 4 @0.005s 1%: 0.001+0.57+0.001 ms clock, 0.001+0.21/0/0+0.001 ms cpu, 4->5->1 MB, 5 MB goal, 1 P
gc 5 @0.006s 1%: 0.001+0.56+0.001 ms clock, 0.001+0.21/0/0+0.001 ms cpu, 4->5->1 MB, 5 MB goal, 1 P
gc 6 @0.007s 2%: 0.001+0.54+0.001 ms clock, 0.001+0.21/0/0+0.001 ms cpu, 4->5->1 MB, 5 MB goal, 1 P
gc 7 @0.008s 2%: 0.001+0.54+0.001 ms clock, 0.001+0.21/0/0+0.001 ms cpu, 4->5->1 MB, 5 MB goal, 1 P
gc 8 @0.010s 3%: 0.001+0.55+0.001 ms clock, 0.001+0.21/0/0+0.001 ms cpu, 4->5->1 MB, 5 MB goal, 1 P
gc 9 @0.011s 3%: 0+0.56+0.001 ms clock, 0+0.23/0/0+0.001 ms cpu, 4->5->1 MB, 5 MB goal, 1 P
gc 10 @0.012s 3%: 0.001+0.54+0.001 ms clock, 0.001+0.21/0/0+0.001 ms cpu, 4->5->1 MB, 5 MB goal, 1 P
gc 11 @0.013s 3%: 0+0.54+0.001 ms clock, 0+0.21/0/0+0.001 ms cpu, 4->5->1 MB, 5 MB goal, 1 P
gc 12 @0.015s 3%: 0.001+20+0.003 ms clock, 0.001+0.067/0.14/0+0.003 ms cpu, 4->47->43 MB, 5 MB goal, 1 P
gc 13 @0.037s 3%: 0.002+9.5+0.001 ms clock, 0.002+0.11/0.14/0+0.001 ms cpu, 44->72->28 MB, 86 MB goal, 1 P
using too much memory: 70813696 bytes

At the very end of the latter the GC just assumes a heap growth is happening in all the rapid allocation and scales up the heap goal, which results in a heap growth. I'm not really sure what's going on here and it'll be difficult to bisect because I'm pretty sure any regression related to #31616 makes the history messy. I'll keep looking into it though.

@mknyszek (Contributor) commented:
Actually, in the failing run, the last two GCs are quite long overall, with most of that time spent in concurrent mark. Looks like something is occasionally preventing a transition into mark termination?

For posterity: I ran this on a linux/amd64 machine with 64 GiB of memory and at @thanm's commit. Also, at this point I don't think the amount of memory available on the system has much to do with what's going wrong.

@siebenmann commented:
Thinking about it more, it may also be relevant that the large memory machines I have access to are also high CPU count machines, and in most cases also have NUMA memory. In fact on the single high memory, high CPU, but non-NUMA machine I have access to, it seems harder (but not impossible) to reproduce this issue. On some but not all of these NUMA systems, using numactl --cpubind 0 --membind 0 ... to restrict execution to a single node's CPU and memory causes the test to pass much more reliably (I need to go up to -count 10000 to make it fail, and then it typically only reports a couple of failures over the entire run).

(The successful numactl systems are single socket Threadripper 2990WX ones with 64 GB; the ones where the test still fails reliably even with numactl are dual socket Xeon E5-2680s with either 96 GB or 256 GB. The test mostly passes on a 128 GB Threadripper 1950X machine, which is non-NUMA. Although now that I do more tests, the Threadripper 2990WX machines sometimes pass a plain -count 100 test run.)

@mknyszek (Contributor) commented:
OK so I think I've gotten close to the bottom of this.

To make the analysis below clearer, here's an overview of the test: it sets GOMAXPROCS=1 and has exactly one user goroutine, which just allocates 1029-byte buffers in a loop. It then checks whether more than 16 MiB of new memory was mapped, and fails if so.
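For concreteness, here's a condensed sketch of the shape of that test program. This is a paraphrase of the description above, not the actual gc_test.go/testprog source, and the iteration count is made up:

package main

import (
	"fmt"
	"runtime"
)

var sink []byte

func main() {
	runtime.GOMAXPROCS(1)

	// Baseline: how much memory the runtime has mapped so far.
	var ms runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&ms)
	base := ms.Sys

	// One user goroutine allocating small buffers in a tight loop.
	// Each buffer is dropped immediately, so the live heap stays tiny.
	for i := 0; i < 1000000; i++ {
		sink = make([]byte, 1029)
	}

	// Fail if the runtime mapped more than 16 MiB of new memory.
	runtime.ReadMemStats(&ms)
	if grew := ms.Sys - base; grew > 16<<20 {
		fmt.Printf("using too much memory: %d bytes\n", grew)
		return
	}
	fmt.Println("OK")
}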

The GCs that take a relatively long time invariably happen as a result of nothing actually doing any marking work. If you take a look at an execution trace of a bad run, you'll see that because GOMAXPROCS=1 for the duration of the test, the GC has two main sources of mark work: a fractional background mark worker and mark assists:

[execution trace screenshot]

The first really long GC then doesn't do any mark assists for a long time. Using debuglog I was able to get an idea as to why.

Basically what's happening is that for most GCs mark assist is doing most of the work, and the pace is being kept. Sometime during the first "bad" GC the fractional background mark worker kicks in for a little bit and accumulates a bunch of credit. The next time mark assist kicks in on the one running G it tries to over-assist (but to be fair it does that every time, there just isn't much credit to steal), and then steals a ton of credit (around 30-40 MiB worth, it's very consistently close to the new heap goal for the first "bad" GC). Thus the next mark assist is then scheduled to happen several times later than before. But the fractional background mark worker doesn't kick in at all for the rest of the GC because the "fractional" part of it isn't very tightly controlled.

So, the one running G allocates until it finally does a mark assist, and suddenly by the end of the GC heap_live is really high (since during this time allocations are allocated black). This ends up pacing the next GC at 2 times that, and if it happens to exceed 64 MiB on Linux then we end up mapping a new heap arena, the test detects that, and it fails.
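To put numbers on that with the failing trace above: gc 12 ends with 43 MB live, so gc 13 gets paced at roughly twice that, the 86 MB goal on its trace line. Heap arenas on linux/amd64 are 64 MiB, so a goal in that range forces a fresh arena to be mapped, and Sys jumps by the ~67 MiB (70813696 bytes) that keeps showing up in the failure messages.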

With all that being said, I'm not actually sure what the solution here is. On a side note, if I increase GOMAXPROCS for the test the problem goes away, since I assume there's now extra Ps that can just run a background mark worker and ensure the GC pacing is kept. Perhaps the fractional background mark worker should be scheduled more often, or perhaps credit stealing shouldn't be quite so aggressive?

@bcmills (Contributor) commented Jun 26, 2019

@mknyszek, any progress on this? Are #27636 and #27156 likely to be the same underlying issue?

(I was prompted to check because there's another TestGcSys failure in the dashboard, this time on the solaris-amd64-smartosbuildlet builder, and I didn't want to open yet another issue for the same test.)

@mknyszek (Contributor) commented:
@bcmills no progress since the last post, but I'm looking at this again now and getting a little further (I think).

Will post back here when I have more info.

@mknyszek (Contributor) commented:
Following the trail of things that seemed strange to me, I arrived at https://go.googlesource.com/go/+/refs/tags/go1.13beta1/src/runtime/mgc.go#492.

Basically, there would be a window during which scanWorkExpected, a value based on the assumption that only half of all scannable objects will actually survive, would be less than gcController.scanWork, meaning more work had already been done than expected. Note that every object allocated black counts toward the scan work, and that's where most of the scan work is coming from in this test. If in this window a GC assist came in and was able to steal even a little bit of credit, it would get rounded up to the minimum amount to steal (a minimum exists to amortize the cost of assists), and the work it stole would then be worth a lot, since gcController.assistBytesPerWork would be high. And so this is where the "accumulates a bunch of credit" line in my May 17 post comes from.
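To make the arithmetic concrete, here's a rough model of the pacing math as I understand it. The variable names mirror the runtime's, but the clamp value and all the numbers are illustrative, not taken from a real run:

// Hypothetical mid-cycle state, loosely matching the test: a 5 MiB
// soft goal with 2 MiB allocated so far, leaving ~3 MiB of runway.
heapGoal := int64(5 << 20)
heapLive := int64(2 << 20)
heapDistance := heapGoal - heapLive // ~3 MiB

// Allocate-black accounting has already pushed the scan work done
// past the steady-state estimate, so the remaining-work estimate
// goes non-positive and gets clamped to a small floor.
scanWorkExpected := int64(500000) // half the scannable heap, per GOGC=100
scanWorkDone := int64(700000)     // inflated by allocate-black
scanWorkRemaining := scanWorkExpected - scanWorkDone
if scanWorkRemaining < 1000 {
	scanWorkRemaining = 1000
}

// With a tiny denominator, each unit of stolen scan-work credit is
// worth ~3 KiB of allocation, so stealing ~10k units of background
// credit licenses ~30 MiB of allocation before the next assist.
assistBytesPerWork := float64(heapDistance) / float64(scanWorkRemaining)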

@aclements gave me a bunch of background on why the code at mgc.go#492 works the way it does and offered a suggestion: if you already know you've done more scan work than you would expect, you should already pace yourself against the "hard" GC goal, since you've already violated the "steady-state" 50% heap-live assumption.
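In code, that suggestion amounts to roughly the following (a paraphrased sketch, not the literal change; see the CL below for the real one. The accessor names are stand-ins for the runtime's internal state, and the 1.1 factor is the usual 10% hard-goal overshoot allowance):

live := heapLive()   // stand-in for memstats.heap_live
softGoal := nextGC() // stand-in for memstats.next_gc
heapGoal := softGoal

// Steady-state estimate: with GOGC=100, expect only half of the
// scannable heap to be live.
scanWorkExpected := int64(float64(heapScan()) * 100 / float64(100+gogc))

if live > softGoal || scanWorkDone() > scanWorkExpected {
	// Past the soft goal, or already did more scan work than the
	// steady-state estimate predicted: pace against the hard goal
	// and assume all remaining scannable heap is live.
	heapGoal = int64(float64(softGoal) * 1.1)
	scanWorkExpected = int64(heapScan())
}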

I modified the code to do exactly this and am running the test in a loop. It's already been about 10 minutes without a failure (compare to usually getting a failure within 30 seconds). I'm going to keep running this in the background and if it's been a few hours without a failure I'm going to call this problem solved.

This code change shouldn't affect most applications (most aren't just allocating in a loop and dropping pointers), but I'll run some benchmarks against it and see how they fare.

@gopherbot commented:
Change https://golang.org/cl/184097 mentions this issue: runtime: use hard heap goal if we've done more scan work than expected

@mknyszek (Contributor) commented Jun 27, 2019

Oops, also @bcmills I forgot to say that yes, those two issues are also reporting the same problem as the one described here AFAICT.

@mknyszek (Contributor) commented:
Yeah, OK, it's been an hour of continuous execution without a failure, so I'm calling it good. Spinning up those benchmarks I mentioned now.

@mknyszek (Contributor) commented:
Just to circle back on this: yep, there's little-to-no change in performance. It looks good overall, I think.

@mknyszek removed the NeedsInvestigation and help wanted labels Jul 8, 2019
@mknyszek (Contributor) commented Jul 8, 2019

To update: just waiting on a review at this point.

t4n6a1ka pushed a commit to t4n6a1ka/go that referenced this issue Sep 5, 2019
This change makes it so that if we're already finding ourselves in a
situation where we've done more scan work than expected in the
steady-state (that is, 50% of heap_scan for GOGC=100), then we fall back
on the hard heap goal instead of continuing to assume the expected case.

In some cases it's possible that we're already doing more scan work than
expected, and if GC assists come in just at that window where we notice
it, they might accumulate way too much assist credit, causing undue heap
growths if GOMAXPROCS=1 (since the fractional background worker isn't
guaranteed to fire). This case seems awfully specific, and that's
because it's exactly the case for TestGcSys, which has been flaky for
some time as a result.

Fixes golang#28574, golang#27636, and golang#27156.

Change-Id: I771f42bed34739dbb1b84ad82cfe247f70836031
Reviewed-on: https://go-review.googlesource.com/c/go/+/184097
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
@golang locked and limited conversation to collaborators Sep 3, 2020