runtime: gc pause bursts after upgrading from 1.16 to 1.17 #49542

dop251 · 2021-11-12T10:29:36Z

What version of Go are you using (`go version`)?

$ go version
go version go1.17.2 linux/amd64

Does this issue reproduce with the latest release?

N/A

What operating system and processor architecture are you using (`go env`)?

go env Output

$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/dop/.cache/go-build"
GOENV="/home/dop/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/dop/pkg/mod"
GOOS="linux"
GOPATH="/home/dop/projects"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.17.2"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build1526968537=/tmp/go-build -gno-record-gcc-switches"

What did you do?

After upgrading our production servers to a build made with Go 1.17.2, we saw a radically different GC pause profile (compared to Go 1.16.6):

What's plotted here is the change in PauseTotalNs, measured every minute. The marker corresponds to the time of the rollout. Note that the bursts did not start straight away, only once the load picked up in the morning.

Zooming in showed there are "bursts" among otherwise normal sub-millisecond runs:

To confirm this is a regression, we built the same code with Go 1.16.6 and deployed it to one of the nodes. We also restarted another node at the same time, but left it running 1.17.2. Comparing the graphs for the two nodes shows an identical profile, except for the bursts, which only occur on the node running 1.17.2:

(green represents 1.16, yellow -- 1.17).

Here is what it looks like in the trace (we've managed to get a couple of examples):

The root cause is not immediately obvious to me, but I suspect there is a rare race condition which sometimes prevents STW from completing. Note that there is one event recorded during STW. Its end stack trace shows runtime.selectgo:327. However in another example there is also an event recorded shortly before the end of STW, but it just says "proc stop".

We haven't been able to create a reproducible case for it. It looks like this only happens when the heap grows to a few tens of GB, and then the frequency (but not the size) of the bursts depends on the load.

If there is any additional info required please let me know.

What did you expect to see?

Normal, sub-millisecond GC pauses.

What did you see instead?

Random bursts of up to 100ms.


cagedmantis · 2021-11-12T16:55:54Z

/cc @aclements @randall77 @mknyszek @prattmic

prattmic · 2021-11-12T17:05:09Z

@dop251 in your trace this is a forced GC (runtime.forcegchelper). Does that apply to all of the examples you've captured in the execution tracer?

dop251 · 2021-11-12T17:23:35Z

I have seen 3 examples and yes, it is runtime.forcegchelper in all of them.

prattmic · 2021-11-12T17:32:56Z

Thanks, one more question:

> However in another example there is also an event recorded shortly before the end of STW, but it just says "proc stop".

This in particular does sound like the potential problem you describe. Specifically STW has to get all Ps to stop, so each P should have a "proc stop" event before STW continues. (Note that a P may already be stopped, so "proc stop" could be before STW). If a P takes a long time to stop, then that would block everything else.

I'd love to look at the trace you've collected to see if this is the case and what else is going on, if you think that is something you can share.

dop251 · 2021-11-12T17:38:14Z

Is there a way to share it privately?

prattmic · 2021-11-12T18:00:17Z

Sure, feel free to email me mpratt AT google.com.

dop251 · 2021-11-12T18:12:29Z

Shared a Google Drive folder with you.

zhouguangyuan0718 · 2023-08-08T02:59:10Z

Hi @prattmic, I encountered a similar problem. Is there any update on this issue?

mknyszek · 2023-08-08T03:04:21Z

Hi @zhouguangyuan0718, please file a new issue with additional details (platform, how you measured pause times, which versions of Go you're running, etc.). I think this issue is unfortunately too stale to continue. I don't believe we were able to reproduce this at the time. Closing for now.

cagedmantis changed the title ~~GC pause bursts after upgrading from 1.16 to 1.17~~ runtime: gc pause bursts after upgrading from 1.16 to 1.17 Nov 12, 2021

cagedmantis added the NeedsInvestigation label Nov 12, 2021

cagedmantis added this to the Backlog milestone Nov 12, 2021

gopherbot added the compiler/runtime label Jul 7, 2022

mknyszek added this to Go Compiler / Runtime Jul 7, 2022

mknyszek moved this to Triage Backlog in Go Compiler / Runtime Jul 15, 2022

mknyszek closed this as not planned Aug 8, 2023

github-project-automation bot moved this from Triage Backlog to Done in Go Compiler / Runtime Aug 8, 2023

mknyszek removed this from Go Compiler / Runtime Oct 25, 2023

golang locked and limited conversation to collaborators Aug 7, 2024

gopherbot added the FrozenDueToAge label Aug 7, 2024
