Skip to content

runtime: long GC STW after large allocation (100+ msec waiting for runtime.memclrNoHeapPointers) #35825

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
cuonghuutran opened this issue Nov 25, 2019 · 5 comments
Labels
FrozenDueToAge GarbageCollector NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@cuonghuutran
Copy link

What version of Go are you using (go version)?

$ go version
go version go1.11.4 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/cuong.tran/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/cuong.tran/go"
GOPROXY=""
GORACE=""
GOROOT="/home/cuong.tran/opt/go"
GOTMPDIR=""
GOTOOLDIR="/home/cuong.tran/opt/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build677840052=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Our code was memory-bound and cloned a large array to a very large single slice 512 MB. Here is the stack dump. (Our code is changed to clone to multiple small slices).

Thread 63 (LWP 12396):
#0 runtime.memclrNoHeapPointers () at /home/cuong.tran/opt/go/src/runtime/memclr_amd64.s:102
#1 0x0000000000424a7a in runtime.(*mheap).alloc (h=, large=, needzero=true, npage=,
spanclass=, ~r4=) at /home/cuong.tran/opt/go/src/runtime/mheap.go:340
#2 0x000000000040dd97 in runtime.largeAlloc (needzero=true, noscan=true, size=503933494, ~r3=)
at /home/cuong.tran/opt/go/src/runtime/malloc.go:1019
#3 0x0000000000458a36 in runtime.mallocgc.func1 () at /home/cuong.tran/opt/go/src/runtime/malloc.go:914
#4 0x000000000045b7c6 in runtime.systemstack () at /home/cuong.tran/opt/go/src/runtime/asm_amd64.s:351
#5 0x0000000000432590 in ?? () at /home/cuong.tran/opt/go/src/runtime/proc.go:1196

What did you expect to see?

No long GC STW due to Golang runtime. Memory clearing should clear memory in smaller chunks as long as the threads are busy and non-preemptive.

What did you see instead?

Very long GC STW (100+ msec) caused latency variance in our response time.

@ALTree
Copy link
Member

ALTree commented Nov 25, 2019

Thanks for the report.

I believe this may be basically a dup of #31222 (also see #31222 (comment)), but cc @aclements @mknyszek for confirmation.

@ALTree ALTree changed the title Long GC STW due to Golang run time (100+ msec waiting for runtime.memclrNoHeapPointers) runtime: long GC STW after large allocation (100+ msec waiting for runtime.memclrNoHeapPointers) Nov 25, 2019
@ALTree ALTree added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. GarbageCollector labels Nov 25, 2019
@cuonghuutran
Copy link
Author

Seems to me it's a different problem -- this happens when malloc'ing a very large large memory vs. sweeping. Nevertheless, the solution to make clrNoHeapPointers would work but it's not available until 1.15. Just a thought here, until then and as a band-aid, should largeAlloc() clears memory in smaller chunks and check for thread parking between clearing?

@ALTree
Copy link
Member

ALTree commented Nov 26, 2019

the solution to make clrNoHeapPointers would work but it's not available until 1.15. Just a thought here, until then and as a band-aid, should largeAlloc() clears memory in smaller chunks and check for thread parking between clearing?

We're deep into the code freeze for the 1.14 release, so it's very unlikely that changes to the runtime will be committed for 1.14 (except for changes that fix serious regressions). So it would be 1.15 anyway.

@ALTree ALTree added this to the Go1.15 milestone Nov 26, 2019
@mknyszek
Copy link
Contributor

@cuonghuutran #31222 is about a large allocation blocking STW in memclrNoHeapPointers as well, not about sweeping. The title is misleading: sweep termination is the phase which brings us into the new mark phase, which forces a STW. The problem is not that the STW is long, but rather that it's taking a long time to STW (the runtime manages to stop every thread except the one clearing this large allocation). In the execution trace, the attempt to STW is counted against pause time as well. In the other issue, a 1 GiB allocation is the root cause of the STW delay.

Clearing in smaller chunks would work at the expense of some throughput. I don't think we could get this in for 1.14 given that it's been an issue for so many releases (the freeze is mostly about fixing new issues) but as @aclements mentioned in the other issue, it dovetails nicely with the recent non-cooperative preemption work. We'll take a look at it again in 1.15.

@ALTree In sum, I believe this is a duplicate of #31222.

@ALTree
Copy link
Member

ALTree commented Nov 26, 2019

@mknyszek Thanks! Let's close this in favour of the older thread, then.

@cuonghuutran feel free to subscribe to #31222, and adding a comment there with your use case, if you think it could be useful.

@ALTree ALTree closed this as completed Nov 26, 2019
@golang golang locked and limited conversation to collaborators Nov 25, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge GarbageCollector NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

4 participants