
runtime: RSS seems to have increased in Go 1.24 while the runtime accounting has not #72991

Open
thepudds opened this issue Mar 21, 2025 · 13 comments

@thepudds
Contributor

thepudds commented Mar 21, 2025

Go version

go version go1.24.1 linux/amd64

Output of go env in your module/workspace:

AR='ar'      
CC='gcc'         
CGO_CFLAGS='-O2 -g'                                                   
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_ENABLED='1'
CGO_FFLAGS='-O2 -g'           
CGO_LDFLAGS='-O2 -g'
CXX='g++'                                                             
GCCGO='gccgo'                                                         
GO111MODULE=''          
GOAMD64='v1'       
GOARCH='amd64'                                                        
GOAUTH='netrc'
GOBIN=''          
GOCACHE='/home/.../.cache/go-build'                    
GOCACHEPROG=''
GODEBUG=''          
GOENV='/home/.../.config/go/env'
GOEXE=''               
GOEXPERIMENT=''  
GOFIPS140='off'                                                                                                                             
GOFLAGS=''                                                                                                                                  
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1182549097=/tmp/go-build -gno-record-gcc-switches'                                                                                                                               
GOHOSTARCH='amd64'                                                                                                                          
GOHOSTOS='linux'                                                                                                                            
GOINSECURE=''
GOMOD='/dev/null'
GOMODCACHE='/home/.../go/pkg/mod'                            
GONOPROXY=''   
GONOSUMDB=''         
GOOS='linux'   
GOPATH='/home/.../go'
GOPRIVATE=''        
GOPROXY='https://proxy.golang.org,direct'                             
GOROOT='/home/.../sdk/go1.24.1'                              
GOSUMDB='sum.golang.org'
GOTELEMETRY='local'
GOTELEMETRYDIR='/home/.../.config/go/telemetry'              
GOTMPDIR=''   
GOTOOLCHAIN='auto'
GOTOOLDIR='/home/.../sdk/go1.24.1/pkg/tool/linux_amd64'
GOVCS=''      
GOVERSION='go1.24.1'
GOWORK=''                                                             
PKG_CONFIG='pkg-config'

What did you do?

I saw @felixge report in Gopher Slack:

> After upgrading to go1.24, we have one service that saw its RSS memory usage increase from 1 GiB to 1.2 GiB.

Interestingly, the Go-reported accounting did not seem to change.

I have a simple GC benchmarking utility that has been useful in the past for illustrating some Go runtime problems (or for showing that something is just expected behavior and ruling out misbehavior by the runtime).

I made a quick-and-dirty edit to it so that it runs through various permutations of "interesting" memory properties that might be related to the issue @felixge reported, and then tested it via a few nested bash loops that cycled through different flag combinations.

Sample run:

$ go install golang.org/dl/go1.23.7@latest && go1.23.7 download
$ go install golang.org/dl/go1.24.1@latest && go1.24.1 download

$ go1.23.7 run github.com/thepudds/heapbench@669420526 -livekind=nodechannel -livesize=large -baseheap=512 -garbagerate=128 -leakrate=0 -jobrate=100 -worktime=20ms -stats=2s

$ go1.24.1 run github.com/thepudds/heapbench@669420526 -livekind=nodechannel -livesize=large -baseheap=512 -garbagerate=128 -leakrate=0 -jobrate=100 -worktime=20ms -stats=2s

Note this is on an Ubuntu 22.04.5 LTS VM. (Given this seemingly hasn't been reported yet, I wonder if it might depend on the distro or an OS config.)

What did you see happen?

                               RSS (MiB)
                               mean    stdev   count         
nodechannel-large   go1.23    688.8     89.3    44.0
nodechannel-large   go1.24   1263.9     81.0    44.0
-----------------------------------------------------
nodechannel-medium  go1.23    899.5    64.0     44.0
nodechannel-medium  go1.24    889.1    79.7     44.0
-----------------------------------------------------
nodechannel-small   go1.23   2945.7    283.3    43.0
nodechannel-small   go1.24   2955.0    300.7    43.0
-----------------------------------------------------
intchannel-large    go1.23    698.2    71.3     44.0
intchannel-large    go1.24    697.5    76.0     44.0
-----------------------------------------------------
intchannel-medium   go1.23    868.4    56.3     44.0
intchannel-medium   go1.24    867.0    56.4     44.0
-----------------------------------------------------
intchannel-small    go1.23   2923.1   247.2     43.0
intchannel-small    go1.24   2926.6   280.6     43.0
-----------------------------------------------------

Here, we can see nodechannel-large has a much higher RSS in go1.24 compared to go1.23, whereas the other experiments above have basically the same RSS between go1.24 and go1.23.

  • The "large" suffix represents individual allocations should in theory be over 32KiB at a time (so beyond the allocator size classes, and what the allocator calls a large object).
  • The "medium" and "small" are below 32KiB.
  • The "nodechannel" prefix represents allocations of chan node, where the node type contains pointers.
  • The "intchannel" prefix represents allocations of chan int64, where int64 of course does not have pointers.

The node type:

type node struct {
	pointers [3]*node
	next     *node
	_        [4]int
}
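
For concreteness, here is a minimal sketch of what those allocation shapes might look like (a hypothetical illustration only; the actual heapbench code differs, and the buffer sizes below are assumptions chosen just to cross the 32 KiB large-object threshold):

package main

// node is 64 bytes, so a buffered chan node with enough elements gets a
// backing array that is a single "large" (>32 KiB) object containing
// pointers, which the GC must scan.
type node struct {
	pointers [3]*node
	next     *node
	_        [4]int
}

func main() {
	// "nodechannel-large": roughly 1024 * 64 B = 64 KiB of buffer in one
	// allocation; a large object with pointers.
	_ = make(chan node, 1024)

	// "intchannel-large": roughly 8192 * 8 B = 64 KiB of buffer in one
	// allocation; a large object with no pointers (noscan).
	_ = make(chan int64, 8192)
}

Anything over 32 KiB bypasses the size-class allocator and is handled as a dedicated large object, which is the case the RSS difference appears to single out (and only when the element type contains pointers).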

I ran a git bisect, which pointed to https://go.dev/cl/614257. That CL is part of the big mallocgc refactor, so it is at least a plausible culprit:

8730fcf88531152c42de9ff1e80d9b3c762d9944 is the first bad commit
Date:   Wed Sep 18 21:17:03 2024 +0000

runtime: refactor mallocgc into several independent codepaths

The measured RSS results from the bisect:

                                RSS (MiB)
                                  mean
20250131-085043-37f27fbecd.out: 1209.3
20241023-151814-3cb0c039e9.out: 1217.0
20240903-222652-f90f7e90b3.out:  650.7
20240925-235600-80143607f0.out:  636.4
20241007-191201-7e2487cf65.out:  633.2
20241021-133902-0492d936c5.out:  636.5
20241021-184247-971ab11ee2.out: 1211.1
20241021-155625-8730fcf885.out: 1209.3
20241021-151809-e3d372aea8.out:  644.1
20241021-154815-8df6413e11.out:  669.9
20241021-154820-6686edc0e7.out:  640.4
20241021-155537-60ee99cf5d.out:  639.3

What did you expect to see?

Ideally, similar RSS values.

CC @mknyszek, @prattmic

gopherbot added the compiler/runtime label Mar 21, 2025
@thepudds
Contributor Author

Note that @felixge might of course have been observing some other phenomenon, but this seemed worth reporting given that it appears to be reproducible.

Also, note that the RSS difference does not seem to be limited to channels:


                               RSS (MiB)
                              mean    stdev  count
nodemap-large    go1.23   1066.6    212.6   44.0
nodemap-large    go1.24   2151.8    214.5   44.0
------------------------------------------------
intmap-large     go1.23   1224.6    230.3   44.0
intmap-large     go1.24   1225.9    239.4   44.0
------------------------------------------------
nodeslice-large  go1.23    628.0     91.9   44.0
nodeslice-large  go1.24   1265.1     82.3   44.0
------------------------------------------------
intslice-large   go1.23    627.0     77.6   44.0
intslice-large   go1.24    629.5     69.8   44.0
------------------------------------------------

The naming conventions there are similar to those explained above.

I have more data, but maybe that's enough of a thread to tug on.

Finally, this does seem to be reproducible (e.g., see the go run commands above). There's a non-zero chance I made a mistake in my quick-and-dirty edit to thepudds/heapbench, so my description of when it does and does not happen could be off, but as long as the behavior is reproducible, hopefully we can tease it apart even if my description is imperfect.

gabyhelp added the BugReport label Mar 21, 2025
@mknyszek
Contributor

One thing to be careful of is that faster allocation can naturally lead to more memory use if you're explicitly trying to allocate as fast as possible in a benchmark. The GC did not get faster in these changes, so more memory gets allocated black (i.e., marked live for the current GC cycle), and you end up with more floating garbage.

If memory allocation is not your bottleneck, which is the case in most programs, there's no increase in memory and it's a pure resource win. In general, I'm skeptical of reported memory increases that also come with a larger heap size (which indicates more floating garbage).

I'll try to reproduce and check these numbers with GODEBUG=gctrace=1. IIRC, @felixge's issue explicitly does not involve more floating garbage, which makes it more likely to be a real issue (specifically, something like me accidentally adding a new touch of non-needzero memory in my malloc refactoring changes).

mknyszek self-assigned this Mar 21, 2025
mknyszek added the NeedsInvestigation label and removed the BugReport label Mar 21, 2025
@mknyszek
Contributor

Hm, I can't seem to reproduce. For nodechannel-large I see ~1.2 GiB in Go 1.23 and Go 1.24.

Can you run with GODEBUG=gctrace=1?

@thepudds
Contributor Author

thepudds commented Mar 21, 2025

One other bit of data maybe worth posting: the heap goals reported by gctrace are a bit different in the "bad" vs. "good" cases, but at first blush they seem reasonably close.

Commit 60ee99c

For example, if we look at the gctrace from running against commit 60ee99c, a "good" commit from 2024-10-21 (20241021-155537-60ee99cf5d.out):

gc 15 @33.958s 0%: 0.14+108+0.033 ms clock, 0.56+1.1/104/36+0.13 ms cpu, 1223->1241->640 MB, 1254 MB goal, 0 MB stacks, 0 MB globals, 4 P
gc 16 @38.472s 0%: 0.10+92+0.035 ms clock, 0.42+54/92/0+0.14 ms cpu, 1256->1281->646 MB, 1281 MB goal, 0 MB stacks, 0 MB globals, 4 P
gc 17 @42.913s 0%: 0.12+70+0.025 ms clock, 0.50+8.8/69/64+0.10 ms cpu, 1244->1260->638 MB, 1293 MB goal, 0 MB stacks, 0 MB globals, 4 P

and the heapbench self-reported stats from the same time interval:

heapbench: @30.148s rss: 711.1 MiB, virt: 2944.7, cpu: 194.1%
heapbench: @32.152s rss: 711.1 MiB, virt: 2944.7, cpu: 199.3%
heapbench: @34.147s rss: 713.9 MiB, virt: 2944.9, cpu: 224.4%
heapbench: @36.148s rss: 713.9 MiB, virt: 2944.9, cpu: 209.5%
heapbench: @38.148s rss: 713.9 MiB, virt: 2944.9, cpu: 217.9%
heapbench: @40.150s rss: 755.4 MiB, virt: 3009.5, cpu: 225.7%
heapbench: @42.148s rss: 755.4 MiB, virt: 3009.5, cpu: 185.5%

Commit 8730fcf

We can compare that to commit 8730fcf, the first "bad" indicated by bisect and also from 2024-10-21 (20241021-155625-8730fcf885.out):

gc 17 @31.876s 0%: 0.069+61+0.039 ms clock, 0.27+7.0/61/88+0.15 ms cpu, 1230->1239->631 MB, 1256 MB goal, 0 MB stacks, 0 MB globals, 4 P
gc 18 @36.189s 0%: 0.051+67+0.031 ms clock, 0.20+0.57/67/85+0.12 ms cpu, 1238->1247->631 MB, 1263 MB goal, 0 MB stacks, 0 MB globals, 4 P
gc 19 @40.845s 0%: 0.11+70+0.014 ms clock, 0.44+2.2/70/81+0.059 ms cpu, 1238->1245->629 MB, 1263 MB goal, 0 MB stacks, 0 MB globals, 4 P

and the corresponding heapbench self-reported stats:

heapbench: @30.407s rss: 1276.3 MiB, virt: 2944.9, cpu: 205.7%
heapbench: @32.408s rss: 1276.8 MiB, virt: 2945.0, cpu: 219.8%
heapbench: @34.407s rss: 1276.8 MiB, virt: 2945.0, cpu: 187.2%
heapbench: @36.421s rss: 1279.1 MiB, virt: 2945.0, cpu: 247.8%
heapbench: @38.408s rss: 1279.1 MiB, virt: 2945.0, cpu: 180.1%
heapbench: @40.407s rss: 1279.1 MiB, virt: 2945.0, cpu: 188.6%
heapbench: @42.408s rss: 1279.1 MiB, virt: 2945.0, cpu: 216.4%
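
As an aside, for readers following along (this is background on how to read the goals, not data from the runs): with the default GOGC=100, the runtime's heap goal is roughly

    heap goal ≈ live heap + (live heap + GC roots) * GOGC/100

With a live heap around 630-645 MB and the traces reporting 0 MB of stack and global roots, a goal of roughly 1254-1293 MB is about what that formula predicts in both cases, which suggests the two builds carry a similar amount of floating garbage and the RSS difference comes from somewhere else.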

@thepudds
Contributor Author

thepudds commented Mar 21, 2025

> Hm, I can't seem to reproduce. For nodechannel-large I see ~1.2 GiB in Go 1.23 and Go 1.24.

Hi @mknyszek, I'm curious what distro and version?

> Can you run with GODEBUG=gctrace=1?

I have that as well, including recorded during the bisect. Just posted an example above.

@mknyszek
Contributor

mknyszek commented Mar 21, 2025

Thanks @thepudds. I think it's safe to say you reproduced the issue.

I think I found the culprit behind @felixge's original issue and the issue you're seeing here. Try out https://go.dev/cl/659956.

mknyszek moved this from Todo to In Progress in Go Compiler / Runtime Mar 21, 2025
mknyszek added this to the Go1.25 milestone Mar 21, 2025
@gopherbot
Contributor

Change https://go.dev/cl/659956 mentions this issue: runtime: prevent unnecessary zeroing of large !noscan objects
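
As background on why redundant zeroing shows up as higher RSS without any change in Go's own accounting, here is a small standalone demonstration of the underlying mechanism (a hedged sketch only, not the runtime's code; it assumes Linux and uses an anonymous mmap rather than the Go heap):

package main

import (
	"fmt"
	"syscall"
)

// maxRSSMiB reports the process's peak resident set size. On Linux,
// Rusage.Maxrss is in KiB.
func maxRSSMiB() int64 {
	var ru syscall.Rusage
	if err := syscall.Getrusage(syscall.RUSAGE_SELF, &ru); err != nil {
		panic(err)
	}
	return ru.Maxrss >> 10
}

func main() {
	const size = 256 << 20 // 256 MiB

	// Freshly mapped anonymous memory is already zero and costs almost no
	// physical memory until it is written.
	mem, err := syscall.Mmap(-1, 0, size,
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_ANON|syscall.MAP_PRIVATE)
	if err != nil {
		panic(err)
	}
	fmt.Printf("untouched mapping:       max RSS ~ %d MiB\n", maxRSSMiB())

	// "Unnecessary zeroing": the mapping is already zero, but writing zeros
	// anyway faults in and dirties every page, so the kernel must back the
	// whole range with physical memory.
	for i := range mem {
		mem[i] = 0
	}
	fmt.Printf("after redundant zeroing: max RSS ~ %d MiB\n", maxRSSMiB())
}

That is consistent with the CL title and the earlier comment about touching non-needzero memory: zeroing memory that is already known to be zero dirties pages the program would otherwise never touch, so RSS grows even though the heap goal and Go-side heap statistics stay the same.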

@thepudds
Contributor Author

That CL hopefully addresses it! 🎉

Re-running nodechannel-large just now for go1.23, go1.24, and your fix:

                     RSS (MiB)
                  mean    stdev   count
go1.23.out:      683.8     94.5    44.0
go1.24.out:     1269.9     93.9    44.0
cl-659956.out:   706.8    102.6    44.0

The small remaining difference in the mean might just be noise. I'll re-run for longer.

@thepudds
Contributor Author

thepudds commented Mar 21, 2025

> One thing to be careful of is that faster allocation can naturally lead to more memory use, if you're explicitly trying to allocate as fast as possible in a benchmark.

FWIW, the heapbench utility isn't trying to allocate as fast as possible.

It lightly tries to look like a real application: "jobs" come in that include some CPU work (calculating hashes, decoding varints, or similar) and some allocation work, with a few dials to allow different scenarios, such as a different baseline live heap, a slowly increasing live heap, a different rate of garbage generation, and so on, roughly along the lines of the sketch below. It has some variability built in (partly because one of the early targets was some runtime scheduler-related glitches), but the variability averages out, so you can still look at averages across ~minutes and get reasonably consistent results. It's not super sophisticated, but it keeps putting the requested load on the allocator and GC while doing its work, or waits for more work. (Even if it's averaging ~2 cores of CPU usage as reported every 1 or 2 seconds, if you were to zoom in on a small enough time scale there are moments where it isn't doing anything, assuming you haven't set the dials to overload the machine it's running on.)
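
In pseudocode-ish Go, the overall shape is roughly as follows (a simplified, hypothetical sketch; the real heapbench differs, and the names, sizes, and rates here are made up):

package main

import (
	"crypto/sha256"
	"time"
)

// live is the retained baseline heap; the real tool can also grow it slowly
// to simulate a leak.
var live [][]byte

func main() {
	// Build up a baseline live heap (e.g., ~512 MiB of 64 KiB allocations).
	for i := 0; i < 8192; i++ {
		live = append(live, make([]byte, 64<<10))
	}

	// "Jobs" arrive at a fixed average rate; each does a bit of CPU work and
	// produces short-lived garbage that is dropped when the job finishes.
	ticker := time.NewTicker(10 * time.Millisecond) // ~100 jobs/sec
	defer ticker.Stop()
	for range ticker.C {
		go job()
	}
}

func job() {
	buf := make([]byte, 64<<10) // garbage: only lives for the job's duration
	deadline := time.Now().Add(20 * time.Millisecond)
	for time.Now().Before(deadline) {
		_ = sha256.Sum256(buf) // CPU work: hash some bytes
	}
}

The real tool additionally reports RSS, virtual memory, and CPU usage every couple of seconds (the heapbench: lines shown earlier).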

In any event, that's probably more than you wanted to know!

Regarding possibly having more floating garbage: if you look at the GC traces from the tests, do they seem to show roughly the same amount of floating garbage between the "good" and "bad" cases?

@mknyszek
Contributor

@thepudds Thanks! That's all good to know. I intended to get this across in an earlier message but yes, your program definitely reproduces the issue and there aren't any subtleties around floating garbage happening. The heap goal was the same before and after.

@Gandem
Contributor

Gandem commented Mar 21, 2025

👋 We tested https://go.dev/cl/659956 with the workload that had increased memory usage in 1.24, and we can confirm the CL addresses the issue! 🎉


Thank you 🙏 !

@felixge
Contributor

felixge commented Mar 22, 2025

Wow, fantastic work everybody! Thank you @thepudds for figuring out how to reproduce this issue. Thanks @mknyszek for finding the issue and fixing it. And thanks @Gandem for doing all the work on our end to analyze the situation and try out the various mitigation ideas as well as confirming that Michael's fix works for our prod issue. 🙇
