Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: sloppy struct field arrangement could be re-arranged for more compact structs and save RAM #42412

Closed
odeke-em opened this issue Nov 6, 2020 · 9 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@odeke-em
Copy link
Member

odeke-em commented Nov 6, 2020

Coming here from tip ecc3f51, with one of the static analysis tools, we have developed at Orijtech, Inc., we've found a bunch of fields in the Go standard library that could consume less space
for example, given

go/src/runtime/trace.go

Lines 760 to 769 in ecc3f51

// traceStack is a single stack in traceStackTable.
type traceStack struct {
link traceStackPtr
hash uintptr
id uint32
n int
stk [0]uintptr // real type [n]uintptr
}
type traceStackPtr uintptr

the size on 64-bit machines is 40 bytes, but it could really be 32 bytes by re-arranging it to

type traceStack struct {
	stk  [0]uintptr
	link traceStackPtr
	hash uintptr
	n    int
	id   uint32
}

and here is the exhibit https://play.golang.org/p/bG4u6hqXZvv or inlined below

package main

import "unsafe"

type traceStack1 struct {
	link traceStackPtr
	hash uintptr
	id   uint32
	n    int
	stk  [0]uintptr // real type [n]uintptr
}

type traceStackPtr uintptr

type traceStack2 struct {
	stk  [0]uintptr
	link traceStackPtr
	hash uintptr
	n    int
	id   uint32
}

func main() {
	println(unsafe.Sizeof(traceStack1{}))
	println(unsafe.Sizeof(traceStack2{}))
}

which when run shows

40
32

which is a 20% reduction in RAM that'll be used in traceStack

There are about 8 reported optimal changes like

/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/mfixalloc.go:27:15: struct has size 72 (size class 80), could be 64 (size class 64), rearrange to struct{size uintptr; first func(arg unsafe.Pointer, p unsafe.Pointer); arg unsafe.Pointer; list *mlink; chunk uintptr; inuse uintptr; stat *sysMemStat; nchunk uint32; zero bool} for optimal size (20.00% savings)
/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/mgc.go:1009:10: struct has size 400 (size class 416), could be 376 (size class 384), rearrange to struct{wbufSpans struct{lock mutex; free mSpanList; busy mSpanList}; assistQueue struct{lock mutex; q gQueue}; sweepWaiters struct{lock mutex; list gList}; heap2 uint64; heap1 uint64; bytesMarked uint64; heap0 uint64; pauseStart int64; pauseNS int64; tstart int64; tEnd int64; nFlushCacheRoots int; empty lfstack; nBSSRoots int; nSpanRoots int; nStackRoots int; tMarkTerm int64; tMark int64; bgMarkReady note; full lfstack; mode gcMode; tSweepTerm int64; totaltime int64; initialHeapLive uint64; nDataRoots int; heapGoal uint64; cycles uint32; stwprocs int32; maxprocs int32; _ uint32; markDoneSema uint32; startSema uint32; nwait uint32; nproc uint32; markrootJobs uint32; markrootNext uint32; bgMarkDone uint32; pad0 cpu.CacheLinePad; userForced bool} for optimal size (7.69% savings)
/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/mgcscavenge.go:161:14: struct has size 40 (size class 48), could be 32 (size class 32), rearrange to struct{lock mutex; g *g; timer *timer; sysmonWake uint32; parked bool} for optimal size (33.33% savings)
/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/proc.go:1977:17: struct has size 40 (size class 48), could be 32 (size class 32), rearrange to struct{lock mutex; newm muintptr; wake note; haveTemplateThread uint32; waiting bool} for optimal size (33.33% savings)
/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/proc.go:5159:17: struct has size 32 (size class 32), could be 24 (size class 24), rearrange to struct{schedwhen int64; syscallwhen int64; schedtick uint32; syscalltick uint32} for optimal size (25.00% savings)
/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/runtime2.go:753:10: struct has size 32 (size class 32), could be 24 (size class 24), rearrange to struct{runnable gQueue; n int32; user bool} for optimal size (25.00% savings)
/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/trace.go:761:17: struct has size 40 (size class 48), could be 32 (size class 32), rearrange to struct{stk [0]uintptr; link traceStackPtr; hash uintptr; n int; id uint32} for optimal size (33.33% savings)
/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/runtime_test.go:265:10: struct has size 16 (size class 16), could be 8 (size class 8), rearrange to struct{z struct{}; n int64} for optimal size (50.00% savings)

We plan on releasing the tool as open source code with a permissive license, and it is already an x/tools/go/analysis/pass that can be plugged into IDEs or into cmd/vet if needed later on.

Cheers to @cuonglm our core developer at Orijtech, whom we tasked with writing it!

EDIT: now accounts for size classes!

@davecheney
Copy link
Contributor

How does this interact with size classes? I’m sure there is a 32byte size class, but if 40 bytes is rounded up to 48 the saving could be even larger.

@odeke-em
Copy link
Member Author

odeke-em commented Nov 6, 2020

Great question @davecheney, and thank you!
@josharian also independently raised this to me offline. We don't yet have that baked in and
hadn't thought about, but I've filed an issue internal issue after @josharian's same question :)

@odeke-em
Copy link
Member Author

odeke-em commented Nov 6, 2020

Thanks @davecheney @josharian, we now account for size classes and also show the percentage savings too #GEICOLizardSavings for example now

/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/mfixalloc.go:27:15: struct has size 72 (size class 80), could be 64 (size class 64), rearrange to struct{size uintptr; first func(arg unsafe.Pointer, p unsafe.Pointer); arg unsafe.Pointer; list *mlink; chunk uintptr; inuse uintptr; stat *sysMemStat; nchunk uint32; zero bool} for optimal size (20.00% savings)
/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/mgc.go:1009:10: struct has size 400 (size class 416), could be 376 (size class 384), rearrange to struct{wbufSpans struct{lock mutex; free mSpanList; busy mSpanList}; assistQueue struct{lock mutex; q gQueue}; sweepWaiters struct{lock mutex; list gList}; heap2 uint64; heap1 uint64; bytesMarked uint64; heap0 uint64; pauseStart int64; pauseNS int64; tstart int64; tEnd int64; nFlushCacheRoots int; empty lfstack; nBSSRoots int; nSpanRoots int; nStackRoots int; tMarkTerm int64; tMark int64; bgMarkReady note; full lfstack; mode gcMode; tSweepTerm int64; totaltime int64; initialHeapLive uint64; nDataRoots int; heapGoal uint64; cycles uint32; stwprocs int32; maxprocs int32; _ uint32; markDoneSema uint32; startSema uint32; nwait uint32; nproc uint32; markrootJobs uint32; markrootNext uint32; bgMarkDone uint32; pad0 cpu.CacheLinePad; userForced bool} for optimal size (7.69% savings)
/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/mgcscavenge.go:161:14: struct has size 40 (size class 48), could be 32 (size class 32), rearrange to struct{lock mutex; g *g; timer *timer; sysmonWake uint32; parked bool} for optimal size (33.33% savings)
/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/proc.go:1977:17: struct has size 40 (size class 48), could be 32 (size class 32), rearrange to struct{lock mutex; newm muintptr; wake note; haveTemplateThread uint32; waiting bool} for optimal size (33.33% savings)
/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/proc.go:5159:17: struct has size 32 (size class 32), could be 24 (size class 24), rearrange to struct{schedwhen int64; syscallwhen int64; schedtick uint32; syscalltick uint32} for optimal size (25.00% savings)
/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/runtime2.go:753:10: struct has size 32 (size class 32), could be 24 (size class 24), rearrange to struct{runnable gQueue; n int32; user bool} for optimal size (25.00% savings)
/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/trace.go:761:17: struct has size 40 (size class 48), could be 32 (size class 32), rearrange to struct{stk [0]uintptr; link traceStackPtr; hash uintptr; n int; id uint32} for optimal size (33.33% savings)
/Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/runtime_test.go:265:10: struct has size 16 (size class 16), could be 8 (size class 8), rearrange to struct{z struct{}; n int64} for optimal size (50.00% savings)

@martisch
Copy link
Contributor

martisch commented Nov 6, 2020

Note that there are other reasons why a different struct layout then the most packed may be wanted:

  • alignment of some fields for more performant access might be better not packing everything together
  • fields that are accessed together should be in the same cache line
  • frequently accessed fields at the beginning so the rest of the struct stays cold in cache
  • adding a frequently accessed field (e.g. len) of a map at the beginning so the instruction constructing the pointer to the field is short which affects binary size
  • ordering pointers at the beginning of the struct so the garbage collector only needs to scan the beginning of the struct (this is likely not incompatible with packing but should be taken into account)
  • ordered so that a specific field is 64bit aligned on 32bit and 64bit architectures

@BenLubar
Copy link

BenLubar commented Nov 6, 2020

If the [0]uintptr field is moved from the end of the struct, it would need to be statically sized rather than dynamically sized.

@toothrot toothrot added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Nov 6, 2020
@toothrot toothrot added this to the Backlog milestone Nov 6, 2020
@toothrot
Copy link
Contributor

toothrot commented Nov 6, 2020

@mknyszek
Copy link
Contributor

mknyszek commented Nov 6, 2020

In general, we've already analyzed this tradeoff for a lot of the popular runtime structures (e.g. mspan), though I'm sure there are some we missed. Patches are welcome of course to fix those up; I don't think we recently ran an automated tool against the source.

Broadly-speaking, I do think a lot of the big low-hanging fruit has already been found, just based on my own familiarity with the runtime and my discussions with other people who work on the runtime. For instance, we already track the sizes of some of these structures explicitly in tests to avoid accidental growth (e.g. the g struct). @martisch gave a very good list of reasons that we explicitly don't do this. For a real-world example of one of these, the first few fields g struct I think might not be optimally ordered, but these fields are accessed quite often so we'd like to keep them together, in the same cache line ideally.

As @BenLubar notes in this particular case we can't actually move the [0]uinptr because it's one of those borrowed-from-C variable-sized-struct tricks. The size of this type isn't actually 40 bytes. A smaller "header" on this type is still probably useful, though I'm not even sure if this type gets heap allocated. Someone would need to take a closer look, but feel free to send a CL and we can figure it out.

@rsc
Copy link
Contributor

rsc commented Nov 6, 2020

Rearranging fields for struct packing is like writing code in assembly - there may be a performance win but it also makes code harder to maintain, and not all struct can be reordered. The cost here is only worth paying if there is a benefit. So just like an assembly function needs to be a CPU time hot spot, careful struct packing needs to be a memory hotspot - a struct there are many of in a typical program, so that it is a significant source of memory usage.

Looking at the ones you identified:

  • src/runtime/mfixalloc.go:27 (type fixalloc) - Only 5 of these in the entire runtime.
  • src/runtime/mgc.go:1009 (var work) - Only 1 of these (it's a var not a type).
  • src/runtime/mgcscavenge.go:161 (var scavenge) - Only 1.
  • src/runtime/proc.go:1977 (var newmHandoff) - Only 1.
  • src/runtime/proc.go:5159 (type sysmontick) - One per p, so size classes don't matter, and the p is big anyway and there are relatively few.
  • src/runtime/runtime2.go:753 (struct inside type schedt) - Only 1 schedt in program, and this is a struct inside a struct, so size class doesn't directly matter.
  • src/runtime/trace.go:761 (type traceStack) - As noted, this is not allocated using the usual allocator and can't be rearranged.
  • src/runtime/runtime_test.go:265 (type T2 inside TestTrailingZero) - This is in a test.

I would strongly suggest trying to combine your tool with memory profiling results, so that it can focus on the cases that matter most.

@rsc
Copy link
Contributor

rsc commented Nov 7, 2020

Given that none of these would be a significant savings, closing this issue.

@rsc rsc closed this as completed Nov 7, 2020
@golang golang locked and limited conversation to collaborators Nov 7, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

8 participants