runtime: hot vars and cache lines #14980

josharian · 2016-03-26T21:15:44Z

Naive question. The runtime has a bunch of top-level vars, some of which are fairly hot, e.g. the writeBarrier struct (checked before every write barrier call), the debug struct (checked during every malloc for e.g. allocfreetrace), and the trace struct (to know whether tracing is enabled). Some are written a lot (writeBarrier), whereas others are read-mostly (debug, trace).

They are organized for readability and thus end up potentially scattered around the final binary. However, I wonder whether it would be better to ensure that all the hottest read-mostly variables are in a single cache line and ensure that the hottest read-write variables don't trigger false sharing.

Many of these aren't easy to move around and experiment with, because of compiler integration. So first: Any instincts about whether this is likely to matter in practice?

cc @dvyukov @aclements

randall77 · 2016-03-26T21:50:44Z

Having a few hot reads scattered about cache lines instead of all in one cache line shouldn't matter much. It will only take a tiny bit more work and cache space to cache them.

Keeping hot writes away from each other (and from other hot reads) will matter much more.

Is writeBarrier really written that much? Just twice per GC cycle, as far as I can tell.

aclements · 2016-03-26T23:18:11Z

I don't recall seeing serious contention on any globals when I ran https://godoc.org/github.com/aclements/go-perf/cmd/memlat, but that was quite a while ago and I wasn't necessarily looking. It would be easy enough to run that again. Particularly if it's run on a multi-node system, any globals with poor cacheability or false sharing should stick out as expensive remote DRAM events.

It would also be easy enough to crank up the PEBS recording rate and just write a simple tool to look for hot globals. Sort of like https://godoc.org/github.com/aclements/go-perf/cmd/memanim, but obviously looking for different things in the memory trace. With memanim, I found the hardware could easily record every single load that took over 50 cycles.

If we do find any, the cheap solution is to add padding variables around them. We already do this in a few places (grep for CacheLineSize), but I think those are all based on assumptions about hot cache lines and aren't backed up by measurements.
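For illustration, the padding approach mentioned above might look roughly like the following. This is a minimal sketch, not runtime code: `cacheLineSize`, `paddedCounter`, `counters`, and `counterStride` are hypothetical names, and the 64-byte line size is an assumption (the runtime uses its own per-architecture constant).

```go
package main

import (
	"fmt"
	"sync/atomic"
	"unsafe"
)

// cacheLineSize is an assumed value; 64 bytes is typical on x86-64,
// but other architectures differ.
const cacheLineSize = 64

// paddedCounter is a hypothetical frequently written variable.
// The trailing pad rounds the struct up to one full cache line, so
// adjacent elements of an array are always cacheLineSize bytes apart.
type paddedCounter struct {
	n atomic.Uint64           // 8 bytes
	_ [cacheLineSize - 8]byte // padding up to the assumed line size
}

// counters holds two independently written counters. Because of the
// padding, their n fields can never land on the same cache line.
// Note that this does not isolate counters[0] from whatever unrelated
// variable the linker places just before the array.
var counters [2]paddedCounter

// counterStride reports the distance in bytes between adjacent counters.
func counterStride() uintptr {
	return unsafe.Sizeof(paddedCounter{})
}

func main() {
	counters[0].n.Add(1)
	counters[1].n.Add(2)
	fmt.Println(counterStride(), counters[0].n.Load(), counters[1].n.Load())
}
```

Padding like this only removes false sharing between the two counters. Whether it helps at all depends on how often they are actually written, which is why measuring first matters.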

josharian · 2016-03-29T06:24:14Z

Thanks, Keith and Austin. I don't have a linux machine lying around now, but I should soon(ish), and I will play with this then.

dvyukov · 2016-04-03T06:51:41Z

Frequent write sharing can be very expensive and prevent scaling on higher core counts. We need to get rid of each and every case.
But note that processors don't have circuitry to distinguish between false and true sharing; they penalize both equally. So it is not about adding padding and shuffling variables, it is about eliminating frequently written variables. See the following changes for examples:
d6ed1b7
d839a80
66d5c9b
909f318
013ad89
c9152a8
86e7323
The scheduler (distributed run queues), memory allocator (MCache), and parallel GC (Workbuf, parfor) were designed around the idea of not creating heavy write sharing in the first place.
If new frequently written variables have been added since then, we need to get rid of them as well.
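As a rough user-level analogy to the distributed designs mentioned above, one way to eliminate a frequently written shared variable is to give each writer its own slot and combine the slots only when reading. This is a sketch under stated assumptions, not how the runtime does it: `shardedCounter`, `shard`, `countInParallel`, and the 64-byte `cacheLineSize` are all hypothetical, and the runtime keys its per-P state on the P rather than on a worker index.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// cacheLineSize is an assumed value (typical for x86-64).
const cacheLineSize = 64

// shard is one writer's private slot, padded so that neighboring
// shards in the slice are a full assumed cache line apart.
type shard struct {
	n atomic.Uint64
	_ [cacheLineSize - 8]byte
}

// shardedCounter replaces a single hot counter with one shard per
// worker. Writers touch only their own shard; readers sum all shards.
type shardedCounter struct {
	shards []shard
}

func newShardedCounter(workers int) *shardedCounter {
	return &shardedCounter{shards: make([]shard, workers)}
}

// add increments the shard owned by the given worker.
func (c *shardedCounter) add(worker int, delta uint64) {
	c.shards[worker].n.Add(delta)
}

// total sums every shard. It is the slow, infrequent path.
func (c *shardedCounter) total() uint64 {
	var sum uint64
	for i := range c.shards {
		sum += c.shards[i].n.Load()
	}
	return sum
}

// countInParallel starts the given number of goroutines, each adding
// perWorker increments to its own shard, and returns the final total.
func countInParallel(workers, perWorker int) uint64 {
	c := newShardedCounter(workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				c.add(w, 1)
			}
		}(w)
	}
	wg.Wait()
	return c.total()
}

func main() {
	fmt.Println(countInParallel(4, 1000)) // prints 4000
}
```

The trade-off is that reads become more expensive and memory use grows with the number of writers, which is acceptable when writes vastly outnumber reads.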

josharian added this to the Unplanned milestone Mar 26, 2016

ALTree added the NeedsInvestigation label Dec 4, 2019

gopherbot added the compiler/runtime label Jul 7, 2022

mknyszek added this to Go Compiler / Runtime Jul 7, 2022

mknyszek removed this from Go Compiler / Runtime Jul 13, 2022
