cmd/compile: -bench should correct for GC #17434

Status: Open
Labels: compiler/runtime (Issues related to the Go compiler and/or runtime.), NeedsFix (The path to resolution is known, but the work has not been done.)

mdempsky opened this issue Oct 13, 2016 · 4 comments

@mdempsky (Member) commented:
Currently -bench output is very sensitive to GC effects. For example:

  1. Changing allocations in phase A might cause a GC cycle to shift from phase B to phase C, which can look like an improvement to B and a regression for phase C.
  2. Reducing long-lived memory pressure from earlier phases gets credited to later phases, as the later phases benefit from reduced GC costs.

This makes it hard to isolate performance improvements from frontend vs backend changes.

I'm considering a few possible improvements to -bench:

  1. Record GC pause times, and subtract them from phase times.
  2. Record allocation stats for each phase.
  3. Explicit GC cycle between FE and BE so we can measure how much live memory the FE has left for the BE to work with.

Any other suggestions and/or implementation advice?

/cc @griesemer @rsc @aclements
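
For concreteness, a minimal sketch of what option 2 (per-phase allocation stats) could look like, built on runtime.ReadMemStats deltas. The helper names here are hypothetical and this is not the actual -bench implementation in cmd/compile:

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// phaseStats captures wall time and allocation deltas for one compiler phase.
type phaseStats struct {
	wall   time.Duration
	allocs uint64 // mallocs performed during the phase
	bytes  uint64 // bytes allocated during the phase
	numGC  uint32 // GC cycles completed during the phase
}

// measurePhase runs phase and reports how much it allocated, using
// runtime.MemStats snapshots taken before and after.
func measurePhase(name string, phase func()) phaseStats {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	start := time.Now()

	phase()

	wall := time.Since(start)
	runtime.ReadMemStats(&after)
	s := phaseStats{
		wall:   wall,
		allocs: after.Mallocs - before.Mallocs,
		bytes:  after.TotalAlloc - before.TotalAlloc,
		numGC:  after.NumGC - before.NumGC,
	}
	fmt.Printf("%s: wall=%v allocs=%d bytes=%d GCs=%d\n",
		name, s.wall, s.allocs, s.bytes, s.numGC)
	return s
}

func main() {
	measurePhase("frontend", func() { /* parse, typecheck, ... */ })
	measurePhase("backend", func() { /* ssa, codegen, ... */ })
}
```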

@aclements (Member) commented:

> Record GC pause times, and subtract them from phase times.

I don't see why this would help. GC pause times are close to 0 and getting closer. It's not the pauses that are the problem, it's the CPU taken away from the compiler during the concurrent phase.

> Record allocation stats for each phase.

This seems like a good idea in general.

> Explicit GC cycle between FE and BE so we can measure how much live memory the FE has left for the BE to work with.

Adding explicit GCs between phases seems necessary if you're going to isolate the performance of the different phases. I'm not sure what you mean by "live memory the FE has left" since live memory isn't something that's left, but I don't think this is about measurement anyway. Doing an explicit GC between phases resets the pacing, so the scheduling of GCs during each phase is much closer to independent of the other phases (not exactly, since a change in the live memory remaining after an earlier phase can still affect the GC scheduling in a later phase, but you'll be much closer to independence).

@mdempsky (Member, Author) commented Oct 13, 2016:

> I don't see why this would help. GC pause times are close to 0 and getting closer. It's not the pauses that are the problem, it's the CPU taken away from the compiler during the concurrent phase.

I see. I said "GC pause times" just because that's the only time duration I could see in runtime.MemStats or runtime/debug.GCStats, and I naively assumed it somehow represented total GC overhead. I guess it actually means only STW time?

Is there a good way to measure CPU cost from the concurrent phase? Also, currently I think we only measure per-phase wallclock time. I wonder if we need to measure per-phase CPU-seconds instead, since the GC is concurrent (and possibly the compiler itself will be too, in the future).

> I'm not sure what you mean by "live memory the FE has left" since live memory isn't something that's left, but I don't think this is about measurement anyway.

I meant (for example) to make an explicit runtime.GC() call at the end of the frontend phases and record the runtime.MemStats.Heap{Alloc,Objects} values. The hypothesis being that 1) they represent how much data the FE has allocated that will continue to remain live throughout the BE phases, and 2) improving those numbers should reduce the amount of GC work necessary during the BE phases. Is that sound, or is my model of GC effects too naive?
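
A minimal sketch of that measurement, assuming hypothetical runFrontend/runBackend hooks rather than the compiler's real phase structure: force a full collection at the FE/BE boundary, then read HeapAlloc and HeapObjects, which at that point reflect only live data.

```go
package main

import (
	"fmt"
	"runtime"
)

// reportRetainedHeap forces a full GC so that HeapAlloc/HeapObjects reflect
// only live data, then prints them; after the frontend this approximates the
// retained set the backend will have to work against.
func reportRetainedHeap(label string) {
	runtime.GC() // blocks until a full collection completes
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	fmt.Printf("%s: HeapAlloc=%d bytes, HeapObjects=%d\n",
		label, ms.HeapAlloc, ms.HeapObjects)
}

func main() {
	runFrontend() // hypothetical stand-in for parsing and typechecking
	reportRetainedHeap("after frontend")
	runBackend() // hypothetical stand-in for SSA and code generation
}

func runFrontend() {}
func runBackend()  {}
```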

@aclements (Member) commented:

> I see. I said "GC pause times" just because that's the only time duration I could see in runtime.MemStats or runtime/debug.GCStats, and I naively assumed it somehow represented total GC overhead. I guess it actually means only STW time?

Right. The only thing in MemStats that accounts for concurrent GC time is GCCPUFraction, but I don't think that would help here.

> Also, currently I think we only measure per-phase wallclock time. I wonder if we need to measure per-phase CPU-seconds instead, since the GC is concurrent (and possibly the compiler itself will be too, in the future).

I'm not so sure. What people generally care about when they're running the compiler is how long it took, not how many CPU-seconds it took.

> I meant (for example) to make an explicit runtime.GC() call at the end of the frontend phases and record the runtime.MemStats.Heap{Alloc,Objects} values. The hypothesis being that 1) they represent how much data the FE has allocated that will continue to remain live throughout the BE phases, and 2) improving those numbers should reduce the amount of GC work necessary during the BE phases. Is that sound, or is my model of GC effects too naive?

I think that's a good thing to measure; however, the effect is somewhat secondary to just how many allocations the FE does. To a first order, if the FE retained set doubles, each GC during the BE will cost twice as much but will happen half as often, so the total cost doesn't change. (It does matter to a second order, since longer GCs are less efficient GCs because of write barrier overheads and floating garbage.)

However, my point about running a GC between phases just to reset the GC pacing still stands. Imagine the GC runs exactly 1 second, 2 seconds, etc. after the process starts; if you change the time some phase takes, all of the later phases will line up with the GC ticks differently, causing fluctuations in measured performance. runtime.GC() lets you reset the clock, so if you do it at the beginning of each compiler phase, only that phase's "timing" will matter for its own measurement. The GC actually runs in logical "heap time", but the analogy is quite close.

@quentinmit (Contributor) commented:

It seems like -bench should turn on an explicit GC at the end of each phase, counted against that phase's timing.
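
A hedged sketch of that suggestion (illustrative helper, not the real cmd/compile timing code): run the explicit GC inside the timed region, so its cost is billed to the phase that produced the garbage and pacing is reset before the next phase starts.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// timePhase times phase and finishes with an explicit GC, so the collection
// is counted against this phase and GC pacing is reset for the next one.
func timePhase(name string, phase func()) {
	start := time.Now()
	phase()
	runtime.GC()
	fmt.Printf("%s: %v (including explicit GC)\n", name, time.Since(start))
}

func main() {
	timePhase("frontend", func() { /* ... */ })
	timePhase("backend", func() { /* ... */ })
}
```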

@quentinmit added the NeedsFix label Oct 17, 2016
@quentinmit added this to the Go1.8Maybe milestone Oct 17, 2016
@rsc modified the milestones: Go1.9, Go1.8Maybe Oct 20, 2016
@josharian modified the milestones: Go1.10, Go1.9 May 11, 2017
@bradfitz modified the milestones: Go1.10, Go1.11 Nov 29, 2017
@gopherbot modified the milestones: Go1.11, Unplanned May 23, 2018
@gopherbot added the compiler/runtime label Jul 13, 2022