cmd/compile: per-phase compiler benchmarking #16169
Comments
Yes please! I assume you can pass multiple phase:outfile pairs? How about just -phasebench=prefix, benchmarking all phases with a phase-specific suffix? I worry that, while optimizing one phase, decisions might get made that impact other phases without being noticed. In particular, frontend/backend interactions can be like this. As a concrete example, I have a CL waiting for 1.8 that removes some empty goto-label pairs introduced as a frontend convenience; that removal cuts overall compiler allocation by 1%. |
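For concreteness, the proposed flag might have looked something like this (entirely hypothetical; neither the -phasebench flag nor these output files ever existed):

```
$ go tool compile -phasebench=/tmp/types.bench types.go
$ ls /tmp
types.bench.parse  types.bench.typecheck  types.bench.ssa  ...
```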
To be clear, my intention was |
Yes. Funny coincidence, I spent this morning experimenting with adding structured statistics code to the front-end. This is an item on my TODO list. My thinking at the moment:
The hierarchy permits collecting a lot of data in a structured (and labeled) way. The code augmentation is typically a single extra line per start/stop event. A single "print" at the end of the compile dumps the output in a structured form. I'm thinking that there should be a JSON variant for dumping the data to a file so that it can be processed later (i.e., so we can collect the data across many runs/times/versions and then produce graphs). |
PS: Standard-benchmark data format should also be easily possible, but I think that format might be generated after the individual measurements (e.g., a single compile produces just one data set). It's also not clear to me how one would organize more structured data in that format. |
Certainly not opposed to a structured API. My immediate idea was to start simple with instrumentation like

As for format, one of my takeaways from @aclements' presentation about github.com/aclements/go-gg was that the benchmark format may be sufficient as long as we adequately structure the benchmark names. E.g., see the examples with names like

/cc @aclements for thoughts on data format |
One thing to plan for: some phases may be less clear or overlap in a more heavily concurrent compiler. Also, ssa phases already happen in a cycle, once per function. |
With "structured API" I didn't mean large framework. At the moment it's about 75 lines of code, with minimal code (one-liners) at measuring points in the front-end. The grouping mostly removes the need to think much about label names: Basically, introducing a new subgroup is a local operation; no need to rename other labels. For instance:
etc. A bit more code than you suggest. |
That is, each time measurement gets its own Timing object which may track more than just the time. It may carry the number of bytes processed, or a count (iterations), or other things. Slightly more than just a benchmark map that tracks labels and times. But not a lot more. |
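A rough, self-contained Go sketch of the kind of Start/Stop collector being described here — labeled events that can optionally carry a processed quantity (lines, funcs) so a rate can be derived later. Names and details are illustrative only, not the compiler's actual code:

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// event is one timed phase, optionally annotated with a processed
// quantity (e.g. lines parsed, functions compiled) so a rate can be derived.
type event struct {
	label string // e.g. "fe:parse" or "be:compilefuncs"
	start time.Time
	dur   time.Duration
	n     int64  // optional count; 0 means "no count recorded"
	unit  string // e.g. "lines", "funcs"
}

// timings collects labeled start/stop events in order.
type timings struct {
	events []event
	open   int // index of the currently running event, or -1
}

func newTimings() *timings { return &timings{open: -1} }

// Start closes the previous phase (if any) and begins a new one; label
// parts are joined with ":" so related phases share a common prefix.
func (t *timings) Start(labels ...string) {
	t.Stop()
	t.events = append(t.events, event{label: strings.Join(labels, ":"), start: time.Now()})
	t.open = len(t.events) - 1
}

// AddEvent attaches a processed quantity to the currently running phase.
func (t *timings) AddEvent(n int64, unit string) {
	if t.open >= 0 {
		t.events[t.open].n, t.events[t.open].unit = n, unit
	}
}

// Stop closes the currently running phase, if any.
func (t *timings) Stop() {
	if t.open >= 0 {
		e := &t.events[t.open]
		e.dur = time.Since(e.start)
		t.open = -1
	}
}

// Write prints one benchmark-style line per phase, with a derived rate
// when a count was recorded.
func (t *timings) Write(prefix string) {
	for _, e := range t.events {
		line := fmt.Sprintf("%s:%s 1 %d ns/op", prefix, e.label, e.dur.Nanoseconds())
		if e.n > 0 {
			line += fmt.Sprintf(" %d %s %.0f %s/s", e.n, e.unit, float64(e.n)/e.dur.Seconds(), e.unit)
		}
		fmt.Println(line)
	}
}

func main() {
	t := newTimings()
	t.Start("fe", "parse")
	time.Sleep(10 * time.Millisecond) // stand-in for real work
	t.AddEvent(10824, "lines")
	t.Start("fe", "typecheck") // implicitly stops "fe:parse"
	time.Sleep(5 * time.Millisecond)
	t.Stop()
	t.Write("BenchmarkCompile:example")
}
```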
@josharian Agreed. That's why I was thinking of using names like "typecheck" instead of "phase1" or "phase2", so that the name could remain meaningful long-term even if we end up reorganizing the specific phase boundaries. @griesemer Okay, yeah, that's a little more code (explicitly marking the begin/end of each phase rather than making it implicit by just marking transitions), but it sounds roughly isomorphic to what I was thinking. |
@mdempsky Maybe it's overkill to mark each start/end and it's good enough to just track boundaries. But it seems to me that it might be more difficult to track a specific time (say escape analysis) if some of the other phases move around. Also, it would seem more difficult to attach bytes (to compute things like source bytes parsed/time unit, etc.). On the other hand, simpler is often better. Maybe it is good enough to just track transition times. Still exploring. |
The ssa phases can all be timed. For example
I think the format needs work; it doesn't seem to conform to what I recall Austin telling me was the desired form for the benchmarking commands we already have. Right now the function name comes last because sometimes those have exciting syntax in them. |
@mdempsky, @josharian Please have a look at https://go-review.googlesource.com/24462 . After considering the feedback above I decided to abandon the structured data idea (it seemed like overkill) and go with a pretty simple approach, essentially along the lines of @mdempsky's proposal. I expect this to morph a bit over time, but it's a starting point. |
CL https://golang.org/cl/24462 mentions this issue. |
I don't know exactly what data you want to represent, but the "extended" benchmark format is pretty flexible. It's not great at hierarchical things, but it's not clear you really want that. And, of course, you'll be able to use all of the tools that know how to speak it (which isn't much yet, but the format is young). You could have a benchmark line for each (file, phase) where the phase is a property, and results for time, allocations, etc. You might just want to name each benchmark after each phase. Something like
Maybe you want the function in there. That may be too detailed, but it would be easy to write a tool that aggregated it in various ways after the fact, e.g., using https://godoc.org/github.com/aclements/go-misc/bench. |
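As an illustration only (the file name, phase name, and numbers below are made up), extended-benchmark-format lines of the shape suggested above could look roughly like this:

```
goos: darwin
file: go/types/expr.go
phase: typecheck
BenchmarkCompile 1 2843343 ns/op 152840 B/op 1204 allocs/op
```

The flatter variant would instead name each benchmark after its phase (e.g. a single BenchmarkTypecheck line per phase); either shape can be re-aggregated by file or phase after the fact.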
The pending CL now enables writing to a file and is using the benchmark data format. |
Timings is a simple data structure that collects times of labeled Start/Stop events describing timed phases, which later can be written to a file. Adjacent phases with common label prefix are automatically collected in a group together with the accumulated phase time.

Timing data can be appended to a file in benchmark data format using the new -bench flag:

$ go build -gcflags="-bench=/dev/stdout" -o /dev/null go/types
commit: devel +8847c6b Mon Aug 15 17:51:53 2016 -0700
goos: darwin
goarch: amd64
BenchmarkCompile:go/types:fe:init 1 663292 ns/op 0.07 %
BenchmarkCompile:go/types:fe:loadsys 1 1337371 ns/op 0.14 %
BenchmarkCompile:go/types:fe:parse 1 47008869 ns/op 4.91 % 10824 lines 230254 lines/s
BenchmarkCompile:go/types:fe:typecheck:top1 1 2843343 ns/op 0.30 %
BenchmarkCompile:go/types:fe:typecheck:top2 1 447457 ns/op 0.05 %
BenchmarkCompile:go/types:fe:typecheck:func 1 15119595 ns/op 1.58 % 427 funcs 28241 funcs/s
BenchmarkCompile:go/types:fe:capturevars 1 56314 ns/op 0.01 %
BenchmarkCompile:go/types:fe:inlining 1 9805767 ns/op 1.02 %
BenchmarkCompile:go/types:fe:escapes 1 53598646 ns/op 5.60 %
BenchmarkCompile:go/types:fe:xclosures 1 199302 ns/op 0.02 %
BenchmarkCompile:go/types:fe:subtotal 1 131079956 ns/op 13.70 %
BenchmarkCompile:go/types:be:compilefuncs 1 692009428 ns/op 72.33 % 427 funcs 617 funcs/s
BenchmarkCompile:go/types:be:externaldcls 1 54591 ns/op 0.01 %
BenchmarkCompile:go/types:be:dumpobj 1 133478173 ns/op 13.95 %
BenchmarkCompile:go/types:be:subtotal 1 825542192 ns/op 86.29 %
BenchmarkCompile:go/types:unaccounted 1 106101 ns/op 0.01 %
BenchmarkCompile:go/types:total 1 956728249 ns/op 100.00 %

For #16169.

Change-Id: I93265fe0cb08e47cd413608d0824c5dd35ba7899
Reviewed-on: https://go-review.googlesource.com/24462
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
This seems to be done. |
Right now we can use compilebench to measure the compiler's overall run time, and we can use -cpuprofile to measure fine-grained per-function/instruction profiles. But with the interest in continuing to evolve/rewrite individual phases of the compiler, I think it would be nice to track the compiler's performance at per-phase granularity.
Rough proposal sketch: