cmd/compile: per-phase compiler benchmarking #16169
Comments
Yes please! I assume you can pass multiple phase:outfile pairs? How about just -phasebench=prefix, benchmarking all phases with a phase-specific suffix? I worry that, while optimizing one phase, decisions might get made that impact other phases without being noticed. In particular, frontend/backend interactions can be like this. As a concrete example, I have a CL waiting for 1.8 that removes some empty goto-label pairs introduced as a frontend convenience; that removal cuts overall compiler allocation by 1%. |
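For concreteness, the proposed flag might have looked something like this (entirely hypothetical; neither the -phasebench flag nor these output files ever existed):

```
$ go tool compile -phasebench=/tmp/types.bench types.go
$ ls /tmp
types.bench.parse  types.bench.typecheck  types.bench.ssa  ...
```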
To be clear, my intention was |
Yes. Funny coincidence, I spent this morning experimenting with adding structured statistics code to the front-end. This is an item on my TODO list. My thinking at the moment:
The hierarchy permits collecting a lot of data in a structured (and labeled) way. The code augmentation is typically a single extra line per start/stop event. A single "print" at the end of the compile dumps the output in a structured form. I'm thinking that there should be a JSON variant for dumping the data to a file so that it can be processed later (i.e., so we can collect the data across many runs/times/versions and then produce graphs). |
PS: Standard-benchmark data format should also be easily possible, but I think that format might be generated after the individual measurements (e.g., a single compile produces just one data set). It's also not clear to me how one would organize more structured data in that format. |
Certainly not opposed to a structured API. My immediate idea was to start simple with instrumentation like

As for format, one of my takeaways from @aclements' presentation about github.com/aclements/go-gg was that the benchmark format may be sufficient as long as we adequately structure the benchmark names. E.g., see the examples with names like

/cc @aclements for thoughts on data format |
One thing to plan for: some phases may be less clear or overlap in a more heavily concurrent compiler. Also, ssa phases already happen in a cycle, once per function. |
With "structured API" I didn't mean large framework. At the moment it's about 75 lines of code, with minimal code (one-liners) at measuring points in the front-end. The grouping mostly removes the need to think much about label names: Basically, introducing a new subgroup is a local operation; no need to rename other labels. For instance:
etc. A bit more code than you suggest. |
That is, each time measurement gets its own Timing object which may track more than just the time. It may carry the number of bytes processed, or a count (iterations), or other things. Slightly more than just a benchmark map that tracks labels and times. But not a lot more. |
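A rough, self-contained Go sketch of the kind of Start/Stop collector being described here — labeled events that can optionally carry a processed quantity (lines, funcs) so a rate can be derived later. Names and details are illustrative only, not the compiler's actual code:

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// event is one timed phase, optionally annotated with a processed
// quantity (e.g. lines parsed, functions compiled) so a rate can be derived.
type event struct {
	label string // e.g. "fe:parse" or "be:compilefuncs"
	start time.Time
	dur   time.Duration
	n     int64  // optional count; 0 means "no count recorded"
	unit  string // e.g. "lines", "funcs"
}

// timings collects labeled start/stop events in order.
type timings struct {
	events []event
	open   int // index of the currently running event, or -1
}

func newTimings() *timings { return &timings{open: -1} }

// Start closes the previous phase (if any) and begins a new one; label
// parts are joined with ":" so related phases share a common prefix.
func (t *timings) Start(labels ...string) {
	t.Stop()
	t.events = append(t.events, event{label: strings.Join(labels, ":"), start: time.Now()})
	t.open = len(t.events) - 1
}

// AddEvent attaches a processed quantity to the currently running phase.
func (t *timings) AddEvent(n int64, unit string) {
	if t.open >= 0 {
		t.events[t.open].n, t.events[t.open].unit = n, unit
	}
}

// Stop closes the currently running phase, if any.
func (t *timings) Stop() {
	if t.open >= 0 {
		e := &t.events[t.open]
		e.dur = time.Since(e.start)
		t.open = -1
	}
}

// Write prints one benchmark-style line per phase, with a derived rate
// when a count was recorded.
func (t *timings) Write(prefix string) {
	for _, e := range t.events {
		line := fmt.Sprintf("%s:%s 1 %d ns/op", prefix, e.label, e.dur.Nanoseconds())
		if e.n > 0 {
			line += fmt.Sprintf(" %d %s %.0f %s/s", e.n, e.unit, float64(e.n)/e.dur.Seconds(), e.unit)
		}
		fmt.Println(line)
	}
}

func main() {
	t := newTimings()
	t.Start("fe", "parse")
	time.Sleep(10 * time.Millisecond) // stand-in for real work
	t.AddEvent(10824, "lines")
	t.Start("fe", "typecheck") // implicitly stops "fe:parse"
	time.Sleep(5 * time.Millisecond)
	t.Stop()
	t.Write("BenchmarkCompile:example")
}
```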
@josharian Agreed. That's why I was thinking of using names like "typecheck" instead of "phase1" or "phase2", so that the name could remain meaningful long-term even if we end up reorganizing the specific phase boundaries. @griesemer Okay, yeah, that's a little more code (explicitly marking the begin/end of each phase rather than making it implicit by just marking transitions), but it sounds roughly isomorphic to what I was thinking. |
@mdempsky Maybe it's overkill to mark each start/end and it's good enough to just track boundaries. But it seems to me that it might be more difficult to track a specific time (say escape analysis) if some of the other phases move around. Also, it would seem more difficult to attach bytes (to compute things like source bytes parsed/time unit, etc.). On the other hand, simpler is often better. Maybe it is good enough to just track transition times. Still exploring. |
The ssa phases can all be timed. For example
I think the format needs work; it doesn't seem to conform to what I recall Austin telling me was the desired form for the benchmarking commands we already have. Right now the function name comes last because sometimes those have exciting syntax in them. |
@mdempsky, @josharian Please have a look at https://go-review.googlesource.com/24462 . After considering the feedback above I decided to abandon the structured data idea (it seemed like overkill) and go with a pretty simple approach, essentially along the lines of @mdempsky's proposal. I expect this to morph a bit over time, but it's a starting point. |
CL https://golang.org/cl/24462 mentions this issue. |
I don't know exactly what data you want to represent, but the "extended" benchmark format is pretty flexible. It's not great at hierarchical things, but it's not clear you really want that. And, of course, you'll be able to use all of the tools that know how to speak it (which isn't much yet, but the format is young). You could have a benchmark line for each (file, phase) where the phase is a property, and results for time, allocations, etc. You might just want to name each benchmark after each phase. Something like
Maybe you want the function in there. That may be too detailed, but it would be easy to write a tool that aggregated it in various ways after the fact, e.g., using https://godoc.org/github.com/aclements/go-misc/bench. |
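As an illustration only (the file name, phase name, and numbers below are made up), extended-benchmark-format lines of the shape suggested above could look roughly like this:

```
goos: darwin
file: go/types/expr.go
phase: typecheck
BenchmarkCompile 1 2843343 ns/op 152840 B/op 1204 allocs/op
```

The flatter variant would instead name each benchmark after its phase (e.g. a single BenchmarkTypecheck line per phase); either shape can be re-aggregated by file or phase after the fact.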
The pending CL now enables writing to a file and is using the benchmark data format. |
Timings is a simple data structure that collects times of labeled Start/Stop events describing timed phases, which later can be written to a file. Adjacent phases with common label prefix are automatically collected in a group together with the accumulated phase time.

Timing data can be appended to a file in benchmark data format using the new -bench flag:

$ go build -gcflags="-bench=/dev/stdout" -o /dev/null go/types
commit: devel +8847c6b Mon Aug 15 17:51:53 2016 -0700
goos: darwin
goarch: amd64
BenchmarkCompile:go/types:fe:init 1 663292 ns/op 0.07 %
BenchmarkCompile:go/types:fe:loadsys 1 1337371 ns/op 0.14 %
BenchmarkCompile:go/types:fe:parse 1 47008869 ns/op 4.91 % 10824 lines 230254 lines/s
BenchmarkCompile:go/types:fe:typecheck:top1 1 2843343 ns/op 0.30 %
BenchmarkCompile:go/types:fe:typecheck:top2 1 447457 ns/op 0.05 %
BenchmarkCompile:go/types:fe:typecheck:func 1 15119595 ns/op 1.58 % 427 funcs 28241 funcs/s
BenchmarkCompile:go/types:fe:capturevars 1 56314 ns/op 0.01 %
BenchmarkCompile:go/types:fe:inlining 1 9805767 ns/op 1.02 %
BenchmarkCompile:go/types:fe:escapes 1 53598646 ns/op 5.60 %
BenchmarkCompile:go/types:fe:xclosures 1 199302 ns/op 0.02 %
BenchmarkCompile:go/types:fe:subtotal 1 131079956 ns/op 13.70 %
BenchmarkCompile:go/types:be:compilefuncs 1 692009428 ns/op 72.33 % 427 funcs 617 funcs/s
BenchmarkCompile:go/types:be:externaldcls 1 54591 ns/op 0.01 %
BenchmarkCompile:go/types:be:dumpobj 1 133478173 ns/op 13.95 %
BenchmarkCompile:go/types:be:subtotal 1 825542192 ns/op 86.29 %
BenchmarkCompile:go/types:unaccounted 1 106101 ns/op 0.01 %
BenchmarkCompile:go/types:total 1 956728249 ns/op 100.00 %

For #16169.

Change-Id: I93265fe0cb08e47cd413608d0824c5dd35ba7899
Reviewed-on: https://go-review.googlesource.com/24462
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
This seems to be done. |
Right now we can use compilebench to measure the compiler's overall run time, and we can use -cpuprofile to measure fine-grained per-function/instruction profiles. But with the interest in continuing to evolve/rewrite individual phases of the compiler, I think it would be nice to track the compiler's performance at per-phase granularity.
Rough proposal sketch: