
cmd/compile: PGO opportunities umbrella issue #62463

Open
2 of 18 tasks
aclements opened this issue Sep 5, 2023 · 9 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. umbrella
Comments

@aclements
Member

aclements commented Sep 5, 2023

This issue is to track the list of PGO optimization opportunities we're considering. As we begin work on any of these, it should be broken into its own issue. We'll edit and add to this list over time.

  • Inlining
  • Indirect call devirtualization: If the profile shows that a particular closure or method is by far the most common target of an indirect call, specialize the caller to check whether that is the target and make a direct call if so (possibly even inlining the target). Done, modulo some follow-up work.
  • Dynamic escapes on cold paths: If the profile shows that the only reason for an escape is on a cold path and the hot path does not escape an allocation, we could optimistically stack-allocate and dynamically move the allocation to the heap on the cold path.
  • Stenciling: Stencil hot generic functions, possibly based on type information from the profile.
  • Local basic block ordering: ordering blocks within a function to cluster hot blocks and potentially improve branch prediction (though the win of the latter on modern CPUs is small).
  • Register allocation: register allocation currently uses heuristics to determine a hot path and move spills off that hot path. PGO can tell it the true hot path.
  • Loop unrolling: Loop unrolling, like inlining, is a balance with binary size/i-cache footprint (also like inlining, its primary benefit on modern CPUs is enabling follow-on optimizations). We can use PGO to focus loop unrolling on hot loops, which will both have minimal impact on binary size and on compile time spent analyzing whether unrolling is worthwhile. (See cmd/compile: add unrolling stage for automatic loop unrolling #51302)
  • Loop unswitching: In hot functions, loop unswitching is more likely to be worth the code size increase.
  • Architecture feature check unswitching and/or function multiversioning: For architecture feature checks in hot functions, lift those feature checks, perhaps to the function level. (This is closely related to loop unswitching, but isn’t necessarily in a loop, and we know a priori that it’s safe to lift the condition.)
  • Select between branches and conditional MOVs: When a condition is predictable, branches are more efficient; when it is unpredictable, conditional MOVs are more efficient. Use a profile to select between these. (Predictability is hard to get from a CPU profile, so this may require an LBR profile; however, an LBR profile doesn't capture conditional MOVs, so this may just be impractical.)
  • Function ordering: Clustering functions at a whole-binary level for better locality.
  • Global block ordering: A step beyond function ordering. The basic form of this is hot/cold splitting, but it can be more aggressive than that.
  • Map/slice pre-sizing: Pre-size maps and slices based on allocation site. (This requires more than a CPU profile.)
  • Stack pre-sizing by site: Allocate initial stacks based on profiled stack sizes. (This requires more than a CPU profile. We already have simple runtime heuristics for this and it may be possible to do better at runtime, without a profile.)
  • Lifetime allocation: Co-locate allocations with similar lifetimes by allocation site. (This also requires more than a CPU profile. This may be possible to do at runtime, but sophisticated techniques are also expensive.)
  • Loop alignment: Aligning small loops can have a large performance impact, but also increases binary size. With PGO, we could do this only for hot loops. (E.g., section 3.4.1.4 of the Intel Optimization Reference Manual)

(This list was originally based on an old comment of mine, and this issue is partly to surface and track this list better.)

Any basic block-level optimizations likely depend on profile discriminators (#59612).
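As a rough source-level illustration of the devirtualization transform (this is not the compiler's actual output, and Shape/Square are made-up names): a hot indirect call is replaced by a type test guarding a direct, inlinable call, with the indirect call kept as a correct fallback.

```go
package main

import "fmt"

type Shape interface{ Area() float64 }

type Square struct{ side float64 }

func (s Square) Area() float64 { return s.side * s.side }

// totalArea makes an indirect call through the interface on every iteration.
func totalArea(shapes []Shape) float64 {
	var sum float64
	for _, s := range shapes {
		sum += s.Area() // indirect call
	}
	return sum
}

// totalAreaPGO sketches what the compiler could emit when the profile says
// Square.Area is by far the most common target: a type test guarding a
// direct (and here inlined) call, with the indirect call as a cold fallback.
func totalAreaPGO(shapes []Shape) float64 {
	var sum float64
	for _, s := range shapes {
		if sq, ok := s.(Square); ok {
			sum += sq.side * sq.side // direct call, inlined
		} else {
			sum += s.Area() // cold fallback: still correct for any Shape
		}
	}
	return sum
}

func main() {
	shapes := []Shape{Square{2}, Square{3}}
	fmt.Println(totalArea(shapes), totalAreaPGO(shapes)) // 13 13
}
```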

@aclements aclements added the compiler/runtime Issues related to the Go compiler and/or runtime. label Sep 5, 2023
@aclements aclements added this to the Backlog milestone Sep 5, 2023
@cherrymui cherrymui added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. umbrella labels Sep 5, 2023
@erifan

erifan commented Sep 6, 2023

For this one "Select between branches and conditional MOVs", some architecture features such as Arm's Branch Record Buffer Extension (BRBE) may help.
By the way, are the above items something that Google is considering doing or is doing? Or is it just a list of possible PGO optimization opportunities that anyone who is interested can work on?

@myaaaaaaaaa

Another possibility could be to automatically apply Function Multi-Versioning to hot functions with large loops, which would greatly improve the performance of math-heavy workloads.

Intel's Clear Linux uses this approach to automatically take advantage of modern CPU instructions while still being backwards-compatible with x86-64-v1.

From my understanding, Go doesn't have a particularly mature auto-vectorizer, but having PGO-guided FMV should allow for new opportunities in this area.
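To make the function multi-versioning idea concrete, here's a minimal hand-rolled sketch in Go. Everything in it (hasFMA, dotGeneric, dotFMA) is a hypothetical stand-in; real FMV would probe CPU features (e.g. via CPUID) and compile genuinely different code for each version, with the dispatch done once rather than per call.

```go
package main

import "fmt"

// hasFMA stands in for a real CPU feature probe; hardcoded false here so
// the sketch is self-contained and deterministic.
var hasFMA = false

// dotProduct is bound once at init time to the best available version, so
// hot callers pay only an indirect call, not a per-call feature check.
var dotProduct = dotGeneric

func init() {
	if hasFMA {
		dotProduct = dotFMA
	}
}

func dotGeneric(a, b []float64) float64 {
	var s float64
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

// dotFMA would use fused multiply-add instructions on capable CPUs; it is
// the same scalar loop here, purely as a placeholder.
func dotFMA(a, b []float64) float64 { return dotGeneric(a, b) }

func main() {
	fmt.Println(dotProduct([]float64{1, 2, 3}, []float64{4, 5, 6})) // 32
}
```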

@cherrymui
Member

@erifan LBR support is on our plan, and we've been thinking about it. I'm not sure it belongs in this issue, though (perhaps it could).

@aclements
Member Author

> For this one "Select between branches and conditional MOVs", some architecture features such as Arm's Branch Record Buffer Extension (BRBE) may help.

@erifan Does BRBE record conditional moves? That would certainly be nice. x86's LBR does not, which would make this pretty tricky on x86.

> By the way, are the above items something that Google is considering doing or is doing? Or is it just a list of possible optimization opportunities for PGO that anyone who is interested can do?

These are things we're considering doing, but we're not staking a claim or anything. :) Currently, we're definitely working on "Indirect call devirtualization" (1.21 had a version of this, but with significant limitations we're hoping to lift in 1.22). We're also thinking seriously about "Dynamic escapes on cold paths", but I don't think we've started implementing it. We haven't made inroads on the others ourselves. I believe Uber has done some work on "Function ordering", but I haven't heard any updates on that in quite a while.

@aclements
Member Author

> Another possibility could be to automatically apply Function Multi-Versioning to hot functions with large loops, which would greatly improve the performance of math-heavy workloads.

@myaaaaaaaaa , thanks. That's what I meant by "Architecture feature check unswitching", but I've added the term "function multi-versioning" to that in my list. Function multi-versioning is a specific way to do this, and does have a nice advantage that if you have a call from A -> B and both A and B are multi-versioned, A can make a direct call to the right version of B.

@erifan

erifan commented Sep 7, 2023

Thanks @cherrymui
@aclements I just took a look at the BRBE documentation and it doesn't record conditional moves either.

@myaaaaaaaaa

myaaaaaaaaa commented Oct 2, 2023

There may also be an opportunity to scan for code regions that can be safely parallelized (such as pure functions/loops), and automatically rewrite them to launch as separate goroutines that send their results back through a channel, effectively implementing a function-scale version of instruction-level parallelism.

This would normally risk introducing synchronization/context switching overhead, but PGO would allow the compiler to apply this optimization only to functions/loops that typically run for a long time every invocation (say, 1-10ms).

I'm unsure how broadly applicable this would be in practice. On the other hand, I would imagine that successfully detecting just a few functions can easily create enough parallelism to saturate even 32-thread machines, since goroutine counts would increase exponentially with every parallelized function in the stack.
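A hedged sketch of that proposed rewrite (expensive is a hypothetical pure function the profile would show running long enough per invocation to be worth a goroutine): independent calls are launched concurrently and their results sent back through channels.

```go
package main

import "fmt"

// expensive stands in for a pure function a profile shows running ~1-10ms
// per invocation (here just a cheap loop so the sketch runs instantly).
func expensive(n int) int {
	sum := 0
	for i := 1; i <= n; i++ {
		sum += i
	}
	return sum
}

// sequential is the original shape of the code: two independent pure calls.
func sequential() int {
	a := expensive(1000)
	b := expensive(2000)
	return a + b
}

// parallelized sketches the proposed compiler rewrite: each independent
// pure call is launched as a goroutine sending its result on a channel.
func parallelized() int {
	ca, cb := make(chan int, 1), make(chan int, 1)
	go func() { ca <- expensive(1000) }()
	go func() { cb <- expensive(2000) }()
	return <-ca + <-cb // same result, computed concurrently
}

func main() {
	fmt.Println(sequential(), parallelized()) // 2501500 2501500
}
```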

@felixge
Contributor

felixge commented Oct 3, 2023

> but PGO would allow the compiler to apply this optimization only to functions/loops that typically run for a long time every invocation (say, 1-10ms).

I don’t think PGO (CPU profiles) knows anything about the number of times a function is invoked or how long those invocations last?

@shoham-b

shoham-b commented Dec 28, 2023

All of the pre-sizing optimizations could also be driven by dynamic PGO, which would give a more dedicated, exact profile. .NET added dynamic profiling and people were happy about it.

The dynamic-escape optimization might also benefit from that, but I'm not sure. It would be something like: escape if the cold branch is taken, but also escape at the beginning of the call if dynamic PGO figures out this should not be optimized.

We could even use OSR (on-stack replacement) for all the other optimizations, but that is probably a bit too much for now.
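For reference, the map pre-sizing transform discussed in the issue list amounts to inserting a capacity hint at the allocation site. A minimal sketch (buildIndex is a made-up example; a real implementation would derive the hint from a recorded size profile for this site, not from a visible length):

```go
package main

import "fmt"

// buildIndex is the original code: the map starts at its default size and
// grows (rehashing existing entries) as insertions accumulate.
func buildIndex(words []string) map[string]int {
	m := make(map[string]int)
	for i, w := range words {
		m[w] = i
	}
	return m
}

// buildIndexPGO sketches the pre-sizing rewrite: a size profile for this
// allocation site shows the map typically ends up with about len(words)
// entries, so a capacity hint is inserted and growth is avoided.
func buildIndexPGO(words []string) map[string]int {
	m := make(map[string]int, len(words)) // hint from profiled final size
	for i, w := range words {
		m[w] = i
	}
	return m
}

func main() {
	words := []string{"a", "b", "c"}
	fmt.Println(len(buildIndex(words)), len(buildIndexPGO(words))) // 3 3
}
```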


6 participants