Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: overhaul inliner #61502

Open
aclements opened this issue Jul 21, 2023 · 23 comments
Open

cmd/compile: overhaul inliner #61502

aclements opened this issue Jul 21, 2023 · 23 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. Performance
Milestone

Comments

@aclements
Copy link
Member

We’re starting a project to overhaul the inlining optimization pass in the Go compiler, with the goal of having a far more effective inliner in Go 1.22. This issue will track this work. This document motivates this project and outlines our general design goals. More detailed design documents will follow.

This project is being driven by @thanm and @mdempsky, but we would love to have input from the community. Inlining is inherently very heuristic and can be a deep rabbit hole to go down, so I’m sure many of our design decisions will be motivated by pragmatism and trying to ship something better than what we have for Go 1.22, even if it’s not perfect, with the goal of further improving it in later releases.

@aclements aclements added Performance compiler/runtime Issues related to the Go compiler and/or runtime. labels Jul 21, 2023
@aclements aclements added this to the Go1.22 milestone Jul 21, 2023
@gopherbot
Copy link

Change https://go.dev/cl/511565 mentions this issue: cmd/compile/internal/inline: add call site flag generation

@gopherbot
Copy link

Change https://go.dev/cl/511560 mentions this issue: cmd/compile/internal/ir: export 'reassigned', handle OASOP

@gopherbot
Copy link

Change https://go.dev/cl/511559 mentions this issue: cmd/compile/internal/inline: analyze function return properties

@gopherbot
Copy link

Change https://go.dev/cl/511555 mentions this issue: internal/goexperiment: add "InlinerRevamp" experiment

@gopherbot
Copy link

Change https://go.dev/cl/511566 mentions this issue: cmd/compile/internal/inline/inlheur: assign scores to callsites

@gopherbot
Copy link

Change https://go.dev/cl/511564 mentions this issue: cmd/compile/internal/inline: build call site table

@gopherbot
Copy link

Change https://go.dev/cl/511562 mentions this issue: cmd/compile: write "properties" to export data for inlinable funcs

@gopherbot
Copy link

Change https://go.dev/cl/511557 mentions this issue: cmd/compile/internal/inline: add framework to compute func "properties"

@gopherbot
Copy link

Change https://go.dev/cl/511561 mentions this issue: cmd/compile/internal/inline: analyze function param properties

@gopherbot
Copy link

Change https://go.dev/cl/511558 mentions this issue: cmd/compile/internal/inline: panic/exit flags analysis for inline heuristics

@gopherbot
Copy link

Change https://go.dev/cl/511563 mentions this issue: cmd/compile/internal/inline: extend panic/exit flag computation

@gopherbot
Copy link

Change https://go.dev/cl/511556 mentions this issue: cmd/compile: function "property" defs for inl heuristics

@josharian
Copy link
Contributor

Christmas in July!

@aclements
Copy link
Member Author

The 1.22 RC is planned for December, so I think it'll be more like Christmas in December. 😄

@CAFxX
Copy link
Contributor

CAFxX commented Jul 24, 2023

Cost-based inline scheduling. [...] For example, a common pattern in Go is to split computations into a small fast path function and a large slow path function, where the fast path is intended to be inlined and calls the large function if the fast path conditions aren’t satisfied.

I would argue that pattern is more of a workaround for the current inliner limitations than a deliberate one aiding readability, so if possible it would be great to avoid ossifying it further. Instead, especially since PGO is now available, I would rather suggest we should go in the direction (not necessarily as part of this overhaul) of supporting also a form of function splitting - so that if the compiler detects what is basically a hot fast path and one or more cold paths in the same function, the fast path is inlined in the callers and the cold path(s) are left where they are. This would allow to curb the proliferation of the xxx/xxxSlow functions in the standard library and elsewhere.

@cherrymui
Copy link
Member

Function splitting/outlining is tricker, especially for tracebacks and APIs that converts between function and PC (e.g. runtime.FuncForPC). It is probably solvable, but I'm not sure it is within the scope of this issue.

@evanphx
Copy link
Contributor

evanphx commented Jul 26, 2023

@aclements So excited for this work! I'm curious if there is a corpus of code that the team is planning to use to guide the changes (a corpus would probably also function as a test bed).

@aclements
Copy link
Member Author

@evanphx, for static metrics, we're doing some experiments over the Go module cache (that is, basically all public Go modules). We aren't really at performance metrics yet, but the basic corpus we'll use is golang.org/x/benchmarks.

gopherbot pushed a commit that referenced this issue Aug 2, 2023
Add "NewInliner" to the list of Go experiments, used for enabling an
updated/improved version of the function inlining phase within the Go
compiler. 

Updates #61502.

Change-Id: I3218b3ae59a2d05156e8017cd9ee1d7b66cad031
Reviewed-on: https://go-review.googlesource.com/c/go/+/511555
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
@gopherbot
Copy link

Change https://go.dev/cl/517595 mentions this issue: dashboard: add linux-amd64-newinliner builder and trybot

gopherbot pushed a commit that referenced this issue Aug 10, 2023
Add definitions for a set of Go function "properties" intended to be
useful for driving inlining decisions. This CL just defines a set of
flags and a container to hold them; a subsequent CL will add code to
compute the properties for a function given its IR/AST representation.

Updates #61502.

Change-Id: Ifa26c1ad055c02ca0ce9cf37078cee7b3385e18a
Reviewed-on: https://go-review.googlesource.com/c/go/+/511556
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
gopherbot pushed a commit that referenced this issue Aug 10, 2023
Add some machinery to support computing function "properties" for use
in driving inlining heuristics, and a unit testing framework to check
to see if the property computations are correct for a given set of
canned Go source files. This CL is mainly the analysis skeleton and a
testing framework; the code to compute the actual props will arrive in
a later patch.

Updates #61502.

Change-Id: I7970b64f713d17d7fdd7e8e9ccc7d9b0490571bf
Reviewed-on: https://go-review.googlesource.com/c/go/+/511557
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
gopherbot pushed a commit that referenced this issue Aug 10, 2023
Rename the ir-local function "reassigned" to "Reassigned" so that it
can be used as part of inline heuristic analysis. Fix up the header
comment along that way, which had some stale material. Add support for
detecting reassignments via OASOP (as opposed to just simple
assignments).

Updates #61502.

Change-Id: I50f40f81263c0d7f61f30fcf0258f0b0f93acdca
Reviewed-on: https://go-review.googlesource.com/c/go/+/511560
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Than McIntosh <thanm@google.com>
gopherbot pushed a commit to golang/build that referenced this issue Aug 14, 2023
This CL adds a builder and trybot for GOEXPERIMENT=newinliner, which
is currently under development.

Updates golang/go#61502.
Fixes golang/go#61883.

Change-Id: Iefb37ca7c9feca30bb2783170f1a8cdfa9fd02af
Reviewed-on: https://go-review.googlesource.com/c/build/+/517595
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Auto-Submit: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
gopherbot pushed a commit that referenced this issue Sep 6, 2023
…stics

Add code to compute whether a given function appears to
unconditionally call panic or exit, as a means of driving inlining
decisions. Note that this determination is based on
heuristics/guesses, as opposed to strict safety analysis; in some
cases we may miss a function that does indeed always panic, or mark a
function as always invoking panic when it doesn't; the intent is get
the right answer in "most" cases.

Updates #61502.

Change-Id: Ibba3e60c06c2e54cf29b3ffa0f816518aaacb9a3
Reviewed-on: https://go-review.googlesource.com/c/go/+/511558
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
gopherbot pushed a commit that referenced this issue Sep 6, 2023
Add code to analyze properties of function result values, specifically
heuristics for cases where we always return allocated memory, always
return the same constant, or always return the same function.

Updates #61502.

Change-Id: I8b0a3295b5be7f7ad4c2d5b9803925aea0639376
Reviewed-on: https://go-review.googlesource.com/c/go/+/511559
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
gopherbot pushed a commit that referenced this issue Sep 8, 2023
Add code to analyze properties of function params, specifically
heuristics to look for cases where unmodified params feed into "if"
and "switch" statements in ways that might enable constant folding
and/or dead code elimination if the call were inlined at a callsite
that passes a constant to the correct param. We also look for cases
where a function parameter feeds unmodified into an interface method
call or indirect call.

Updates #61502.

Change-Id: Iaf7297e19637daeabd0ec72be88d654b545546ae
Reviewed-on: https://go-review.googlesource.com/c/go/+/511561
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
gopherbot pushed a commit that referenced this issue Sep 8, 2023
Augment the ir.Inline container to include an entry for function
properties (currently serialized as a string), and if
GOEXPERIMENT=newinliner is set, compute and store function
properties for all inline candidates processed by the inliner.

The idea here is that if the function properties are going to drive
inlining decisions, we'd like to have the same info from non-local /
imported functions as for local / in-package functions, hence we need
to include the properties in the export data.

Hand testing on the compiler itself and with k8s kubelet shows that
this increases the size of export data overall by about 2-3 percent,
so a pretty modest increase.

Updates #61502.

Change-Id: I9d1c311aa8418d02ffea3629c3dd9d8076886d15
Reviewed-on: https://go-review.googlesource.com/c/go/+/511562
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
gopherbot pushed a commit that referenced this issue Sep 8, 2023
Extend the code that computes various properties and parameter flags
to incorporate information from export data in addition to things we
can get from the current package. Specifically:

 - when deciding whether the current function always calls panic/exit,
   check to see whether it has an unconditional call to some other
   function that has that flag.

 - when computing "parameter feeds" properties, look not just for
   cases where a parameter feeds an interesting construct (if/switch,
   indirect/interface call, etc) but where it feeds a call whose
   corresponding param has that flag.

 - when computing return properties, if a given return is always the
   results of a call to X, then set the return properties to those
   of X.

Updates #61502.

Change-Id: I6472fe98759cccad05b8eed58e4fc568201d88ad
Reviewed-on: https://go-review.googlesource.com/c/go/+/511563
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
gopherbot pushed a commit that referenced this issue Sep 8, 2023
Build up a table of (potentially) inlinable call sites during inline
heuristic analysis, and introduce a framework for analyzing each call
site to collect applicable flags (for example, is call nested in
loop). This patch doesn't include any of the flag analysis, just the
machinery to collect the callsites and a regression test harness.

Updates #61502.

Change-Id: Ieaf4a008ac9868e9762c63f5b59bd264dc71ab30
Reviewed-on: https://go-review.googlesource.com/c/go/+/511564
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
gopherbot pushed a commit that referenced this issue Sep 8, 2023
Add code to detect call sites that are nested in loops, call sites
that are on an unconditional path to panic/exit, and call sites within
"init" functions. The panic-path processing reuses some of the
logic+state already present for the function flag version of "calls
panic/exit".

Updates #61502.

Change-Id: I1d728e0763282d3616a9cbc0a07c5cda115660f3
Reviewed-on: https://go-review.googlesource.com/c/go/+/511565
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
gopherbot pushed a commit that referenced this issue Sep 8, 2023
Assign scores to callsites based on previously computed function
properties and callsite properties. This currently works by taking the
size score for the function (as computed by CanInline) and then making
a series of adjustments, positive or negative based on various
function and callsite properties.

NB: much work also remaining on deciding what are the best score
adjustment values for specific heuristics. I've picked a bunch of
arbitrary constants, but they will almost certainly need tuning and
tweaking to arrive at something that has good performance.

Updates #61502.

Change-Id: I887403f95e76d7aa2708494b8686c6026861a6ed
Reviewed-on: https://go-review.googlesource.com/c/go/+/511566
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
@gopherbot
Copy link

Change https://go.dev/cl/549395 mentions this issue: doc: add release note fragment on inlining changes

gopherbot pushed a commit that referenced this issue Dec 14, 2023
Add some material to the "compiler" portion of the release
notes describing the 1.22 changes to the inliner.

For #61422.
Updates #61502.

Change-Id: Ic7f1cb7f70752446d2465ea3da6bd7488436342b
Reviewed-on: https://go-review.googlesource.com/c/go/+/549395
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
@rsc
Copy link
Contributor

rsc commented Jan 24, 2024

@aclements asked me to leave this example here. Lots of code using utf8.DecodeRune and friends has manual checks for the ASCII case to avoid the function call. It would be nice to rewrite utf8.DecodeRune and friends to be "outlined": do the ASCII case and then call an unexported general-case function. Then this form would get inlined and we could delete all the manual optimizations. This doesn't work today.

For example here is what we'd like to write:

func NumSmileys(s []byte) int {
	n := 0
	for i := 0; i < len(s); {
		r, size := utf8.DecodeRune(s[i:])
		if r == '☺' {
			n++
		}
	}
	return n
}

but instead people write things like this:

func NumSmileysFast(s []byte) int {
	n := 0
	for i := 0; i < len(s); {
		var r rune
		var s size
		if s[i] < utf8.RuneSelf {
			r, size  = rune(s[i]), 1
		} else {
			r, size = utf8.DecodeRune(s[i:])
		}
		if r == '☺' {
			n++
		}
	}
	return n
}

If we rewrote utf8.DecodeRune to be:

func DecodeRune(s []byte) (r rune, size int) {
	if len(s) > 0 && s[0] < RuneSelf {
		return rune(s[0]), 1
	}
	return decodeRune(s)
}

then the hope is that inlining would turn the NumSmileys code into the NumSmileysFast code instead of us having to do it.

This fails today for at least two reasons.

First, the inlining budgets penalize function calls too much. The budget is 80 and the cost of a call is 57 (!?), with the result that the rewritten DecodeRune comes in at 87. I would suggest something more like 40: if you are doing an outline-call pattern you can spend half what you normally would.

Second, with the budgets changed, the inlining is not as efficient as the original code. The problem is that s[i:] is still computed before doing the fast path, so there are a bunch of memory operations to make this slice value. It would be nice if somehow the compiler recognized that in

t := s[i:]
if len(t) > 0 && t[0] < RuneSelf {
    return rune(t[0]), 1
}

i < len(s) implies len(t) > 0 and also t[0] is just s[i], so the code could delay materializing t entirely until the actual slowpath call.

Generally speaking, making it not necessary to manually optimize these kinds of loops is a good goal for the new inliner.

@CAFxX
Copy link
Contributor

CAFxX commented Jan 25, 2024

Related: #48195 #31666 #48192

@gopherbot gopherbot modified the milestones: Go1.22, Go1.23 Feb 6, 2024
ezz-no pushed a commit to ezz-no/go-ezzno that referenced this issue Feb 18, 2024
Add some material to the "compiler" portion of the release
notes describing the 1.22 changes to the inliner.

For golang#61422.
Updates golang#61502.

Change-Id: Ic7f1cb7f70752446d2465ea3da6bd7488436342b
Reviewed-on: https://go-review.googlesource.com/c/go/+/549395
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
@overstep123
Copy link

Just a small idea: about how to identify hot functions in PGO, whether the total cost of the parent function should be used as the standard (that is, cum_rate), rather than the flat_rate of the child function. The reason is that if the cum_rate of a function is large and the flat_rate is small, then the downstream function of this function will be inline. The benefit will be relatively high (because it is highly probable that this function has many downstream calls, so at least it will save a lot of function call overhead).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. Performance
Projects
Status: In Progress
Development

No branches or pull requests

10 participants