Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: better const-based optimizations handling in compiler frontend #29095

Open
quasilyte opened this issue Dec 4, 2018 · 12 comments
Open
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. Performance
Milestone

Comments

@quasilyte
Copy link
Contributor

quasilyte commented Dec 4, 2018

This issue describes a problem, potential solutions and results collected by a prototype implementation.

It will be followed by a go-review CL.

The current state

Some important Go optimization phases happen at the frontend stage where we have annotated AST.

Notable examples:

  1. Inlining.
  2. Escape analysis.
  3. Things like short string comparison optimization.

While the idea of porting everything to SSA backend sounds exciting:

  1. It's not trivial.
  2. It's hard to express some things on the SSA level as it is very low-level and sometimes relevant
    information is erased during the SSA construction.
  3. The SSA backend has its own set of little problems.

My proposal is to extend frontend part a little bit so there is just a bit more information so some useful optimizations
that implemented in frontend can work better. It won't make (potential) transition to SSA harder as the changes are
very focused and affect only a few places.

The problem

The problem I want to address in this document is a loss of const nature of values used through local variables.
If you use constant value directly, frontend recognizes it. If you're using it through never modified local variable,
it does not apply optimizations.

I'll describe two things I was investigating, but there can be a few more places where the proposed solution is applicable.

Problematic case 1: escape analysis for make-slice

(See #24577)

// Can be allocated on stack if "xs" doesn't escape.
xs := make([]int, 10)

// Always escapes (non-const size reason).
n := 10
xs := make([]int, n)

One might argue that this is not useful piece of code.
True, but the code below does essentially the same thing, but is more meaningful:

func allocSlice(length int) []int {
  return make([]int, length)
}

func example() int {
  xs := allocSlice(10)
  return len(xs)
}

allocSlice is inlined. After inlining, the produced code looks like this:

func example() int {
  _length := 10
  _ret := make([]int, _length)
  xs = _ret
  return len(xs)
}

Note that it introduces the additional assignment.
It makes escape analysis to be too conservative there.

If const info is preserved, there could be no allocation at all.

Problematic case 2: short string comparison optimizations

Functions like strings.HasPrefix and strings.HasSuffix also suffer from constness info loss during the inlining.

// This is how HasPrefix is defined in stdlib.

// HasPrefix tests whether the string s begins with prefix.
func HasPrefix(s, prefix string) bool {
	return len(s) >= len(prefix) && s[0:len(prefix)] == prefix
}

Now let's define two routines:

func isComment1(line string) bool {
  return strings.HasPrefix(line, "#")
}

func isComment2(line string) bool {
  return len(line) >= 1 && line[0] == '#'
}

The first version is more readable and idiomatic. And it can be optimized to the isComment2 if we
inline HasPrefix body manually, even without substituting len("#") since it's const expression anyway,
but it won't be as efficient due to intermediate variables reason.

Here is a difference in performance:

name           old time/op  new time/op  delta
HasPrefix/#-8  16.5ns ± 3%   2.8ns ± 8%  -82.98%  (p=0.000 n=10+10)

The amount of generated code also different (for AMD64 it's 105 vs 31 bytes).

Other potentially related issues

Effectively constant value definition

We can consider local variable as effectively constant if:

  1. Its address is never taken.
  2. It is never assigned to.

The closure value capturing mechanism uses somewhat similar definition to figure out whether to capture something
by value or by reference.

Proposed solution

There are 2 parts. One is very simple and solves the simplest case, the other solves inlining-related problem.

  1. Use Node.Name.Defn for effectively constant nodes. So, if they're never assigned to,
    just use constant value in analysis instead of ONAME node inside the optimizations and escape analysis.
    The good news is that it's very trivial to do. First CL includes this step.
    The bad news is that inliner doesn't assign Name.Defn and uses OAS2 nodes.
    So inlining problem remains.
  2. During inlining, assign Node.Name.Defn field for constant initializers.
    More below.
// Statements below are introducing ONAMEs that have Defn bound to them.
// So we already have relevant data for simplest, (1) case.
x := 10
var y = 10

The first step is solved by introducing getConstValue(n *Node) *Node.
When node is passed to it, it looks into n.Name.Defn field and if it's OAS and Right field is constant, returns it.
Otherwise, it returns n itself (It could return nil as well, it's almost irrelevant detail).

In the place where constness-dependent optimization is about to happen, instead of checking the nodes itself, one
queries the const value of the node with getConstValue.

The second step implies that getConstValue can also return appropriate constant values passed into inlined function.

Note that it won't help to handle cases like this:

n1 := 10
n2 := n1
xs := make([]int, n2)

The proposed mechanism is very trivial (or primitive, even), but it covers useful cases and is deterministic, which means programmers can rely on it. It also makes functions like strings.HasPrefix appropriate in a hot-spots so you don't have to inline it manually or write specialization as demonstrated in one of the examples above.

Extra cases

Another example that can be enhanced by getConstValue is OSTR2BYTES.

package str2bytes

type object struct {
	name string
	data []byte
}

func newObject(name, data string) *object {
	return &object{
		name: name,
		data: []byte(data),
	}
}

var sink interface{}

func BenchmarkNewObject(b *testing.B) {
	var o *object
	for i := 0; i < b.N; i++ {
		o = newObject("foo", "123")
	}
	sink = o
}
name         old time/op    new time/op    delta
NewObject-8    96.6ns ± 2%    80.0ns ± 2%  -17.20%  (p=0.000 n=10+10)

name         old alloc/op   new alloc/op   delta
NewObject-8     56.0B ± 0%     51.0B ± 0%   -8.93%  (p=0.000 n=10+10)

Any use of Isconst inside compiler frontend can be a potential use case for getConstValue, but one should
not go too far (it can change compile-time semantics, like reporting out-of-bound access to an array through a variable that is effectively constant, but the spec does not permit that to happen and other spots are something that SSA backend can optimize on its own).

compilebench results

Effect on compilation time is negligible:

name       old time/op       new time/op       delta
Template         353ms ± 1%        354ms ± 1%    ~     (p=0.497 n=9+10)
Unicode          143ms ± 2%        142ms ± 1%    ~     (p=0.050 n=9+9)
GoTypes          1.40s ± 2%        1.40s ± 1%    ~     (p=0.780 n=10+9)
Compiler         6.71s ± 2%        6.69s ± 0%    ~     (p=0.370 n=9+8)
SSA              18.0s ± 1%        17.8s ± 1%  -1.00%  (p=0.004 n=10+10)
Flate            233ms ± 1%        232ms ± 1%    ~     (p=0.123 n=10+10)
GoParser         287ms ± 1%        288ms ± 1%    ~     (p=0.400 n=9+10)
Reflect          792ms ± 2%        787ms ± 0%    ~     (p=0.447 n=10+9)
Tar              324ms ± 0%        326ms ± 1%  +0.49%  (p=0.021 n=8+9)
XML              462ms ± 2%        459ms ± 1%    ~     (p=0.315 n=10+9)
StdCmd           31.8s ± 1%        31.8s ± 1%    ~     (p=0.739 n=10+10)

With changes from initial CL (short string cmp + isSimpleSliceMake) there is a slight improvement in binary sizes
due to better optimization opportunities:

name       old text-bytes    new text-bytes    delta
HelloSize        753kB ± 0%        753kB ± 0%  -0.02%  (p=0.000 n=10+10)
CmdGoSize       10.2MB ± 0%       10.2MB ± 0%  -0.19%  (p=0.000 n=10+10)

name       old exe-bytes     new exe-bytes     delta
HelloSize       1.09MB ± 0%       1.09MB ± 0%    ~     (all equal)
CmdGoSize       14.1MB ± 0%       14.1MB ± 0%  -0.17%  (p=0.000 n=10+10)
@gopherbot
Copy link

Change https://golang.org/cl/152478 mentions this issue: cmd/compile: use Node.Name.Defn in optimizations

@quasilyte quasilyte changed the title cmd/compile: better const cmd/compile: better const-based optimizations handling in compiler frontend Dec 4, 2018
@quasilyte
Copy link
Contributor Author

quasilyte commented Dec 4, 2018

https://golang.org/cl/152478 addresses (1).
The review can wait until freeze ends.

@ALTree ALTree added this to the Go1.13 milestone Dec 4, 2018
@CAFxX
Copy link
Contributor

CAFxX commented Dec 12, 2018

Probably partially related: DCE before inlining removes dead if bodies but not tests (even if constant) #8421 (comment)

@josharian
Copy link
Contributor

What do you mean by “but not tests”?

@josharian
Copy link
Contributor

Oh, you mean the condition of the if statement. Plus the if statement itself. Yeah, we should just fix that. Want to file a new bug so that that fix (which should be fairly straightforward) doesn’t get lost in the discussion of these bugs about other topics?

@mdempsky
Copy link
Member

I'm very nervous about introducing more constant folding back in the Go frontend. We've had way too many correctness issues relating to confusing "Go constants" (i.e., values that are constant according to the Go language specification), and compile-time constants (i.e., expressions that we can determine at compile-time to always evaluate to the same value). We just recently finally went as far as eliminating all constant folding in the front-end that isn't required by the Go spec. This proposal amounts to suggesting we undo that work.

@quasilyte
Copy link
Contributor Author

OK, I don't mind giving up on it.

@mdempsky
Copy link
Member

I just looked at CL 152478, and I think it's worth clarifying something.

I'm nervous about doing more constant folding/propagation during typechecking (which is what I usually think of when I hear "compiler frontend"). We used to do that, and it led to lots of issues like I mentioned.

I'm okay doing constant optimizations that happen entirely after typechecking though. For example, deadcode is called on function bodies after typechecking has finished (at least on that specific function body). Making changes to walk.go (like in CL 152478) is totally fine too. escape.go, inl.go, and order.go are also fair game.

@andybons andybons modified the milestones: Go1.13, Go1.14 Jul 8, 2019
@rsc rsc modified the milestones: Go1.14, Backlog Oct 9, 2019
@smasher164 smasher164 modified the milestones: Backlog, Go1.14 Oct 11, 2019
@thanm thanm modified the milestones: Go1.14, Go1.15 Nov 13, 2019
@ianlancetaylor ianlancetaylor modified the milestones: Go1.15, Backlog May 19, 2020
@quasilyte
Copy link
Contributor Author

quasilyte commented Nov 8, 2021

It looks like the code for inlined functions with constant arguments is still not optimized in a way we may want it to be (checked on Go tip).

I'm interested in coming back to this.
I'll try to take all feedback from above into the account.

Can I start from my-best-effort attempt at this and then rely on your review to make sure that everything is according to the plan, @mdempsky?

@cuonglm
Copy link
Member

cuonglm commented Nov 8, 2021

@quasilyte I think it's better to delay this until after Unified IR is used. It's a big refactoring of the frontend, and will remove current inline/inl.go pass.

@quasilyte
Copy link
Contributor Author

@cuonglm, sounds good to me.

Is there any plan/proposal for it? I found a few "Unified IR" CLs and some mentions in the issues, but no concrete info about it. I would like to learn more about it. :)

@cuonglm
Copy link
Member

cuonglm commented Nov 23, 2021

@cuonglm, sounds good to me.

Is there any plan/proposal for it? I found a few "Unified IR" CLs and some mentions in the issues, but no concrete info about it. I would like to learn more about it. :)

You may want to look at https://go-review.googlesource.com/c/go/+/324573 for the starting. Since that CL, there're many CLs for unified IR.

I don't think there's any proposal for it, maybe we can have more after go1.18 release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. Performance
Projects
Status: Triage Backlog
Development

No branches or pull requests