Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: move tighten after lower #21041

Closed
josharian opened this issue Jul 17, 2017 · 5 comments
Closed

cmd/compile: move tighten after lower #21041

josharian opened this issue Jul 17, 2017 · 5 comments
Labels
FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. Performance
Milestone

Comments

@josharian
Copy link
Contributor

The tighten pass is currently scheduled before lower, "to avoid splitting naturally paired instructions such as CMP/SET". Doing tighten later benefits from the removal of values enabled by lowering and lowered CSE. Empirically, moving tighten just before phiTighten generates slightly better code: fewer bytes of instructions on average, and as far I have checked, a pareto improvement on stack use.

Moving tighten after lower will allow us to do this TODO in the code:

			for _, a := range v.Args {
				switch a.Op {
				case OpConst8, OpConst16, OpConst32, OpConst64, OpAddr:
					// Probably foldable into v, don't count as an argument needing a register.
					// TODO: move tighten to a machine-dependent phase and use v.rematerializeable()?

(And as a consequence, probably remove phiTighten, or at least merge it into tighten.)

If we notice problematic split pairs of instructions, like CMP/SET, we can teach tighten explicitly to leave them alone.

cc @randall77 @cherrymui @dr2chase

@josharian josharian added this to the Go1.10 milestone Jul 17, 2017
@bradfitz bradfitz added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Jul 17, 2017
josharian added a commit to josharian/go that referenced this issue Aug 12, 2017
@gopherbot
Copy link

Change https://golang.org/cl/55236 mentions this issue: runtime: split declaration and assignment in mapassign

@randall77
Copy link
Contributor

Sounds ok to me.
Normally we don't move 2-input values. But we have an exception where we do move 2-input boolean values, assuming they will become flags-generators and we want to move flag generators up to uses. If we tighten after lower, we can use flags-type instead of boolean-type to drive that decision.

@bradfitz bradfitz modified the milestones: Go1.10, Go1.11 Nov 28, 2017
@randall77
Copy link
Contributor

Another example where this would help:

type T struct {
	a int
	b *int
}
func f(p *T) {
	p.b = nil
}

The write in f requires a write barrier. So there are two blocks that care about the address written, the write-barrier-on block and the write-barrier-off block.

The write-barrier-on block must compute the address to write to, which boils down to the following instruction:

	0x0028 00040 (tmp1.go:9)	LEAQ	8(CX), DI

The write-barrier-off block doesn't compute the address explicitly, instead it folds that computation into the store itself:

	0x0030 00048 (tmp1.go:9)	MOVQ	$0, 8(CX)

So ideally we should put the LEAQ in just the write-barrier-on block.

Unfortunately, tighten happens before lowering, and at that time the address computation is not folded into the store:

b1:
    v8 = OffPtr <**int> [8] v5
    ...
    If v12 -> b2 b3 (unlikely)
b2:
    v13 = WB <mem> {runtime.gcWriteBarrier} v8 v6 v1
b3:
    v14 = Store <mem> {*int} v8 v6 v1

So it looks like both the write-barrier-on block and the write-barrier-off block use v8. So v8 is left in the parent block instead of moving it to the write-barrier-on block. Only after lowering can we see that v8 won't be used in the write-barrier-off block.

@randall77
Copy link
Contributor

That last example is #19853.

@gopherbot
Copy link

Change https://golang.org/cl/95995 mentions this issue: cmd/compile: tighten after lowering

josharian added a commit to josharian/go that referenced this issue May 8, 2018
Moving tighten after lowering benefits from the removal of values by
lowering and lowered CSE. It lets us make better decisions about
which values are rematerializable and which generate flags.
Empirically, it lowers stack usage (by avoiding spills)
and generates slightly smaller and faster binaries.


Fixes golang#19853
Fixes golang#21041

name        old time/op       new time/op       delta
Template          195ms ± 4%        193ms ± 4%  -1.33%  (p=0.000 n=92+97)
Unicode          94.1ms ± 9%       92.5ms ± 8%  -1.66%  (p=0.002 n=97+95)
GoTypes           572ms ± 5%        566ms ± 7%  -0.92%  (p=0.001 n=95+98)
Compiler          2.56s ± 4%        2.52s ± 3%  -1.41%  (p=0.000 n=94+97)
SSA               6.52s ± 2%        6.47s ± 3%  -0.82%  (p=0.000 n=96+94)
Flate             117ms ± 5%        116ms ± 7%  -0.72%  (p=0.018 n=97+97)
GoParser          148ms ± 6%        146ms ± 4%  -0.97%  (p=0.002 n=98+95)
Reflect           370ms ± 7%        363ms ± 6%  -1.79%  (p=0.000 n=99+98)
Tar               175ms ± 6%        173ms ± 6%  -1.11%  (p=0.001 n=94+95)
XML               204ms ± 6%        201ms ± 5%  -1.49%  (p=0.000 n=97+96)
[Geo mean]        363ms             359ms       -1.22%

name        old user-time/op  new user-time/op  delta
Template          251ms ± 5%        245ms ± 5%  -2.40%  (p=0.000 n=97+93)
Unicode           131ms ±10%        128ms ± 9%  -1.93%  (p=0.001 n=100+99)
GoTypes           760ms ± 4%        752ms ± 4%  -0.96%  (p=0.000 n=97+95)
Compiler          3.51s ± 3%        3.48s ± 2%  -1.04%  (p=0.000 n=96+95)
SSA               9.57s ± 4%        9.52s ± 2%  -0.50%  (p=0.004 n=97+96)
Flate             149ms ± 6%        147ms ± 6%  -1.46%  (p=0.000 n=98+96)
GoParser          184ms ± 5%        181ms ± 7%  -1.84%  (p=0.000 n=98+97)
Reflect           469ms ± 6%        461ms ± 6%  -1.69%  (p=0.000 n=100+98)
Tar               219ms ± 8%        217ms ± 7%  -0.90%  (p=0.035 n=96+96)
XML               255ms ± 5%        251ms ± 6%  -1.48%  (p=0.000 n=98+98)
[Geo mean]        476ms             469ms       -1.42%

name        old alloc/op      new alloc/op      delta
Template         37.8MB ± 0%       37.8MB ± 0%  -0.17%  (p=0.000 n=100+100)
Unicode          28.8MB ± 0%       28.8MB ± 0%  -0.02%  (p=0.000 n=100+95)
GoTypes           112MB ± 0%        112MB ± 0%  -0.20%  (p=0.000 n=100+97)
Compiler          466MB ± 0%        464MB ± 0%  -0.27%  (p=0.000 n=100+100)
SSA              1.49GB ± 0%       1.49GB ± 0%  -0.08%  (p=0.000 n=100+99)
Flate            24.4MB ± 0%       24.3MB ± 0%  -0.25%  (p=0.000 n=98+99)
GoParser         30.7MB ± 0%       30.6MB ± 0%  -0.26%  (p=0.000 n=99+100)
Reflect          76.4MB ± 0%       76.4MB ± 0%    ~     (p=0.253 n=100+100)
Tar              38.9MB ± 0%       38.8MB ± 0%  -0.20%  (p=0.000 n=100+97)
XML              41.5MB ± 0%       41.4MB ± 0%  -0.19%  (p=0.000 n=100+98)
[Geo mean]       77.5MB            77.4MB       -0.16%

name        old allocs/op     new allocs/op     delta
Template           381k ± 0%         381k ± 0%  -0.15%  (p=0.000 n=100+100)
Unicode            342k ± 0%         342k ± 0%  -0.01%  (p=0.000 n=100+98)
GoTypes           1.19M ± 0%        1.18M ± 0%  -0.24%  (p=0.000 n=100+100)
Compiler          4.52M ± 0%        4.50M ± 0%  -0.29%  (p=0.000 n=100+100)
SSA               12.3M ± 0%        12.3M ± 0%  -0.11%  (p=0.000 n=100+100)
Flate              234k ± 0%         234k ± 0%  -0.26%  (p=0.000 n=99+96)
GoParser           318k ± 0%         317k ± 0%  -0.21%  (p=0.000 n=99+100)
Reflect            974k ± 0%         974k ± 0%  -0.03%  (p=0.000 n=100+100)
Tar                392k ± 0%         391k ± 0%  -0.17%  (p=0.000 n=100+99)
XML                404k ± 0%         403k ± 0%  -0.24%  (p=0.000 n=99+99)
[Geo mean]         794k              792k       -0.17%

name        old object-bytes  new object-bytes  delta
Template          393kB ± 0%        392kB ± 0%  -0.19%  (p=0.008 n=5+5)
Unicode           207kB ± 0%        207kB ± 0%    ~     (all equal)
GoTypes          1.23MB ± 0%       1.22MB ± 0%  -0.11%  (p=0.008 n=5+5)
Compiler         4.34MB ± 0%       4.33MB ± 0%  -0.15%  (p=0.008 n=5+5)
SSA              9.85MB ± 0%       9.85MB ± 0%  -0.07%  (p=0.008 n=5+5)
Flate             235kB ± 0%        234kB ± 0%  -0.59%  (p=0.008 n=5+5)
GoParser          297kB ± 0%        296kB ± 0%  -0.22%  (p=0.008 n=5+5)
Reflect          1.03MB ± 0%       1.03MB ± 0%  -0.00%  (p=0.008 n=5+5)
Tar               332kB ± 0%        331kB ± 0%  -0.15%  (p=0.008 n=5+5)
XML               413kB ± 0%        412kB ± 0%  -0.19%  (p=0.008 n=5+5)
[Geo mean]        728kB             727kB       -0.17%

Change-Id: I9b5cdb668ed102a001897a05e833105acba220a2
@golang golang locked and limited conversation to collaborators Feb 27, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. Performance
Projects
None yet
Development

No branches or pull requests

4 participants