Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: branch elimination opportunites when comparing constants #37608

Open
laboger opened this issue Mar 2, 2020 · 23 comments
Open

cmd/compile: branch elimination opportunites when comparing constants #37608

laboger opened this issue Mar 2, 2020 · 23 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@laboger
Copy link
Contributor

laboger commented Mar 2, 2020

What version of Go are you using (go version)?

latest

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
linux/ppc64le but I see it in the SSA on x86

What did you do?

Inspecting code for other reasons, found cases where there are constant compares and subsequent branches against those constants, resulting in several unnecessary branches.

What did you expect to see?

No unnecessary branches.

What did you see instead?

Unnecessary branching due to constant compares of known values.

This example comes from the ssa output of bytes.(Buffer).WriteByte. Here are some snippets:

v18 00011 (+107) MOVD 8(R6), R7
v38 00012 (107) MOVD 16(R6), R5
v26 00013 (107) SUB R7, R5, R8
v70 00014 (107) CMP R8, $1
// This could be BLT 36. Block starting at 46 sets R3 to 0 then branches to a CMP R3, $0.
b1 00015 (107) BLT 46 <--- Change to BLT 36

v54 00019 (108) MOVD R4, 8(R6)
// Once the block at 46 is eliminated as a predecessor of this block, then the next 3 instructions are not needed, since the CMPW has only a single predecessor and it is comparing 0 against 1 and the BEQ will never happen.
v45 00020 (108) MOVD $1, R3 <---- Remove
v52 00021 (+265) CMPW R3, $0 <---- Remove
b9 00022 (+266) BEQ 36 <---- Remove

v24 00023 (+269) PCDATA $1, $1
v24 00024 (+269) MOVD 8(R6), R4
v34 00025 (+269) PCDATA $0, $2
v34 00026 (269) MOVD (R6), R5
v88 00027 (269) CMPU R4, R7

v63 00036 (267) PCDATA $1, $0
v63 00037 (+267) MOVD R6, 32(R1)
v81 00038 (267) MOVD $1, R3
v65 00039 (267) MOVD R3, 40(R1)
v66 00040 (+267) CALL "".(*Buffer).grow(SB)
v68 00041 (267) MOVD 48(R1), R7
v36 00042 (269) PCDATA $0, $1
v36 00043 (269) PCDATA $1, $1
v36 00044 (269) MOVD "".b(R1), R6
b11 00045 (267) JMP 23

// The MOVD $0, R3 and JMP 21 could be replaced by JMP 36 since the successor is CMP $0, R3 followed by BEQ 36. Once it is decided that 36 is the successor, I think the MOVD $0, R7 could also go away since the block starting at 36 doesn't use it. That also means the predecessor of this block could branch directly to 36.
v21 00046 (267) PCDATA $1, $0 <--- Remove
v21 00047 (267) MOVD $0, R7 <--- Remove
v25 00048 (267) MOVD $0, R3 <--- Remove
b2 00049 (+265) JMP 21 <---- Remove

This scenario is common in stdlib functions. It looks like the same situation occurs in x86, so it is not specific to ppc64/ppc64le.

@josharian I think I reported this problem last fall and you commented on it, but I can't find the issue where that was done.

@randall77
Copy link
Contributor

Parts of this sound like the shortcircuit pass. Is that working correctly here?

@randall77 randall77 added this to the Unplanned milestone Mar 2, 2020
@josharian
Copy link
Contributor

Here is the email that I wrote to @dr2chase this morning. It looks apropos. (Note that the situation I describe in the email occurs in block b9 in the shortcircuit pass of GOSSAFUNC="(*Buffer).WriteByte" go build bytes.)

Hi!

You know way more than me about this stuff, so I have a question for you...

I was investigating why CL 220684 causes cmd/compile to shrink. It
turns out that it enables the shortcircuit pass to work on a few
functions, and that that causes significant (>5% text size)
improvements.

So I went to look at beefing up the shortcircuit pass. As a reminder,
we're dealing with situations like:

p1         p2         (maybe p3, p4, etc.)
  \         /
   \       /
       b
       phi (true, v1)
       if phi s1 else s2
     /     \
    /       \
s1          s2

Since going from p1 to b guarantees that we'll then go to s1, we
rewrite the p1 outbound branch to go straight to s1.

This currently requires that the phi be the only value in b, and that
it have only one use. That way we can disregard it after the CFG
change.

The case I am looking at is one in which there are also other phis in
b. This isn't necessarily a lost cause, depending on where the other
uses of those phis are.

If the use is dominated by s2, but not reachable from s1 without going
through b, then we know that we went from b to s2. And we know that if
we came in to b from p1, then we would have gone from b to s1 instead.
So we know that we came in to b from p2. And therefore we know the
value of the phi is its second arg.

If the use is dominated by s1, but not reachable from s2 without going
through b, then we know that we went from b to s1. And thus, mutatis
mutandis, that we arrived at s1 through the same predecessor we would
have arrived at b though. Therefore we can move the phi value from b
to s1.

So if there aren't any phi uses whose provenance are ambiguous, then
we can rewrite all phi uses, and thus still do the shortcircuit
optimization of moving the edge.

My questions are:

If there a name for the CFG property of being dominated by s2 but not
reachable by s1 without going through b? (It's sort of like being
dominated by an edge rather than a node.) Is there a nice data
structure for calculating it? Or should I be thinking about this in a
different way?

Sorry for the long email. This stuff is easier with a whiteboard. :(

Thanks!

-josh

@dr2chase
Copy link
Contributor

dr2chase commented Mar 2, 2020

Sorry not to reply earlier.
I don't know of a name for this offhand, but there could be one.
I might want to do a critical-edges transformation to make it easier to reason about, that way "dominated by an edge" is also "dominated by a node", and we know that s1 does not dominate s2 and vice-versa -- which gives us that each use is either

  1. dominated by s1; move xb := phi(x1,x2) to s1.
  2. dominated by s2; s/xb/x2
  3. dominated by neither, in code that is properly dominated by b
  4. dominated by neither, input to a phi that is not properly dominated by b (might be b).

I think case 3 is potentially a mess, it might require a lot of phi insertion depending on the shape of the graph.

Case 4 depends on the source of the input flow edge; it is some block, either falling into case 1, 2, or 3.
1 and 2 are easy, 3 is a mess.

Slightly off-topic, I recall recently seeing a formulation for SSA where blocks with phis are like "functions" and their predecessors "call" them, passing parameters appropriately. This simplifies the description of case 4 above because it is converted to 1, 2, or 3, as appropriate.

@josharian
Copy link
Contributor

Sorry, David. I didn’t mean to imply that I had a sub-24-hour SLA on random personal emails asking for help!

The idea of doing a critical edge removal pass first is an intriguing one, thanks!

@dmitshur dmitshur added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Mar 3, 2020
@josharian
Copy link
Contributor

I'm working on this.

@mariecurried
Copy link

@laboger, where you referring to #33713 in the initial message?

@laboger
Copy link
Contributor Author

laboger commented Mar 11, 2020

@laboger, where you referring to #33713 in the initial message?

Yes. Not sure if they are the same, i.e., will the same fix resolve both? BTW, this does occur quite a bit in the stdlib functions. I saw it several times without looking too hard.

@josharian
Copy link
Contributor

This is proving to be a rich source of optimizations. The challenge is that every time I try a deeper analysis, it catches more cases and is also slower. Soon (I hope today) I'll start mailing some of the cheap should-definitely-do-this cases, and we can work incrementally from there.

@gopherbot
Copy link

Change https://golang.org/cl/222917 mentions this issue: cmd/compile: rename a local variable in shortcircuitBlock

@gopherbot
Copy link

Change https://golang.org/cl/222918 mentions this issue: cmd/compile: remove loop in shortcircuit

@gopherbot
Copy link

Change https://golang.org/cl/222919 mentions this issue: cmd/compile: more minor cleanup in shortcircuitBlock

@gopherbot
Copy link

Change https://golang.org/cl/222923 mentions this issue: cmd/compile: handle some additional phis in shortcircuit

gopherbot pushed a commit that referenced this issue Mar 13, 2020
v is pretty generic. Subsequent changes will make this function
more complicated, so rename it now, independently, for easier review.

v is the control value for the block (or its underlying phi);
call it ctl.

Passes toolstash-check.

Updates #37608

Change-Id: I3fbae3344f1c95aff0a69c1e4f61ef637a54774e
Reviewed-on: https://go-review.googlesource.com/c/go/+/222917
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
gopherbot pushed a commit that referenced this issue Mar 13, 2020
shortcircuitBlock contained a loop to handle blocks like

b: <- p q
  v = Phi true false
If v -> t u

in a single execution.
This change makes shortcircuitBlock do it in two instead,
one for each constant phi arg.

Motivation: Upcoming changes will expand the range of
blocks that the shortcircuit pass can handle.
Those changes need to understand what the CFG
will look like after the rewrite in shortcircuitBlock.
Making shortcircuitBlock do only a single CFG
modification at a time significantly simplifies that code.

In theory, this is less efficient, but not measurably so.
There is minor, unimportant churn in the generated code.

Updates #37608

Change-Id: Ia6dce7011e3e19b546ed1e176bd407575a0ab837
Reviewed-on: https://go-review.googlesource.com/c/go/+/222918
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
gopherbot pushed a commit that referenced this issue Mar 13, 2020
Continue to simplify, rename for clarity,
improve docs, and reduce variable scope.

This is in preparation for this function becoming
more complicated.

Passes toolstash-check.

Updates #37608

Change-Id: I630a4e07c92297c46d18aea69ec29852d6371ff0
Reviewed-on: https://go-review.googlesource.com/c/go/+/222919
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
gopherbot pushed a commit that referenced this issue Apr 8, 2020
Prior to this change, the shortcircuit pass could only
handle blocks containing only a single phi control value,
possibly wrapped in some OpNot and OpCopy values.

This change partially lifts this limitation.
It handles some cases in which the block contains other phi values.
This appears to happen most commonly in cases in which
the conditionals being checked involve the memory state,
in which case there is a phi memory value in the block.

The general idea here is to use the information we have about
the CFG to (1) move the other phi values into other blocks
and/or (2) rewrite uses of the other phi values in other blocks.

For example, consider this CFG:

p   q
 \ /
  b
 / \
t   u

And consider a phi value v in block b.
We'll write v = Phi(p: x, q: y) to say that v has value x corresponding
to inbound block p, and value y for block q.

We will rewrite this CFG to:

p    q
|   /
|  b
|/  \
t    u

What should we do with v?

Any uses of v in u can be replaced with y. Why?
If we are in block u, we came from b, and before that from q.
If prior to b we came from p, then we would have gone to t, not u.
Since we came from q, we know that v took the value y.

Uses of v in t are a bit more complicated.
It is going to end up being a phi value: Phi(p: ?, b: ?).

Suppose, after the rewrite, we came from block p.
Then, before the rewrite, we would have gone to b,
where v would have the value x.
So we have Phi(p: x, b: ?).

Suppose, after the rewrite, we came from block b.
Then we must have come from block q.
If we come from block q, v has value y.
So we have Phi(p: x, b: y).
Uses of v in t can thus be replaced with a new phi value,
with the same values as v, but with altered predecessors.

Similar reasoning can be employed to rewrite or replace
other uses of v elsewhere in the CFG, so that v itself can be eliminated,
and the CFG rewrite can proceed.

This change sets up the infrastructure for such optimizations
and adds a few cheap ones. All optimizations in this change depend
only on the shape of the CFG; future changes may also depend on where
v's uses are. That analysis is more powerful but more expensive,
and should be done incrementally.

The use of closures here is perhaps a bit unusual,
but during development it proved critical to having readable code.
We must decide early on whether we can safely do the CFG modifications,
and then later fix up the phis if so.
Safely storing state and decisions across these two phases is hard to do readably.
Closures solve the problem neatly.

I manually instrumented the code paths in shortcircuitPhiPlan.
During make.bash there are nearly 6000 invocations.
The least-visited code path gets run 85 times,
so all the code in this CL is reasonably well-exercised.

Here is a concrete example of code improved by this change:

func f(e interface{}) int {
	if x, ok := e.(int); ok {
		return x
	}
	return 0
}

Omitting PCDATA, FUNCDATA, and the like, it used to compile to:

"".f STEXT nosplit size=50 args=0x18 locals=0x0
	0x0000 00000 (x.go:4)	LEAQ	type.int(SB), AX
	0x0007 00007 (x.go:4)	MOVQ	"".e+8(SP), CX
	0x000c 00012 (x.go:4)	CMPQ	AX, CX
	0x000f 00015 (x.go:4)	JNE	43
	0x0011 00017 (x.go:4)	MOVQ	"".e+16(SP), AX
	0x0016 00022 (x.go:4)	MOVQ	(AX), AX
	0x0019 00025 (x.go:4)	JNE	33
	0x001b 00027 (x.go:5)	MOVQ	AX, "".~r1+24(SP)
	0x0020 00032 (x.go:5)	RET
	0x0021 00033 (x.go:7)	MOVQ	$0, "".~r1+24(SP)
	0x002a 00042 (x.go:7)	RET
	0x002b 00043 (x.go:7)	MOVL	$0, AX
	0x0030 00048 (x.go:4)	JMP	25

Afterwards, it compiles to:

"".f STEXT nosplit size=41 args=0x18 locals=0x0
	0x0000 00000 (x.go:4)	LEAQ	type.int(SB), AX
	0x0007 00007 (x.go:4)	MOVQ	"".e+8(SP), CX
	0x000c 00012 (x.go:4)	CMPQ	AX, CX
	0x000f 00015 (x.go:4)	JNE	31
	0x0011 00017 (x.go:4)	MOVQ	"".e+16(SP), AX
	0x0016 00022 (x.go:4)	MOVQ	(AX), AX
	0x0019 00025 (x.go:5)	MOVQ	AX, "".~r1+24(SP)
	0x001e 00030 (x.go:5)	RET
	0x001f 00031 (x.go:7)	MOVQ	$0, "".~r1+24(SP)
	0x0028 00040 (x.go:7)	RET

Note that there is now only a single JNE and a single RET $0 path.

Updates #37608

Has a minor good effect on compilation speed and memory use.

Provides widespread improvements to generated code.
The rare, minor regressions I have investigated are due to
register allocation fluctuations.

file      before    after     Δ       %       
addr2line 4376080   4371984   -4096   -0.094% 
api       5945400   5933112   -12288  -0.207% 
asm       5034312   5030216   -4096   -0.081% 
buildid   2844952   2840856   -4096   -0.144% 
cgo       4812872   4804680   -8192   -0.170% 
compile   19622064  19610368  -11696  -0.060% 
cover     5236648   5232552   -4096   -0.078% 
dist      3658312   3654216   -4096   -0.112% 
doc       4653512   4649416   -4096   -0.088% 
fix       3370072   3365976   -4096   -0.122% 
link      6671864   6667768   -4096   -0.061% 
pprof     14781652  14761172  -20480  -0.139% 
trace     11639684  11627396  -12288  -0.106% 
vet       8252280   8231800   -20480  -0.248% 
total     115052984 114934792 -118192 -0.103% 


file                                                                     before   after    Δ       %       
internal/cpu.s                                                           3298     3296     -2      -0.061% 
internal/bytealg.s                                                       1730     1737     +7      +0.405% 
cmd/vendor/golang.org/x/mod/semver.s                                     7332     7283     -49     -0.668% 
image/color.s                                                            8248     8156     -92     -1.115% 
math.s                                                                   35966    35956    -10     -0.028% 
math/cmplx.s                                                             6596     6575     -21     -0.318% 
runtime.s                                                                480566   480053   -513    -0.107% 
sync.s                                                                   16408    16385    -23     -0.140% 
math/rand.s                                                              10447    10406    -41     -0.392% 
internal/reflectlite.s                                                   28408    28366    -42     -0.148% 
errors.s                                                                 2736     2701     -35     -1.279% 
sort.s                                                                   17031    17036    +5      +0.029% 
io.s                                                                     16993    16964    -29     -0.171% 
container/heap.s                                                         2006     1997     -9      -0.449% 
text/tabwriter.s                                                         9570     9552     -18     -0.188% 
bytes.s                                                                  31823    31594    -229    -0.720% 
strconv.s                                                                52760    52717    -43     -0.082% 
vendor/golang.org/x/text/transform.s                                     16713    16706    -7      -0.042% 
strings.s                                                                42590    42563    -27     -0.063% 
bufio.s                                                                  22883    22785    -98     -0.428% 
encoding/base32.s                                                        9586     9531     -55     -0.574% 
syscall.s                                                                82237    82243    +6      +0.007% 
image.s                                                                  37465    37452    -13     -0.035% 
regexp/syntax.s                                                          82827    82769    -58     -0.070% 
image/draw.s                                                             18698    18584    -114    -0.610% 
image/jpeg.s                                                             36560    36549    -11     -0.030% 
time.s                                                                   82557    82526    -31     -0.038% 
context.s                                                                10863    10820    -43     -0.396% 
regexp.s                                                                 64114    64049    -65     -0.101% 
os.s                                                                     51751    51524    -227    -0.439% 
reflect.s                                                                168240   168049   -191    -0.114% 
cmd/go/internal/lockedfile/internal/filelock.s                           2317     2290     -27     -1.165% 
path/filepath.s                                                          17831    17766    -65     -0.365% 
io/ioutil.s                                                              6994     6990     -4      -0.057% 
encoding/binary.s                                                        30791    30726    -65     -0.211% 
cmd/vendor/golang.org/x/sys/unix.s                                       78055    78033    -22     -0.028% 
encoding/pem.s                                                           9280     9247     -33     -0.356% 
crypto/cipher.s                                                          20376    20374    -2      -0.010% 
os/exec.s                                                                29229    29140    -89     -0.304% 
internal/goroot.s                                                        4588     4579     -9      -0.196% 
cmd/internal/browser.s                                                   2246     2240     -6      -0.267% 
cmd/vendor/golang.org/x/crypto/ssh/terminal.s                            27183    27149    -34     -0.125% 
fmt.s                                                                    76625    76484    -141    -0.184% 
encoding/hex.s                                                           6154     6152     -2      -0.032% 
compress/lzw.s                                                           7063     7059     -4      -0.057% 
database/sql/driver.s                                                    18875    18862    -13     -0.069% 
debug/plan9obj.s                                                         8268     8266     -2      -0.024% 
net/url.s                                                                29724    29719    -5      -0.017% 
encoding/csv.s                                                           12872    12856    -16     -0.124% 
debug/gosym.s                                                            25303    25268    -35     -0.138% 
compress/flate.s                                                         50952    51019    +67     +0.131% 
compress/zlib.s                                                          7277     7266     -11     -0.151% 
archive/zip.s                                                            42155    42111    -44     -0.104% 
debug/dwarf.s                                                            107632   107541   -91     -0.085% 
database/sql.s                                                           98373    98028    -345    -0.351% 
os/user.s                                                                14722    14708    -14     -0.095% 
encoding/json.s                                                          105836   105711   -125    -0.118% 
debug/macho.s                                                            32598    32560    -38     -0.117% 
encoding/gob.s                                                           136478   135755   -723    -0.530% 
debug/pe.s                                                               31160    30869    -291    -0.934% 
debug/elf.s                                                              63495    63302    -193    -0.304% 
vendor/golang.org/x/text/unicode/bidi.s                                  27220    27217    -3      -0.011% 
vendor/golang.org/x/text/secure/bidirule.s                               3363     3352     -11     -0.327% 
go/token.s                                                               12036    12035    -1      -0.008% 
flag.s                                                                   22277    22256    -21     -0.094% 
mime.s                                                                   39696    39509    -187    -0.471% 
go/scanner.s                                                             19033    19020    -13     -0.068% 
archive/tar.s                                                            70936    70581    -355    -0.500% 
internal/xcoff.s                                                         22823    22820    -3      -0.013% 
text/scanner.s                                                           11631    11629    -2      -0.017% 
encoding/xml.s                                                           110534   110408   -126    -0.114% 
math/big.s                                                               183636   183545   -91     -0.050% 
image/gif.s                                                              27376    27343    -33     -0.121% 
crypto/dsa.s                                                             6029     5969     -60     -0.995% 
image/png.s                                                              42947    42939    -8      -0.019% 
crypto/rand.s                                                            6866     6854     -12     -0.175% 
vendor/golang.org/x/text/unicode/norm.s                                  66394    66354    -40     -0.060% 
runtime/trace.s                                                          2603     2521     -82     -3.150% 
crypto/ed25519.s                                                         6321     6300     -21     -0.332% 
text/template/parse.s                                                    93910    93844    -66     -0.070% 
crypto/rsa.s                                                             31460    31369    -91     -0.289% 
encoding/asn1.s                                                          57021    57023    +2      +0.004% 
crypto/elliptic.s                                                        51382    51363    -19     -0.037% 
crypto/x509/pkix.s                                                       10386    10342    -44     -0.424% 
vendor/golang.org/x/net/idna.s                                           24482    24466    -16     -0.065% 
vendor/golang.org/x/crypto/cryptobyte.s                                  33479    33280    -199    -0.594% 
crypto/ecdsa.s                                                           11936    11883    -53     -0.444% 
go/constant.s                                                            43670    42663    -1007   -2.306% 
go/ast.s                                                                 80383    80191    -192    -0.239% 
testing.s                                                                68069    68057    -12     -0.018% 
runtime/pprof.s                                                          59613    59603    -10     -0.017% 
testing/iotest.s                                                         4895     4891     -4      -0.082% 
internal/trace.s                                                         78136    78089    -47     -0.060% 
cmd/internal/goobj2.s                                                    13158    13154    -4      -0.030% 
cmd/internal/src.s                                                       17661    17657    -4      -0.023% 
go/parser.s                                                              79046    78880    -166    -0.210% 
cmd/internal/objabi.s                                                    16367    16343    -24     -0.147% 
text/template.s                                                          94899    94486    -413    -0.435% 
go/printer.s                                                             77267    76992    -275    -0.356% 
cmd/internal/goobj.s                                                     25988    25947    -41     -0.158% 
runtime/pprof/internal/profile.s                                         102066   101933   -133    -0.130% 
go/format.s                                                              5419     5371     -48     -0.886% 
cmd/vendor/golang.org/x/arch/ppc64/ppc64asm.s                            37181    37149    -32     -0.086% 
go/doc.s                                                                 74533    74132    -401    -0.538% 
html/template.s                                                          88743    88389    -354    -0.399% 
cmd/asm/internal/lex.s                                                   24881    24872    -9      -0.036% 
cmd/internal/buildid.s                                                   18263    18256    -7      -0.038% 
cmd/vendor/golang.org/x/arch/x86/x86asm.s                                80036    79980    -56     -0.070% 
go/build.s                                                               68905    68737    -168    -0.244% 
cmd/cover.s                                                              46070    45950    -120    -0.260% 
cmd/internal/obj.s                                                       117001   116991   -10     -0.009% 
cmd/doc.s                                                                62700    62419    -281    -0.448% 
cmd/internal/obj/arm.s                                                   66745    66687    -58     -0.087% 
cmd/compile/internal/syntax.s                                            145406   145062   -344    -0.237% 
cmd/internal/obj/wasm.s                                                  44049    44027    -22     -0.050% 
net.s                                                                    291835   291020   -815    -0.279% 
cmd/dist.s                                                               209020   208807   -213    -0.102% 
cmd/cgo.s                                                                241564   241102   -462    -0.191% 
vendor/golang.org/x/net/http/httpproxy.s                                 9407     9399     -8      -0.085% 
log/syslog.s                                                             7921     7909     -12     -0.151% 
go/types.s                                                               319325   317513   -1812   -0.567% 
vendor/golang.org/x/net/http/httpguts.s                                  3834     3825     -9      -0.235% 
mime/multipart.s                                                         21414    21343    -71     -0.332% 
cmd/internal/obj/ppc64.s                                                 119949   119938   -11     -0.009% 
cmd/compile/internal/logopt.s                                            10158    10118    -40     -0.394% 
vendor/golang.org/x/net/nettest.s                                        28012    27991    -21     -0.075% 
go/internal/srcimporter.s                                                6405     6380     -25     -0.390% 
go/internal/gcimporter.s                                                 34525    34493    -32     -0.093% 
net/mail.s                                                               23937    23720    -217    -0.907% 
go/internal/gccgoimporter.s                                              56095    56038    -57     -0.102% 
cmd/compile/internal/types.s                                             47247    47207    -40     -0.085% 
cmd/api.s                                                                39582    39558    -24     -0.061% 
cmd/go/internal/base.s                                                   12572    12551    -21     -0.167% 
cmd/vendor/golang.org/x/xerrors.s                                        17846    17814    -32     -0.179% 
cmd/vendor/golang.org/x/mod/sumdb/note.s                                 18142    18070    -72     -0.397% 
cmd/go/internal/search.s                                                 19994    19876    -118    -0.590% 
cmd/go/internal/imports.s                                                16457    16428    -29     -0.176% 
cmd/vendor/golang.org/x/mod/module.s                                     17838    17759    -79     -0.443% 
cmd/go/internal/cache.s                                                  30551    30514    -37     -0.121% 
cmd/vendor/golang.org/x/mod/sumdb/tlog.s                                 36356    36321    -35     -0.096% 
cmd/internal/test2json.s                                                 9452     9408     -44     -0.466% 
cmd/go/internal/mvs.s                                                    25136    25092    -44     -0.175% 
cmd/go/internal/txtar.s                                                  3488     3461     -27     -0.774% 
cmd/vendor/golang.org/x/mod/zip.s                                        18811    18800    -11     -0.058% 
cmd/go/internal/version.s                                                11213    11171    -42     -0.375% 
cmd/link/internal/benchmark.s                                            4941     4949     +8      +0.162% 
cmd/internal/obj/s390x.s                                                 126865   126849   -16     -0.013% 
cmd/gofmt.s                                                              30684    30596    -88     -0.287% 
cmd/fix.s                                                                87450    86906    -544    -0.622% 
cmd/internal/obj/x86.s                                                   88578    88556    -22     -0.025% 
cmd/vendor/golang.org/x/mod/modfile.s                                    72450    72363    -87     -0.120% 
cmd/oldlink/internal/loader.s                                            16743    16741    -2      -0.012% 
cmd/pack.s                                                               14863    14861    -2      -0.013% 
cmd/go/internal/load.s                                                   106742   106568   -174    -0.163% 
cmd/oldlink/internal/objfile.s                                           21787    21780    -7      -0.032% 
cmd/oldlink/internal/loadmacho.s                                         29309    29317    +8      +0.027% 
cmd/oldlink/internal/loadelf.s                                           35013    35021    +8      +0.023% 
cmd/asm/internal/asm.s                                                   68550    68538    -12     -0.018% 
cmd/link/internal/loader.s                                               94765    94564    -201    -0.212% 
cmd/link/internal/loadelf.s                                              35663    35667    +4      +0.011% 
cmd/link/internal/loadmacho.s                                            29501    29509    +8      +0.027% 
cmd/vendor/golang.org/x/tools/go/analysis.s                              4983     4976     -7      -0.140% 
cmd/vendor/golang.org/x/tools/go/analysis/internal/analysisflags.s       16771    16709    -62     -0.370% 
cmd/vendor/golang.org/x/tools/go/types/objectpath.s                      18481    18456    -25     -0.135% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/internal/analysisutil.s 2100     2085     -15     -0.714% 
cmd/vendor/github.com/google/pprof/profile.s                             150141   149620   -521    -0.347% 
cmd/vendor/github.com/google/pprof/internal/measurement.s                10420    10404    -16     -0.154% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/asmdecl.s               36814    36755    -59     -0.160% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/bools.s                 6688     6673     -15     -0.224% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/cgocall.s               9856     9784     -72     -0.731% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/composite.s             3011     2979     -32     -1.063% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/copylock.s              9737     9682     -55     -0.565% 
cmd/vendor/golang.org/x/tools/go/cfg.s                                   30738    30725    -13     -0.042% 
cmd/vendor/github.com/ianlancetaylor/demangle.s                          175195   174513   -682    -0.389% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/httpresponse.s          3625     3520     -105    -2.897% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/loopclosure.s           2987     2971     -16     -0.536% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/shift.s                 4372     4340     -32     -0.732% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/stdmethods.s            8634     8611     -23     -0.266% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/tests.s                 6189     6164     -25     -0.404% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/structtag.s             8089     8073     -16     -0.198% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/unsafeptr.s             2208     2177     -31     -1.404% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/unreachable.s           8050     8047     -3      -0.037% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/unusedresult.s          3665     3629     -36     -0.982% 
cmd/vendor/golang.org/x/tools/go/ast/astutil.s                           65773    65680    -93     -0.141% 
cmd/vendor/golang.org/x/tools/go/analysis/unitchecker.s                  13328    13286    -42     -0.315% 
cmd/vendor/golang.org/x/tools/go/types/typeutil.s                        12263    12162    -101    -0.824% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/errorsas.s              1459     1421     -38     -2.605% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/ctrlflow.s              5208     5191     -17     -0.326% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/unmarshal.s             1801     1782     -19     -1.055% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/lostcancel.s            9569     9528     -41     -0.428% 
cmd/go/internal/work.s                                                   304928   304756   -172    -0.056% 
crypto/x509.s                                                            147340   147139   -201    -0.136% 
cmd/vendor/golang.org/x/tools/go/analysis/passes/printf.s                34287    34019    -268    -0.782% 
crypto/tls.s                                                             311603   310644   -959    -0.308% 
cmd/oldlink/internal/ld.s                                                533115   532651   -464    -0.087% 
cmd/oldlink/internal/wasm.s                                              16484    16458    -26     -0.158% 
cmd/oldlink/internal/x86.s                                               18832    18830    -2      -0.011% 
cmd/link/internal/ld.s                                                   548200   547626   -574    -0.105% 
cmd/link/internal/wasm.s                                                 16760    16734    -26     -0.155% 
cmd/link/internal/arm64.s                                                20850    20840    -10     -0.048% 
cmd/link/internal/x86.s                                                  17437    17435    -2      -0.011% 
net/http.s                                                               556647   555519   -1128   -0.203% 
net/http/cookiejar.s                                                     15849    15833    -16     -0.101% 
expvar.s                                                                 9521     9508     -13     -0.137% 
net/http/httptest.s                                                      16471    16452    -19     -0.115% 
cmd/vendor/github.com/google/pprof/internal/plugin.s                     4266     4264     -2      -0.047% 
net/http/cgi.s                                                           23448    23428    -20     -0.085% 
cmd/go/internal/web.s                                                    16472    16428    -44     -0.267% 
net/http/httputil.s                                                      39672    39670    -2      -0.005% 
net/rpc.s                                                                33989    33965    -24     -0.071% 
net/http/fcgi.s                                                          19167    19162    -5      -0.026% 
cmd/vendor/github.com/google/pprof/internal/symbolz.s                    5861     5857     -4      -0.068% 
cmd/vendor/github.com/google/pprof/internal/binutils.s                   35842    35823    -19     -0.053% 
cmd/vendor/github.com/google/pprof/internal/symbolizer.s                 11449    11404    -45     -0.393% 
cmd/go/internal/get.s                                                    62726    62582    -144    -0.230% 
cmd/vendor/github.com/google/pprof/internal/report.s                     80032    80022    -10     -0.012% 
cmd/go/internal/modfetch/codehost.s                                      89005    88871    -134    -0.151% 
cmd/trace.s                                                              116607   116496   -111    -0.095% 
cmd/vendor/github.com/google/pprof/internal/driver.s                     143234   143207   -27     -0.019% 
cmd/vendor/github.com/google/pprof/driver.s                              9000     8998     -2      -0.022% 
cmd/go/internal/modfetch.s                                               126300   125726   -574    -0.454% 
cmd/pprof.s                                                              12317    12312    -5      -0.041% 
cmd/go/internal/modconv.s                                                17878    17861    -17     -0.095% 
cmd/go/internal/modload.s                                                150261   149763   -498    -0.331% 
cmd/go/internal/clean.s                                                  11122    11091    -31     -0.279% 
cmd/go/internal/help.s                                                   6523     6521     -2      -0.031% 
cmd/go/internal/generate.s                                               11627    11614    -13     -0.112% 
cmd/go/internal/envcmd.s                                                 22034    21986    -48     -0.218% 
cmd/go/internal/modget.s                                                 38478    38398    -80     -0.208% 
cmd/go/internal/modcmd.s                                                 46430    46229    -201    -0.433% 
cmd/go/internal/test.s                                                   64399    64374    -25     -0.039% 
cmd/compile/internal/ssa.s                                               3615264  3608276  -6988   -0.193% 
cmd/compile/internal/gc.s                                                1538865  1537625  -1240   -0.081% 
cmd/compile/internal/amd64.s                                             33593    33574    -19     -0.057% 
cmd/compile/internal/x86.s                                               30871    30852    -19     -0.062% 
total                                                                    19343565 19311284 -32281  -0.167% 

Change-Id: Ib030eb79458827a5a5b6d0d2f98765f8325a4d7e
Reviewed-on: https://go-review.googlesource.com/c/go/+/222923
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
@josharian
Copy link
Contributor

josharian commented May 2, 2020

Since the 1.15 window has passed, I'm going to jot down a few more related things and test cases I've noticed, just to have them written down somewhere.

(1) The shortcircuit pass can be strengthened considerably, at some compiler performance risk, by figuring out the locations of all the uses of the non-control phis. I have CLs locally that play with this.

(2) The phiopt pass could be strengthened. For example, in runtime.atoi, we find code like this:

	neg := false
	if s[0] == '-' {
		neg = true
		s = s[1:]
	}

This should be ripe for the phiopt pass to convert into code like:

  neg := s[0] == '-'
  if neg {
    s = s[1]
  }

But it doesn't because of the second conditions[0] contains a bounds check. In this case, prove later removes that bounds check. But there are other cases in the tree in which it cannot. To find them, at the end of the phiopt pass, add something like:

	for _, b := range f.Blocks {
		for _, v := range b.Values {
			if v.Op != OpPhi {
				continue
			}
			all := true
			for _, a := range v.Args {
				if a.Op != OpConstBool {
					all = false
					break
				}
			}
			if !all {
				continue
			}
			fmt.Println(f.Name, b, v)
		}
	}

(3) The shortcircuit pass currently looks for situations in which the phi control value of a block is known for a particular inbound edge because it is a constant along that edge.

But there is a structurally similar scenario that can occur in which a control value of a block is known for a particular inbound edge because its predecessor branched on that exact same control value. For example, runtime.atoi contains this code:

	if !neg && un > uint(maxInt) {
		return 0, false
	}
	if neg && un > uint(maxInt)+1 {
		return 0, false
	}

After CSE, the first thing we do is branch on !neg. And if it is false, we immediately branch again on !neg. We should be able to proceed directly to un > uint(maxInt)+1. That is, we should optimize this code to:

	if !neg {
		if un > uint(maxInt) {
			return 0, false
		}
	} else {
		if un > uint(maxInt)+1 {
			return 0, false
		}
	}

Prove can catch some repeated block controls, but not conditional ones; that's where a modified short circuit pass would help.

(4) We should consider making phiopt part of the fuse pass loops, or at least after prove. cmd/compile/internal/types.(*Type).IsInteger is a good test case. It is:

func (t *Type) IsInteger() bool {
	switch t.Etype {
	case TINT8, TUINT8, TINT16, TUINT16, TINT32, TUINT32, TINT64, TUINT64, TINT, TUINT, TUINTPTR:
		return true
	}
	return false
}

This switch case should be optimized into a single unsigned comparison. But it's too complicated for phiopt. Prove simplifies it, but it's too late for phiopt. Fuse can't handle it. We end up with pointless branching.

We might want to make phiopt a fuse function, like we recently did for shortcircuit, and run it after prove.

(5) Running shortcircuit again later helps. The trick is doing so cheaply enough without having to run a deadcode afterwards. I tried this in CL 229058 and failed. The failure is very interesting, and are the effects of doing that CL piecemeal. I still think it's a good idea, but it needs some thought and investigate before trying again.

That's a bunch of stuff. But it all falls under the umbrella of "look at the CFG and do less branching". Help with any/all of these is of course welcome.

cc @mundaym

@erifan
Copy link

erifan commented May 6, 2020

A long time ago, I found that compared with gcc, the generation instruction of function math.IsInf is very cumbersome and there are many jumps, see https://godbolt.org/z/n8wWyG and https://godbolt.org/z/GcWh9S. There doesn't seem to be no improvement at the moment. I look forward to making it as concise as the instructions generated by gcc. I am interested in this case. But I won't be free until next quarter.

@erifan
Copy link

erifan commented Jun 28, 2020

Hi, I'm doing the third one.

@zhangfannie
Copy link
Contributor

If there are no other volunteers, I will look into the second one. Thank you. 😊

@gopherbot
Copy link

Change https://golang.org/cl/244737 mentions this issue: cmd/compile/ssa: optimize the derivable known branch of If block

@gopherbot
Copy link

Change https://golang.org/cl/252937 mentions this issue: cmd/compile/internal/ssa: strengthen phiopt pass

@gopherbot
Copy link

Change https://golang.org/cl/257981 mentions this issue: cmd/compile: optimize the Phi values

@erifan
Copy link

erifan commented Oct 22, 2020

This comment is an explanation of CL https://go-review.googlesource.com/c/go/+/244737 @randall77
Take math.IsInf as an example.
func IsInf(f float64, sign int) bool {
return sign >= 0 && f > MaxFloat64 || sign <= 0 && f < -MaxFloat64
}
The IR graph before shortcircuit:
image
(before)
The control value of b2 is v15, a phi op, one of its argument is v11, which is the control value of b1. So if b2 is coming from b1, we know v11 is false, then v15 is false, and we can simply redirect b1 to b5. This is what shortcircuit does. CL_244737 can't do this optimization, because CL_244737 can't infer the value of v15 from v11. From the op of v11, CL_244737 knows the relationship of v10 and v8, from the edge b1->b5, it also knows the v11 is false, but it doesn't know the value of v15 from the above knowledge.
What is missing here is to derive the output value of phi through its inputs. This is exactly what shortcircuit does. So CL_244737 needs to be based on shortcircuit.

After shortcircuit, the graph becomes as follow, then CL_244737 can play a role.
image
(after)
If b5 if coming from b1, then from the control value (b11) of b1, CL_244737 knows that v10 > v8, the control value of b5 is v17, so CL_244737 knows that b5's true branch is the taken branch. So we can redirect b1 to b7 directly.
It needs to be pointed out that prove pass can't handle what CL_244737 does, because b5 has multiple predecessors.

gopherbot pushed a commit that referenced this issue Mar 29, 2021
The current phiopt pass just transforms the following code
  x := false
  if b { x = true}
into
  x = b

But we find code in runtime.atoi like this:
  neg := false
  if s[0] == '-' {
    neg = true
    s = s[1:]
  }

The current phiopt pass does not covert it into code like:
  neg := s[0] == '-'
  if neg { s = s[1:] }

Therefore, this patch strengthens the phiopt pass so that the
boolean Phi value "neg" can be replaced with a copy of control
value "s[0] == '-'", thereby using "cmp+cset" instead of a branch.

But in some cases even replacing the boolean Phis cannot eliminate
this branch. In the following case, this patch replaces "d" with a
copy of "a<0", but the regalloc pass will insert the "Load {c}"
value into an empty block to split the live ranges, which causes
the branch to not be eliminated.

For example:
  func test(a, b, c int) (bool, int) {
    d := false
    if (a<0) {
      if (b<0) {
        c = c+1
      }
      d = true
    }
    return d, c
  }

The optimized assembly code:
  MOVD "".a(FP), R0
  TBZ $63, R0, 48
  MOVD "".c+16(FP), R1
  ADD $1, R1, R2
  MOVD "".b+8(FP), R3
  CMP ZR, R3
  CSEL LT, R2, R1, R1
  CMP ZR, R0
  CSET LT, R0
  MOVB R0, "".~r3+24(FP)
  MOVD R1, "".~r4+32(FP)
  RET (R30)
  MOVD "".c+16(FP), R1
  JMP 28

The benchmark:

name          old time/op            new time/op            delta
pkg:cmd/compile/internal/ssa goos:linux goarch:arm64
PhioptPass  117783.250000ns +- 1%  117219.111111ns +- 1%   ~  (p=0.074  n=8+9)

Statistical data from compilecmp tool:

compilecmp local/master -> HEAD
local/master (a826f7d): debug/dwarf: support DW_FORM_rnglistx aka formRnglistx
HEAD (e57e003c10): cmd/compile/internal/ssa: strengthen phiopt pass

benchstat -geomean  /tmp/2516644532 /tmp/1075915815
completed 50 of 50, estimated time remaining 0s (ETA 7:10PM)
name                      old time/op       new time/op       delta
Template                        554ms _ 3%        553ms _ 3%    ~     (p=0.986 n=49+48)
Unicode                         252ms _ 4%        249ms _ 4%  -1.33%  (p=0.002 n=47+49)
GoTypes                         3.16s _ 3%        3.18s _ 3%  +0.77%  (p=0.022 n=44+48)
Compiler                        257ms _ 4%        258ms _ 4%    ~     (p=0.121 n=50+49)
SSA                             24.2s _ 4%        24.2s _ 5%    ~     (p=0.694 n=49+50)
Flate                           338ms _ 4%        338ms _ 4%    ~     (p=0.592 n=43+46)
GoParser                        506ms _ 3%        507ms _ 3%    ~     (p=0.942 n=49+50)
Reflect                         1.37s _ 4%        1.37s _ 5%    ~     (p=0.408 n=50+50)
Tar                             486ms _ 3%        487ms _ 4%    ~     (p=0.911 n=47+50)
XML                             619ms _ 2%        619ms _ 3%    ~     (p=0.368 n=46+48)
LinkCompiler                    1.29s _31%        1.32s _23%    ~     (p=0.306 n=49+44)
ExternalLinkCompiler            3.39s _10%        3.36s _ 6%    ~     (p=0.311 n=48+46)
LinkWithoutDebugCompiler        846ms _37%        793ms _24%  -6.29%  (p=0.040 n=50+49)
[Geo mean]                      974ms             971ms       -0.36%

name                      old user-time/op  new user-time/op  delta
Template                        910ms _12%        893ms _13%    ~     (p=0.098 n=49+49)
Unicode                         495ms _28%        492ms _18%    ~     (p=0.562 n=50+46)
GoTypes                         4.42s _15%        4.39s _13%    ~     (p=0.684 n=49+50)
Compiler                        419ms _22%        422ms _16%    ~     (p=0.579 n=48+50)
SSA                             36.5s _ 7%        36.6s _ 8%    ~     (p=0.465 n=50+47)
Flate                           521ms _21%        523ms _16%    ~     (p=0.889 n=50+47)
GoParser                        810ms _12%        792ms _15%    ~     (p=0.149 n=50+50)
Reflect                         1.98s _13%        2.02s _13%    ~     (p=0.144 n=47+50)
Tar                             826ms _15%        806ms _19%    ~     (p=0.115 n=49+49)
XML                             988ms _14%       1003ms _14%    ~     (p=0.179 n=50+50)
LinkCompiler                    1.79s _ 8%        1.84s _11%  +2.81%  (p=0.001 n=49+49)
ExternalLinkCompiler            3.69s _ 4%        3.71s _ 3%    ~     (p=0.261 n=50+50)
LinkWithoutDebugCompiler        838ms _10%        827ms _11%    ~     (p=0.323 n=50+48)
[Geo mean]                      1.44s             1.44s       -0.05%

name                      old alloc/op      new alloc/op      delta
Template                       39.0MB _ 1%       39.0MB _ 1%    ~     (p=0.445 n=50+49)
Unicode                        28.5MB _ 0%       28.5MB _ 0%    ~     (p=0.460 n=50+50)
GoTypes                         169MB _ 1%        169MB _ 1%    ~     (p=0.092 n=48+50)
Compiler                       23.4MB _ 1%       23.4MB _ 1%  -0.19%  (p=0.032 n=50+49)
SSA                            1.54GB _ 0%       1.55GB _ 1%  +0.14%  (p=0.001 n=50+50)
Flate                          23.8MB _ 1%       23.8MB _ 2%    ~     (p=0.702 n=49+49)
GoParser                       35.4MB _ 1%       35.4MB _ 1%    ~     (p=0.786 n=50+50)
Reflect                        85.3MB _ 1%       85.3MB _ 1%    ~     (p=0.298 n=50+50)
Tar                            34.6MB _ 2%       34.6MB _ 2%    ~     (p=0.683 n=50+50)
XML                            44.5MB _ 3%       44.0MB _ 2%  -1.05%  (p=0.000 n=50+46)
LinkCompiler                    136MB _ 0%        136MB _ 0%  +0.01%  (p=0.005 n=50+50)
ExternalLinkCompiler            128MB _ 0%        128MB _ 0%    ~     (p=0.179 n=50+50)
LinkWithoutDebugCompiler       84.3MB _ 0%       84.3MB _ 0%  +0.01%  (p=0.006 n=50+50)
[Geo mean]                     70.7MB            70.6MB       -0.07%

name                      old allocs/op     new allocs/op     delta
Template                         410k _ 0%         410k _ 0%    ~     (p=0.606 n=48+49)
Unicode                          310k _ 0%         310k _ 0%    ~     (p=0.674 n=50+50)
GoTypes                         1.81M _ 0%        1.81M _ 0%    ~     (p=0.674 n=50+50)
Compiler                         202k _ 0%         202k _ 0%  +0.02%  (p=0.046 n=50+50)
SSA                             16.3M _ 0%        16.3M _ 0%  +0.10%  (p=0.000 n=50+50)
Flate                            244k _ 0%         244k _ 0%    ~     (p=0.834 n=49+50)
GoParser                         380k _ 0%         380k _ 0%    ~     (p=0.410 n=50+50)
Reflect                         1.08M _ 0%        1.08M _ 0%    ~     (p=0.782 n=48+50)
Tar                              368k _ 0%         368k _ 0%    ~     (p=0.585 n=50+49)
XML                              453k _ 0%         453k _ 0%  -0.01%  (p=0.025 n=49+49)
LinkCompiler                     713k _ 0%         713k _ 0%  +0.01%  (p=0.044 n=50+50)
ExternalLinkCompiler             794k _ 0%         794k _ 0%  +0.01%  (p=0.000 n=50+49)
LinkWithoutDebugCompiler         251k _ 0%         251k _ 0%    ~     (p=0.092 n=47+50)
[Geo mean]                       615k              615k       +0.01%

name                      old maxRSS/op     new maxRSS/op     delta
Template                        37.0M _ 4%        37.2M _ 3%    ~     (p=0.062 n=48+48)
Unicode                         36.9M _ 5%        37.3M _ 4%  +1.10%  (p=0.021 n=50+47)
GoTypes                         94.3M _ 3%        94.9M _ 4%  +0.69%  (p=0.022 n=45+46)
Compiler                        33.4M _ 3%        33.4M _ 5%    ~     (p=0.964 n=49+50)
SSA                              741M _ 3%         738M _ 3%    ~     (p=0.164 n=50+50)
Flate                           28.5M _ 6%        28.8M _ 4%  +1.07%  (p=0.009 n=50+49)
GoParser                        35.0M _ 3%        35.3M _ 4%  +0.83%  (p=0.010 n=50+48)
Reflect                         57.2M _ 6%        57.1M _ 4%    ~     (p=0.815 n=50+49)
Tar                             34.9M _ 3%        35.0M _ 3%    ~     (p=0.134 n=49+48)
XML                             39.5M _ 5%        40.0M _ 3%  +1.35%  (p=0.001 n=50+48)
LinkCompiler                     220M _ 2%         220M _ 2%    ~     (p=0.547 n=49+48)
ExternalLinkCompiler             235M _ 2%         236M _ 2%    ~     (p=0.538 n=47+44)
LinkWithoutDebugCompiler         179M _ 1%         179M _ 1%    ~     (p=0.775 n=50+50)
[Geo mean]                      74.9M             75.2M       +0.43%

name                      old text-bytes    new text-bytes    delta
HelloSize                       784kB _ 0%        784kB _ 0%  +0.01%  (p=0.000 n=50+50)

name                      old data-bytes    new data-bytes    delta
HelloSize                      13.1kB _ 0%       13.1kB _ 0%    ~     (all equal)

name                      old bss-bytes     new bss-bytes     delta
HelloSize                       206kB _ 0%        206kB _ 0%    ~     (all equal)

name                      old exe-bytes     new exe-bytes     delta
HelloSize                      1.28MB _ 0%       1.28MB _ 0%  +0.00%  (p=0.000 n=50+50)

file      before    after     _       %
addr2line 4006300   4004484   -1816   -0.045%
api       5029956   5029324   -632    -0.013%
asm       4936311   4939423   +3112   +0.063%
buildid   2595059   2595291   +232    +0.009%
cgo       4401029   4397333   -3696   -0.084%
compile   22246677  22246863  +186    +0.001%
cover     4443825   4443065   -760    -0.017%
dist      3366078   3365838   -240    -0.007%
doc       3776391   3776615   +224    +0.006%
fix       3218800   3218648   -152    -0.005%
link      6365321   6365345   +24     +0.000%
nm        3923625   3923857   +232    +0.006%
objdump   4295569   4295041   -528    -0.012%
pack      2390745   2389217   -1528   -0.064%
pprof     12870094  12866942  -3152   -0.024%
test2json 2587265   2587073   -192    -0.007%
trace     9612629   9613981   +1352   +0.014%
vet       6791008   6792072   +1064   +0.016%
total     106856682 106850412 -6270   -0.006%

Update #37608

Change-Id: Ic6206b22fd1faf570be9fd3c2511aa6c4ce38cdb
Reviewed-on: https://go-review.googlesource.com/c/go/+/252937
Trust: fannie zhang <Fannie.Zhang@arm.com>
Run-TryBot: fannie zhang <Fannie.Zhang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
gopherbot pushed a commit that referenced this issue Apr 28, 2021
When the control value of a If block is known for a particular inbound edge
because its value can be inferred from the control value of its predecessor,
then this inbound edge can be redirected to the known successor directly,
This CL optimizes this kind of cases to eliminate unnecessary comparision.

For example, the following piece of code comes from runtime.atoi,
if !neg && un > uint(maxInt) {
	return 0, false
}
if neg && un > uint(maxInt)+1 {
	return 0, false
}

Before this optimization, if the first "if" statement does not return, both
conditions of the second "if" statement will be checked. But obviously the
value of neg is known through the first "if" statement, and there is no need
to check neg repeatedly.

After this optimization, this redundancy check is eliminated, and the execution
logic becomes as follows.
if !neg {
	if un > uint(maxInt) {
		return 0, false
	}
} else {
	if un > uint(maxInt)+1 {
		return 0, false
	}
}

This CL does not bring significant performance changes, but it makes the code
structure look more reasonable.

Statistical data from tool compilecmp on Linux/amd64:
name                      old time/op       new time/op       delta
Template                        380ms ± 4%        385ms ± 3%  +1.16%  (p=0.000 n=50+49)
Unicode                         168ms ± 9%        169ms ± 9%    ~     (p=0.421 n=49+46)
GoTypes                         1.99s ± 4%        2.02s ± 4%  +1.48%  (p=0.000 n=49+49)
Compiler                        188ms ± 8%        188ms ± 9%    ~     (p=0.997 n=49+50)
SSA                             11.8s ± 2%        12.0s ± 2%  +1.24%  (p=0.000 n=48+50)
Flate                           242ms ± 6%        244ms ± 9%    ~     (p=0.307 n=46+49)
GoParser                        361ms ± 3%        366ms ± 4%  +1.23%  (p=0.000 n=48+49)
Reflect                         836ms ± 3%        842ms ± 3%  +0.70%  (p=0.004 n=48+48)
Tar                             335ms ± 3%        340ms ± 4%  +1.47%  (p=0.000 n=49+46)
XML                             432ms ± 4%        437ms ± 4%  +1.11%  (p=0.002 n=49+49)
LinkCompiler                    701ms ± 4%        704ms ± 5%    ~     (p=0.278 n=49+50)
ExternalLinkCompiler            1.83s ± 3%        1.84s ± 3%  +0.51%  (p=0.034 n=48+49)
LinkWithoutDebugCompiler        436ms ± 6%        438ms ± 6%    ~     (p=0.419 n=48+49)
[Geo mean]                      612ms             617ms       +0.84%

name                      old alloc/op      new alloc/op      delta
Template                       38.7MB ± 1%       39.1MB ± 1%  +1.19%  (p=0.000 n=50+50)
Unicode                        28.1MB ± 0%       28.1MB ± 0%  +0.20%  (p=0.000 n=49+45)
GoTypes                         168MB ± 1%        170MB ± 1%  +1.05%  (p=0.000 n=48+49)
Compiler                       23.0MB ± 1%       23.1MB ± 1%  +0.63%  (p=0.000 n=50+50)
SSA                            1.54GB ± 1%       1.55GB ± 1%  +0.85%  (p=0.000 n=50+50)
Flate                          23.6MB ± 1%       23.9MB ± 1%  +1.36%  (p=0.000 n=43+46)
GoParser                       35.0MB ± 1%       35.3MB ± 1%  +0.94%  (p=0.000 n=50+50)
Reflect                        84.7MB ± 1%       86.1MB ± 1%  +1.72%  (p=0.000 n=49+49)
Tar                            34.5MB ± 1%       34.9MB ± 1%  +1.07%  (p=0.000 n=47+48)
XML                            44.2MB ± 3%       44.6MB ± 3%  +0.70%  (p=0.003 n=50+49)
LinkCompiler                    128MB ± 0%        128MB ± 0%  +0.01%  (p=0.004 n=49+50)
ExternalLinkCompiler            120MB ± 0%        120MB ± 0%  +0.01%  (p=0.000 n=49+50)
LinkWithoutDebugCompiler       77.3MB ± 0%       77.3MB ± 0%  +0.02%  (p=0.000 n=50+50)
[Geo mean]                     69.1MB            69.6MB       +0.75%

file      before    after     Δ       %
addr2line 4049276   4051308   +2032   +0.050%
api       5248940   5248996   +56     +0.001%
asm       4868093   4868037   -56     -0.001%
buildid   2627666   2626026   -1640   -0.062%
cgo       4614432   4615040   +608    +0.013%
compile   23298888  23301267  +2379   +0.010%
cover     4591609   4591161   -448    -0.010%
dist      3449638   3450254   +616    +0.018%
doc       3925667   3926363   +696    +0.018%
fix       3322936   3323464   +528    +0.016%
link      6628632   6629560   +928    +0.014%
nm        3991753   3996497   +4744   +0.119%
objdump   4396119   4395615   -504    -0.011%
pack      2399719   2399535   -184    -0.008%
pprof     13616418  13622866  +6448   +0.047%
test2json 2646121   2646081   -40     -0.002%
trace     10233087  10226359  -6728   -0.066%
vet       7117994   7121066   +3072   +0.043%
total     111026988 111039495 +12507  +0.011%

On linux arm64:
name                      old time/op       new time/op       delta
Template                        284ms ± 1%        286ms ± 1%  +0.70%  (p=0.000 n=49+50)
Unicode                         125ms ± 3%        125ms ± 2%    ~     (p=0.548 n=50+50)
GoTypes                         1.69s ± 1%        1.71s ± 1%  +1.02%  (p=0.000 n=49+49)
Compiler                        125ms ± 1%        124ms ± 2%  -0.35%  (p=0.020 n=50+50)
SSA                             12.7s ± 1%        12.8s ± 1%  +1.21%  (p=0.000 n=49+49)
Flate                           172ms ± 1%        173ms ± 1%  +0.20%  (p=0.047 n=50+50)
GoParser                        265ms ± 1%        266ms ± 1%  +0.64%  (p=0.000 n=50+50)
Reflect                         651ms ± 1%        650ms ± 1%    ~     (p=0.079 n=48+48)
Tar                             246ms ± 1%        246ms ± 1%    ~     (p=0.202 n=50+46)
XML                             328ms ± 1%        332ms ± 1%  +1.28%  (p=0.000 n=50+49)
LinkCompiler                    600ms ± 1%        599ms ± 1%    ~     (p=0.264 n=50+50)
ExternalLinkCompiler            1.88s ± 1%        1.90s ± 0%  +1.36%  (p=0.000 n=50+50)
LinkWithoutDebugCompiler        365ms ± 1%        365ms ± 1%    ~     (p=0.602 n=50+46)
[Geo mean]                      490ms             492ms       +0.47%

name                      old alloc/op      new alloc/op      delta
Template                       38.8MB ± 1%       39.1MB ± 1%  +0.92%  (p=0.000 n=44+42)
Unicode                        28.4MB ± 0%       28.4MB ± 0%  +0.22%  (p=0.000 n=44+45)
GoTypes                         169MB ± 1%        171MB ± 1%  +1.12%  (p=0.000 n=50+50)
Compiler                       23.2MB ± 1%       23.3MB ± 1%  +0.56%  (p=0.000 n=42+43)
SSA                            1.55GB ± 0%       1.56GB ± 0%  +0.91%  (p=0.000 n=48+49)
Flate                          23.7MB ± 2%       24.0MB ± 1%  +1.20%  (p=0.000 n=50+50)
GoParser                       35.3MB ± 1%       35.6MB ± 1%  +0.88%  (p=0.000 n=50+50)
Reflect                        85.0MB ± 0%       86.5MB ± 0%  +1.70%  (p=0.000 n=49+48)
Tar                            34.5MB ± 1%       34.9MB ± 1%  +1.03%  (p=0.000 n=47+50)
XML                            43.8MB ± 2%       44.0MB ± 0%  +0.41%  (p=0.002 n=49+38)
LinkCompiler                    136MB ± 0%        136MB ± 0%  +0.01%  (p=0.006 n=50+49)
ExternalLinkCompiler            127MB ± 0%        127MB ± 0%  +0.02%  (p=0.000 n=49+50)
LinkWithoutDebugCompiler       84.1MB ± 0%       84.1MB ± 0%    ~     (p=0.534 n=50+50)
[Geo mean]                     70.4MB            70.9MB       +0.69%

file      before    after     Δ       %
addr2line 4006004   4004556   -1448   -0.036%
api       5029716   5028828   -888    -0.018%
asm       4936863   4934943   -1920   -0.039%
buildid   2594947   2594099   -848    -0.033%
cgo       4399702   4399502   -200    -0.005%
compile   22233139  22230486  -2653   -0.012%
cover     4443681   4443881   +200    +0.005%
dist      3365902   3365486   -416    -0.012%
doc       3776175   3776151   -24     -0.001%
fix       3218624   3218600   -24     -0.001%
link      6365001   6361409   -3592   -0.056%
nm        3923345   3923065   -280    -0.007%
objdump   4295473   4296673   +1200   +0.028%
pack      2390561   2389393   -1168   -0.049%
pprof     12866419  12865115  -1304   -0.010%
test2json 2587113   2585561   -1552   -0.060%
trace     9609814   9610846   +1032   +0.011%
vet       6790272   6789760   -512    -0.008%
total     106832751 106818354 -14397  -0.013%

Update: #37608

Change-Id: I2831238b12e3af5aef2261f64f804bf0a8b43f86
Reviewed-on: https://go-review.googlesource.com/c/go/+/244737
Reviewed-by: eric fang <eric.fang@arm.com>
Reviewed-by: Keith Randall <khr@golang.org>
Trust: eric fang <eric.fang@arm.com>
Run-TryBot: eric fang <eric.fang@arm.com>
TryBot-Result: Go Bot <gobot@golang.org>
@archanaravindar
Copy link
Contributor

archanaravindar commented Sep 21, 2021

The code generated for WriteByte() function now does not show unnecessary branches
go version used : devel go1.18-782aa42255 Thu Sep 2 15:14:50 2021 +0000 linux/ppc64le, tested on POWER8 and POWER9

Code for WriteByte() function now looks like

  0xc4e38               e8e60008                MOVD 8(R6),R7                        // ld r7,8(r6)
  0xc4e3c               e8a60010                MOVD 16(R6),R5                       // ld r5,16(r6)
  0xc4e40               7d072850                SUB R7,R5,R8                         // subf r8,r7,r5
  0xc4e44               2c280001                CMP R8,$1                            // cmpdi r8,1
  0xc4e48               41800018                BLT 0xc4e60                          // blt 0xc4e60
                b.buf = b.buf[:l+n]
  0xc4e4c               38870001                ADD R7,$1,R4                         // addi r4,r7,1
  0xc4e50               7c252040                CMPU R5,R4                           // cmpld r5,r4
  0xc4e54               4180005c                BLT 0xc4eb0                          // blt 0xc4eb0
  0xc4e58               f8860008                MOVD R4,8(R6)                        // std r4,8(r6)
        m, ok := b.tryGrowByReslice(1)
  0xc4e5c               4800001c                BR 0xc4e78                           // b 0xc4e78
                m = b.grow(1)
  0xc4e60               f8c10020                MOVD R6,32(R1)                       // std r6,32(r1)
  0xc4e64               38600001                MOVD $1,R3                           // li r3,1
  0xc4e68               f8610028                MOVD R3,40(R1)                       // std r3,40(r1)
  0xc4e6c               4bfff645                CALL bytes.(*Buffer).grow(SB)        // bl 0xc44b0
  0xc4e70               e8e10030                MOVD 48(R1),R7                       // ld r7,48(r1)
        b.buf[m] = c
  0xc4e74               e8c10058                MOVD 88(R1),R6                       // ld r6,88(r1)
  0xc4e78               e8860008                MOVD 8(R6),R4                        // ld r4,8(r6)
  0xc4e7c               e8a60000                MOVD 0(R6),R5                        // ld r5,0(r6)
  0xc4e80               7c272040                CMPU R7,R4                           // cmpld r7,r4
  0xc4e84               40800024                BGE 0xc4ea8                          // bge 0xc4ea8

However, there is potential opportunity of reducing branches in the math/big tests
specifically in the Decimal.round function
which is exercised by the test BenchmarkFloatString in the larger big.test

For instance,

  1. r9 is loaded with 0 at 0x17e9f0 and control jumps to 0x17ea04 that compares r9 with r0 and if its equal control branches to 0x17ea10. In essence once control reaches 0x17e9f0 its guaranteed that r9 will be loaded with zero and the condition compare at 0x17ea04 will be true and control goes to 0x17ea10
    So we can replace 0x17e9cc 40810024 BLE 0x17e9f0 // ble 0x17e9f0
    by directly ble 0x17ea10
  2. Similarly we can replace
    0x17e9f0 3920000 MOVD $0,R9 // li r9,0
    0x17e9f4 48000010 BR 0x17ea04 // b 0x17ea04
    by br 0x17ea10
  0x17e9b4              2c090035                CMPW R9,$53                          // cmpwi r9,53
  0x17e9b8              40820040                BNE 0x17e9f8                         // bne 0x17e9f8
  0x17e9bc              39460001                ADD R6,$1,R10                        // addi r10,r6,1
  0x17e9c0              7c245000                CMP R4,R10                           // cmpd r4,r10
  0x17e9c4              40820034                BNE 0x17e9f8                         // bne 0x17e9f8
        if n < 0 || n >= len(x.mant) {
  0x17e9c8              7c260000                CMP R6,R0                            // cmpd r6,r0
                return n > 0 && (x.mant[n-1]-'0')&1 != 0
  0x17e9cc              40810024                BLE 0x17e9f0                         // ble 0x17e9f0
  0x17e9d0              3926ffff                ADD R6,$-1,R9                        // addi r9,r6,-1
  0x17e9d4              7d2940ae                MOVBZ (R8)(R9),R9                    // lbzx r9,r9,r8
  0x17e9d8              3929ffd0                ADD R9,$-48,R9                       // addi r9,r9,-48
  0x17e9dc              71290001                ANDCC R9,$1,R9                       // andi. r9,r9,1
  0x17e9e0              7c090000                CMPW R9,R0                           // cmpw r9,r0
  0x17e9e4              39200001                MOVD $1,R9                           // li r9,1
  0x17e9e8              7d20489e                ISEL $2,R0,R9,R9                     // isel r9,r0,r9,eq
  0x17e9ec              48000018                BR 0x17ea04                          // b 0x17ea04
  0x17e9f0              39200000                MOVD $0,R9                           // li r9,0
  0x17e9f4              48000010                BR 0x17ea04                          // b 0x17ea04
        return x.mant[n] >= '5'
  0x17e9f8              28090035                CMPWU R9,$53                         // cmplwi r9,53
  0x17e9fc              39400001                MOVD $1,R10                          // li r10,1
  0x17ea00              7d20501e                ISEL $0,R0,R10,R9                    // isel r9,r0,r10,lt
        if shouldRoundUp(x, n) {
  0x17ea04              7c090000                CMPW R9,R0                           // cmpw r9,r0
  0x17ea08              41820008                BEQ 0x17ea10                         // beq 0x17ea10
        if n < 0 || n >= len(x.mant) {
  0x17ea0c              4800007c                BR 0x17ea88                          // b 0x17ea88
                x.roundDown(n)
  0x17ea10              7c000378                OR R0,R0,R0                          // or r0,r0,r0
        x.mant = x.mant[:n]

In general, this problem seems to occur when there are multiple conditions involved such as the one below and optimizing this will improve the code generation of branch instructions in the golang compiler, such instances where there are multiple conditions involved is seen in common practice

func shouldRoundUp(x *decimal, n int) bool {
        if x.mant[n] == '5' && n+1 == len(x.mant) {
                // exactly halfway - round to even
                return n > 0 && (x.mant[n-1]-'0')&1 != 0
        }
        // not halfway - digit tells all (x.mant has no trailing zeros)
        return x.mant[n] >= '5'
} 

@CAFxX
Copy link
Contributor

CAFxX commented Jun 24, 2022

Haven't seen this mentioned already, so just adding it here as a memo: for code like

if b == 'a' || b == 'A' {
    return 1
}
return 0

we don't fuse (on tip) the two condition checks, instead performing two checks and two conditional jumps even though the conditions themselves are pure:

        CMPB    AL, $97
        JEQ     foo_pc8
        CMPB    AL, $65
        JNE     foo_pc14
foo_pc8:
        MOVL    $1, AX
        RET
foo_pc14:
        XORL    AX, AX
        RET

gcc/clang instead (at O1 and above) will emit the more compact

        and     dil, -33
        cmp     dil, 65
        sete    al
        ret

@mundaym
Copy link
Member

mundaym commented Jul 1, 2022

@CAFxX there is an issue for fusing integer comparisons specifically: #38721. There is an old CL linked from that that demonstrates a possible way of achieving this optimization.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
None yet
Development

No branches or pull requests