New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/compile: branch elimination opportunites when comparing constants #37608
Comments
Parts of this sound like the shortcircuit pass. Is that working correctly here? |
Here is the email that I wrote to @dr2chase this morning. It looks apropos. (Note that the situation I describe in the email occurs in block b9 in the shortcircuit pass of
|
Sorry not to reply earlier.
I think case 3 is potentially a mess, it might require a lot of phi insertion depending on the shape of the graph. Case 4 depends on the source of the input flow edge; it is some block, either falling into case 1, 2, or 3. Slightly off-topic, I recall recently seeing a formulation for SSA where blocks with phis are like "functions" and their predecessors "call" them, passing parameters appropriately. This simplifies the description of case 4 above because it is converted to 1, 2, or 3, as appropriate. |
Sorry, David. I didn’t mean to imply that I had a sub-24-hour SLA on random personal emails asking for help! The idea of doing a critical edge removal pass first is an intriguing one, thanks! |
I'm working on this. |
This is proving to be a rich source of optimizations. The challenge is that every time I try a deeper analysis, it catches more cases and is also slower. Soon (I hope today) I'll start mailing some of the cheap should-definitely-do-this cases, and we can work incrementally from there. |
Change https://golang.org/cl/222917 mentions this issue: |
Change https://golang.org/cl/222918 mentions this issue: |
Change https://golang.org/cl/222919 mentions this issue: |
Change https://golang.org/cl/222923 mentions this issue: |
v is pretty generic. Subsequent changes will make this function more complicated, so rename it now, independently, for easier review. v is the control value for the block (or its underlying phi); call it ctl. Passes toolstash-check. Updates #37608 Change-Id: I3fbae3344f1c95aff0a69c1e4f61ef637a54774e Reviewed-on: https://go-review.googlesource.com/c/go/+/222917 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
shortcircuitBlock contained a loop to handle blocks like b: <- p q v = Phi true false If v -> t u in a single execution. This change makes shortcircuitBlock do it in two instead, one for each constant phi arg. Motivation: Upcoming changes will expand the range of blocks that the shortcircuit pass can handle. Those changes need to understand what the CFG will look like after the rewrite in shortcircuitBlock. Making shortcircuitBlock do only a single CFG modification at a time significantly simplifies that code. In theory, this is less efficient, but not measurably so. There is minor, unimportant churn in the generated code. Updates #37608 Change-Id: Ia6dce7011e3e19b546ed1e176bd407575a0ab837 Reviewed-on: https://go-review.googlesource.com/c/go/+/222918 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
Continue to simplify, rename for clarity, improve docs, and reduce variable scope. This is in preparation for this function becoming more complicated. Passes toolstash-check. Updates #37608 Change-Id: I630a4e07c92297c46d18aea69ec29852d6371ff0 Reviewed-on: https://go-review.googlesource.com/c/go/+/222919 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
Prior to this change, the shortcircuit pass could only handle blocks containing only a single phi control value, possibly wrapped in some OpNot and OpCopy values. This change partially lifts this limitation. It handles some cases in which the block contains other phi values. This appears to happen most commonly in cases in which the conditionals being checked involve the memory state, in which case there is a phi memory value in the block. The general idea here is to use the information we have about the CFG to (1) move the other phi values into other blocks and/or (2) rewrite uses of the other phi values in other blocks. For example, consider this CFG: p q \ / b / \ t u And consider a phi value v in block b. We'll write v = Phi(p: x, q: y) to say that v has value x corresponding to inbound block p, and value y for block q. We will rewrite this CFG to: p q | / | b |/ \ t u What should we do with v? Any uses of v in u can be replaced with y. Why? If we are in block u, we came from b, and before that from q. If prior to b we came from p, then we would have gone to t, not u. Since we came from q, we know that v took the value y. Uses of v in t are a bit more complicated. It is going to end up being a phi value: Phi(p: ?, b: ?). Suppose, after the rewrite, we came from block p. Then, before the rewrite, we would have gone to b, where v would have the value x. So we have Phi(p: x, b: ?). Suppose, after the rewrite, we came from block b. Then we must have come from block q. If we come from block q, v has value y. So we have Phi(p: x, b: y). Uses of v in t can thus be replaced with a new phi value, with the same values as v, but with altered predecessors. Similar reasoning can be employed to rewrite or replace other uses of v elsewhere in the CFG, so that v itself can be eliminated, and the CFG rewrite can proceed. This change sets up the infrastructure for such optimizations and adds a few cheap ones. All optimizations in this change depend only on the shape of the CFG; future changes may also depend on where v's uses are. That analysis is more powerful but more expensive, and should be done incrementally. The use of closures here is perhaps a bit unusual, but during development it proved critical to having readable code. We must decide early on whether we can safely do the CFG modifications, and then later fix up the phis if so. Safely storing state and decisions across these two phases is hard to do readably. Closures solve the problem neatly. I manually instrumented the code paths in shortcircuitPhiPlan. During make.bash there are nearly 6000 invocations. The least-visited code path gets run 85 times, so all the code in this CL is reasonably well-exercised. Here is a concrete example of code improved by this change: func f(e interface{}) int { if x, ok := e.(int); ok { return x } return 0 } Omitting PCDATA, FUNCDATA, and the like, it used to compile to: "".f STEXT nosplit size=50 args=0x18 locals=0x0 0x0000 00000 (x.go:4) LEAQ type.int(SB), AX 0x0007 00007 (x.go:4) MOVQ "".e+8(SP), CX 0x000c 00012 (x.go:4) CMPQ AX, CX 0x000f 00015 (x.go:4) JNE 43 0x0011 00017 (x.go:4) MOVQ "".e+16(SP), AX 0x0016 00022 (x.go:4) MOVQ (AX), AX 0x0019 00025 (x.go:4) JNE 33 0x001b 00027 (x.go:5) MOVQ AX, "".~r1+24(SP) 0x0020 00032 (x.go:5) RET 0x0021 00033 (x.go:7) MOVQ $0, "".~r1+24(SP) 0x002a 00042 (x.go:7) RET 0x002b 00043 (x.go:7) MOVL $0, AX 0x0030 00048 (x.go:4) JMP 25 Afterwards, it compiles to: "".f STEXT nosplit size=41 args=0x18 locals=0x0 0x0000 00000 (x.go:4) LEAQ type.int(SB), AX 0x0007 00007 (x.go:4) MOVQ "".e+8(SP), CX 0x000c 00012 (x.go:4) CMPQ AX, CX 0x000f 00015 (x.go:4) JNE 31 0x0011 00017 (x.go:4) MOVQ "".e+16(SP), AX 0x0016 00022 (x.go:4) MOVQ (AX), AX 0x0019 00025 (x.go:5) MOVQ AX, "".~r1+24(SP) 0x001e 00030 (x.go:5) RET 0x001f 00031 (x.go:7) MOVQ $0, "".~r1+24(SP) 0x0028 00040 (x.go:7) RET Note that there is now only a single JNE and a single RET $0 path. Updates #37608 Has a minor good effect on compilation speed and memory use. Provides widespread improvements to generated code. The rare, minor regressions I have investigated are due to register allocation fluctuations. file before after Δ % addr2line 4376080 4371984 -4096 -0.094% api 5945400 5933112 -12288 -0.207% asm 5034312 5030216 -4096 -0.081% buildid 2844952 2840856 -4096 -0.144% cgo 4812872 4804680 -8192 -0.170% compile 19622064 19610368 -11696 -0.060% cover 5236648 5232552 -4096 -0.078% dist 3658312 3654216 -4096 -0.112% doc 4653512 4649416 -4096 -0.088% fix 3370072 3365976 -4096 -0.122% link 6671864 6667768 -4096 -0.061% pprof 14781652 14761172 -20480 -0.139% trace 11639684 11627396 -12288 -0.106% vet 8252280 8231800 -20480 -0.248% total 115052984 114934792 -118192 -0.103% file before after Δ % internal/cpu.s 3298 3296 -2 -0.061% internal/bytealg.s 1730 1737 +7 +0.405% cmd/vendor/golang.org/x/mod/semver.s 7332 7283 -49 -0.668% image/color.s 8248 8156 -92 -1.115% math.s 35966 35956 -10 -0.028% math/cmplx.s 6596 6575 -21 -0.318% runtime.s 480566 480053 -513 -0.107% sync.s 16408 16385 -23 -0.140% math/rand.s 10447 10406 -41 -0.392% internal/reflectlite.s 28408 28366 -42 -0.148% errors.s 2736 2701 -35 -1.279% sort.s 17031 17036 +5 +0.029% io.s 16993 16964 -29 -0.171% container/heap.s 2006 1997 -9 -0.449% text/tabwriter.s 9570 9552 -18 -0.188% bytes.s 31823 31594 -229 -0.720% strconv.s 52760 52717 -43 -0.082% vendor/golang.org/x/text/transform.s 16713 16706 -7 -0.042% strings.s 42590 42563 -27 -0.063% bufio.s 22883 22785 -98 -0.428% encoding/base32.s 9586 9531 -55 -0.574% syscall.s 82237 82243 +6 +0.007% image.s 37465 37452 -13 -0.035% regexp/syntax.s 82827 82769 -58 -0.070% image/draw.s 18698 18584 -114 -0.610% image/jpeg.s 36560 36549 -11 -0.030% time.s 82557 82526 -31 -0.038% context.s 10863 10820 -43 -0.396% regexp.s 64114 64049 -65 -0.101% os.s 51751 51524 -227 -0.439% reflect.s 168240 168049 -191 -0.114% cmd/go/internal/lockedfile/internal/filelock.s 2317 2290 -27 -1.165% path/filepath.s 17831 17766 -65 -0.365% io/ioutil.s 6994 6990 -4 -0.057% encoding/binary.s 30791 30726 -65 -0.211% cmd/vendor/golang.org/x/sys/unix.s 78055 78033 -22 -0.028% encoding/pem.s 9280 9247 -33 -0.356% crypto/cipher.s 20376 20374 -2 -0.010% os/exec.s 29229 29140 -89 -0.304% internal/goroot.s 4588 4579 -9 -0.196% cmd/internal/browser.s 2246 2240 -6 -0.267% cmd/vendor/golang.org/x/crypto/ssh/terminal.s 27183 27149 -34 -0.125% fmt.s 76625 76484 -141 -0.184% encoding/hex.s 6154 6152 -2 -0.032% compress/lzw.s 7063 7059 -4 -0.057% database/sql/driver.s 18875 18862 -13 -0.069% debug/plan9obj.s 8268 8266 -2 -0.024% net/url.s 29724 29719 -5 -0.017% encoding/csv.s 12872 12856 -16 -0.124% debug/gosym.s 25303 25268 -35 -0.138% compress/flate.s 50952 51019 +67 +0.131% compress/zlib.s 7277 7266 -11 -0.151% archive/zip.s 42155 42111 -44 -0.104% debug/dwarf.s 107632 107541 -91 -0.085% database/sql.s 98373 98028 -345 -0.351% os/user.s 14722 14708 -14 -0.095% encoding/json.s 105836 105711 -125 -0.118% debug/macho.s 32598 32560 -38 -0.117% encoding/gob.s 136478 135755 -723 -0.530% debug/pe.s 31160 30869 -291 -0.934% debug/elf.s 63495 63302 -193 -0.304% vendor/golang.org/x/text/unicode/bidi.s 27220 27217 -3 -0.011% vendor/golang.org/x/text/secure/bidirule.s 3363 3352 -11 -0.327% go/token.s 12036 12035 -1 -0.008% flag.s 22277 22256 -21 -0.094% mime.s 39696 39509 -187 -0.471% go/scanner.s 19033 19020 -13 -0.068% archive/tar.s 70936 70581 -355 -0.500% internal/xcoff.s 22823 22820 -3 -0.013% text/scanner.s 11631 11629 -2 -0.017% encoding/xml.s 110534 110408 -126 -0.114% math/big.s 183636 183545 -91 -0.050% image/gif.s 27376 27343 -33 -0.121% crypto/dsa.s 6029 5969 -60 -0.995% image/png.s 42947 42939 -8 -0.019% crypto/rand.s 6866 6854 -12 -0.175% vendor/golang.org/x/text/unicode/norm.s 66394 66354 -40 -0.060% runtime/trace.s 2603 2521 -82 -3.150% crypto/ed25519.s 6321 6300 -21 -0.332% text/template/parse.s 93910 93844 -66 -0.070% crypto/rsa.s 31460 31369 -91 -0.289% encoding/asn1.s 57021 57023 +2 +0.004% crypto/elliptic.s 51382 51363 -19 -0.037% crypto/x509/pkix.s 10386 10342 -44 -0.424% vendor/golang.org/x/net/idna.s 24482 24466 -16 -0.065% vendor/golang.org/x/crypto/cryptobyte.s 33479 33280 -199 -0.594% crypto/ecdsa.s 11936 11883 -53 -0.444% go/constant.s 43670 42663 -1007 -2.306% go/ast.s 80383 80191 -192 -0.239% testing.s 68069 68057 -12 -0.018% runtime/pprof.s 59613 59603 -10 -0.017% testing/iotest.s 4895 4891 -4 -0.082% internal/trace.s 78136 78089 -47 -0.060% cmd/internal/goobj2.s 13158 13154 -4 -0.030% cmd/internal/src.s 17661 17657 -4 -0.023% go/parser.s 79046 78880 -166 -0.210% cmd/internal/objabi.s 16367 16343 -24 -0.147% text/template.s 94899 94486 -413 -0.435% go/printer.s 77267 76992 -275 -0.356% cmd/internal/goobj.s 25988 25947 -41 -0.158% runtime/pprof/internal/profile.s 102066 101933 -133 -0.130% go/format.s 5419 5371 -48 -0.886% cmd/vendor/golang.org/x/arch/ppc64/ppc64asm.s 37181 37149 -32 -0.086% go/doc.s 74533 74132 -401 -0.538% html/template.s 88743 88389 -354 -0.399% cmd/asm/internal/lex.s 24881 24872 -9 -0.036% cmd/internal/buildid.s 18263 18256 -7 -0.038% cmd/vendor/golang.org/x/arch/x86/x86asm.s 80036 79980 -56 -0.070% go/build.s 68905 68737 -168 -0.244% cmd/cover.s 46070 45950 -120 -0.260% cmd/internal/obj.s 117001 116991 -10 -0.009% cmd/doc.s 62700 62419 -281 -0.448% cmd/internal/obj/arm.s 66745 66687 -58 -0.087% cmd/compile/internal/syntax.s 145406 145062 -344 -0.237% cmd/internal/obj/wasm.s 44049 44027 -22 -0.050% net.s 291835 291020 -815 -0.279% cmd/dist.s 209020 208807 -213 -0.102% cmd/cgo.s 241564 241102 -462 -0.191% vendor/golang.org/x/net/http/httpproxy.s 9407 9399 -8 -0.085% log/syslog.s 7921 7909 -12 -0.151% go/types.s 319325 317513 -1812 -0.567% vendor/golang.org/x/net/http/httpguts.s 3834 3825 -9 -0.235% mime/multipart.s 21414 21343 -71 -0.332% cmd/internal/obj/ppc64.s 119949 119938 -11 -0.009% cmd/compile/internal/logopt.s 10158 10118 -40 -0.394% vendor/golang.org/x/net/nettest.s 28012 27991 -21 -0.075% go/internal/srcimporter.s 6405 6380 -25 -0.390% go/internal/gcimporter.s 34525 34493 -32 -0.093% net/mail.s 23937 23720 -217 -0.907% go/internal/gccgoimporter.s 56095 56038 -57 -0.102% cmd/compile/internal/types.s 47247 47207 -40 -0.085% cmd/api.s 39582 39558 -24 -0.061% cmd/go/internal/base.s 12572 12551 -21 -0.167% cmd/vendor/golang.org/x/xerrors.s 17846 17814 -32 -0.179% cmd/vendor/golang.org/x/mod/sumdb/note.s 18142 18070 -72 -0.397% cmd/go/internal/search.s 19994 19876 -118 -0.590% cmd/go/internal/imports.s 16457 16428 -29 -0.176% cmd/vendor/golang.org/x/mod/module.s 17838 17759 -79 -0.443% cmd/go/internal/cache.s 30551 30514 -37 -0.121% cmd/vendor/golang.org/x/mod/sumdb/tlog.s 36356 36321 -35 -0.096% cmd/internal/test2json.s 9452 9408 -44 -0.466% cmd/go/internal/mvs.s 25136 25092 -44 -0.175% cmd/go/internal/txtar.s 3488 3461 -27 -0.774% cmd/vendor/golang.org/x/mod/zip.s 18811 18800 -11 -0.058% cmd/go/internal/version.s 11213 11171 -42 -0.375% cmd/link/internal/benchmark.s 4941 4949 +8 +0.162% cmd/internal/obj/s390x.s 126865 126849 -16 -0.013% cmd/gofmt.s 30684 30596 -88 -0.287% cmd/fix.s 87450 86906 -544 -0.622% cmd/internal/obj/x86.s 88578 88556 -22 -0.025% cmd/vendor/golang.org/x/mod/modfile.s 72450 72363 -87 -0.120% cmd/oldlink/internal/loader.s 16743 16741 -2 -0.012% cmd/pack.s 14863 14861 -2 -0.013% cmd/go/internal/load.s 106742 106568 -174 -0.163% cmd/oldlink/internal/objfile.s 21787 21780 -7 -0.032% cmd/oldlink/internal/loadmacho.s 29309 29317 +8 +0.027% cmd/oldlink/internal/loadelf.s 35013 35021 +8 +0.023% cmd/asm/internal/asm.s 68550 68538 -12 -0.018% cmd/link/internal/loader.s 94765 94564 -201 -0.212% cmd/link/internal/loadelf.s 35663 35667 +4 +0.011% cmd/link/internal/loadmacho.s 29501 29509 +8 +0.027% cmd/vendor/golang.org/x/tools/go/analysis.s 4983 4976 -7 -0.140% cmd/vendor/golang.org/x/tools/go/analysis/internal/analysisflags.s 16771 16709 -62 -0.370% cmd/vendor/golang.org/x/tools/go/types/objectpath.s 18481 18456 -25 -0.135% cmd/vendor/golang.org/x/tools/go/analysis/passes/internal/analysisutil.s 2100 2085 -15 -0.714% cmd/vendor/github.com/google/pprof/profile.s 150141 149620 -521 -0.347% cmd/vendor/github.com/google/pprof/internal/measurement.s 10420 10404 -16 -0.154% cmd/vendor/golang.org/x/tools/go/analysis/passes/asmdecl.s 36814 36755 -59 -0.160% cmd/vendor/golang.org/x/tools/go/analysis/passes/bools.s 6688 6673 -15 -0.224% cmd/vendor/golang.org/x/tools/go/analysis/passes/cgocall.s 9856 9784 -72 -0.731% cmd/vendor/golang.org/x/tools/go/analysis/passes/composite.s 3011 2979 -32 -1.063% cmd/vendor/golang.org/x/tools/go/analysis/passes/copylock.s 9737 9682 -55 -0.565% cmd/vendor/golang.org/x/tools/go/cfg.s 30738 30725 -13 -0.042% cmd/vendor/github.com/ianlancetaylor/demangle.s 175195 174513 -682 -0.389% cmd/vendor/golang.org/x/tools/go/analysis/passes/httpresponse.s 3625 3520 -105 -2.897% cmd/vendor/golang.org/x/tools/go/analysis/passes/loopclosure.s 2987 2971 -16 -0.536% cmd/vendor/golang.org/x/tools/go/analysis/passes/shift.s 4372 4340 -32 -0.732% cmd/vendor/golang.org/x/tools/go/analysis/passes/stdmethods.s 8634 8611 -23 -0.266% cmd/vendor/golang.org/x/tools/go/analysis/passes/tests.s 6189 6164 -25 -0.404% cmd/vendor/golang.org/x/tools/go/analysis/passes/structtag.s 8089 8073 -16 -0.198% cmd/vendor/golang.org/x/tools/go/analysis/passes/unsafeptr.s 2208 2177 -31 -1.404% cmd/vendor/golang.org/x/tools/go/analysis/passes/unreachable.s 8050 8047 -3 -0.037% cmd/vendor/golang.org/x/tools/go/analysis/passes/unusedresult.s 3665 3629 -36 -0.982% cmd/vendor/golang.org/x/tools/go/ast/astutil.s 65773 65680 -93 -0.141% cmd/vendor/golang.org/x/tools/go/analysis/unitchecker.s 13328 13286 -42 -0.315% cmd/vendor/golang.org/x/tools/go/types/typeutil.s 12263 12162 -101 -0.824% cmd/vendor/golang.org/x/tools/go/analysis/passes/errorsas.s 1459 1421 -38 -2.605% cmd/vendor/golang.org/x/tools/go/analysis/passes/ctrlflow.s 5208 5191 -17 -0.326% cmd/vendor/golang.org/x/tools/go/analysis/passes/unmarshal.s 1801 1782 -19 -1.055% cmd/vendor/golang.org/x/tools/go/analysis/passes/lostcancel.s 9569 9528 -41 -0.428% cmd/go/internal/work.s 304928 304756 -172 -0.056% crypto/x509.s 147340 147139 -201 -0.136% cmd/vendor/golang.org/x/tools/go/analysis/passes/printf.s 34287 34019 -268 -0.782% crypto/tls.s 311603 310644 -959 -0.308% cmd/oldlink/internal/ld.s 533115 532651 -464 -0.087% cmd/oldlink/internal/wasm.s 16484 16458 -26 -0.158% cmd/oldlink/internal/x86.s 18832 18830 -2 -0.011% cmd/link/internal/ld.s 548200 547626 -574 -0.105% cmd/link/internal/wasm.s 16760 16734 -26 -0.155% cmd/link/internal/arm64.s 20850 20840 -10 -0.048% cmd/link/internal/x86.s 17437 17435 -2 -0.011% net/http.s 556647 555519 -1128 -0.203% net/http/cookiejar.s 15849 15833 -16 -0.101% expvar.s 9521 9508 -13 -0.137% net/http/httptest.s 16471 16452 -19 -0.115% cmd/vendor/github.com/google/pprof/internal/plugin.s 4266 4264 -2 -0.047% net/http/cgi.s 23448 23428 -20 -0.085% cmd/go/internal/web.s 16472 16428 -44 -0.267% net/http/httputil.s 39672 39670 -2 -0.005% net/rpc.s 33989 33965 -24 -0.071% net/http/fcgi.s 19167 19162 -5 -0.026% cmd/vendor/github.com/google/pprof/internal/symbolz.s 5861 5857 -4 -0.068% cmd/vendor/github.com/google/pprof/internal/binutils.s 35842 35823 -19 -0.053% cmd/vendor/github.com/google/pprof/internal/symbolizer.s 11449 11404 -45 -0.393% cmd/go/internal/get.s 62726 62582 -144 -0.230% cmd/vendor/github.com/google/pprof/internal/report.s 80032 80022 -10 -0.012% cmd/go/internal/modfetch/codehost.s 89005 88871 -134 -0.151% cmd/trace.s 116607 116496 -111 -0.095% cmd/vendor/github.com/google/pprof/internal/driver.s 143234 143207 -27 -0.019% cmd/vendor/github.com/google/pprof/driver.s 9000 8998 -2 -0.022% cmd/go/internal/modfetch.s 126300 125726 -574 -0.454% cmd/pprof.s 12317 12312 -5 -0.041% cmd/go/internal/modconv.s 17878 17861 -17 -0.095% cmd/go/internal/modload.s 150261 149763 -498 -0.331% cmd/go/internal/clean.s 11122 11091 -31 -0.279% cmd/go/internal/help.s 6523 6521 -2 -0.031% cmd/go/internal/generate.s 11627 11614 -13 -0.112% cmd/go/internal/envcmd.s 22034 21986 -48 -0.218% cmd/go/internal/modget.s 38478 38398 -80 -0.208% cmd/go/internal/modcmd.s 46430 46229 -201 -0.433% cmd/go/internal/test.s 64399 64374 -25 -0.039% cmd/compile/internal/ssa.s 3615264 3608276 -6988 -0.193% cmd/compile/internal/gc.s 1538865 1537625 -1240 -0.081% cmd/compile/internal/amd64.s 33593 33574 -19 -0.057% cmd/compile/internal/x86.s 30871 30852 -19 -0.062% total 19343565 19311284 -32281 -0.167% Change-Id: Ib030eb79458827a5a5b6d0d2f98765f8325a4d7e Reviewed-on: https://go-review.googlesource.com/c/go/+/222923 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
Since the 1.15 window has passed, I'm going to jot down a few more related things and test cases I've noticed, just to have them written down somewhere. (1) The shortcircuit pass can be strengthened considerably, at some compiler performance risk, by figuring out the locations of all the uses of the non-control phis. I have CLs locally that play with this. (2) The phiopt pass could be strengthened. For example, in runtime.atoi, we find code like this: neg := false
if s[0] == '-' {
neg = true
s = s[1:]
} This should be ripe for the phiopt pass to convert into code like: neg := s[0] == '-'
if neg {
s = s[1]
} But it doesn't because of the second condition for _, b := range f.Blocks {
for _, v := range b.Values {
if v.Op != OpPhi {
continue
}
all := true
for _, a := range v.Args {
if a.Op != OpConstBool {
all = false
break
}
}
if !all {
continue
}
fmt.Println(f.Name, b, v)
}
} (3) The shortcircuit pass currently looks for situations in which the phi control value of a block is known for a particular inbound edge because it is a constant along that edge. But there is a structurally similar scenario that can occur in which a control value of a block is known for a particular inbound edge because its predecessor branched on that exact same control value. For example, if !neg && un > uint(maxInt) {
return 0, false
}
if neg && un > uint(maxInt)+1 {
return 0, false
} After CSE, the first thing we do is branch on if !neg {
if un > uint(maxInt) {
return 0, false
}
} else {
if un > uint(maxInt)+1 {
return 0, false
}
} Prove can catch some repeated block controls, but not conditional ones; that's where a modified short circuit pass would help. (4) We should consider making phiopt part of the fuse pass loops, or at least after prove. cmd/compile/internal/types.(*Type).IsInteger is a good test case. It is: func (t *Type) IsInteger() bool {
switch t.Etype {
case TINT8, TUINT8, TINT16, TUINT16, TINT32, TUINT32, TINT64, TUINT64, TINT, TUINT, TUINTPTR:
return true
}
return false
} This switch case should be optimized into a single unsigned comparison. But it's too complicated for phiopt. Prove simplifies it, but it's too late for phiopt. Fuse can't handle it. We end up with pointless branching. We might want to make phiopt a fuse function, like we recently did for shortcircuit, and run it after prove. (5) Running shortcircuit again later helps. The trick is doing so cheaply enough without having to run a deadcode afterwards. I tried this in CL 229058 and failed. The failure is very interesting, and are the effects of doing that CL piecemeal. I still think it's a good idea, but it needs some thought and investigate before trying again. That's a bunch of stuff. But it all falls under the umbrella of "look at the CFG and do less branching". Help with any/all of these is of course welcome. cc @mundaym |
A long time ago, I found that compared with gcc, the generation instruction of function math.IsInf is very cumbersome and there are many jumps, see https://godbolt.org/z/n8wWyG and https://godbolt.org/z/GcWh9S. There doesn't seem to be no improvement at the moment. I look forward to making it as concise as the instructions generated by gcc. I am interested in this case. But I won't be free until next quarter. |
Hi, I'm doing the third one. |
If there are no other volunteers, I will look into the second one. Thank you. 😊 |
Change https://golang.org/cl/244737 mentions this issue: |
Change https://golang.org/cl/252937 mentions this issue: |
Change https://golang.org/cl/257981 mentions this issue: |
This comment is an explanation of CL https://go-review.googlesource.com/c/go/+/244737 @randall77 After shortcircuit, the graph becomes as follow, then CL_244737 can play a role. |
The current phiopt pass just transforms the following code x := false if b { x = true} into x = b But we find code in runtime.atoi like this: neg := false if s[0] == '-' { neg = true s = s[1:] } The current phiopt pass does not covert it into code like: neg := s[0] == '-' if neg { s = s[1:] } Therefore, this patch strengthens the phiopt pass so that the boolean Phi value "neg" can be replaced with a copy of control value "s[0] == '-'", thereby using "cmp+cset" instead of a branch. But in some cases even replacing the boolean Phis cannot eliminate this branch. In the following case, this patch replaces "d" with a copy of "a<0", but the regalloc pass will insert the "Load {c}" value into an empty block to split the live ranges, which causes the branch to not be eliminated. For example: func test(a, b, c int) (bool, int) { d := false if (a<0) { if (b<0) { c = c+1 } d = true } return d, c } The optimized assembly code: MOVD "".a(FP), R0 TBZ $63, R0, 48 MOVD "".c+16(FP), R1 ADD $1, R1, R2 MOVD "".b+8(FP), R3 CMP ZR, R3 CSEL LT, R2, R1, R1 CMP ZR, R0 CSET LT, R0 MOVB R0, "".~r3+24(FP) MOVD R1, "".~r4+32(FP) RET (R30) MOVD "".c+16(FP), R1 JMP 28 The benchmark: name old time/op new time/op delta pkg:cmd/compile/internal/ssa goos:linux goarch:arm64 PhioptPass 117783.250000ns +- 1% 117219.111111ns +- 1% ~ (p=0.074 n=8+9) Statistical data from compilecmp tool: compilecmp local/master -> HEAD local/master (a826f7d): debug/dwarf: support DW_FORM_rnglistx aka formRnglistx HEAD (e57e003c10): cmd/compile/internal/ssa: strengthen phiopt pass benchstat -geomean /tmp/2516644532 /tmp/1075915815 completed 50 of 50, estimated time remaining 0s (ETA 7:10PM) name old time/op new time/op delta Template 554ms _ 3% 553ms _ 3% ~ (p=0.986 n=49+48) Unicode 252ms _ 4% 249ms _ 4% -1.33% (p=0.002 n=47+49) GoTypes 3.16s _ 3% 3.18s _ 3% +0.77% (p=0.022 n=44+48) Compiler 257ms _ 4% 258ms _ 4% ~ (p=0.121 n=50+49) SSA 24.2s _ 4% 24.2s _ 5% ~ (p=0.694 n=49+50) Flate 338ms _ 4% 338ms _ 4% ~ (p=0.592 n=43+46) GoParser 506ms _ 3% 507ms _ 3% ~ (p=0.942 n=49+50) Reflect 1.37s _ 4% 1.37s _ 5% ~ (p=0.408 n=50+50) Tar 486ms _ 3% 487ms _ 4% ~ (p=0.911 n=47+50) XML 619ms _ 2% 619ms _ 3% ~ (p=0.368 n=46+48) LinkCompiler 1.29s _31% 1.32s _23% ~ (p=0.306 n=49+44) ExternalLinkCompiler 3.39s _10% 3.36s _ 6% ~ (p=0.311 n=48+46) LinkWithoutDebugCompiler 846ms _37% 793ms _24% -6.29% (p=0.040 n=50+49) [Geo mean] 974ms 971ms -0.36% name old user-time/op new user-time/op delta Template 910ms _12% 893ms _13% ~ (p=0.098 n=49+49) Unicode 495ms _28% 492ms _18% ~ (p=0.562 n=50+46) GoTypes 4.42s _15% 4.39s _13% ~ (p=0.684 n=49+50) Compiler 419ms _22% 422ms _16% ~ (p=0.579 n=48+50) SSA 36.5s _ 7% 36.6s _ 8% ~ (p=0.465 n=50+47) Flate 521ms _21% 523ms _16% ~ (p=0.889 n=50+47) GoParser 810ms _12% 792ms _15% ~ (p=0.149 n=50+50) Reflect 1.98s _13% 2.02s _13% ~ (p=0.144 n=47+50) Tar 826ms _15% 806ms _19% ~ (p=0.115 n=49+49) XML 988ms _14% 1003ms _14% ~ (p=0.179 n=50+50) LinkCompiler 1.79s _ 8% 1.84s _11% +2.81% (p=0.001 n=49+49) ExternalLinkCompiler 3.69s _ 4% 3.71s _ 3% ~ (p=0.261 n=50+50) LinkWithoutDebugCompiler 838ms _10% 827ms _11% ~ (p=0.323 n=50+48) [Geo mean] 1.44s 1.44s -0.05% name old alloc/op new alloc/op delta Template 39.0MB _ 1% 39.0MB _ 1% ~ (p=0.445 n=50+49) Unicode 28.5MB _ 0% 28.5MB _ 0% ~ (p=0.460 n=50+50) GoTypes 169MB _ 1% 169MB _ 1% ~ (p=0.092 n=48+50) Compiler 23.4MB _ 1% 23.4MB _ 1% -0.19% (p=0.032 n=50+49) SSA 1.54GB _ 0% 1.55GB _ 1% +0.14% (p=0.001 n=50+50) Flate 23.8MB _ 1% 23.8MB _ 2% ~ (p=0.702 n=49+49) GoParser 35.4MB _ 1% 35.4MB _ 1% ~ (p=0.786 n=50+50) Reflect 85.3MB _ 1% 85.3MB _ 1% ~ (p=0.298 n=50+50) Tar 34.6MB _ 2% 34.6MB _ 2% ~ (p=0.683 n=50+50) XML 44.5MB _ 3% 44.0MB _ 2% -1.05% (p=0.000 n=50+46) LinkCompiler 136MB _ 0% 136MB _ 0% +0.01% (p=0.005 n=50+50) ExternalLinkCompiler 128MB _ 0% 128MB _ 0% ~ (p=0.179 n=50+50) LinkWithoutDebugCompiler 84.3MB _ 0% 84.3MB _ 0% +0.01% (p=0.006 n=50+50) [Geo mean] 70.7MB 70.6MB -0.07% name old allocs/op new allocs/op delta Template 410k _ 0% 410k _ 0% ~ (p=0.606 n=48+49) Unicode 310k _ 0% 310k _ 0% ~ (p=0.674 n=50+50) GoTypes 1.81M _ 0% 1.81M _ 0% ~ (p=0.674 n=50+50) Compiler 202k _ 0% 202k _ 0% +0.02% (p=0.046 n=50+50) SSA 16.3M _ 0% 16.3M _ 0% +0.10% (p=0.000 n=50+50) Flate 244k _ 0% 244k _ 0% ~ (p=0.834 n=49+50) GoParser 380k _ 0% 380k _ 0% ~ (p=0.410 n=50+50) Reflect 1.08M _ 0% 1.08M _ 0% ~ (p=0.782 n=48+50) Tar 368k _ 0% 368k _ 0% ~ (p=0.585 n=50+49) XML 453k _ 0% 453k _ 0% -0.01% (p=0.025 n=49+49) LinkCompiler 713k _ 0% 713k _ 0% +0.01% (p=0.044 n=50+50) ExternalLinkCompiler 794k _ 0% 794k _ 0% +0.01% (p=0.000 n=50+49) LinkWithoutDebugCompiler 251k _ 0% 251k _ 0% ~ (p=0.092 n=47+50) [Geo mean] 615k 615k +0.01% name old maxRSS/op new maxRSS/op delta Template 37.0M _ 4% 37.2M _ 3% ~ (p=0.062 n=48+48) Unicode 36.9M _ 5% 37.3M _ 4% +1.10% (p=0.021 n=50+47) GoTypes 94.3M _ 3% 94.9M _ 4% +0.69% (p=0.022 n=45+46) Compiler 33.4M _ 3% 33.4M _ 5% ~ (p=0.964 n=49+50) SSA 741M _ 3% 738M _ 3% ~ (p=0.164 n=50+50) Flate 28.5M _ 6% 28.8M _ 4% +1.07% (p=0.009 n=50+49) GoParser 35.0M _ 3% 35.3M _ 4% +0.83% (p=0.010 n=50+48) Reflect 57.2M _ 6% 57.1M _ 4% ~ (p=0.815 n=50+49) Tar 34.9M _ 3% 35.0M _ 3% ~ (p=0.134 n=49+48) XML 39.5M _ 5% 40.0M _ 3% +1.35% (p=0.001 n=50+48) LinkCompiler 220M _ 2% 220M _ 2% ~ (p=0.547 n=49+48) ExternalLinkCompiler 235M _ 2% 236M _ 2% ~ (p=0.538 n=47+44) LinkWithoutDebugCompiler 179M _ 1% 179M _ 1% ~ (p=0.775 n=50+50) [Geo mean] 74.9M 75.2M +0.43% name old text-bytes new text-bytes delta HelloSize 784kB _ 0% 784kB _ 0% +0.01% (p=0.000 n=50+50) name old data-bytes new data-bytes delta HelloSize 13.1kB _ 0% 13.1kB _ 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 206kB _ 0% 206kB _ 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.28MB _ 0% 1.28MB _ 0% +0.00% (p=0.000 n=50+50) file before after _ % addr2line 4006300 4004484 -1816 -0.045% api 5029956 5029324 -632 -0.013% asm 4936311 4939423 +3112 +0.063% buildid 2595059 2595291 +232 +0.009% cgo 4401029 4397333 -3696 -0.084% compile 22246677 22246863 +186 +0.001% cover 4443825 4443065 -760 -0.017% dist 3366078 3365838 -240 -0.007% doc 3776391 3776615 +224 +0.006% fix 3218800 3218648 -152 -0.005% link 6365321 6365345 +24 +0.000% nm 3923625 3923857 +232 +0.006% objdump 4295569 4295041 -528 -0.012% pack 2390745 2389217 -1528 -0.064% pprof 12870094 12866942 -3152 -0.024% test2json 2587265 2587073 -192 -0.007% trace 9612629 9613981 +1352 +0.014% vet 6791008 6792072 +1064 +0.016% total 106856682 106850412 -6270 -0.006% Update #37608 Change-Id: Ic6206b22fd1faf570be9fd3c2511aa6c4ce38cdb Reviewed-on: https://go-review.googlesource.com/c/go/+/252937 Trust: fannie zhang <Fannie.Zhang@arm.com> Run-TryBot: fannie zhang <Fannie.Zhang@arm.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
When the control value of a If block is known for a particular inbound edge because its value can be inferred from the control value of its predecessor, then this inbound edge can be redirected to the known successor directly, This CL optimizes this kind of cases to eliminate unnecessary comparision. For example, the following piece of code comes from runtime.atoi, if !neg && un > uint(maxInt) { return 0, false } if neg && un > uint(maxInt)+1 { return 0, false } Before this optimization, if the first "if" statement does not return, both conditions of the second "if" statement will be checked. But obviously the value of neg is known through the first "if" statement, and there is no need to check neg repeatedly. After this optimization, this redundancy check is eliminated, and the execution logic becomes as follows. if !neg { if un > uint(maxInt) { return 0, false } } else { if un > uint(maxInt)+1 { return 0, false } } This CL does not bring significant performance changes, but it makes the code structure look more reasonable. Statistical data from tool compilecmp on Linux/amd64: name old time/op new time/op delta Template 380ms ± 4% 385ms ± 3% +1.16% (p=0.000 n=50+49) Unicode 168ms ± 9% 169ms ± 9% ~ (p=0.421 n=49+46) GoTypes 1.99s ± 4% 2.02s ± 4% +1.48% (p=0.000 n=49+49) Compiler 188ms ± 8% 188ms ± 9% ~ (p=0.997 n=49+50) SSA 11.8s ± 2% 12.0s ± 2% +1.24% (p=0.000 n=48+50) Flate 242ms ± 6% 244ms ± 9% ~ (p=0.307 n=46+49) GoParser 361ms ± 3% 366ms ± 4% +1.23% (p=0.000 n=48+49) Reflect 836ms ± 3% 842ms ± 3% +0.70% (p=0.004 n=48+48) Tar 335ms ± 3% 340ms ± 4% +1.47% (p=0.000 n=49+46) XML 432ms ± 4% 437ms ± 4% +1.11% (p=0.002 n=49+49) LinkCompiler 701ms ± 4% 704ms ± 5% ~ (p=0.278 n=49+50) ExternalLinkCompiler 1.83s ± 3% 1.84s ± 3% +0.51% (p=0.034 n=48+49) LinkWithoutDebugCompiler 436ms ± 6% 438ms ± 6% ~ (p=0.419 n=48+49) [Geo mean] 612ms 617ms +0.84% name old alloc/op new alloc/op delta Template 38.7MB ± 1% 39.1MB ± 1% +1.19% (p=0.000 n=50+50) Unicode 28.1MB ± 0% 28.1MB ± 0% +0.20% (p=0.000 n=49+45) GoTypes 168MB ± 1% 170MB ± 1% +1.05% (p=0.000 n=48+49) Compiler 23.0MB ± 1% 23.1MB ± 1% +0.63% (p=0.000 n=50+50) SSA 1.54GB ± 1% 1.55GB ± 1% +0.85% (p=0.000 n=50+50) Flate 23.6MB ± 1% 23.9MB ± 1% +1.36% (p=0.000 n=43+46) GoParser 35.0MB ± 1% 35.3MB ± 1% +0.94% (p=0.000 n=50+50) Reflect 84.7MB ± 1% 86.1MB ± 1% +1.72% (p=0.000 n=49+49) Tar 34.5MB ± 1% 34.9MB ± 1% +1.07% (p=0.000 n=47+48) XML 44.2MB ± 3% 44.6MB ± 3% +0.70% (p=0.003 n=50+49) LinkCompiler 128MB ± 0% 128MB ± 0% +0.01% (p=0.004 n=49+50) ExternalLinkCompiler 120MB ± 0% 120MB ± 0% +0.01% (p=0.000 n=49+50) LinkWithoutDebugCompiler 77.3MB ± 0% 77.3MB ± 0% +0.02% (p=0.000 n=50+50) [Geo mean] 69.1MB 69.6MB +0.75% file before after Δ % addr2line 4049276 4051308 +2032 +0.050% api 5248940 5248996 +56 +0.001% asm 4868093 4868037 -56 -0.001% buildid 2627666 2626026 -1640 -0.062% cgo 4614432 4615040 +608 +0.013% compile 23298888 23301267 +2379 +0.010% cover 4591609 4591161 -448 -0.010% dist 3449638 3450254 +616 +0.018% doc 3925667 3926363 +696 +0.018% fix 3322936 3323464 +528 +0.016% link 6628632 6629560 +928 +0.014% nm 3991753 3996497 +4744 +0.119% objdump 4396119 4395615 -504 -0.011% pack 2399719 2399535 -184 -0.008% pprof 13616418 13622866 +6448 +0.047% test2json 2646121 2646081 -40 -0.002% trace 10233087 10226359 -6728 -0.066% vet 7117994 7121066 +3072 +0.043% total 111026988 111039495 +12507 +0.011% On linux arm64: name old time/op new time/op delta Template 284ms ± 1% 286ms ± 1% +0.70% (p=0.000 n=49+50) Unicode 125ms ± 3% 125ms ± 2% ~ (p=0.548 n=50+50) GoTypes 1.69s ± 1% 1.71s ± 1% +1.02% (p=0.000 n=49+49) Compiler 125ms ± 1% 124ms ± 2% -0.35% (p=0.020 n=50+50) SSA 12.7s ± 1% 12.8s ± 1% +1.21% (p=0.000 n=49+49) Flate 172ms ± 1% 173ms ± 1% +0.20% (p=0.047 n=50+50) GoParser 265ms ± 1% 266ms ± 1% +0.64% (p=0.000 n=50+50) Reflect 651ms ± 1% 650ms ± 1% ~ (p=0.079 n=48+48) Tar 246ms ± 1% 246ms ± 1% ~ (p=0.202 n=50+46) XML 328ms ± 1% 332ms ± 1% +1.28% (p=0.000 n=50+49) LinkCompiler 600ms ± 1% 599ms ± 1% ~ (p=0.264 n=50+50) ExternalLinkCompiler 1.88s ± 1% 1.90s ± 0% +1.36% (p=0.000 n=50+50) LinkWithoutDebugCompiler 365ms ± 1% 365ms ± 1% ~ (p=0.602 n=50+46) [Geo mean] 490ms 492ms +0.47% name old alloc/op new alloc/op delta Template 38.8MB ± 1% 39.1MB ± 1% +0.92% (p=0.000 n=44+42) Unicode 28.4MB ± 0% 28.4MB ± 0% +0.22% (p=0.000 n=44+45) GoTypes 169MB ± 1% 171MB ± 1% +1.12% (p=0.000 n=50+50) Compiler 23.2MB ± 1% 23.3MB ± 1% +0.56% (p=0.000 n=42+43) SSA 1.55GB ± 0% 1.56GB ± 0% +0.91% (p=0.000 n=48+49) Flate 23.7MB ± 2% 24.0MB ± 1% +1.20% (p=0.000 n=50+50) GoParser 35.3MB ± 1% 35.6MB ± 1% +0.88% (p=0.000 n=50+50) Reflect 85.0MB ± 0% 86.5MB ± 0% +1.70% (p=0.000 n=49+48) Tar 34.5MB ± 1% 34.9MB ± 1% +1.03% (p=0.000 n=47+50) XML 43.8MB ± 2% 44.0MB ± 0% +0.41% (p=0.002 n=49+38) LinkCompiler 136MB ± 0% 136MB ± 0% +0.01% (p=0.006 n=50+49) ExternalLinkCompiler 127MB ± 0% 127MB ± 0% +0.02% (p=0.000 n=49+50) LinkWithoutDebugCompiler 84.1MB ± 0% 84.1MB ± 0% ~ (p=0.534 n=50+50) [Geo mean] 70.4MB 70.9MB +0.69% file before after Δ % addr2line 4006004 4004556 -1448 -0.036% api 5029716 5028828 -888 -0.018% asm 4936863 4934943 -1920 -0.039% buildid 2594947 2594099 -848 -0.033% cgo 4399702 4399502 -200 -0.005% compile 22233139 22230486 -2653 -0.012% cover 4443681 4443881 +200 +0.005% dist 3365902 3365486 -416 -0.012% doc 3776175 3776151 -24 -0.001% fix 3218624 3218600 -24 -0.001% link 6365001 6361409 -3592 -0.056% nm 3923345 3923065 -280 -0.007% objdump 4295473 4296673 +1200 +0.028% pack 2390561 2389393 -1168 -0.049% pprof 12866419 12865115 -1304 -0.010% test2json 2587113 2585561 -1552 -0.060% trace 9609814 9610846 +1032 +0.011% vet 6790272 6789760 -512 -0.008% total 106832751 106818354 -14397 -0.013% Update: #37608 Change-Id: I2831238b12e3af5aef2261f64f804bf0a8b43f86 Reviewed-on: https://go-review.googlesource.com/c/go/+/244737 Reviewed-by: eric fang <eric.fang@arm.com> Reviewed-by: Keith Randall <khr@golang.org> Trust: eric fang <eric.fang@arm.com> Run-TryBot: eric fang <eric.fang@arm.com> TryBot-Result: Go Bot <gobot@golang.org>
The code generated for WriteByte() function now does not show unnecessary branches Code for WriteByte() function now looks like
However, there is potential opportunity of reducing branches in the math/big tests For instance,
In general, this problem seems to occur when there are multiple conditions involved such as the one below and optimizing this will improve the code generation of branch instructions in the golang compiler, such instances where there are multiple conditions involved is seen in common practice
|
Haven't seen this mentioned already, so just adding it here as a memo: for code like
we don't fuse (on tip) the two condition checks, instead performing two checks and two conditional jumps even though the conditions themselves are pure:
gcc/clang instead (at O1 and above) will emit the more compact
|
What version of Go are you using (
go version
)?Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (
go env
)?go env
OutputWhat did you do?
Inspecting code for other reasons, found cases where there are constant compares and subsequent branches against those constants, resulting in several unnecessary branches.
What did you expect to see?
No unnecessary branches.
What did you see instead?
Unnecessary branching due to constant compares of known values.
This example comes from the ssa output of bytes.(Buffer).WriteByte. Here are some snippets:
v18 00011 (+107) MOVD 8(R6), R7
v38 00012 (107) MOVD 16(R6), R5
v26 00013 (107) SUB R7, R5, R8
v70 00014 (107) CMP R8, $1
// This could be BLT 36. Block starting at 46 sets R3 to 0 then branches to a CMP R3, $0.
b1 00015 (107) BLT 46 <--- Change to BLT 36
v54 00019 (108) MOVD R4, 8(R6)
// Once the block at 46 is eliminated as a predecessor of this block, then the next 3 instructions are not needed, since the CMPW has only a single predecessor and it is comparing 0 against 1 and the BEQ will never happen.
v45 00020 (108) MOVD $1, R3 <---- Remove
v52 00021 (+265) CMPW R3, $0 <---- Remove
b9 00022 (+266) BEQ 36 <---- Remove
v24 00023 (+269) PCDATA $1, $1
v24 00024 (+269) MOVD 8(R6), R4
v34 00025 (+269) PCDATA $0, $2
v34 00026 (269) MOVD (R6), R5
v88 00027 (269) CMPU R4, R7
v63 00036 (267) PCDATA $1, $0
v63 00037 (+267) MOVD R6, 32(R1)
v81 00038 (267) MOVD $1, R3
v65 00039 (267) MOVD R3, 40(R1)
v66 00040 (+267) CALL "".(*Buffer).grow(SB)
v68 00041 (267) MOVD 48(R1), R7
v36 00042 (269) PCDATA $0, $1
v36 00043 (269) PCDATA $1, $1
v36 00044 (269) MOVD "".b(R1), R6
b11 00045 (267) JMP 23
// The MOVD $0, R3 and JMP 21 could be replaced by JMP 36 since the successor is CMP $0, R3 followed by BEQ 36. Once it is decided that 36 is the successor, I think the MOVD $0, R7 could also go away since the block starting at 36 doesn't use it. That also means the predecessor of this block could branch directly to 36.
v21 00046 (267) PCDATA $1, $0 <--- Remove
v21 00047 (267) MOVD $0, R7 <--- Remove
v25 00048 (267) MOVD $0, R3 <--- Remove
b2 00049 (+265) JMP 21 <---- Remove
This scenario is common in stdlib functions. It looks like the same situation occurs in x86, so it is not specific to ppc64/ppc64le.
@josharian I think I reported this problem last fall and you commented on it, but I can't find the issue where that was done.
The text was updated successfully, but these errors were encountered: