-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/compile: performance regression due to mid-stack inlining changes #19386
Comments
Some background, up to 1544217 performance of tip was doing quite well compared to 1.8
|
My pending CLs involve updating |
Great. I ran compiler benchmarks commit by commit. (I haven't checked linker impact yet. It'd be good to do that too.) There appear to be two significant commits.
Also of possible concern is 9fd359a, not because the direct numbers look bad (see below) but because import/export is a known bottleneck for large build trees with large packages in them. @davecheney, can you double check that commit with your favorite benchmarks?
|
CL https://golang.org/cl/37766 mentions this issue. |
CL 37234 introduced string concatenation into some hot code. This CL does that work earlier and caches the result. Updates #19386 Performance impact vs master: name old time/op new time/op delta Template 223ms ± 5% 216ms ± 5% -2.98% (p=0.001 n=20+20) Unicode 98.7ms ± 4% 99.0ms ± 4% ~ (p=0.749 n=20+19) GoTypes 631ms ± 4% 626ms ± 4% ~ (p=0.253 n=20+20) Compiler 2.91s ± 1% 2.87s ± 3% -1.11% (p=0.005 n=18+20) SSA 4.48s ± 2% 4.36s ± 2% -2.77% (p=0.000 n=20+20) Flate 130ms ± 2% 129ms ± 6% ~ (p=0.428 n=19+20) GoParser 160ms ± 4% 157ms ± 3% -1.62% (p=0.005 n=20+18) Reflect 395ms ± 2% 394ms ± 4% ~ (p=0.445 n=20+20) Tar 120ms ± 5% 118ms ± 6% ~ (p=0.101 n=19+20) XML 224ms ± 3% 223ms ± 3% ~ (p=0.544 n=19+19) name old user-ns/op new user-ns/op delta Template 291user-ms ± 5% 265user-ms ± 5% -9.02% (p=0.000 n=20+19) Unicode 140user-ms ± 3% 139user-ms ± 8% ~ (p=0.904 n=20+20) GoTypes 844user-ms ± 3% 849user-ms ± 3% ~ (p=0.251 n=20+18) Compiler 4.06user-s ± 5% 3.98user-s ± 2% ~ (p=0.056 n=20+20) SSA 6.89user-s ± 5% 6.50user-s ± 3% -5.61% (p=0.000 n=20+20) Flate 164user-ms ± 5% 163user-ms ± 4% ~ (p=0.365 n=20+19) GoParser 206user-ms ± 6% 204user-ms ± 4% ~ (p=0.534 n=20+18) Reflect 501user-ms ± 4% 505user-ms ± 5% ~ (p=0.383 n=20+20) Tar 151user-ms ± 3% 152user-ms ± 7% ~ (p=0.798 n=17+20) XML 283user-ms ± 7% 280user-ms ± 5% ~ (p=0.301 n=20+20) name old alloc/op new alloc/op delta Template 42.5MB ± 0% 40.2MB ± 0% -5.59% (p=0.000 n=20+20) Unicode 31.7MB ± 0% 31.0MB ± 0% -2.19% (p=0.000 n=20+18) GoTypes 124MB ± 0% 117MB ± 0% -5.90% (p=0.000 n=20+20) Compiler 533MB ± 0% 490MB ± 0% -8.07% (p=0.000 n=20+20) SSA 989MB ± 0% 893MB ± 0% -9.74% (p=0.000 n=20+20) Flate 27.8MB ± 0% 26.1MB ± 0% -5.92% (p=0.000 n=20+20) GoParser 34.3MB ± 0% 32.1MB ± 0% -6.43% (p=0.000 n=19+20) Reflect 84.6MB ± 0% 81.4MB ± 0% -3.84% (p=0.000 n=20+20) Tar 28.8MB ± 0% 27.7MB ± 0% -3.89% (p=0.000 n=20+20) XML 47.2MB ± 0% 44.2MB ± 0% -6.45% (p=0.000 n=20+19) name old allocs/op new allocs/op delta Template 420k ± 1% 381k ± 1% -9.35% (p=0.000 n=20+20) Unicode 338k ± 1% 324k ± 1% -4.29% (p=0.000 n=20+19) GoTypes 1.28M ± 0% 1.15M ± 0% -10.30% (p=0.000 n=20+20) Compiler 5.06M ± 0% 4.41M ± 0% -12.92% (p=0.000 n=20+20) SSA 9.14M ± 0% 7.91M ± 0% -13.46% (p=0.000 n=19+20) Flate 267k ± 0% 241k ± 1% -9.53% (p=0.000 n=20+20) GoParser 347k ± 1% 312k ± 0% -10.15% (p=0.000 n=19+20) Reflect 1.07M ± 0% 1.00M ± 0% -6.86% (p=0.000 n=20+20) Tar 274k ± 1% 256k ± 1% -6.73% (p=0.000 n=20+20) XML 448k ± 0% 398k ± 0% -11.17% (p=0.000 n=20+18) Performance impact when applied together with CL 37234 atop CL 37234's parent commit (i.e. as if it were a part of CL 37234), to show that this commit makes CL 37234 completely performance-neutral: name old time/op new time/op delta Template 222ms ±14% 222ms ±14% ~ (p=1.000 n=14+15) Unicode 104ms ±18% 106ms ±18% ~ (p=0.650 n=13+14) GoTypes 653ms ± 7% 638ms ± 5% ~ (p=0.145 n=14+12) Compiler 3.10s ± 1% 3.13s ±10% ~ (p=1.000 n=2+2) SSA 4.73s ±11% 4.68s ±11% ~ (p=0.567 n=15+15) Flate 136ms ± 4% 133ms ± 7% ~ (p=0.231 n=12+14) GoParser 163ms ±11% 169ms ±10% ~ (p=0.352 n=14+14) Reflect 415ms ±15% 423ms ±20% ~ (p=0.715 n=15+14) Tar 133ms ±17% 130ms ±23% ~ (p=0.252 n=14+15) XML 236ms ±16% 235ms ±14% ~ (p=0.874 n=14+14) name old user-ns/op new user-ns/op delta Template 271user-ms ±10% 271user-ms ±10% ~ (p=0.780 n=14+15) Unicode 143user-ms ± 5% 146user-ms ±11% ~ (p=0.432 n=12+14) GoTypes 864user-ms ± 5% 866user-ms ± 9% ~ (p=0.905 n=14+13) Compiler 4.17user-s ± 1% 4.26user-s ± 7% ~ (p=1.000 n=2+2) SSA 6.79user-s ± 8% 6.79user-s ± 6% ~ (p=0.902 n=15+15) Flate 169user-ms ± 8% 164user-ms ± 5% -3.13% (p=0.014 n=14+14) GoParser 212user-ms ± 7% 217user-ms ±22% ~ (p=1.000 n=13+15) Reflect 521user-ms ± 7% 533user-ms ±15% ~ (p=0.511 n=14+14) Tar 165user-ms ±17% 161user-ms ±15% ~ (p=0.345 n=15+15) XML 294user-ms ±11% 292user-ms ±10% ~ (p=0.839 n=14+14) name old alloc/op new alloc/op delta Template 39.9MB ± 0% 39.9MB ± 0% ~ (p=0.621 n=15+14) Unicode 31.0MB ± 0% 31.0MB ± 0% ~ (p=0.098 n=13+15) GoTypes 117MB ± 0% 117MB ± 0% ~ (p=0.775 n=15+15) Compiler 488MB ± 0% 488MB ± 0% ~ (p=0.333 n=2+2) SSA 892MB ± 0% 892MB ± 0% +0.03% (p=0.000 n=15+15) Flate 26.1MB ± 0% 26.1MB ± 0% ~ (p=0.098 n=15+15) GoParser 31.8MB ± 0% 31.8MB ± 0% ~ (p=0.525 n=15+13) Reflect 81.2MB ± 0% 81.2MB ± 0% +0.06% (p=0.001 n=12+14) Tar 27.5MB ± 0% 27.5MB ± 0% ~ (p=0.595 n=15+15) XML 44.1MB ± 0% 44.1MB ± 0% ~ (p=0.486 n=15+15) name old allocs/op new allocs/op delta Template 378k ± 1% 378k ± 0% ~ (p=0.949 n=15+14) Unicode 324k ± 0% 324k ± 1% ~ (p=0.057 n=14+15) GoTypes 1.15M ± 0% 1.15M ± 0% ~ (p=0.461 n=15+15) Compiler 4.39M ± 0% 4.39M ± 0% ~ (p=0.333 n=2+2) SSA 7.90M ± 0% 7.90M ± 0% +0.06% (p=0.008 n=15+15) Flate 240k ± 1% 241k ± 0% ~ (p=0.233 n=15+15) GoParser 309k ± 1% 309k ± 0% ~ (p=0.867 n=15+12) Reflect 1.00M ± 0% 1.00M ± 0% ~ (p=0.139 n=12+15) Tar 254k ± 1% 253k ± 1% ~ (p=0.345 n=15+15) XML 398k ± 0% 397k ± 1% ~ (p=0.267 n=15+15) Change-Id: Ic999a0f456a371c99eebba0f9747263a13836e33 Reviewed-on: https://go-review.googlesource.com/37766 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Hi @josharian. Thanks for keeping an eye on compiler performance and raising this issue. @davidlazar's recent CL stack was concerned primarily with correctness - making sure that line number and stack trace information was reported accurately in the runtime for existing uses of inlining. During the code reviews, Robert and others asked about the impact on export data size but we forgot to check compiler performance. As one of the people who was working with David on this, I'll take responsibility for forgetting that. In my head, we didn't need to worry about compiler performance until we made inlining more aggressive, which we haven't done yet. Clearly my overall mental model was wrong, and even if I'd been looking, I definitely would not have flagged CL 37234 as performance-sensitive. My apologies. I believe @griesemer also plans to look into reducing the growth of the export data somewhat, while still keeping the correctness gains. @davecheney, please let us know if that shows up as significant in your tests, so that we can prioritize appropriately. @josharian, thanks for the fix for CL 37234. Are you already looking at CL 37231 as well? I just don't want to duplicate effort. If you're not, we will. Thanks again for filing this issue. |
Thanks, Russ.
My pleasure. Someday I'll set up some dedicated hardware and automate it so the nagging can come from a bot instead of me. :)
I am not. (I got distracted thinking about a fuzzy pprof diff tool and then...) Before you look, though, let me do one more round of benchmarking to confirm that there's a there there. I'll report back. |
Can this be handled by something from the /x/perf repo? |
OK, I've double-checked, and the impact from CL 37231 (699175a) is small but real. Numbers below. Crude instrumentation suggests that about 20% of entries in Ctxt.InlTree are exact duplicates. Unifying those might help. I don't plan to look further at CL 37231. I will run the benchjuju benchmark against CL 37239 (9fd359a) and report back, probably tomorrow.
|
Those numbers agree with what I was seeing on my laptop. The only duplicates in Ctxt.InlTree should be for multiple inlined calls of the same function on the same line. What code gave you 20% duplicate entries? |
Apparently something erroneous. My apologies. I just re-instrumented and got only 1% duplicates. For future reference, to get the 1% number, I applied the diff below and did: $ go build -a std cmd 2>&1 | grep NODE | wc -l
25825
$ go build -a std cmd 2>&1 | grep NODE | sort | uniq | wc -l
25555 Diff: diff --git a/src/cmd/compile/internal/gc/main.go b/src/cmd/compile/internal/gc/main.go
index 490ac7db40..9404f917f1 100644
--- a/src/cmd/compile/internal/gc/main.go
+++ b/src/cmd/compile/internal/gc/main.go
@@ -536,6 +536,10 @@ func Main() {
log.Fatalf("cannot write benchmark data: %v", err)
}
}
+
+ for _, n := range Ctxt.InlTree.Nodes() {
+ fmt.Printf("NODE %v\n", n)
+ }
}
func writebench(filename string) error {
diff --git a/src/cmd/internal/obj/inl.go b/src/cmd/internal/obj/inl.go
index 116921995a..7597b526a1 100644
--- a/src/cmd/internal/obj/inl.go
+++ b/src/cmd/internal/obj/inl.go
@@ -42,6 +42,8 @@ type InlTree struct {
nodes []InlinedCall
}
+func (t *InlTree) Nodes() []InlinedCall { return t.nodes }
+
// InlinedCall is a node in an InlTree.
type InlinedCall struct {
Parent int // index of the parent in the InlTree or < 0 if outermost call |
benchjuju results for CL 37239 (9fd359a):
https://perf.golang.org/search?q=upload:20170308.1 I'll leave it to y'all to decide whether there's anything further practical to be done here or whether this issue should be closed. If so, I understand. |
|
Not sure there's anything more to do here. Closing. |
Measuring from ed70f37 (just before mid-stack inlining changes began) to 9fd359a (tip as of writing), I see:
These negate a month's worth of the 1.9 toolspeed improvements and some. Measuring from the go1.8 tag to tip:
It'd be nice to see whether some of these performance regressions can be undone. Let's revisit once mid-stack inlining is complete?
cc @davidlazar @mdempsky @randall77
The text was updated successfully, but these errors were encountered: