Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: performance regression due to mid-stack inlining changes #19386

Closed
josharian opened this issue Mar 3, 2017 · 13 comments
Closed

Comments

@josharian
Copy link
Contributor

Measuring from ed70f37 (just before mid-stack inlining changes began) to 9fd359a (tip as of writing), I see:

name       old time/op      new time/op      delta
Template        223ms ± 3%       232ms ± 2%   +4.03%  (p=0.000 n=20+19)
Unicode         100ms ± 4%       102ms ± 4%   +1.98%  (p=0.001 n=19+20)
GoTypes         622ms ± 4%       625ms ± 4%     ~     (p=0.945 n=20+19)
SSA             4.40s ± 4%       4.54s ± 3%   +3.22%  (p=0.000 n=20+20)
Flate           129ms ± 4%       133ms ± 2%   +3.30%  (p=0.000 n=19+18)
GoParser        156ms ± 4%       160ms ± 2%   +2.62%  (p=0.000 n=18+19)
Reflect         394ms ± 3%       405ms ± 4%   +2.77%  (p=0.000 n=20+20)
Tar             117ms ± 3%       121ms ± 5%   +3.25%  (p=0.000 n=20+20)
XML             219ms ± 2%       224ms ± 2%   +2.23%  (p=0.000 n=19+20)

name       old user-ns/op   new user-ns/op   delta
Template   274user-ms ± 3%  298user-ms ± 3%   +9.06%  (p=0.000 n=20+20)
Unicode    142user-ms ± 5%  142user-ms ± 2%     ~     (p=0.916 n=19+18)
GoTypes    846user-ms ± 3%  845user-ms ± 4%     ~     (p=0.879 n=19+20)
SSA        6.55user-s ± 1%  6.93user-s ± 4%   +5.91%  (p=0.000 n=19+20)
Flate      162user-ms ± 4%  168user-ms ± 5%   +3.53%  (p=0.000 n=19+20)
GoParser   204user-ms ± 5%  206user-ms ± 6%     ~     (p=0.234 n=19+19)
Reflect    503user-ms ± 4%  514user-ms ± 5%   +2.11%  (p=0.004 n=20+20)
Tar        150user-ms ± 6%  154user-ms ± 5%   +2.89%  (p=0.002 n=20+20)
XML        282user-ms ± 2%  282user-ms ± 2%     ~     (p=0.828 n=18+20)

name       old alloc/op     new alloc/op     delta
Template       39.8MB ± 0%      42.6MB ± 0%   +6.96%  (p=0.000 n=19+20)
Unicode        31.0MB ± 0%      31.7MB ± 0%   +2.22%  (p=0.000 n=20+20)
GoTypes         116MB ± 0%       124MB ± 0%   +6.73%  (p=0.000 n=20+20)
SSA             889MB ± 0%       989MB ± 0%  +11.28%  (p=0.000 n=20+20)
Flate          25.9MB ± 0%      27.8MB ± 0%   +7.32%  (p=0.000 n=20+19)
GoParser       31.8MB ± 0%      34.3MB ± 0%   +8.02%  (p=0.000 n=19+20)
Reflect        80.9MB ± 0%      84.6MB ± 0%   +4.63%  (p=0.000 n=20+20)
Tar            27.3MB ± 0%      28.8MB ± 0%   +5.59%  (p=0.000 n=18+20)
XML            43.9MB ± 0%      47.2MB ± 0%   +7.59%  (p=0.000 n=20+20)

name       old allocs/op    new allocs/op    delta
Template         377k ± 0%        421k ± 1%  +11.67%  (p=0.000 n=19+20)
Unicode          324k ± 1%        338k ± 1%   +4.37%  (p=0.000 n=20+20)
GoTypes         1.14M ± 0%       1.28M ± 0%  +12.00%  (p=0.000 n=19+20)
SSA             7.88M ± 0%       9.14M ± 0%  +16.03%  (p=0.000 n=20+20)
Flate            239k ± 1%        267k ± 1%  +11.46%  (p=0.000 n=20+19)
GoParser         308k ± 1%        347k ± 1%  +12.66%  (p=0.000 n=19+20)
Reflect          994k ± 0%       1075k ± 0%   +8.11%  (p=0.000 n=20+19)
Tar              252k ± 0%        274k ± 1%   +8.88%  (p=0.000 n=18+20)
XML              396k ± 1%        449k ± 0%  +13.42%  (p=0.000 n=20+20)

These negate a month's worth of the 1.9 toolspeed improvements and some. Measuring from the go1.8 tag to tip:

name       old time/op      new time/op      delta
Template        229ms ± 4%       232ms ± 2%  +1.52%  (p=0.000 n=97+19)
Unicode         100ms ± 6%       102ms ± 4%  +1.43%  (p=0.014 n=99+20)
Reflect         425ms ± 6%       405ms ± 4%  -4.71%  (p=0.000 n=99+20)
Tar             119ms ± 5%       121ms ± 5%  +1.84%  (p=0.000 n=98+20)
XML             224ms ± 4%       224ms ± 2%    ~     (p=0.424 n=98+20)

name       old user-ns/op   new user-ns/op   delta
Template   274user-ms ± 5%  298user-ms ± 3%  +8.85%  (p=0.000 n=98+20)
Unicode    138user-ms ± 7%  142user-ms ± 2%  +2.79%  (p=0.000 n=99+18)
Reflect    537user-ms ± 5%  514user-ms ± 5%  -4.28%  (p=0.000 n=99+20)
Tar        150user-ms ± 8%  154user-ms ± 5%  +2.32%  (p=0.001 n=99+20)
XML        285user-ms ± 6%  282user-ms ± 2%    ~     (p=0.161 n=97+20)

name       old alloc/op     new alloc/op     delta
Template       40.7MB ± 0%      42.6MB ± 0%  +4.49%  (p=0.000 n=97+20)
Unicode        30.6MB ± 0%      31.7MB ± 0%  +3.65%  (p=0.000 n=100+20)
Reflect        84.3MB ± 0%      84.6MB ± 0%  +0.39%  (p=0.000 n=98+20)
Tar            27.3MB ± 0%      28.8MB ± 0%  +5.57%  (p=0.000 n=98+20)
XML            44.7MB ± 0%      47.2MB ± 0%  +5.64%  (p=0.000 n=99+20)

name       old allocs/op    new allocs/op    delta
Template         401k ± 1%        421k ± 1%  +4.98%  (p=0.000 n=95+20)
Unicode          331k ± 1%        338k ± 1%  +1.94%  (p=0.000 n=100+20)
Reflect         1.06M ± 0%       1.07M ± 0%  +1.78%  (p=0.000 n=99+19)
Tar              266k ± 1%        274k ± 1%  +3.03%  (p=0.000 n=98+20)
XML              417k ± 1%        449k ± 0%  +7.52%  (p=0.000 n=99+20)

It'd be nice to see whether some of these performance regressions can be undone. Let's revisit once mid-stack inlining is complete?

cc @davidlazar @mdempsky @randall77

@josharian josharian added this to the Go1.9 milestone Mar 3, 2017
@davecheney
Copy link
Contributor

Some background, up to 1544217 performance of tip was doing quite well compared to 1.8

go version go1.4.3 linux/amd64

real    0m33.996s
user    1m32.300s
sys     0m10.860s
go version go1.8 linux/amd64

real    0m51.245s
user    2m45.964s
sys     0m8.012s
go version devel +1544217 Wed Mar 1 02:02:40 2017 +0000 linux/amd64

real    0m50.832s
user    2m43.784s
sys     0m8.620s

@davidlazar
Copy link
Member

My pending CLs involve updating runtime.Caller and runtime.Callers to work with inlining. We can start to fix the performance regressions now.

@josharian
Copy link
Contributor Author

Great. I ran compiler benchmarks commit by commit. (I haven't checked linker impact yet. It'd be good to do that too.) There appear to be two significant commits.

301149b:

name       old time/op      new time/op      delta
Template        227ms ± 2%       234ms ± 2%   +2.73%  (p=0.000 n=18+17)
Unicode         101ms ± 5%       102ms ± 5%     ~     (p=0.583 n=20+20)
GoTypes         629ms ± 6%       631ms ± 4%     ~     (p=0.495 n=20+20)
SSA             4.45s ± 3%       4.56s ± 3%   +2.37%  (p=0.000 n=20+20)
Flate           131ms ± 4%       132ms ± 3%     ~     (p=0.206 n=20+18)
GoParser        157ms ± 3%       159ms ± 2%   +1.34%  (p=0.003 n=20+20)
Reflect         401ms ± 4%       400ms ± 3%     ~     (p=0.792 n=20+19)
Tar             120ms ± 5%       121ms ± 3%     ~     (p=0.211 n=20+20)
XML             223ms ± 4%       225ms ± 4%     ~     (p=0.108 n=20+20)

name       old user-ns/op   new user-ns/op   delta
Template   276user-ms ± 2%  298user-ms ± 3%   +7.87%  (p=0.000 n=18+18)
Unicode    140user-ms ± 7%  142user-ms ± 5%     ~     (p=0.091 n=20+20)
GoTypes    845user-ms ± 3%  844user-ms ± 3%     ~     (p=0.718 n=20+20)
SSA        6.52user-s ± 4%  6.90user-s ± 5%   +5.83%  (p=0.000 n=20+20)
Flate      164user-ms ± 4%  167user-ms ± 4%   +1.50%  (p=0.017 n=20+20)
GoParser   205user-ms ± 5%  206user-ms ± 4%     ~     (p=0.512 n=20+20)
Reflect    507user-ms ± 4%  503user-ms ± 4%     ~     (p=0.127 n=20+19)
Tar        153user-ms ± 6%  154user-ms ± 4%     ~     (p=0.175 n=20+19)
XML        284user-ms ± 5%  283user-ms ± 4%     ~     (p=0.708 n=20+19)

name       old alloc/op     new alloc/op     delta
Template       39.9MB ± 0%      42.4MB ± 0%   +6.26%  (p=0.000 n=20+20)
Unicode        31.0MB ± 0%      31.7MB ± 0%   +2.25%  (p=0.000 n=20+20)
GoTypes         116MB ± 0%       124MB ± 0%   +6.33%  (p=0.000 n=20+20)
SSA             892MB ± 0%       989MB ± 0%  +10.84%  (p=0.000 n=20+20)
Flate          26.1MB ± 0%      27.7MB ± 0%   +6.38%  (p=0.000 n=19+20)
GoParser       31.8MB ± 0%      34.1MB ± 0%   +7.17%  (p=0.000 n=20+19)
Reflect        81.2MB ± 0%      84.5MB ± 0%   +4.10%  (p=0.000 n=19+20)
Tar            27.5MB ± 0%      28.7MB ± 0%   +4.38%  (p=0.000 n=20+20)
XML            44.1MB ± 0%      47.2MB ± 0%   +7.06%  (p=0.000 n=20+19)

name       old allocs/op    new allocs/op    delta
Template         378k ± 1%        419k ± 1%  +10.80%  (p=0.000 n=20+20)
Unicode          323k ± 1%        338k ± 0%   +4.47%  (p=0.000 n=20+20)
GoTypes         1.15M ± 0%       1.28M ± 0%  +11.61%  (p=0.000 n=20+20)
SSA             7.90M ± 0%       9.13M ± 0%  +15.62%  (p=0.000 n=20+20)
Flate            241k ± 1%        266k ± 1%  +10.62%  (p=0.000 n=19+20)
GoParser         309k ± 0%        345k ± 1%  +11.71%  (p=0.000 n=20+20)
Reflect         1.00M ± 0%       1.07M ± 0%   +7.48%  (p=0.000 n=17+19)
Tar              253k ± 1%        273k ± 1%   +7.83%  (p=0.000 n=20+19)
XML              398k ± 1%        448k ± 1%  +12.70%  (p=0.000 n=20+20)

699175a:

name       old time/op      new time/op      delta
Template        222ms ± 4%       224ms ± 2%    ~     (p=0.199 n=19+18)
Unicode        98.5ms ± 4%     101.7ms ± 4%  +3.25%  (p=0.000 n=19+20)
GoTypes         629ms ± 2%       623ms ± 4%  -1.05%  (p=0.038 n=18+20)
SSA             4.42s ± 2%       4.44s ± 2%    ~     (p=0.478 n=20+20)
Flate           128ms ± 3%       131ms ± 4%  +2.53%  (p=0.000 n=19+20)
GoParser        157ms ± 4%       158ms ± 4%  +1.25%  (p=0.047 n=19+20)
Reflect         395ms ± 3%       399ms ± 3%    ~     (p=0.184 n=19+20)
Tar             116ms ± 3%       120ms ± 4%  +3.10%  (p=0.000 n=20+20)
XML             220ms ± 3%       222ms ± 3%  +1.04%  (p=0.035 n=20+20)

name       old user-ns/op   new user-ns/op   delta
Template   272user-ms ± 4%  276user-ms ± 5%    ~     (p=0.107 n=19+20)
Unicode    142user-ms ± 3%  142user-ms ± 5%    ~     (p=0.773 n=19+19)
GoTypes    850user-ms ± 5%  848user-ms ± 5%    ~     (p=0.738 n=20+20)
SSA        6.54user-s ± 3%  6.59user-s ± 3%    ~     (p=0.086 n=20+20)
Flate      163user-ms ± 4%  166user-ms ± 6%  +1.81%  (p=0.020 n=20+20)
GoParser   206user-ms ± 6%  207user-ms ± 6%    ~     (p=0.341 n=20+20)
Reflect    503user-ms ± 3%  506user-ms ± 3%    ~     (p=0.411 n=19+20)
Tar        149user-ms ± 7%  153user-ms ± 6%  +2.39%  (p=0.008 n=20+19)
XML        282user-ms ± 3%  281user-ms ± 4%    ~     (p=0.607 n=20+19)

name       old alloc/op     new alloc/op     delta
Template       39.8MB ± 0%      39.9MB ± 0%  +0.25%  (p=0.000 n=20+19)
Unicode        31.0MB ± 0%      31.0MB ± 0%    ~     (p=0.444 n=20+19)
GoTypes         116MB ± 0%       117MB ± 0%  +0.34%  (p=0.000 n=20+19)
SSA             889MB ± 0%       892MB ± 0%  +0.36%  (p=0.000 n=20+18)
Flate          25.9MB ± 0%      26.0MB ± 0%  +0.60%  (p=0.000 n=20+20)
GoParser       31.8MB ± 0%      31.8MB ± 0%  +0.19%  (p=0.000 n=20+19)
Reflect        80.9MB ± 0%      81.3MB ± 0%  +0.46%  (p=0.000 n=20+20)
Tar            27.3MB ± 0%      27.4MB ± 0%  +0.50%  (p=0.000 n=20+20)
XML            43.9MB ± 0%      44.1MB ± 0%  +0.38%  (p=0.000 n=20+20)

name       old allocs/op    new allocs/op    delta
Template         377k ± 1%        378k ± 1%  +0.28%  (p=0.009 n=20+19)
Unicode          324k ± 0%        324k ± 0%    ~     (p=0.414 n=20+20)
GoTypes         1.14M ± 0%       1.15M ± 0%  +0.36%  (p=0.000 n=20+20)
SSA             7.87M ± 0%       7.90M ± 0%  +0.34%  (p=0.000 n=20+19)
Flate            240k ± 1%        240k ± 1%  +0.38%  (p=0.001 n=20+20)
GoParser         308k ± 0%        309k ± 1%    ~     (p=0.074 n=20+19)
Reflect         1.00M ± 0%       1.00M ± 0%  +0.44%  (p=0.000 n=20+20)
Tar              252k ± 1%        253k ± 0%  +0.39%  (p=0.001 n=20+20)
XML              395k ± 1%        397k ± 1%  +0.45%  (p=0.000 n=20+20)

Also of possible concern is 9fd359a, not because the direct numbers look bad (see below) but because import/export is a known bottleneck for large build trees with large packages in them. @davecheney, can you double check that commit with your favorite benchmarks?

name       old time/op      new time/op      delta
Template        234ms ± 5%       234ms ± 4%    ~     (p=0.772 n=16+18)
Unicode         102ms ± 3%       101ms ± 4%  -1.53%  (p=0.002 n=15+18)
GoTypes         634ms ± 6%       637ms ± 6%    ~     (p=0.351 n=20+19)
Compiler        2.93s ± 2%       2.96s ± 2%  +0.78%  (p=0.047 n=18+18)
SSA             4.56s ± 3%       4.56s ± 2%    ~     (p=0.707 n=19+17)
Flate           133ms ± 6%       132ms ± 4%    ~     (p=0.221 n=20+20)
GoParser        159ms ± 4%       160ms ± 5%    ~     (p=0.314 n=20+20)
Reflect         400ms ± 4%       403ms ± 7%    ~     (p=0.301 n=20+20)
Tar             121ms ± 4%       121ms ± 5%    ~     (p=0.738 n=20+20)
XML             224ms ± 3%       225ms ± 5%    ~     (p=0.925 n=20+20)

name       old user-ns/op   new user-ns/op   delta
Template   297user-ms ± 8%  302user-ms ± 6%    ~     (p=0.065 n=19+20)
Unicode    142user-ms ± 4%  141user-ms ± 6%    ~     (p=0.052 n=17+19)
GoTypes    847user-ms ± 3%  850user-ms ± 5%    ~     (p=0.355 n=20+20)
Compiler   4.12user-s ± 9%  4.07user-s ± 8%    ~     (p=0.183 n=20+20)
SSA        6.93user-s ± 3%  6.89user-s ± 7%    ~     (p=0.792 n=19+20)
Flate      166user-ms ± 5%  165user-ms ± 1%    ~     (p=0.053 n=20+16)
GoParser   207user-ms ± 5%  206user-ms ± 4%    ~     (p=0.149 n=20+20)
Reflect    508user-ms ± 3%  510user-ms ± 4%    ~     (p=0.465 n=20+20)
Tar        153user-ms ± 4%  154user-ms ± 5%    ~     (p=0.904 n=20+20)
XML        283user-ms ± 5%  284user-ms ± 4%    ~     (p=0.414 n=20+20)

name       old alloc/op     new alloc/op     delta
Template       42.4MB ± 0%      42.6MB ± 0%  +0.35%  (p=0.000 n=20+19)
Unicode        31.7MB ± 0%      31.7MB ± 0%    ~     (p=0.495 n=20+20)
GoTypes         124MB ± 0%       124MB ± 0%  +0.10%  (p=0.000 n=20+17)
Compiler        533MB ± 0%       534MB ± 0%  +0.02%  (p=0.005 n=20+20)
SSA             989MB ± 0%       989MB ± 0%  +0.07%  (p=0.000 n=20+20)
Flate          27.7MB ± 0%      27.8MB ± 0%  +0.13%  (p=0.001 n=20+20)
GoParser       34.1MB ± 0%      34.3MB ± 0%  +0.50%  (p=0.000 n=20+18)
Reflect        84.5MB ± 0%      84.6MB ± 0%  +0.12%  (p=0.000 n=20+20)
Tar            28.6MB ± 0%      28.8MB ± 0%  +0.48%  (p=0.000 n=19+20)
XML            47.2MB ± 0%      47.2MB ± 0%    ~     (p=0.718 n=20+20)

name       old allocs/op    new allocs/op    delta
Template         419k ± 1%        421k ± 0%  +0.45%  (p=0.000 n=20+19)
Unicode          338k ± 1%        338k ± 1%    ~     (p=0.529 n=20+20)
GoTypes         1.28M ± 0%       1.28M ± 0%  +0.16%  (p=0.000 n=20+19)
Compiler        5.06M ± 0%       5.06M ± 0%  +0.14%  (p=0.000 n=20+20)
SSA             9.14M ± 0%       9.14M ± 0%  +0.06%  (p=0.000 n=20+20)
Flate            266k ± 1%        267k ± 1%    ~     (p=0.383 n=20+20)
GoParser         345k ± 0%        347k ± 0%  +0.53%  (p=0.000 n=20+18)
Reflect         1.07M ± 0%       1.07M ± 0%    ~     (p=0.820 n=20+20)
Tar              272k ± 1%        274k ± 1%  +0.52%  (p=0.000 n=20+20)
XML              448k ± 1%        449k ± 1%    ~     (p=0.925 n=20+20)

@gopherbot
Copy link

CL https://golang.org/cl/37766 mentions this issue.

gopherbot pushed a commit that referenced this issue Mar 4, 2017
CL 37234 introduced string concatenation into some hot code. 
This CL does that work earlier and caches the result.

Updates #19386

Performance impact vs master:

name       old time/op      new time/op      delta
Template        223ms ± 5%       216ms ± 5%   -2.98%  (p=0.001 n=20+20)
Unicode        98.7ms ± 4%      99.0ms ± 4%     ~     (p=0.749 n=20+19)
GoTypes         631ms ± 4%       626ms ± 4%     ~     (p=0.253 n=20+20)
Compiler        2.91s ± 1%       2.87s ± 3%   -1.11%  (p=0.005 n=18+20)
SSA             4.48s ± 2%       4.36s ± 2%   -2.77%  (p=0.000 n=20+20)
Flate           130ms ± 2%       129ms ± 6%     ~     (p=0.428 n=19+20)
GoParser        160ms ± 4%       157ms ± 3%   -1.62%  (p=0.005 n=20+18)
Reflect         395ms ± 2%       394ms ± 4%     ~     (p=0.445 n=20+20)
Tar             120ms ± 5%       118ms ± 6%     ~     (p=0.101 n=19+20)
XML             224ms ± 3%       223ms ± 3%     ~     (p=0.544 n=19+19)

name       old user-ns/op   new user-ns/op   delta
Template   291user-ms ± 5%  265user-ms ± 5%   -9.02%  (p=0.000 n=20+19)
Unicode    140user-ms ± 3%  139user-ms ± 8%     ~     (p=0.904 n=20+20)
GoTypes    844user-ms ± 3%  849user-ms ± 3%     ~     (p=0.251 n=20+18)
Compiler   4.06user-s ± 5%  3.98user-s ± 2%     ~     (p=0.056 n=20+20)
SSA        6.89user-s ± 5%  6.50user-s ± 3%   -5.61%  (p=0.000 n=20+20)
Flate      164user-ms ± 5%  163user-ms ± 4%     ~     (p=0.365 n=20+19)
GoParser   206user-ms ± 6%  204user-ms ± 4%     ~     (p=0.534 n=20+18)
Reflect    501user-ms ± 4%  505user-ms ± 5%     ~     (p=0.383 n=20+20)
Tar        151user-ms ± 3%  152user-ms ± 7%     ~     (p=0.798 n=17+20)
XML        283user-ms ± 7%  280user-ms ± 5%     ~     (p=0.301 n=20+20)

name       old alloc/op     new alloc/op     delta
Template       42.5MB ± 0%      40.2MB ± 0%   -5.59%  (p=0.000 n=20+20)
Unicode        31.7MB ± 0%      31.0MB ± 0%   -2.19%  (p=0.000 n=20+18)
GoTypes         124MB ± 0%       117MB ± 0%   -5.90%  (p=0.000 n=20+20)
Compiler        533MB ± 0%       490MB ± 0%   -8.07%  (p=0.000 n=20+20)
SSA             989MB ± 0%       893MB ± 0%   -9.74%  (p=0.000 n=20+20)
Flate          27.8MB ± 0%      26.1MB ± 0%   -5.92%  (p=0.000 n=20+20)
GoParser       34.3MB ± 0%      32.1MB ± 0%   -6.43%  (p=0.000 n=19+20)
Reflect        84.6MB ± 0%      81.4MB ± 0%   -3.84%  (p=0.000 n=20+20)
Tar            28.8MB ± 0%      27.7MB ± 0%   -3.89%  (p=0.000 n=20+20)
XML            47.2MB ± 0%      44.2MB ± 0%   -6.45%  (p=0.000 n=20+19)

name       old allocs/op    new allocs/op    delta
Template         420k ± 1%        381k ± 1%   -9.35%  (p=0.000 n=20+20)
Unicode          338k ± 1%        324k ± 1%   -4.29%  (p=0.000 n=20+19)
GoTypes         1.28M ± 0%       1.15M ± 0%  -10.30%  (p=0.000 n=20+20)
Compiler        5.06M ± 0%       4.41M ± 0%  -12.92%  (p=0.000 n=20+20)
SSA             9.14M ± 0%       7.91M ± 0%  -13.46%  (p=0.000 n=19+20)
Flate            267k ± 0%        241k ± 1%   -9.53%  (p=0.000 n=20+20)
GoParser         347k ± 1%        312k ± 0%  -10.15%  (p=0.000 n=19+20)
Reflect         1.07M ± 0%       1.00M ± 0%   -6.86%  (p=0.000 n=20+20)
Tar              274k ± 1%        256k ± 1%   -6.73%  (p=0.000 n=20+20)
XML              448k ± 0%        398k ± 0%  -11.17%  (p=0.000 n=20+18)


Performance impact when applied together with CL 37234
atop CL 37234's parent commit (i.e. as if it were
a part of CL 37234), to show that this commit
makes CL 37234 completely performance-neutral:

name       old time/op      new time/op      delta
Template        222ms ±14%       222ms ±14%    ~     (p=1.000 n=14+15)
Unicode         104ms ±18%       106ms ±18%    ~     (p=0.650 n=13+14)
GoTypes         653ms ± 7%       638ms ± 5%    ~     (p=0.145 n=14+12)
Compiler        3.10s ± 1%       3.13s ±10%    ~     (p=1.000 n=2+2)
SSA             4.73s ±11%       4.68s ±11%    ~     (p=0.567 n=15+15)
Flate           136ms ± 4%       133ms ± 7%    ~     (p=0.231 n=12+14)
GoParser        163ms ±11%       169ms ±10%    ~     (p=0.352 n=14+14)
Reflect         415ms ±15%       423ms ±20%    ~     (p=0.715 n=15+14)
Tar             133ms ±17%       130ms ±23%    ~     (p=0.252 n=14+15)
XML             236ms ±16%       235ms ±14%    ~     (p=0.874 n=14+14)

name       old user-ns/op   new user-ns/op   delta
Template   271user-ms ±10%  271user-ms ±10%    ~     (p=0.780 n=14+15)
Unicode    143user-ms ± 5%  146user-ms ±11%    ~     (p=0.432 n=12+14)
GoTypes    864user-ms ± 5%  866user-ms ± 9%    ~     (p=0.905 n=14+13)
Compiler   4.17user-s ± 1%  4.26user-s ± 7%    ~     (p=1.000 n=2+2)
SSA        6.79user-s ± 8%  6.79user-s ± 6%    ~     (p=0.902 n=15+15)
Flate      169user-ms ± 8%  164user-ms ± 5%  -3.13%  (p=0.014 n=14+14)
GoParser   212user-ms ± 7%  217user-ms ±22%    ~     (p=1.000 n=13+15)
Reflect    521user-ms ± 7%  533user-ms ±15%    ~     (p=0.511 n=14+14)
Tar        165user-ms ±17%  161user-ms ±15%    ~     (p=0.345 n=15+15)
XML        294user-ms ±11%  292user-ms ±10%    ~     (p=0.839 n=14+14)

name       old alloc/op     new alloc/op     delta
Template       39.9MB ± 0%      39.9MB ± 0%    ~     (p=0.621 n=15+14)
Unicode        31.0MB ± 0%      31.0MB ± 0%    ~     (p=0.098 n=13+15)
GoTypes         117MB ± 0%       117MB ± 0%    ~     (p=0.775 n=15+15)
Compiler        488MB ± 0%       488MB ± 0%    ~     (p=0.333 n=2+2)
SSA             892MB ± 0%       892MB ± 0%  +0.03%  (p=0.000 n=15+15)
Flate          26.1MB ± 0%      26.1MB ± 0%    ~     (p=0.098 n=15+15)
GoParser       31.8MB ± 0%      31.8MB ± 0%    ~     (p=0.525 n=15+13)
Reflect        81.2MB ± 0%      81.2MB ± 0%  +0.06%  (p=0.001 n=12+14)
Tar            27.5MB ± 0%      27.5MB ± 0%    ~     (p=0.595 n=15+15)
XML            44.1MB ± 0%      44.1MB ± 0%    ~     (p=0.486 n=15+15)

name       old allocs/op    new allocs/op    delta
Template         378k ± 1%        378k ± 0%    ~     (p=0.949 n=15+14)
Unicode          324k ± 0%        324k ± 1%    ~     (p=0.057 n=14+15)
GoTypes         1.15M ± 0%       1.15M ± 0%    ~     (p=0.461 n=15+15)
Compiler        4.39M ± 0%       4.39M ± 0%    ~     (p=0.333 n=2+2)
SSA             7.90M ± 0%       7.90M ± 0%  +0.06%  (p=0.008 n=15+15)
Flate            240k ± 1%        241k ± 0%    ~     (p=0.233 n=15+15)
GoParser         309k ± 1%        309k ± 0%    ~     (p=0.867 n=15+12)
Reflect         1.00M ± 0%       1.00M ± 0%    ~     (p=0.139 n=12+15)
Tar              254k ± 1%        253k ± 1%    ~     (p=0.345 n=15+15)
XML              398k ± 0%        397k ± 1%    ~     (p=0.267 n=15+15)


Change-Id: Ic999a0f456a371c99eebba0f9747263a13836e33
Reviewed-on: https://go-review.googlesource.com/37766
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@rsc
Copy link
Contributor

rsc commented Mar 6, 2017

Hi @josharian. Thanks for keeping an eye on compiler performance and raising this issue.

@davidlazar's recent CL stack was concerned primarily with correctness - making sure that line number and stack trace information was reported accurately in the runtime for existing uses of inlining. During the code reviews, Robert and others asked about the impact on export data size but we forgot to check compiler performance. As one of the people who was working with David on this, I'll take responsibility for forgetting that. In my head, we didn't need to worry about compiler performance until we made inlining more aggressive, which we haven't done yet. Clearly my overall mental model was wrong, and even if I'd been looking, I definitely would not have flagged CL 37234 as performance-sensitive. My apologies.

I believe @griesemer also plans to look into reducing the growth of the export data somewhat, while still keeping the correctness gains. @davecheney, please let us know if that shows up as significant in your tests, so that we can prioritize appropriately.

@josharian, thanks for the fix for CL 37234. Are you already looking at CL 37231 as well? I just don't want to duplicate effort. If you're not, we will.

Thanks again for filing this issue.

@rsc rsc modified the milestones: Go1.9Early, Go1.9 Mar 6, 2017
@josharian
Copy link
Contributor Author

Thanks, Russ.

Thanks for keeping an eye on compiler performance and raising this issue.

My pleasure. Someday I'll set up some dedicated hardware and automate it so the nagging can come from a bot instead of me. :)

Are you already looking at CL 37231 as well?

I am not. (I got distracted thinking about a fuzzy pprof diff tool and then...)

Before you look, though, let me do one more round of benchmarking to confirm that there's a there there. I'll report back.

@dgryski
Copy link
Contributor

dgryski commented Mar 7, 2017

My pleasure. Someday I'll set up some dedicated hardware and automate it so the nagging can come from a bot instead of me. :)

Can this be handled by something from the /x/perf repo?

@josharian
Copy link
Contributor Author

OK, I've double-checked, and the impact from CL 37231 (699175a) is small but real. Numbers below.

Crude instrumentation suggests that about 20% of entries in Ctxt.InlTree are exact duplicates. Unifying those might help. I don't plan to look further at CL 37231.

I will run the benchjuju benchmark against CL 37239 (9fd359a) and report back, probably tomorrow.

name       old time/op      new time/op      delta
Template        211ms ± 8%       212ms ± 6%    ~     (p=0.666 n=100+96)
Unicode        95.1ms ± 4%      96.9ms ± 5%  +1.92%  (p=0.000 n=99+99)
GoTypes         590ms ± 7%       600ms ± 4%  +1.64%  (p=0.000 n=100+95)
SSA             4.11s ± 4%       4.14s ± 3%  +0.68%  (p=0.001 n=99+99)
Flate           124ms ± 3%       125ms ± 4%  +1.13%  (p=0.000 n=98+97)
GoParser        152ms ± 3%       152ms ± 3%    ~     (p=0.067 n=98+95)
Reflect         374ms ± 4%       376ms ± 4%  +0.51%  (p=0.031 n=100+98)
Tar             112ms ± 5%       112ms ± 4%  +0.80%  (p=0.004 n=97+99)
XML             212ms ± 4%       214ms ± 4%  +0.75%  (p=0.001 n=99+97)

name       old user-ns/op   new user-ns/op   delta
Template   263user-ms ± 6%  263user-ms ± 8%    ~     (p=0.799 n=99+98)
Unicode    136user-ms ± 4%  137user-ms ± 5%  +0.99%  (p=0.000 n=93+96)
GoTypes    807user-ms ± 5%  817user-ms ± 5%  +1.16%  (p=0.000 n=98+100)
SSA        6.15user-s ± 4%  6.21user-s ± 4%  +0.94%  (p=0.000 n=100+100)
Flate      157user-ms ± 5%  159user-ms ± 4%  +1.33%  (p=0.000 n=99+95)
GoParser   199user-ms ± 5%  201user-ms ± 4%  +1.00%  (p=0.000 n=96+97)
Reflect    484user-ms ± 4%  488user-ms ± 3%  +0.85%  (p=0.001 n=98+89)
Tar        144user-ms ± 5%  145user-ms ± 5%    ~     (p=0.404 n=97+100)
XML        271user-ms ± 4%  273user-ms ± 5%  +0.65%  (p=0.022 n=99+97)

name       old alloc/op     new alloc/op     delta
Template       39.8MB ± 0%      39.9MB ± 0%  +0.25%  (p=0.000 n=99+100)
Unicode        31.0MB ± 0%      31.0MB ± 0%    ~     (p=0.569 n=100+96)
GoTypes         116MB ± 0%       117MB ± 0%  +0.32%  (p=0.000 n=99+99)
SSA             889MB ± 0%       892MB ± 0%  +0.35%  (p=0.000 n=100+100)
Flate          25.9MB ± 0%      26.1MB ± 0%  +0.62%  (p=0.000 n=99+99)
GoParser       31.8MB ± 0%      31.8MB ± 0%  +0.19%  (p=0.000 n=100+99)
Reflect        80.9MB ± 0%      81.3MB ± 0%  +0.45%  (p=0.000 n=100+99)
Tar            27.3MB ± 0%      27.4MB ± 0%  +0.51%  (p=0.000 n=100+99)
XML            43.9MB ± 0%      44.1MB ± 0%  +0.38%  (p=0.000 n=97+97)

name       old allocs/op    new allocs/op    delta
Template         377k ± 1%        378k ± 1%  +0.29%  (p=0.000 n=98+100)
Unicode          324k ± 1%        324k ± 1%    ~     (p=0.697 n=99+98)
GoTypes         1.14M ± 0%       1.15M ± 0%  +0.33%  (p=0.000 n=99+99)
SSA             7.87M ± 0%       7.90M ± 0%  +0.33%  (p=0.000 n=100+100)
Flate            239k ± 1%        241k ± 1%  +0.48%  (p=0.000 n=99+99)
GoParser         308k ± 1%        309k ± 1%  +0.22%  (p=0.000 n=100+100)
Reflect          995k ± 0%        999k ± 0%  +0.42%  (p=0.000 n=100+99)
Tar              252k ± 1%        253k ± 1%  +0.40%  (p=0.000 n=99+99)
XML              396k ± 1%        397k ± 1%  +0.38%  (p=0.000 n=99+98)

@davidlazar
Copy link
Member

Those numbers agree with what I was seeing on my laptop. The only duplicates in Ctxt.InlTree should be for multiple inlined calls of the same function on the same line. What code gave you 20% duplicate entries?

@josharian
Copy link
Contributor Author

Apparently something erroneous. My apologies. I just re-instrumented and got only 1% duplicates.

For future reference, to get the 1% number, I applied the diff below and did:

$ go build -a std cmd 2>&1 | grep NODE | wc -l
   25825
$ go build -a std cmd 2>&1 | grep NODE | sort | uniq | wc -l
   25555

Diff:

diff --git a/src/cmd/compile/internal/gc/main.go b/src/cmd/compile/internal/gc/main.go
index 490ac7db40..9404f917f1 100644
--- a/src/cmd/compile/internal/gc/main.go
+++ b/src/cmd/compile/internal/gc/main.go
@@ -536,6 +536,10 @@ func Main() {
                        log.Fatalf("cannot write benchmark data: %v", err)
                }
        }
+
+       for _, n := range Ctxt.InlTree.Nodes() {
+               fmt.Printf("NODE %v\n", n)
+       }
 }
 
 func writebench(filename string) error {
diff --git a/src/cmd/internal/obj/inl.go b/src/cmd/internal/obj/inl.go
index 116921995a..7597b526a1 100644
--- a/src/cmd/internal/obj/inl.go
+++ b/src/cmd/internal/obj/inl.go
@@ -42,6 +42,8 @@ type InlTree struct {
        nodes []InlinedCall
 }
 
+func (t *InlTree) Nodes() []InlinedCall { return t.nodes }
+
 // InlinedCall is a node in an InlTree.
 type InlinedCall struct {
        Parent int      // index of the parent in the InlTree or < 0 if outermost call

@josharian
Copy link
Contributor Author

josharian commented Mar 8, 2017

benchjuju results for CL 37239 (9fd359a):

name       old time/op  new time/op  delta
BuildJuju   21.6s ± 3%   22.0s ± 3%  +1.72%  (p=0.000 n=49+50)

https://perf.golang.org/search?q=upload:20170308.1

I'll leave it to y'all to decide whether there's anything further practical to be done here or whether this issue should be closed. If so, I understand.

@dcheney-atlassian
Copy link

go version go1.4.3 linux/amd64

real    0m46.864s
user    1m54.428s
sys     0m15.780s
go version go1.8 linux/amd64

real    1m6.088s
user    3m21.104s
sys     0m12.352s
go version devel +3a1271d Wed Mar 8 22:31:00 2017 +0000 linux/amd64

real    1m11.115s
user    3m31.592s
sys     0m13.700s

@josharian
Copy link
Contributor Author

Not sure there's anything more to do here. Closing.

@golang golang locked and limited conversation to collaborators May 2, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

7 participants