cmd/compile: ppc64 function pointer call performance regression #42709

Closed
pmur opened this issue Nov 18, 2020 · 3 comments
Labels: FrozenDueToAge, NeedsFix

Comments

@pmur
Contributor

pmur commented Nov 18, 2020

There seems to be a large performance regression since function pointer calls were changed to use the link register (LR) instead of the count register (CTR). Based on when this change went in, it likely affects all versions back to go1.14. The impact can be mitigated by setting the appropriate branch hint for this usage. For example:

The regex benchmarks show how much this hint can help generic ppc64le code on a POWER9 machine:

name                          old time/op    new time/op     delta
Find                             606ns ± 0%      447ns ± 0%  -26.27%
FindAllNoMatches                 309ns ± 0%      205ns ± 0%  -33.72%
FindString                       609ns ± 0%      451ns ± 0%  -26.04%
FindSubmatch                     734ns ± 0%      594ns ± 0%  -19.07%
FindStringSubmatch               706ns ± 0%      574ns ± 0%  -18.83%
Literal                          177ns ± 0%      136ns ± 0%  -22.89%
NotLiteral                      4.69µs ± 0%     2.34µs ± 0%  -50.14%
MatchClass                      6.05µs ± 0%     3.26µs ± 0%  -46.08%
MatchClass_InRange              5.93µs ± 0%     3.15µs ± 0%  -46.86%
ReplaceAll                      3.15µs ± 0%     2.18µs ± 0%  -30.77%
AnchoredLiteralShortNonMatch     156ns ± 0%      109ns ± 0%  -30.61%
AnchoredLiteralLongNonMatch      192ns ± 0%      136ns ± 0%  -29.34%
AnchoredShortMatch               268ns ± 0%      209ns ± 0%  -22.00%
AnchoredLongMatch                472ns ± 0%      357ns ± 0%  -24.30%
OnePassShortA                   1.16µs ± 0%     0.87µs ± 0%  -25.03%
NotOnePassShortA                1.34µs ± 0%     1.20µs ± 0%  -10.63%
OnePassShortB                    940ns ± 0%      655ns ± 0%  -30.29%
NotOnePassShortB                 873ns ± 0%      703ns ± 0%  -19.52%
OnePassLongPrefix                258ns ± 0%      155ns ± 0%  -40.13%
OnePassLongNotPrefix             943ns ± 0%      529ns ± 0%  -43.89%
MatchParallelShared              591ns ± 0%      436ns ± 0%  -26.31%
MatchParallelCopied              596ns ± 0%      435ns ± 0%  -27.10%
QuoteMetaAll                     186ns ± 0%      186ns ± 0%   -0.16%
QuoteMetaNone                   55.9ns ± 0%     55.9ns ± 0%   +0.02%
Compile/Onepass                 9.64µs ± 0%     9.26µs ± 0%   -3.97%
Compile/Medium                  21.7µs ± 0%     20.6µs ± 0%   -4.90%
Compile/Hard                     174µs ± 0%      174µs ± 0%   +0.07%
Match/Easy0/16                  7.35ns ± 0%     7.34ns ± 0%   -0.11%
Match/Easy0/32                   116ns ± 0%       97ns ± 0%  -16.27%
Match/Easy0/1K                   592ns ± 0%      562ns ± 0%   -5.04%
Match/Easy0/32K                 12.6µs ± 0%     12.5µs ± 0%   -0.64%
Match/Easy0/1M                   556µs ± 0%      556µs ± 0%   -0.00%
Match/Easy0/32M                 17.7ms ± 0%     17.7ms ± 0%   +0.05%
Match/Easy0i/16                 7.34ns ± 0%     7.35ns ± 0%   +0.10%
Match/Easy0i/32                 2.82µs ± 0%     1.64µs ± 0%  -41.71%
Match/Easy0i/1K                 83.2µs ± 0%     48.2µs ± 0%  -42.06%
Match/Easy0i/32K                2.13ms ± 0%     1.80ms ± 0%  -15.34%
Match/Easy0i/1M                 68.1ms ± 0%     57.6ms ± 0%  -15.31%
Match/Easy0i/32M                 2.18s ± 0%      1.80s ± 0%  -17.52%
Match/Easy1/16                  7.36ns ± 0%     7.34ns ± 0%   -0.24%
Match/Easy1/32                   118ns ± 0%       96ns ± 0%  -18.72%
Match/Easy1/1K                  2.46µs ± 0%     1.58µs ± 0%  -35.65%
Match/Easy1/32K                 80.2µs ± 0%     54.6µs ± 0%  -31.92%
Match/Easy1/1M                  2.75ms ± 0%     1.88ms ± 0%  -31.66%
Match/Easy1/32M                 87.5ms ± 0%     59.8ms ± 0%  -31.62%
Match/Medium/16                 7.34ns ± 0%     7.34ns ± 0%   +0.01%
Match/Medium/32                 2.60µs ± 0%     1.50µs ± 0%  -42.61%
Match/Medium/1K                 78.1µs ± 0%     43.7µs ± 0%  -44.06%
Match/Medium/32K                2.08ms ± 0%     1.52ms ± 0%  -27.11%
Match/Medium/1M                 66.5ms ± 0%     48.6ms ± 0%  -26.96%
Match/Medium/32M                 2.14s ± 0%      1.60s ± 0%  -25.18%
Match/Hard/16                   7.35ns ± 0%     7.35ns ± 0%   +0.03%
Match/Hard/32                   3.58µs ± 0%     2.44µs ± 0%  -31.82%
Match/Hard/1K                    108µs ± 0%       75µs ± 0%  -31.04%
Match/Hard/32K                  2.79ms ± 0%     2.25ms ± 0%  -19.30%
Match/Hard/1M                   89.4ms ± 0%     72.2ms ± 0%  -19.26%
Match/Hard/32M                   2.91s ± 0%      2.37s ± 0%  -18.60%
Match/Hard1/16                  11.1µs ± 0%      8.3µs ± 0%  -25.07%
Match/Hard1/32                  21.4µs ± 0%     16.1µs ± 0%  -24.85%
Match/Hard1/1K                   658µs ± 0%      498µs ± 0%  -24.27%
Match/Hard1/32K                 12.2ms ± 0%     11.7ms ± 0%   -4.60%
Match/Hard1/1M                   391ms ± 0%      374ms ± 0%   -4.40%
Match/Hard1/32M                  12.6s ± 0%      12.0s ± 0%   -4.68%
Match_onepass_regex/16           870ns ± 0%      611ns ± 0%  -29.79%
Match_onepass_regex/32          1.58µs ± 0%     1.08µs ± 0%  -31.48%
Match_onepass_regex/1K          45.7µs ± 0%     30.3µs ± 0%  -33.58%
Match_onepass_regex/32K         1.45ms ± 0%     0.97ms ± 0%  -33.20%
Match_onepass_regex/1M          46.2ms ± 0%     30.9ms ± 0%  -33.01%
Match_onepass_regex/32M          1.46s ± 0%      0.99s ± 0%  -32.02%

name                          old alloc/op   new alloc/op    delta
Find                             0.00B           0.00B         0.00%
FindAllNoMatches                 0.00B           0.00B         0.00%
FindString                       0.00B           0.00B         0.00%
FindSubmatch                     48.0B ± 0%      48.0B ± 0%    0.00%
FindStringSubmatch               32.0B ± 0%      32.0B ± 0%    0.00%
Compile/Onepass                 4.02kB ± 0%     4.02kB ± 0%    0.00%
Compile/Medium                  9.39kB ± 0%     9.39kB ± 0%    0.00%
Compile/Hard                    84.7kB ± 0%     84.7kB ± 0%    0.00%
Match_onepass_regex/16           0.00B           0.00B         0.00%
Match_onepass_regex/32           0.00B           0.00B         0.00%
Match_onepass_regex/1K           0.00B           0.00B         0.00%
Match_onepass_regex/32K          0.00B           0.00B         0.00%
Match_onepass_regex/1M           5.00B ± 0%      3.00B ± 0%  -40.00%
Match_onepass_regex/32M           136B ± 0%        68B ± 0%  -50.00%

name                          old allocs/op  new allocs/op   delta
Find                              0.00            0.00         0.00%
FindAllNoMatches                  0.00            0.00         0.00%
FindString                        0.00            0.00         0.00%
FindSubmatch                      1.00 ± 0%       1.00 ± 0%    0.00%
FindStringSubmatch                1.00 ± 0%       1.00 ± 0%    0.00%
Compile/Onepass                   52.0 ± 0%       52.0 ± 0%    0.00%
Compile/Medium                     112 ± 0%        112 ± 0%    0.00%
Compile/Hard                       424 ± 0%        424 ± 0%    0.00%
Match_onepass_regex/16            0.00            0.00         0.00%
Match_onepass_regex/32            0.00            0.00         0.00%
Match_onepass_regex/1K            0.00            0.00         0.00%
Match_onepass_regex/32K           0.00            0.00         0.00%
Match_onepass_regex/1M            0.00            0.00         0.00%
Match_onepass_regex/32M           2.00 ± 0%       1.00 ± 0%  -50.00%

name                          old speed      new speed       delta
QuoteMetaAll                  75.2MB/s ± 0%   75.3MB/s ± 0%   +0.15%
QuoteMetaNone                  465MB/s ± 0%    465MB/s ± 0%   -0.02%
Match/Easy0/16                2.18GB/s ± 0%   2.18GB/s ± 0%   +0.10%
Match/Easy0/32                 276MB/s ± 0%    330MB/s ± 0%  +19.46%
Match/Easy0/1K                1.73GB/s ± 0%   1.82GB/s ± 0%   +5.29%
Match/Easy0/32K               2.60GB/s ± 0%   2.62GB/s ± 0%   +0.64%
Match/Easy0/1M                1.89GB/s ± 0%   1.89GB/s ± 0%   +0.00%
Match/Easy0/32M               1.89GB/s ± 0%   1.89GB/s ± 0%   -0.05%
Match/Easy0i/16               2.18GB/s ± 0%   2.18GB/s ± 0%   -0.10%
Match/Easy0i/32               11.4MB/s ± 0%   19.5MB/s ± 0%  +71.48%
Match/Easy0i/1K               12.3MB/s ± 0%   21.2MB/s ± 0%  +72.62%
Match/Easy0i/32K              15.4MB/s ± 0%   18.2MB/s ± 0%  +18.12%
Match/Easy0i/1M               15.4MB/s ± 0%   18.2MB/s ± 0%  +18.12%
Match/Easy0i/32M              15.4MB/s ± 0%   18.6MB/s ± 0%  +21.21%
Match/Easy1/16                2.17GB/s ± 0%   2.18GB/s ± 0%   +0.24%
Match/Easy1/32                 271MB/s ± 0%    333MB/s ± 0%  +23.07%
Match/Easy1/1K                 417MB/s ± 0%    648MB/s ± 0%  +55.38%
Match/Easy1/32K                409MB/s ± 0%    600MB/s ± 0%  +46.88%
Match/Easy1/1M                 381MB/s ± 0%    558MB/s ± 0%  +46.33%
Match/Easy1/32M                383MB/s ± 0%    561MB/s ± 0%  +46.25%
Match/Medium/16               2.18GB/s ± 0%   2.18GB/s ± 0%   -0.01%
Match/Medium/32               12.3MB/s ± 0%   21.4MB/s ± 0%  +74.13%
Match/Medium/1K               13.1MB/s ± 0%   23.4MB/s ± 0%  +78.73%
Match/Medium/32K              15.7MB/s ± 0%   21.6MB/s ± 0%  +37.23%
Match/Medium/1M               15.8MB/s ± 0%   21.6MB/s ± 0%  +36.93%
Match/Medium/32M              15.7MB/s ± 0%   21.0MB/s ± 0%  +33.67%
Match/Hard/16                 2.18GB/s ± 0%   2.18GB/s ± 0%   -0.03%
Match/Hard/32                 8.93MB/s ± 0%  13.10MB/s ± 0%  +46.70%
Match/Hard/1K                 9.48MB/s ± 0%  13.74MB/s ± 0%  +44.94%
Match/Hard/32K                11.7MB/s ± 0%   14.5MB/s ± 0%  +23.87%
Match/Hard/1M                 11.7MB/s ± 0%   14.5MB/s ± 0%  +23.87%
Match/Hard/32M                11.6MB/s ± 0%   14.2MB/s ± 0%  +22.86%
Match/Hard1/16                1.44MB/s ± 0%   1.93MB/s ± 0%  +34.03%
Match/Hard1/32                1.49MB/s ± 0%   1.99MB/s ± 0%  +33.56%
Match/Hard1/1K                1.56MB/s ± 0%   2.05MB/s ± 0%  +31.41%
Match/Hard1/32K               2.68MB/s ± 0%   2.80MB/s ± 0%   +4.48%
Match/Hard1/1M                2.68MB/s ± 0%   2.80MB/s ± 0%   +4.48%
Match/Hard1/32M               2.66MB/s ± 0%   2.79MB/s ± 0%   +4.89%
Match_onepass_regex/16        18.4MB/s ± 0%   26.2MB/s ± 0%  +42.41%
Match_onepass_regex/32        20.2MB/s ± 0%   29.5MB/s ± 0%  +45.92%
Match_onepass_regex/1K        22.4MB/s ± 0%   33.8MB/s ± 0%  +50.54%
Match_onepass_regex/32K       22.6MB/s ± 0%   33.9MB/s ± 0%  +49.67%
Match_onepass_regex/1M        22.7MB/s ± 0%   33.9MB/s ± 0%  +49.27%
Match_onepass_regex/32M       23.0MB/s ± 0%   33.9MB/s ± 0%  +47.14%
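
The delta columns above (this looks like benchstat output) are plain percent changes, and the time/op and throughput columns are reciprocal views of the same measurement: for a fixed number of bytes per op, halving time/op doubles MB/s. The small Go sketch below shows that arithmetic; the helper names `pctDelta` and `speedDeltaFromTime` are hypothetical. The table's deltas were computed from unrounded measurements, so recomputing them from the rounded figures shown only matches approximately (NotLiteral, 4.69µs to 2.34µs, gives about -50.11% rather than the reported -50.14%).

```go
package main

import "fmt"

// pctDelta returns the percent change from before to after, the quantity
// reported in the "delta" column (negative means the value went down).
func pctDelta(before, after float64) float64 {
	return (after - before) / before * 100
}

// speedDeltaFromTime converts a time/op delta (in percent) into the implied
// throughput delta for a fixed number of bytes per op. Throughput is
// bytes/time, so it scales by before/after time.
func speedDeltaFromTime(timeDeltaPct float64) float64 {
	return (1/(1+timeDeltaPct/100) - 1) * 100
}

func main() {
	// NotLiteral time/op, using the rounded values from the table.
	fmt.Printf("NotLiteral time/op delta: %.2f%%\n", pctDelta(4.69, 2.34)) // -50.11%

	// Cutting time/op in half doubles throughput.
	fmt.Printf("throughput delta for -50%% time/op: %+.2f%%\n", speedDeltaFromTime(-50)) // +100.00%
}
```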

@gopherbot
Contributor

Change https://golang.org/cl/271337 mentions this issue: cmd/compile,cmd/asm: insert hint when using blrl on ppc64
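
One hypothetical way to look at the kind of call this CL targets: the Go sketch below times a loop of direct calls against a loop of calls through a package-level func value, which the compiler has to emit as an indirect call (assuming no devirtualization). The names `add`, `addFn`, `directLoop` and `indirectLoop` are made up for illustration, this is not the benchmark used in the report, and the numbers only say something about this issue when run on ppc64/ppc64le.

```go
package main

import (
	"fmt"
	"testing"
)

// add is kept out of line so that both loops really make a call.
//
//go:noinline
func add(a, b int) int { return a + b }

// addFn is a package-level func value. At the call site the compiler cannot
// tell which function it holds, so calling it is an indirect call through a
// register (a "function pointer call").
var addFn = add

// sink keeps results alive so the loops are not optimized away.
var sink int

// directLoop calls add directly on every iteration.
func directLoop(n int) int {
	s := 0
	for i := 0; i < n; i++ {
		s = add(s, i)
	}
	return s
}

// indirectLoop calls add through the func value addFn on every iteration.
func indirectLoop(n int) int {
	s := 0
	for i := 0; i < n; i++ {
		s = addFn(s, i)
	}
	return s
}

func main() {
	testing.Init() // register the testing flags; harmless outside "go test"

	direct := testing.Benchmark(func(b *testing.B) { sink = directLoop(b.N) })
	indirect := testing.Benchmark(func(b *testing.B) { sink = indirectLoop(b.N) })

	fmt.Println("direct call:  ", direct)
	fmt.Println("indirect call:", indirect)
}
```

Running it with toolchains built before and after the change and comparing the indirect-call line is one way to check whether a given Go version is affected. `testing.Benchmark` is used so the sketch runs as a plain program outside `go test`.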

@laboger
Contributor

laboger commented Nov 18, 2020

We realize this is during the freeze, but thought we would see whether it might go in, since it makes a nice improvement and is a very simple fix.

@ianlancetaylor ianlancetaylor changed the title from "ppc64 function pointer call performance regression" to "cmd/compile: ppc64 function pointer call performance regression" Nov 19, 2020
@ianlancetaylor ianlancetaylor added the NeedsFix label Nov 19, 2020
@ianlancetaylor ianlancetaylor added this to the Backlog milestone Nov 19, 2020
@ianlancetaylor
Member

This seems like a bug fix to me. I think it's fine during the freeze. Thanks.

@golang golang locked and limited conversation to collaborators Nov 19, 2021

4 participants