Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: make encoding/binary.LittleEndian.* have negligible inlining cost on amd64 #42958

Closed
josharian opened this issue Dec 2, 2020 · 13 comments
Labels
FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. Performance

Comments

@josharian
Copy link
Contributor

encoding/binary.LittleEndian.* typically compile down to a single instruction on amd64. Their inlining cost is very high. I suggest we add a hack to the inlining budget to consider them to be an intrinsic, for the purposes of cost. Probably the same for other platforms as well.

cc @mdempsky

@ALTree ALTree added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Dec 3, 2020
@mvdan
Copy link
Member

mvdan commented Dec 3, 2020

Are there any other such std APIs that we should teach the inliner about? For example, I imagine many math/bits APIs also compile to one or two amd64 instructions.

@cespare
Copy link
Contributor

cespare commented Dec 3, 2020

Unsafe conversions (see #42739).

@egonelbre
Copy link
Contributor

Related issue (#42788)

@randall77
Copy link
Contributor

Most of the math/bits APIs are semantically inlined, so they aren't really in the same bucket.

"semantically inlined" = the compiler treats these APIs specially. It throws away their Go implementation and uses a compiler-provided one. The good part is that it always optimizes (unless you give -N, I think?). The bad part is that if you copy the implementation into your own function, you don't get the optimizations.

encoding/binary is not semantically inlined. They just reduce to good code using standard optimizations. The good part of that is if you copy an encoding/binary body into your own function, the optimization still happens.

Not particularly relevant to this issue, other than to note that math/bits probably doesn't need to be included here.

@bcmills
Copy link
Contributor

bcmills commented Dec 3, 2020

encoding/binary is not semantically inlined. They just reduce to good code using standard optimizations.

My understanding is that inlining cost is currently based on the function AST. Would it make sense to compute a “compiled inlining cost” and stash that somewhere (maybe in the export data) for more accurate cost estimates? (Is there already a separate issue for that, beyond #17566?)

@randall77
Copy link
Contributor

I don't think there is a separate issue.
Yes, cost estimates based on AST size certainly suck.
We could record with each function the count or bytes of instructions that it compiles to, which would be better.
Within a package this can get tricky because we inline everything before we compile anything, but cross-package that works.

@mdempsky
Copy link
Member

mdempsky commented Dec 3, 2020

Would it make sense to compute a “compiled inlining cost” and stash that somewhere (maybe in the export data) for more accurate cost estimates?

The long term aspiration is to do inlining after we're in SSA form and have had a chance to apply some initial optimizations. I think that's a ways off still though.

In the mean time, I think some ad-hoc special cases for notable standard library functions seems fine. Or maybe a std-only //go:inlinecost directive that allows use to override the inlining cost.

We could record with each function the count or bytes of instructions that it compiles to, which would be better.

That's an interesting idea.

@josharian
Copy link
Contributor Author

We could record with each function the count or bytes of instructions that it compiles to, which would be better.

I’ve experimented with this.

It yields fairly drastically different inlining results—appends are suddenly crazy expensive, as is code that can panic, because the cold code paths count too. It’s probably still better, but I didn’t pursue it because of the inconsistency vs intra-package inlining.

@dsnet
Copy link
Member

dsnet commented Dec 4, 2020

Whatever happens, I think we should avoid special casing just encoding/binary.LittleEndian.*. There are a number of equivalent functions that avoid the encoding/binary package as a dependency that may want to benefit from this as well (e.g., "compress/flate".load64)

@josharian
Copy link
Contributor Author

@dsnet I hear you but special casing is easy and can happen ~immediately. Recognizing equivalent forms is definitely not easy in anything resembling the current inliner. These routines are used heavily in high performance code because it is one way to do something like vectorization. I'd rather not make the perfect to the enemy of the good here.

danderson added a commit to inetaf/netaddr that referenced this issue Dec 25, 2020
This allows IP manipulation operations to operate on IPs as two registers, which empirically
leads to significant speedups, in particular on OOO superscalars where the two halves can
be processed in parallel.

You might expect that we could keep the representation as [16]byte, and do a cycle of
BigEndian.Uint64+tweak+BigEndian.PutUint64, and that would compile down to efficient
code. Unfortunately, due to a variety of missing optimizations in the Go compiler,
that is not the case and that code turns into byte-wise operations. On the other hand,
converting to a uint64 pair at construction results in efficient construction (a pair of
MOVQ+BSWAPQ) and efficient twiddling operations (single-cycle arithmetic on 64-bit
pairs). See also:
 - golang/go#41663
 - golang/go#41684
 - golang/go#42958
 - The discussion in #63

name                              old time/op    new time/op    delta
StdIPv4-8                            146ns ± 2%     141ns ± 2%    -3.42%  (p=0.016 n=5+5)
IPv4-8                               120ns ± 1%     107ns ± 2%   -10.65%  (p=0.008 n=5+5)
IPv4_inline-8                        120ns ± 0%     118ns ± 1%    -1.67%  (p=0.016 n=4+5)
StdIPv6-8                            211ns ± 2%     215ns ± 1%    +2.18%  (p=0.008 n=5+5)
IPv6-8                               281ns ± 1%     252ns ± 1%   -10.19%  (p=0.008 n=5+5)
IPv4Contains-8                      11.8ns ± 4%     4.7ns ± 2%   -60.00%  (p=0.008 n=5+5)
ParseIPv4-8                         68.1ns ± 4%    78.8ns ± 1%   +15.74%  (p=0.008 n=5+5)
ParseIPv6-8                          419ns ± 1%     409ns ± 0%    -2.40%  (p=0.016 n=4+5)
StdParseIPv4-8                      73.7ns ± 1%    88.8ns ± 2%   +20.50%  (p=0.008 n=5+5)
StdParseIPv6-8                       132ns ± 2%     134ns ± 1%      ~     (p=0.079 n=5+5)
IPPrefixMasking/IPv4_/32-8          36.3ns ± 3%     4.8ns ± 4%   -86.72%  (p=0.008 n=5+5)
IPPrefixMasking/IPv4_/17-8          39.0ns ± 0%     4.8ns ± 3%   -87.78%  (p=0.008 n=5+5)
IPPrefixMasking/IPv4_/0-8           36.9ns ± 2%     4.8ns ± 4%   -87.07%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_/128-8         32.7ns ± 1%     4.7ns ± 2%   -85.47%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_/65-8          39.8ns ± 1%     4.7ns ± 1%   -88.13%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_/0-8           40.7ns ± 1%     4.7ns ± 2%   -88.41%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_zone_/128-8     136ns ± 3%       5ns ± 2%   -96.53%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_zone_/65-8      142ns ± 2%       5ns ± 1%   -96.65%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_zone_/0-8       143ns ± 2%       5ns ± 3%   -96.67%  (p=0.008 n=5+5)
IPSetFuzz-8                         22.7µs ± 2%    16.4µs ± 2%   -27.84%  (p=0.008 n=5+5)

name                              old alloc/op   new alloc/op   delta
StdIPv4-8                            16.0B ± 0%     16.0B ± 0%      ~     (all equal)
IPv4-8                               0.00B          0.00B           ~     (all equal)
IPv4_inline-8                        0.00B          0.00B           ~     (all equal)
StdIPv6-8                            16.0B ± 0%     16.0B ± 0%      ~     (all equal)
IPv6-8                               16.0B ± 0%     16.0B ± 0%      ~     (all equal)
IPv4Contains-8                       0.00B          0.00B           ~     (all equal)
ParseIPv4-8                          0.00B          0.00B           ~     (all equal)
ParseIPv6-8                          48.0B ± 0%     48.0B ± 0%      ~     (all equal)
StdParseIPv4-8                       16.0B ± 0%     16.0B ± 0%      ~     (all equal)
StdParseIPv6-8                       16.0B ± 0%     16.0B ± 0%      ~     (all equal)
IPPrefixMasking/IPv4_/32-8           0.00B          0.00B           ~     (all equal)
IPPrefixMasking/IPv4_/17-8           0.00B          0.00B           ~     (all equal)
IPPrefixMasking/IPv4_/0-8            0.00B          0.00B           ~     (all equal)
IPPrefixMasking/IPv6_/128-8          0.00B          0.00B           ~     (all equal)
IPPrefixMasking/IPv6_/65-8           0.00B          0.00B           ~     (all equal)
IPPrefixMasking/IPv6_/0-8            0.00B          0.00B           ~     (all equal)
IPPrefixMasking/IPv6_zone_/128-8     16.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_zone_/65-8      16.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_zone_/0-8       16.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
IPSetFuzz-8                         2.60kB ± 0%    2.60kB ± 0%      ~     (p=1.000 n=5+5)

name                              old allocs/op  new allocs/op  delta
StdIPv4-8                             1.00 ± 0%      1.00 ± 0%      ~     (all equal)
IPv4-8                                0.00           0.00           ~     (all equal)
IPv4_inline-8                         0.00           0.00           ~     (all equal)
StdIPv6-8                             1.00 ± 0%      1.00 ± 0%      ~     (all equal)
IPv6-8                                1.00 ± 0%      1.00 ± 0%      ~     (all equal)
IPv4Contains-8                        0.00           0.00           ~     (all equal)
ParseIPv4-8                           0.00           0.00           ~     (all equal)
ParseIPv6-8                           3.00 ± 0%      3.00 ± 0%      ~     (all equal)
StdParseIPv4-8                        1.00 ± 0%      1.00 ± 0%      ~     (all equal)
StdParseIPv6-8                        1.00 ± 0%      1.00 ± 0%      ~     (all equal)
IPPrefixMasking/IPv4_/32-8            0.00           0.00           ~     (all equal)
IPPrefixMasking/IPv4_/17-8            0.00           0.00           ~     (all equal)
IPPrefixMasking/IPv4_/0-8             0.00           0.00           ~     (all equal)
IPPrefixMasking/IPv6_/128-8           0.00           0.00           ~     (all equal)
IPPrefixMasking/IPv6_/65-8            0.00           0.00           ~     (all equal)
IPPrefixMasking/IPv6_/0-8             0.00           0.00           ~     (all equal)
IPPrefixMasking/IPv6_zone_/128-8      1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_zone_/65-8       1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_zone_/0-8        1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
IPSetFuzz-8                           33.0 ± 0%      33.0 ± 0%      ~     (all equal)

Signed-off-by: David Anderson <dave@natulte.net>
danderson added a commit to inetaf/netaddr that referenced this issue Dec 26, 2020
This allows IP manipulation operations to operate on IPs as two registers, which empirically
leads to significant speedups, in particular on OOO superscalars where the two halves can
be processed in parallel.

You might expect that we could keep the representation as [16]byte, and do a cycle of
BigEndian.Uint64+tweak+BigEndian.PutUint64, and that would compile down to efficient
code. Unfortunately, due to a variety of missing optimizations in the Go compiler,
that is not the case and that code turns into byte-wise operations. On the other hand,
converting to a uint64 pair at construction results in efficient construction (a pair of
MOVQ+BSWAPQ) and efficient twiddling operations (single-cycle arithmetic on 64-bit
pairs). See also:
 - golang/go#41663
 - golang/go#41684
 - golang/go#42958
 - The discussion in #63

name                              old time/op    new time/op    delta
StdIPv4-8                            146ns ± 2%     141ns ± 2%    -3.42%  (p=0.016 n=5+5)
IPv4-8                               120ns ± 1%     107ns ± 2%   -10.65%  (p=0.008 n=5+5)
IPv4_inline-8                        120ns ± 0%     118ns ± 1%    -1.67%  (p=0.016 n=4+5)
StdIPv6-8                            211ns ± 2%     215ns ± 1%    +2.18%  (p=0.008 n=5+5)
IPv6-8                               281ns ± 1%     252ns ± 1%   -10.19%  (p=0.008 n=5+5)
IPv4Contains-8                      11.8ns ± 4%     4.7ns ± 2%   -60.00%  (p=0.008 n=5+5)
ParseIPv4-8                         68.1ns ± 4%    78.8ns ± 1%   +15.74%  (p=0.008 n=5+5)
ParseIPv6-8                          419ns ± 1%     409ns ± 0%    -2.40%  (p=0.016 n=4+5)
StdParseIPv4-8                      73.7ns ± 1%    88.8ns ± 2%   +20.50%  (p=0.008 n=5+5)
StdParseIPv6-8                       132ns ± 2%     134ns ± 1%      ~     (p=0.079 n=5+5)
IPPrefixMasking/IPv4_/32-8          36.3ns ± 3%     4.8ns ± 4%   -86.72%  (p=0.008 n=5+5)
IPPrefixMasking/IPv4_/17-8          39.0ns ± 0%     4.8ns ± 3%   -87.78%  (p=0.008 n=5+5)
IPPrefixMasking/IPv4_/0-8           36.9ns ± 2%     4.8ns ± 4%   -87.07%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_/128-8         32.7ns ± 1%     4.7ns ± 2%   -85.47%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_/65-8          39.8ns ± 1%     4.7ns ± 1%   -88.13%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_/0-8           40.7ns ± 1%     4.7ns ± 2%   -88.41%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_zone_/128-8     136ns ± 3%       5ns ± 2%   -96.53%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_zone_/65-8      142ns ± 2%       5ns ± 1%   -96.65%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_zone_/0-8       143ns ± 2%       5ns ± 3%   -96.67%  (p=0.008 n=5+5)
IPSetFuzz-8                         22.7µs ± 2%    16.4µs ± 2%   -27.84%  (p=0.008 n=5+5)

name                              old alloc/op   new alloc/op   delta
StdIPv4-8                            16.0B ± 0%     16.0B ± 0%      ~     (all equal)
IPv4-8                               0.00B          0.00B           ~     (all equal)
IPv4_inline-8                        0.00B          0.00B           ~     (all equal)
StdIPv6-8                            16.0B ± 0%     16.0B ± 0%      ~     (all equal)
IPv6-8                               16.0B ± 0%     16.0B ± 0%      ~     (all equal)
IPv4Contains-8                       0.00B          0.00B           ~     (all equal)
ParseIPv4-8                          0.00B          0.00B           ~     (all equal)
ParseIPv6-8                          48.0B ± 0%     48.0B ± 0%      ~     (all equal)
StdParseIPv4-8                       16.0B ± 0%     16.0B ± 0%      ~     (all equal)
StdParseIPv6-8                       16.0B ± 0%     16.0B ± 0%      ~     (all equal)
IPPrefixMasking/IPv4_/32-8           0.00B          0.00B           ~     (all equal)
IPPrefixMasking/IPv4_/17-8           0.00B          0.00B           ~     (all equal)
IPPrefixMasking/IPv4_/0-8            0.00B          0.00B           ~     (all equal)
IPPrefixMasking/IPv6_/128-8          0.00B          0.00B           ~     (all equal)
IPPrefixMasking/IPv6_/65-8           0.00B          0.00B           ~     (all equal)
IPPrefixMasking/IPv6_/0-8            0.00B          0.00B           ~     (all equal)
IPPrefixMasking/IPv6_zone_/128-8     16.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_zone_/65-8      16.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_zone_/0-8       16.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
IPSetFuzz-8                         2.60kB ± 0%    2.60kB ± 0%      ~     (p=1.000 n=5+5)

name                              old allocs/op  new allocs/op  delta
StdIPv4-8                             1.00 ± 0%      1.00 ± 0%      ~     (all equal)
IPv4-8                                0.00           0.00           ~     (all equal)
IPv4_inline-8                         0.00           0.00           ~     (all equal)
StdIPv6-8                             1.00 ± 0%      1.00 ± 0%      ~     (all equal)
IPv6-8                                1.00 ± 0%      1.00 ± 0%      ~     (all equal)
IPv4Contains-8                        0.00           0.00           ~     (all equal)
ParseIPv4-8                           0.00           0.00           ~     (all equal)
ParseIPv6-8                           3.00 ± 0%      3.00 ± 0%      ~     (all equal)
StdParseIPv4-8                        1.00 ± 0%      1.00 ± 0%      ~     (all equal)
StdParseIPv6-8                        1.00 ± 0%      1.00 ± 0%      ~     (all equal)
IPPrefixMasking/IPv4_/32-8            0.00           0.00           ~     (all equal)
IPPrefixMasking/IPv4_/17-8            0.00           0.00           ~     (all equal)
IPPrefixMasking/IPv4_/0-8             0.00           0.00           ~     (all equal)
IPPrefixMasking/IPv6_/128-8           0.00           0.00           ~     (all equal)
IPPrefixMasking/IPv6_/65-8            0.00           0.00           ~     (all equal)
IPPrefixMasking/IPv6_/0-8             0.00           0.00           ~     (all equal)
IPPrefixMasking/IPv6_zone_/128-8      1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_zone_/65-8       1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_zone_/0-8        1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
IPSetFuzz-8                           33.0 ± 0%      33.0 ± 0%      ~     (all equal)

Signed-off-by: David Anderson <dave@natulte.net>
majst01 pushed a commit to majst01/netaddr that referenced this issue Dec 26, 2020
This allows IP manipulation operations to operate on IPs as two registers, which empirically
leads to significant speedups, in particular on OOO superscalars where the two halves can
be processed in parallel.

You might expect that we could keep the representation as [16]byte, and do a cycle of
BigEndian.Uint64+tweak+BigEndian.PutUint64, and that would compile down to efficient
code. Unfortunately, due to a variety of missing optimizations in the Go compiler,
that is not the case and that code turns into byte-wise operations. On the other hand,
converting to a uint64 pair at construction results in efficient construction (a pair of
MOVQ+BSWAPQ) and efficient twiddling operations (single-cycle arithmetic on 64-bit
pairs). See also:
 - golang/go#41663
 - golang/go#41684
 - golang/go#42958
 - The discussion in inetaf#63

name                              old time/op    new time/op    delta
StdIPv4-8                            146ns ± 2%     141ns ± 2%    -3.42%  (p=0.016 n=5+5)
IPv4-8                               120ns ± 1%     107ns ± 2%   -10.65%  (p=0.008 n=5+5)
IPv4_inline-8                        120ns ± 0%     118ns ± 1%    -1.67%  (p=0.016 n=4+5)
StdIPv6-8                            211ns ± 2%     215ns ± 1%    +2.18%  (p=0.008 n=5+5)
IPv6-8                               281ns ± 1%     252ns ± 1%   -10.19%  (p=0.008 n=5+5)
IPv4Contains-8                      11.8ns ± 4%     4.7ns ± 2%   -60.00%  (p=0.008 n=5+5)
ParseIPv4-8                         68.1ns ± 4%    78.8ns ± 1%   +15.74%  (p=0.008 n=5+5)
ParseIPv6-8                          419ns ± 1%     409ns ± 0%    -2.40%  (p=0.016 n=4+5)
StdParseIPv4-8                      73.7ns ± 1%    88.8ns ± 2%   +20.50%  (p=0.008 n=5+5)
StdParseIPv6-8                       132ns ± 2%     134ns ± 1%      ~     (p=0.079 n=5+5)
IPPrefixMasking/IPv4_/32-8          36.3ns ± 3%     4.8ns ± 4%   -86.72%  (p=0.008 n=5+5)
IPPrefixMasking/IPv4_/17-8          39.0ns ± 0%     4.8ns ± 3%   -87.78%  (p=0.008 n=5+5)
IPPrefixMasking/IPv4_/0-8           36.9ns ± 2%     4.8ns ± 4%   -87.07%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_/128-8         32.7ns ± 1%     4.7ns ± 2%   -85.47%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_/65-8          39.8ns ± 1%     4.7ns ± 1%   -88.13%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_/0-8           40.7ns ± 1%     4.7ns ± 2%   -88.41%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_zone_/128-8     136ns ± 3%       5ns ± 2%   -96.53%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_zone_/65-8      142ns ± 2%       5ns ± 1%   -96.65%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_zone_/0-8       143ns ± 2%       5ns ± 3%   -96.67%  (p=0.008 n=5+5)
IPSetFuzz-8                         22.7µs ± 2%    16.4µs ± 2%   -27.84%  (p=0.008 n=5+5)

name                              old alloc/op   new alloc/op   delta
StdIPv4-8                            16.0B ± 0%     16.0B ± 0%      ~     (all equal)
IPv4-8                               0.00B          0.00B           ~     (all equal)
IPv4_inline-8                        0.00B          0.00B           ~     (all equal)
StdIPv6-8                            16.0B ± 0%     16.0B ± 0%      ~     (all equal)
IPv6-8                               16.0B ± 0%     16.0B ± 0%      ~     (all equal)
IPv4Contains-8                       0.00B          0.00B           ~     (all equal)
ParseIPv4-8                          0.00B          0.00B           ~     (all equal)
ParseIPv6-8                          48.0B ± 0%     48.0B ± 0%      ~     (all equal)
StdParseIPv4-8                       16.0B ± 0%     16.0B ± 0%      ~     (all equal)
StdParseIPv6-8                       16.0B ± 0%     16.0B ± 0%      ~     (all equal)
IPPrefixMasking/IPv4_/32-8           0.00B          0.00B           ~     (all equal)
IPPrefixMasking/IPv4_/17-8           0.00B          0.00B           ~     (all equal)
IPPrefixMasking/IPv4_/0-8            0.00B          0.00B           ~     (all equal)
IPPrefixMasking/IPv6_/128-8          0.00B          0.00B           ~     (all equal)
IPPrefixMasking/IPv6_/65-8           0.00B          0.00B           ~     (all equal)
IPPrefixMasking/IPv6_/0-8            0.00B          0.00B           ~     (all equal)
IPPrefixMasking/IPv6_zone_/128-8     16.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_zone_/65-8      16.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_zone_/0-8       16.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
IPSetFuzz-8                         2.60kB ± 0%    2.60kB ± 0%      ~     (p=1.000 n=5+5)

name                              old allocs/op  new allocs/op  delta
StdIPv4-8                             1.00 ± 0%      1.00 ± 0%      ~     (all equal)
IPv4-8                                0.00           0.00           ~     (all equal)
IPv4_inline-8                         0.00           0.00           ~     (all equal)
StdIPv6-8                             1.00 ± 0%      1.00 ± 0%      ~     (all equal)
IPv6-8                                1.00 ± 0%      1.00 ± 0%      ~     (all equal)
IPv4Contains-8                        0.00           0.00           ~     (all equal)
ParseIPv4-8                           0.00           0.00           ~     (all equal)
ParseIPv6-8                           3.00 ± 0%      3.00 ± 0%      ~     (all equal)
StdParseIPv4-8                        1.00 ± 0%      1.00 ± 0%      ~     (all equal)
StdParseIPv6-8                        1.00 ± 0%      1.00 ± 0%      ~     (all equal)
IPPrefixMasking/IPv4_/32-8            0.00           0.00           ~     (all equal)
IPPrefixMasking/IPv4_/17-8            0.00           0.00           ~     (all equal)
IPPrefixMasking/IPv4_/0-8             0.00           0.00           ~     (all equal)
IPPrefixMasking/IPv6_/128-8           0.00           0.00           ~     (all equal)
IPPrefixMasking/IPv6_/65-8            0.00           0.00           ~     (all equal)
IPPrefixMasking/IPv6_/0-8             0.00           0.00           ~     (all equal)
IPPrefixMasking/IPv6_zone_/128-8      1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_zone_/65-8       1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_zone_/0-8        1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
IPSetFuzz-8                           33.0 ± 0%      33.0 ± 0%      ~     (all equal)

Signed-off-by: David Anderson <dave@natulte.net>
Signed-off-by: Stefan Majer <stefan.majer@gmail.com>
@dsnet
Copy link
Member

dsnet commented Dec 27, 2020

Random thought: Should little-endian and big-endian functionality be moved to math/bits?

With the acceptance of #395, you could imagine an API that operates on fixed-width arrays rather than pointers:

func LoadUint32LE(v [4]byte) uint32
func StoreUint32LE(v uint32) [4]byte

@gopherbot
Copy link

Change https://golang.org/cl/349931 mentions this issue: cmd/compile: make encoding/binary loads/stores cheaper to inline

@gopherbot
Copy link

Change https://go.dev/cl/431015 mentions this issue: cmd/compile: make encoding/binary appends cheaper to inline

gopherbot pushed a commit that referenced this issue Sep 15, 2022
Go 1.19 introduce new append-like APIs in package encoding/binary, this
change teaches the inliner to treat calls to these methods as cheap, so
that code using them will be more inlineable.

Updates #42958

Change-Id: Ie3dd4906e285430f435bdedbf8a11fdffce9302d
Reviewed-on: https://go-review.googlesource.com/c/go/+/431015
Auto-Submit: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Jenny Rakoczy <jenny@golang.org>
Reviewed-by: Jenny Rakoczy <jenny@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Jenny Rakoczy <jenny@golang.org>
Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>
Reviewed-by: Keith Randall <khr@google.com>
@golang golang locked and limited conversation to collaborators Sep 15, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. Performance
Projects
None yet
Development

No branches or pull requests

10 participants