Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: combine BSWAP on amd64 #41684

Closed
josharian opened this issue Sep 28, 2020 · 10 comments
Closed

cmd/compile: combine BSWAP on amd64 #41684

josharian opened this issue Sep 28, 2020 · 10 comments
Labels
FrozenDueToAge help wanted NeedsFix The path to resolution is known, but the work has not been done. Performance
Milestone

Comments

@josharian
Copy link
Contributor

package p

import "encoding/binary"

func f(b []byte, x *[8]byte) {
	_ = b[8]
	t := binary.BigEndian.Uint64(x[:])
	binary.BigEndian.PutUint64(b, t)
}

This should compile down to two MOVQs on amd64, one to load from x and one to write to b.

Instead, it contains two successive BSWAP instructions. We should eliminate those. As a bonus, we should eliminate any BSWAPS separated only by bitwise operations. For example:

package p

import "encoding/binary"

func f(b []byte, x *[8]byte) {
	_ = b[8]
	t := binary.BigEndian.Uint64(x[:])
	t = ^t
	binary.BigEndian.PutUint64(b, t)
}

(In addition to unary ^, there's binary &, |, and ^. Maybe some others?)

One advantage to this kind of optimization is that it reduces the important of whether the endianness selected in the code matches the endianness of the architecture the code is being executed on.

Related: #41663

cc @agarciamontoro

@josharian josharian added this to the Unplanned milestone Sep 28, 2020
@ianlancetaylor ianlancetaylor changed the title cmd/compile: combine BWAPS on amd64 cmd/compile: combine BSWAP on amd64 Sep 28, 2020
@andybons andybons added the NeedsFix The path to resolution is known, but the work has not been done. label Sep 29, 2020
danderson added a commit to inetaf/netaddr that referenced this issue Dec 25, 2020
This allows IP manipulation operations to operate on IPs as two registers, which empirically
leads to significant speedups, in particular on OOO superscalars where the two halves can
be processed in parallel.

You might expect that we could keep the representation as [16]byte, and do a cycle of
BigEndian.Uint64+tweak+BigEndian.PutUint64, and that would compile down to efficient
code. Unfortunately, due to a variety of missing optimizations in the Go compiler,
that is not the case and that code turns into byte-wise operations. On the other hand,
converting to a uint64 pair at construction results in efficient construction (a pair of
MOVQ+BSWAPQ) and efficient twiddling operations (single-cycle arithmetic on 64-bit
pairs). See also:
 - golang/go#41663
 - golang/go#41684
 - golang/go#42958
 - The discussion in #63

name                              old time/op    new time/op    delta
StdIPv4-8                            146ns ± 2%     141ns ± 2%    -3.42%  (p=0.016 n=5+5)
IPv4-8                               120ns ± 1%     107ns ± 2%   -10.65%  (p=0.008 n=5+5)
IPv4_inline-8                        120ns ± 0%     118ns ± 1%    -1.67%  (p=0.016 n=4+5)
StdIPv6-8                            211ns ± 2%     215ns ± 1%    +2.18%  (p=0.008 n=5+5)
IPv6-8                               281ns ± 1%     252ns ± 1%   -10.19%  (p=0.008 n=5+5)
IPv4Contains-8                      11.8ns ± 4%     4.7ns ± 2%   -60.00%  (p=0.008 n=5+5)
ParseIPv4-8                         68.1ns ± 4%    78.8ns ± 1%   +15.74%  (p=0.008 n=5+5)
ParseIPv6-8                          419ns ± 1%     409ns ± 0%    -2.40%  (p=0.016 n=4+5)
StdParseIPv4-8                      73.7ns ± 1%    88.8ns ± 2%   +20.50%  (p=0.008 n=5+5)
StdParseIPv6-8                       132ns ± 2%     134ns ± 1%      ~     (p=0.079 n=5+5)
IPPrefixMasking/IPv4_/32-8          36.3ns ± 3%     4.8ns ± 4%   -86.72%  (p=0.008 n=5+5)
IPPrefixMasking/IPv4_/17-8          39.0ns ± 0%     4.8ns ± 3%   -87.78%  (p=0.008 n=5+5)
IPPrefixMasking/IPv4_/0-8           36.9ns ± 2%     4.8ns ± 4%   -87.07%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_/128-8         32.7ns ± 1%     4.7ns ± 2%   -85.47%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_/65-8          39.8ns ± 1%     4.7ns ± 1%   -88.13%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_/0-8           40.7ns ± 1%     4.7ns ± 2%   -88.41%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_zone_/128-8     136ns ± 3%       5ns ± 2%   -96.53%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_zone_/65-8      142ns ± 2%       5ns ± 1%   -96.65%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_zone_/0-8       143ns ± 2%       5ns ± 3%   -96.67%  (p=0.008 n=5+5)
IPSetFuzz-8                         22.7µs ± 2%    16.4µs ± 2%   -27.84%  (p=0.008 n=5+5)

name                              old alloc/op   new alloc/op   delta
StdIPv4-8                            16.0B ± 0%     16.0B ± 0%      ~     (all equal)
IPv4-8                               0.00B          0.00B           ~     (all equal)
IPv4_inline-8                        0.00B          0.00B           ~     (all equal)
StdIPv6-8                            16.0B ± 0%     16.0B ± 0%      ~     (all equal)
IPv6-8                               16.0B ± 0%     16.0B ± 0%      ~     (all equal)
IPv4Contains-8                       0.00B          0.00B           ~     (all equal)
ParseIPv4-8                          0.00B          0.00B           ~     (all equal)
ParseIPv6-8                          48.0B ± 0%     48.0B ± 0%      ~     (all equal)
StdParseIPv4-8                       16.0B ± 0%     16.0B ± 0%      ~     (all equal)
StdParseIPv6-8                       16.0B ± 0%     16.0B ± 0%      ~     (all equal)
IPPrefixMasking/IPv4_/32-8           0.00B          0.00B           ~     (all equal)
IPPrefixMasking/IPv4_/17-8           0.00B          0.00B           ~     (all equal)
IPPrefixMasking/IPv4_/0-8            0.00B          0.00B           ~     (all equal)
IPPrefixMasking/IPv6_/128-8          0.00B          0.00B           ~     (all equal)
IPPrefixMasking/IPv6_/65-8           0.00B          0.00B           ~     (all equal)
IPPrefixMasking/IPv6_/0-8            0.00B          0.00B           ~     (all equal)
IPPrefixMasking/IPv6_zone_/128-8     16.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_zone_/65-8      16.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_zone_/0-8       16.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
IPSetFuzz-8                         2.60kB ± 0%    2.60kB ± 0%      ~     (p=1.000 n=5+5)

name                              old allocs/op  new allocs/op  delta
StdIPv4-8                             1.00 ± 0%      1.00 ± 0%      ~     (all equal)
IPv4-8                                0.00           0.00           ~     (all equal)
IPv4_inline-8                         0.00           0.00           ~     (all equal)
StdIPv6-8                             1.00 ± 0%      1.00 ± 0%      ~     (all equal)
IPv6-8                                1.00 ± 0%      1.00 ± 0%      ~     (all equal)
IPv4Contains-8                        0.00           0.00           ~     (all equal)
ParseIPv4-8                           0.00           0.00           ~     (all equal)
ParseIPv6-8                           3.00 ± 0%      3.00 ± 0%      ~     (all equal)
StdParseIPv4-8                        1.00 ± 0%      1.00 ± 0%      ~     (all equal)
StdParseIPv6-8                        1.00 ± 0%      1.00 ± 0%      ~     (all equal)
IPPrefixMasking/IPv4_/32-8            0.00           0.00           ~     (all equal)
IPPrefixMasking/IPv4_/17-8            0.00           0.00           ~     (all equal)
IPPrefixMasking/IPv4_/0-8             0.00           0.00           ~     (all equal)
IPPrefixMasking/IPv6_/128-8           0.00           0.00           ~     (all equal)
IPPrefixMasking/IPv6_/65-8            0.00           0.00           ~     (all equal)
IPPrefixMasking/IPv6_/0-8             0.00           0.00           ~     (all equal)
IPPrefixMasking/IPv6_zone_/128-8      1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_zone_/65-8       1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_zone_/0-8        1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
IPSetFuzz-8                           33.0 ± 0%      33.0 ± 0%      ~     (all equal)

Signed-off-by: David Anderson <dave@natulte.net>
danderson added a commit to inetaf/netaddr that referenced this issue Dec 26, 2020
This allows IP manipulation operations to operate on IPs as two registers, which empirically
leads to significant speedups, in particular on OOO superscalars where the two halves can
be processed in parallel.

You might expect that we could keep the representation as [16]byte, and do a cycle of
BigEndian.Uint64+tweak+BigEndian.PutUint64, and that would compile down to efficient
code. Unfortunately, due to a variety of missing optimizations in the Go compiler,
that is not the case and that code turns into byte-wise operations. On the other hand,
converting to a uint64 pair at construction results in efficient construction (a pair of
MOVQ+BSWAPQ) and efficient twiddling operations (single-cycle arithmetic on 64-bit
pairs). See also:
 - golang/go#41663
 - golang/go#41684
 - golang/go#42958
 - The discussion in #63

name                              old time/op    new time/op    delta
StdIPv4-8                            146ns ± 2%     141ns ± 2%    -3.42%  (p=0.016 n=5+5)
IPv4-8                               120ns ± 1%     107ns ± 2%   -10.65%  (p=0.008 n=5+5)
IPv4_inline-8                        120ns ± 0%     118ns ± 1%    -1.67%  (p=0.016 n=4+5)
StdIPv6-8                            211ns ± 2%     215ns ± 1%    +2.18%  (p=0.008 n=5+5)
IPv6-8                               281ns ± 1%     252ns ± 1%   -10.19%  (p=0.008 n=5+5)
IPv4Contains-8                      11.8ns ± 4%     4.7ns ± 2%   -60.00%  (p=0.008 n=5+5)
ParseIPv4-8                         68.1ns ± 4%    78.8ns ± 1%   +15.74%  (p=0.008 n=5+5)
ParseIPv6-8                          419ns ± 1%     409ns ± 0%    -2.40%  (p=0.016 n=4+5)
StdParseIPv4-8                      73.7ns ± 1%    88.8ns ± 2%   +20.50%  (p=0.008 n=5+5)
StdParseIPv6-8                       132ns ± 2%     134ns ± 1%      ~     (p=0.079 n=5+5)
IPPrefixMasking/IPv4_/32-8          36.3ns ± 3%     4.8ns ± 4%   -86.72%  (p=0.008 n=5+5)
IPPrefixMasking/IPv4_/17-8          39.0ns ± 0%     4.8ns ± 3%   -87.78%  (p=0.008 n=5+5)
IPPrefixMasking/IPv4_/0-8           36.9ns ± 2%     4.8ns ± 4%   -87.07%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_/128-8         32.7ns ± 1%     4.7ns ± 2%   -85.47%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_/65-8          39.8ns ± 1%     4.7ns ± 1%   -88.13%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_/0-8           40.7ns ± 1%     4.7ns ± 2%   -88.41%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_zone_/128-8     136ns ± 3%       5ns ± 2%   -96.53%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_zone_/65-8      142ns ± 2%       5ns ± 1%   -96.65%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_zone_/0-8       143ns ± 2%       5ns ± 3%   -96.67%  (p=0.008 n=5+5)
IPSetFuzz-8                         22.7µs ± 2%    16.4µs ± 2%   -27.84%  (p=0.008 n=5+5)

name                              old alloc/op   new alloc/op   delta
StdIPv4-8                            16.0B ± 0%     16.0B ± 0%      ~     (all equal)
IPv4-8                               0.00B          0.00B           ~     (all equal)
IPv4_inline-8                        0.00B          0.00B           ~     (all equal)
StdIPv6-8                            16.0B ± 0%     16.0B ± 0%      ~     (all equal)
IPv6-8                               16.0B ± 0%     16.0B ± 0%      ~     (all equal)
IPv4Contains-8                       0.00B          0.00B           ~     (all equal)
ParseIPv4-8                          0.00B          0.00B           ~     (all equal)
ParseIPv6-8                          48.0B ± 0%     48.0B ± 0%      ~     (all equal)
StdParseIPv4-8                       16.0B ± 0%     16.0B ± 0%      ~     (all equal)
StdParseIPv6-8                       16.0B ± 0%     16.0B ± 0%      ~     (all equal)
IPPrefixMasking/IPv4_/32-8           0.00B          0.00B           ~     (all equal)
IPPrefixMasking/IPv4_/17-8           0.00B          0.00B           ~     (all equal)
IPPrefixMasking/IPv4_/0-8            0.00B          0.00B           ~     (all equal)
IPPrefixMasking/IPv6_/128-8          0.00B          0.00B           ~     (all equal)
IPPrefixMasking/IPv6_/65-8           0.00B          0.00B           ~     (all equal)
IPPrefixMasking/IPv6_/0-8            0.00B          0.00B           ~     (all equal)
IPPrefixMasking/IPv6_zone_/128-8     16.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_zone_/65-8      16.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_zone_/0-8       16.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
IPSetFuzz-8                         2.60kB ± 0%    2.60kB ± 0%      ~     (p=1.000 n=5+5)

name                              old allocs/op  new allocs/op  delta
StdIPv4-8                             1.00 ± 0%      1.00 ± 0%      ~     (all equal)
IPv4-8                                0.00           0.00           ~     (all equal)
IPv4_inline-8                         0.00           0.00           ~     (all equal)
StdIPv6-8                             1.00 ± 0%      1.00 ± 0%      ~     (all equal)
IPv6-8                                1.00 ± 0%      1.00 ± 0%      ~     (all equal)
IPv4Contains-8                        0.00           0.00           ~     (all equal)
ParseIPv4-8                           0.00           0.00           ~     (all equal)
ParseIPv6-8                           3.00 ± 0%      3.00 ± 0%      ~     (all equal)
StdParseIPv4-8                        1.00 ± 0%      1.00 ± 0%      ~     (all equal)
StdParseIPv6-8                        1.00 ± 0%      1.00 ± 0%      ~     (all equal)
IPPrefixMasking/IPv4_/32-8            0.00           0.00           ~     (all equal)
IPPrefixMasking/IPv4_/17-8            0.00           0.00           ~     (all equal)
IPPrefixMasking/IPv4_/0-8             0.00           0.00           ~     (all equal)
IPPrefixMasking/IPv6_/128-8           0.00           0.00           ~     (all equal)
IPPrefixMasking/IPv6_/65-8            0.00           0.00           ~     (all equal)
IPPrefixMasking/IPv6_/0-8             0.00           0.00           ~     (all equal)
IPPrefixMasking/IPv6_zone_/128-8      1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_zone_/65-8       1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_zone_/0-8        1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
IPSetFuzz-8                           33.0 ± 0%      33.0 ± 0%      ~     (all equal)

Signed-off-by: David Anderson <dave@natulte.net>
majst01 pushed a commit to majst01/netaddr that referenced this issue Dec 26, 2020
This allows IP manipulation operations to operate on IPs as two registers, which empirically
leads to significant speedups, in particular on OOO superscalars where the two halves can
be processed in parallel.

You might expect that we could keep the representation as [16]byte, and do a cycle of
BigEndian.Uint64+tweak+BigEndian.PutUint64, and that would compile down to efficient
code. Unfortunately, due to a variety of missing optimizations in the Go compiler,
that is not the case and that code turns into byte-wise operations. On the other hand,
converting to a uint64 pair at construction results in efficient construction (a pair of
MOVQ+BSWAPQ) and efficient twiddling operations (single-cycle arithmetic on 64-bit
pairs). See also:
 - golang/go#41663
 - golang/go#41684
 - golang/go#42958
 - The discussion in inetaf#63

name                              old time/op    new time/op    delta
StdIPv4-8                            146ns ± 2%     141ns ± 2%    -3.42%  (p=0.016 n=5+5)
IPv4-8                               120ns ± 1%     107ns ± 2%   -10.65%  (p=0.008 n=5+5)
IPv4_inline-8                        120ns ± 0%     118ns ± 1%    -1.67%  (p=0.016 n=4+5)
StdIPv6-8                            211ns ± 2%     215ns ± 1%    +2.18%  (p=0.008 n=5+5)
IPv6-8                               281ns ± 1%     252ns ± 1%   -10.19%  (p=0.008 n=5+5)
IPv4Contains-8                      11.8ns ± 4%     4.7ns ± 2%   -60.00%  (p=0.008 n=5+5)
ParseIPv4-8                         68.1ns ± 4%    78.8ns ± 1%   +15.74%  (p=0.008 n=5+5)
ParseIPv6-8                          419ns ± 1%     409ns ± 0%    -2.40%  (p=0.016 n=4+5)
StdParseIPv4-8                      73.7ns ± 1%    88.8ns ± 2%   +20.50%  (p=0.008 n=5+5)
StdParseIPv6-8                       132ns ± 2%     134ns ± 1%      ~     (p=0.079 n=5+5)
IPPrefixMasking/IPv4_/32-8          36.3ns ± 3%     4.8ns ± 4%   -86.72%  (p=0.008 n=5+5)
IPPrefixMasking/IPv4_/17-8          39.0ns ± 0%     4.8ns ± 3%   -87.78%  (p=0.008 n=5+5)
IPPrefixMasking/IPv4_/0-8           36.9ns ± 2%     4.8ns ± 4%   -87.07%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_/128-8         32.7ns ± 1%     4.7ns ± 2%   -85.47%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_/65-8          39.8ns ± 1%     4.7ns ± 1%   -88.13%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_/0-8           40.7ns ± 1%     4.7ns ± 2%   -88.41%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_zone_/128-8     136ns ± 3%       5ns ± 2%   -96.53%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_zone_/65-8      142ns ± 2%       5ns ± 1%   -96.65%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_zone_/0-8       143ns ± 2%       5ns ± 3%   -96.67%  (p=0.008 n=5+5)
IPSetFuzz-8                         22.7µs ± 2%    16.4µs ± 2%   -27.84%  (p=0.008 n=5+5)

name                              old alloc/op   new alloc/op   delta
StdIPv4-8                            16.0B ± 0%     16.0B ± 0%      ~     (all equal)
IPv4-8                               0.00B          0.00B           ~     (all equal)
IPv4_inline-8                        0.00B          0.00B           ~     (all equal)
StdIPv6-8                            16.0B ± 0%     16.0B ± 0%      ~     (all equal)
IPv6-8                               16.0B ± 0%     16.0B ± 0%      ~     (all equal)
IPv4Contains-8                       0.00B          0.00B           ~     (all equal)
ParseIPv4-8                          0.00B          0.00B           ~     (all equal)
ParseIPv6-8                          48.0B ± 0%     48.0B ± 0%      ~     (all equal)
StdParseIPv4-8                       16.0B ± 0%     16.0B ± 0%      ~     (all equal)
StdParseIPv6-8                       16.0B ± 0%     16.0B ± 0%      ~     (all equal)
IPPrefixMasking/IPv4_/32-8           0.00B          0.00B           ~     (all equal)
IPPrefixMasking/IPv4_/17-8           0.00B          0.00B           ~     (all equal)
IPPrefixMasking/IPv4_/0-8            0.00B          0.00B           ~     (all equal)
IPPrefixMasking/IPv6_/128-8          0.00B          0.00B           ~     (all equal)
IPPrefixMasking/IPv6_/65-8           0.00B          0.00B           ~     (all equal)
IPPrefixMasking/IPv6_/0-8            0.00B          0.00B           ~     (all equal)
IPPrefixMasking/IPv6_zone_/128-8     16.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_zone_/65-8      16.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_zone_/0-8       16.0B ± 0%      0.0B       -100.00%  (p=0.008 n=5+5)
IPSetFuzz-8                         2.60kB ± 0%    2.60kB ± 0%      ~     (p=1.000 n=5+5)

name                              old allocs/op  new allocs/op  delta
StdIPv4-8                             1.00 ± 0%      1.00 ± 0%      ~     (all equal)
IPv4-8                                0.00           0.00           ~     (all equal)
IPv4_inline-8                         0.00           0.00           ~     (all equal)
StdIPv6-8                             1.00 ± 0%      1.00 ± 0%      ~     (all equal)
IPv6-8                                1.00 ± 0%      1.00 ± 0%      ~     (all equal)
IPv4Contains-8                        0.00           0.00           ~     (all equal)
ParseIPv4-8                           0.00           0.00           ~     (all equal)
ParseIPv6-8                           3.00 ± 0%      3.00 ± 0%      ~     (all equal)
StdParseIPv4-8                        1.00 ± 0%      1.00 ± 0%      ~     (all equal)
StdParseIPv6-8                        1.00 ± 0%      1.00 ± 0%      ~     (all equal)
IPPrefixMasking/IPv4_/32-8            0.00           0.00           ~     (all equal)
IPPrefixMasking/IPv4_/17-8            0.00           0.00           ~     (all equal)
IPPrefixMasking/IPv4_/0-8             0.00           0.00           ~     (all equal)
IPPrefixMasking/IPv6_/128-8           0.00           0.00           ~     (all equal)
IPPrefixMasking/IPv6_/65-8            0.00           0.00           ~     (all equal)
IPPrefixMasking/IPv6_/0-8             0.00           0.00           ~     (all equal)
IPPrefixMasking/IPv6_zone_/128-8      1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_zone_/65-8       1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
IPPrefixMasking/IPv6_zone_/0-8        1.00 ± 0%      0.00       -100.00%  (p=0.008 n=5+5)
IPSetFuzz-8                           33.0 ± 0%      33.0 ± 0%      ~     (all equal)

Signed-off-by: David Anderson <dave@natulte.net>
Signed-off-by: Stefan Majer <stefan.majer@gmail.com>
@agarciamontoro
Copy link
Contributor

@josharian, I'd like to work on this one, too, as I guess it's quite similar to #41663

By the way, should this one be added to go1.17 too?

@josharian
Copy link
Contributor Author

Great! Let me know if you have any questions.

@agarciamontoro
Copy link
Contributor

agarciamontoro commented May 7, 2021

I am working on this, finally, and the first part of the issue is easily solvable with a rule such as:

(BSWAPQ (BSWAPQ p)) => p

@josharian, do we need to check that the inner BSWAPQ is used only once? Something like this:

(BSWAPQ a:(BSWAPQ p))
    && a.Uses == 1
    && clobber(a) =>
    p

I think I understand what the Uses field is for and what the clobber function does, but I'm not sure I got it 100%.

Thanks!

EDIT: All tests pass with both versions.

@josharian
Copy link
Contributor Author

No need to check uses or to clobber; your simple version suffices. BSWAP works on a regular value (held in a register), not directly on memory.

Feel free to stop at the double BSWAP elimination—eliminating the bitwise operations is, as I said, a bonus. :)

Are there other architectures to apply it to? Worth doing BSWAPL?

Note the 1.17 freeze has begun. Feel free to mail your changes, but they won't go in until the 1.18 cycle.

@agarciamontoro
Copy link
Contributor

agarciamontoro commented May 7, 2021

No need to check uses or to clobber; your simple version suffices. BSWAP works on a regular value (held in a register), not directly on memory.

Cool, thanks for the info!

Feel free to stop at the double BSWAP elimination—eliminating the bitwise operations is, as I said, a bonus. :)

I actually have the unary ^ kind of solved, too. I was wondering what the other cases are. You talk about some binary bitwise operations, but between what values? Should I test that with another array argument similar to x?

Are there other architectures to apply it to?

That's my next step, I wanted to take a look!

Worth doing BSWAPL?

Yup, I was actually thinking about doing something a bit more general, like

(BSWAP(Q|L) (BSWAP(Q|L) p)) => p

Note the 1.17 freeze has begun. Feel free to mail your changes, but they won't go in until the 1.18 cycle.

I konw, thanks for the heads-up!

@josharian
Copy link
Contributor Author

You talk about some binary bitwise operations, but between what values? Should I test that with another array argument similar to x?

Yes, exactly.

#31586 will probably be useful context for this.

@agarciamontoro
Copy link
Contributor

I'm sending a CL covering only the double swap case. I added rules for both amd64 and arm64. I am not familiar at all with s390x nor with ppc64le, but I think the double swap was not happening there. The test does cover s390x (this should be reviewed with even more care, though), but I had no clue on what to do with ppc64le. We can discuss in the CL itself, if that works for you.

Regarding the other cases (the unary and binary bitwise operations): should we open a new issue for them? I can probably start taking a deeper look next Friday.

@gopherbot
Copy link

Change https://golang.org/cl/320073 mentions this issue: cmd/compile: eliminate successive swaps

@agarciamontoro
Copy link
Contributor

@josharian, should we add this to the 1.18 milestone? I'm planning on finishing the work here this or next week.

@josharian
Copy link
Contributor Author

@agarciamontoro this won't block the 1.18 milestone, so there's no need, but thanks for checking in. (And for working on it.)

@golang golang locked and limited conversation to collaborators Oct 9, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge help wanted NeedsFix The path to resolution is known, but the work has not been done. Performance
Projects
None yet
Development

No branches or pull requests

4 participants