The issue
gofrontend currently generates very inefficient code for T$equal and t1 == t2 when dealing with identity-comparable types (i.e. integers, pointers, and unpadded structs and arrays of such).
For example, the following type compiles to two 4-byte compares and a jump in T$equal, and to a call to memcmp for t1 == t2. For t1 == t2, gofrontend has a trick that turns the comparison into a single 8-byte compare if the struct is aligned and size<=16 bytes, but that does not apply here.
type T struct {
	a int32
	b int32
}
The following code always generates a call to memcmp:
type T struct {
	a int32
	b int32
	c int32
	d int32
}
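A test.go along the following lines could be used to look at both code paths. This is only a sketch: the function names direct and viaInterface are made up for illustration, and it assumes that comparing two interface values that both hold a T goes through the type's generated equality function (T$equal), while direct exercises the inline t1 == t2 path.

```go
package main

import "fmt"

// T is the 16-byte, unpadded struct from the second example.
type T struct {
	a int32
	b int32
	c int32
	d int32
}

// direct exercises the inline comparison (t1 == t2).
func direct(t1, t2 T) bool {
	return t1 == t2
}

// viaInterface compares two interface values. When both hold a T,
// the runtime is assumed to call the equality function generated for T.
func viaInterface(x, y interface{}) bool {
	return x == y
}

func main() {
	t1 := T{1, 2, 3, 4}
	t2 := T{1, 2, 3, 4}
	t3 := T{1, 2, 3, 5}
	fmt.Println(direct(t1, t2), viaInterface(t1, t2))
	fmt.Println(direct(t1, t3), viaInterface(t1, t3))
}
```

Both functions must return the same answer for any pair of T values; only the generated code differs.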
A solution
In 2016 gcc introduced the __builtin_memcmp_eq builtin that knows how to lower memcmp efficiently when the result is only used for equality comparison (i.e. equality with 0 instead of 3-way ordering). This is typically useful when the size is a constexpr (as is the case here).
The basic idea is to replace a larger chain of integer comparisons loaded from contiguous memory locations with a smaller chain of bigger integer comparisons. Benefits are twofold:
There are fewer jumps, and therefore fewer opportunities for mispredictions and I-cache misses.
The code is smaller, both because jumps are removed and because the encoding of a 2*n byte compare is smaller than that of two n-byte compares.
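The idea can be illustrated at the source level. The sketch below is not what gcc emits; it only mimics the transformation in portable Go. equalNarrow, pack and equalWide are hypothetical names: equalNarrow is a chain of four 4-byte compares with early exits, while equalWide combines adjacent fields into two 64-bit words and does a single branch-free test, which is possible because only equality (not ordering) is needed.

```go
package main

import "fmt"

// T is a 16-byte struct made only of identity-comparable fields.
type T struct{ a, b, c, d int32 }

// equalNarrow is a chain of four 4-byte compares, each a potential jump.
func equalNarrow(x, y *T) bool {
	return x.a == y.a && x.b == y.b && x.c == y.c && x.d == y.d
}

// pack joins two adjacent 32-bit fields into one 64-bit word,
// standing in for a single 8-byte load from memory.
func pack(lo, hi int32) uint64 {
	return uint64(uint32(lo)) | uint64(uint32(hi))<<32
}

// equalWide does two 8-byte comparisons and merges them with XOR/OR,
// so a single test against zero decides the result.
func equalWide(x, y *T) bool {
	d0 := pack(x.a, x.b) ^ pack(y.a, y.b)
	d1 := pack(x.c, x.d) ^ pack(y.c, y.d)
	return d0|d1 == 0
}

func main() {
	x := T{1, -2, 3, 4}
	y := T{1, -2, 3, 4}
	z := T{1, -2, 3, 5}
	fmt.Println(equalNarrow(&x, &y), equalWide(&x, &y))
	fmt.Println(equalNarrow(&x, &z), equalWide(&x, &z))
}
```

The XOR/OR merge would not work for a 3-way memcmp, which has to locate the first differing byte; that is why the equality-only builtin has more freedom.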
As a first step, I’m simply proposing to replace calls to runtime.memequal with calls to __builtin_memcmp_eq. This only improves the generated code.
In the second example above, this would change the generated code (gccgo -march=haswell -m64 -O3 -c test.go) for both t1 == t2 and T$equal, and both cases end up with the same code.
This is both smaller in terms of code size and much more efficient.
Going further
Simplifying gofrontend
This also allows removing any specific code for handling sizes smaller than 16 bytes since they are already handled by gcc.
More performance improvements
This should be extended to piecewise-identity-comparable structs. For example, the following structure should be compared with three builtin calls ({a}, {c,d,e}, and {f,g}) and a float compare.
type T struct {
	a int32
	b float32 // Floats are not identity-comparable
	c int32
	d int32
	e byte
	// Implicit _ [3]byte padding
	f int32
	g int32
}