cmd/compile: miscellaneous optimizations #24958
Comments
Some optimizations can also be done for ARM:
The reason the following two MOVBs are not combined into a single MOVH is that the first load (v23) is used twice.
It seems this optimization cannot be done with SSA rules alone.
This is because the bounds check might panic after the first byte has been assigned.
Thank you, Keith. But such code looks odd. Is it possible to clobber the MOVBstore if it is only referenced by a bounds check? Or to make the bounds check ignore a store/load that has been eliminated?
Combining the stores is not legal in the original example. If the first array index succeeds but the second fails, the language semantics require that the first write happens and the second doesn't. That means that the first write must be just a single byte; it can't be combined with the second write. The "in-between" memory state is observable if someone recovers the panic. That requirement is realized in SSA by passing the results of the first store to the bounds check.
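As a minimal sketch of the observable "in-between" state described above (the function, slice length, and values are illustrative, not taken from the issue):

```go
package main

import "fmt"

// writeTwo stores two adjacent bytes. If the second index is out of range,
// the first store must still have happened and must be visible to code that
// recovers the panic, so the compiler cannot merge the two byte stores into
// a single halfword store.
func writeTwo(b []byte, v uint16) {
	b[0] = byte(v)      // must complete even if the next line panics
	b[1] = byte(v >> 8) // bounds check here may panic
}

func main() {
	b := make([]byte, 1) // room for the first byte only
	defer func() {
		recover()
		fmt.Printf("b[0] = %#x\n", b[0]) // observes the in-between state: 0x34
	}()
	writeTwo(b, 0x1234)
}
```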
Thank you, Keith. I see the key point.
386/amd64:
About 40% of them have been implemented via my recent commits; the others do not bring improvement.
My optimization plan for arm64 in Go 1.12:
1. Further optimization for (shifted) register-indexed load/store, especially its interaction with combined load/store.
1.1 combining uint16/uint32 loads/stores into a wider type
1.2 both big-endian and little-endian should be supported for the same situation
1.3 loads/stores performed in order from high memory down to low memory (this is not the same as big-endian; little-endian loads/stores can also go from the high address/upper byte down to the low address/lower byte)
2. (Shifted) register-indexed load/store for FP (assembler and compiler).
3. Optimization with MADD/MSUB/MADDW/MSUBW. For example,
"MUL R1, R2, R3"
"ADD R3, R4" can be optimized to
"MADD R1, R2, R3, R4"
I expect to see both a performance improvement and a code-size reduction with MADD/MSUB emitted (a Go sketch follows this list).
4. Optimize comparisons with ANDS/ADDS/SUBS (see the sketch after this list).
5. Optimize atomic operations with SWPALD/SWPALW/SWPALH/SWPALB and STADD/STSUB/STEOR/STOR.
These instructions operate directly on a memory operand atomically; I expect to see both a performance improvement and a code-size reduction with them emitted (see the sketch after this list).
6. BFC - bit field clear.
7. Constant pool.
"ADD $0x00aaaaaa, Rx" is assembled to "LDR off(PC), Rtmp" + "ADD Rtmp, Rx", and a 4-byte item is stored in the constant pool, so 12 bytes are used in total.
It can be optimized to "ADD $0xaaa, Rx" + "ADD $0xaaa000, Rx", which costs only 8 bytes.
This optimization does not improve performance, but it does reduce code size.
8. FMULA/FMULS.
(FADD z:(FMUL x y) a) -> (FMULA x a y)
A restriction z.Uses==1 is needed for the rewrite to be profitable.
9. Constant pool.
Use "MOV $0xaaaa, Rx" + "MOVK $0xbbbb0000, Rx" instead of loading the 32-bit constant 0xbbbbaaaa from the constant pool;
use "MOV $0xaaaa, Rx" + "MOVK $0xbbbb0000, Rx" + "MOVK $0xcccc00000000, Rx" instead of loading the 64-bit constant 0xccccbbbbaaaa from the constant pool.
10. LDP/STP for FP (assembler and compiler).
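For the MADD/MSUB item above, a hedged sketch of the kind of Go source such a rewrite could target (function names are illustrative only):

```go
// a + x*y is the pattern that could be lowered to a single MADD
// (or MADDW for 32-bit operands) instead of a separate MUL and ADD.
func madd64(x, y, a int64) int64 {
	return a + x*y
}

func madd32(x, y, a int32) int32 {
	return a + x*y
}
```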
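For the ANDS/ADDS/SUBS comparison item, a small illustrative sketch:

```go
// x&mask == 0 is a comparison that could be implemented by letting ANDS set
// the condition flags directly, instead of emitting AND followed by a
// separate compare against zero.
func noneSet(x, mask uint64) bool {
	return x&mask == 0
}
```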
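For the atomics item, a sketch (illustrative, not from the issue) of Go code whose atomic add is the kind of read-modify-write the LSE instructions can perform in one instruction:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

var counter int64

func main() {
	// On arm64 with the LSE extension, this add can be lowered to a single
	// atomic instruction instead of an LDAXR/STLXR retry loop.
	atomic.AddInt64(&counter, 1)
	fmt.Println(atomic.LoadInt64(&counter)) // 1
}
```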