compile: convert standalone LEA to ADD/SUB to reduce port contention on x86/amd64 #49087

martisch · 2021-10-20T13:34:59Z

Except on very new cores (e.g. Sunny Cove), LEA can usually execute on only about 2 CPU ports, while ADD can execute on at least as many (up to 4). Production profiles of compiler-optimized C++ binaries have shown it is advantageous to use ADD/SUB instead of LEA to reduce port contention.

We should test adding a pass after the current SSA optimizations that transforms standalone simple LEA instructions (ones that were not fused with compares or moves) into ADD and SUB where equivalent (modulo flags). We could also use that pass to split slow 3 operand LEAs into multiple ADDs or 2 operand LEAs.
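A minimal sketch of the kind of rewrite rule such a pass might apply, written against a toy model rather than the compiler's real SSA types. The `LEA` struct, the `rewriteLEA` function and the `FlagsLiveAfter` field are hypothetical names made up for illustration; the sketch only covers the case where the destination equals the base register and assumes liveness of flags is already known.

```go
package main

import (
	"fmt"
	"math"
)

// LEA is a hypothetical, simplified model of LEAQ Disp(Base)(Index*Scale), Dst.
// It is not the Go compiler's SSA representation.
type LEA struct {
	Dst, Base, Index string // register names; Index == "" means no index register
	Scale            int    // index scale, only meaningful when Index != ""
	Disp             int32  // constant displacement
	FlagsLiveAfter   bool   // assumed input: are the flags still needed afterwards?
}

// rewriteLEA returns an equivalent ADDQ/SUBQ in Go assembler syntax, or
// ok == false when the LEA should be kept as is.
func rewriteLEA(l LEA) (asm string, ok bool) {
	// ADD/SUB clobber the flags and overwrite their destination, so the
	// rewrite is only valid "modulo flags" and when Dst is also the base.
	if l.FlagsLiveAfter || l.Dst != l.Base {
		return "", false
	}
	switch {
	case l.Index == "" && l.Disp > 0:
		return fmt.Sprintf("ADDQ $%d, %s", l.Disp, l.Dst), true
	case l.Index == "" && l.Disp < 0 && l.Disp > math.MinInt32:
		// The minimum int32 is excluded because its negation does not fit
		// in a sign-extended 32 bit immediate.
		return fmt.Sprintf("SUBQ $%d, %s", -l.Disp, l.Dst), true
	case l.Index != "" && l.Scale == 1 && l.Disp == 0:
		return fmt.Sprintf("ADDQ %s, %s", l.Index, l.Dst), true
	}
	return "", false
}

func main() {
	candidates := []LEA{
		{Dst: "AX", Base: "AX", Disp: 8},                       // LEAQ 8(AX), AX
		{Dst: "AX", Base: "AX", Disp: -16},                     // LEAQ -16(AX), AX
		{Dst: "AX", Base: "AX", Index: "BX", Scale: 1},         // LEAQ (AX)(BX*1), AX
		{Dst: "CX", Base: "AX", Disp: 8},                       // Dst differs from base
		{Dst: "AX", Base: "AX", Disp: 8, FlagsLiveAfter: true}, // flags still needed
	}
	for _, l := range candidates {
		if asm, ok := rewriteLEA(l); ok {
			fmt.Println(asm)
		} else {
			fmt.Println("keep LEA")
		}
	}
}
```

A real pass would additionally have to prove that the flags are dead and decide what to do with scaled or three-component forms; this sketch simply leaves those as LEA.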

klauspost · 2021-10-22T10:27:11Z

I can confirm this is also the case for AMD Zen 2.

#43690 shows an improvement even when replacing LEAL const(a)(R8*1), a with ADDL $const, a; ADDL R8, a in pipeline-heavy work. I am not claiming this is always the case.

Also, LEAW (16 bit destination) appears significantly slower (3x on Zen 2, 2x on Intel) than its 32 and 64 bit equivalents, so it seems best avoided.

For ADD, the only surprise seems to be that ADDW imm16, r16 has a 3x penalty on Intel, while both ADDW imm8, r16 and ADDW imm32, r16 are fine.

martisch · 2021-10-22T15:12:59Z

LEAL const(a)(R8*1), a

is even more special. That is a 3 operand LEA (3 cycles). Except on the very newest CPUs, those are already faster when split into two 2 operand LEAs (1 cycle each): #21735. The Go compiler does this splitting for Go code but not for assembler.
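To make the equivalence concrete, here is a small Go model of the three instruction sequences discussed in this thread, using wrapping 32 bit arithmetic. The function names `lea3`, `addPair` and `leaPair` are invented for this illustration. The model only covers the resulting register value; it says nothing about flags (which the ADD form changes and the LEA forms do not) or about timing.

```go
package main

import "fmt"

// lea3 models LEAL c(a)(idx*1), a: a single 3 operand LEA.
func lea3(a, idx uint32, c int32) uint32 {
	return a + idx + uint32(c)
}

// addPair models ADDL $c, a; ADDL idx, a.
func addPair(a, idx uint32, c int32) uint32 {
	a += uint32(c) // ADDL $c, a
	a += idx       // ADDL idx, a
	return a
}

// leaPair models the split into two 2 operand LEAs:
// LEAL c(a), a; LEAL (a)(idx*1), a.
func leaPair(a, idx uint32, c int32) uint32 {
	a = a + uint32(c) // LEAL c(a), a
	a = a + idx       // LEAL (a)(idx*1), a
	return a
}

func main() {
	// Inputs chosen so that the 32 bit addition wraps around.
	a, idx, c := uint32(0xFFFFFFF0), uint32(0x20), int32(-8)
	fmt.Println(lea3(a, idx, c), addPair(a, idx, c), leaPair(a, idx, c))
}
```

All three functions compute the same value modulo 2^32, which is why the choice between them can be made purely on port usage and latency, provided the flags are not needed afterwards.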

martisch added this to the Unplanned milestone Oct 20, 2021

martisch added the NeedsInvestigation label Oct 20, 2021

mvdan added the Performance label Oct 20, 2021
