New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/compile/internal/ssa: perform safe code de-duplication #24936
Comments
Change https://golang.org/cl/107895 mentions this issue: |
/cc @randall77 @aclements |
/cc @josharian @TocarIP |
This is a really interesting optimization. My main concern (other than build speed impact, which you mentioned on the CL was in the noise) is the effect on debuggability. For example, if a user tries to set a breakpoint on a line that's been merged with another block, I'm not sure what happens, but I can't imagine how that would still work "correctly". /cc @dr2chase |
Could you give some examples of where this helps in std or cmd (or elsewhere)? I find the synthetic example you gave above unconvincing as motivation because that could be written as a switch statement with the identical cases merged, which would be both clearer code and compile to the better output. |
@aclements, see attached file in original message (bincmp.txt). For example, // atoi parses an int from a string s.
// The bool result reports whether s is a number
// representable by a value of type int.
func atoi(s string) (int, bool) {
if s == "" {
return 0, false
}
neg := false
if s[0] == '-' {
neg = true
s = s[1:]
}
un := uint(0)
for i := 0; i < len(s); i++ {
c := s[i]
if c < '0' || c > '9' {
return 0, false
}
if un > maxUint/10 {
// overflow
return 0, false
}
un *= 10
un1 := un + uint(c) - '0'
if un1 < un {
// overflow
return 0, false
}
un = un1
}
if !neg && un > uint(maxInt) {
return 0, false
}
if neg && un > uint(maxInt)+1 {
return 0, false
}
n := int(un)
if neg {
n = -n
}
return n, true
} The
Five epilogue sequences of this form are removed: MOVQ $0, "".~r0+8(SP)
MOVB $0, "".~r1+16(SP)
RET |
Also, where are profiling samples attributed? |
@bradfitz, I've seen a couple regressions in go1 and tried to find explanation for them before posting the results. I also want to re-run benchmarks on different machine with better configuration to get less noise.
If this would require additional debug mappings, that could kill binary size win, but maybe code that is being executed still will be smaller and, hopefully, more I-cache friendly. |
Thanks for the std example. I don't think additional debug mappings can solve this. My concern is that the information is gone. If I try to set a breakpoint on the last [1] Okay, not strictly true: one could imagine synthesizing earlier breakpoints to determine which path was taken (or using the branch target buffer on x86), but no debugger does that and there's no way to express this in DWARF. |
go1 benchmarks
compilebench
|
Would it be enough if we share all but the first instruction in the duplicate basic blocks? That would leave one instruction on which to hang a line number. I think it would be useful to share epilogues, especially now that they are longer because we use frame pointers everywhere. Sharing epilogues doesn't have any adverse debugging problems. I even wrote a CL for sharing epilogues once upon a time, but never completed it: https://go-review.googlesource.com/c/go/+/33634 |
When do we decide on that "first instruction"? It would be complicated, for example, if you found common tails, then ran a phase of dead code elimination or CSE that vanished the "first instructions". Perhaps just before register allocation? Or do we prefer after, to provide greater freedom in register selection (which would probably cause tails to be much less common just because of the vagaries of register allocation). |
Given the small impact on binary size, and the small (and varied) performance improvement, I don't think this is worth the harm it does to debuggability and profile fidelity. Debuggability is one of our top priorities (partly because we did significant harm to it in recent releases and discovered how much of a problem that was). And reducing profile fidelity makes it harder for users to make more fundamental improvements to their code's performance. It may be worth revisiting epilogue deduplication, though, as @randall77 pointed out. I wouldn't expect nearly as much win from it as, say, in C, since Go doesn't have callee-save registers that need to be restored other than the frame pointer. But it could still be a win without any downsides. |
There are cases where duplicated code sequences can be merged without performance loss or invalidation of panic line info.
The most obvious are
BlockRet
that do not contain anything that may cause run time panic.Given this code:
We currently get this output:
If we were merging epilogue sequences, we could get this:
Proposed implementation will follow soon in form of CL.
Detailed
bincmp
info forgo
binary: bincmp.txt.Main objectives/restrictions:
If any of those are not respected by implementation, it's a bug.
The text was updated successfully, but these errors were encountered: