Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: incomplete dead code elimination for conditional #33713

Closed
mariecurried opened this issue Aug 19, 2019 · 16 comments
Closed

cmd/compile: incomplete dead code elimination for conditional #33713

mariecurried opened this issue Aug 19, 2019 · 16 comments
Labels
binary-size FrozenDueToAge help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@mariecurried
Copy link

mariecurried commented Aug 19, 2019

What version of Go are you using (go version)?

$ go version
go version devel +0dd120d Sun Aug 18 01:16:33 2019 +0000 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What did you do?

I compiled the following function and inspected the generated assembly (https://godbolt.org/z/q8MXUH):

func f(x string) int {
    var y int
    if x == "1" && len(x) == 2 {
        y = 1
    }

    return y
}

What did you expect to see?

I expected the generated code to be a simple return, as in this case: https://godbolt.org/z/o3uNbH.

        movq    $0, "".~r1+24(SP)
        ret

What did you see instead?

Instead, the generated code contains a redundant ´jump´ instruction:

        movq    "".x+16(SP), AX
        cmpq    AX, $1
        jne     f_pc11
f_pc11:
        pcdata  $1, $2
        xorps   X0, X0
        movups  X0, "".~r1+24(SP)
        ret

The same happens for this function https://godbolt.org/z/tDm4z2.

@martisch
Copy link
Contributor

martisch commented Aug 19, 2019

Is this a manually constructed example or does this come up often in Go programs? Was it in hand written or generated code?

It will be very slow up to impossible to find all boolean tautologies. So there likely has to be some tradeoff how much time is spend in the compiler on finding these vs just leaving them as is. Which brings the question how often this comes up in what forms in non crafted examples and where.

Relevant discussion: #32713

@martisch martisch added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Aug 19, 2019
@mariecurried
Copy link
Author

I probably didn't express my intent in a clear manner. I'm not saying these kinds of constructs should be treated as tautologies, but I happened to notice that this was happening in this case https://godbolt.org/z/o3uNbH.
This lead me to experiment with the compiler, ending up finding https://godbolt.org/z/q8MXUH and https://godbolt.org/z/tDm4z2, where the problem is not about boolean tautologies, but related to many redundant instructions being generated, given that the compiler knows the return value is always 0.

@martisch
Copy link
Contributor

martisch commented Aug 19, 2019

As the patterns that could be optimised can vary widely and we should make sure we detect the ones (without to much overhead) that help with real world code (which ideally we use simplified as test cases) Do you know examples of similar patterns not optimised in existing codebases?

If I understand the code and reply correctly the deeper issue here is that the compiler misses some dead code elimination of code that has no effect on the return value and no side effects?

@mariecurried
Copy link
Author

I'm not really a Go developer, I just like the language and enjoy experimenting with the compiler. That said, I don't have any concrete example of this issue (apart from these ones), but I believe this is a shortcoming of dead code elimination, given that for https://godbolt.org/z/q8MXUH and https://godbolt.org/z/tDm4z2 the compiler produces some unnecessary instructions that end up having no effect in the outcome of the function (the return value is always 0).

@martisch
Copy link
Contributor

/cc @randall77 @cherrymui @josharian

@martisch martisch changed the title cmd/compile: redundant jump for tautological string conditional cmd/compile: incomplete dead code elimination for conditional Aug 19, 2019
@martisch martisch removed the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Aug 19, 2019
@mariecurried
Copy link
Author

Yeah, that is what I meant all along. Thanks for your help in renaming the issue accordingly, @martisch 😉

@bcmills bcmills added help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Aug 19, 2019
@bcmills bcmills added this to the Unplanned milestone Aug 19, 2019
@randall77
Copy link
Contributor

I don't think we want to do anything explicit for cases like this. Unless someone can demonstrate that this occurs in real code.
But if this gets fixed as a side effect of some other improvement to optimizations (the prove pass, probably), then that would be fine.

@laboger
Copy link
Contributor

laboger commented Aug 19, 2019

I believe this is a similar example. It comes from bytes.(*Buffer).WriteByte.

TEXT bytes.(*Buffer).WriteByte(SB) /home/boger/golang/fresh/go/src/bytes/buffer.go
func (b *Buffer) WriteByte(c byte) error {
  0xc3130               e87e0010                MOVD 16(R30),R3                         
  0xc3134               7c230840                CMPU R3,R1                              
  0xc3138               41800010                BLT 0xc3148                             
  0xc313c               7ca802a6                MOVD LR,R5                              
  0xc3140               4bfa5e41                CALL runtime.morestack_noctxt(SB)       
  0xc3144               4bffffec                BR bytes.(*Buffer).WriteByte(SB)        
  0xc3148               7fe802a6                MOVD LR,R31                             
  0xc314c               fbe1ffc9                MOVDU R31,-56(R1)                       
        b.lastRead = opInvalid
  0xc3150               e8c10058                MOVD 88(R1),R6          
  0xc3154               98060020                MOVB R0,32(R6)          
        if l := len(b.buf); n <= cap(b.buf)-l {
  0xc3158               e8e60008                MOVD 8(R6),R7           
  0xc315c               e8a60010                MOVD 16(R6),R5          
  0xc3160               7d072850                SUB R7,R5,R8            
  0xc3164               2c280001                CMP R8,$1               
  0xc3168               4180006c                BLT 0xc31d4             

This should be BLT 0xc31b8, since 0xc31d4 just sets a constant and then
branches to a constant compare of that value. The other constant value
set in this block is not used on the path it actually takes.

                b.buf = b.buf[:l+n]
  0xc316c               38870001                ADD R7,$1,R4            
  0xc3170               7c242840                CMPU R4,R5              
  0xc3174               41810074                BGT 0xc31e8             
  0xc3178               f8860008                MOVD R4,8(R6)           

Since a predecessor of 0xc3180 can be eliminated, the CMPW only has 1 predecessor which
is setting a constant. As a result the next 3 instructions could be eliminated.

  0xc317c               38600001                MOVD $1,R3              
        m, ok := b.tryGrowByReslice(1)
  0xc3180               7c030000                CMPW R3,R0              
        if !ok {
  0xc3184               41820034                BEQ 0xc31b8             
        b.buf[m] = c

  0xc3188               e8860008                MOVD 8(R6),R4           
  0xc318c               e8a60000                MOVD 0(R6),R5           
  0xc3190               7c272040                CMPU R7,R4              
  0xc3194               4080004c                BGE 0xc31e0             
  0xc3198               88610060                MOVBZ 96(R1),R3         
  0xc319c               7c6729ae                MOVB R3,(R5)(R7)        
        return nil
  0xc31a0               f8010068                MOVD R0,104(R1)         
  0xc31a4               f8010070                MOVD R0,112(R1)         
  0xc31a8               ebe10000                MOVD 0(R1),R31          
  0xc31ac               7fe803a6                MOVD R31,LR             
  0xc31b0               38210038                ADD R1,$56,R1           
  0xc31b4               4e800020                RET                     
                m = b.grow(1)
  0xc31b8               f8c10020                MOVD R6,32(R1)                  
  0xc31bc               38600001                MOVD $1,R3                      
  0xc31c0               f8610028                MOVD R3,40(R1)                  
  0xc31c4               4bfff65d                CALL bytes.(*Buffer).grow(SB)   
  0xc31c8               e8e10030                MOVD 48(R1),R7                  
         b.buf[m] = c
  0xc31cc               e8c10058                MOVD 88(R1),R6          
                m = b.grow(1)
  0xc31d0               4bffffb8                BR 0xc3188              

 This block could go away

  0xc31d4               7c070378                OR R0,R0,R7             
  0xc31d8               7c030378                OR R0,R0,R3             
        m, ok := b.tryGrowByReslice(1)
  0xc31dc               4bffffa4                BR 0xc3180              

        b.buf[m] = c
  0xc31e0               7ce33b78                OR R7,R7,R3                     
  0xc31e4               4bfa828d                CALL runtime.panicIndex(SB)     
                b.buf = b.buf[:l+n]
  0xc31e8               4bfa82c9                CALL runtime.panicSliceAcap(SB) 
  0xc31ec               7c000378                OR R0,R0,R0                     

I don't know enough about passes to know where this type of optimization might be done. This is from an objdump on ppc64le, but the x86 objdump looks similar with respect the unnecessary branches and compares I mentioned.

@randall77
Copy link
Contributor

Lynn's example feels like the shortcircuit pass should handle it. It optimizes things of the form:

b1:
  x = true
  goto b2

b2:
   v = phi(x, ...other values from other predecessors...)
   if v: goto b3 else b4

shortcircuit will make b1 go directly to b3. It might be worth investigating why shortcircuit doesn't apply, and what might need to be upgraded to make it apply.

Maybe the merged thing is not a boolean, but an int constant?

b1:
   x = 0
   goto b2
b2:
   v = phi(x, ...)
   w = cmp(v, 0)
   if w goto b3 else b4

We could also make b1 go directly to b3 in this situation.

@josharian
Copy link
Contributor

I have a shortcircuit CL outstanding. Perhaps it helps.

@laboger
Copy link
Contributor

laboger commented Aug 21, 2019

Is this the one you mean @josharian https://go-review.googlesource.com/c/go/+/178197?

I can give it a try. This happens in some std pkgs, I've seen it in runtime and strconv so far.

@laboger
Copy link
Contributor

laboger commented Aug 21, 2019

The problem still occurs with this CL.
Another function with a similar problem is in strconv.(*decimal).Round. It also has a block with a single predecessor and successor that sets the constant that is being compare against a constant. But what caught my eye in this function was this sequence:

v31
00045 (344) CMPW	R4, $0
v85
00046 (344) MOVD	$1, R31
v85
00047 (344) ISEL	$2, R0, R31, R4
v62
00048 (+358) CMPW	R4, $0
b12
00049 (358) BNE	30

By itself, there should be no need for the ISEL and the CMP following it. However, the 2nd CMP has a phi operand, and the other phi operands are constant settings that should be short circuited.

This happens on ppc64le as well as AMD64 AFAICT.

I can post more detail, such as an objdump as above or more ssa output if helpful.

@josharian
Copy link
Contributor

The problem still occurs with this CL.

Thanks for checking. I'm mostly AFK this cycle, but I will try to take a look at this once my CL above lands, since I've been wading through shortcircuit recently.

@josharian
Copy link
Contributor

Turns out I'm not realistically going to investigate this during the 1.14 cycle, so if someone else wants to take a look, go for it.

@erifan
Copy link

erifan commented Jun 28, 2020

This CL https://go-review.googlesource.com/c/go/+/239457 seems to help this issue.

@mariecurried
Copy link
Author

No longer reproduces

@golang golang locked and limited conversation to collaborators Aug 16, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
binary-size FrozenDueToAge help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
None yet
Development

No branches or pull requests

9 participants