Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: many new no-ops as a result of mid-stack inlining #29571

Open
laboger opened this issue Jan 4, 2019 · 12 comments
Open

cmd/compile: many new no-ops as a result of mid-stack inlining #29571

laboger opened this issue Jan 4, 2019 · 12 comments
Labels
NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@laboger
Copy link
Contributor

laboger commented Jan 4, 2019

What version of Go are you using (go version)?

latest

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

ppc64le and ppc64

What did you do?

I was inspecting code for another reason and I saw a lot of new no-ops in unexpected places. For example in runtime.cgoCheckTypedBlock:

       return mheap_.arenas[ai.l1()][ai.l2()].spans[(p/pageSize)%pagesPerArena]
  0x13d08               3fe0002b                ADDIS $0,$43,R31        
  0x13d0c               e8ff3728                MOVD 14120(R31),R7      
  0x13d10               8be70000                MOVBZ 0(R7),R31         
        s := spanOfUnchecked(uintptr(src))
  0x13d14               7c000378                OR R0,R0,R0             
  0x13d18               7c000378                OR R0,R0,R0             
  0x13d1c               7dc87378                OR R14,R14,R8           
        ai := arenaIndex(p)
  0x13d20               7c000378                OR R0,R0,R0             
  0x13d24               7c000378                OR R0,R0,R0             
        return mheap_.arenas[ai.l1()][ai.l2()].spans[(p/pageSize)%pagesPerArena]
  0x13d28               7c000378                OR R0,R0,R0             
  0x13d2c               7c000378                OR R0,R0,R0             
  0x13d30               7c000378                OR R0,R0,R0             
  0x13d34               7c000378                OR R0,R0,R0             
        return arenaIdx((p + arenaBaseOffset) / heapArenaBytes)
  0x13d38               79c93682                RLDICL R14,$38,$26,R9   

What did you expect to see?

I should only see no-ops where they are expected.

What did you see instead?

Lots of no-ops in unexpected places. In the test for bytes there are almost 9000 extra no-ops with this commit.

cd src/bytes
go test -c
objdump -D bytes.test | grep mr | grep r0,r0 | wc -l
Using go version devel +c043fc4 Fri Dec 28 04:17:55 2018 +0000 linux/ppc64le
347
Using go version devel +69c2c56 Fri Dec 28 20:55:36 2018 +0000 linux/ppc64le
9193

I have not found runtime breakage because of this but performance could be affected depending where the unnecessary no-ops occur.

@randall77
Copy link
Contributor

There's generally one nop per inlining that the compiler does. These are the InlMark opcodes that represent the callsite of an inlined function. They are required to get stack tracebacks correct when mid-stack inlining is enabled.

There's two ways to reduce them, both of which I was going to leave for 1.13 (but let me know if this is painful enough to try to get in for 1.12):

  1. Use an existing instruction instead of adding a nop. All we need is an instruction with the right file/line number to use as the inline mark. Existing instructions would work fine (if there is one, which there probably usually is).
  2. Drop nops if no safepoint references them. This will typically happen for nops inserted for leaf inlinings. Note that "safepoints" here includes instructions that might panic (like loads), so it might not remove all that many nops. And we're moving towards having safepoints everwhere™ which would probably make this case moot, except for inlined bodies which were completely optimized away.

I checked one of the cases you reported and the number of nops looks correct. For s := spanOfUnchecked(uintptr(src)) I expect to see 4 nops, one for the inlining of spanOfUnchecked and one each for the calls to arenaIndex, l1, and l2 called by spanOfUnchecked.

@laboger
Copy link
Contributor Author

laboger commented Jan 7, 2019

For now it's only painful to me because of how it looks. I will do some testing to see if it affects performance in any meaningful way. Otherwise your fixes for 1.13 sound OK.

  1. seems more promising, because isn't there always at least 1 instruction from the inlined code?

@randall77
Copy link
Contributor

  1. seems more promising, because isn't there always at least 1 instruction from the inlined code?

Not always. func f() { g() } generates no code for f. But usually there will be.

@julieqiu julieqiu added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jan 7, 2019
@laboger
Copy link
Contributor Author

laboger commented Jan 8, 2019

I'm OK with leaving this as is for 1.12 if there will be improvements in 1.13.
Do you want to keep this issue for that work or do you have another?

@randall77
Copy link
Contributor

Keeping this issue is fine.

@randall77 randall77 changed the title cmd/compile: many new unnecessary no-ops generated on ppc64x as a result of 69c2c56 cmd/compile: many new no-ops as a result of mid-stack inlining Jan 8, 2019
@randall77 randall77 added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Jan 8, 2019
@randall77 randall77 added this to the Go1.13 milestone Jan 8, 2019
@randall77 randall77 self-assigned this Jan 8, 2019
@gopherbot
Copy link

Change https://golang.org/cl/158021 mentions this issue: cmd/compile: use existing instructions instead of nops for inline marks

gopherbot pushed a commit that referenced this issue Mar 25, 2019
Instead of always inserting a nop to use as the target of an inline
mark, see if we can instead find an instruction we're issuing anyway
with the correct line number, and use that instruction. That way, we
don't need to issue a nop.

Makes cmd/go 0.3% smaller.

Update #29571

Change-Id: If6cfc93ab3352ec2c6e0878f8074a3bf0786b2f8
Reviewed-on: https://go-review.googlesource.com/c/go/+/158021
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
@gopherbot
Copy link

Change https://golang.org/cl/170445 mentions this issue: cmd/compile: get rid of unnecessary inline marks

gopherbot pushed a commit that referenced this issue Apr 8, 2019
If no other instruction mentions an inline mark, we can get rid of it.
This normally happens when the inlined function is empty, or when all
of its code is folded into other instructions.

Also use consistent statement-ness for inline mark positions, so that
more of them can be removed in favor of existing instructions.

Update #29571
Fixes #31172

Change-Id: I71f84d355101f37a27960d9e8528f42f92767496
Reviewed-on: https://go-review.googlesource.com/c/go/+/170445
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
@randall77
Copy link
Contributor

I don't plan on doing anything more on this for 1.13.
Punting to 1.14.

Any further work here I think involves either associating multiple source lines with an instruction, and/or with encoding that fact in dwarf.

@randall77 randall77 modified the milestones: Go1.13, Go1.14 Apr 30, 2019
@randall77 randall77 modified the milestones: Go1.14, Unplanned Oct 8, 2019
@randall77 randall77 removed their assignment Oct 8, 2019
@egonelbre
Copy link
Contributor

Copying information from #38255.

I noticed a case where compiler produces several inline marks in a sequence (https://go.godbolt.org/z/_yJaga, https://play.golang.org/p/HfxglRpLB4A).

        0x008c 00140 ($GOROOT\src\io\io.go:329) CALL    io.ReadAtLeast(SB)
        0x0091 00145 ($GOROOT\src\io\io.go:329) CMPQ    56(SP), $0
        0x0097 00151 (main.go:10)       JEQ     169
        0x0099 00153 (main.go:11)       PCDATA  $0, $-1
        0x0099 00153 (main.go:11)       PCDATA  $1, $-1
        0x0099 00153 (main.go:11)       MOVQ    144(SP), BP
        0x00a1 00161 (main.go:11)       ADDQ    $152, SP
        0x00a8 00168 (main.go:11)       RET
        0x00a9 00169 (main.go:16)       XCHGL   AX, AX
        0x00aa 00170 (main.go:19)       XCHGL   AX, AX
        0x00ab 00171 (main.go:22)       XCHGL   AX, AX
        0x00ac 00172 (main.go:25)       XCHGL   AX, AX
        0x00ad 00173 (main.go:28)       XCHGL   AX, AX
        0x00ae 00174 (main.go:31)       XCHGL   AX, AX
        0x00af 00175 (main.go:34)       XCHGL   AX, AX
        0x00b0 00176 (main.go:37)       XCHGL   AX, AX
        0x00b1 00177 ($GOROOT\src\encoding\binary\binary.go:78) PCDATA  $0, $1
        0x00b1 00177 ($GOROOT\src\encoding\binary\binary.go:78) PCDATA  $1, $3
        0x00b1 00177 ($GOROOT\src\encoding\binary\binary.go:78) MOVQ    ""..autotmp_80+136(SP), AX

Weirdly, when the func instead returns a sum of the decoded values, as in (https://go.godbolt.org/z/teAXW3, https://play.golang.org/p/7jyh8Ji7GFu), then there is only one XCHGL.

@randall77
Copy link
Contributor

When adding the values, there's probably a real instruction with the correct line number that can serve as the inline mark.

@egonelbre
Copy link
Contributor

In the first case there seems to be as well, movq from the array.

In https://go.godbolt.org/z/FGDh3n there's a movq to move the value from header slice to a register, so I would guess that I could attach it there. But, this analysis seems a bit over my head for now.

@FiloSottile
Copy link
Contributor

We have an instance where the NOPs seem to lead to a ~10% slowdown in a generated crypto implementation.

mit-plv/fiat-crypto#949 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

6 participants