Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: large number of NOP instructions in crypto/md5.blockGeneric #31545

Closed
mundaym opened this issue Apr 18, 2019 · 9 comments
Closed
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@mundaym
Copy link
Member

mundaym commented Apr 18, 2019

There seem to be a large number of contiguous NOP instructions inserted into the main loop inside md5.blockGeneric. Looks like they are markers for inlined binary.LittleEndian.UInt32 calls. I haven't checked the performance implications (probably doesn't make a huge difference) but it would still be good to remove these if we can:

v72
    00020 (+27) XCHGL AX, AX
v153
    00021 (+28) XCHGL AX, AX
v232
    00022 (+29) XCHGL AX, AX
v311
    00023 (+30) XCHGL AX, AX
v390
    00024 (+31) XCHGL AX, AX
v469
    00025 (+32) XCHGL AX, AX
v548
    00026 (+33) XCHGL AX, AX
v627
    00027 (+34) XCHGL AX, AX
v706
    00028 (+35) XCHGL AX, AX
v785
    00029 (+36) XCHGL AX, AX
v864
    00030 (+37) XCHGL AX, AX
v943
    00031 (+38) XCHGL AX, AX
v1022
    00032 (+39) XCHGL AX, AX
v1101
    00033 (+40) XCHGL AX, AX
v1180
    00034 (+41) XCHGL AX, AX
v1259
    00035 (+42) XCHGL AX, AX

/cc @randall77

@mundaym mundaym added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Apr 18, 2019
@mundaym mundaym added this to the Unplanned milestone Apr 18, 2019
@dgryski
Copy link
Contributor

dgryski commented Apr 18, 2019

Dup of #29571 ?

@randall77
Copy link
Contributor

Yes, this is a dup of #29571. I had hoped CL 170445 would fix instances like this, but it looks like it hasn't.

@randall77
Copy link
Contributor

The reason CL 170445 didn't work is that these nops really are needed for correct tracebacks.
A binary.LittleEndian.Uint32 call reduces to a single instruction, which has a line number inside the inlined Uint32 function. If we took a profiling sample at that instruction, the nop is there to represent the location of the call in the parent function (blockGeneric, in this case). There aren't any other instructions in the parent frame which could be used as a substitute.

We could give the single instruction the line number of the parent, which would then obviate the need for the nop. I'm not sure what a principled method for making that happen would be, though. At the time of inlining, it isn't obvious that the inlined body will reduce to a single instruction (and it won't, on architectures that can't do unaligned loads).

@ianlancetaylor
Copy link
Contributor

In DWARF a single instruction can represent a stack of inlined function calls. So a profiling sample at that instruction would carry both the location in binary.LittleEndian.Uint32 and in the caller. Not sure if that makes any sense in the Go context.

@randall77
Copy link
Contributor

The Go inline markers can only represent a single frame.
That's not the problem here, though. Each of the inline marks represents a different call site.

@ianlancetaylor
Copy link
Contributor

I'm sure I'm misunderstanding this, but if each inline mark represents a different call site, then doesn't that imply that the inlined call was completely eliminated? I don't see a reason to keep an instruction around to represent a call that was optimized away.

@randall77
Copy link
Contributor

The body of the inlined call compiles to a single instruction. That instruction has a pc/line of the body of the inlined call. That instruction has a "parent" pointer to a nop instruction, which has a pc/line of the call site.

@ianlancetaylor
Copy link
Contributor

It seems to me that if we could attach a call stack to an instruction, then the single instruction of the inlined call would describe both the location of the inlined function and the location of the call to the inlined function, and the separate nop instruction would not be required.

@randall77
Copy link
Contributor

True, one way to fix this would be to allow multiple locations to be attached to a single instruction. That's currently not possible.
For profiling maybe we could just do it in DWARF? For things like runtime.Callers we'd need to do it ourselves in the pcln tables.

@golang golang locked and limited conversation to collaborators Apr 21, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

5 participants