Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: inline marker targets not reachable after assembly on ppc64x #40689

Closed
mundaym opened this issue Aug 11, 2020 · 11 comments
Closed
Labels
arch-ppc64x FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@mundaym
Copy link
Member

mundaym commented Aug 11, 2020

It looks like issue #40473 also affects ppc64x based on this trybot run for CL 247697: https://storage.googleapis.com/go-build-log/8548211f/linux-ppc64le-buildlet_059f3c62.log. It should be relatively easy to fix with a code change similar to that done for s390x in CL 247697.

@mundaym mundaym added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. arch-ppc64x labels Aug 11, 2020
@mundaym
Copy link
Member Author

mundaym commented Aug 11, 2020

/cc @randall77 @laboger

@laboger
Copy link
Contributor

laboger commented Aug 11, 2020

@mundaym I don't think the test in your CL is correct for ppc64x. If you look in cmd/compile/internal/ppc64/ggen.go function ginsop it is not generating a NOP but OR R0,R0,R0 for the inline marker. So this instruction won't get removed by PPC64 assembler and we shouldn't have the same problem as on S390x.

Correction: I see where a NOP could be generated for an inlMark. I can make a change to stop removing NOPs in the assembler but I want to verify it's not leaving a lot of them. I'll have to read through the issue again because I'm not sure why this causes a hang and why no other platforms are hitting this problem?

@mundaym
Copy link
Member Author

mundaym commented Aug 11, 2020

It's a tricky one to explain. I think of it as a 'use-after-free' bug:

  1. The compiler produces a Prog value representing an instruction and saves a pointer to it for use later as an inlining mark.
  2. The assembler rewrites the Prog into a GOT lookup and replaces the original Prog with a NOP (here).
  3. The assembler removes that NOP Prog from the Text linked list (here and here). Since this happens before assembly it does not assign the Prog a valid PC value.
  4. The linker reads from the pointer to the (now dead and unreachable) Prog that the compiler saved and uses it to produce invalid inlining table data.

The invalid inlining table data only appears to cause an infinite loop during traceback when the invalid PC value in the unscheduled Prog happens to correspond to an instruction that is part of the same inlined function (therefore the traceback never escapes the inlined function). This appears to be a rare situation.

It should be OK for the assembler to leave NOPs in the Text linked list since they do not produce actual code. Keeping them in the linked list however means they are scheduled and their PC value is valid (though really it points to the next instruction).

@gopherbot
Copy link

Change https://golang.org/cl/247842 mentions this issue: cmd/asm,cmd/internal/obj/ppc64: don't remove NOP in assembler

@laboger
Copy link
Contributor

laboger commented Aug 13, 2020

gopherbot please backport to 1.15 and 1.14.

@ianlancetaylor
Copy link
Contributor

@laboger What are the effects of this bug? That is, why should the CL be backported? I can't quite figure out what the actual problem is here. Thanks.

@randall77
Copy link
Contributor

This is related to #40473. Tracebacks hang because inline marker information is corrupted.

@laboger
Copy link
Contributor

laboger commented Aug 13, 2020

The change for s390x depends on this one. In the s390x fix, code was added to detect when the inline marker information was out of sync and that check fails on ppc64x. This change is needed to prevent that check from failing, as well as preventing other errors that could occur when they are out of sync.

@ianlancetaylor
Copy link
Contributor

@gopherbot Please backport to 1.14 and 1.15

This bug can cause tracebacks to hang. There is no workaround.

@gopherbot
Copy link

Backport issue(s) opened: #40766 (for 1.14), #40767 (for 1.15).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

@gopherbot
Copy link

Change https://golang.org/cl/248381 mentions this issue: cmd/internal/obj/ppc64: don't remove NOP in assembler

@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Sep 2, 2020
@dmitshur dmitshur added this to the Go1.16 milestone Sep 2, 2020
embeddedgo pushed a commit to embeddedgo/go that referenced this issue Sep 19, 2020
@golang golang locked and limited conversation to collaborators Sep 2, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-ppc64x FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

6 participants