plugin: program on linux/s390x sometimes hangs after calling "plugin.Open" #40473
Are you able to generate a core dump for the program? It might identify where the program got stuck.
Thanks @mundaym. Following is my code:
When the app hangs, I can trigger a signal.
However, it only generates an empty file.
On all other platforms, it generates a dump and stack trace successfully.
I just simplified the test app (ttt.go) to be:
The app code is:
The core dump is:
This appears to be caused by an infinite loop in the part of the traceback code that handles inlined functions. I can 'fix' it using the following patch:
diff --git a/src/runtime/traceback.go b/src/runtime/traceback.go
index 944c8473d2..aac91e36ed 100644
--- a/src/runtime/traceback.go
+++ b/src/runtime/traceback.go
@@ -375,7 +375,12 @@ func gentraceback(pc0, sp0, lr0 uintptr, gp *g, skip int, pcbuf *uintptr, max in
}
lastFuncID = inltree[ix].funcID
// Back up to an instruction in the "caller".
+ oldtracepc := tracepc
tracepc = frame.fn.entry + uintptr(inltree[ix].parentPc)
+ if tracepc == oldtracepc {
+ println("error: infinite loop in inline traceback")
+ break
+ }
pc = tracepc + 1
}
}
The bug is intermittent because these tracebacks are generated by the memory profiler, which samples allocations non-deterministically, so not every call to gentraceback ends up in this infinite loop (presumably the calls are tracing from different PCs). @randall77 Any ideas why this might be happening? I'm not sure whether this is an issue with plugins or a strange edge case in the inlined function data.
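Since the looping tracebacks come from the memory profiler's sampled allocations, one hedged way to make the hang easier to hit might be to have the profiler record every allocation. The sketch below sets runtime.MemProfileRate to 1 (every allocated block is profiled, and each sample records a call stack) and then reads back the profile size. The helpers allocate and profileRecords are hypothetical, and whether this actually makes the s390x hang reproducible on demand is an untested assumption.

```go
package main

import (
	"fmt"
	"runtime"
)

func init() {
	// Profile every heap allocation instead of roughly one per 512 KiB
	// (the default). Each profiled allocation walks the caller's stack,
	// so more samples give more chances to hit a bad traceback.
	runtime.MemProfileRate = 1
}

// sink keeps allocations reachable so they escape to the heap.
var sink [][]byte

// allocate (hypothetical helper) makes n small heap allocations.
func allocate(n int) {
	for i := 0; i < n; i++ {
		sink = append(sink, make([]byte, 64))
	}
}

// profileRecords (hypothetical helper) returns the number of records in the
// current memory profile.
func profileRecords() int {
	// The published profile can lag by up to two GC cycles, so force two.
	runtime.GC()
	runtime.GC()
	n, _ := runtime.MemProfile(nil, true)
	return n
}

func main() {
	allocate(1000)
	fmt.Println("memory profile records:", profileRecords())
}
```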
Note that this does not reproduce with master at tip, so Go 1.15 might not have this bug.
I don't think plugins should matter here, unless the plugins were built by a different version of Go? I thought we had a check for that, though. Sounds like a corner case. Could you find out which function it is (for example, by adding a print)? Is there any way I could reproduce this myself? I think I would need the code for all the plugins.
I've managed to reproduce this using only open source components, if you want to try it, @randall77. I think you will need Go 1.14.6 on an s390x system:
tee plugin.go << PLUGIN
package main
import (
_ "github.com/go-openapi/spec"
)
func main() {}
PLUGIN
tee main.go << MAIN
package main
import (
"fmt"
"os"
"plugin"
)
func main() {
if _, err := plugin.Open("plugin.so"); err != nil {
fmt.Fprintf(os.Stderr, "error while loading plugin: %v\n", err)
os.Exit(1)
}
}
MAIN
go get -u github.com/go-openapi/spec
go build -buildmode=plugin -o plugin.so plugin.go
go build main.go
./main
The main binary will sometimes hang when run. It will be stuck in gentraceback.
Here is some debug output:
The assembly for spec.init.4 looks like this:
v5 is the original InlMark, which had pos +37. What is interesting is that the parentPc is 30, which corresponds to the final address of v9, which is at pos 43. The InlMark is only supposed to move to an instruction with the same pos. I therefore wonder whether parentPc is supposed to point at v26, since that is at pos +37 and has a pre-assembly pc of 30. I'm not sure how that would happen, though, since the code should be fully assembled and prog pc values set before the parentPc is generated.
I can't create a gomote to test on:
I can't cross-compile to check the assembly because it uses cgo. I'm afraid I can't help at the moment.
No, the inlMark should point to a different position. The position of the inlMark itself is in an inlined callee; the target of the inlMark should be in a caller. Can you show the final assembly (via GNU objdump or something) and the entries that traceback hits in the inline table (via println in traceback.go)?
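The contract described above (the inline mark's target must be in the caller) can be stated as a checkable invariant: looking up the inline index at an entry's parentPc should give that entry's parent, never the entry itself. If it gives the entry itself, an unwinder that repeatedly "backs up to the caller" makes no progress, which matches the loop the earlier patch detects. The sketch below uses a simplified, hypothetical model (inlEntry, inlIndexAt, checkInlTree); it is not the runtime's actual data layout.

```go
package main

import "fmt"

// inlEntry is a hypothetical, simplified node of a function's inline tree:
// parent is the index of the enclosing inlined call (-1 for the outermost
// function) and parentPC is the offset, from the function entry, of the
// instruction that marks the call site inside that parent.
type inlEntry struct {
	parent   int
	parentPC uintptr
}

// checkInlTree verifies that each entry's parentPC resolves, via inlIndexAt
// (innermost inline index at a PC offset, or -1), to the entry's parent,
// i.e. to an instruction that really belongs to the caller.
func checkInlTree(tree []inlEntry, inlIndexAt func(pc uintptr) int) error {
	for ix, e := range tree {
		if got := inlIndexAt(e.parentPC); got != e.parent {
			return fmt.Errorf("entry %d: parentPC %#x resolves to inline index %d, want parent %d",
				ix, e.parentPC, got, e.parent)
		}
	}
	return nil
}

func main() {
	// Entry 0 is inlined directly into the outermost function at offset 0x10.
	tree := []inlEntry{{parent: -1, parentPC: 0x10}}
	// Hypothetical broken PC table: offset 0x10 is attributed to the inlined
	// body itself, so backing up to the "caller" lands on the same entry.
	broken := map[uintptr]int{0x10: 0}
	lookup := func(pc uintptr) int {
		if ix, ok := broken[pc]; ok {
			return ix
		}
		return -1
	}
	fmt.Println(checkInlTree(tree, lookup))
}
```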
Thanks for the pointers; I think I have found the issue. The InlMark optimization here executes before the assembler has run. The optimization therefore assumes that the target …
The assembler removes the original …
The simplest fix here is, I think, to stop the s390x assembler removing …
Longer term I'm not sure what the contract between the compiler and the assembler should be with respect to …
One thing to consider for Go 1.16 is whether we should add a new pseudo instruction for inlined function marks. This could be treated the same way as the other pseudo ops we have like …
PS: gomote wasn't working because of weather issues at the location that hosts the build machine. They should all be working again now.
Thanks for the excellent diagnosis.
Can the original Prog be repurposed as one of the two GOT lookup instructions? That way it would remain a valid target.
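The difference between replacing the original instruction and repurposing it can be shown on a toy linked list. In this hedged sketch, prog is a hypothetical, heavily simplified stand-in for the assembler's Prog, and the mnemonic strings are placeholders rather than real s390x instructions. A pointer held by an inline mark stays in the emitted list only when the original node is reused as the first of the two replacement instructions.

```go
package main

import "fmt"

// prog is a hypothetical, heavily simplified instruction node.
type prog struct {
	as   string
	link *prog
}

// expandByReplacing splices two new instructions in place of p, dropping p.
// Anything still pointing at p (such as an inline mark recorded before the
// assembler ran) now refers to an instruction that is never emitted.
func expandByReplacing(prev, p *prog, first, second string) {
	b := &prog{as: second, link: p.link}
	prev.link = &prog{as: first, link: b}
}

// expandInPlace reuses p as the first instruction and inserts the second
// after it, so pointers to p remain valid targets.
func expandInPlace(p *prog, first, second string) {
	p.as = first
	p.link = &prog{as: second, link: p.link}
}

// contains reports whether target is reachable from head.
func contains(head, target *prog) bool {
	for q := head; q != nil; q = q.link {
		if q == target {
			return true
		}
	}
	return false
}

// build returns a three-node list and its middle node, which plays the role
// of the instruction an inline mark refers to.
func build() (head, marked *prog) {
	tail := &prog{as: "ret"}
	marked = &prog{as: "load-addr", link: tail}
	head = &prog{as: "nop", link: marked}
	return head, marked
}

func main() {
	head, marked := build()
	expandByReplacing(head, marked, "got-load", "got-deref")
	fmt.Println("replace: mark still valid?", contains(head, marked))

	head, marked = build()
	expandInPlace(marked, "got-load", "got-deref")
	fmt.Println("in place: mark still valid?", contains(head, marked))
}
```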
We've been moving away from the assembler doing any optimizations (e.g. we removed instruction reordering a while ago), now that the compiler is pretty good. If we can get rid of optimizations (and do them in the compiler instead), that would be good.
The inlmark nops should be real nops that generate actual code. We should never remove them. So I think they would need to be their own op, not a marker for a following op. (Because what if there are two of them in a row?)
What, if anything, do you think we need to fix for 1.15?
Yeah, either would work. The second way is what I've implemented since it is a simpler change in my opinion.
I think it would be relatively easy to ensure and enforce that no two inline markers point to the same PC value. The advantage of it being a marker op rather than a code generating op would be that we could continue to use pre-existing instructions as inline marker targets and not just additional real nops. We'd still need the compiler to ensure a valid instruction followed the marker op.
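Enforcing that no two inline markers point to the same PC could be as simple as a map-based duplicate check over the final PCs, since the PC-to-inline-index table can only attribute a given PC to one of them. In this hedged sketch, inlMark and checkDistinctPCs are hypothetical names, and a real check would sit wherever the toolchain finalizes marker PCs.

```go
package main

import "fmt"

// inlMark is a hypothetical record of an inline marker: the inline-tree
// index it belongs to and the final PC offset of its target instruction.
type inlMark struct {
	index int
	pc    uintptr
}

// checkDistinctPCs returns an error if two inline markers resolve to the
// same PC, since only one of them could be attributed to that PC.
func checkDistinctPCs(marks []inlMark) error {
	seen := make(map[uintptr]int, len(marks)) // pc -> inline index that claimed it
	for _, m := range marks {
		if prev, ok := seen[m.pc]; ok {
			return fmt.Errorf("inline marks %d and %d both point at pc %#x", prev, m.index, m.pc)
		}
		seen[m.pc] = m.index
	}
	return nil
}

func main() {
	marks := []inlMark{{index: 0, pc: 0x08}, {index: 1, pc: 0x08}}
	if err := checkDistinctPCs(marks); err != nil {
		fmt.Println("invalid:", err)
	}
}
```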
I'd like to do the small fix to the s390x backend that is suitable for backporting to the 1.15.x and 1.14.x release branches to fix this specific bug. This isn't a recent regression, so I don't think we should rush a fix into 1.15 if it is going to be released very soon, though if it does make it, then great. Then for 1.16 maybe we try something more comprehensive.
Change https://golang.org/cl/247697 mentions this issue:
@gopherbot please open backport issues. This is a bug that causes programs to intermittently crash with no workaround.
Backport issue(s) opened: #40693 (for 1.15), #40694 (for 1.14). Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.
Change https://golang.org/cl/249448 mentions this issue:
What version of Go are you using (go version)?
Note: This problem only occurs on linux/s390x. All Go versions have the same issue.
Does this issue reproduce with the latest release?
Yes.
What operating system and processor architecture are you using (go env)?
What did you do?
This is the sample code (ttt.go):
What did you expect to see?
What did you see instead?
If you run it multiple times, the problem appears at random:
==> hangs without printing "End of zipkin.so"