cmd/go: running dsymutil failed: signal: segmentation fault #23046
Comments
Back up your cache first, or rename it away to $GOPATH/pkg/obj => $GOPATH/pkg/obj.old first? Also, add |
This is kind of annoying to debug. From memory: Run (I vaguely remember the binary ending up in a different directory than We'll probably need a dwarfdump too, but let's see if the error messages are obvious without it. |
The first command you gave me -
|
Right, that's expected. The linker ran dsymutil and it crashed; now we need to rerun it with logging so we know why. If |
Apologies, you're going to have to walk me through more steps. I'm running the whole clang command and getting an error that looks like this:
And about 30 more. I'm running this Go command to get the clang command.
|
Ugh. I must be forgetting a step or three, sorry. I'll look tomorrow when I can borrow a Mac. |
Perhaps unsurprisingly this bisects to 4435fcf. If you want a way to reproduce it, building delve ( |
Output of |
Ouch. I will take a look. |
To anybody getting here through a Google search, the emergency workaround is stripping the executable |
Do you need anything more from me besides testing a patch? |
No, thanks. I can reproduce it using Alessandro's instructions. |
I built a tip version of LLVM dsymutil and ran it on an a.out from the Delve build. It does not crash, but it issues a warning about unresolvable DIE references, which I think is probably indicative of the problem. Here is the dsymutil stack trace at the warning:
The DIE that it's examining is a concrete parameter DIE, and the reference is the abstract origin. Here's the DWARF. Concrete subprogram DIE:
Abstract DIE:
The abstract origin for the second formal in the concrete DIE is 0x137423, which is overshooting the formals in the abstract DIE. Right off the bat I am not sure why this is happening, but it should be relatively easy to write a checker for it. |
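A checker along those lines might look like the minimal sketch below. It does not parse real DWARF: the `DIE` struct, its fields, and `checkAbstractOrigins` are hypothetical stand-ins, and every name and offset other than 0x137423 is made up for illustration. The idea is only that each abstract-origin reference held by a child of the concrete DIE must land on a child of the abstract DIE.

```go
package main

import "fmt"

// DIE is a hypothetical, simplified stand-in for a DWARF debugging
// information entry; real DIEs carry many more attributes.
type DIE struct {
	Offset         uint64 // offset of this DIE in the debug info section
	Name           string
	AbstractOrigin uint64 // offset of the referenced abstract DIE, 0 if none
	Children       []*DIE
}

// checkAbstractOrigins reports every child of concrete whose abstract
// origin does not point at a child of abstract.
func checkAbstractOrigins(abstract, concrete *DIE) []string {
	valid := make(map[uint64]bool, len(abstract.Children))
	for _, c := range abstract.Children {
		valid[c.Offset] = true
	}
	var problems []string
	for _, c := range concrete.Children {
		if c.AbstractOrigin != 0 && !valid[c.AbstractOrigin] {
			problems = append(problems, fmt.Sprintf(
				"%s: abstract origin %#x is not a child of abstract DIE %#x",
				c.Name, c.AbstractOrigin, abstract.Offset))
		}
	}
	return problems
}

func main() {
	// Illustrative data only: an abstract DIE with one formal, and a
	// concrete DIE whose second formal refers past it.
	abstract := &DIE{Offset: 0x137410, Name: "F", Children: []*DIE{
		{Offset: 0x13741a, Name: "p0"},
	}}
	concrete := &DIE{Offset: 0x200000, Name: "F", AbstractOrigin: 0x137410, Children: []*DIE{
		{Offset: 0x200010, Name: "p0", AbstractOrigin: 0x13741a},
		{Offset: 0x200020, Name: "p1", AbstractOrigin: 0x137423},
	}}
	for _, p := range checkAbstractOrigins(abstract, concrete) {
		fmt.Println(p)
	}
}
```

With this made-up data the sketch flags only the second formal, which mirrors the overshoot described above. A real checker would have to walk the DIE tree of the linked binary rather than hand-built structs.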
The scenario here appears to be as follows. Exported function F in package P looks like
where F is marked as an inlining candidate. During the package P build, the post-optimization version of F has two autos, "x" and "~r1" (the latter generated by the compiler). In the generated DWARF there is an abstract subprogram DIE for F and a concrete subprogram DIE. The concrete subprogram in turn has two concrete DIE children (one for "x" and one for "~r1"). All well and good.
Later on, some other package Q imports P and uses F. This results in an inlined function DIE for F somewhere in the DWARF for Q, and also another abstract subprogram DIE for F. The version of F that is seen by the inliner doesn't have the "~r1" temp, however, so the abstract subprogram for F emitted into Q's DWARF has only a single child, a formal parameter DIE.
Things now get interesting: at link time, the linker is presented with two symbols corresponding to the abstract subprogram DIE, and it happens to choose the one from Q. This is what triggers the bug, since Q's version does not have "~r1", whereas the concrete subprogram DIE from package P has a reference to that second variable.
I experimented briefly with just eliminating all instances of "~r1" (and the other "~r" variables) from the generated DWARF, since these are not user-visible variables, and there is virtually no chance that a user is going to type "print ~r1" at the GDB prompt. It looks as though we actually have tests that check for the presence of "~r1" etc. (the compile scope test), so I am unsure whether this is a viable path (it certainly makes the most sense to me, since having extra compiler-generated crud in the DWARF just makes programs larger). Another possibility is to make sure every version of F has the same autos (including "~r1"). A third option is to have two distinct abstract subprogram DIE symbols, one for the home package and one for all of the imports. I'll try these and see which looks the best. |
Change https://golang.org/cl/83095 mentions this issue: |
I think @derekparker planned to use those in tracing. |
No, it's only for compiler-generated variables. |
Change https://golang.org/cl/83135 mentions this issue: |
I've created a tentative fix for this problem. Not 100% sure that all of the issues are covered, since I don't have access to a Mac this weekend (will do Mac testing tomorrow). |
This is happening to me too with |
After discussion with Heschi, I'm going to abandon my first fix for this issue; there is a cleaner way to deal with the problem that doesn't involve creating two different variants of the abstract subprogram DIE. I also discovered some new, similar problems while testing my latest fix (primarily by building things with "-l=4" to increase the number of inlines).
Here is a summary, just for posterity (it helps to understand the details of the change). In the case of this specific issue, there is a discrepancy between the abstract function DIE built during compilation of the package that owns the symbol in question (in which we have an additional "~r0" return param) and the abstract function DIE created during compilation of some other package that imports the function (no "~r0"). Another scenario where this can occur is when the function in question contains a struct auto that is split into two pieces; each piece then winds up as a separate local (ex: "err.itab" and "err.data" in something like bytes.(*Buffer).ReadBytes). This creates the same problem: inconsistent versions of an abstract subprogram DIE.
To take care of both situations, I'm changing the code to ensure that the only variables that appear in the DWARF for an abstract subprogram are those that are explicitly declared in the original version of the function (e.g. "err" and not "err.itab/err.data"). |
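To make the reasoning above concrete, here is a small model of why deriving the abstract DIE's variables from declarations removes the inconsistency. It is only a sketch under stated assumptions: the `compilation` type, both `abstractVars...` functions, and the variable lists are hypothetical and are not taken from the actual change.

```go
package main

import "fmt"

// compilation is a hypothetical model of what one package build knows
// about a function: the variables written in its source, and the autos
// left after optimization (which can differ from build to build).
type compilation struct {
	declared []string
	postOpt  []string
}

// abstractVarsFromAutos models the problematic behaviour: the abstract
// DIE's children mirror whatever autos this particular build ended up with.
func abstractVarsFromAutos(c compilation) []string {
	return append([]string(nil), c.postOpt...)
}

// abstractVarsFromDecls models the rule described above: only explicitly
// declared variables appear, so every build produces the same list.
func abstractVarsFromDecls(c compilation) []string {
	return append([]string(nil), c.declared...)
}

// sameVars reports whether two variable lists are identical.
func sameVars(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	// Made-up example: the home package sees a split "err" and a "~r0"
	// temp, while the importing package sees neither.
	home := compilation{
		declared: []string{"delim", "err"},
		postOpt:  []string{"delim", "err.itab", "err.data", "~r0"},
	}
	importer := compilation{
		declared: []string{"delim", "err"},
		postOpt:  []string{"delim", "err"},
	}
	fmt.Println("from autos, consistent:",
		sameVars(abstractVarsFromAutos(home), abstractVarsFromAutos(importer)))
	fmt.Println("from declarations, consistent:",
		sameVars(abstractVarsFromDecls(home), abstractVarsFromDecls(importer)))
}
```

In this model the two builds disagree when the list is taken from post-optimization autos and agree when it is taken from declarations, so it no longer matters which copy of the abstract DIE the linker picks.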
Change https://golang.org/cl/83675 mentions this issue: |
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
No
What operating system and processor architecture are you using (go env)?
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/kevin/Library/Caches/go-build"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/kevin"
GORACE=""
GOROOT="/Users/kevin/go"
GOTMPDIR=""
GOTOOLDIR="/Users/kevin/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-build854984327=/tmp/go-build -gno-record-gcc-switches -fno-common"
Mac Sierra version 10.12.6. Using confluentinc/confluent-kafka-go@99a5add.
What did you do?
I tried to compile a program that uses confluent-kafka-go and makes HTTP requests. Unfortunately it's proprietary, but I can answer questions about it if need be.
I frequently recompile Go tip with the latest commit.
The compilation argument was:
What did you expect to see?
I expected the program to compile.
What did you see instead?
This error message (and only this error message):
I can blow away my cache if need be, or try a different commit, but I'm completely in the dark about how to trigger this, and worried that if I make changes (like e.g. blowing away the cache) I won't be able to reliably reproduce the problem.
Running on my high end Macbook Pro which should have enough memory, CPU etc.