cmd/go: running dsymutil failed: signal: segmentation fault #23046

Closed
kevinburke opened this issue Dec 8, 2017 · 23 comments
Labels
FrozenDueToAge, NeedsInvestigation, release-blocker
@kevinburke
Contributor

kevinburke commented Dec 8, 2017

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go version devel +38083c83a6 Thu Dec 7 23:37:46 2017 +0000 darwin/amd64

Does this issue reproduce with the latest release?

No

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/kevin/Library/Caches/go-build"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/kevin"
GORACE=""
GOROOT="/Users/kevin/go"
GOTMPDIR=""
GOTOOLDIR="/Users/kevin/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-build854984327=/tmp/go-build -gno-record-gcc-switches -fno-common"

macOS Sierra, version 10.12.6. Using confluentinc/confluent-kafka-go@99a5add.

What did you do?

I tried to compile a program that uses confluent-kafka-go and makes HTTP requests. Unfortunately it's proprietary, but I can answer questions about it if need be.

I frequently recompile Go tip with the latest commit.

The compilation argument was:

go install -v -race ./...

What did you expect to see?

I expected the program to compile.

What did you see instead?

This error message (and only this error message):

$ go install -v -race ./...
# github.com/proprietary/proprietary/cmd/program
/Users/kevin/go/pkg/tool/darwin_amd64/link: /Users/kevin/go/pkg/tool/darwin_amd64/link: running dsymutil failed: signal: segmentation fault

I can blow away my cache if need be, or try a different commit, but I'm completely in the dark about how to trigger this, and worried that if I make changes (like e.g. blowing away the cache) I won't be able to reliably reproduce the problem.

Running on my high end Macbook Pro which should have enough memory, CPU etc.

@bradfitz
Contributor

bradfitz commented Dec 8, 2017

Back up your cache first, or rename it out of the way ($GOPATH/pkg/obj => $GOPATH/pkg/obj.old)?

Also, add -x to go install?

@bradfitz added the NeedsInvestigation and release-blocker labels Dec 8, 2017
@bradfitz added this to the Go1.10 milestone Dec 8, 2017
@heschi
Contributor

heschi commented Dec 8, 2017

This is kind of annoying to debug. From memory: Run go install -x -work -ldflags=-v. That should print, and preserve, the working directory. Then run dsymutil --verbose /path/to/binary, which should be close enough to what the linker is doing to reproduce any problems.

(I vaguely remember the binary ending up in a different directory than $WORKDIR that gets deleted unconditionally. In that case you might have to repeat the external link command, which will have been printed, to get a binary to run dsymutil on.)

We'll probably need a dwarfdump too, but let's see if the error messages are obvious without it.

@kevinburke
Contributor Author

kevinburke commented Dec 8, 2017

The first command you gave me (go install -x -work) failed with a segfault. Here's the last bit of the output:

HEADER = -H1 -T0x1001000 -D0x0 -R0x1000
 0.00 deadcode
 0.05 pclntab=1509285 bytes, funcdata total 266158 bytes
 0.06 dodata
 0.07 symsize = 0
 0.08 symsize = 0
 0.09 dynreloc
 0.10 dwarf
 0.12 symsize = 0
 0.18 reloc
 0.20 asmb
 0.20 codeblk
 0.22 datblk
 0.24 sym
 0.25 headr
 0.31 host link: "clang" "-m64" "-Wl,-headerpad,1144" "-Wl,-no_pie" "-Wl,-pagezero_size,4000000" "-o" "/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-build795571505/b001/exe/a.out" "-Qunused-arguments" "/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-link-216547337/go.o" "/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-link-216547337/000000.o" "/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-link-216547337/000001.o" "/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-link-216547337/000002.o" "/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-link-216547337/000003.o" "/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-link-216547337/000004.o" "/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-link-216547337/000005.o" "/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-link-216547337/000006.o" "/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-link-216547337/000007.o" "/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-link-216547337/000008.o" "/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-link-216547337/000009.o" "/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-link-216547337/000010.o" "/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-link-216547337/000011.o" "/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-link-216547337/000012.o" "/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-link-216547337/000013.o" "/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-link-216547337/000014.o" "/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-link-216547337/000015.o" "/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-link-216547337/000016.o" "/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-link-216547337/000017.o" "/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-link-216547337/000018.o" "/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-link-216547337/000019.o" "/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-link-216547337/000020.o" "/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-link-216547337/000021.o" 
"/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-link-216547337/000022.o" "/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-link-216547337/000023.o" "/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-link-216547337/000024.o" "/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-link-216547337/000025.o" "/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-link-216547337/000026.o" "/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-link-216547337/000027.o" "/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-link-216547337/000028.o" "/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-link-216547337/000029.o" "/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-link-216547337/000030.o" "/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-link-216547337/000031.o" "/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-link-216547337/000032.o" "/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-link-216547337/000033.o" "/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-link-216547337/000034.o" "-g" "-O2" "-lrdkafka" "-L/usr/local/Cellar/librdkafka/0.11.3/lib" "-lrdkafka" "-g" "-O2" "-framework" "CoreFoundation" "-framework" "Security" "-g" "-O2" "-g" "-O2" "-lpthread" "-g" "-O2" "-nopie"
/Users/kevin/go/pkg/tool/darwin_amd64/link: /Users/kevin/go/pkg/tool/darwin_amd64/link: running dsymutil failed: signal: segmentation fault

@heschi
Contributor

heschi commented Dec 8, 2017

Right, that's expected. The linker ran dsymutil and it crashed; now we need to rerun it with logging so we know why. If /var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-build795571505/b001/exe/a.out still exists, run dsymutil --verbose on it. If not, run the whole clang command (change -o to put a.out in some random directory, doesn't matter) to recreate a.out and then you can dsymutil it.

@kevinburke
Contributor Author

kevinburke commented Dec 8, 2017

Apologies, you're going to have to walk me through more steps. I'm running the whole clang command and getting an error that looks like this:

clang: error: no such file or directory: '/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-link-018471642/go.o'
clang: error: no such file or directory: '/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-link-018471642/000000.o'
clang: error: no such file or directory: '/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-link-018471642/000001.o'
clang: error: no such file or directory: '/var/folders/sf/fsn3_vgd0n98r0jb86bgp83r0000gn/T/go-link-018471642/000002.o'

And about 30 more.

I'm running this Go command to get the clang command.

go install -x -work -ldflags=-v .

@heschi
Contributor

heschi commented Dec 8, 2017

Ugh. I must be forgetting a step or three, sorry. I'll look tomorrow when I can borrow a Mac.

@aarzilli
Contributor

aarzilli commented Dec 8, 2017

Perhaps unsurprisingly this bisects to 4435fcf. If you want a way to reproduce it, building delve (go build github.com/derekparker/delve/cmd/dlv) does it.

@aarzilli
Contributor

aarzilli commented Dec 8, 2017

@randall77
Contributor

@thanm

@thanm
Contributor

thanm commented Dec 8, 2017

Ouch. I will take a look.

@thanm self-assigned this Dec 8, 2017
@aarzilli
Contributor

aarzilli commented Dec 8, 2017

To anybody getting here through a Google search: the emergency workaround is to strip the executable by building with -ldflags=-s.

@kevinburke
Contributor Author

Do you need anything more from me besides testing a patch?

@thanm
Contributor

thanm commented Dec 8, 2017

No, thanks. I can reproduce it using Alessandro's instructions.

@thanm
Contributor

thanm commented Dec 8, 2017

I built a tip version of LLVM dsymutil and ran it on an a.out from the Delve build. It does not crash, but it issues a warning about unresolvable DIE references, which I think is probably indicative of the problem. Here is the dsymutil stack trace at the warning:

while processing /var/folders/y1/14s910p95pj3vb71ywd53wk8008r__/T/go-link-341430069/go.o:
warning: could not find referenced DIE
7  llvm-dsymutil            0x00000001096ed986 llvm::dsymutil::resolveDIEReference(llvm::dsymutil::(anonymous namespace)::DwarfLinker const&, std::__1::vector<std::__1::unique_ptr<llvm::dsymutil::(anonymous namespace)::CompileUnit, std::__1::default_delete<llvm::dsymutil::(anonymous namespace)::CompileUnit> >, std::__1::allocator<std::__1::unique_ptr<llvm::dsymutil::(anonymous namespace)::CompileUnit, std::__1::default_delete<llvm::dsymutil::(anonymous namespace)::CompileUnit> > > >&, llvm::DWARFFormValue const&, llvm::DWARFUnit const&, llvm::DWARFDie const&, llvm::dsymutil::(anonymous namespace)::CompileUnit*&) + 486
8  llvm-dsymutil            0x00000001096e4fbc llvm::dsymutil::(anonymous namespace)::DwarfLinker::keepDIEAndDependencies(llvm::dsymutil::(anonymous namespace)::DwarfLinker::RelocationManager&, llvm::DWARFDie const&, llvm::dsymutil::(anonymous namespace)::CompileUnit::DIEInfo&, llvm::dsymutil::DebugMapObject const&, llvm::dsymutil::(anonymous namespace)::CompileUnit&, bool) + 1100
9  llvm-dsymutil            0x00000001096c3bdf llvm::dsymutil::(anonymous namespace)::DwarfLinker::lookForDIEsToKeep(llvm::dsymutil::(anonymous namespace)::DwarfLinker::RelocationManager&, llvm::DWARFDie const&, llvm::dsymutil::DebugMapObject const&, llvm::dsymutil::(anonymous namesp ...

The DIE that it's examining is a concrete parameter DIE, and the reference is the abstract origin. Here's the DWARF.

Concrete subprogram DIE:

 <1><13759b>: Abbrev Number: 4 (DW_TAG_subprogram)
    <13759c>   DW_AT_abstract_origin: <0x137403>
    <1375a0>   DW_AT_low_pc      : 0x113970
    <1375a8>   DW_AT_high_pc     : 0x113983
    <1375b0>   DW_AT_frame_base  : 1 byte block: 9c 	(DW_OP_call_frame_cfa)
 <2><1375b2>: Abbrev Number: 17 (DW_TAG_formal_parameter)
    <1375b3>   DW_AT_abstract_origin: <0x13741a>
    <1375b7>   DW_AT_location    : 1 byte block: 9c 	(DW_OP_call_frame_cfa)
 <2><1375b9>: Abbrev Number: 17 (DW_TAG_formal_parameter)
    <1375ba>   DW_AT_abstract_origin: <0x137423>
    <1375be>   DW_AT_location    : 2 byte block: 91 8 	(DW_OP_fbreg: 8)
 <2><1375c1>: Abbrev Number: 0

Abstract DIE:

 <1><137403>: Abbrev Number: 3 (DW_TAG_subprogram)
    <137404>   DW_AT_name        : bytes.(*Buffer).Len
    <137418>   DW_AT_inline      : 1	(inlined)
    <137419>   DW_AT_external    : 1
 <2><13741a>: Abbrev Number: 16 (DW_TAG_formal_parameter)
    <13741b>   DW_AT_name        : b
    <13741d>   DW_AT_variable_parameter: 0
    <13741e>   DW_AT_decl_line   : 74
    <13741f>   DW_AT_type        : <0x7f0d>
 <2><137423>: Abbrev Number: 0

The abstract origin for the second formal in the concrete DIE is 0x137423, which is overshooting the formals in the abstract DIE. Right off the bat I am not sure why this is happening, but it should be relatively easy to write a checker for it.
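A checker along those lines could be sketched as follows. This is only a toy model, not the test point that was later added to cmd/link: the `die` type, `badOrigins`, and `example` are hypothetical names, and the offsets are copied from the dump above. It flags any child of a concrete subprogram whose abstract origin is not a child of the abstract subprogram.

```go
package main

import "fmt"

// die is a hypothetical, simplified stand-in for a DWARF debugging
// information entry; real DIEs carry many more attributes.
type die struct {
	offset         uint64 // offset of this DIE in .debug_info
	tag            string // e.g. "subprogram", "formal_parameter"
	abstractOrigin uint64 // 0 if there is no DW_AT_abstract_origin
	children       []die
}

// badOrigins returns the offsets of the children of concrete whose
// abstract origin does not name a child of abstract.
func badOrigins(abstract, concrete die) []uint64 {
	valid := make(map[uint64]bool)
	for _, c := range abstract.children {
		valid[c.offset] = true
	}
	var bad []uint64
	for _, c := range concrete.children {
		if c.abstractOrigin != 0 && !valid[c.abstractOrigin] {
			bad = append(bad, c.offset)
		}
	}
	return bad
}

// example rebuilds the two subprogram DIEs from the dump in this comment.
func example() (abstract, concrete die) {
	abstract = die{offset: 0x137403, tag: "subprogram", children: []die{
		{offset: 0x13741a, tag: "formal_parameter"},
	}}
	concrete = die{offset: 0x13759b, tag: "subprogram", abstractOrigin: 0x137403, children: []die{
		{offset: 0x1375b2, tag: "formal_parameter", abstractOrigin: 0x13741a},
		{offset: 0x1375b9, tag: "formal_parameter", abstractOrigin: 0x137423},
	}}
	return abstract, concrete
}

func main() {
	abstract, concrete := example()
	for _, off := range badOrigins(abstract, concrete) {
		fmt.Printf("DIE at %#x has an abstract origin outside the children of %#x\n", off, abstract.offset)
	}
}
```

In this model the second formal (at 0x1375b9) is reported, because 0x137423 is the null entry that terminates the abstract DIE's children rather than a parameter.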

@thanm
Contributor

thanm commented Dec 9, 2017

The scenario here appears to be as follows. Exported function F in package P looks like

  func F(x int) int {
    ...
  }

where F is marked as an inlining candidate. During the package P build, the post-optimization version of F has two autos, "x" and "~r1" (the latter generated by the compiler). In the generated DWARF, there is an abstract subprogram DIE for F and a concrete subprogram DIE. The concrete subprogram in turn has two concrete DIE children (one for 'x' and one for '~r1'). All well and good.

Later on, some other package Q imports P and uses F. This results in an inlined function DIE for F somewhere in the DWARF for Q, and also another abstract subprogram DIE for F. The version of F that is seen by the inliner doesn't have the ~r1 temp, however, so the abstract subprogram for F emitted into Q's DWARF has only a single child, a formal parameter DIE.

Things now get interesting: at link time, the linker is presented with two symbols corresponding to the abstract subprogram DIE, and it happens to choose the one from Q. This is what triggers the bug, since Q's version does not have "~r1", whereas the concrete subprogram DIE from package P has a reference to that second variable.
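The mechanism can be illustrated with a small model. Everything here is an assumption made for illustration: `abstractFn`, `pick`, and `resolve` are hypothetical names, and the rule "keep the last definition seen" merely stands in for whichever duplicate the linker happens to choose.

```go
package main

import "fmt"

// abstractFn is a hypothetical model of one abstract subprogram DIE
// for F: the package whose build emitted it and its child variables.
type abstractFn struct {
	fromPkg string
	vars    []string
}

// pick models the linker keeping a single definition of a duplicated
// symbol. Keeping the last one is an arbitrary choice for this sketch.
func pick(defs []abstractFn) abstractFn {
	return defs[len(defs)-1]
}

// resolve returns the abstract variable that child i of the concrete
// DIE refers to, or false if the chosen abstract DIE has no such child.
func resolve(chosen abstractFn, i int) (string, bool) {
	if i < 0 || i >= len(chosen.vars) {
		return "", false
	}
	return chosen.vars[i], true
}

func main() {
	defs := []abstractFn{
		{fromPkg: "P", vars: []string{"x", "~r1"}}, // home package build
		{fromPkg: "Q", vars: []string{"x"}},        // importer's view, no ~r1
	}
	chosen := pick(defs)
	// The concrete subprogram DIE was emitted by P, so it has two children.
	for i := 0; i < 2; i++ {
		if name, ok := resolve(chosen, i); ok {
			fmt.Printf("child %d -> %s (abstract DIE from %s)\n", i, name, chosen.fromPkg)
		} else {
			fmt.Printf("child %d -> dangling reference\n", i)
		}
	}
}
```

With Q's definition chosen, child 0 resolves to "x" and child 1 has nothing to point at, which corresponds to the unresolvable reference dsymutil warned about.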

I experimented briefly with just eliminating all instances of "~r1" (and other r's) from the generated DWARF, since these are not user-visible variables, and there is virtually no chance that a user is going to type "print ~r1" at the GDB prompt. It looks as though we actually have tests that check for the presence of "~r1" etc. (the compile scope test), so I am unsure whether this is a viable path. (It certainly makes the most sense to me, since having extra compiler-generated crud in the DWARF just makes programs larger.)

Another possibility is to make sure every version of F has the same autos (including ~r1). A third option is to have two distinct abstract subprogram DIE symbols, one for the home package and one for all of the imports.

I'll try these and see which looks the best.

@gopherbot

Change https://golang.org/cl/83095 mentions this issue: cmd/link: new test point to detect DWARF abstract origin bugs

@aarzilli
Contributor

I experimented briefly with just eliminating all instances of "~r1" (and other r's) from the generated DWARF, since these are not user-visible variables

I think @derekparker planned to use those in tracing.
Also, does this not happen if the output parameter is named by the user?

@thanm
Contributor

thanm commented Dec 10, 2017

Also, does this not happen if the output parameter is named by the user?

No, it's only for compiler-generated variables.
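For readers unfamiliar with the distinction, here is a minimal sketch of the two cases. The function names are made up for illustration, and the exact compiler-generated name ("~r1" in this thread) is taken from the discussion above; it may differ between compiler versions.

```go
package main

import "fmt"

// Unnamed has an unnamed result, so the compiler has to invent a name
// for the result slot (reported as "~r1" in this thread).
func Unnamed(x int) int {
	return x + 1
}

// Named declares its result as "n", so no compiler-generated name is
// needed for it.
func Named(x int) (n int) {
	n = x + 1
	return n
}

func main() {
	fmt.Println(Unnamed(1), Named(1))
}
```

Both functions behave identically at run time; the difference only shows up in the variable names recorded in the debug information.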

@gopherbot

Change https://golang.org/cl/83135 mentions this issue: cmd/compile: fix bug with DWARF abstract functions

@thanm
Contributor

thanm commented Dec 10, 2017

I've created a tentative fix for this problem. Not 100% sure that all of the issues are covered, since I don't have access to a Mac this weekend (will do Mac testing tomorrow).

@dmitshur
Contributor

This is happening to me too with go1.10beta1 darwin/amd64 on latest macOS High Sierra (10.13.2). Easy to reproduce, happens even after clearing $GOPATH/pkg. Seemingly on (all?) commands that import packages that use cgo (e.g., github.com/shurcooL/Hover, github.com/shurcooL/eX0/eX0-go).

@thanm
Contributor

thanm commented Dec 13, 2017

After discussion with Heschi, I'm going to abandon my first fix for this issue; there is a cleaner way to deal with the problem that doesn't involve creating two different variants of the abstract subprogram DIE.

I also discovered some new, similar problems while testing my latest fix (primarily by building things with "-l=4" to increase the number of inlines). Here is a summary for posterity; it helps in understanding the details of the change.

In the case for this specific issue, there is a discrepancy between the abstract function DIE built during compilation of the package that owns the symbol in question (in which we have an additional "~r0" return param) and the abstract function DIE created during compilation of some other package that imports the function (no "~r0").

Another scenario where this can occur is when the function in question contains a struct auto that is split into two pieces, so that each piece winds up as a separate local (e.g. "err.itab" and "err.data" in something like bytes.(*Buffer).ReadBytes). This creates the same problem: inconsistent versions of an abstract subprogram DIE.

To take care of both situations, I'm changing the code to ensure that the only variables that appear in the DWARF for an abstract subprogram are those that are explicitly declared in the original version of the function (e.g. "err" and not "err.itab/err.data").
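The intended effect can be sketched with a name-based filter. This is an assumption-laden toy, not the compiler change itself: `abstractVars` is a hypothetical helper, and recognizing compiler-generated variables by their "~r" prefix or a "." in the name is a shortcut taken for this example (the real fix may well consult the function's original declarations instead).

```go
package main

import (
	"fmt"
	"strings"
)

// abstractVars reduces the post-optimization variable list to the names
// that were explicitly declared in the source. Pieces of a split
// variable ("err.itab", "err.data") collapse to their base name, and
// compiler-generated results ("~r1") are dropped.
func abstractVars(names []string) []string {
	seen := make(map[string]bool)
	var keep []string
	for _, n := range names {
		if strings.HasPrefix(n, "~r") {
			continue // compiler-generated result name
		}
		base := n
		if i := strings.Index(n, "."); i >= 0 {
			base = n[:i] // piece of a split variable
		}
		if !seen[base] {
			seen[base] = true
			keep = append(keep, base)
		}
	}
	return keep
}

func main() {
	// Illustrative variable lists, loosely based on the cases above.
	fmt.Println(abstractVars([]string{"x", "~r1"}))
	fmt.Println(abstractVars([]string{"b", "delim", "line", "err.itab", "err.data"}))
}
```

Because the result depends only on what the source declares, the home package and every importer would arrive at the same list of children for the abstract subprogram.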

@gopherbot

Change https://golang.org/cl/83675 mentions this issue: cmd/compile: fixes for bad DWARF abstract origin references
