Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Go 1.12.1 dsymutil segmentation fault on MacOS Sierra #30908

Closed
laverya opened this issue Mar 18, 2019 · 23 comments
Closed

Go 1.12.1 dsymutil segmentation fault on MacOS Sierra #30908

laverya opened this issue Mar 18, 2019 · 23 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. OS-Darwin

Comments

@laverya
Copy link

laverya commented Mar 18, 2019

What version of Go are you using (go version)?

$ go version
go version go1.12.1 darwin/amd64

$ sw_vers
ProductName:	Mac OS X
ProductVersion:	10.12.6
BuildVersion:	16G1114

$ clang --version
Apple LLVM version 9.0.0 (clang-900.0.39.2)
Target: x86_64-apple-darwin16.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/user906782/Library/Caches/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/user906782/go"
GOPROXY=""
GORACE=""
GOROOT="/Users/user906782/homebrew/Cellar/go/1.12.1/libexec"
GOTMPDIR=""
GOTOOLDIR="/Users/user906782/homebrew/Cellar/go/1.12.1/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/b1/txl5_d893yj3rw0bg_kyplt80000ln/T/go-build229507920=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

Run make VERSION=0.36.0 build-minimal with release 0.36.0 of replicatedhq/ship on MacOS Sierra. (Running on High Sierra or Mojave seems to be fine)

What did you expect to see?

Compilation finishes.

What did you see instead?

/usr/local/Cellar/go/1.12/libexec/pkg/tool/darwin_amd64/link: running dsymutil failed: signal: segmentation fault

@bradfitz
Copy link
Contributor

/cc @randall77

@laverya
Copy link
Author

laverya commented Mar 18, 2019

I've done some more work narrowing this down, and it seems that Go 1.12.1 works with the revision immediately before replicatedhq/ship@7c3cd3b but not with ones after/including that. Unfortunately, that's an absolutely massive commit...

And it DOES compile with Go 1.11.6 -

$ git status
HEAD detached at 7c3cd3b1
You are currently bisecting, started from branch 'master'.
  (use "git bisect reset" to get back to the original branch)

nothing to commit, working tree clean
$ go build -i -o bin/ship ./cmd/ship/
# github.com/replicatedhq/ship/cmd/ship
/Users/user906782/homebrew/Cellar/go/1.12.1/libexec/pkg/tool/darwin_amd64/link: /Users/user906782/homebrew/Cellar/go/1.12.1/libexec/pkg/tool/darwin_amd64/link: running dsymutil failed: signal: segmentation fault

$ go1.11.6 build -i -o bin/ship ./cmd/ship/
$

@fxcoudert
Copy link

Can you isolate the binary that is generated and on which dsymutil fails?

@randall77
Copy link
Contributor

I can reproduce. Additional commands I needed (in case anyone else needs to reproduce):

brew install yarn
go get -u github.com/replicatedhq/ship/pkg/cli

There's at least a bug in dsymutil. It probably shouldn't segfault, no matter the input we give it.
Presumably we're generating some dwarf data that dsymutil can't handle. I don't know what that might be.
I'm having trouble convincing the go tool to leave behind enough work directories to reproduce directly with dsymutil. Still working on that, will probably have to wait until tomorrow.

@randall77
Copy link
Contributor

Ok, I can catch the program in the act.
Here's the crash backtrace:

Process 57119 stopped
* thread #2, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x000000010001cfaa llvm-dsymutil`llvm::dsymutil::(anonymous namespace)::DwarfLinker::lookForDIEsToKeep(llvm::dsymutil::(anonymous namespace)::DwarfLinker::RelocationManager&, llvm::DWARFDie const&, llvm::dsymutil::DebugMapObject const&, llvm::dsymutil::(anonymous namespace)::CompileUnit&, unsigned int) + 1066
llvm-dsymutil`llvm::dsymutil::(anonymous namespace)::DwarfLinker::lookForDIEsToKeep:
->  0x10001cfaa <+1066>: movl   (%rax), %esi
    0x10001cfac <+1068>: xorl   %edx, %edx
    0x10001cfae <+1070>: shrq   $0x7, %rsi
    0x10001cfb2 <+1074>: movl   %edx, %edx
Target 0: (llvm-dsymutil) stopped.
(lldb) bt
* thread #2, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x000000010001cfaa llvm-dsymutil`llvm::dsymutil::(anonymous namespace)::DwarfLinker::lookForDIEsToKeep(llvm::dsymutil::(anonymous namespace)::DwarfLinker::RelocationManager&, llvm::DWARFDie const&, llvm::dsymutil::DebugMapObject const&, llvm::dsymutil::(anonymous namespace)::CompileUnit&, unsigned int) + 1066
    frame #1: 0x000000010001d1d2 llvm-dsymutil`llvm::dsymutil::(anonymous namespace)::DwarfLinker::lookForDIEsToKeep(llvm::dsymutil::(anonymous namespace)::DwarfLinker::RelocationManager&, llvm::DWARFDie const&, llvm::dsymutil::DebugMapObject const&, llvm::dsymutil::(anonymous namespace)::CompileUnit&, unsigned int) + 1618
    frame #2: 0x000000010001d31f llvm-dsymutil`llvm::dsymutil::(anonymous namespace)::DwarfLinker::lookForDIEsToKeep(llvm::dsymutil::(anonymous namespace)::DwarfLinker::RelocationManager&, llvm::DWARFDie const&, llvm::dsymutil::DebugMapObject const&, llvm::dsymutil::(anonymous namespace)::CompileUnit&, unsigned int) + 1951
    frame #3: 0x000000010001d31f llvm-dsymutil`llvm::dsymutil::(anonymous namespace)::DwarfLinker::lookForDIEsToKeep(llvm::dsymutil::(anonymous namespace)::DwarfLinker::RelocationManager&, llvm::DWARFDie const&, llvm::dsymutil::DebugMapObject const&, llvm::dsymutil::(anonymous namespace)::CompileUnit&, unsigned int) + 1951
    frame #4: 0x000000010001d31f llvm-dsymutil`llvm::dsymutil::(anonymous namespace)::DwarfLinker::lookForDIEsToKeep(llvm::dsymutil::(anonymous namespace)::DwarfLinker::RelocationManager&, llvm::DWARFDie const&, llvm::dsymutil::DebugMapObject const&, llvm::dsymutil::(anonymous namespace)::CompileUnit&, unsigned int) + 1951
    frame #5: 0x0000000100018634 llvm-dsymutil`llvm::dsymutil::(anonymous namespace)::DwarfLinker::link(llvm::dsymutil::DebugMap const&) + 14948
    frame #6: 0x000000010001446a llvm-dsymutil`llvm::dsymutil::linkDwarf(llvm::StringRef, llvm::dsymutil::DebugMap const&, llvm::dsymutil::LinkOptions const&) + 2762
    frame #7: 0x0000000100004145 llvm-dsymutil`main + 8421
    frame #8: 0x00007fff9c6ad235 libdyld.dylib`start + 1

Despite crashing, it does generate an output directory a.out.dSYM. The file in it is zero size, though.

@thanm sent me a dsymutil built from source. It succeeds, but with warnings:

warning: could not find referenced DIE
note: while processing /Users/khr/tmpdir/go.o
warning: could not find referenced DIE
note: while processing /Users/khr/tmpdir/go.o
warning: could not find referenced DIE
note: while processing /Users/khr/tmpdir/go.o
warning: could not find referenced DIE
note: while processing /Users/khr/tmpdir/go.o

I've packed up the offending file in case anyone else wants to have a look.
Unpack my uploaded zip in the /tmp directory. Run dsymutil /tmp/issue30908/a.out.

@thanm
Copy link
Contributor

thanm commented Mar 20, 2019

Thanks for the stand-alone testcase. I will poke at it for a bit and see what I can figure out.

@fxcoudert
Copy link

Confirmed on this minimal example on Homebrew's Sierra machines. Running Mojave's dsymutil on the same example does not crash.

@thanm
Copy link
Contributor

thanm commented Mar 20, 2019

Wed Mar 20 13:28:20 EDT 2019

I ran dsymutil in the debugger and it does indeed look as though it is encountering bad DWARF.

Offending compiliation unit is here:

<0><d5847a>: Abbrev Number: 1 (DW_TAG_compile_unit)
    <d5847b>   DW_AT_name        : github.com/replicatedhq/ship/vendor/k8s.io/helm/pkg/chartutil
    <d584b9>   DW_AT_language    : 22	(Go)

It appears that the routine 'chartutil.ReadValues' is being inlined into 'chartutil.CoalesceValues'. The inlined routine subprogram DIE being generated looks like

<2><d5ba90>: Abbrev Number: 7 (DW_TAG_inlined_subroutine)
    <d5ba91>   DW_AT_abstract_origin: <0xd588c7>
    <d5ba95>   DW_AT_ranges      : 0x4d7c10
    <d5ba99>   DW_AT_call_file   : 0x1a
    <d5ba9d>   DW_AT_call_line   : 165
 <3><d5ba9f>: Abbrev Number: 19 (DW_TAG_formal_parameter)
    <d5baa0>   DW_AT_abstract_origin: <0xd58913>
    <d5baa4>   DW_AT_location    : 0xd4d139 (location list)
 <3><d5baa8>: Abbrev Number: 19 (DW_TAG_formal_parameter)
    <d5baa9>   DW_AT_abstract_origin: <0xd5892a>
    <d5baad>   DW_AT_location    : 0xd4d1c1 (location list)
 <3><d5bab1>: Abbrev Number: 0

The second formal parameter abstract_origin offset (0xd5892a) is bad -- it looks as though it's coming out somewhere in between the formal parameter DIEs in the subprogram it's referring to. So almost certainly some sort of compiler problem here.

I'll grab the bug if that is ok and see about trying to come up with a compiler test case.

@thanm thanm self-assigned this Mar 20, 2019
@thanm thanm added the NeedsFix The path to resolution is known, but the work has not been done. label Mar 20, 2019
@randall77
Copy link
Contributor

Ok, thanks.

@thanm
Copy link
Contributor

thanm commented Mar 20, 2019

Hmm, that's odd. When I download and compile the offending package by itself, the inline in question doesn't happen:

./values.go:134:6: cannot inline ReadValues: function too complex: cost 81 exceeds budget 80

Is there perhaps something in the build procedure that is using a higher inlining level? I am sure the bug is still worth fixing, but it would help to know if this is a case where non-standard compile flags are being used.

@laverya
Copy link
Author

laverya commented Mar 20, 2019

Hmm - the make command just runs this, which I don't think would impact the cost/inlining calculations:

go build \
		-ldflags " -X github.com/replicatedhq/ship/pkg/version.version=`git describe --tags` -X github.com/replicatedhq/ship/pkg/version.gitSHA=`git rev-parse HEAD` -X github.com/replicatedhq/ship/pkg/version.buildTime=`date -u +"%Y-%m-%dT%H:%M:%SZ"` " \
		-i \
		-o bin/ship \
		./cmd/ship

@thanm
Copy link
Contributor

thanm commented Mar 20, 2019

OK, sorry for the noise about compile flags -- it looks as though the inline cost is being computed slightly differently when I compile on Mac. I still haven't quite figured out what is tickling this bug yet (don't see the problem when I build the chartutil test binary).

@laverya
Copy link
Author

laverya commented Mar 20, 2019

Hey, it's always good to check! After all, the proportion of people who are doing weird things with compiler flags is probably much higher among those who are encountering compiler bugs 😉

@gopherbot
Copy link

Change https://golang.org/cl/168410 mentions this issue: cmd/link: add optional sanity checking for DWARF symbols

@gopherbot
Copy link

Change https://golang.org/cl/168818 mentions this issue: cmd/compile: better handling for PAUTOHEAP in DWARF inline gen

@thanm
Copy link
Contributor

thanm commented Mar 22, 2019

For posterity, here's what's happening with this bug.

The function causing the issue is here:

 // ReadValues will parse YAML byte data into a Values.
  func ReadValues(data []byte) (vals Values, err error) {
	  err = yaml.Unmarshal(data, &vals)
	  if len(vals) == 0 {
		  vals = Values{}
	  }
	  return
  }

This is small enough to be an inlining candidate; it gets inlined in its home package and in other packages that import that package.

During the home package compile, 'vals' is promoted to the heap, and in the process the dcl node for the param updated, converting its class from PPARAMOUT to PAUTOHEAP. This happens at esc.go:2175.

This change in class confuses the DWARF generation code, specifically createDwarfVars. The code there winds up treating PAUTOHEAP the same as PAUTO, hence 'vals' is reclassified as a local and not an output parameter. This has a cascading effect and means that the abstract subprogram DIE generated for chartutil.ReadValues is slightly different in its home package relative to what gets emitted in other packages that inline it.

The fix is fairly straightforward in createDwarfVars(); I also wrote a separate patch to add some checking to the linker to try to detect these cases early and on Linux (at the moment it seems that we only see problems on Macos and with specific more strict versions of dsymutil).

gopherbot pushed a commit that referenced this issue Mar 22, 2019
Introduce a new linker command line option "-strictdups", which
enables sanity checking of "ok to duplicate" symbols, especially
DWARF info symbols. Acceptable values are 0 (no checking) 1 (issue
warnings) and 2 (issue a fatal error checks fail).

Currently if we read a DWARF symbol (such as "go.info.PKG.FUNCTION")
from one object file, and then encounter the same symbol later on
while reading another object file, we simply discard the second one
and move on with the link, since the two should in theory be
identical.

If as a result of a compiler bug we wind up with symbols that are not
identical, this tends to (silently) result in incorrect DWARF
generation, which may or may not be discovered depending on who is
consuming the DWARF and what's being done with it.

When this option is turned on, at the point where a duplicate
symbol is detected in the object file reader, we check to make sure
that the length/contents of the symbol are the same as the previously
read symbol, and print a descriptive warning (or error) if not.

For the time being this can be used for one-off testing to find
problems; at some point it would be nice if we can enable it by
default.

Updates #30908.

Change-Id: I64c4e07c326b4572db674ff17c93307e2eec607c
Reviewed-on: https://go-review.googlesource.com/c/go/+/168410
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
@gopherbot
Copy link

Change https://golang.org/cl/168957 mentions this issue: test: new test for issue 30908

@benesch
Copy link
Contributor

benesch commented Mar 25, 2019

Thanks, @thanm. Nominating this for backport since it prevents Homebrew from packaging any Go software that happens to trigger this invalid DWARF generation.

@gopherbot please consider this for backport to 1.12; it's a serious toolchain bug.

@gopherbot
Copy link

Backport issue(s) opened: #31028 (for 1.12).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

@thanm
Copy link
Contributor

thanm commented Mar 25, 2019

@benesch thanks for the heads-up. Backport seems reasonable. I will discuss with the other compiler/runtime folks.

gopherbot pushed a commit that referenced this issue Mar 25, 2019
New test case designed to mimic the code in issue 30908, which
features duplicate but non-indentical DWARF abstract subprogram DIEs.

Updates #30908.

Change-Id: Iacb4b53e6a988e46c801cdac236cef883c553f8f
Reviewed-on: https://go-review.googlesource.com/c/go/+/168957
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
@gopherbot
Copy link

Change https://golang.org/cl/169121 mentions this issue: test: skip test expecting inlining on noopt builder

@gopherbot
Copy link

Change https://golang.org/cl/169161 mentions this issue: test: fix fixedbugs/issue30908.go to work with no-opt builder

@gopherbot
Copy link

Change https://golang.org/cl/169417 mentions this issue: cmd/compile: better handling for PAUTOHEAP in DWARF inline gen

gopherbot pushed a commit that referenced this issue Mar 27, 2019
… DWARF inline gen

When generating DWARF inline info records, the post-SSA code looks
through the original "pre-inline" dcl list for the function so as to
handle situations where formal params are promoted or optimized away.
This code was not properly handling the case where an output parameter
was promoted to the heap -- in this case the param node is converted
in place from class PPARAMOUT to class PAUTOHEAP. This caused
inconsistencies later on, since the variable entry in the abstract
subprogram DIE wound up as a local and not an output parameter.

Updates #30908.
Fixes #31028.

Change-Id: Ia70b89f0cf7f9b16246d95df17ad6e307228b8c7
Reviewed-on: https://go-review.googlesource.com/c/go/+/168818
Reviewed-by: Cherry Zhang <cherryyz@google.com>
(cherry picked from commit 68a98d5)
Reviewed-on: https://go-review.googlesource.com/c/go/+/169417
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
gopherbot pushed a commit that referenced this issue Mar 27, 2019
Update the issue 30908 test to work with the no-opt builder
(this requires a corresponding change in the linker as well).
As part of this change, 'rundir' tests are now linked without
passing "-w" to the linker.

Updates #30908.
Fixes #31034.

Change-Id: Ic776e1607075c295e409e1c8230aaf55a79a6323
Reviewed-on: https://go-review.googlesource.com/c/go/+/169161
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@golang golang locked and limited conversation to collaborators Mar 25, 2020
@rsc rsc unassigned thanm Jun 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. OS-Darwin
Projects
None yet
Development

No branches or pull requests

7 participants