debug/elf: viewcore cannot read s390x z/Linux Go coredump: decoding dwarf section info at offset 0x0: too short #64431

Closed
kgibm opened this issue Nov 28, 2023 · 7 comments
Labels
arch-s390x Issues solely affecting the s390x architecture. compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Linux
kgibm commented Nov 28, 2023

Go version

go version go1.21.4 linux/s390x

What operating system and processor architecture are you using (go env)?

GO111MODULE=''
GOARCH='s390x'
GOBIN=''
GOCACHE='/root/.cache/go-build'
GOENV='/root/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='s390x'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/root/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/root/go'
GOPRIVATE=''
GOPROXY='direct'
GOROOT='/usr/lib/golang'
GOSUMDB='off'
GOTMPDIR=''
GOTOOLCHAIN='local'
GOTOOLDIR='/usr/lib/golang/pkg/tool/linux_s390x'
GOVCS=''
GOVERSION='go1.21.4'
GCCGO='gccgo'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/tmp/debug/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -march=z196 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1176343210=/tmp/go-build -gno-record-gcc-switches'

What did you do?

A Go program is using excessive memory on z/Linux. We took a core dump with gcore, built viewcore on a z/Linux machine, and tried to analyze the core, but it failed:

$ ./viewcore core.156909 --exe manager
error reading dwarf: error reading DWARF info from manager: decoding dwarf section info at offset 0x0: too short

The error is coming from https://github.com/golang/debug/blob/master/internal/core/process.go#L263

	dwarf, dwarfErr := exeElf.DWARF()
	if dwarfErr != nil {
		dwarfErr = fmt.Errorf("error reading DWARF info from %s: %v", exeFile.Name(), dwarfErr)
	}

Therefore it seems this is an issue with debug/elf rather than viewcore.

The core dump looks fine and it is of the expected size matching the virtual size of the process:

$ file core.156909 
core.156909: ELF 64-bit MSB core file, IBM S/390, version 1 (SYSV), SVR4-style, from '/manager'
$ ls -l core.156909 
-rw-r--r--. 1 root root 754368488 Nov 28 18:28 core.156909
$ readelf -a core.156909 | head
ELF Header:
  Magic:   7f 45 4c 46 02 02 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, big endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              CORE (Core file)
  Machine:                           IBM S/390
  Version:                           0x1

What did you expect to see?

viewcore works

What did you see instead?

error reading dwarf: error reading DWARF info from manager: decoding dwarf section info at offset 0x0: too short

gopherbot added the compiler/runtime label Nov 28, 2023
dmitshur added the OS-Linux, NeedsInvestigation, and arch-s390x labels Nov 28, 2023
dmitshur added this to the Backlog milestone Nov 28, 2023

@dmitshur
Contributor

CC @golang/s390x, @golang/compiler.

thanm self-assigned this Nov 28, 2023

Contributor

thanm commented Nov 28, 2023

I'll take a look, most likely a bit later this week.

Author

kgibm commented Nov 28, 2023

Thanks. I should have added that the error is about the manager Go executable (from which the core was produced) and that also looks okay:

# readelf -a manager | head
ELF Header:
  Magic:   7f 45 4c 46 02 02 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, big endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           IBM S/390
  Version:                           0x1

Contributor

thanm commented Nov 29, 2023

Hi @kgibm , I spent a little while exploring this issue.

I have limited access to our s390x machine (no ssh, only gomote commands), and our machine does not have "gcore", but I did get as far as creating a core dump using a toy program and then running 'viewcore' on the dump.

The good news is that viewcore was able to read the DWARF info for my toy program; the bad news is that it ran into problems later on (crash due to a different error of some sort). The crash I am seeing is

goroutine 1 [running]:
golang.org/x/debug/internal/core.(*Thread).SP(...)
	/data/golang/workdir/xxx/debug/internal/core/thread.go:32
golang.org/x/debug/internal/gocore.(*Process).readG(0xc000394000, {0xc000394000, 0xc000002340, 0xc000124b40})
	/data/golang/workdir/xxx/debug/internal/gocore/process.go:688 +0xa7e
golang.org/x/debug/internal/gocore.(*Process).readGs(0xc000394000)
	/data/golang/workdir/xxx/debug/internal/gocore/process.go:648 +0x118
golang.org/x/debug/internal/gocore.Core(0xc00038a000)
	/data/golang/workdir/xxx/debug/internal/gocore/process.go:173 +0x4f4
main.readCore()
	/data/golang/workdir/xxx/debug/cmd/viewcore/main.go:277 +0xc2
main.runMappings(0x17dd9a0, {0xc000098f20, 0x0, 0x1})
	/data/golang/workdir/xxx/debug/cmd/viewcore/main.go:406 +0x2a

This happens fairly early in the coredump reading process: viewcore is having trouble making sense of the "g" structure for one of the program's goroutines.

Viewcore is only sporadically/sparsely maintained, so it is entirely possible that something has changed in the runtime data structures in the 1.21 timeframe that is throwing it off.

Some questions for you:

  • what sort of output do you see from the command

    readelf --debug-dump=info manager | head -40

  • in your crash the DWARF reader is reporting a zero offset for the .debug_info section; is this section actually present in your binary? what do you see when you run the command

    readelf --wide -S manager | fgrep debug_info

Thanks.

Author

kgibm commented Nov 29, 2023

@thanm

  • what sort of output do you see from the command
    readelf --debug-dump=info manager | head -40

Empty

  • in your crash the DWARF reader is reporting a zero offset for the .debug_info section; is this section actually present in your binary? what do you see when you run the command
    readelf --wide -S manager | fgrep debug_info

Empty

I guess this explains my proximate issue. The file command also reports the executable as "stripped", which I presume is another symptom of the missing DWARF information:

$ file manager
manager: ELF 64-bit MSB executable, IBM S/390, version 1 (SYSV), statically linked, Go BuildID=Kz1lgHB4w_kxUOvlQqtu/tq5tP0TciDudwbRV8f6M/f7kzpDaG_HNG7APeD9w6/3cdjWHx8fvnsL3NCm5MQ, stripped

I will pass this on to the team producing the package. This is based on the Kubernetes operator-lifecycle-manager so I'm not sure if OLM is doing this or our build scripts.

If you have any tips on how to check what is causing the stripping and/or how to maximize symbols, that would be helpful.

Thanks!

Contributor

thanm commented Nov 29, 2023

Yup, does indeed look like a stripped executable.

For a number of other big binaries like kubelet, Kubernetes adds Go linker options to strip things that aren't explicitly being used for debugging (if I recall correctly). E.g.

https://github.com/kubernetes/kubernetes/blob/3c268b752448f66ba8338cb62ff9a4a14f77873b/hack/lib/golang.sh#L887

So I think the thing to do would be to check if the build is being done with go build -o manager ... -ldflags="-s -w" (where "-s" omits the symbol table and debug information and "-w" omits the DWARF symbol table). HTH.
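
One way to check this from the binary itself, without access to the build scripts, is to read the build settings that the Go toolchain embeds in executables (Go 1.18 and later). The sketch below uses the standard debug/buildinfo package. The helper name stripFlags is hypothetical, and splitting the recorded value on whitespace is a simplifying assumption that ignores quoted arguments. The -ldflags setting may not be recorded in every build configuration, and a binary can also be stripped after the build by an external tool, so a missing entry is not proof either way.

```go
package main

import (
	"debug/buildinfo"
	"fmt"
	"os"
	"strings"
)

// stripFlags reports whether a recorded -ldflags value contains the
// standalone linker flags -s and -w. The name is hypothetical, and the
// whitespace split is a simplification that ignores shell-style quoting.
func stripFlags(ldflags string) (s, w bool) {
	for _, field := range strings.Fields(ldflags) {
		switch field {
		case "-s":
			s = true
		case "-w":
			w = true
		}
	}
	return s, w
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: ldflagscheck <go-binary>")
		os.Exit(2)
	}
	info, err := buildinfo.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error reading build info:", err)
		os.Exit(1)
	}
	for _, setting := range info.Settings {
		if setting.Key == "-ldflags" {
			s, w := stripFlags(setting.Value)
			fmt.Printf("-ldflags=%q (-s: %v, -w: %v)\n", setting.Value, s, w)
			return
		}
	}
	fmt.Println("no -ldflags setting recorded in this binary")
}
```

The same settings can be listed with `go version -m manager`.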

Author

kgibm commented Nov 29, 2023

Yes, confirmed we are using go build -ldflags="-s -w". We'll reconsider. This explains the issue, so I'll close it out. For now, we'll switch to calling debug.WriteHeapDump on SIGUSR1 to produce a heap dump instead of trying to use viewcore. Thanks.

kgibm closed this as completed Nov 29, 2023