cmd/link, debug/dwarf: missing debug information using gdb set breakpoint on Entry point #38192

Closed
kumakichi opened this issue Apr 1, 2020 · 10 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
@kumakichi
Contributor

What version of Go are you using (go version)?

$ go version 1.14.1 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/root/gopath"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/root/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/root/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build429033440=/tmp/go-build -gno-record-gcc-switches"

What did you do?

package main

func main() {
	println("hello world")
}

$ go build -gcflags "all=-N -l" demo.go
$ gdb ./demo
(gdb) info files
Symbols from "/dev/shm/ex/demo".
Local exec file:
`/dev/shm/ex/demo', file type elf64-x86-64.
Entry point: 0x4542a0
...

What did you expect to see?

(gdb) b *0x4542a0
Breakpoint 1 at 0x4542a0: file /dev/shm/ex/go/src/runtime/rt0_linux_amd64.s, line 8.

What did you see instead?

(gdb) b *0x45cdd0
Breakpoint 1 at 0x45cdd0

The file and line information is missing. I tested go1.13.9 and it works fine; only go1.14 and go1.14.1 have this problem.

@thanm
Contributor

thanm commented Apr 1, 2020

I looked at this a little.

There were a number of changes in DWARF generation between 1.13 and 1.14, but I think the one that seems to be most relevant is that chunks of the line table are emitted directly by the compiler as opposed to being synthesized in the linker.

As part of moving more of line table generation into the compiler, things were changed so that each Go object file is given its own DWARF compilation unit, as opposed to a single compilation unit per package. So if you have a package ABC with a couple of *.go files and one *.s file, at the DWARF level you'll see two compilation units, one for the Go code and one for the assembly.

In 1.13 the routine in question is just a blob within the giant runtime compilation unit. Here's what the relevant fragment from the line table looks like (this is objdump --dwarf=rawline):

  ...
  [0x00017b63]  Advance Line by -511 to 12
  [0x00017b66]  Special opcode 60: advance Address by 6 to 0x44d720 and Line by -4 to 8
  [0x00017b67]  Set File Name to entry 83 in the File Name Table
  [0x00017b69]  Advance Line by 38 to 46

The entrypoint in question (_rt0_amd64_linux) is at address 0x44d720 above. Now here's what things look like in 1.14:

  [0x0001e5d0]  Extended opcode 2: set Address to 0x4558f0
  [0x0001e5db]  Set File Name to entry 1 in the File Name Table
  [0x0001e5dd]  Advance Line by 2 to 3
  [0x0001e5df]  Special opcode 9: advance Address by 0 to 0x4558f0 and Line by 5 to 8
  [0x0001e5e0]  Set File Name to entry 1 in the File Name Table
  [0x0001e5e2]  Advance Line by -7 to 1
  [0x0001e5e4]  Copy (view 1)
  [0x0001e5e5]  Extended opcode 1: End of Sequence

In the dump above, _rt0_amd64_linux is at address 0x4558f0.
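To make the "Special opcode" lines in the two dumps easier to read, here is a small sketch of how a special opcode splits into an address advance and a line advance. The header parameters (line_base of -4, line_range of 10, min_inst_length of 1) are assumptions inferred from the dumps rather than read from the binary, and the opcode numbers are taken to be the adjusted values that objdump prints (raw opcode minus opcode_base).

```go
package main

import "fmt"

// Line-program header parameters. These values are an assumption inferred
// from the dumps above, not read from the binary's header.
const (
	lineBase  = -4 // smallest line delta a special opcode can encode
	lineRange = 10 // number of distinct line deltas per address step
)

// decodeSpecial splits an adjusted special opcode (the number objdump prints,
// i.e. the raw opcode minus opcode_base) into its address and line advances.
// It assumes min_inst_length is 1.
func decodeSpecial(adjusted int) (addrAdvance, lineAdvance int) {
	addrAdvance = adjusted / lineRange
	lineAdvance = lineBase + adjusted%lineRange
	return addrAdvance, lineAdvance
}

func main() {
	// 60 appears in the 1.13 dump, 9 in the 1.14 dump.
	for _, op := range []int{60, 9} {
		addr, line := decodeSpecial(op)
		fmt.Printf("special opcode %d: address +%d, line %+d\n", op, addr, line)
	}
}
```

Under these assumptions opcode 60 decodes to "address by 6, line by -4" and opcode 9 to "address by 0, line by 5", which agrees with what objdump reports in both dumps.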

What's interesting here is that the first part is OK (it sets the file correctly to rt0_linux_amd64.s and the line to 8), but then it seems to come along and apply a second line number of 1 to the same location. Looking at things in the decodedline dump I see:

  CU: /ssd2/go.master/src/runtime/rt0_linux_amd64.s:
  File name                            Line number    Starting address    View    Stmt

  .//ssd2/go.master/src/runtime/rt0_linux_amd64.s:[++]
  aster/src/runtime/rt0_linux_amd64.s            8            0x4558f0               x

  .//ssd2/go.master/src/runtime/rt0_linux_amd64.s:[++]
  aster/src/runtime/rt0_linux_amd64.s            1            0x4558f0       1       x
  aster/src/runtime/rt0_linux_amd64.s            1            0x4558f0       2       x

Note the "view" -- I am not really sure what objdump is trying to say here (since "view" is not a real register in the DWARF line table AFAIK) but it doesn't look quite right to have the location be both line 8 and line 1. I am speculating that this is what's confusing GDB.

CC'ing @jeremyfaller , since he did the work there.

@thanm thanm added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Apr 1, 2020
@thanm thanm added this to the Go1.15 milestone Apr 1, 2020
@odeke-em odeke-em changed the title go1.14/go1.14.1 missing debug information using gdb set breakpoint on Entry point cmd/link, debug/dwarf: missing debug information using gdb set breakpoint on Entry point Apr 5, 2020
@thanm
Contributor

thanm commented Apr 16, 2020

Oops, got the wrong github user when I pinged before. Trying again: @jeremyfaller

@jeremyfaller
Contributor

thanks, @thanm. I'll take a look.

@odeke-em
Member

@thanm @jeremyfaller shall we move this change perhaps to Go1.16 instead?

@thanm
Contributor

thanm commented May 29, 2020

I think @jeremyfaller is tied up with other things this week. I'll see if I can figure out a fix. Stay tuned.

@thanm
Contributor

thanm commented May 29, 2020

I spent some time working on this bug. It is an interesting puzzle (seems like this happens a lot with DWARF bugs).

When I first looked at this problem, I assumed that the confusion on the part of GDB was due to this code:

https://go.googlesource.com/go/+/ee3dded36d69264998c39af0ec851371850d842b/src/cmd/internal/obj/dwarf.go#139

which is one of the main places where the DWARF line table contents are different between 1.13 and 1.14, in addition to the finer granularity of compile units.

After spending some time debugging and experimenting, I'm not sure my original theory holds water. The problem looks more like a quirk in how GDB reads the line table and how it handles the end_sequence op.

The translation unit in question contains (after dead code elimination) a single function (this is from an assembly source):

  TEXT _rt0_amd64_linux(SB),NOSPLIT,$-8
	JMP	_rt0_amd64(SB)

The line table fragment from this looks like:

 Line Number Statements:
  [0x00000903]  Extended opcode 2: set Address to 0x464f00
  [0x0000090e]  Set File Name to entry 1 in the File Name Table
  [0x00000910]  Advance Line by 2 to 3
  [0x00000912]  Special opcode 9: advance Address by 0 to 0x464f00 and Line by 5 to 8
  [0x00000913]  Set File Name to entry 1 in the File Name Table
  [0x00000915]  Advance Line by -7 to 1
  [0x00000917]  Copy (view 1)
  [0x00000918]  Extended opcode 1: End of Sequence

So to summarize, there are two rows (one from the special opcode at 0x912 and the next from the copy at 0x917), followed by the end_sequence at 0x918.

I ran the program under GDB using a hidden maintenance command that traces the GDB line table reader:

  (gdb) set debug dwarf-line 1
  (gdb) b *0x464f00
  Processing actual line 8: file 1, address 0x464f00, is_stmt 1, discrim 0
  Recording line 8, file rt0_linux_amd64.s, address 0x464f00
  Processing actual line 8: file 1, address 0x464f00, is_stmt 1, discrim 0	(end sequence)
  Finishing current line, file rt0_linux_amd64.s, address 0x464f00
  Recording line 0, file rt0_linux_amd64.s, address 0x464f00
  Breakpoint 1 at 0x464f00
  (gdb) run

Note the two "Recording line" trace lines: these correspond to the point where the GDB line table reader takes the current contents of the line table registers (according to its decoder) and copies them into its own internal representation of the line table. The first one looks good (line 8, all fine here), but then there is the second:

Recording line 0, file rt0_linux_amd64.s, address 0x464f00

This "record line" operation is the one being triggered by the end_sequence op, and it effectively overwrites the original line of 8 (for PC 0x464f00) with a line of zero. The 0-valued line seems to be the thing doing the damage here.
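The overwrite can be illustrated with a toy model. This is only a sketch of the behaviour visible in the trace, not GDB's actual implementation: the `row` type and the `record` function are hypothetical names, and the model simply stores one line per address, with an end_sequence row stored as line 0.

```go
package main

import "fmt"

// row is one line-table row as a consumer sees it. endSequence marks the row
// produced by the DW_LNE_end_sequence op.
type row struct {
	addr        uint64
	line        int
	endSequence bool
}

// record is a simplified, hypothetical model of the behaviour seen in the
// trace: every row is stored under its address, and an end_sequence row is
// stored as line 0. A later row at the same address replaces the earlier one.
func record(rows []row) map[uint64]int {
	table := make(map[uint64]int)
	for _, r := range rows {
		if r.endSequence {
			table[r.addr] = 0
			continue
		}
		table[r.addr] = r.line
	}
	return table
}

func main() {
	// Rows from the Go 1.14 trace: line 8, then end_sequence at the same PC.
	go114 := []row{
		{addr: 0x464f00, line: 8},
		{addr: 0x464f00, endSequence: true},
	}
	fmt.Println("line recorded for 0x464f00:", record(go114)[0x464f00])
}
```

In this model the entry for 0x464f00 ends up as line 0, because the end_sequence row shares its address with the only real row in the sequence.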

I did some more experiments. First, I changed the assembly source to:

  TEXT _rt0_amd64_linux(SB),NOSPLIT,$-8
	JMP	_rt0_amd64(SB)
        NOP

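This gets rid of the problem completely: when the end_sequence operator is encountered, the line-0 row it triggers no longer lands on the JMP, so the line 8 entry for the entry point survives.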

I also spent some time looking at how various C compilers handle this same situation. Here's a small C routine:

 void empty() {
 }

This code gets compiled down to a single instruction ("ret"). However if you play the same game with GDB for this function, you get a different story. Example:

  (gdb) set debug dwarf-line 1
  (gdb) p &empty
  Processing actual line 1: file 1, address 0x11eb, is_stmt 1, discrim 0
  Recording line 1, file empty.c, address 0x11eb
  Processing actual line 2: file 1, address 0x11eb, is_stmt 1, discrim 0
  Recording line 2, file empty.c, address 0x11eb
  Processing actual line 2: file 1, address 0x11ec, is_stmt 1, discrim 0	(end sequence)
  Finishing current line, file empty.c, address 0x11ec
  Recording line 0, file empty.c, address 0x11ec
  $1 = (int (*)(int)) 0x11eb <empty>
  (gdb) disas empty
  Dump of assembler code for function empty:
    0x00000000000011eb <+0>:	retq   
  (gdb) 

Note what the line table reader is telling us. Even though 'empty' is only a single instruction long, the line table for it advances the PC past that instruction to the next instruction before issuing the end_sequence. So when the end_sequence triggers recording of a row, the "line 0" is applied to the next instruction, not the inst in "empty".

@gopherbot

Change https://golang.org/cl/235739 mentions this issue: cmd/{compile,link}: fix problem with DWARF end_sequence ops

@thanm thanm self-assigned this May 29, 2020
@thanm thanm added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels May 29, 2020
@odeke-em
Member

Nice work and investigation, thank you @thanm!

@gopherbot

Change https://golang.org/cl/235917 mentions this issue: cmd/link: new DWARF line table test case

gopherbot pushed a commit that referenced this issue Jun 3, 2020
Add a test case for an issue with how Go emits DWARF line tables,
specifically relating to the line table "end sequence" operator.

Updates #38192.

Change-Id: I878b262e6ca6c550c0e460c3d5a1969ac4a2c31b
Reviewed-on: https://go-review.googlesource.com/c/go/+/235917
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
@gopherbot

Change https://golang.org/cl/258422 mentions this issue: [release-branch.go1.15] cmd/{compile,link}: backport fix for issue 39757

gopherbot pushed a commit that referenced this issue Oct 23, 2020
During Go 1.15 development, a fix was added to the toolchain for an
issue involving DWARF line table information. The 1.14 line tables
were slightly malformed in the way that they used the DWARF "end
sequence" operator, resulting in incorrect line table info for the
final instruction in the final function of a compilation unit.

This problem was fixed in https://golang.org/cl/235739, which made it
into Go 1.15. It now appears that while the fix works OK for linux, in
certain cases it causes issues with the Darwin linker (the "address
not in any section" ld64 error reported in issue #40974).

During Go 1.16 development, the fix in https://golang.org/cl/235739
was revised so as to fix another related problem (described in issue #39757);
the newer fix does not trigger the problem in the Darwin linker however.

This CL back-ports the changes in https://golang.org/cl/239286 to the
1.15 release branch, so as to fix the Darwin linker error.

Updates #38192.
Updates #39757.
Fixes #40974.

Change-Id: I9350fec4503cd3a76b97aaea0d8aed1511662e29
Reviewed-on: https://go-review.googlesource.com/c/go/+/258422
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Jeremy Faller <jeremy@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Trust: Than McIntosh <thanm@google.com>
@golang golang locked and limited conversation to collaborators Sep 30, 2021
@rsc rsc unassigned thanm Jun 23, 2022