cmd/link: dwarf sections cause program not to run on Windows #20183

Closed
egonelbre opened this issue Apr 30, 2017 · 22 comments

Comments

@egonelbre
Contributor

egonelbre commented Apr 30, 2017

What did you do?

I have a program that fails to run with compiler 29f0619 and later.

package main

import "C"
import _ "net/http/pprof"

func main() {}

Reverting that specific commit fixes the problem.

What did you expect to see?

Program starts as usual.

What did you see instead?

I see a message "This app cannot run on your PC".

System details

Microsoft Windows [Version 10.0.10586]

go version devel +29f0619 Wed Mar 1 04:51:03 2017 +0000 windows/amd64
GOARCH="amd64"
GOBIN=""
GOEXE=".exe"
GOHOSTARCH="amd64"
GOHOSTOS="windows"
GOOS="windows"
GOPATH="F:\Go"
GORACE=""
GOROOT="C:\Go.tip"
GOTOOLDIR="C:\Go.tip\pkg\tool\windows_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-m64 -mthreads -fmessage-length=0 -fdebug-prefix-map=C:\Users\Egon\AppData\Local\Temp\go-build971827527=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOROOT/bin/go version: go version devel +29f0619 Wed Mar 1 04:51:03 2017 +0000 windows/amd64
GOROOT/bin/go tool compile -V: compile version devel +29f0619 Wed Mar 1 04:51:03 2017 +0000 X:framepointer
lldb --version: lldb version 4.0.0
gdb --version: GNU gdb (GDB) 7.9.1

CC: @alexbrainman

@egonelbre
Contributor Author

Running go tool objdump gives:

F:\Go\src\otsim.ee\otsim>go tool objdump otsim.exe
objdump: disassemble otsim.exe: runtime.pclntab and runtime.epclntab symbols must be in the same section

After guessing and disabling writing of gdbscript in ld/dwarf.go, it works:

 // If the pcln table contains runtime/runtime.go, use that to set gdbscript path.
 func finddebugruntimepath(s *Symbol) {
+	return
 	if gdbscript != "" {
 		return
 	}

The main difference between the binaries seems to be that the "/4" section comes first (before ".text") in the broken version, whereas in the good version it comes after ".tls".

Not sure what the proper fix is though.
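
To compare the section order of two builds without a separate PE viewer, a minimal sketch using Go's standard debug/pe package could look like the following. The helper names (sectionInfo, collect, format) are made up for this illustration. debug/pe resolves long section names through the string table, so an entry shown as "/4" elsewhere should appear under its full name here.

```go
package main

import (
	"debug/pe"
	"fmt"
	"os"
)

// sectionInfo holds the header fields compared in this thread.
// It is a hypothetical helper type, not part of debug/pe.
type sectionInfo struct {
	Name           string
	VirtualAddress uint32 // RVA, relative to the image base
	Offset         uint32 // file offset of the raw section data
}

// collect copies the interesting header fields, preserving the
// order of the section table in the file.
func collect(f *pe.File) []sectionInfo {
	infos := make([]sectionInfo, 0, len(f.Sections))
	for _, s := range f.Sections {
		infos = append(infos, sectionInfo{s.Name, s.VirtualAddress, s.Offset})
	}
	return infos
}

// format renders one line per section: index, name, RVA, file offset.
func format(infos []sectionInfo) string {
	var out string
	for i, s := range infos {
		out += fmt.Sprintf("%d %s rva=0x%x off=0x%x\n", i, s.Name, s.VirtualAddress, s.Offset)
	}
	return out
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: pesections file.exe")
		os.Exit(2)
	}
	f, err := pe.Open(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()
	fmt.Print(format(collect(f)))
}
```

Running it on the good and the broken executable and diffing the output should show where the extra section lands in the section table.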

@alexbrainman
Member

I can reproduce this with:

diff --git a/src/cmd/link/internal/ld/dwarf.go b/src/cmd/link/internal/ld/dwarf.go
index 1f80f8c..e934ce5a 100644
--- a/src/cmd/link/internal/ld/dwarf.go
+++ b/src/cmd/link/internal/ld/dwarf.go
@@ -871,7 +871,7 @@ func finddebugruntimepath(s *Symbol) {
 
 	for i := range s.FuncInfo.File {
 		f := s.FuncInfo.File[i]
-		if i := strings.Index(f.Name, "runtime/runtime.go"); i >= 0 {
+		if i := strings.Index(f.Name, "runtime/proc.go"); i >= 0 {
 			gdbscript = f.Name[:i] + "runtime/runtime-gdb.py"
 			break
 		}
diff --git a/src/debug/pe/file_test.go b/src/debug/pe/file_test.go
index 7957083..6986442 100644
--- a/src/debug/pe/file_test.go
+++ b/src/debug/pe/file_test.go
@@ -308,7 +308,7 @@ func testDWARF(t *testing.T, linktype int) {
 	if err != nil {
 		t.Fatal(err)
 	}
-	defer os.RemoveAll(tmpdir)
+	//defer os.RemoveAll(tmpdir)
 
 	src := filepath.Join(tmpdir, "a.go")
 	file, err := os.Create(src)
@@ -334,7 +334,7 @@ func testDWARF(t *testing.T, linktype int) {
 	case linkCgoInternal:
 		args = append(args, "-ldflags", "-linkmode=internal")
 	case linkCgoExternal:
-		args = append(args, "-ldflags", "-linkmode=external")
+		args = append(args, "-ldflags", "-linkmode=external -tmpdir=" + tmpdir)
 	default:
 		t.Fatalf("invalid linktype parameter of %v", linktype)
 	}
@@ -343,6 +343,7 @@ func testDWARF(t *testing.T, linktype int) {
 	if err != nil {
 		t.Fatalf("building test executable failed: %s %s", err, out)
 	}
+	t.Logf("%v\n", string(out))
 	out, err = exec.Command(exe).CombinedOutput()
 	if err != nil {
 		t.Fatalf("running test executable failed: %s %s", err, out)

C:\dev\go\src\debug\pe>go install -v cmd/link && go test -v -run=Ext
=== RUN   TestExternalLinkerDWARF
--- FAIL: TestExternalLinkerDWARF (1.17s)
        file_test.go:346:
        file_test.go:349: running test executable failed: fork/exec C:\Users\brainman\AppData\Local\Temp\TestDWARF829794735\a.exe: %1 is not a valid Win32 application.
FAIL
exit status 1
FAIL    debug/pe        1.210s

C:\dev\go\src\debug\pe>

After guessing and disabling writing of gdbscript in ld/dwarf.go, it works:

I suspect your program (unlike everyone else's) uses some symbols from runtime/runtime.go. This triggers the extra .debug_gdb_scripts section to be included in your exe. If you do not want .debug_gdb_scripts (like everyone else), your patch is good for that.

Maybe we could just do what you did for all Windows users. But I would like to understand what .debug_gdb_scripts is for before we ignore it.

Thank you for creating this issue.

Alex
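
The trigger described above can be isolated in a small standalone sketch. It reproduces only the string check quoted from finddebugruntimepath in the diff. The function name gdbScriptPath and the example paths are assumptions for illustration, and the real linker walks the files recorded for each function rather than taking a single string.

```go
package main

import (
	"fmt"
	"strings"
)

// gdbScriptPath applies the same check as the quoted linker code: if the
// file name contains "runtime/runtime.go", the part before it is reused as
// the prefix of the runtime-gdb.py path. The second result reports whether
// the marker was found at all.
func gdbScriptPath(fileName string) (string, bool) {
	const marker = "runtime/runtime.go"
	i := strings.Index(fileName, marker)
	if i < 0 {
		return "", false
	}
	return fileName[:i] + "runtime/runtime-gdb.py", true
}

func main() {
	// Hypothetical file names; only the first one contains the marker.
	for _, name := range []string{
		"C:/Go.tip/src/runtime/runtime.go",
		"C:/Go.tip/src/runtime/proc.go",
	} {
		p, ok := gdbScriptPath(name)
		fmt.Printf("%s -> %q (found=%v)\n", name, p, ok)
	}
}
```

When no file name matches, no script path is set, which fits the observation that most programs never get the extra section.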

@egonelbre
Contributor Author

debug_gdb_scripts seems to be for formatting Go structures in gdb. As a temporary fix, it sounds good, but I'm guessing there is a proper fix. Unfortunately, dwarf is beyond my knowledge at this moment.

The only thing I was able to notice was that the order of file sections was different when the gdb_script was included (seen with PEInsider)... My guess is that when the ordering is fixed, then the problem should disappear. Which leads me to conclude, that either there must be some buggy section sorting somewhere or that order of adding the sections to the list itself is the problem. (But as I said, at the moment this stuff is beyond my knowledge.)

@alexbrainman
Member

debug_gdb_scripts seems to be for formatting Go structures in gdb. As a temporary fix, it sounds good, but I'm guessing there is a proper fix.

Yes, there must be a proper fix. But I do not know what it is.
What I am proposing will make your program build again. Would you miss the debug_gdb_scripts section?

Unfortunately, dwarf is beyond my knowledge at this moment.

Welcome to the club.

The only thing I was able to notice was that the order of file sections was different when the gdb_script was included (seen with PEInsider)...

Yes.

C:\Users\brainman\AppData\Local\Temp\TestDWARF360094087>objdump -h a.exe

a.exe:     file format pei-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .debug_gdb_scripts 00000026  0000000100000000  0000000100000000  00000480  2**0
                  CONTENTS, READONLY, DEBUGGING
  1 .text         00124cd8  0000000000401000  0000000000401000  00000800  2**5
                  CONTENTS, ALLOC, LOAD, READONLY, CODE, DATA
  2 .data         00003c50  0000000000526000  0000000000526000  00125600  2**5
                  CONTENTS, ALLOC, LOAD, DATA
  3 .rdata        000009c0  000000000052a000  000000000052a000  00129400  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .pdata        000002ac  000000000052b000  000000000052b000  00129e00  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 .xdata        00000228  000000000052c000  000000000052c000  0012a200  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  6 .bss          0001f720  000000000052d000  000000000052d000  00000000  2**6
                  ALLOC
  7 .idata        00000ce8  000000000054d000  000000000054d000  0012a600  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  8 .CRT          00000068  000000000054e000  000000000054e000  0012b400  2**3
                  CONTENTS, ALLOC, LOAD, DATA
  9 .tls          00000068  000000000054f000  000000000054f000  0012b600  2**5
                  CONTENTS, ALLOC, LOAD, DATA
 10 .debug_aranges 000001c0  0000000000550000  0000000000550000  0012b800  2**4
                  CONTENTS, READONLY, DEBUGGING
 11 .debug_pubnames 00005056  0000000000551000  0000000000551000  0012ba00  2**0
                  CONTENTS, READONLY, DEBUGGING
 12 .debug_pubtypes 0000868c  0000000000557000  0000000000557000  00130c00  2**0
                  CONTENTS, READONLY, DEBUGGING
 13 .debug_info   00048439  0000000000560000  0000000000560000  00139400  2**0
                  CONTENTS, READONLY, DEBUGGING
 14 .debug_abbrev 000008b8  00000000005a9000  00000000005a9000  00181a00  2**0
                  CONTENTS, READONLY, DEBUGGING
 15 .debug_line   0001539e  00000000005aa000  00000000005aa000  00182400  2**0
                  CONTENTS, READONLY, DEBUGGING
 16 .debug_frame  00012088  00000000005c0000  00000000005c0000  00197800  2**3
                  CONTENTS, READONLY, DEBUGGING
 17 .debug_str    0000001f  00000000005d3000  00000000005d3000  001a9a00  2**0
                  CONTENTS, READONLY, DEBUGGING
 18 .debug_loc    00000435  00000000005d4000  00000000005d4000  001a9c00  2**0
                  CONTENTS, READONLY, DEBUGGING

C:\Users\brainman\AppData\Local\Temp\TestDWARF360094087>objdump -h go.o

go.o:     file format pe-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         001228e3  0000000000000000  0000000000000000  00000404  2**5
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE, DATA
  1 .data         00003ba0  0000000000000000  0000000000000000  00122ce7  2**5
                  CONTENTS, ALLOC, LOAD, RELOC, DATA
  2 .bss          0001ec60  0000000000000000  0000000000000000  00000000  2**5
                  ALLOC
  3 .debug_abbrev 000000ff  0000000000000000  0000000000000000  00126887  2**0
                  CONTENTS, READONLY, DEBUGGING
  4 .debug_line   00014e7a  0000000000000000  0000000000000000  00126986  2**0
                  CONTENTS, RELOC, READONLY, DEBUGGING
  5 .debug_frame  00011d9c  0000000000000000  0000000000000000  0013b800  2**0
                  CONTENTS, RELOC, READONLY, DEBUGGING
  6 .debug_pubnames 00005056  0000000000000000  0000000000000000  0014d59c  2**0
                  CONTENTS, RELOC, READONLY, DEBUGGING
  7 .debug_pubtypes 0000868c  0000000000000000  0000000000000000  001525f2  2**0
                  CONTENTS, RELOC, READONLY, DEBUGGING
  8 .debug_aranges 00000030  0000000000000000  0000000000000000  0015ac7e  2**0
                  CONTENTS, RELOC, READONLY, DEBUGGING
  9 .debug_gdb_scripts 00000026  0000000000000000  0000000000000000  0015acae  2**0
                  CONTENTS, READONLY, DEBUGGING
 10 .debug_info   00045939  0000000000000000  0000000000000000  0015acd4  2**0
                  CONTENTS, RELOC, READONLY, DEBUGGING
 11 .ctors        00000008  0000000000000000  0000000000000000  001a060d  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA

C:\Users\brainman\AppData\Local\Temp\TestDWARF360094087>

Notice how go.o (cmd/link generates that) has .debug_gdb_scripts section at position 9, while a.exe (gcc generates that) has .debug_gdb_scripts first on the list?

My guess is that when the ordering is fixed, then the problem should disappear. Which leads me to conclude, that either there must be some buggy section sorting somewhere or that order of adding the sections to the list itself is the problem. (But as I said, at the moment this stuff is beyond my knowledge.)

I don't know what the problem is. I will try to debug this, but I am not sure it is going to be quick.

Alex
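
In the a.exe listing above, .debug_gdb_scripts has a VMA of 0x100000000, far away from every other section. The PE/COFF specification asks for the section virtual addresses of an image to be ascending and adjacent, which may be why the loader rejects the file. A minimal sketch of such an ordering check, with the section type and function name invented for this example and the addresses copied from the first rows of the listing:

```go
package main

import "fmt"

// section is a hypothetical record holding one row of an objdump -h listing.
type section struct {
	Name string
	VMA  uint64
}

// firstOutOfOrder returns the index of the first section whose VMA is not
// greater than that of the section before it, or -1 if all are ascending.
func firstOutOfOrder(secs []section) int {
	for i := 1; i < len(secs); i++ {
		if secs[i].VMA <= secs[i-1].VMA {
			return i
		}
	}
	return -1
}

func main() {
	// VMAs taken from the first three rows of the a.exe listing.
	bad := []section{
		{".debug_gdb_scripts", 0x100000000},
		{".text", 0x401000},
		{".data", 0x526000},
	}
	if i := firstOutOfOrder(bad); i >= 0 {
		fmt.Printf("%s (%#x) follows %s (%#x)\n",
			bad[i].Name, bad[i].VMA, bad[i-1].Name, bad[i-1].VMA)
	}
}
```

The check only detects that the order is broken. Deciding which of the two sections is misplaced still requires looking at the listing.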

@egonelbre
Contributor Author

egonelbre commented May 2, 2017

Found a standalone example that can be used to reproduce this (and doesn't require adjustments to the compiler):

package main

import "C"
import _ "net/http/pprof"

func main() {}

@egonelbre
Contributor Author

It definitely is caused by the .debug_gdb_scripts, because this fixes the problem:

> objcopy --remove-section .debug_gdb_scripts bad.exe bad.fixed.exe
> bad.fixed.exe

Best theory seems to be that the alignment/file-offset is affecting it because:

> objcopy --file-alignment 0x400 bad.exe bad.aligned.exe
> objdump -h bad.aligned.exe

...
 17 .debug_loc    000030a5  00000000005e6000  00000000005e6000  001bd000  2**0
                  CONTENTS, READONLY, DEBUGGING
 18 .debug_ranges 00000520  00000000005ea000  00000000005ea000  001c0400  2**0
                  CONTENTS, READONLY, DEBUGGING
 19 .debug_gdb_scripts 00000026  0000000100000000  0000000100000000  001c0c00  2**0
                  CONTENTS, READONLY, DEBUGGING

> objcopy --change-section-addr .debug_gdb_scripts=0x5eb000 bad.aligned.exe bad.fixed.exe

Then it starts working. (Initial file http://egonelbre.com/files/go/bad.exe).
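
The address 0x5eb000 used above is consistent with placing the section directly after .debug_ranges (VMA 0x5ea000, size 0x520), rounded up to the next section boundary. A small worked sketch of that arithmetic follows. The 0x1000 section alignment is an assumption inferred from the spacing in the listing, and alignUp is a helper written for this example.

```go
package main

import "fmt"

// alignUp rounds addr up to the next multiple of align.
// align must be a power of two.
func alignUp(addr, align uint64) uint64 {
	return (addr + align - 1) &^ (align - 1)
}

func main() {
	const sectionAlign = 0x1000 // assumption: inferred from the listing, not stated in the thread

	// End of .debug_ranges: VMA 0x5ea000 plus size 0x520.
	end := uint64(0x5ea000 + 0x520)

	fmt.Printf("%#x\n", alignUp(end, sectionAlign)) // prints 0x5eb000
}
```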

PS: it seems that Rust has hit a similar issue in rust-lang/rust#25229.

@egonelbre
Contributor Author

And here is a standalone gcc program that has the same problem:

int main(int argc, char const *argv[]) { return 0; }

__attribute__ ((section(".debug_gdb_scripts")))
char data[] = "C:\\debug.py";

@alexbrainman
Member

And here is a standalone gcc program that has the same problem:

Indeed:

C:\tmp>gcc -o a.exe a.c | objdump -h a.exe

a.exe:     file format pei-i386

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .debug_gdb_scripts 0000000c  00000000  00000000  00000380  2**2
                  CONTENTS, READONLY, DEBUGGING
  1 .text         00000c44  00401000  00401000  00000600  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE, DATA
  2 .data         00000010  00402000  00402000  00001400  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  3 .rdata        00000124  00403000  00403000  00001600  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .eh_frame     000003a0  00404000  00404000  00001800  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 .bss          00000060  00405000  00405000  00000000  2**2
                  ALLOC
  6 .idata        00000364  00406000  00406000  00001c00  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  7 .CRT          00000018  00407000  00407000  00002000  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  8 .tls          00000020  00408000  00408000  00002200  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  9 .debug_aranges 00000018  00409000  00409000  00002400  2**0
                  CONTENTS, READONLY, DEBUGGING
 10 .debug_info   00000dc5  0040a000  0040a000  00002600  2**0
                  CONTENTS, READONLY, DEBUGGING
 11 .debug_abbrev 000000a9  0040b000  0040b000  00003400  2**0
                  CONTENTS, READONLY, DEBUGGING
 12 .debug_line   000000d1  0040c000  0040c000  00003600  2**0
                  CONTENTS, READONLY, DEBUGGING

C:\tmp>

Let's see if @ianlancetaylor can help.

Alex

@egonelbre
Contributor Author

It also seems that as long as the section name isn't in some predefined set, it breaks, e.g.:

Works:

int main(int argc, char const *argv[]){ return 0; }
__attribute__ ((section(".debug_pubnames")))
char data[] = "C:\\debug.py";

Doesn't work:

int main(int argc, char const *argv[]){ return 0; }
__attribute__ ((section(".debug_xxx")))
char data[] = "C:\\debug.py";

Guess: somewhere in the linker there is a table that is used to look up VMA properties for debug sections, but the default value returned (for things not in the table) is inappropriate for Windows.

@ianlancetaylor changed the title from "cmd/link: drawf sections cause program not to run on Windows" to "cmd/link: dwarf sections cause program not to run on Windows" on May 4, 2017
@ianlancetaylor
Contributor

The GNU linker source files that fail to list the .debug_gdb_scripts section are ld/scripttempl/pe.sc and ld/scripttempl/pep.sc. The best fix is to handle this case in gld_${EMULATION_NAME}_place_orphan in ld/emultempl/pe.em and ld/emultempl/pep.em. The best way to get that to happen is to file a bug report at https://sourceware.org/bugzilla with your C replication case and a pointer to this issue.

Until that bug in the linker is fixed, the only reasonable option I see is to skip calling writegdbscript when using external linking on Windows.
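
The condition suggested above can be written down as a tiny sketch. The linkConfig type and shouldWriteGDBScript function are hypothetical stand-ins and do not correspond to the linker's real internal names. They only spell out the decision: skip the section when targeting Windows with external linking, keep it otherwise.

```go
package main

import "fmt"

// linkConfig is a hypothetical stand-in for the linker state that matters
// here: the target operating system and whether the external linker is used.
type linkConfig struct {
	GOOS         string
	ExternalLink bool
}

// shouldWriteGDBScript reports whether the .debug_gdb_scripts section should
// be emitted. It is skipped only for external linking on Windows, where the
// GNU linker misplaces the section.
func shouldWriteGDBScript(c linkConfig) bool {
	return !(c.GOOS == "windows" && c.ExternalLink)
}

func main() {
	for _, c := range []linkConfig{
		{"windows", true},
		{"windows", false},
		{"linux", true},
	} {
		fmt.Printf("GOOS=%s external=%v write=%v\n", c.GOOS, c.ExternalLink, shouldWriteGDBScript(c))
	}
}
```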

@egonelbre
Contributor Author

The best way to get that to happen is to file a bug report at

Done https://sourceware.org/bugzilla/show_bug.cgi?id=21459

@gopherbot

CL https://golang.org/cl/42651 mentions this issue.

@AlekSi
Contributor

AlekSi commented May 6, 2017

Should this issue remain open with "Blocked" label?

@alexbrainman
Member

Should this issue remain open with "Blocked" label?

Done.

Alex

@alexbrainman alexbrainman reopened this May 6, 2017
@dlsniper
Contributor

dlsniper commented May 7, 2017

I think this can be safely closed now that the issue was fixed in CL 42651?

@alexbrainman
Member

I think this can be safely closed now that the issue was fixed in CL 42651?

CL 42651 disables .debug_gdb_scripts section generation.

It is a workaround for a problem that is out of our control (see #20183 (comment)). But I hope that one day we might be able to revert CL 42651 in some form. That is why I left this open.

I will let others close this if they see fit.

Alex

@egonelbre
Contributor Author

I think it's unlikely that we can remove the gcc workaround in the near future, even if gcc gets fixed immediately. We could potentially replace it with some other workaround:

  1. We could compile with the wrong location and fix-up with objcopy (or something in Go)... but that seems a pretty extensive workaround.
  2. There might be a way to coerce ld to place the section in the proper location, but I don't know ld very well and initial searches didn't come up with useful information.
  3. There might be a way to add the section manually after creating the executable without it.

I think it would be doable some way, but I suspect it wouldn't be worthwhile in the end.

@alexbrainman
Member

We could compile with the wrong location and fix-up with objcopy (or something in Go)... but that seems a pretty extensive workaround.

Sounds too hard.

There might be a way to coerce ld to place the section in the proper location, but I don't know ld very well and initial searches didn't come up with useful information.

That might be quite simple to do, if we know what to do. Maybe we could ask at https://sourceware.org/bugzilla/show_bug.cgi?id=21459

There might be a way to add the section manually after creating the executable without it.

Sounds too hard and might slow down the build process.

I suspect it wouldn't be worthwhile in the end.

Close this issue, if you don't think there is anything for us to do here.

Alex

@egonelbre
Contributor Author

Nick Clifton provided a linker script that can be used to place the .debug_gdb_scripts into the correct location:

SECTIONS
{
  .debug_gdb_scripts BLOCK(__section_alignment__) (NOLOAD) :
  {
    *(.debug_gdb_scripts)
  }
}
INSERT AFTER .debug_types;

And then invoke gcc with that script: gcc -Wl,-T,fix-debug-gdb-scripts.ld. Note: I tested this exact script only with the C example.
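
A build tool could apply the script automatically by writing it to a temporary directory and passing the resulting flag to gcc. The following is a minimal sketch of that idea and not necessarily how any Go change implements it. The function name writeFixScript and the temporary directory prefix are assumptions, while the script text and the file name come from the comment above.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// fixScript is the linker script quoted above, unchanged.
const fixScript = `SECTIONS
{
  .debug_gdb_scripts BLOCK(__section_alignment__) (NOLOAD) :
  {
    *(.debug_gdb_scripts)
  }
}
INSERT AFTER .debug_types;
`

// writeFixScript writes the linker script into dir and returns the
// gcc flag that makes the linker read it.
func writeFixScript(dir string) (string, error) {
	path := filepath.Join(dir, "fix-debug-gdb-scripts.ld")
	if err := os.WriteFile(path, []byte(fixScript), 0o644); err != nil {
		return "", err
	}
	return "-Wl,-T," + path, nil
}

func main() {
	dir, err := os.MkdirTemp("", "gdbscripts")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer os.RemoveAll(dir)

	flag, err := writeFixScript(dir)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	// Only print the command line; running gcc is left to the caller.
	fmt.Println("gcc", flag, "a.c -o a.exe")
}
```

The script has to stay on disk until the link step has finished, so a real tool would remove the directory only after gcc returns.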

@alexbrainman
Member

Nick Clifton provided a linker script

Thanks to Nick Clifton and to you. I suspect it might work just fine for our purpose. I will try it unless someone beats me to it. Mind you, currently Go does not add .debug_gdb_scripts sections when using the external linker, so we need to fix that first (it is issue #20218).

Alex

@gopherbot

CL https://golang.org/cl/43331 mentions this issue.

@egonelbre
Contributor Author

Cross-reference from CL for search:

For buildmode=c-archive the linker script cannot be applied automatically, because the final linking will be done outside the Go toolchain. Making it a requirement would be cumbersome and a bad user experience, so .debug_gdb_scripts won't be included for c-archives.

The script can still be added manually to the executable or manually loaded in gdb, when necessary.

@golang golang locked and limited conversation to collaborators May 15, 2018