Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/addr2line: pclntab and epclntab in different sections on Windows #14710

Closed
aclements opened this issue Mar 8, 2016 · 15 comments
Closed

cmd/addr2line: pclntab and epclntab in different sections on Windows #14710

aclements opened this issue Mar 8, 2016 · 15 comments

Comments

@aclements
Copy link
Member

Very occasionally addr2line fails on Windows with an error like:

--- FAIL: TestAddr2Line (1.26s)
    addr2line_test.go:46: go tool addr2line C:\Users\WINGOP~1\AppData\Local\Temp\go-build494209374\cmd\addr2line\_test\addr2line.test.exe: exit status 1
        addr2line: reading C:\Users\WINGOP~1\AppData\Local\Temp\go-build494209374\cmd\addr2line\_test\addr2line.test.exe: runtime.pclntab and runtime.epclntab symbols must be in the same section
FAIL
FAIL    cmd/addr2line   1.302s

This just happened on a trybot: https://storage.googleapis.com/go-build-log/b75bac27/windows-amd64-gce_54631543.log (for https://go-review.googlesource.com/20038) and has happened several times in the past, sometimes in the cmd/addr2line test and sometimes in another test that invokes addr2line:

2015-10-30T22:46:39-fbf2732/windows-386-gce
2015-10-30T22:55:19-7832c82/windows-386-gce
2015-10-30T22:58:43-ce30458/windows-386-gce
2015-10-31T23:22:58-914db9f/windows-386-gce
2015-11-01T04:45:37-3d5163c/windows-386-gce
2015-11-16T21:16:51-bf5b4e7/windows-amd64-gce
2015-11-16T21:54:44-45d1c8a/windows-amd64-gce

This may be a bug in cmd/internal/objfile's PE loader, or it may be a bug in the linker. Given that addr2line clearly ran and then complained when it tried to read its own executable, I would lean toward a bug in the loader.

/cc @alexbrainman @minux

@alexbrainman
Copy link
Member

@aclements I cannot reproduce it here. Not even once. And I looked at both reading and writing code, and I don't see any possible problem. The only thing I can think of is that somehow cs.sect is not set properly here https://github.com/golang/go/blob/master/src/cmd/link/internal/ld/pe.go#L954-L970 But I don't see how. It would be easy to fix once we can reproduce it somehow. Even having copy of failed executable will help, because we will know what the table looks like. I also don't have any good suggestions for adding debug code somewhere.

I will think some more.

Alex

@aclements
Copy link
Member Author

Yeah. I also tried to reproduce this back when there was a run of failures at the end of last October and couldn't, which indicates that it's somehow nondeterministic. Could it be something completely unrelated to the linker or loader, like we're overwriting the binary while we're in the middle of reading it?

We could have TestAddr2Line hex dump the whole binary when it fails. :) More seriously, could we expand the "must be in the same section" error to be more useful in tracking this down?

@alexbrainman
Copy link
Member

which indicates that it's somehow nondeterministic.

Yes I downloaded https://go-review.googlesource.com/#/c/20038/ (whatever patch it was that failed) and run all.bat against that. And that didn't failed.

Could it be something completely unrelated to the linker or loader, like we're overwriting the binary while we're in the middle of reading it?

I don't understand your suggestion. Tell me more.

More seriously, could we expand the "must be in the same section" error to be more useful in tracking this down?

I was thinking about that. We could print section numbers, that will be helpful. But I am worried this error message is already cryptic enough for other users.

Alex

@aclements
Copy link
Member Author

Could it be something completely unrelated to the linker or loader, like we're overwriting the binary while we're in the middle of reading it?

I don't understand your suggestion. Tell me more.

I haven't looked at the code, so it might genuinely not make sense. :) If there was a race in the build system, causing it to write twice to the same binary at the same time or read from the binary while it's being written, we could nondeterministically observe a corrupted binary. It's just a wild guess. (I know Windows has much stricter file locking semantics then POSIX, so maybe this just can't happen.)

@alexbrainman
Copy link
Member

If there was a race in the build system, causing it to write twice to the same binary at the same time

That is quite possible.

or read from the binary while it's being written, we could nondeterministically observe a corrupted binary.

The test analysies (reads) its own binary. So linker would have to be still writing into test binary, while the test binary is already running. Strange.

We did discovered some problem with linker recently. We had to add file flush here https://go-review.googlesource.com/#/c/18057/22/src/cmd/link/internal/ld/lib.go@1033 to make gcc see output properly. Perhaps there is something similar here.

Alex

@bradfitz bradfitz added this to the Unplanned milestone Apr 10, 2016
@josharian
Copy link
Contributor

It appears that CL 35114 might make this happen more reliably, if anyone wants to take another look.

@bradfitz bradfitz modified the milestones: Go1.9, Unplanned Jan 12, 2017
@alexbrainman
Copy link
Member

I tested with

git fetch https://go.googlesource.com/go refs/changes/14/35114/3 && git checkout FETCH_HEAD

I run all.bat 10 times, but no errors.

I also run

for /l %x in (1, 1, 100) do go test -short cmd/addr2line

(this runs "go test -short cmd/addr2line" 100 times), but no errors too.

All on windows/amd64.

Alex

@josharian
Copy link
Contributor

Bummer. However, it does appear to reproduce readily with gomote. This fails reliably for me:

$ git fetch https://go.googlesource.com/go refs/changes/14/35114/3 && git checkout FETCH_HEAD
$ gomote create windows-amd64-gce
$ gomote push user-josharian-windows-amd64-gce-0
$ gomote run user-josharian-windows-amd64-gce-0 go/src/all.bat

@alexbrainman
Copy link
Member

However, it does appear to reproduce readily with gomote.

Lucky you. How do I get access to it too?

I cannot think of anything simple for you to try. Maybe send me a copy of testaddr2line.exe file from tmp directory (created during that test) - you would have to comment "defer os.RemoveAll(tmpDir)" here https://github.com/golang/go/blob/master/src/cmd/addr2line/addr2line_test.go#L98 before running that test. Thank you.

Alex

@alexbrainman
Copy link
Member

Thanks to @josharian I was able to reproduce the error with gomote.

I tried to change test a little, to debug this, but the problem would immediately go away. But if I revert my changes, the problem would come back. So I decided, perhaps test executable is not right. I managed to build cmd/addr2line executable (go test -c cmd/addr2line) and copy it from remote machine onto my computer. Looking at executable symbol table:

$ objdump -t /tmp/a | grep pcln
[2011](sec  1)(fl 0x00)(ty 308)(scl   2) (nx 0) 0x00000000001b9300 runtime.pclntab
[2029](sec  2)(fl 0x00)(ty 308)(scl   2) (nx 0) 0x0000000000000000 runtime.epclntab
$ objdump -h /tmp/a

/tmp/a:     file format pei-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .text         00243000  0000000000401000  0000000000401000  00000600  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE, DATA
  1 .data         0004f000  0000000000644000  0000000000644000  00243600  2**4
                  CONTENTS, ALLOC, LOAD, DATA
  2 .debug_abbrev 000000ff  00000000006b5000  00000000006b5000  00292600  2**0
                  CONTENTS, READONLY, DEBUGGING
  3 .debug_line   00023a1b  00000000006b6000  00000000006b6000  00292800  2**0
                  CONTENTS, READONLY, DEBUGGING
  4 .debug_frame  000210a4  00000000006da000  00000000006da000  002b6400  2**0
                  CONTENTS, READONLY, DEBUGGING
  5 .debug_pubnames 00010053  00000000006fc000  00000000006fc000  002d7600  2**0
                  CONTENTS, READONLY, DEBUGGING
  6 .debug_pubtypes 000119a2  000000000070d000  000000000070d000  002e7800  2**0
                  CONTENTS, READONLY, DEBUGGING
  7 .debug_aranges 00000030  000000000071f000  000000000071f000  002f9200  2**0
                  CONTENTS, READONLY, DEBUGGING
  8 .debug_gdb_scripts 0000002a  0000000000720000  0000000000720000  002f9400  2**0
                  CONTENTS, READONLY, DEBUGGING
  9 .debug_info   0008e373  0000000000721000  0000000000721000  002f9600  2**0
                  CONTENTS, READONLY, DEBUGGING
 10 .idata        0000044c  00000000007b0000  00000000007b0000  00387a00  2**2
                  CONTENTS, ALLOC, LOAD, DATA
 11 .symtab       00039155  00000000007b1000  00000000007b1000  00388000  2**2
                  CONTENTS, READONLY
$

I can see that runtime.epclntab does indeed points to .data section, not .text. And runtime.epclntab symbol value is 0.

Looking at https://github.com/golang/go/blob/master/src/cmd/link/internal/ld/pe.go#L956, there is a note

// Note: although address of runtime.edata (type SDATA) is at the start of .bss section
// it still belongs to the .data section, not the .bss section.

The linker takes special care of handling runtime.edata. I think we should do the same for runtime.epclntab.

The error only happens when condition at https://github.com/golang/go/blob/master/src/cmd/link/internal/ld/pe.go#L961 is true because uint64(s.Value) == Segdata.Vaddr. So it is pretty rare. And would be difficult to reproduce, because everyone's installation is little bit different.

I can make the change, but I am not sure how to test this. Suggestions are welcome.

Maybe I am wrong about all this.

Alex

@aclements
Copy link
Member Author

Great find @alexbrainman! This may be easiest to fix in the linker, but another possibility would be to change the meaning of the e* symbols. These are very C-like, since they point to one byte past the end of their sections, which isn't generally legal in Go. We could instead make them point to the last byte of their sections, or record the lengths of these sections rather than their ends. Perhaps we could even lay these out as slices.

@alexbrainman
Copy link
Member

... We could instead make them point to the last byte of their sections, or record the lengths of these sections rather than their ends. Perhaps we could even lay these out as slices.

If we change runtime.epclntab, we would have to adjust Go tools (nm, addr2line, objdump, pprof), see issue #18416 for an example of what happens when we change runtime.epclntab. Maybe others use these symbols as we do.

I will fix the linker once go1.9 opens. Feel free to do whatever you like on top of that.

Alex

@gopherbot
Copy link

CL https://golang.org/cl/35272 mentions this issue.

@alexbrainman
Copy link
Member

CL 35272 (that "fixed" this issue) assumed that runtime.epclntab type is STEXT. But runtime.epclntab type is actually SRODATA. So this must be still broken.

Discovered while investigating #20183 (comment)

Alex

@alexbrainman alexbrainman reopened this May 2, 2017
@gopherbot
Copy link

CL https://golang.org/cl/42390 mentions this issue.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants