New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/addr2line: pclntab and epclntab in different sections on Windows #14710
Comments
@aclements I cannot reproduce it here. Not even once. And I looked at both reading and writing code, and I don't see any possible problem. The only thing I can think of is that somehow cs.sect is not set properly here https://github.com/golang/go/blob/master/src/cmd/link/internal/ld/pe.go#L954-L970 But I don't see how. It would be easy to fix once we can reproduce it somehow. Even having copy of failed executable will help, because we will know what the table looks like. I also don't have any good suggestions for adding debug code somewhere. I will think some more. Alex |
Yeah. I also tried to reproduce this back when there was a run of failures at the end of last October and couldn't, which indicates that it's somehow nondeterministic. Could it be something completely unrelated to the linker or loader, like we're overwriting the binary while we're in the middle of reading it? We could have TestAddr2Line hex dump the whole binary when it fails. :) More seriously, could we expand the "must be in the same section" error to be more useful in tracking this down? |
Yes I downloaded https://go-review.googlesource.com/#/c/20038/ (whatever patch it was that failed) and run all.bat against that. And that didn't failed.
I don't understand your suggestion. Tell me more.
I was thinking about that. We could print section numbers, that will be helpful. But I am worried this error message is already cryptic enough for other users. Alex |
I haven't looked at the code, so it might genuinely not make sense. :) If there was a race in the build system, causing it to write twice to the same binary at the same time or read from the binary while it's being written, we could nondeterministically observe a corrupted binary. It's just a wild guess. (I know Windows has much stricter file locking semantics then POSIX, so maybe this just can't happen.) |
That is quite possible.
The test analysies (reads) its own binary. So linker would have to be still writing into test binary, while the test binary is already running. Strange. We did discovered some problem with linker recently. We had to add file flush here https://go-review.googlesource.com/#/c/18057/22/src/cmd/link/internal/ld/lib.go@1033 to make gcc see output properly. Perhaps there is something similar here. Alex |
It appears that CL 35114 might make this happen more reliably, if anyone wants to take another look. |
I tested with git fetch https://go.googlesource.com/go refs/changes/14/35114/3 && git checkout FETCH_HEAD I run all.bat 10 times, but no errors. I also run for /l %x in (1, 1, 100) do go test -short cmd/addr2line (this runs "go test -short cmd/addr2line" 100 times), but no errors too. All on windows/amd64. Alex |
Bummer. However, it does appear to reproduce readily with gomote. This fails reliably for me: $ git fetch https://go.googlesource.com/go refs/changes/14/35114/3 && git checkout FETCH_HEAD
$ gomote create windows-amd64-gce
$ gomote push user-josharian-windows-amd64-gce-0
$ gomote run user-josharian-windows-amd64-gce-0 go/src/all.bat |
Lucky you. How do I get access to it too? I cannot think of anything simple for you to try. Maybe send me a copy of testaddr2line.exe file from tmp directory (created during that test) - you would have to comment "defer os.RemoveAll(tmpDir)" here https://github.com/golang/go/blob/master/src/cmd/addr2line/addr2line_test.go#L98 before running that test. Thank you. Alex |
Thanks to @josharian I was able to reproduce the error with gomote. I tried to change test a little, to debug this, but the problem would immediately go away. But if I revert my changes, the problem would come back. So I decided, perhaps test executable is not right. I managed to build cmd/addr2line executable (go test -c cmd/addr2line) and copy it from remote machine onto my computer. Looking at executable symbol table:
I can see that runtime.epclntab does indeed points to .data section, not .text. And runtime.epclntab symbol value is 0. Looking at https://github.com/golang/go/blob/master/src/cmd/link/internal/ld/pe.go#L956, there is a note
The linker takes special care of handling runtime.edata. I think we should do the same for runtime.epclntab. The error only happens when condition at https://github.com/golang/go/blob/master/src/cmd/link/internal/ld/pe.go#L961 is true because uint64(s.Value) == Segdata.Vaddr. So it is pretty rare. And would be difficult to reproduce, because everyone's installation is little bit different. I can make the change, but I am not sure how to test this. Suggestions are welcome. Maybe I am wrong about all this. Alex |
Great find @alexbrainman! This may be easiest to fix in the linker, but another possibility would be to change the meaning of the e* symbols. These are very C-like, since they point to one byte past the end of their sections, which isn't generally legal in Go. We could instead make them point to the last byte of their sections, or record the lengths of these sections rather than their ends. Perhaps we could even lay these out as slices. |
If we change runtime.epclntab, we would have to adjust Go tools (nm, addr2line, objdump, pprof), see issue #18416 for an example of what happens when we change runtime.epclntab. Maybe others use these symbols as we do. I will fix the linker once go1.9 opens. Feel free to do whatever you like on top of that. Alex |
CL https://golang.org/cl/35272 mentions this issue. |
CL 35272 (that "fixed" this issue) assumed that runtime.epclntab type is STEXT. But runtime.epclntab type is actually SRODATA. So this must be still broken. Discovered while investigating #20183 (comment) Alex |
CL https://golang.org/cl/42390 mentions this issue. |
Very occasionally addr2line fails on Windows with an error like:
This just happened on a trybot: https://storage.googleapis.com/go-build-log/b75bac27/windows-amd64-gce_54631543.log (for https://go-review.googlesource.com/20038) and has happened several times in the past, sometimes in the cmd/addr2line test and sometimes in another test that invokes addr2line:
2015-10-30T22:46:39-fbf2732/windows-386-gce
2015-10-30T22:55:19-7832c82/windows-386-gce
2015-10-30T22:58:43-ce30458/windows-386-gce
2015-10-31T23:22:58-914db9f/windows-386-gce
2015-11-01T04:45:37-3d5163c/windows-386-gce
2015-11-16T21:16:51-bf5b4e7/windows-amd64-gce
2015-11-16T21:54:44-45d1c8a/windows-amd64-gce
This may be a bug in cmd/internal/objfile's PE loader, or it may be a bug in the linker. Given that addr2line clearly ran and then complained when it tried to read its own executable, I would lean toward a bug in the loader.
/cc @alexbrainman @minux
The text was updated successfully, but these errors were encountered: