-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
reflect: duplicate reflect.Type values under dynamic linking #18729
Comments
/cc @crawshaw @randall77 |
This is probably a reflect bug of some sort meaning a message is being encoded incorrectly. While it's a nice short test case (thanks!) it's tricky to pull apart because the "bad record MAC" is an error coming back from the server. I'm going to see if I can get some of the cyrpto/tls unit tests running inside a plugin and hope those narrow it down. If they don't, does anyone have a TLS server designed for debugging? (A TLS-aware equivalent of tcpdump?) |
The same error occurs with
After many more similar failures it locks up. Building with -c and sending it a SIGQUIT show it stuck in a way that suggests one end of the client/server has locked up. EDIT: I'm omitting the stack dump but it's available if anyone wants it. I suspect going after the unit test failures will be easier than the lockup, so I'll spend some time today trying to simplify one of them. cc @mwhudson as this now affects shared libraries. Would be nice if we had a -linkshared builder to have caught this earlier. |
Ah ha, I'm going to start here:
Oddly if I try to run go test -short -linkshared encoding/asn1 directly, it hangs, but in this form, the failure is consistent. |
I had forgotten about that bug. Presumably the fix for #18252 wasn't enough (which has been my working assumption so far, that there's a reflect/type-info bug). I'll dig around. |
Simplified test case:
At tip:
|
In the above example, both the binary and libstd.so contain type symbols for *[]int, as you would expect. But the program does not contain any dynamic relocations for it:
@mwhudson I think this is basically the eface version of the itab problem from #18252. Shouldn't there be a dynamic relocation for the *rtype in the binary prg? |
Well no, because prg is an executable so the system linker can assume no other type.*[]int symbol will be interposed in front of the one in prg. Adding
to prg.go gives as output:
d looks correct to my level of understanding, 0x602940 is the address of type.[]int in the executable. 0x7f84580f66a0 is the address of type.[]int in the library though, which is wrong. I don't understand right now how v got constructed, but I guess if this address was computed by adding some offsets this might have happened. (The bug is present in 1.7 but not 1.6 which lends support to this theory). Have we gotten @ianlancetaylor to look at this bug yet? |
Ah of course, thanks, I was inverting the libstd.so and prg modules in my mind. That gives me a new place to hunt. |
The This is because libstd.so is listed first the list of modules. I instrumented typelinksinit and firstmoduledata is libstd.so, with firstmoduledata.next being the program binary module. That's wrong. We want the linked list of modules to have same load order the dynamic linker uses. But the runtime package is defined in libstd.so, and that's where firstmoduledata is defined and first used. Problematically it's backwards in plugins. The runtime package is always in the initial binary, so firstmoduledata is the initial program. (I feel like I've run into this before, and I thought module load order was solved. Clearly it's not.) We actually have a slice of active modules now, independent from the linked list. So if we can come up with a way to order the modules correctly, we can reorder in the slice before typelinksinit. One hack I'm considering: sort the modules from lowest to highest values of md.text. From what I've seen in my printing on linux/glibc and macOS that matches what the dynamic linker does. (But I don't know it for a fact.) |
On 25 January 2017 at 16:25, David Crawshaw ***@***.***> wrote:
The reflect.New function takes a type []int and returns a value wrapping
a *[]int. It gets to the pointer using the ptrToThis field. This is
resolved via the moduledata typemap, and it resolves to the wrong module.
That is, it points to the *[]int symbol that comes from libstd.so, not
the program binary.
This is because libstd.so is listed first the list of modules. I
instrumented typelinksinit and firstmoduledata is libstd.so, with
firstmoduledata.next being the program binary module.
That's wrong. We want the linked list of modules to have same load order
the dynamic linker uses.
Ah.
But the runtime package is defined in libstd.so, and that's where
firstmoduledata is defined and first used.
Problematically it's backwards in plugins. The runtime package is always in
the initial binary, so firstmoduledata is the initial program.
Starting with firstmoduledata and walking next pointers is almost right,
isn't it? The only problem is that in the linkshared case libstd.so is at
the front when it should be .. uh, not at the front. (probably it will
always be second?).
Could we make it right by having the moduledata for the module containing
the runtime added by an initarray entry as well? I guess you'd need to do
something different in a static binary which is kinda horrid.
(I feel like I've run into this before, and I thought module load order
was solved. Clearly it's not.)
We actually have a slice of active modules now, independent from the
linked list. So if we can come up with a way to order the modules
correctly, we can reorder in the slice before typelinksinit.
Having the module containing runtime do _something_ in an initarray feels
most reliable here (maybe something separate to maintaining the moduledata
linked list) somehow.
One hack I'm considering: sort the modules from lowest to highest values
of md.text. From what I've seen in my printing on linux/glibc and macOS
that matches what the dynamic linker does. (But I don't know it for a fact.)
Yikes! Please don't do that without getting written confirmation from LKML
that it's safe first :)
|
A less evil hack would be to swap the position of firstmoduledata with whatever module contains the main_main symbol. That would be correct in all practical situations we will see with 1.8 (libstd.so, and plugins). It will eventually have to be revisited, if someone builds a complex enough loading scenario. WDYT? |
(At this point I'm looking for something small enough to go into 1.8. I agree going after initarray is more principled, it just sounds a bit hairy to me for a close-to-release change, because of that asm blob in the linker.) |
Of course, this bug was originally about a plugin failure. Runtime module ordering does not explain why the type would be wrong in a plugin. Either this diagnosis is wrong, or the failing test case I extracted is for a different bug. |
CL https://golang.org/cl/35644 mentions this issue. |
go test -linkshared seems to be pretty broken right now (sadface) but doing things by hand suggests that go test -linkshared crypto/tls is still broken with that CL (but encoding/asn1 is indeed fixed). |
Modules appear in the moduledata linked list in the order they are loaded by the dynamic loader, with one exception: the firstmoduledata itself the module that contains the runtime. This is not always the first module (when using -buildmode=shared, it is typically libstd.so, the second module). The order matters for typelinksinit, so we swap the first module with whatever module contains the main function. Updates #18729 This fixes the test case extracted with -linkshared, and now go test -linkshared encoding/... passes. However the original issue about a plugin failure is not yet fixed. Change-Id: I9f399ecc3518e22e6b0a350358e90b0baa44ac96 Reviewed-on: https://go-review.googlesource.com/35644 Run-TryBot: David Crawshaw <crawshaw@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Michael Hudson-Doyle <michael.hudson@canonical.com> Reviewed-by: Ian Lance Taylor <iant@golang.org>
Now there are a couple of bugs here, and I'm not sure how to separate them into issues. I'll put a bit of time tomorrow into poking at it all tomorrow. |
I hacked up the crypto/tls, encoding/asn1, and reflect unit tests to run as a plugin. The tls tests indeed fail just as this example fails. Unfortunately they are not easy things to debug. The asn1 and reflect tests all pass except for this interesting one:
I'll work on fixing that, then try hacking up the other encoding tests to run under a plugin. |
Actually that typelinks error is just a bad unit test, it cannot handle multiple modules. Onward. |
More interesting, from vendor/golang_org/x/crypto/curve25519:
I'm guessing some of the amd64 asm in this package is trying to hold CX across a load, and when dynamic linking for -buildmode=plugin or -buildmode=shared, it doesn't work. If so, this could be the encoding/crypto hang @mwhudson mentioned above. |
@crawshaw Are you still looking at this? |
I believe the only bug remaining here is #18820, which has been split out. There also needs to be better testing, which I split out as #18814. As this issue ranged across several problems, and I believe everything left is split out into those two, I'm going to close this issue. @fsenart, please follow #18820 for a resolution to your original issue. |
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (
go version
)?go version go1.8rc2 linux/amd64
What operating system and processor architecture are you using (
go env
)?GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/fsenart/projects"
GORACE=""
GOROOT="/home/fsenart/tools/go1.8rc2"
GOTOOLDIR="/home/fsenart/tools/go1.8rc2/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build840231100=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
What did you do?
If possible, provide a recipe for reproducing the error.
A complete runnable program is good.
A link on play.golang.org is best.
What did you expect to see?
I don't expect any panic to happen.
What did you see instead?
A panic happens when requesting https://google.com while all works as expected when requesting https://amazon.com.
Observations
The text was updated successfully, but these errors were encountered: