plugin: loading plugin leads to 'fatal error: invalid runtime symbol table' with some stdlib packages #18190
If I change the import from
I assume this is another dynamic relocation bug on macOS. @ianlancetaylor do you think we should remove the macOS plugin support from 1.8? It clearly needs more testing, and it seems a bit late in the cycle to be resolving issues like this.
I'm fine with that.
Is there an open issue for tracking the progress of this feature in |
@kris-nova you can create one if you like. This is just a spare time project for me, so I'm not generating much of a paper trail. The program described above is failing executing the init functions of the plugin. I got lucky experimenting with it and extracted this partial stacktrace:
Indeed, an empty plugin doing nothing other than |
Here's a simplified plugin that produces this error on plugin.Open:
Followed a few false leads. A simpler example. If the host program calls plugin.Open then executes the symbol F, then this is all that's necessary to see a failure:
Checkpoint of what I've learned so far. When gentraceback is called, it finds a *_func using a PC from one module, but whose entry field is from another module:
The result is that frame.pc cannot be used as a targetpc with the pclntable derived from f.entry. I believe this is caused by the fact that the *_func entry is filled in by an R_ADDR, which the linker turns into a dynamic relocation, so for a runtime function it points to the original module. But the second module, from the plugin, also has the runtime functions, and somewhere there's a direct CALL to that version of the function. We don't see this on linux because all appropriate function calls go through the PLT/GOT. We are missing such indirect calls when GOOS=darwin.
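The mismatch described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the `module` and `funcRecord` types, the `sameModule` helper, and the addresses are all hypothetical and are not the runtime's actual data structures. It shows why a function record whose entry address lies in the host module cannot be paired with a PC from the plugin module.

```go
package main

import "fmt"

// module is a hypothetical stand-in for per-module metadata: it records
// only the address range of the module's text segment.
type module struct {
	name  string
	text  uintptr // first address of the text segment
	etext uintptr // one past the last address of the text segment
}

// contains reports whether addr falls inside the module's text range.
func (m module) contains(addr uintptr) bool {
	return addr >= m.text && addr < m.etext
}

// funcRecord is a hypothetical stand-in for a function table record.
type funcRecord struct {
	name  string
	entry uintptr // start address of the function
}

// moduleOf returns the name of the module whose text range holds addr.
func moduleOf(mods []module, addr uintptr) (string, bool) {
	for _, m := range mods {
		if m.contains(addr) {
			return m.name, true
		}
	}
	return "", false
}

// sameModule reports whether the record's entry and the PC being traced
// belong to the same module. A traceback can only use the record's line
// table for this PC when that is true.
func sameModule(mods []module, f funcRecord, pc uintptr) bool {
	entryMod, okEntry := moduleOf(mods, f.entry)
	pcMod, okPC := moduleOf(mods, pc)
	return okEntry && okPC && entryMod == pcMod
}

func main() {
	mods := []module{
		{name: "host", text: 0x1000, etext: 0x2000},
		{name: "plugin", text: 0x8000, etext: 0x9000},
	}
	pc := uintptr(0x8123) // a PC inside the plugin's copy of a function

	local := funcRecord{name: "runtime.f", entry: 0x8100}     // entry stayed in the plugin
	relocated := funcRecord{name: "runtime.f", entry: 0x1100} // entry resolved to the host

	fmt.Println(sameModule(mods, local, pc))
	fmt.Println(sameModule(mods, relocated, pc))
}
```

In this model the second record is the problem case: its name matches, but its entry was resolved to the other module, so offsets computed from it are meaningless for the PC in hand.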
CL https://golang.org/cl/34196 mentions this issue.
CL 34196 fixes the problem I described above, but it exposes another bug executing
The busted function name suggests another relocation has gone wrong. I'm wondering about |
The pclntable contains pointers to functions. If the function symbol is exported in a plugin, and there is a matching symbol in the host binary, then the pclntable of a plugin ends up pointing at the function in the host module. This doesn't work because the traceback code expects the pointer to be in the same module space as the PC value.

So don't export functions that might overlap with the host binary. This way the pointer stays in its module.

Updates #18190

Change-Id: Ifb77605b35fb0a1e7edeecfd22b1e335ed4bb392
Reviewed-on: https://go-review.googlesource.com/34196
Run-TryBot: David Crawshaw <crawshaw@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Found the problem. After applying CL 34196, there are still a handful of ftab entries in the plugin pclntable whose entry relocations resolve to the version in the host binary. Specifically, they are the exported C symbols defined in runtime and runtime/cgo: These cause a problem when tracing back through the stack because the linear search logic in runtime.findfunc gets caught on them:
The right solution is to not include the runtime in the plugin (#17150). A smaller, potentially-not-incorrect solution is to avoid exporting C symbols from plugins. I'm going to see if that breaks anything.
CL https://golang.org/cl/34199 mentions this issue.
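To make the failure mode above concrete, here is a toy model of a linear function-table search. It is a sketch only: `ftabEntry`, `lookup`, `dropForeign`, and the addresses are hypothetical, and the scan is a simplification rather than the code in runtime.findfunc. It shows how a single entry that resolves to an address outside the module breaks a search that assumes the table is sorted, and how filtering such entries out restores the expected result.

```go
package main

import "fmt"

// ftabEntry is a hypothetical stand-in for one function-table row.
type ftabEntry struct {
	entry uintptr // start address of the function
	name  string
}

// lookup scans forward and returns the last row whose entry is <= pc.
// It is only correct when the table is sorted by entry.
func lookup(tab []ftabEntry, pc uintptr) string {
	idx := 0
	for idx+1 < len(tab) && tab[idx+1].entry <= pc {
		idx++
	}
	return tab[idx].name
}

// dropForeign keeps only the rows whose entry lies inside [text, etext),
// that is, inside the module that owns the table.
func dropForeign(tab []ftabEntry, text, etext uintptr) []ftabEntry {
	var kept []ftabEntry
	for _, e := range tab {
		if e.entry >= text && e.entry < etext {
			kept = append(kept, e)
		}
	}
	return kept
}

func main() {
	// A well-formed table for a module occupying [0x8000, 0x9000).
	sorted := []ftabEntry{{0x8000, "a"}, {0x8100, "b"}, {0x8200, "c"}, {0x8300, "d"}}
	// The same table, but the entry for "b" resolved to another module.
	broken := []ftabEntry{{0x8000, "a"}, {0xF100, "b"}, {0x8200, "c"}, {0x8300, "d"}}

	pc := uintptr(0x8250) // lies inside "c"

	fmt.Println(lookup(sorted, pc))                              // c
	fmt.Println(lookup(broken, pc))                              // a: the scan stops at the stray entry
	fmt.Println(lookup(dropForeign(broken, 0x8000, 0x9000), pc)) // c
}
```

The filtering step is only an analogy for keeping duplicated symbols out of the plugin's table; the real change operates in the linker, not at lookup time.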
@crawshaw after pulling https://golang.org/cl/34199 I no longer got
Should I report it in a separate issue?
This one is fine. What program are you running?
I'm running tests in go-bind-plugin (https://github.com/wendigo/go-bind-plugin) under macOS:
ends with (when pulled https://golang.org/cl/34199):
Current tip without CL 34199 passes both on linux and macOS: https://travis-ci.org/wendigo/go-bind-plugin
That's a lot of code. Can you extract a smaller reproduction?
Unfortunately it fails when running the tests - I can't extract smaller example. It won't fail either when |
Lots of meddling with the debugging in plugin.open eventually let me replicate the failure in a simpler plugin. With stackDebug = 4, it very much looks like a non-pointer value in a stack pointer slot during copystack. The stack map itself looks reasonable. It very suspiciously always happens in plugin.open, the first function in the stack frame that comes from the other (host) module. No clear idea yet what's happening. But now the plugin module does pass moduledataverify, which is nice.
It looks like the stack map is wrong. What it thinks is a pointer slot is not one. I assume this has something to do with two copies of the runtime existing simultaneously. I spent a while trying to work out which part, to no luck. I looked into removing the runtime from the plugin on macOS, but this requires dynamic relocations to several runtime functions and values (like duffcopy and algarray) which the linker does not know how to generate. So as I'm not going to teach the linker how mach-o relocations are encoded by 1.8, I'll remove darwin support for plugins.
Sad to read that @crawshaw |
Explicitly filter any C-only cgo functions out of pclntable, which allows them to be duplicated with the host binary.

Updates #18190.

Change-Id: I50d8706777a6133b3e95f696bc0bc586b84faa9e
Reviewed-on: https://go-review.googlesource.com/34199
Reviewed-by: Ian Lance Taylor <iant@golang.org>

CL https://golang.org/cl/34391 mentions this issue.

We are seeing a bad stack map in #18190. In a copystack, it is mistaking a slot for a pointer. Presumably this is caused either by our fledgling dynlink support on darwin, or a consequence of having two copies of the runtime in the process. But I have been unable to work out which in the 1.8 window, so pushing darwin support to 1.9 or later.

Change-Id: I7fa4d2dede75033d9a428f24c1837a4613bd2639
Reviewed-on: https://go-review.googlesource.com/34391
Reviewed-by: Ian Lance Taylor <iant@golang.org>
I just hit this same error with go1.8beta2 but on linux/amd64 rather than darwin:

# plugin.go
package main

func Greet() string {
	return "Hello world"
}

# main.go
package main

import (
	"fmt"
	"plugin"
)

func main() {
	p, err := plugin.Open("plugin.so")
	if err != nil {
		panic(err)
	}
	greetSymbol, err := p.Lookup("Greet")
	if err != nil {
		panic(err)
	}
	greet := greetSymbol.(func() string)
	fmt.Println(greet())
}

Building the plugin works fine:
Attempting to build and run the main program fails with
@zoni it seems you've forgotten the |
Right you are, I did! I completely overlooked the fact that was necessary, importing |
same error
env

main.go
package main

import (
	"C"
	"fmt"
	"plugin"
)

func main() {
	p, err := plugin.Open("./myplugin.so")
	failOnError(err)
	add, err := p.Lookup("Add")
	failOnError(err)
	sum := add.(func(int, int) int)(1, 2)
	fmt.Println(sum)
}

func failOnError(err error) {
	if err != nil {
		panic(err)
	}
}

myplugin.go
package main

func Add(x, y int) int {
	return x + y
}
Those "invalid symbols table" errors in 1.8rc1 on linux were fixed in golang.org/cl/35190, and so they should be fixed by the coming 1.8rc2. If you have the time, please test against HEAD. I'm keeping this bug open for darwin on 1.9.
Anything I can do to help?
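For anyone re-testing the host programs posted above, a slightly more defensive loader may make the results easier to read. The sketch below is an assumption-laden variant, not code from this thread: `loadGreeter` and `asGreeter` are hypothetical helper names, and the path `plugin.so` and symbol `Greet` are taken from the earlier example. It uses only `plugin.Open`, `(*plugin.Plugin).Lookup`, and a checked type assertion, so ordinary failures are reported as errors instead of panics. It cannot catch the `fatal error: invalid runtime symbol table` crash itself, which aborts the process.

```go
package main

import (
	"fmt"
	"os"
	"plugin"
)

// asGreeter checks that a looked-up symbol has the expected function type.
// The name and the expected signature are assumptions for this sketch.
func asGreeter(sym plugin.Symbol) (func() string, error) {
	fn, ok := sym.(func() string)
	if !ok {
		return nil, fmt.Errorf("symbol has type %T, want func() string", sym)
	}
	return fn, nil
}

// loadGreeter opens the plugin at path and resolves the named symbol.
func loadGreeter(path, name string) (func() string, error) {
	p, err := plugin.Open(path)
	if err != nil {
		return nil, fmt.Errorf("open %s: %w", path, err)
	}
	sym, err := p.Lookup(name)
	if err != nil {
		return nil, fmt.Errorf("lookup %s: %w", name, err)
	}
	return asGreeter(sym)
}

func main() {
	greet, err := loadGreeter("plugin.so", "Greet")
	if err != nil {
		// Report the problem and stop; nothing else to do without the plugin.
		fmt.Fprintln(os.Stderr, "could not load plugin:", err)
		return
	}
	fmt.Println(greet())
}
```

The plugin itself still has to be built with `go build -buildmode=plugin`, as in the build attempts listed in the original report.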
I'd like to take a look (not sure how much help I can actually be), but I'm not sure where in the codebase I should start looking. If anyone can give me a general idea as to where the code that handles this lives, that would be awesome. edit |
I just finished investigation. I'm going to send CLs. Thanks.
Change https://golang.org/cl/59370 mentions this issue: |
Change https://golang.org/cl/59372 mentions this issue: |
Change https://golang.org/cl/59373 mentions this issue: |
Change https://golang.org/cl/59417 mentions this issue: |
Without this CL, the system linker complains about absolute addressing in type..eqfunc.*.

Updates #18190

Change-Id: I68db37a7f4c96b16a9c13baffc0f043a3048df6d
Reviewed-on: https://go-review.googlesource.com/59373
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>

* extract pkgname() and findlib() from the function for #18190.
* rename const pkgname to const pkgdef to avoid confliction.

Change-Id: Ie62509bfbddcf19cf92b5b12b598679a069e6e74
Reviewed-on: https://go-review.googlesource.com/59417
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
x.c
#include <stdio.h>

void foo(void) {
	puts("x");
}

void x() {
	foo();
}

y.c
#include <stdio.h>

void foo(void) {
	puts("y");
}

void y() {
	foo();
}

c.c
#include <stdio.h>

void x(void);
void y(void);

int main(void) {
	x();
	y();
	return 0;
}

I thought I was misunderstanding. At least, this is the root cause of pclntab.
Plugins allow updating data that doesn't exist at compile time.
Change https://golang.org/cl/61091 mentions this issue: |
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (go version)?
What operating system and processor architecture are you using (go env)?
What did you do?
pg.go
attempted builds (same error for each):
go build --buildmode=plugin pg.go
go build --gcflags "-dynlink" --buildmode=plugin pg.go
x.tpl
main.go
attempted builds (same error for each):
go build main.go && ./main
go build --gcflags "-dynlink" main.go && ./main
note
This program works correctly when you change pg.go to:
What did you expect to see?
terminal:
browser:
What did you see instead?