cmd/link: go tool dist test testshared failed if linked with lld or mold
#46560
Comments
I'll take a look.
My first guess would be that mold/lld don't implement the special treatment of sections named .init_array.
@ianlancetaylor could you please expand a little on what you meant by "special treatment of sections named .init_array"? I compared the .init_array sections and related symbols in the good/bad objdump output and I don't see anything that looks obviously wrong.
Both lld and mold create DT_INIT, DT_INIT_ARRAY and DT_INIT_ARRAYSZ entries in the dynamic section, and I believe that's all we have to do for .init_array. lld is now fairly battle-tested, as it's been the standard linker for FreeBSD/AMD64 and other platforms, so I don't think there's a big missing feature in lld. What am I missing?
@thanm The special treatment is what @rui314 mentions: sections named This could potentially be related to section layout. The Go runtime relies on a I don't have any other ideas yet.
Thanks -- intermixing sounds like a good theory; I'll pursue that.
I can definitely reproduce this problem on tip. I've concocted a smaller stand-alone test case (attached) since it's kind of difficult to work with the dist test version. In the Go code for the testcase, there is this code:
Spent lots of time inspecting the shared library for "p", but as far as I can tell it looks ok (maybe I missed something). According to the objdump output, the offending symbol ("main.xyzabc") is indeed placed in the .bss section and not the .noptrbss section.
I hacked the Go runtime to print out the boundaries of each mod on startup:
Run of "good" executable (for this binary, "objdump -t" reports the address of main.xyzabc as 0x518620):
Run of "bad" executable (for this binary, "objdump -t" reports the address of main.xyzabc as 0x31be20):
I did some tracing in the Go linker when linking the main program; it does look as though the runtime.gcbss symbol is being generated, but when I turn on tracing I'm seeing different behavior between good + bad. Here's the good gcprog generation for the symbol:
Now here's the bad:
So that may be the smoking gun? Not sure why this is happening though. From the linker trace output when linking the shared library it looks as though the type symbol in question "type.issue46560/p.T" is being mangled to "type.rb7zizEO". Looking at that symbol in the p.so shared object link, I see: Good:
Bad:
which seems to look the same (very puzzling). I'll need to spend a bit more time with the linker to understand why it is picking up the wrong type data for this symbol.
Whoops, my eyeballs are not working properly. There is indeed a diff between the content of the two symbols at offset 32 (02200000 good vs 00000000 bad). So the question is: why is that happening?
I think the missing values in .data.rel.ro are not a problem. .data.rel.ro has relocations of type R_X86_64_64, so that the section contains the absolute addresses of the referenced symbols at runtime. The absolute addresses of the referenced symbols are not known at static link time, because we are linking a shared object file, and a shared object file is not guaranteed to be loaded at a fixed address in the virtual address space. Therefore, the linker emits dynamic relocations so that the section gets the correct values at runtime. GNU ld writes values to .data.rel.ro and creates dynamic relocations for them, but the values written to .data.rel.ro will be overwritten by the dynamic linker, so they don't matter. lld just leaves them as zero. I think this is why some bytes in the lld-generated .data.rel.ro are zero.
I created two executables: one is linked against the mold-built libissue46560-p.so and the other is linked against the BFD-built libissue46560-p.so. Note that the executables themselves are linked by BFD. Then, I copied .rodata from the working one to the failing one using objcopy, which fixed the issue. So, the data in .rodata seems to be wrong. More specifically, the .rodata sections are almost identical except for a few bytes, and the difference is in I'll investigate how cgo generate
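The byte-level comparison described above can be reproduced with a few lines of Go. This is only a sketch under the assumption that both files carry a section named .rodata; diffOffsets and sectionData are hypothetical helper names, and the file paths are whatever you pass on the command line.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
)

// diffOffsets returns the offsets at which a and b differ, comparing only
// the common prefix when the lengths are not equal.
func diffOffsets(a, b []byte) []int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	var offs []int
	for i := 0; i < n; i++ {
		if a[i] != b[i] {
			offs = append(offs, i)
		}
	}
	return offs
}

// sectionData reads the raw contents of the named section from an ELF file.
func sectionData(path, name string) ([]byte, error) {
	f, err := elf.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	s := f.Section(name)
	if s == nil {
		return nil, fmt.Errorf("%s: no section %s", path, name)
	}
	return s.Data()
}

func main() {
	if len(os.Args) != 3 {
		fmt.Println("usage: rodiff <good-elf> <bad-elf>")
		return
	}
	good, err := sectionData(os.Args[1], ".rodata")
	if err != nil {
		fmt.Println(err)
		return
	}
	bad, err := sectionData(os.Args[2], ".rodata")
	if err != nil {
		fmt.Println(err)
		return
	}
	if len(good) != len(bad) {
		fmt.Printf("sizes differ: %d vs %d bytes\n", len(good), len(bad))
	}
	for _, off := range diffOffsets(good, bad) {
		fmt.Printf("%#06x: %02x vs %02x\n", off, good[off], bad[off])
	}
}
```

The printed offsets are relative to the start of .rodata, so they still need to be mapped back to symbols (for example with "objdump -t") to see which data actually differs.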
It was extremely puzzling, but I think I found the cause of the issue. It looks like there's a bug in Go's linker. Here is what was happening:
So, I think the proper fix is to change
Thanks. Yes, you made that clear already in your previous comment 8 days ago, and it is indeed evident that the problem is due to the fact that the Go linker is not applying dynamic relocations to the section in question. Please bear with me while I work on this bug; I have many other demands on my time, and I need to balance working on your bug with working on other bugs as well. Thanks for your patience.
@thanm Is it fixed now?
@shmsr not fixed yet, thanks
I might keep an eye on this.
What version of Go are you using (go version)?
What operating system and processor architecture are you using (go env)?
What did you do?
I tried to build Go with my own linker, mold (https://github.com/rui314/mold), and noticed that a CGO-related test fails only when linked with mold. The same test fails with lld. So the test seems to pass only when you are using GNU ld or GNU gold.
Specifically, this is the exact command with which I can reproduce the issue on my Ubuntu 20.04 machine.
If I do not substitute the default linker with lld using sudo ln, the last test command succeeds. Before running the above command, please install LLVM lld 11 by running apt-get install lld-11.
To restore the original ld, run (cd /usr/bin; sudo ln -sf x86_64-linux-gnu-ld ld).
What did you expect to see?
The test succeeds.
What did you see instead?
The test fails with the following error message.
So, the test fails because the Go garbage collector wrongly collects live objects.
I've been debugging the issue for two days so far without any luck. It looks like if I build everything except gopath/pkg/linux_amd64_dynlink/libtestshared-gcdata-p.so using lld and link that particular DSO using GNU ld, the test passes. But I can't find the reason why the test dislikes an lld- or mold-linked shared object file. Is there any chance that CGO unnecessarily depends on a GNU ld-specific section or symbol layout or something?