New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/link: build error with cgo in Windows, redefinition of go.map.zero #47185
Comments
Thanks for report. Could you paste the build command you run and the output of |
|
I'm working in reproducible code. I will share it soon. |
Thanks. I'll look into it. Also cc @thanm , if you happen to have some recent knowledge of redefinition of of go.map.zero. |
https://go-review.googlesource.com/c/go/+/326211 seems like a possible culprit? Without having a reproducer it's hard to tell though. One way to tell would be to see if the same program builds ok with Go 1.16.5, since CL 326211 is only in the most recent. |
From Go 1.13.5 to Go 1.16.5 builds fine. The reproducible code: https://github.com/xhit/go-issue-47185 |
Thanks. I'll take a look this morning. |
@xhit Thanks for reporting this problem. It does indeed look as though the failure is with CL 326211; this fix isn't doing quite what it should. Here is a detailed breakdown of what's going on, first recap of sorts for posterity: The Go compiler + runtime implementation of maps utilitizes a "map zero" symbol in cases where a map lookup fails and we need a zero'd symbol whose size matches that of the map value type. Example:
When compiling package p1, the compiler (behind the scenes) generates a read-only data symbol "go.map.zero" of whose size is equal to the value type of the map (here "A"), so that the map lookup can read from it in the case that a given key is not in the map (as in "x := m[0]"). Similar thing happens when package p2 is compiled (another go.map.zero symbol is generated), but with a larger size, since type "C" takes up more size. At link time what's supposed to happen is that the Go linker selects the single largest "go.map.zero" symbol (from all the incoming packages) and then rewrites all "go.map.zero" sym references to point to that symbol. When we modernized the linker in the Go 1.15/1.16 timeframe, we wound up introducing a bug such that instead of selecting the largest go.map.zero, the linker would just pick an arbitrary instance. This bug was reported in issue #46653, the fix for which was CL 326211. Turns out that the fix was buggy-- the linker was supposed to be selecting a single symbol, but instead it was carrying along all of the symbols. What appears to be happening on Linux is that the external linker doesn't seem to have a problem with the duplicates, whereas on Windows we get an error. The original intent (as mentioned in the CL 326211 commit message) was that the larger symbol would be selected as result of this code here: however this was a misreading of the code -- while it looks as though it should cause the larger symbol to be selected, the blob in question only applies to hashed64 symbols and not regular hashed symbols. Although it appears that it runs for regular hashed symbols, it will never kick in due to this code in the object file writer: which includes the size of the symbol in the hash (resulting in different hashes, hence we'll never get to the code above). So: what happens on Linux is that we wind up with duplicate go.map.zero symbols; the linux linker doesn't seem to be bothered by the fact that we have two symbols with the same name. Example:
There is a final additional aggravating factor: in the regression test added for issue 46653, only one of the map.zero symbols turns out to be actually reachable (which is all you need it turns out). So this means that that test won't trigger the duplicate definition on Windows, which is why we didn't see this from the get-go ... sigh. I think the right fix is to basically back out the change to convert go.map.zero to content-addressible and just use the same logic that was used in the fix that got back-ported to the 1.15 release. This will also have the side effect of producing less RODATA overall on linux. Cherry let me know what you think about this and/or if you can think of a better fix. |
Thanks for the repro and the analysis. Fixing the DUPOK symbol size using 1.15 logic SGTM. I was thinking about it yesterday (without looking at code). Do/did we generate static symbols for content addressable symbols in the go.o for external linking? If so, duplicates should be okay. Maybe this is what we do for ELF but not PE? Maybe we should do it everywhere (if not already). DUPOK symbols with different sizes are still concerning. I think it only makes sense if the extra bytes are all zeros. Is go.map.zero the only case that needs this logic? If so, I'd like to rethink it and use some better mechanism for long term. That said, for short term (1.16 and 1.17), restoring the 1.15 logic is probably the best. Thanks. |
Yes, symbols are static for ELF right now and I agree that going forward we should make sure they are static elsewhere (mac, windows, etc).
My recollection is that we still have some DUPOK collisions where the payload is different, unfortunately (I seem to remember some cases with DWARF where this happens, due to slightly different FE source position handling in different packages). If we could clean this up it would be great. |
Change https://golang.org/cl/334930 mentions this issue: |
@gopherbot please open a backport for this to Go 1.16 release. This fixes a bug introduced in the previous point release. |
Backport issue(s) opened: #47289 (for 1.16). Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases. |
No 1.15 backport, because it uses a different code path which does not have the issue. |
Change https://golang.org/cl/335629 mentions this issue: |
What version of Go are you using (
go version
)?Does this issue reproduce with the latest release?
Yes.
What operating system and processor architecture are you using (
go env
)?go env
OutputWhat did you do?
First, sorry for bad english. I'm still learning.
I'm trying to build Dixer ETL with go1.16.6 with SQLite3 driver github.com/mattn/go-sqlite3 (also with Godror github.com/godror/godror and DB2 driver github.com/ibmdb/go_ibm_db one by one) and get this error:
I'm trying to get a sample code but I can't replicate it yet, but I can reproduce it in the project easily.
I'm using Windows 10 with gcc in MSYS2. Same project can build without problems with go1.16.5
To get the build error, I only need to catch the error from a function, sample code:
The
GetStringVarValue
function in exprfunc package reduced to get the error is:The success build is when I ignore the error. Using first code block is possible build with this:
I reduced to minimal the project but I only catch the error in main project. So the success build I can only get ignoring the error. I can share the code reduced to minimal to provide the project struct if needed despite build is success.
The build error only happens when I load a package that require CGO enabled and only happens in Windows amd64. I can build successfully in Linux amd64, arm64 and macOS amd64.
A simple
go build
trigger the error.I though the building machine have problems, so I reinstall in another virtual machine and a laptop. Same error.
What did you expect to see?
Build success.
What did you see instead?
The text was updated successfully, but these errors were encountered: