Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

misc/cgo/testplugin: broken on linux/386 builder by CL 75631 #22571

Closed
rsc opened this issue Nov 3, 2017 · 11 comments
Closed

misc/cgo/testplugin: broken on linux/386 builder by CL 75631 #22571

rsc opened this issue Nov 3, 2017 · 11 comments
Milestone

Comments

@rsc
Copy link
Contributor

rsc commented Nov 3, 2017

CL 75631, which has basically nothing to do with anything related to plugin support,
causes reliable failures in testplugin on the linux/386 trybot, but not on my workstation
using GOARCH=386 GOHOSTARCH=386.

I can't see what in the CL could possibly cause this, and I can't see why the test would
behave differently locally vs on the trybot.

I'm debugging on CL 75910, but the turnaround is about 10 minutes to add a print statement.

I'm going to disable it for now, so that I can get the other CLs in, and then send a fix
and reenable once I've debugged it.

/cc @crawshaw

@rsc rsc added this to the Go1.10 milestone Nov 3, 2017
@bradfitz
Copy link
Contributor

bradfitz commented Nov 3, 2017

Using gomote would provide better debugging latency.

@rsc
Copy link
Contributor Author

rsc commented Nov 3, 2017

I thought it was something about the way the builders run the tests - we've had a bunch of those recently - and that gomote wouldn't help. But maybe it would. I don't know.

@rsc
Copy link
Contributor Author

rsc commented Nov 3, 2017

I disabled 386 and then arm started failing, and I disabled arm and then amd64 failed. So I gave up and disabled this test entirely.

@rsc
Copy link
Contributor Author

rsc commented Nov 7, 2017

I do not yet understand why this is a flaky failure instead of a 100% failure, but this is what is failing:

GOPATH=$(pwd) go build -gcflags "$GO_GCFLAGS" -buildmode=plugin iface_a  # (1)
GOPATH=$(pwd) go build -gcflags "$GO_GCFLAGS" -buildmode=plugin iface_b  # (2)
GOPATH=$(pwd) go build -gcflags "$GO_GCFLAGS" iface  # (3)
LD_LIBRARY_PATH=$(pwd) ./iface

This is a typical failure:

./iface
2017/11/03 20:26:27 plugin.Open("iface_a.so"): plugin.Open("iface_a"): plugin was built with a different version of package iface_i

As I understand it, this message is saying that build (1) and build (3) built different versions of package iface_i. That's actually true, apparently 100% of the time. The builds of iface_a and iface_b build iface_i with -dynlink passed to the compiler. The build of iface builds iface_i without -dynlink passed to the compiler. That manifests as a different build ID, which is stored in the export data, so you'd think it would cause a version mismatch all the time.

Mismatch or not, isn't it bad that the two builds (1) and (3) disagree about the compiler flags for iface_i? Wouldn't that cause potential real problems? Should the final build be using some -buildmode to signal that it will load plugins? Or -gcflags=-dynlink but that seems too low-level.

/cc @crawshaw @ianlancetaylor

@rsc
Copy link
Contributor Author

rsc commented Nov 7, 2017

Build ID is not included in the hash in question, so that's not it. I still don't understand why CL 75631 was the one that caused the problems either. I could see the problems happening earlier when we started rebuilding more often, because we can see that an installed iface_i with -dynlink does not satisfy a need for one without -dynlink. But that happened before CL 75631. Still debugging via builder - no luck on gomotes - but at least the builders are only running testplugin, not the full suite.

@crawshaw
Copy link
Member

crawshaw commented Nov 7, 2017

This error almost certainly means the export data for the two versions of of iface_i are different. Can you get a copy of them so we can compare them?

Are the paths to the cache files leaking into the export data? (The cache paths contain the build ID, right?)

Let's put aside the issue of whether two builds of a package, one with -dynlink, one without, could cause problems. I've tried starting a conversation about that before, it didn't go well. And Ian really didn't like my invasive CL to fix it. And I have yet to find a concrete example of how the two different versions cause problems.

@rsc
Copy link
Contributor Author

rsc commented Nov 7, 2017

There are no obvious paths in the export data. I added a print to get the exact data being hashed but so far have not gotten it to reproduce yet throwing the CL at trybots.

@rsc
Copy link
Contributor Author

rsc commented Nov 7, 2017

I do remember the "hard-code -dynlink on all the time" suggestion. I don't want to do that either. As long as the export data is the same it should be OK. As a side note, I don't think the linker should be the one hashing the export data. The compiler should be in charge of generating the export hash. But that doesn't matter at the moment.

@rsc
Copy link
Contributor Author

rsc commented Nov 7, 2017

I was wrong. There's a file name in the exported position information. Not sure yet why it's changing but that's the root cause.

@rsc
Copy link
Contributor Author

rsc commented Nov 7, 2017

OK, this is actually pretty cool. The iface_i package in testshared and in testplugin are the same code, and until yesterday we did not include the path to the directory containing the code in the action ID. So if testshared managed to run and build iface_i fast enough, the final step of testplugin would use the cached iface_i build from testshared instead of doing its own. The only thing that would be different is the file paths, and file paths are now recorded in the export data (for line number reporting in inlined functions), so they affect the hash.

The earlier steps of testplugin did not use the cached copy because they were building with -dynlink.

I fixed this yesterday by putting p.Dir into the hash for non-GOROOT builds (but was still debugging the original failure CL without that fix). For GOROOT builds we do not include any directory in the hash but probably should still store the path suffix within $GOROOT. Right now if you put the same code into $GOROOT/src/p1 and $GOROOT/src/p2, the two would share a cache entry, which would not be correct since the file names in the two object files would be different.

@gopherbot
Copy link

Change https://golang.org/cl/76371 mentions this issue: misc/cgo/testplugin: unskip test

@golang golang locked and limited conversation to collaborators Dec 6, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants