New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
misc/cgo/testplugin: broken on linux/386 builder by CL 75631 #22571
Comments
Using |
I thought it was something about the way the builders run the tests - we've had a bunch of those recently - and that gomote wouldn't help. But maybe it would. I don't know. |
I disabled 386 and then arm started failing, and I disabled arm and then amd64 failed. So I gave up and disabled this test entirely. |
I do not yet understand why this is a flaky failure instead of a 100% failure, but this is what is failing:
This is a typical failure:
As I understand it, this message is saying that build (1) and build (3) built different versions of package iface_i. That's actually true, apparently 100% of the time. The builds of iface_a and iface_b build iface_i with -dynlink passed to the compiler. The build of iface builds iface_i without -dynlink passed to the compiler. That manifests as a different build ID, which is stored in the export data, so you'd think it would cause a version mismatch all the time. Mismatch or not, isn't it bad that the two builds (1) and (3) disagree about the compiler flags for iface_i? Wouldn't that cause potential real problems? Should the final build be using some -buildmode to signal that it will load plugins? Or -gcflags=-dynlink but that seems too low-level. |
Build ID is not included in the hash in question, so that's not it. I still don't understand why CL 75631 was the one that caused the problems either. I could see the problems happening earlier when we started rebuilding more often, because we can see that an installed iface_i with -dynlink does not satisfy a need for one without -dynlink. But that happened before CL 75631. Still debugging via builder - no luck on gomotes - but at least the builders are only running testplugin, not the full suite. |
This error almost certainly means the export data for the two versions of of Are the paths to the cache files leaking into the export data? (The cache paths contain the build ID, right?) Let's put aside the issue of whether two builds of a package, one with |
There are no obvious paths in the export data. I added a print to get the exact data being hashed but so far have not gotten it to reproduce yet throwing the CL at trybots. |
I do remember the "hard-code -dynlink on all the time" suggestion. I don't want to do that either. As long as the export data is the same it should be OK. As a side note, I don't think the linker should be the one hashing the export data. The compiler should be in charge of generating the export hash. But that doesn't matter at the moment. |
I was wrong. There's a file name in the exported position information. Not sure yet why it's changing but that's the root cause. |
OK, this is actually pretty cool. The iface_i package in testshared and in testplugin are the same code, and until yesterday we did not include the path to the directory containing the code in the action ID. So if testshared managed to run and build iface_i fast enough, the final step of testplugin would use the cached iface_i build from testshared instead of doing its own. The only thing that would be different is the file paths, and file paths are now recorded in the export data (for line number reporting in inlined functions), so they affect the hash. The earlier steps of testplugin did not use the cached copy because they were building with -dynlink. I fixed this yesterday by putting p.Dir into the hash for non-GOROOT builds (but was still debugging the original failure CL without that fix). For GOROOT builds we do not include any directory in the hash but probably should still store the path suffix within $GOROOT. Right now if you put the same code into $GOROOT/src/p1 and $GOROOT/src/p2, the two would share a cache entry, which would not be correct since the file names in the two object files would be different. |
Change https://golang.org/cl/76371 mentions this issue: |
CL 75631, which has basically nothing to do with anything related to plugin support,
causes reliable failures in testplugin on the linux/386 trybot, but not on my workstation
using GOARCH=386 GOHOSTARCH=386.
I can't see what in the CL could possibly cause this, and I can't see why the test would
behave differently locally vs on the trybot.
I'm debugging on CL 75910, but the turnaround is about 10 minutes to add a print statement.
I'm going to disable it for now, so that I can get the other CLs in, and then send a fix
and reenable once I've debugged it.
/cc @crawshaw
The text was updated successfully, but these errors were encountered: