build: linux/mipsle builder has been unstable all cycle #26179

Closed
josharian opened this issue Jul 2, 2018 · 13 comments
Labels: FrozenDueToAge, NeedsInvestigation, release-blocker

@josharian
Contributor

Anecdotally, the linux/mipsle builder has been ~50% red for the entire Go 1.11 cycle. The failures vary; smells like memory corruption or some such. As far as I know, no one is working on fixing it, or even diagnosing whether the problem is the toolchain or the hardware.

It seems to me we should figure out if flaky hardware is the problem. If so, get new hardware. And if not, consider adding a line to the release notes that mipsle is unstable. Or some such.

@josharian added the NeedsInvestigation and release-blocker labels Jul 2, 2018
@josharian added this to the Go1.11 milestone Jul 2, 2018
@milanknezevic
Contributor

@josharian We are aware of this instability, but for the time being we are not sure what the main cause is. It seems to be a hardware or kernel issue, and we recently found that disabling transparent hugepages helps on a similar board.
We'll try to do this on the builder.
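
For readers who want to check the transparent hugepages setting on their own board, here is a minimal sketch in Go. It assumes the standard Linux sysfs knob at /sys/kernel/mm/transparent_hugepage/enabled, whose contents list the available modes with the active one in brackets. The helper name activeTHPMode is hypothetical, and nothing here is taken from the builder's actual configuration.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// thpPath is the standard Linux sysfs knob for transparent hugepages.
const thpPath = "/sys/kernel/mm/transparent_hugepage/enabled"

// activeTHPMode (hypothetical helper) extracts the bracketed, active mode
// from the knob's contents, e.g. "always madvise [never]" -> "never".
// It returns "" if no bracketed field is present.
func activeTHPMode(contents string) string {
	for _, field := range strings.Fields(contents) {
		if strings.HasPrefix(field, "[") && strings.HasSuffix(field, "]") {
			return strings.Trim(field, "[]")
		}
	}
	return ""
}

func main() {
	data, err := os.ReadFile(thpPath)
	if err != nil {
		// The file is absent on kernels built without THP support.
		fmt.Println("cannot read THP setting:", err)
		return
	}
	fmt.Println("transparent hugepages mode:", activeTHPMode(string(data)))
}
```

On most kernels the setting can be changed at runtime by writing never to the same file as root; whether that is how it would be done on the builder is not stated in this thread.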

@aclements
Member

Downgrading to non-blocker since it sounds like this is probably a builder issue. (Feel free to disagree :)

@josharian
Contributor Author

I do disagree. :)

Five reasons:

  • We don't know (yet) that it is a builder issue.

  • If disabling transparent hugepages helps, it seems possible that there is a kernel bug that we are triggering, or that we are doing something wrong ourselves.

  • Even if there is a builder issue, there might be further non-builder issues that have been masked by the builder issues.

  • Keeping a functioning builder is critical anyway.

  • This is explicitly documented. Quoting from the porting policy: "If a builder fails for more than four weeks or is failing at the time of a release freeze, and a new maintainer cannot be found, the port will be removed from the tree."

So I've re-upped this to be a blocker. Red builders aren't ok, particularly at release time.

@milanknezevic
Contributor

I agree that we should investigate the transparent hugepages issue further.
Just to mention, the current build failures on mipsle are caused by a problem in gdb 7.7, so we are planning to update gdb to the 8.1 release.

@aclements
Member

> Just to mention, current build failures on mipsle are caused by problem in gdb 7.7, so we are planning to update gdb to 8.1 release.

Given that all of the linux/mipsle failures currently on the dashboard are gdb-related, do we still think there's another source of flakiness?

Are the gdb failures known to be an issue with gdb 7.7, or is it a guess that upgrading will help? (Given the nature of the failure it definitely seems like it would have to be a gdb issue, but it's hard to say for sure.)

@milanknezevic
Contributor

> Given that all of the linux/mipsle failures currently on the dashboard are gdb-related, do we still think there's another source of flakiness?

Apart from the segfault that sporadically happens when transparent hugepages are enabled, there is no other known source of flakiness.

> Are the gdb failures known to be an issue with gdb 7.7, or is it a guess that upgrading will help? (Given the nature of the failure it definitely seems like it would have to be a gdb issue, but it's hard to say for sure.)

Most likely it's the gdb issue described in this issue; it can be reproduced on gdb 7.7 and 7.4 (I didn't check earlier versions). I tried to reproduce this behavior on a board similar to the builder: it doesn't happen on gdb 8.1 and the test does not fail, so it looks like it's fixed. Also, a gdb upgrade is needed for DWARF compression support, which could solve this issue.
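
To confirm which gdb a machine actually picks up, a version check along these lines can help. This is only a sketch: the helper names parseGDBVersion and atLeast are hypothetical, and the regular expression assumes the usual "GNU gdb ... MAJOR.MINOR" banner printed by gdb --version.

```go
package main

import (
	"fmt"
	"os/exec"
	"regexp"
	"strconv"
)

// gdbVersionRE matches the first MAJOR.MINOR pair after "GNU gdb" in the banner.
var gdbVersionRE = regexp.MustCompile(`GNU gdb.*?(\d+)\.(\d+)`)

// parseGDBVersion (hypothetical helper) pulls major and minor out of the
// output of "gdb --version". ok is false if the banner is not recognized.
func parseGDBVersion(out string) (major, minor int, ok bool) {
	m := gdbVersionRE.FindStringSubmatch(out)
	if m == nil {
		return 0, 0, false
	}
	major, _ = strconv.Atoi(m[1]) // digits only, so Atoi cannot fail here
	minor, _ = strconv.Atoi(m[2])
	return major, minor, true
}

// atLeast reports whether major.minor >= wantMajor.wantMinor.
func atLeast(major, minor, wantMajor, wantMinor int) bool {
	return major > wantMajor || (major == wantMajor && minor >= wantMinor)
}

func main() {
	out, err := exec.Command("gdb", "--version").Output()
	if err != nil {
		fmt.Println("could not run gdb:", err)
		return
	}
	major, minor, ok := parseGDBVersion(string(out))
	if !ok {
		fmt.Println("unrecognized gdb banner")
		return
	}
	fmt.Printf("gdb %d.%d, at least 8.1: %v\n", major, minor, atLeast(major, minor, 8, 1))
}
```

The 8.1 threshold comes from the version mentioned in this thread; it is not a documented minimum.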

@aclements
Member

Thanks.

What's the time-frame for updating the builder to gdb 8.1?

@vstefanovic
Member

Hi Austin, it should be updated today or tomorrow.

@josharian
Contributor Author

@vstefanovic will that update also include disabling transparent huge pages? Either way, please let us know once that happens, and perhaps @bradfitz or @andybons can clear the dashboard and re-test with old commits, to get a bunch of miles on it. Thanks!

@vstefanovic
Member

@josharian gdb is updated. If some builds can be re-run, that would be great, thanks.
Transparent huge pages were disabled about a month ago; we haven't noticed any segfaults in the builder logs since.

@bradfitz
Contributor

bradfitz commented Aug 7, 2018

@vstefanovic, done. I've wiped all builds with that GDB error.

@dmitshur, FYI, what I did was locally rebuild x/build/cmd/retrybuilds with:

diff --git a/cmd/retrybuilds/retrybuilds.go b/cmd/retrybuilds/retrybuilds.go
index 265a93f..5861e79 100644
--- a/cmd/retrybuilds/retrybuilds.go
+++ b/cmd/retrybuilds/retrybuilds.go
@@ -104,6 +104,7 @@ func fixTheFlakes() {
 }
 
 var flakePhrases = []string{
+       "Failed to read a valid object file image from memory",
        "No space left on device",
        "no space left on device", // solaris case apparently
        "fatal error: error in backend: IO failure on output stream",

And ran:

$ retrybuilds -redo-flaky
2018/08/07 20:08:32 Restarting flaky {Builder:linux-mipsle Hash:1b870077c896379c066b41657d3c9062097a6943 LogURL:https://build.golang.org/log/9bfdfb9d4b18e8bba26264d704ab0921ca652658}
2018/08/07 20:08:32 Clearing linux-mipsle, hash 1b870077c896379c066b41657d3c9062097a6943
2018/08/07 20:08:32 Restarting flaky {Builder:linux-mipsle Hash:ac6d1564795e662b5b930c6b3d86f12351ff83d5 LogURL:https://build.golang.org/log/60fc8eff000efbeb6a1244673f96917797e8fe98}
2018/08/07 20:08:32 Clearing linux-mipsle, hash ac6d1564795e662b5b930c6b3d86f12351ff83d5
2018/08/07 20:08:32 Restarting flaky {Builder:linux-mipsle Hash:4cc09cd5320a2bea4f27a1db59970d4b715f6522 LogURL:https://build.golang.org/log/6baa08fb520cccf2c003bf9a14c2ea9bf24ce6e5}
2018/08/07 20:08:32 Clearing linux-mipsle, hash 4cc09cd5320a2bea4f27a1db59970d4b715f6522
2018/08/07 20:08:32 Restarting flaky {Builder:linux-mipsle Hash:51ddeb9965e942d5909c03fef005006457156638 LogURL:https://build.golang.org/log/8bb68d01986887ecb3655bae174af866d73f1eeb}
2018/08/07 20:08:32 Clearing linux-mipsle, hash 51ddeb9965e942d5909c03fef005006457156638
2018/08/07 20:08:32 Restarting flaky {Builder:linux-mipsle Hash:8589f46fe07998bd3b27a0cebce2f428e68014e0 LogURL:https://build.golang.org/log/8bad448f7dfb2966cad33b5f17356573d5d89138}
2018/08/07 20:08:32 Clearing linux-mipsle, hash 8589f46fe07998bd3b27a0cebce2f428e68014e0
2018/08/07 20:08:33 Restarting flaky {Builder:linux-mipsle Hash:8cc7540ecb592e8f9fdb429c3c7f5ede9548dfca LogURL:https://build.golang.org/log/c40b90628dc8fe7f7a0303a810c7fe224bd55bca}
2018/08/07 20:08:33 Clearing linux-mipsle, hash 8cc7540ecb592e8f9fdb429c3c7f5ede9548dfca
2018/08/07 20:08:33 Restarting flaky {Builder:linux-mipsle Hash:a9dcbab0fd4b5adfb40cb924f14ee2af9c8938eb LogURL:https://build.golang.org/log/a3d7438829f79de0f4292940bb47a039f723673c}
2018/08/07 20:08:33 Clearing linux-mipsle, hash a9dcbab0fd4b5adfb40cb924f14ee2af9c8938eb
2018/08/07 20:08:33 Restarting flaky {Builder:linux-mipsle Hash:870e12d7bfaea70fb0d743842f5864eb059cb939 LogURL:https://build.golang.org/log/947bbdba044342239338ec7f9b5509ac18887ce3}
2018/08/07 20:08:33 Clearing linux-mipsle, hash 870e12d7bfaea70fb0d743842f5864eb059cb939
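
The idea behind the flakePhrases change above can be sketched as follows: a failed build's log is treated as flaky if it contains any phrase from the list. This is an illustration only, not the actual retrybuilds code; the isFlaky helper and the sample log text are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// flakePhrases mirrors a few entries from the diff above.
var flakePhrases = []string{
	"Failed to read a valid object file image from memory",
	"No space left on device",
	"no space left on device",
}

// isFlaky (hypothetical helper) reports whether a build log contains
// any of the known flake phrases.
func isFlaky(log string) bool {
	for _, phrase := range flakePhrases {
		if strings.Contains(log, phrase) {
			return true
		}
	}
	return false
}

func main() {
	// Made-up log excerpt, for illustration only.
	sample := "--- FAIL: TestGdbPython\nwarning: Failed to read a valid object file image from memory.\n"
	fmt.Println("flaky:", isFlaky(sample))
}
```

Plain substring matching keeps the list easy to extend with a one-line change, at the cost of possibly matching an unrelated failure whose log happens to contain the same text.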

@vstefanovic
Member

@bradfitz, thanks. The first retried build failed, which was my fault: the old gdb was used in it.

@josharian
Contributor Author

Closing this. Thanks, all. I've filed #26898 to follow up about transparent hugepages.

@golang golang locked and limited conversation to collaborators Aug 9, 2019
6 participants