x/build/cmd/gopherbot: gopherbot is fallible to sleep-error loops #30182

Closed
dmitshur opened this issue Feb 12, 2019 · 6 comments
Labels
Builders, FrozenDueToAge, NeedsFix
Milestone

Comments

@dmitshur
Contributor

dmitshur commented Feb 12, 2019

Gopherbot isn't happy right now. From its logs:

2019/02/12 01:01:02 got corpus update after 274.731276ms
freeze
	https://golang.org/issue/23772  x/build/cmd/gerritbot: <fill this in>
2019/02/12 01:01:02 freeze old issues: PUT https://api.github.com/repos/golang/go/issues/23772/lock: 404 Not Found []
2019/02/12 01:01:02 PUT https://api.github.com/repos/golang/go/issues/23772/lock: 404 Not Found []
2019/02/12 01:01:02 gopherbot ran in 110.450416ms
2019/02/12 01:01:02 sleeping 30s after previous error.
2019/02/12 01:01:32 Updating data from log *maintner.netMutSource ...
2019/02/12 01:01:36 Downloading 649 bytes of https://maintner.golang.org/logs/45 ...
2019/02/12 01:01:36 wrote /cache/golang-maintner/0045.growing.mutlog
2019/02/12 01:01:36 gerrit go.googlesource.com/gofrontend: Ref {CLNumber:161963 Version:0} => 3a9fd412dc7603e55cff110c741a02565ae83980
2019/02/12 01:01:36 Reloaded data from log *maintner.netMutSource.
2019/02/12 01:01:36 got corpus update after 3.996453811s
freeze
	https://golang.org/issue/23772  x/build/cmd/gerritbot: <fill this in>
2019/02/12 01:01:36 freeze old issues: PUT https://api.github.com/repos/golang/go/issues/23772/lock: 404 Not Found []
2019/02/12 01:01:36 PUT https://api.github.com/repos/golang/go/issues/23772/lock: 404 Not Found []
2019/02/12 01:01:36 gopherbot ran in 98.493511ms
2019/02/12 01:01:36 sleeping 30s after previous error.
2019/02/12 01:02:06 Updating data from log *maintner.netMutSource ...
...

It's constantly looping with "sleeping 30s after previous error" and not making any progress.

@bradfitz spotted this in #30181 (comment).

@dmitshur dmitshur added the Builders and NeedsInvestigation labels Feb 12, 2019
@dmitshur
Contributor Author

dmitshur commented Feb 12, 2019

Issue #23772 was an empty/spam issue that was closed a year ago. It's showing up as 404 now, which gopherbot can't handle.

There are two things we need to fix here:

  1. This specific problem in the "freeze old issues" task: it should skip issues that return 404 and move on to the rest, rather than treating a 404 as a fatal error.

  2. See whether the high-level behavior can be improved so that when one task starts to fail, gopherbot still attempts the other tasks instead of short-circuiting. Such an improvement may not be possible in the general case, but it's worth considering.

@dmitshur dmitshur added the NeedsFix and Soon labels and removed the NeedsInvestigation label Feb 12, 2019
@dmitshur dmitshur self-assigned this Feb 12, 2019
@dmitshur
Contributor Author

Since this isn't resolved yet, I have to manually say that I've sent a fix for the "soon" part of this issue as CL 161906.

@gopherbot gopherbot added this to the Unreleased milestone Feb 12, 2019
@gopherbot

Change https://golang.org/cl/161906 mentions this issue: cmd/gopherbot: handle 404 GitHub issues in freezeOldIssues task

@dmitshur
Contributor Author

dmitshur commented Feb 12, 2019

I've deployed the fix in the aforementioned CL, which is why gopherbot woke up. The CL still needs to be reviewed before it can be merged.

Edit: I'll remove the Soon label, since the immediate problem is resolved.

@dmitshur
Contributor Author

dmitshur commented Feb 12, 2019

The current fix adds to the plate of "actions gopherbot keeps taking without any effect" (issue #28320):

freeze
	https://golang.org/issue/23772  x/build/cmd/gerritbot: <fill this in>

To fix that, we need to update maintner to detect issues that have gone missing, and set their NotExist field to true. CL 161521 may help with that. /cc @jmdobry

Edit: Opened #30184 for that.

@dmitshur dmitshur removed the Soon label Feb 12, 2019
gopherbot pushed a commit to golang/build that referenced this issue Feb 12, 2019
A GitHub issue can become 404. Attempting to lock it will produce a
404 response from the GitHub API. Don't treat it as a fatal error
when it happens.

Add a check for the NotExist field. This will help after golang/go#30184
is resolved.

Updates golang/go#30182

Change-Id: Ia04c59879909b1de00bd681606bfa331fe642cd4
Reviewed-on: https://go-review.googlesource.com/c/161906
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@dmitshur dmitshur changed the title x/build/cmd/gopherbot: gopherbot is in sleep-error loop x/build/cmd/gopherbot: gopherbot is prone to sleep-error loops Feb 12, 2019
@dmitshur dmitshur changed the title x/build/cmd/gopherbot: gopherbot is prone to sleep-error loops x/build/cmd/gopherbot: gopherbot is fallible to sleep-error loops Feb 12, 2019
@gopherbot

Change https://golang.org/cl/164157 mentions this issue: cmd/gopherbot: don't return early on error in doTasks

@golang golang locked and limited conversation to collaborators Feb 28, 2020