x/build: gitmirror or maintnerd consumed all our Gerrit quota #23853

Closed
bradfitz opened this issue Feb 15, 2018 · 8 comments
Labels
Builders x/build issues (builders, bots, dashboards) FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. Soon This needs to be done soon. (regressions, serious bugs, outages)

Comments

@bradfitz
Contributor

Maintner is blocked:

2018/02/15 16:25:01 gerrit go.googlesource.com/sublime-config: sync: git fetch origin: exit status 128, fatal: remote error: Daily ls-remote rate limit exceeded for IP 35.188.125.9
2018/02/15 16:25:01 IS TEMP ERROR? *errors.errorString git fetch origin: exit status 128, fatal: remote error: Daily ls-remote rate limit exceeded for IP 35.188.125.9
2018/02/15 16:25:01 Temporary error from gerrit go.googlesource.com/sublime-config: git fetch origin: exit status 128, fatal: remote error: Daily ls-remote rate limit exceeded for IP 35.188.125.9
2018/02/15 16:25:02 gerrit go.googlesource.com/term: sync: git fetch origin: exit status 128, fatal: remote error: Daily ls-remote rate limit exceeded for IP 35.188.125.9
2018/02/15 16:25:02 IS TEMP ERROR? *errors.errorString git fetch origin: exit status 128, fatal: remote error: Daily ls-remote rate limit exceeded for IP 35.188.125.9
2018/02/15 16:25:02 Temporary error from gerrit go.googlesource.com/term: git fetch origin: exit status 128, fatal: remote error: Daily ls-remote rate limit exceeded for IP 35.188.125.9
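The log above shows the same quota message being retried about once a second as a temporary error. A minimal sketch of how a caller might recognize this specific message and pull out the IP it names, so it could back off instead of retrying immediately. The function name and regular expression are hypothetical and are not taken from maintner:

```go
package main

import (
	"fmt"
	"regexp"
)

// rateLimitRE matches the Gerrit quota message seen in the log above and
// captures the IP address it names. (Hypothetical helper, not maintner code.)
var rateLimitRE = regexp.MustCompile(`Daily ls-remote rate limit exceeded for IP (\S+)`)

// rateLimitedIP reports the IP named in a rate-limit message, if the
// given git output contains one.
func rateLimitedIP(gitOutput string) (ip string, ok bool) {
	m := rateLimitRE.FindStringSubmatch(gitOutput)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	out := "git fetch origin: exit status 128, fatal: remote error: " +
		"Daily ls-remote rate limit exceeded for IP 35.188.125.9"
	if ip, ok := rateLimitedIP(out); ok {
		fmt.Printf("rate limited; requests were attributed to IP %s\n", ip)
	}
}
```

That the error names an IP at all suggests the requests were counted against an address rather than an account.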

So now all bots are down.

GerritBot consumed all of our Gerrit quota.

@andybons, please stop GerritBot and/or increase our Gerrit quota.

@bradfitz bradfitz added the NeedsFix The path to resolution is known, but the work has not been done. label Feb 15, 2018
@bradfitz bradfitz added this to the Soon milestone Feb 15, 2018
@gopherbot gopherbot added the Builders x/build issues (builders, bots, dashboards) label Feb 15, 2018
@andybons andybons changed the title x/build: GerritBot consumed all our Gerrit quota and starved maintner x/build: gitmirror or maintnerd consumed all our Gerrit quota Feb 15, 2018
@andybons
Member

Our ls-remote quota has been reset. After investigating with @bradfitz, the requests were coming from either maintnerd or gitmirror, not GerritBot.

@bradfitz
Contributor Author

Sorry, by default I blame whatever changed last. I'm still debugging.

@bradfitz
Contributor Author

Looking at logs in the Cloud Logging console (which captures all the GKE pod output), the errors started at "2018-02-15T13:27:16.527663505Z", or 5:27am Pacific: "2018-02-15 05:27:14.000 PST"

I'll look at logs just before that to see if it's obvious what went crazy.

@andybons
Member

Based on the error (which reports the IP address), we’re not authenticating our git requests somewhere. If we authenticate them, we’ll get better error logging and a much higher quota.

@golang golang deleted a comment from timendez Feb 15, 2018
@golang golang deleted a comment from andybons Feb 15, 2018
@golang golang deleted a comment from timendez Feb 15, 2018
@bradfitz
Contributor Author

@andybons, will do. We historically never did, because we were bound to 1 IP address and got high quota for that IP. Now that we bounce around k8s nodes, I'll make them authenticate.

@gopherbot

Change https://golang.org/cl/94836 mentions this issue: internal/gitauth: new package to write out git cookies file
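A rough sketch of what writing a git cookies file can look like. This is not the internal/gitauth code from the CL. The function names, the cookie name "o", and the placeholder value are assumptions based on the Netscape cookie-file layout that git reads through its `http.cookiefile` setting:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// cookieLine formats one entry in the Netscape cookie-file layout:
// domain, include-subdomains flag, path, secure flag, expiry, name, value.
// (Hypothetical helper; the real package may build this differently.)
func cookieLine(host, name, value string) string {
	return fmt.Sprintf("%s\tFALSE\t/\tTRUE\t2147483647\t%s\t%s\n", host, name, value)
}

// writeCookieFile writes a single cookie to path, readable only by the owner.
func writeCookieFile(path, host, name, value string) error {
	return os.WriteFile(path, []byte(cookieLine(host, name, value)), 0600)
}

func main() {
	dir, err := os.MkdirTemp("", "gitauth-sketch")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "gitcookies")
	// The cookie name and value here are placeholders, not real credentials.
	if err := writeCookieFile(path, "go.googlesource.com", "o", "git-example=placeholder"); err != nil {
		panic(err)
	}
	fmt.Println("wrote", path)
	fmt.Println("next step: git config --global http.cookiefile", path)
}
```

Once git is pointed at such a file, fetches would carry the cookie and be counted against the account rather than the egress IP.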

@bradfitz
Contributor Author

From another bug:

We recently upgraded our Kubernetes cluster, and our 10 random container jobs re-laid themselves out onto our 4 physical nodes (i.e. 4 egress IP addresses).

We got lucky before and our 3 gerrit-hitting jobs were using different nodes (different IPs).

But after this latest upgrade, all 3 are on the same node, so we burn through that IP's quota by the end of the day.

That's the current theory.
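To make the theory concrete, here is a small sketch of the arithmetic. The job names, request counts, and quota are all made-up numbers for illustration; the thread does not state the real ones:

```go
package main

import "fmt"

// usagePerNode sums the daily request counts of the jobs placed on each
// node. Unauthenticated requests from one node share one egress IP.
func usagePerNode(jobNode map[string]string, jobRequests map[string]int) map[string]int {
	usage := make(map[string]int)
	for job, node := range jobNode {
		usage[node] += jobRequests[job]
	}
	return usage
}

func main() {
	const quota = 1000 // hypothetical per-IP daily quota
	requests := map[string]int{"job1": 400, "job2": 400, "job3": 400} // hypothetical

	// Before the upgrade: each Gerrit-hitting job on its own node.
	before := map[string]string{"job1": "node-a", "job2": "node-b", "job3": "node-c"}
	// After the upgrade: all three on the same node.
	after := map[string]string{"job1": "node-a", "job2": "node-a", "job3": "node-a"}

	fmt.Println("before, node-a over quota:", usagePerNode(before, requests)["node-a"] > quota)
	fmt.Println("after, node-a over quota:", usagePerNode(after, requests)["node-a"] > quota)
}
```

With these made-up numbers, no single IP exceeds the quota in the first layout, while the second layout puts all 1200 requests on one IP.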

I'm adding auth now.

gopherbot pushed a commit to golang/build that referenced this issue Feb 16, 2018
internal/gitauth: new package to write out git cookies file

Then use it from gitmirror and maintnerd.

Updates golang/go#23853

Change-Id: I8112f004638667894676c04fa218a7ced10422ac
Reviewed-on: https://go-review.googlesource.com/94836
Reviewed-by: Andrew Bonventre <andybons@golang.org>
@bradfitz
Contributor Author

bradfitz commented Apr 6, 2018

This happened.

@bradfitz bradfitz closed this as completed Apr 6, 2018
@bradfitz bradfitz added the Soon This needs to be done soon. (regressions, serious bugs, outages) label May 17, 2018
@golang golang locked and limited conversation to collaborators May 17, 2019

3 participants