x/build/maintner/maintnerd: Gerrit events are slow to pick up #35293
Comments
I watched the logs and left a test comment on Gerrit:
The pubsub event arrived immediately. The questionable line is:
What it's supposed to do is poll regularly, and if it gets a pubsub event, it wakes up the poller. But if it's already polling when it gets a pubsub event, it ignores the event. It seems we're stuck in a poll (forever? usually?), so when we get the pubsub events we just drop them. Next up is to figure out why we're stuck updating from Gerrit. /cc @golang/osp-team
Change https://golang.org/cl/205860 mentions this issue:
I deployed maintnerd with 46182bf412d81cd9eadccb30ca9f451a58edd4fc (from https://go-review.googlesource.com/c/build/+/205918/1) and I'm watching logs now. It also updated to Debian Buster w/ a newer git, which might even fix the problem, but let's see...
Also increase its timeout and terminate it with a friendlier signal and add some logging of how long git operations took.
Updates golang/go#35293
Updates golang/go#35124
Change-Id: I1ed466d872a11f60751953ef5274be96cea0294b
Reviewed-on: https://go-review.googlesource.com/c/build/+/205860
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
The fetch operations seem the slowest:
I wonder if the problem is this: we drop pubsub wake-ups if a sync is already in progress, on the assumption that the current sync will likely pick up the change we were just notified about, but that assumption is rarely valid for these slow operations. If we're fetching a ref for 11 seconds, a lot of comments can come in during that time and get dropped. Then we finish and resort to polling many minutes later. When we get a pubsub wake-up, we should set a bit saying "stuff happened", and when a sync finishes and stuff happened meanwhile, sync again immediately, until we reach a clean state where no pubsub events arrived meanwhile and the ls-remote refs are unchanged. That's my current theory, but I forget most of this code.
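For illustration, the "set a bit and re-sync until clean" idea from the comment above could be sketched roughly as follows. This is not maintner's actual code: the `syncer` type, `noteEvent`, `takeDirty`, `syncUntilClean`, and the `syncOnce` callback are all hypothetical names, and the sketch leaves out the extra "ls-remote refs are unchanged" check.

```go
package main

import "sync"

// syncer is a hypothetical type (not from maintner) that remembers
// whether an event arrived while a sync was running.
type syncer struct {
	mu    sync.Mutex
	dirty bool // true if an event arrived since the flag was last cleared

	// syncOnce performs one full sync pass. Here it is an injected
	// callback so the loop can be exercised without touching Gerrit.
	syncOnce func() error
}

// noteEvent records that "stuff happened". It is safe to call from any
// goroutine, including while syncUntilClean is in the middle of a pass.
func (s *syncer) noteEvent() {
	s.mu.Lock()
	s.dirty = true
	s.mu.Unlock()
}

// takeDirty reports whether an event was recorded and clears the flag.
func (s *syncer) takeDirty() bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	d := s.dirty
	s.dirty = false
	return d
}

// syncUntilClean runs syncOnce, then repeats for as long as events were
// recorded during the previous pass. It returns the number of passes run.
func (s *syncer) syncUntilClean() (int, error) {
	passes := 0
	for {
		// Events recorded before this point are covered by the pass below.
		s.takeDirty()
		if err := s.syncOnce(); err != nil {
			return passes, err
		}
		passes++
		if !s.takeDirty() {
			return passes, nil // nothing arrived during the pass
		}
	}
}
```

In this sketch a pubsub handler would call `noteEvent`, and the poller would call `syncUntilClean` instead of doing a single pass, so an event that lands mid-fetch triggers one more pass rather than waiting for the next poll interval.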
I was reading through the code and I thought we only drop the wakeup if the “wakeup” channel's buffer is already full, that is, if we're going to wake up again at the end of the current poll anyway. Did I misread?
You read, which is a step ahead of me, so you're probably right. I was looking at logs only and trying to remember the code.
I've noticed a huge delay lately starting TryBots.
Something's stuck or being slow in the pubsubhelper and/or maintner Gerrit code.