x/build/cmd/coordinator: watcher's git mirroring is slow/flaky #16388
Comments
But on the watcher's git cache, I see:
I don't know what created that head. It seems to be the only screwed up one:
Did somebody accidentally create that recently? |
@aclements, @cherrymui and I have been looking at it for the past hour. We removed that branch about an hour ago; no idea when it was created but it pointed at a commit from you in February. |
I've nuked the watcher's "net" git cache and restarted the watcher, which re-cloned it, and github & the build dashboard are happy again. |
The build dashboard certainly doesn't look happy to me; the latest commit on master is still from July 8th. The git push of the go repo is taking for-freaking-ever, but I hope when it finally finishes that the watcher will notice the commits since July 8th. |
Going forward, we need monitoring on the watcher (that is #15760). This bug can be about making sure the watcher's git cache stays in sync better, deleting refs when they're deleted from upstream Gerrit. |
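To illustrate the "deleting refs when they're deleted from upstream" idea: a minimal sketch, assuming the watcher already has the list of ref names in its cache and the list currently advertised upstream. The function name `staleRefs` and the sample ref names are hypothetical, not taken from the watcher's code.

```go
package main

import (
	"fmt"
	"sort"
)

// staleRefs returns the refs that exist in the local cache but are no
// longer present upstream. These are candidates for deletion from the
// mirror. (Hypothetical helper, for illustration only.)
func staleRefs(local, upstream []string) []string {
	seen := make(map[string]bool, len(upstream))
	for _, ref := range upstream {
		seen[ref] = true
	}
	var stale []string
	for _, ref := range local {
		if !seen[ref] {
			stale = append(stale, ref)
		}
	}
	sort.Strings(stale) // deterministic order for logging
	return stale
}

func main() {
	// Sample ref names are made up for the example.
	local := []string{"refs/heads/master", "refs/heads/old-branch", "refs/tags/v1"}
	upstream := []string{"refs/heads/master", "refs/tags/v1"}
	fmt.Println(staleRefs(local, upstream))
}
```

In practice `git fetch --prune` performs this comparison itself; the sketch only shows the set difference the watcher would need if it tracked refs on its own.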
It's still building, but it noticed new stuff. The red "fail" went away and was replaced by blank spots it knows it needs to build. |
Ah, I was only debugging the "net" repo. I hadn't noticed the main repo. |
I think I may know what's going on here. If you |
Also, it appears these have all been pushed up to GitHub. They don't show up in the UI (presumably because they're not in refs/heads), but if you |
@aclements that's what I suspected at first too, but even though we do run git clone --mirror, I only see 330 refs on the git cache:
Ooooh! That's where they're hiding. In packed-refs:
Okay, mystery solved. We should probably push smarter, and prioritize non-change refs, only pushing code review refs when we're otherwise idle. |
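For anyone wanting to see how many refs are hiding in packed-refs, here is a rough sketch that counts entries per namespace. It assumes the usual packed-refs layout (`<sha> <refname>` per line, `#` header lines, and `^<sha>` lines for peeled tags). The function name `countPackedRefs` and the sample data are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// countPackedRefs tallies refs by their top-level namespace
// (for example "refs/heads" or "refs/changes") from the contents
// of a packed-refs file.
func countPackedRefs(data string) map[string]int {
	counts := make(map[string]int)
	for _, line := range strings.Split(data, "\n") {
		line = strings.TrimSpace(line)
		// Skip blanks, the "# pack-refs with: ..." header, and
		// "^<sha>" lines that record the peeled value of a tag.
		if line == "" || strings.HasPrefix(line, "#") || strings.HasPrefix(line, "^") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) != 2 {
			continue // not a "<sha> <refname>" line
		}
		ref := fields[1]
		ns := ref
		if parts := strings.SplitN(ref, "/", 3); len(parts) >= 2 {
			ns = parts[0] + "/" + parts[1]
		}
		counts[ns]++
	}
	return counts
}

func main() {
	// Made-up sample; real hashes are 40 hex characters.
	sample := `# pack-refs with: peeled fully-peeled
aaaa refs/heads/master
bbbb refs/tags/v1
^cccc
dddd refs/changes/10/110/1
eeee refs/changes/10/110/2
`
	fmt.Println(countPackedRefs(sample))
}
```

To run it against a real repository, read the `packed-refs` file at the top of the bare repo directory (for example with `os.ReadFile`) and pass its contents as a string.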
Do we want to mirror code review refs at all? (I could see arguments either way, but I didn't even realize we did this until 10 minutes ago.) |
As I told @aclements on chat, the But it's possible this has already been fixed in upstream git. The |
It hasn't; I'm able to replicate the poor performance with |
Okay, I wrote a little git syncer atop https://gist.github.com/bradfitz/b64ca2917fbc7a447141c580ec820da5 It prioritizes heads, then tags, then misc other stuff, and resumes where it left off, and runs quickly as opposed to |
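The ordering described above (heads, then tags, then everything else) could look roughly like the following. This is an illustrative sketch, not the code from the gist; `refPriority` and `orderRefs` are hypothetical names, and it assumes code review refs fall into the final "everything else" group.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// refPriority ranks a ref for pushing: lower values are pushed first.
func refPriority(ref string) int {
	switch {
	case strings.HasPrefix(ref, "refs/heads/"):
		return 0 // branches first
	case strings.HasPrefix(ref, "refs/tags/"):
		return 1 // then tags
	default:
		return 2 // code review refs and anything else last
	}
}

// orderRefs returns a copy of refs sorted by priority, then by name
// so the order is stable from run to run.
func orderRefs(refs []string) []string {
	out := append([]string(nil), refs...)
	sort.Slice(out, func(i, j int) bool {
		pi, pj := refPriority(out[i]), refPriority(out[j])
		if pi != pj {
			return pi < pj
		}
		return out[i] < out[j]
	})
	return out
}

func main() {
	// Sample ref names are made up for the example.
	refs := []string{
		"refs/changes/10/110/1",
		"refs/tags/v1",
		"refs/heads/master",
		"refs/heads/dev",
	}
	for _, r := range orderRefs(refs) {
		fmt.Println(r)
	}
}
```

A deterministic order also makes resuming simpler: a syncer can record the last ref it pushed and continue from the next one in the same sorted list.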
Has anyone filed a bug upstream with git? If not, I can do so. |
@josharian, feel free. I was waiting to gather more information to feel confident I knew what git was doing. |
@quentinmit, you write:
How did you reproduce it? After I ran my gitsync tool to a new test repo on Github, I then tried both:
And I can't reproduce it. My git version is 1.9.1 on that machine. Are you sure they didn't fix it? |
Ping @quentinmit. This appears to be slow again today and I need to decide how to fix it. You say modern git is still slow with many refs, but my tests seemed fine. What did you run? In any case, I'll start with adding more visibility into what it's doing. |
@bradfitz Here is my repro with git 2.8.0.rc3.226.g39d4020:
(Obviously I killed it after 1.5 minutes; I don't know how long it would have taken.) Note that I did not clone with --mirror, so the repo does have slightly different refs. Perhaps that's the difference. |
I assume there was a "cd go" in between those lines? In any case, such a git push is 2 seconds for me, with git 1.9.1, when my current directory is a bare git checkout (from git clone --mirror). |
Yes, there was, I forgot to copy+paste it. In fact, from a clone --mirror it takes almost no time, so that explains the difference. I think our time would be best spent upgrading git in go-watcher-world. |
Good to hear. I'll update go-watcher-world after I deploy a new version with more status page visibility into what's happening. I'd like to see the problem more clearly before fixing it. |
I updated go-watcher-world to sid, which has git 2.8.0, and I still see it sucking with the new status pages:
But running my own watcher-world container on the coordinator with the /var/cache/watcher-git bind-mounted, I timed various operations:
There's no reason it should be slow. I'm inclined to do the syncing myself as in https://gist.github.com/bradfitz/b64ca2917fbc7a447141c580ec820da5 Which also has the advantage of prioritization and the ability to get more visibility into its state. |
After debugging with @aclements, I switched to using the ssh transport, and things now look good and very fast:
It seems the |
CL https://golang.org/cl/25110 mentions this issue. |
The "net" subrepo has stopped mirroring to github.
Note: https://github.com/golang/net/commits/master (stuck at Jul 7, 2016, f841c3)
Versus: https://go.googlesource.com/net (3 new commits since then, with e90d6d0 currently at top)
The watcher says:
http://farmer.golang.org/debug/watcher
So Github doesn't accept some ref that Gerrit has?
/cc @spearce @adg @broady @quentinmit
Related: #11811