Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/build/cmd/coordinator: watcher's git mirroring is slow/flaky #16388

Closed
bradfitz opened this issue Jul 15, 2016 · 27 comments
Closed

x/build/cmd/coordinator: watcher's git mirroring is slow/flaky #16388

bradfitz opened this issue Jul 15, 2016 · 27 comments
Labels
Builders x/build issues (builders, bots, dashboards) FrozenDueToAge
Milestone

Comments

@bradfitz
Copy link
Contributor

The "net" subrepo has stopped mirroring to github.

Note: https://github.com/golang/net/commits/master (stuck at Jul 7, 2016, f841c3)
Versus: https://go.googlesource.com/net (3 new commits since then, with e90d6d0 currently at top)

The watcher says:

http://farmer.golang.org/debug/watcher

 ! [remote rejected] ---40hexomitted--- -> ---40hexomitted--- (pre-receive hook declined)
error: failed to push some refs to 'https://gopherbot:---40hexomitted---@github.com/golang/net'
remote: error: GH002: Sorry, branch or tag names consisting of 40 hex characters are not allowed.        
remote: error: Invalid branch or tag name "---40hexomitted---"        

So Github doesn't accept some ref that Gerrit has?

/cc @spearce @adg @broady @quentinmit

Related: #11811

@bradfitz bradfitz added the Builders x/build issues (builders, bots, dashboards) label Jul 15, 2016
@bradfitz bradfitz added this to the Unreleased milestone Jul 15, 2016
@bradfitz
Copy link
Contributor Author

In Gerrit for the "net" repo, I see only:

branches

... and no tags.

@bradfitz
Copy link
Contributor Author

But on the watcher's git cache, I see:

# cat /var/cache/watcher-git/net/refs/heads/f841c39de738b1d0df95b5a7187744f0e03d8112 
62685c2d7ca23c807425dca88b11a3e2323dab41

I don't know what created that head.

It seems to be the only screwed up one:

# find /var/cache/watcher-git -type f -path '*refs/heads*'
/var/cache/watcher-git/image/refs/heads/master
/var/cache/watcher-git/oauth2/refs/heads/master
/var/cache/watcher-git/gddo/refs/heads/master
/var/cache/watcher-git/exp/refs/heads/master
/var/cache/watcher-git/sync/refs/heads/master
/var/cache/watcher-git/sys/refs/heads/master
/var/cache/watcher-git/mobile/refs/heads/master
/var/cache/watcher-git/build/refs/heads/master
/var/cache/watcher-git/crypto/refs/heads/master
/var/cache/watcher-git/go/refs/heads/dev.ssa
/var/cache/watcher-git/go/refs/heads/master
/var/cache/watcher-git/go/refs/heads/release-branch.go1.7
/var/cache/watcher-git/tools/refs/heads/master
/var/cache/watcher-git/tools/refs/heads/release-branch.go1.7
/var/cache/watcher-git/talks/refs/heads/master
/var/cache/watcher-git/gofrontend/refs/heads/master
/var/cache/watcher-git/term/refs/heads/master
/var/cache/watcher-git/review/refs/heads/master
/var/cache/watcher-git/arch/refs/heads/master
/var/cache/watcher-git/proposal/refs/heads/master
/var/cache/watcher-git/net/refs/heads/master
/var/cache/watcher-git/net/refs/heads/f841c39de738b1d0df95b5a7187744f0e03d8112

Did somebody accidentally create that recently?

@quentinmit
Copy link
Contributor

@aclements, @cherrymui and I have been looking at it for the past hour. We removed that branch about an hour ago; no idea when it was created but it pointed at a commit from you in February.

@quentinmit quentinmit self-assigned this Jul 15, 2016
@bradfitz
Copy link
Contributor Author

I've nuked the watcher's "net" git cache and restarted the watcher, which re-cloned it, and github & the build dashboard are happy again.

@quentinmit
Copy link
Contributor

The build dashboard certainly doesn't look happy to me; the latest commit on master is still from July 8th.

The git push of the go repo is taking for-freaking-ever, but I hope when it finally finishes that the watcher will notice the commits since July 8th.

@bradfitz
Copy link
Contributor Author

Going forward, we need monitoring on the watcher (that is #15760)

This bug can be about making sure the watcher's git cache stays in sync better, deleting refs when they're deleted from upstream gerrit.

@bradfitz
Copy link
Contributor Author

The build dashboard certainly doesn't look happy to me; the latest commit on master is still from July 8th.

It's still building, but it noticed new stuff. The red "fail" went away and was replaced by blank spots it knows it needs to build.

@quentinmit
Copy link
Contributor

It's still building, but it noticed new stuff. The red "fail" went away and was replaced by blank spots it knows it needs to build.

That's true for the subrepos, but the go repo at the top does not show any blank spots.

builders

@bradfitz
Copy link
Contributor Author

Ah, I was only debugging the "net" repo. I hadn't noticed the main repo.

@aclements
Copy link
Member

The git push of the go repo is taking for-freaking-ever, but I hope when it finally finishes that the watcher will notice the commits since July 8th.

I think I may know what's going on here. If you git clone --mirror https://go.googlesource.com/go (which I believe is what the watcher does), you'll get 40,370 refs, including every version of every CL on Gerrit. I suspect the send-pack run by push --mirror is taking forever because it's processing all of these refs.

@aclements
Copy link
Member

Also, it appears these have all been pushed up to GitHub. They don't show up in the UI (presumably because they're not in refs/heads), but if you git clone --mirror https://github.com/golang/go, you'll see all 40k refs, as well.

@bradfitz
Copy link
Contributor Author

@aclements that's what I suspected at first too, but even though we do run git clone --mirror, I only see 330 refs on the git cache:

# find /var/cache/watcher-git/go/refs -type f | head -10
/var/cache/watcher-git/go/refs/heads/dev.ssa
/var/cache/watcher-git/go/refs/heads/master
/var/cache/watcher-git/go/refs/heads/release-branch.go1.7
/var/cache/watcher-git/go/refs/tags/go1.7rc1
/var/cache/watcher-git/go/refs/notes/review
/var/cache/watcher-git/go/refs/cache-automerge/b4/a7806724b78f7c26326f6acee0dcca0110b7a8
/var/cache/watcher-git/go/refs/cache-automerge/ef/efd11725885c850be617389faa4d9f56137641
/var/cache/watcher-git/go/refs/changes/85/22685/meta
/var/cache/watcher-git/go/refs/changes/05/24705/meta
/var/cache/watcher-git/go/refs/changes/05/24705/1

# find /var/cache/watcher-git/go/refs -type f | wc -l
330
# find /var/cache/watcher-git/go/refs -type f | tail -20
/var/cache/watcher-git/go/refs/changes/15/24915/meta
/var/cache/watcher-git/go/refs/changes/15/24915/1
/var/cache/watcher-git/go/refs/changes/15/24915/2
/var/cache/watcher-git/go/refs/changes/15/24715/meta
/var/cache/watcher-git/go/refs/changes/15/24715/1
/var/cache/watcher-git/go/refs/changes/15/24715/2
/var/cache/watcher-git/go/refs/changes/15/24815/meta
/var/cache/watcher-git/go/refs/changes/15/24815/1
/var/cache/watcher-git/go/refs/changes/15/24815/2
/var/cache/watcher-git/go/refs/changes/35/18835/meta
/var/cache/watcher-git/go/refs/changes/61/24861/meta
/var/cache/watcher-git/go/refs/changes/61/24861/1
/var/cache/watcher-git/go/refs/changes/61/24861/2
/var/cache/watcher-git/go/refs/changes/81/24781/meta
/var/cache/watcher-git/go/refs/changes/81/24781/1
/var/cache/watcher-git/go/refs/changes/81/24781/2
/var/cache/watcher-git/go/refs/changes/36/24836/meta
/var/cache/watcher-git/go/refs/changes/36/24836/1
/var/cache/watcher-git/go/refs/changes/56/24856/meta
/var/cache/watcher-git/go/refs/changes/56/24856/1

farmer go # pwd
/var/cache/watcher-git/go

farmer go # du -h | tail -1
306M    .

farmer go # ls
FETCH_HEAD  HEAD  branches  config  description  hooks  info  objects  packed-refs  refs

Ooooh! That's where they're hiding. In packed-refs:

# wc -l packed-refs 
40070 packed-refs

Okay, mystery solved.

We should probably pusher smarter, and prioritize non-change refs, only pushing code review refs when we're otherwise idle.

@aclements
Copy link
Member

Do we want to mirror code review refs at all? (I could see arguments either way, but I didn't even realize we did this until 10 minutes ago.)

@bradfitz
Copy link
Contributor Author

As I told @aclements on chat, the git push --mirror process seems to be totally stupid, and doing 40,000+ redundant ref pushes serially, even though almost all the 40k+ are already on the peer.

But it's possible this has already been fixed in upstream git. The go-watcher-world Docker container we're using is running git version 1.7.10.4, which is over 4 years old.

@quentinmit
Copy link
Contributor

But it's possible this has already been fixed in upstream git. The go-watcher-world Docker container we're using is running git version 1.7.10.4, which is over 4 years old.

It hasn't; I'm able to replicate the poor performance with git version 2.8.0.rc3.226.g39d4020 on my desktop.

@bradfitz
Copy link
Contributor Author

Okay, I wrote a little git syncer atop git show-ref and git ls-remote DEST and git push:

https://gist.github.com/bradfitz/b64ca2917fbc7a447141c580ec820da5

It prioritizes heads, then tags, then misc other stuff, and resumes where it left off, and runs quickly as opposed to git push --mirror. I'll drop it in the watcher later.

@josharian
Copy link
Contributor

Has anyone filed a bug upstream with git? If not, I can do so.

@bradfitz
Copy link
Contributor Author

@josharian, feel free. I was waiting to gather more information to feel confident I knew what git was doing.

@bradfitz
Copy link
Contributor Author

@quentinmit, you write:

It hasn't; I'm able to replicate the poor performance with git version 2.8.0.rc3.226.g39d4020 on my desktop.

How did you reproduce it?

After I ran my gitsync tool to a new test repo on Github, I then tried both:

$ time git push --mirror https://github.com/bradfitz/go-mirror-test.git 
Everything up-to-date

real    0m1.116s
user    0m0.128s
sys     0m0.052s

$ time git push -f --mirror https://github.com/bradfitz/go-mirror-test.git 
Everything up-to-date

real    0m1.108s
user    0m0.148s
sys     0m0.044s

And I can't reproduce it. My git version is 1.9.1 on that machine.

Are you sure they didn't fix it?

@bradfitz
Copy link
Contributor Author

Ping @quentinmit. This appears to be slow again today and I need to decide how to fix it. You say modern git is still slow with many refs, but my tests seemed fine. What did you run?

In any case, I'll start with adding more visibility into what it's doing.

@quentinmit
Copy link
Contributor

@bradfitz Here is my repro with git 2.8.0.rc3.226.g39d4020:

$ git clone https://go-review.googlesource.com/go
$ time git push --mirror -n -v http://github.com/golang/go
Pushing to http://github.com/golang/go
Username for 'https://github.com': quentinmit
Password for 'https://quentinmit@github.com':
^C

real    1m37.156s
user    0m0.013s
sys     0m0.013s  

(Obviously I killed it after 1.5 minutes; I don't know how long it would have taken.)

Note that I did not clone with --mirror, so the repo does have slightly different refs. Perhaps that's the difference.

@bradfitz
Copy link
Contributor Author

I assume there was a "cd go" in between those lines?

In any case, such a git push is 2 seconds for me, with git 1.9.1, when my current directory is a bare git checkout (from git clone --mirror).

@quentinmit
Copy link
Contributor

Yes, there was, I forgot to copy+paste it.

In fact, from a clone --mirror it takes almost no time, so that explains the difference. I think our time would be best spent upgrading git in go-watcher-world.

@bradfitz
Copy link
Contributor Author

Good to hear.

I'll update go-watcher-world after I deploy a new version with more status page visibility into what's happening. I'd like to see the problem more clearly before fixing it.

@bradfitz
Copy link
Contributor Author

I updated go-watcher-world to sid, which has git 2.8.0, and I still see it sucking with the new status pages:

watcher status for repo: "go"

2016-07-20T18:58:01Z   8m51s ago            running git push -f --mirror dest
2016-07-20T18:58:01Z   8m51s ago            added dest remote
2016-07-20T18:58:01Z   8m51s ago            adding dest remote
2016-07-20T18:57:35Z   9m16s ago            reusing git dir; running git fetch

But running my own watcher-world container on the coordinator with the /var/cache/watcher-git bind-mounted, I timed various operations:

root@77bd5d9082b9://var/cache/watcher-git/go# time (git show-ref > /dev/null)

real    0m0.118s
user    0m0.087s
sys     0m0.022s
root@77bd5d9082b9://var/cache/watcher-git/go# time (git ls-remote origin > /dev/null)

real    0m0.966s
user    0m0.205s
sys     0m0.054s

root@77bd5d9082b9://var/cache/watcher-git/go# time git fetch --dry-run origin
real    0m4.366s
user    0m3.339s
sys     0m0.439s

There's no reason it should be slow.

I'm inclined to do the syncing myself as in https://gist.github.com/bradfitz/b64ca2917fbc7a447141c580ec820da5

Which also has the advantage of prioritization and the ability to get more visibility into its state.

@bradfitz bradfitz assigned bradfitz and unassigned quentinmit Jul 20, 2016
@bradfitz bradfitz changed the title build: mirroring of net repo is broken x/build/cmd/coordinator: watcher's git mirroring is slow/flaky Jul 20, 2016
@bradfitz
Copy link
Contributor Author

After debugging with @aclements, I switched to using the ssh transport, and things now look good and very fast:

watcher status for repo: "go"
2016-07-21T00:27:38Z   1m24s ago            waiting
2016-07-21T00:27:38Z   1m24s ago            updated dashboard
2016-07-21T00:27:38Z   1m24s ago            found 13 remotes
2016-07-21T00:27:38Z   1m24s ago            found 13 remotes
2016-07-21T00:27:38Z   1m24s ago            updating dashboard
2016-07-21T00:27:38Z   1m24s ago            did sync to github
2016-07-21T00:27:38Z   1m24s ago            sync complete
2016-07-21T00:27:34Z   1m28s ago            2 refs to push; pushing batch
2016-07-21T00:27:34Z   1m28s ago            sync: got 40680 remote refs
2016-07-21T00:27:33Z   1m29s ago            sync: fetching remote refs
2016-07-21T00:27:33Z   1m29s ago            sync: got 40568 local refs
2016-07-21T00:27:33Z   1m29s ago            sync: fetching local refs
2016-07-21T00:27:33Z   1m29s ago            syncing to github
2016-07-21T00:27:33Z   1m29s ago            ran git fetch
2016-07-21T00:27:28Z   1m34s ago            running git fetch origin
2016-07-21T00:27:28Z   1m34s ago            poll timer fired
2016-07-21T00:22:28Z   6m34s ago            waiting
2016-07-21T00:22:28Z   6m34s ago            updated dashboard
2016-07-21T00:22:28Z   6m34s ago            found 13 remotes
2016-07-21T00:22:27Z   6m34s ago            found 13 remotes
2016-07-21T00:22:27Z   6m34s ago            updating dashboard
2016-07-21T00:22:27Z   6m34s ago            did sync to github
2016-07-21T00:22:27Z   6m34s ago            sync complete
2016-07-21T00:22:23Z   6m38s ago            1 refs to push; pushing batch
2016-07-21T00:22:23Z   6m38s ago            sync: got 40680 remote refs
2016-07-21T00:22:22Z   6m39s ago            sync: fetching remote refs
2016-07-21T00:22:22Z   6m39s ago            sync: got 40567 local refs
2016-07-21T00:22:22Z   6m39s ago            sync: fetching local refs
2016-07-21T00:22:22Z   6m39s ago            syncing to github
2016-07-21T00:22:22Z   6m39s ago            ran git fetch
2016-07-21T00:22:18Z   6m43s ago            running git fetch origin
2016-07-21T00:22:18Z   6m43s ago            poll timer fired
2016-07-21T00:17:18Z   11m43s ago           waiting
2016-07-21T00:17:18Z   11m43s ago           updated dashboard
2016-07-21T00:17:18Z   11m43s ago           found 13 remotes
2016-07-21T00:17:18Z   11m44s ago           found 13 remotes
2016-07-21T00:17:18Z   11m44s ago           updating dashboard
2016-07-21T00:17:18Z   11m44s ago           did sync to github
2016-07-21T00:17:18Z   11m44s ago           nothing to sync
2016-07-21T00:17:18Z   11m44s ago           sync: got 40680 remote refs
2016-07-21T00:17:17Z   11m45s ago           sync: fetching remote refs
2016-07-21T00:17:17Z   11m45s ago           sync: got 40567 local refs
2016-07-21T00:17:17Z   11m45s ago           sync: fetching local refs
2016-07-21T00:17:17Z   11m45s ago           syncing to github
2016-07-21T00:17:17Z   11m45s ago           ran git fetch
2016-07-21T00:17:13Z   11m48s ago           running git fetch origin
2016-07-21T00:17:13Z   11m48s ago           poll timer fired
2016-07-21T00:12:13Z   16m48s ago           waiting
2016-07-21T00:12:13Z   16m48s ago           updated dashboard

It seems the git-transport-https is just buggy or inefficient with 40k+ refs.

@gopherbot
Copy link

CL https://golang.org/cl/25110 mentions this issue.

@golang golang locked and limited conversation to collaborators Jul 21, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Builders x/build issues (builders, bots, dashboards) FrozenDueToAge
Projects
None yet
Development

No branches or pull requests

5 participants