|
|
Created:
13 years, 3 months ago by adg Modified:
13 years, 3 months ago Reviewers:
CC:
rsc, r, dsymonds, bradfitz, golang-dev Visibility:
Public. |
Descriptiondashboard: new Go dashboard data structure design
Patch Set 1 #
Total comments: 12
Patch Set 2 : diff -r fd80a4497037 https://go.googlecode.com/hg/ #Patch Set 3 : diff -r fd80a4497037 https://go.googlecode.com/hg/ #
Total comments: 28
Patch Set 4 : diff -r 9840a7bf8aba https://go.googlecode.com/hg/ #Patch Set 5 : diff -r 9840a7bf8aba https://go.googlecode.com/hg/ #
Total comments: 2
Patch Set 6 : diff -r 9840a7bf8aba https://go.googlecode.com/hg/ #
Total comments: 12
Patch Set 7 : diff -r 9840a7bf8aba https://go.googlecode.com/hg/ #
Total comments: 1
Patch Set 8 : diff -r 9840a7bf8aba https://go.googlecode.com/hg/ #
Total comments: 1
Patch Set 9 : diff -r 01a64d431f4a https://go.googlecode.com/hg/ #MessagesTotal messages: 24
Hello rsc, r (cc: golang-dev@googlegroups.com), I'd like you to review this change to https://go.googlecode.com/hg/
Sign in to reply to this message.
http://codereview.appspot.com/5416056/diff/1/misc/dashboard/ndashboard/build/... File misc/dashboard/ndashboard/build/build.go (right): http://codereview.appspot.com/5416056/diff/1/misc/dashboard/ndashboard/build/... misc/dashboard/ndashboard/build/build.go:14: Num int // Internal monotonic counter unique to this repo. This assumes a linear history. Many people use DVCSes in a general branch mode, which this isn't going to be able to model. http://codereview.appspot.com/5416056/diff/1/misc/dashboard/ndashboard/build/... misc/dashboard/ndashboard/build/build.go:16: ParentHash string This can't model merge commits. Does that matter? http://codereview.appspot.com/5416056/diff/1/misc/dashboard/ndashboard/build/... misc/dashboard/ndashboard/build/build.go:19: User string `datastore:",noindex"` Are you sure you don't want User and Time indexed? I could think of useful purposes for them to be indexed. http://codereview.appspot.com/5416056/diff/1/misc/dashboard/ndashboard/build/... misc/dashboard/ndashboard/build/build.go:34: RepoURL string // (empty for Go commits) I'm a bit wary of special-casing the Go repo in these places. Is it for saving space? Or just for identifying the Go repo easily? http://codereview.appspot.com/5416056/diff/1/misc/dashboard/ndashboard/build/... misc/dashboard/ndashboard/build/build.go:41: LogHash string `datastore:",noindex"` what's this for? http://codereview.appspot.com/5416056/diff/1/misc/dashboard/ndashboard/build/... misc/dashboard/ndashboard/build/build.go:49: CompressedLog []byte comment?
Sign in to reply to this message.
Hello rsc@golang.org, r@golang.org, dsymonds@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
http://codereview.appspot.com/5416056/diff/1/misc/dashboard/ndashboard/build/... File misc/dashboard/ndashboard/build/build.go (right): http://codereview.appspot.com/5416056/diff/1/misc/dashboard/ndashboard/build/... misc/dashboard/ndashboard/build/build.go:14: Num int // Internal monotonic counter unique to this repo. On 2011/11/21 03:34:47, dsymonds wrote: > This assumes a linear history. Many people use DVCSes in a general branch mode, > which this isn't going to be able to model. It doesn't matter. This is just to give the commits a consistent order on the status page. http://codereview.appspot.com/5416056/diff/1/misc/dashboard/ndashboard/build/... misc/dashboard/ndashboard/build/build.go:16: ParentHash string On 2011/11/21 03:34:47, dsymonds wrote: > This can't model merge commits. Does that matter? Nope. Again, it's just an ordering thing. http://codereview.appspot.com/5416056/diff/1/misc/dashboard/ndashboard/build/... misc/dashboard/ndashboard/build/build.go:19: User string `datastore:",noindex"` On 2011/11/21 03:34:47, dsymonds wrote: > Are you sure you don't want User and Time indexed? I could think of useful > purposes for them to be indexed. You're right. I've removed the noindex flag. http://codereview.appspot.com/5416056/diff/1/misc/dashboard/ndashboard/build/... misc/dashboard/ndashboard/build/build.go:34: RepoURL string // (empty for Go commits) On 2011/11/21 03:34:47, dsymonds wrote: > I'm a bit wary of special-casing the Go repo in these places. Is it for saving > space? Or just for identifying the Go repo easily? The entire system revolves around their being a central Go repository against which all the other repos are built. Consider this a "goinstallable import path". In fact, it should be RepoPath not RepoURL. (changed) http://codereview.appspot.com/5416056/diff/1/misc/dashboard/ndashboard/build/... misc/dashboard/ndashboard/build/build.go:41: LogHash string `datastore:",noindex"` On 2011/11/21 03:34:47, dsymonds wrote: > what's this for? It's the key to the Log record. http://codereview.appspot.com/5416056/diff/1/misc/dashboard/ndashboard/build/... misc/dashboard/ndashboard/build/build.go:49: CompressedLog []byte On 2011/11/21 03:34:47, dsymonds wrote: > comment? Done.
Sign in to reply to this message.
I don't see anything about entity groups here, and I have found that using entity groups is the only way to get decent performance out of the App Engine datastore. What's the entity group story? Is everything sitting in a single entity group? Is everything in its own entity group (the default, but terrible performance)? Russ
Sign in to reply to this message.
Seems like a good start. Maybe each different RepoPath should be its own entity group? http://codereview.appspot.com/5416056/diff/1007/misc/dashboard/ndashboard/bui... File misc/dashboard/ndashboard/build/build.go (right): http://codereview.appspot.com/5416056/diff/1007/misc/dashboard/ndashboard/bui... misc/dashboard/ndashboard/build/build.go:12: // Commit describes an individual commit in a repository. s/Commit/A Commit/ http://codereview.appspot.com/5416056/diff/1007/misc/dashboard/ndashboard/bui... misc/dashboard/ndashboard/build/build.go:17: RepoPath string // (empty for Go commits) Move to top, since Num is scoped inside RepoPath. http://codereview.appspot.com/5416056/diff/1007/misc/dashboard/ndashboard/bui... misc/dashboard/ndashboard/build/build.go:30: // Result describes a build result for a Commit on an OS/architecture. s/Result/A Result/ http://codereview.appspot.com/5416056/diff/1007/misc/dashboard/ndashboard/bui... misc/dashboard/ndashboard/build/build.go:32: OSArch string // "arch-os(|-foo)" Builder string http://codereview.appspot.com/5416056/diff/1007/misc/dashboard/ndashboard/bui... misc/dashboard/ndashboard/build/build.go:45: return fmt.Sprintf("%v,%v,%v,%v", r.OSArch, r.OK, r.LogHash, r.GoHash) Let's use something less common than comma: how about |? It is possible that the Builder name would include a comma, since the GOEXPERIMENT is a comma-separated list. http://codereview.appspot.com/5416056/diff/1007/misc/dashboard/ndashboard/bui... misc/dashboard/ndashboard/build/build.go:48: // Log is a bzip2-compressed log file keyed by the SHA1 hash of the This is going to be hard since we don't have a bzip2 compressor. Change to gzip-compressed? s/keyed by/stored under/ http://codereview.appspot.com/5416056/diff/1007/misc/dashboard/ndashboard/bui... misc/dashboard/ndashboard/build/build.go:48: // Log is a bzip2-compressed log file keyed by the SHA1 hash of the s/Log/A Log/ http://codereview.appspot.com/5416056/diff/1007/misc/dashboard/ndashboard/bui... misc/dashboard/ndashboard/build/build.go:54: // Tags are datastore entities used to keep track of the most recent a/Tags are/A Tag is/ s/entities/entity/ In general, if you can use a singular, do so, since it forces you to be more precise. http://codereview.appspot.com/5416056/diff/1007/misc/dashboard/ndashboard/bui... misc/dashboard/ndashboard/build/build.go:56: // kind of tag. A Tag is stored using its Kind as its datastore Key. s/A/Each/ http://codereview.appspot.com/5416056/diff/1007/misc/dashboard/ndashboard/bui... misc/dashboard/ndashboard/build/build.go:63: // commitHandler records new commits. It reads a JSON-encoded Commit value s/new commits/a new commit/ http://codereview.appspot.com/5416056/diff/1007/misc/dashboard/ndashboard/bui... misc/dashboard/ndashboard/build/build.go:70: // tagHandler records new tags. It reads a JSON-encoded Tag value from th s/new tags/a new tag/ http://codereview.appspot.com/5416056/diff/1007/misc/dashboard/ndashboard/bui... misc/dashboard/ndashboard/build/build.go:77: // to be built. It expects an osArch query parameter. ,s/osArch/builder/g http://codereview.appspot.com/5416056/diff/1007/misc/dashboard/ndashboard/bui... misc/dashboard/ndashboard/build/build.go:88: // resultHandler records build results. s/build results/a build result/ http://codereview.appspot.com/5416056/diff/1007/misc/dashboard/ndashboard/bui... misc/dashboard/ndashboard/build/build.go:100: http.HandleFunc("/commit", AuthHandler(commitHandler)) sort block
Sign in to reply to this message.
On Tue, Nov 22, 2011 at 2:20 AM, Russ Cox <rsc@golang.org> wrote: > I don't see anything about entity groups here, and I have found that > using entity groups is the only way to get decent performance out > of the App Engine datastore. What's the entity group story? > Is everything sitting in a single entity group? Is everything in its > own entity group (the default, but terrible performance)? You should read a bit of http://code.google.com/appengine/docs/python/datastore/entities.html#Entity_G... because you've got the effect in reverse. A single entity group has a limited write throughput (I think the guarantee is only about 1 write per second). If every entity is in its own entity group, then each individual entity has a separate 1 w/s rate; the aggregate write rate can be much faster. If you need transactional behaviour, you need to put entities into entity groups (although this has weakened somewhat with XG transactions), but I don't think there's need for cross-entity transactions for the dashboard. Dave.
Sign in to reply to this message.
On Mon, Nov 21, 2011 at 16:50, David Symonds <dsymonds@golang.org> wrote: > I don't think there's need for cross-entity > transactions for the dashboard. At the very least you need transactions to assign the sequence numbers for each RepoPath. It may be true that if you maximally parallelize your entity group writes, you don't need to worry about the groups. But it seems like if you do use the groups, you get tremendously faster write throughput. That's not from docs, just my experience trying to make App Engine not crawl. Russ
Sign in to reply to this message.
I looked into my entity group program. I found that the only way to make batching help with speed was to store related items in entity groups so that they could all be put into a transaction to amortize the commit costs. That is consistent with what you claimed, but the conclusion is different: you should use entity groups as much as possible, because if you don't you can never batch updates for performance. Is there a downside to storing all your data in one entity group (or one per RepoPath, in this case)? Does it limit the amount of data that can be stored? It also seemed to me that queries might be faster over data stored in a single entity group, but I don't have numbers to back that up. Russ
Sign in to reply to this message.
On Mon, Nov 21, 2011 at 5:03 PM, Russ Cox <rsc@golang.org> wrote: > I looked into my entity group program. > > I found that the only way to make batching help with speed > was to store related items in entity groups so that they could > all be put into a transaction to amortize the commit costs. > That is consistent with what you claimed, but the conclusion > is different: you should use entity groups as much as possible, > because if you don't you can never batch updates for performance. > That was Python, though? You weren't using the new "XG" (cross-entity-group) transactions, right? XG is supported with HRD, which is the only datastore mode Go supports. So then you can put everything in its own entity group and batch updates in an XG transaction. That's at least what I do for simplicity, and I think it's fine... there's a paxo group between datacenters for just the entity groups involved in that transaction, I believe.
Sign in to reply to this message.
On Tue, Nov 22, 2011 at 9:03 AM, Russ Cox <rsc@golang.org> wrote: > I looked into my entity group program. > > I found that the only way to make batching help with speed > was to store related items in entity groups so that they could > all be put into a transaction to amortize the commit costs. > That is consistent with what you claimed, but the conclusion > is different: you should use entity groups as much as possible, > because if you don't you can never batch updates for performance. I'd expect you'd get the same performance if you used the bulk operations (e.g. put_multi). Separate entity groups will be committed by separate machines. > Is there a downside to storing all your data in one entity group > (or one per RepoPath, in this case)? Does it limit the amount > of data that can be stored? There's no size limit; you do have a write rate limit, though. > It also seemed to me that queries might be faster over data > stored in a single entity group, but I don't have numbers to > back that up. Ancestor queries can be quicker, yes, but I'm not sure it would be significant here. Dave.
Sign in to reply to this message.
http://codereview.appspot.com/5416056/diff/1007/misc/dashboard/ndashboard/bui... File misc/dashboard/ndashboard/build/build.go (right): http://codereview.appspot.com/5416056/diff/1007/misc/dashboard/ndashboard/bui... misc/dashboard/ndashboard/build/build.go:12: // Commit describes an individual commit in a repository. On 2011/11/21 15:30:44, rsc wrote: > s/Commit/A Commit/ Done. http://codereview.appspot.com/5416056/diff/1007/misc/dashboard/ndashboard/bui... misc/dashboard/ndashboard/build/build.go:17: RepoPath string // (empty for Go commits) On 2011/11/21 15:30:44, rsc wrote: > Move to top, since Num is scoped inside RepoPath. Done. http://codereview.appspot.com/5416056/diff/1007/misc/dashboard/ndashboard/bui... misc/dashboard/ndashboard/build/build.go:30: // Result describes a build result for a Commit on an OS/architecture. On 2011/11/21 15:30:44, rsc wrote: > s/Result/A Result/ Done. http://codereview.appspot.com/5416056/diff/1007/misc/dashboard/ndashboard/bui... misc/dashboard/ndashboard/build/build.go:32: OSArch string // "arch-os(|-foo)" On 2011/11/21 15:30:44, rsc wrote: > Builder string Done. http://codereview.appspot.com/5416056/diff/1007/misc/dashboard/ndashboard/bui... misc/dashboard/ndashboard/build/build.go:45: return fmt.Sprintf("%v,%v,%v,%v", r.OSArch, r.OK, r.LogHash, r.GoHash) On 2011/11/21 15:30:44, rsc wrote: > Let's use something less common than comma: how about |? > It is possible that the Builder name would include a comma, > since the GOEXPERIMENT is a comma-separated list. Done. http://codereview.appspot.com/5416056/diff/1007/misc/dashboard/ndashboard/bui... misc/dashboard/ndashboard/build/build.go:48: // Log is a bzip2-compressed log file keyed by the SHA1 hash of the On 2011/11/21 15:30:44, rsc wrote: > This is going to be hard since we don't have a bzip2 compressor. > Change to gzip-compressed? > > s/keyed by/stored under/ > Done. http://codereview.appspot.com/5416056/diff/1007/misc/dashboard/ndashboard/bui... misc/dashboard/ndashboard/build/build.go:48: // Log is a bzip2-compressed log file keyed by the SHA1 hash of the On 2011/11/21 15:30:44, rsc wrote: > s/Log/A Log/ Done. http://codereview.appspot.com/5416056/diff/1007/misc/dashboard/ndashboard/bui... misc/dashboard/ndashboard/build/build.go:54: // Tags are datastore entities used to keep track of the most recent On 2011/11/21 15:30:44, rsc wrote: > a/Tags are/A Tag is/ > s/entities/entity/ > > In general, if you can use a singular, do so, since it forces you to be more > precise. > Done. http://codereview.appspot.com/5416056/diff/1007/misc/dashboard/ndashboard/bui... misc/dashboard/ndashboard/build/build.go:56: // kind of tag. A Tag is stored using its Kind as its datastore Key. On 2011/11/21 15:30:44, rsc wrote: > s/A/Each/ Done. http://codereview.appspot.com/5416056/diff/1007/misc/dashboard/ndashboard/bui... misc/dashboard/ndashboard/build/build.go:63: // commitHandler records new commits. It reads a JSON-encoded Commit value On 2011/11/21 15:30:44, rsc wrote: > s/new commits/a new commit/ Done. http://codereview.appspot.com/5416056/diff/1007/misc/dashboard/ndashboard/bui... misc/dashboard/ndashboard/build/build.go:70: // tagHandler records new tags. It reads a JSON-encoded Tag value from th On 2011/11/21 15:30:44, rsc wrote: > s/new tags/a new tag/ Done. http://codereview.appspot.com/5416056/diff/1007/misc/dashboard/ndashboard/bui... misc/dashboard/ndashboard/build/build.go:77: // to be built. It expects an osArch query parameter. On 2011/11/21 15:30:44, rsc wrote: > ,s/osArch/builder/g Done. http://codereview.appspot.com/5416056/diff/1007/misc/dashboard/ndashboard/bui... misc/dashboard/ndashboard/build/build.go:88: // resultHandler records build results. On 2011/11/21 15:30:44, rsc wrote: > s/build results/a build result/ Done. http://codereview.appspot.com/5416056/diff/1007/misc/dashboard/ndashboard/bui... misc/dashboard/ndashboard/build/build.go:100: http.HandleFunc("/commit", AuthHandler(commitHandler)) On 2011/11/21 15:30:44, rsc wrote: > sort block Done.
Sign in to reply to this message.
I added some words about entity groups. In short, all Commits and Results for a particular RepoPath will be in their own entity group. Andrew
Sign in to reply to this message.
Hello rsc@golang.org, r@golang.org, dsymonds@golang.org, bradfitz@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
http://codereview.appspot.com/5416056/diff/16001/misc/dashboard/ndashboard/bu... File misc/dashboard/ndashboard/build/build.go (right): http://codereview.appspot.com/5416056/diff/16001/misc/dashboard/ndashboard/bu... misc/dashboard/ndashboard/build/build.go:13: type Repo struct { Is this really a repository, or is it a package? For external things, the dashboard works on packages, not repositories.
Sign in to reply to this message.
http://codereview.appspot.com/5416056/diff/16001/misc/dashboard/ndashboard/bu... File misc/dashboard/ndashboard/build/build.go (right): http://codereview.appspot.com/5416056/diff/16001/misc/dashboard/ndashboard/bu... misc/dashboard/ndashboard/build/build.go:13: type Repo struct { On 2011/11/21 23:16:53, rsc wrote: > Is this really a repository, or is it a package? > For external things, the dashboard works on packages, > not repositories. You're right, it's a package. And with naming it makes sense for the Path field to be empty for the Go tree.
Sign in to reply to this message.
Hello rsc@golang.org, r@golang.org, dsymonds@golang.org, bradfitz@golang.org (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
http://codereview.appspot.com/5416056/diff/18004/misc/dashboard/ndashboard/bu... File misc/dashboard/ndashboard/build/build.go (right): http://codereview.appspot.com/5416056/diff/18004/misc/dashboard/ndashboard/bu... misc/dashboard/ndashboard/build/build.go:12: // A Package describes a repository that is listed on the dashboard. s/repository/package/, right? http://codereview.appspot.com/5416056/diff/18004/misc/dashboard/ndashboard/bu... misc/dashboard/ndashboard/build/build.go:13: type Package struct { If every Commit for a given package path needs to be parented by a consistent Package, you should declare that each package entity has a key name equal to its path or something like that. That will enforce uniqueness, and make it much faster to find particular packages. http://codereview.appspot.com/5416056/diff/18004/misc/dashboard/ndashboard/bu... misc/dashboard/ndashboard/build/build.go:18: // A Commit describes an individual commit in a repository. s/repository/package/ here too. http://codereview.appspot.com/5416056/diff/18004/misc/dashboard/ndashboard/bu... misc/dashboard/ndashboard/build/build.go:23: type Commit struct { again here, using a key name scheme like "<PackagePath>:<Num>" would help enforce uniqueness if that's important. http://codereview.appspot.com/5416056/diff/18004/misc/dashboard/ndashboard/bu... misc/dashboard/ndashboard/build/build.go:56: func (r *Result) Data() string { why not String()? http://codereview.appspot.com/5416056/diff/18004/misc/dashboard/ndashboard/bu... misc/dashboard/ndashboard/build/build.go:69: type Tag struct { A key name here too might also be useful.
Sign in to reply to this message.
http://codereview.appspot.com/5416056/diff/18004/misc/dashboard/ndashboard/bu... File misc/dashboard/ndashboard/build/build.go (right): http://codereview.appspot.com/5416056/diff/18004/misc/dashboard/ndashboard/bu... misc/dashboard/ndashboard/build/build.go:12: // A Package describes a repository that is listed on the dashboard. On 2011/11/21 23:59:18, dsymonds wrote: > s/repository/package/, right? Done. http://codereview.appspot.com/5416056/diff/18004/misc/dashboard/ndashboard/bu... misc/dashboard/ndashboard/build/build.go:13: type Package struct { On 2011/11/21 23:59:18, dsymonds wrote: > If every Commit for a given package path needs to be parented by a consistent > Package, you should declare that each package entity has a key name equal to its > path or something like that. That will enforce uniqueness, and make it much > faster to find particular packages. Done. http://codereview.appspot.com/5416056/diff/18004/misc/dashboard/ndashboard/bu... misc/dashboard/ndashboard/build/build.go:18: // A Commit describes an individual commit in a repository. On 2011/11/21 23:59:18, dsymonds wrote: > s/repository/package/ here too. Done. http://codereview.appspot.com/5416056/diff/18004/misc/dashboard/ndashboard/bu... misc/dashboard/ndashboard/build/build.go:23: type Commit struct { On 2011/11/21 23:59:18, dsymonds wrote: > again here, using a key name scheme like "<PackagePath>:<Num>" would help > enforce uniqueness if that's important. Good point. I think "Keyname:Hash" is more appropriate. http://codereview.appspot.com/5416056/diff/18004/misc/dashboard/ndashboard/bu... misc/dashboard/ndashboard/build/build.go:56: func (r *Result) Data() string { On 2011/11/21 23:59:18, dsymonds wrote: > why not String()? Because it doesn't describe the Result. (It doesn't include Hash or PackagePath, for example) http://codereview.appspot.com/5416056/diff/18004/misc/dashboard/ndashboard/bu... misc/dashboard/ndashboard/build/build.go:69: type Tag struct { On 2011/11/21 23:59:18, dsymonds wrote: > A key name here too might also be useful. I created Key methods for the respective types.
Sign in to reply to this message.
LGTM http://codereview.appspot.com/5416056/diff/15002/misc/dashboard/ndashboard/bu... File misc/dashboard/ndashboard/build/build.go (right): http://codereview.appspot.com/5416056/diff/15002/misc/dashboard/ndashboard/bu... misc/dashboard/ndashboard/build/build.go:50: key := fmt.Sprintf("%s:%s", com.PackagePath, com.Hash) key := com.PackagePath + ":" + com.Hash
Sign in to reply to this message.
On 22 November 2011 15:19, <dsymonds@golang.org> wrote: > http://codereview.appspot.com/5416056/diff/15002/misc/dashboard/ndashboard/bu... > File misc/dashboard/ndashboard/build/build.go (right): > > http://codereview.appspot.com/5416056/diff/15002/misc/dashboard/ndashboard/bu... > misc/dashboard/ndashboard/build/build.go:50: key := fmt.Sprintf("%s:%s", > com.PackagePath, com.Hash) > key := com.PackagePath + ":" + com.Hash Done.
Sign in to reply to this message.
LGTM Maybe the directory should just be 'app'. It is, after all, the App Engine app for misc/dashboard, and then you won't need to rename it later. http://codereview.appspot.com/5416056/diff/16003/misc/dashboard/ndashboard/bu... File misc/dashboard/ndashboard/build/build.go (right): http://codereview.appspot.com/5416056/diff/16003/misc/dashboard/ndashboard/bu... misc/dashboard/ndashboard/build/build.go:58: Builder string // "arch-os(|-foo)" os-arch[-note]
Sign in to reply to this message.
*** Submitted as http://code.google.com/p/go/source/detail?r=6cff18b6d608 *** dashboard: new Go dashboard data structure design R=rsc, r, dsymonds, bradfitz CC=golang-dev http://codereview.appspot.com/5416056
Sign in to reply to this message.
|