Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/build/maintner: nil (*GerritCL).Commit when it shouldn’t be #21984

Closed
andybons opened this issue Sep 22, 2017 · 9 comments
Closed

x/build/maintner: nil (*GerritCL).Commit when it shouldn’t be #21984

andybons opened this issue Sep 22, 2017 · 9 comments
Labels
Builders x/build issues (builders, bots, dashboards) FrozenDueToAge
Milestone

Comments

@andybons
Copy link
Member

andybons commented Sep 22, 2017

An example GerritCL value with an empty Commit field:

{
  Project:0xc42ece68c0
  Number:65473
  Created:2017-09-22 19:08:20 +0000 +0000
  Version:0 
  Commit:<nil *GitCommit>
  Meta:{GitCommit c9daf6d2bf37214c1164b0e353b8660a39301a65} 
  MetaCommits:[{GitCommit 9dd1c4dd3e43049d51bf829da0137d3d6b2ca684} {GitCommit c9daf6d2bf37214c1164b0e353b8660a39301a65}] 
  Status:new 
  GitHubIssueRefs:[] 
  Messages:[0xc448ce8f80 0xc448ce8f40]
}

This caused panics in at least one implementation since Commit was not checked.

/cc @kevinburke @bradfitz

@gopherbot gopherbot added this to the Unreleased milestone Sep 22, 2017
@gopherbot gopherbot added the Builders x/build issues (builders, bots, dashboards) label Sep 22, 2017
@gopherbot
Copy link

Change https://golang.org/cl/65475 mentions this issue: x/build/devapp: check for nil Commit field causing panic

gopherbot pushed a commit to golang/build that referenced this issue Sep 22, 2017
Underlying cause is tracked in golang/go#21984

Change-Id: Ie413f5ae0a3e3b859a3a4ae8463148d0065a46ec
Reviewed-on: https://go-review.googlesource.com/65475
Reviewed-by: Heschi Kreinick <heschi@google.com>
Run-TryBot: Heschi Kreinick <heschi@google.com>
@dmitshur
Copy link
Contributor

This caused the goissues.org server to crash once too. I've since added a nil check with a log (in commit be9a001e), but haven't seen another occurrence yet.

It happened at 5:28:56 am EDT, after the server has been running for many hours. It constantly iterates over every CL of most Gerrit projects. Given I haven't seen another nil Commit for days since then, I suspect this happens pretty rarely and is timing-sensitive (i.e., checking the same CL after time passes, it might no longer have a nil Commit). For example, CL 65473 (from issue report) has a non-nil Commit right now.

@kevinburke
Copy link
Contributor

Is it a race issue? Could we run maintner with the race detector enabled?

@dmitshur
Copy link
Contributor

I don't have enough information to say conclusively. It may be either a data race or a non-race (but still timing-related) logic issue.

I tried running my maintner client with race detector a bit, and it didn't find any races.

@bradfitz
Copy link
Contributor

I just added a test in godata, but I can't reproduce this.

My test was:

func TestGerritMetaNonNil(t *testing.T) {
        c := getGoData(t)
        c.Gerrit().ForeachProjectUnsorted(func(gp *maintner.GerritProject) error {
                var maxCL int32
                gp.ForeachCLUnsorted(func(cl *maintner.GerritCL) error {
                        if cl.Meta == nil {
                                t.Errorf("%s: enumerated CL %d has nil Meta", gp.ServerSlashProject(), cl.Number)
                        }
                        if len(cl.Metas) == 0 {
                                t.Errorf("%s: enumerated CL %d has empty Metas", gp.ServerSlashProject(), cl.Number)
                        }
                        if cl.Commit == nil {
                                t.Errorf("%s: enumerated CL %d has nil Commit", gp.ServerSlashProject(), cl.Number)
                        }
                        if cl.Number > maxCL {
                                maxCL = cl.Number
                        }
                        return nil
                })

                // And test that CL won't yield an incomplete one either:                                                                                                                  
                for n := int32(0); n <= maxCL; n++ {
                        cl := gp.CL(n)
                        if cl == nil {
                                continue
                        }
                        if cl.Meta == nil {
                                t.Errorf("%s: CL(%d) has nil Meta", gp.ServerSlashProject(), cl.Number)
                        }
                        if len(cl.Metas) == 0 {
                                t.Errorf("%s: CL(%d) has empty Metas", gp.ServerSlashProject(), cl.Number)
                        }
                        if cl.Commit == nil {
                                t.Errorf("%s: CL(%d) has nil Commit", gp.ServerSlashProject(), cl.Number)
                        }
                }
                return nil
        })
}

It does find the two deleted CLs from #22060, but no CLs with nil Commits.

So, I'm not sure what to do for this issue.

@gopherbot
Copy link

Change https://golang.org/cl/107296 mentions this issue: maintner: change type/name of GerritCL meta fields, ensure Meta always non-nil

@dmitshur
Copy link
Contributor

dmitshur commented Apr 15, 2018

When I played with NewNetworkMutationSource in #24866, I got a better idea of how maintner works. From that, I suspect it's a rare occurrence that has to do with the order and amount in which mutations are processed. Some of the processing happens whenever MutationStreamEvent.End is true. Perhaps what happens is that in some very rare cases, the server ends the set of mutations too early (or sends them in an unexpected order) in such a way, that the post-processing that happens temporarily produces a nil Commit.

This seems to be very rare and hard to reproduce, but the issue is real and I don't have any evidence that it can't happen in the future again.

An idea for how to proceed is the following. Before making GerritCL available for use by users of maintner API, we can add some code that does a sanity check on the GerritCL. If any of its fields have unexpected/invalid values, then it can hide the CL (so the users code would never see it and panic on it), and log to stderr about an "internal error" or inconstancy with information that would be useful for debugging.

Having maintner emit some debugging information in stderr is annoying to users, but it's even more annoying to have to keep writing if cl.Commit == nil { ... } guards because there's no confidence that Commit will always be non-nil in the future.

What do you think of this as a way to move forward (to improve the API quality of the package and eventually its implementation too)?

@bradfitz
Copy link
Contributor

@shurcooL, that's what I did 4 minutes before your reply. :) See the CL above.

@dmitshur
Copy link
Contributor

dmitshur commented Apr 15, 2018

I was just typing that I noticed you did pretty much exactly that (sans debug logging) in Patch Set 3 of that CL. :)

From a quick glance, it looks fantastic. 👍 Sorry, can't review it thoroughly right now because I gotta step away from the computer, but I can look later. Edit: Left a review on CL.

@golang golang locked and limited conversation to collaborators Apr 15, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Builders x/build issues (builders, bots, dashboards) FrozenDueToAge
Projects
None yet
Development

No branches or pull requests

5 participants