Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/build/cmd/gerritbot: flag common mistakes in GitHub PRs like no word wrap, wrong bug format, using markdown, ... #61573

Closed
thepudds opened this issue Jul 25, 2023 · 13 comments
Labels
Builders x/build issues (builders, bots, dashboards) NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@thepudds
Copy link
Contributor

#61182 starts with an enumeration of friction for the GitHub PR workflow.

We could reduce some friction by automatically flagging common PR problems to save maintainer time and give faster feedback to a potential contributor.

If a maintainer sees that someone who sent a GitHub PR has responded properly in Gerrit to automated feedback, that is a signal that more detailed feedback from a human reviewer is more likely to be acted upon, which might help a maintainer triage which CLs get their attention when. (One of the worst issues listed in #61182 is that for some percent of GitHub PRs, the author never responds to feedback in Gerrit).

I am about to send a WIP CL for discussion. It started as a way to flag improper word wrapping (#24832), but once we can flag one problem, it was a small-ish step to add a loop, define a simple rule func signature, and then define a set of small rules.

Current rule names

title: no package found
title: no lowercase word after a first colon
title: no colon then single space after package
title: ends with period
body: short
body: long lines
body: might use markdown
body: no sentence candidates found
body: no bug reference candidate found
body: bug format looks incorrect
body: bug reference candidate not at end
body: still contains PR instructions
body: contains Signed-off-by

Most of those rule definitions are small -- the median rule body is 5 lines of code (an if statement + 2 lines).

We could add more rules in the future, such as flagging raw benchmark output (suggesting benchstat), asking about the seeming absence of a test, flagging an incorrectly formatted CL reference, possibly checking gofmt, and so on.

Sample results

Running those rules against a corpus of 1000 recent CLs that started as GitHub PRs gives the following stats:

   Findings per CL: 1.45 (avg.)
 CLs with findings: 76.5%

CL hit %   Finding name
--------   ------------------------------------------------
    7.6%   title: no package found
   13.6%   title: no lowercase word after a first colon
    0.0%   title: no colon then single space after package
    2.0%   title: ends with period
   21.8%   body: short
   21.7%   body: long lines
   10.5%   body: might use markdown
    6.8%   body: no sentence candidates found
   46.0%   body: no bug reference candidate found
   10.3%   body: bug format looks incorrect
    1.6%   body: bug reference candidate not at end
    2.2%   body: still contains PR instructions
    0.9%   body: contains Signed-off-by

That corpus is the 1000 most recent PRs that were ultimately merged. The results here are for the commit message in the first patch set (prior to any human feedback).

Philosophy

The rules mostly err on the side of reducing false positives, though some are unavoidable. We attempt to mitigate false positives by stating in the message we post to the CL that the findings are "potential problems" based on "simple heuristics", and we ask the contributor to reply on the CL if a finding appears wrong. Part of the intent is to engage a first time contributor and get them using Gerrit, but without annoying them too much.

We also attempt to point towards auxiliary info and advice specific to the rule violation. If a rule triggers for a potential problem related to a commit message, we also include links to the contributing guide Good commit messages section and instructions on how to edit the commit message. (A current problem is new contributors don't always know how to edit the commit message after a human asks them to change it, which leads to extra review iterations).

There is also some simple logic to be more lenient for PRs that appear to be trivial changes (e.g., don't complain about a missing bug reference if the title contains things like "typo" or "grammar"). Where we have less certainty (e.g., identifying markdown use), we attempt to express that uncertainty in the resulting message shown to the user. The rule definition also allows us to skip certain rules for some repos (e.g., don't complain in the proposal repo about a missing package).

CC @heschi, @cagedmantis

@gopherbot gopherbot added the Builders x/build issues (builders, bots, dashboards) label Jul 25, 2023
@gopherbot gopherbot added this to the Unreleased milestone Jul 25, 2023
@thepudds
Copy link
Contributor Author

There is some potential overlap with another CL I sent (CL 509135 for #61316), but to start, I suggest we discuss the two issues separately. (Later, we can resolve whether/how to combine the message from CL 509135 with this message, vs. if it is better to have separate messages, but might be good to first establish if there is any interest in proceeding with this issue).

Also, there is some overlap with what could eventually be done with Tricium, but:

  • This seemed simple enough to try as an interim step.
  • This is in Go rather than Python.
  • Heschi recently mentioned here that it might be some time before even simple rules are added to Tricium, which seemingly does not have much out of the box.
  • There might be some benefit to building experience with automatically nit picking GitHub PRs (a workflow correlated with newer contributors in greater need of help) before we attempt to nit pick more seasoned contributors for all Gerrit CLs.

@gopherbot
Copy link

Change https://go.dev/cl/513397 mentions this issue: cmd/gerritbot: automatically flag common issues in PRs

@thepudds
Copy link
Contributor Author

There is now an initial implementation in https://go.dev/cl/513397 (with a few small tweaks such that the CL hit % numbers are slightly different in the CL vs. the opening comment here).

@thepudds thepudds changed the title x/build/cmd/gerritbot: flag common issue in GitHub PRs like no word wrap, wrong bug format, using markdown, ... x/build/cmd/gerritbot: flag common mistakes in GitHub PRs like no word wrap, wrong bug format, using markdown, ... Jul 26, 2023
@dr2chase dr2chase added the NeedsFix The path to resolution is known, but the work has not been done. label Jul 31, 2023
@gopherbot
Copy link

Change https://go.dev/cl/518679 mentions this issue: cmd/gerritbot: add corpus for CL rules

@gopherbot
Copy link

Change https://go.dev/cl/519796 mentions this issue: cmd/gerritbot: generate change triple for posting comment

gopherbot pushed a commit to golang/build that referenced this issue Aug 16, 2023
Use the more modern "<project>~<changeNumber>" format for a change
and properly specify the first revision.

While we are here, we add a few comments to the gerrit package
to help avoid confusion in the future.

This is a follow-up to CL 513397, which was failing to post the
new PR rule findings for a test PR with error:

2023/08/15 22:06:38 processPullRequest: importGerritChangeFromPR(#71, nil): could not add findings comment to CL for #71: HTTP status 404 Not Found on request to https://go-review.googlesource.com/a/changes/If1a8ae9e4b76a05b13139ddf9fda1cdf67b50b33/revisions/build~master~If1a8ae9e4b76a05b13139ddf9fda1cdf67b50b33/review; Not found: build~master~If1a8ae9e4b76a05b13139ddf9fda1cdf67b50b33

Updates golang/go#61573

Change-Id: Ib0fef88ae05b10e623c2c1af614aa932dbfcc043
Reviewed-on: https://go-review.googlesource.com/c/build/+/519796
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: t hepudds <thepudds1460@gmail.com>
@gopherbot
Copy link

Change https://go.dev/cl/520116 mentions this issue: cmd/gerritbot: greet with the GitHub username for the PR rule findings comment

@thepudds
Copy link
Contributor Author

It looks like https://go.dev/cl/520268 might have been the first real GitHub PR where the new rules were triggred.

The original commit message (on patch set 1) was:

path/filepath: add Localize

Fixes #57151.

The findings posted to the CL were:

Possible problems detected:
 
  1. The commit message body is very brief. That can be OK if 
  the change is trivial like correcting spelling or fixing a broken link,
  but usually the description should provide context for the change
  and explain what it does in complete sentences.
  
  2. Do you have the right bug reference format? For this repo, the 
  format is usually 'Fixes #12345' or 'Updates #12345' (without a period)
  at the end of the commit message.

From the timestamps, it looks the the contributor fixed those two problems within ~3 minutes, and then within ~10 minutes GerritBot imported the update back into Gerrit.

The update commit message was:

path/filepath: add Localize

Implemented filepath.Localize using internal/safefilepath.FromFS.

Fixes #57151

...which addresses the two findings that were reported.

@dmitshur
Copy link
Contributor

Thanks for that analysis @thepudds.

I want to comment on the "without a period" contributing to being a possible problem. It seems better to me to consider the period or no period as fine. That is, both Fixes #57151. and Fixes #57151 should be interchangeable.

https://go.dev/wiki/CommitMessage includes:

Don't be too pedantic in code reviews: it's not worth asking people to change from Updates or something else to For, or vice versa.

I usually include periods after "For #nnn." because I consider them to be short sentences. A recent CL by Russ https://go.dev/cl/506415 is another example that included a period in the Fixes line.

gopherbot pushed a commit to golang/build that referenced this issue Aug 17, 2023
…s comment

Use the GitHub username (e.g., "thepudds") to greet the
CL author when we report possible problems, rather
than using their email.

Updates golang/go#61573

Change-Id: I3699dc4e72c6244cd5d13493e5e4bd7cb098430a
Reviewed-on: https://go-review.googlesource.com/c/build/+/520116
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Auto-Submit: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: t hepudds <thepudds1460@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
@thepudds
Copy link
Contributor Author

Hi @dmitshur, that makes sense. (That rule already allows different aliases like Fixes vs. Closes vs. Resolves, but it was not top of mind for me that the trailing period is common for some people on the core Go team such as yourself, and the examples in the Contributing Guide and wiki don't seem to use the trailing period, which is why the rule as originally defined would complain about that).

I'll circle back to this shortly to stop it from complaining about a trailing period. (I might stack it behind the corpus CL, in part to serve as an example of how the summary stats and whatnot change for the corpus after a small change).

@thepudds
Copy link
Contributor Author

Also, before I did anything here, I skimmed a dozen or so random-ish CLs to see how reviews would stall, what seemed to be taking up human time, what types of review feedback could potentially be automated, and so on, and during that, I thought I saw ~1-2 times where someone on the core Go team asked someone to remove a trailing period from a Fixes or Updates line, but I might have been mistaken. (E.g., it might be that a reviewer just supplied an example without a period for a commit message that happened to use a period but also had some other bug format problems).

In any event, to help make sure everyone is on the same page, I just tweaked the CommitMessage wiki page to say that a period is acceptable. Given this is policy-ish, please let me know if that's wrong, or correct my edit if needed.

@gopherbot
Copy link

Change https://go.dev/cl/525976 mentions this issue: cmd/gerritbot: remind again to log in to Gerrit

@gopherbot
Copy link

Change https://go.dev/cl/525975 mentions this issue: cmd/gerritbot/internal/rules: stop complaining about trailing period after bug number

gopherbot pushed a commit to golang/build that referenced this issue Sep 15, 2023
…after bug number

Some contributors (including on the core Go team) prefer to end
the bug number reference with a trailing period, like 'For #nnn.',
so adjust the GitHub PR rules to allow that without complaint.

The Contribution Guide does not use that style in its examples,
but https://go.dev/wiki/CommitMessage was updated to mention
it is allowed.

Updates golang/go#61573

Change-Id: Idac4cef24bff22241dfc786684fe536def12f24f
Reviewed-on: https://go-review.googlesource.com/c/build/+/525975
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
gopherbot pushed a commit to golang/build that referenced this issue Sep 25, 2023
CL 521157 seems to be a recent example where a GitHub PR author did not
log in to Gerrit after reading our new-ish automated GerritBot message,
even though there are some hints that they might have wanted to reply
but did not know to do so. (For example: they posted a question on
the associated GitHub PR; after I manually gave them a link to the
Gerrit login page, they then posted their first reply in Gerrit with a
detailed question ~4 minutes after creating their Gerrit account; they
previously had contributed a change that went through the entire process
without them ever replying in Gerrit).

Also, watching that interaction reminds us that automated or manual
exhortations to click 'Done' are potentially confusing for someone who
is not logged in (and hence does not see any 'Done' button at that
moment).

To better help new contributors such as this one, we make the automated
rule violation message even more explicit about logging in, including
supplying a link to the Gerrit login page again.

We also switch to say "log in" rather than "register" to better evoke
that it is not some heavyweight multi-step registration. (The
Contribution Guide currently says "register" a few times as well as
"create an account", but perhaps that could be adjusted in the future as
well to sound a bit lighter weight).

We also make the commit message editing advice more explicit, partly
based on watching a different contributor in CL 520936 update the commit
message via git in response to our prior automated advice. (In that
CL, we then manually posted a dry run of our new advice, which did get
them to edit the commit message successfully).

While we are here, we attempt to make "I spotted" more likely to be
visible when a contributor glances at a CL on their phone (by dropping
our friendly hello), and correct the bug format example used when seeing
an unknown repo. We also switch over to use go.dev/wiki instead of
github.com/golang/go/wiki.

Updates golang/go#61573

Change-Id: I5da57f093d575e260c2afdfcd8eff3df92c0a9d3
Reviewed-on: https://go-review.googlesource.com/c/build/+/525976
Auto-Submit: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Commit-Queue: t hepudds <thepudds1460@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Run-TryBot: t hepudds <thepudds1460@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
@gopherbot
Copy link

Change https://go.dev/cl/530736 mentions this issue: cmd/gerritbot: avoid changing the content of comment sync messages

gopherbot pushed a commit to golang/build that referenced this issue Sep 27, 2023
CL 525976 included an update of several messages to use
go.dev/wiki shortlinks instead of the equivalent
github.com/golang/go/wiki links, including to be more future-proof given
the wiki content is likely to move in the near future.

This included updating the URL used in the sync messages posted on the
GitHub PR to notify a contributor of a new Gerrit comment.

This in turn caused GerritBot to start re-posting what were effectively
duplicate sync comments to various open GitHub PRs. This is because
GerritBot's sync of Gerrit comments to the GitHub PR is nicely simple,
and currently relies on historical sync messages on the PR exactly
matching what the current incarnation of GerritBot would have posted for
old Gerrit comments. If the sync message content changes, GerritBot
thinks it has never posted about even old Gerrit comments because it
cannot find an exact match on the PR, and hence posts the new sync
message content about old Gerrit comments.

This CL restores the content of the sync messages, and adds a comment to
help future contributors.

The other messages modified in CL 525976 were not affected in
the same way because they are posted in reaction to an event
of creating a new GitHub PR or an update to the PR. Those other messages
have also recently been modified by other CLs without triggering
reposts.

CL 525976 was noticed as misbehaving within ~10 minutes of being
deployed. Also within ~10 minutes of being deployed, GerritBot
encountered a 403 error and stopped posting duplicates:

  403 You have exceeded a secondary rate limit and have been
  temporarily blocked from content creation

The deployment was rolled back ~10 minutes after that.

Updates golang/go#61573

Change-Id: Ibc5d299494baaa26118916799eaed023e5ccb26c
Reviewed-on: https://go-review.googlesource.com/c/build/+/530736
Reviewed-by: Heschi Kreinick <heschi@google.com>
Auto-Submit: Heschi Kreinick <heschi@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Builders x/build issues (builders, bots, dashboards) NeedsFix The path to resolution is known, but the work has not been done.
Projects
Archived in project
Development

No branches or pull requests

4 participants