x/build: alert on "bad" logs #32311

Open
bradfitz opened this issue May 29, 2019 · 3 comments
Labels
Builders x/build issues (builders, bots, dashboards) NeedsFix The path to resolution is known, but the work has not been done.

@bradfitz
Contributor

We should get alerts if we see new/many "bad" log messages from our various services.

For some definition of new, many, and bad.

Maybe bad could mean it has "error" in it. Or a dozen other phrases.

(forking from https://go-review.googlesource.com/c/build/+/179419/1/cmd/coordinator/gce.go#b193 )

/cc @bcmills @dmitshur

@gopherbot gopherbot added this to the Unreleased milestone May 29, 2019
@gopherbot gopherbot added the Builders x/build issues (builders, bots, dashboards) label May 29, 2019
@bcmills
Contributor

bcmills commented May 30, 2019

As a temporary workaround in the meantime, we could write the non-critical services (build.golang.org and dev.golang.org, not golang.org in general) in “crash-only” style, and just make sure that we'll notice if any given service is down.

@bcmills bcmills added the NeedsFix The path to resolution is known, but the work has not been done. label May 30, 2019
@bradfitz
Contributor Author

For non-critical things that'll likely recover on their own, I can just add items (perhaps at WARN level where appropriate) at https://farmer.golang.org/#health. Each of those can easily be hooked up to monitoring too.

I'd prefer not to crash if a non-critical service we depend on is having temporary issues. We have a lot of them.

@dmitshur
Contributor

build.golang.org and dev.golang.org are not non-critical services. If they're down, trybots and builders don't run, gopherbot won't assign reviewers to CLs, etc. People rely on those things working, and so I don't think it's a good idea to try to solve this issue at the cost of reducing Go contributor productivity. We should find a non-disruptive way to find "bad" entries in logs.
