Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/build/cmd/golangbuild: propagate "infra failure" status from lower-level steps upwards #65557

Open
dmitshur opened this issue Feb 6, 2024 · 2 comments
Labels
Builders x/build issues (builders, bots, dashboards) NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@dmitshur
Copy link
Contributor

dmitshur commented Feb 6, 2024

https://ci.chromium.org/b/8756827332643819201 is an example of a build that failed due to an infrastructure problem (an internal error, not a valid finding in the code being tested).

It correctly reported the failure as "infra" in the "1. fetch mod" inner step (displayed in purple, rather than red), but that infra status wasn't propagated to the parent step "3. run tests" and the top-level build status.

Though internal errors are rare and we work towards minimizing their frequency, it would still be an improvement to propagate the status upwards whenever it happens, so it's clear the build failed due to an internal error without having to look at its individual steps. This is the tracking issue for that.

@dmitshur dmitshur added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 6, 2024
@dmitshur dmitshur added this to the Unreleased milestone Feb 6, 2024
@gopherbot gopherbot added the Builders x/build issues (builders, bots, dashboards) label Feb 6, 2024
@dmitshur
Copy link
Contributor Author

dmitshur commented Feb 6, 2024

I looked a bit so far, posting what I saw.

The current logic in endInfraStep adds the bbpb.Status_INFRA_FAILURE annotation to the error to the call to step.End, but it doesn't modify the *errp itself, which is what fetchRepo returns to its caller. Perhaps if it were to do that, this would be enough to resolve this.

Or maybe there's more to do, or a different approach is better. It's worth reading the go.chromium.org/luci/luciexe/build docs to see if this situation is covered there.

@dmitshur
Copy link
Contributor Author

Noting that this may help avoiding some builders from being quarantined for reasons like "Had 20 consecutive COMPLETED_FAILURE tasks". Though I'm not sure the exact criteria for that, so many consecutive infra failures may also have the same effect.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Builders x/build issues (builders, bots, dashboards) NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

2 participants