x/build/cmd/coordinator: trybot (slowbot) was in a "running" state for 2+ hours #35700
Labels: Builders, FrozenDueToAge, NeedsInvestigation
While deploying a new version of coordinator with @cagedmantis, I noticed that one of the active trybot runs was a slowbot run for patch set 2 of CL 207858. What stood out was that the aix-ppc64 builder had been running for an excessively long time and timing out. Its temporary log was:
Relevant details for the host-aix-ppc64-osuosl builder at the time were as follows.
Scheduler state
Buildlet pools
Reverse pool by host type (in use / total)
Reverse pool machine detail
power8-aix-host1 (140.211.9.26:37646) version 23, host-aix-ppc64-osuosl: connected 3h44m22.7s, working for 3h44m22.7s
Active builds
Filing this issue so we can discuss and investigate if needed after coordinator is deployed. /cc @bradfitz @cagedmantis @toothrot
At first glance, I thought the problem was that a trybot didn't time out, retry, or fail after 2 hours, but given that it was in the "waiting_for_machine" state, maybe waiting indefinitely is the right thing?