x/build/cmd/gomote: instances are timing out after 2 hours #54696

Closed
cagedmantis opened this issue Aug 26, 2022 · 2 comments
Labels
  • Builders: x/build issues (builders, bots, dashboards)
  • FrozenDueToAge
  • NeedsFix: The path to resolution is known, but the work has not been done.

Comments

@cagedmantis
Contributor

There have been reports of gomote instances timing out or becoming unresponsive some time after two hours of use.

  • linux-amd64-alpine instances time out after 2 hours.
  • Other instances (windows-amd64) have become unresponsive some time after 2 hours of use.

During one occurrence, we collected the following logs:

2022-08-23 13:47:59.025 EDT2022/08/23 17:47:59 created buildlet userx-windows-amd64-2016-0 for userx (http://10.128.0.20 GCE VM: buildlet-windows-amd64-2016-rna896368)
2022-08-23 15:49:13.486 EDT2022/08/23 19:49:13 deleting VM "buildlet-windows-amd64-2016-rna896368" in zone "us-central1-c"; delete-at expiration ...
2022-08-23 15:49:13.906 EDT2022/08/23 19:49:13 Sent request to delete instance "buildlet-windows-amd64-2016-rna896368" in zone "us-central1-c". Operation ID, Name: 219625075713282518, operation-1661284153488-5e6eddbd6f518-f4db622d-0bbce859
2022-08-23 15:50:21.040 EDT2022/08/23 19:50:21 Buildlet http://10.128.0.20 GCE VM: buildlet-windows-amd64-2016-rna896368 failed three heartbeats; final error: timeout waiting for headers
2022-08-23 16:51:00.891 EDT2022/08/23 20:51:00 10.128.0.20:80: peer dead with Buildlet http://10.128.0.20 GCE VM: buildlet-windows-amd64-2016-rna896368 failed heartbeat after 20.000336477s; marking dead; err=timeout waiting for headers, waiting for headers for /exec
2022-08-23 16:53:51.295 EDT2022/08/23 20:53:51 10.128.0.20:80: peer dead with Buildlet http://10.128.0.20 GCE VM: buildlet-windows-amd64-2016-rna896368 failed heartbeat after 20.000336477s; marking dead; err=timeout waiting for headers,waiting for headers for /exec
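
As a quick check on the "2 hours" figure, the gap between the "created buildlet" line and the "deleting VM" line in the logs above can be computed from their UTC timestamps. This is only a minimal sketch: the helper name `uptime` is made up for illustration, and the layout string assumes the `YYYY/MM/DD hh:mm:ss` form seen in those log lines.

```go
package main

import (
	"fmt"
	"time"
)

// uptime returns the time elapsed between two log timestamps written in the
// "2006/01/02 15:04:05" layout that the log lines above appear to use.
// The function name is hypothetical; it is not part of x/build.
func uptime(created, deleted string) (time.Duration, error) {
	const layout = "2006/01/02 15:04:05"
	c, err := time.Parse(layout, created)
	if err != nil {
		return 0, err
	}
	d, err := time.Parse(layout, deleted)
	if err != nil {
		return 0, err
	}
	return d.Sub(c), nil
}

func main() {
	// Timestamps copied from the "created buildlet" and "deleting VM" lines.
	d, err := uptime("2022/08/23 17:47:59", "2022/08/23 19:49:13")
	if err != nil {
		panic(err)
	}
	fmt.Println(d) // 2h1m14s
}
```

The VM was therefore deleted roughly two hours and one minute after creation, which matches the "delete-at expiration" reason given in the log.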

@golang/release @prattmic @thanm

cagedmantis added the Builders and NeedsFix labels Aug 26, 2022
cagedmantis added this to the Backlog milestone Aug 26, 2022
@gopherbot

Change https://go.dev/cl/426015 mentions this issue: cmd/coordinator: check the session pool for buildlets

gopherbot pushed a commit to golang/build that referenced this issue Aug 29, 2022
Remote buildlets should not be destroyed by the pool implementations for
instances which have been created on clouds. This change adds a check
to ensure that the pool implementations can check for remote buildlets
for instances which are managed by the session pools. Those sessions will
be destroyed by the session pool.

The buildlet client accessors for instance names have been changed so
that they are no longer GCE specific.

The EC2 session pool implementation has been changed so that the
instance name is set in the buildlet client when an EC2 instance is
created.

Updates golang/go#54735
Updates golang/go#54696

Change-Id: Iae3c86cc47b4b79e3287d581dc042dcf2714f7c9
Reviewed-on: https://go-review.googlesource.com/c/build/+/426015
Reviewed-by: Carlos Amedee <carlos@golang.org>
Auto-Submit: Carlos Amedee <carlos@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Carlos Amedee <carlos@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
heschi added this to In Progress in Go Release Team Aug 30, 2022
dmitshur modified the milestones: Backlog, Unreleased Sep 26, 2022
@dmitshur
Contributor

We believe this is fixed.

Go Release Team automation moved this from In Progress to Done Sep 27, 2022
@golang golang locked and limited conversation to collaborators Sep 27, 2023