x/build/cmd/coordinator: hang if writing source tarball fails midway #15582

Closed
bradfitz opened this issue May 6, 2016 · 2 comments
Labels: Builders (x/build issues: builders, bots, dashboards), FrozenDueToAge
Milestone: Unreleased

Comments

@bradfitz
Contributor

bradfitz commented May 6, 2016

I noticed an OpenBSD build fail and then hang:

  builder: openbsd-amd64-gce58
      rev: 26124695adcf9440d38dbd2f117221019243d4fa
 buildlet: http://10.240.0.25 GCE VM: buildlet-openbsd-amd64-gce58-rn1ced503
  started: 2016-05-06 18:53:34.631002774 +0000 UTC
   status: still running

Events:
  2016-05-06T18:53:34Z get_buildlet 
  2016-05-06T18:53:34Z awaiting_gce_quota 
  2016-05-06T18:53:34Z finish_awaiting_gce_quota after 3.07µs
  2016-05-06T18:53:34Z create_gce_buildlet buildlet-openbsd-amd64-gce58-rn1ced503
  2016-05-06T18:53:34Z create_gce_instance buildlet-openbsd-amd64-gce58-rn1ced503
  2016-05-06T18:54:02Z finish_create_gce_instance after 27.855829702s; buildlet-openbsd-amd64-gce58-rn1ced503
  2016-05-06T18:54:02Z wait_buildlet_start buildlet-openbsd-amd64-gce58-rn1ced503
  2016-05-06T18:54:04Z got_instance_info waiting_for_buildlet...
  2016-05-06T18:54:04Z finish_wait_buildlet_start after 1.617098383s; buildlet-openbsd-amd64-gce58-rn1ced503
  2016-05-06T18:54:04Z finish_create_gce_buildlet after 29.473051761s; buildlet-openbsd-amd64-gce58-rn1ced503
  2016-05-06T18:54:04Z finish_get_buildlet after 29.473098819s
  2016-05-06T18:54:04Z using_buildlet 10.240.0.25:80
  2016-05-06T18:54:04Z write_go_bootstrap_tar 
  2016-05-06T18:54:04Z write_version_tar 
  2016-05-06T18:54:04Z get_source 
  2016-05-06T18:54:04Z finish_get_source after 31.995µs
  2016-05-06T18:54:04Z write_go_src_tar 
  2016-05-06T19:03:50Z finish_write_go_src_tar after 9m46.615158891s; err=writing tarball from Gerrit: Put http://10.240.0.25/writetgz?dir=go: write tcp 10.240.0.3:33048->10.240.0.25:80: write: connection reset by peer
 +1047.1s (now)

Build log:


(buildlet still starting; no live streaming. reload manually to see status)

One goroutine is stuck at:

goroutine 3391 [select, 28 minutes]:
net/http.(*persistConn).roundTrip(0xc82039c2d0, 0xc820845b80, 0x0, 0x0, 0x0)
    /home/bradfitz/go/src/net/http/transport.go:1701 +0x95f
net/http.(*Transport).RoundTrip(0xc820fd1b30, 0xc8205905a0, 0xc820fd1b30, 0x0, 0xc800000000)
    /home/bradfitz/go/src/net/http/transport.go:385 +0x42a
net/http.send(0xc8205905a0, 0xd7b880, 0xc820fd1b30, 0x0, 0x0, 0x0, 0x8, 0x2a, 0xc820021238)
    /home/bradfitz/go/src/net/http/client.go:253 +0x14f
net/http.(*Client).send(0xc8206c8ea0, 0xc8205905a0, 0x0, 0x0, 0x0, 0xc820021238, 0x0, 0x1)
    /home/bradfitz/go/src/net/http/client.go:143 +0xef
net/http.(*Client).doFollowingRedirects(0xc8206c8ea0, 0xc8205905a0, 0xb0dae0, 0x4, 0x5f9901, 0xc8202c85dc)
    /home/bradfitz/go/src/net/http/client.go:498 +0x4d4
net/http.(*Client).Do(0xc8206c8ea0, 0xc8205905a0, 0xc8218518c0, 0xc8218518c0, 0x45defe)
    /home/bradfitz/go/src/net/http/client.go:184 +0x10b
golang.org/x/build/buildlet.(*Client).do(0xc8202c8540, 0xc8205905a0, 0xa36c80, 0xc8206e10b0, 0xc821636d80)
    /home/bradfitz/src/golang.org/x/build/buildlet/buildletclient.go:210 +0xf5
golang.org/x/build/buildlet.(*Client).doOK(0xc8202c8540, 0xc8205905a0, 0x0, 0x0)
    /home/bradfitz/src/golang.org/x/build/buildlet/buildletclient.go:306 +0x55
golang.org/x/build/buildlet.(*Client).PutTarFromURL(0xc8202c8540, 0xc8208ac6e0, 0x4f, 0xabb798, 0x5, 0x0, 0xd7b400)
    /home/bradfitz/src/golang.org/x/build/buildlet/buildletclient.go:345 +0x2cb
main.(*buildStatus).writeBootstrapToolchain(0xc820e05680, 0xc828288740, 0xc828288750)
    /home/bradfitz/src/golang.org/x/build/cmd/coordinator/coordinator.go:1749 +0xe4
main.(*buildStatus).(main.writeBootstrapToolchain)-fm(0x7573655200000008, 0xb0e120)
    /home/bradfitz/src/golang.org/x/build/cmd/coordinator/coordinator.go:1414 +0x20
go4.org/syncutil.(*Group).Go.func1(0xc8206e1020, 0xc821852fa0)
    /home/bradfitz/src/go4.org/syncutil/group.go:35 +0x4d
created by go4.org/syncutil.(*Group).Go
    /home/bradfitz/src/go4.org/syncutil/group.go:41 +0x58

And the parent goroutine is stuck at:

goroutine 247 [semacquire, 28 minutes]:
sync.runtime_Semacquire(0xc8206e102c)
    /home/bradfitz/go/src/runtime/sema.go:47 +0x26
sync.(*WaitGroup).Wait(0xc8206e1020)
    /home/bradfitz/go/src/sync/waitgroup.go:131 +0x8d
go4.org/syncutil.(*Group).Err(0xc8206e1020, 0xc821852fa0, 0xe)
    /home/bradfitz/src/go4.org/syncutil/group.go:52 +0x23
main.(*buildStatus).build(0xc820e05680, 0x0, 0x0)
    /home/bradfitz/src/golang.org/x/build/cmd/coordinator/coordinator.go:1415 +0xfa8
main.(*buildStatus).start.func1(0xc820e05680)
    /home/bradfitz/src/golang.org/x/build/cmd/coordinator/coordinator.go:1267 +0x3d
created by main.(*buildStatus).start
    /home/bradfitz/src/golang.org/x/build/cmd/coordinator/coordinator.go:1275 +0x53
bradfitz self-assigned this May 6, 2016
bradfitz added the Builders x/build issues (builders, bots, dashboards) label May 6, 2016
bradfitz added this to the Unreleased milestone May 6, 2016
@bradfitz
Contributor Author

bradfitz commented May 6, 2016

And from the coordinator logs:

2016/05/06 18:55:04 Buildlet http://10.240.0.25 GCE VM: buildlet-openbsd-amd64-gce58-rn1ced503 failed three heartbeats; final error: timeout waiting for headers 
2016/05/06 18:55:04 Sent request to delete instance "buildlet-openbsd-amd64-gce58-rn1ced503" in zone "us-central1-f". Operation ID, Name: 9118484307818665575, operation-1462560904200-53230ff283f41-44626dc1-b9470cc4 

So the VM was deleted, but the RoundTrip calls to it are still in flight and will hang forever.

bradfitz changed the title from x/cmd/coordinator: hang if writing source tarball fails midway to x/build/cmd/coordinator: hang if writing source tarball fails midway May 6, 2016
@gopherbot

CL https://golang.org/cl/22890 mentions this issue.

golang locked and limited conversation to collaborators May 9, 2017