x/build/cmd/gomote: gomote create, run, etc timeout after 2 hours #56423
Comments
Note that the coordinator SSH server does not support commands without a PTY (e.g.,
A workaround is to pass
Change https://go.dev/cl/445435 mentions this issue: |
Connections open for longer than the timeout are automatically closed by the load balancer. gomote create (CreateInstance) and gomote run (ExecuteCommand) are implemented as single, long-running gRPC calls. Currently, if one of these exceeds 2 hours, the connection is closed and the call fails.

Increase the limit to 24 hours as a mitigation to give long-running commands more time to complete.

As noted at https://cloud.google.com/load-balancing/docs/https#timeouts_and_retries, these connections are still at risk of reset due to restarts of the load balancer itself, so ideally gomote eventually migrates to RPCs that support retry/continue.

For golang/go#56423.

Change-Id: Ia10faea1ca8558373d2d6b45abcf99c476317270
Reviewed-on: https://go-review.googlesource.com/c/build/+/445435
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
@prattmic Should I close this out? Would you like some more time to test the change?
My immediate pain is fixed, though there is still a timeout, plus GCLB restarts can still reset connections at any time. Thus, long term I'd still like to eventually see RPCs move to retry-able operations that can survive these resets. Perhaps Unplanned is the right milestone for this?
After two hours, gomote create and run (and presumably other commands) time out with an error like:
This is rather annoying, particularly for gomote run. gomote create can be manually retried, but gomote run doesn't have a good workaround [1].
We believe this is because the load balancer has a 2 hour timeout, after which the gRPC connection is closed.
As a short-term mitigation, I propose increasing this timeout, perhaps to 24 hours.
However, the GCP docs note that connections with long timeouts are at risk of disconnecting due to maintenance restarts of the load balancer. Longer term, it would probably be better to make the operations retry-able. E.g., perhaps GomoteServer.ExecuteCommand immediately returns an execution ID, and a new GomoteServer.StreamOutput could be used to stream output from that execution? If that stream is closed, the client can simply reconnect and continue streaming. Alternatively, maybe gomote ssh should be more featureful and could replace gomote run, since it also seems to avoid timeout issues.

[1] At the moment I have a terrible hack to scrape the ssh command out of gomote ssh, and then fake a PTY to feed commands into it.

cc @cagedmantis @golang/release