Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/build/cmd/gomote: 502 Bad Gateway error #28365

Open
hyangah opened this issue Oct 24, 2018 · 17 comments
Open

x/build/cmd/gomote: 502 Bad Gateway error #28365

hyangah opened this issue Oct 24, 2018 · 17 comments
Labels
Builders x/build issues (builders, bots, dashboards) NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@hyangah
Copy link
Contributor

hyangah commented Oct 24, 2018

I often encounter this 502 Bad Gateway error while actively working with a buildlet created using gomote. I guess this is because I was holding the buildlet too long (~30min) and there is a hidden time limit on each buildlet. If so, we need a better error message than this:

$ gomote ls user-hakim-windows-amd64-2016-0
Error running ls: 502 Bad Gateway: 

And, gomote list still shows the lease is not yet expired. That's misleading.

$ gomote list
user-hakim-windows-amd64-2016-0	windows-amd64-2016	host-windows-amd64-2016	expires in 29m49.611152102s
@gopherbot gopherbot added this to the Unreleased milestone Oct 24, 2018
@gopherbot gopherbot added the Builders x/build issues (builders, bots, dashboards) label Oct 24, 2018
@bcmills
Copy link
Contributor

bcmills commented Jun 4, 2019

I'm seeing this frequently too. I don't think it's actually a time limit, but it's opaque enough that I'm not sure.

CC @golang/osp-team

@bcmills bcmills added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jun 4, 2019
@thanm
Copy link
Contributor

thanm commented Oct 11, 2019

I ran into the same problem this morning. Very mysterious error.

@gopherbot
Copy link

Change https://golang.org/cl/200738 mentions this issue: cmd/coordinator: make gomote proxy 502 error include detail

gopherbot pushed a commit to golang/build that referenced this issue Oct 11, 2019
Go 1.11 added ReverseProxy.ErrorHandler; use it to make the
httputil.ReverseProxy failures print the underlying error back to the
(trusted) client. (Normally the client isn't necessarily trusted
enough to get the full info)

Also, log more to stderr where we can search for it.

Updates golang/go#28365

Change-Id: Iac2d863b159f24fda2e0e6e1f7374ed05434d3e4
Reviewed-on: https://go-review.googlesource.com/c/build/+/200738
Reviewed-by: Bryan C. Mills <bcmills@google.com>
Run-TryBot: Bryan C. Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@bradfitz
Copy link
Contributor

Deployed coordinator with new logging. Please report any errors you see. We can also search the coordinator's logs for it too.

@thanm
Copy link
Contributor

thanm commented Oct 11, 2019

New error just now:

Error running puttar: 502 Bad Gateway; body: (golang.org/issue/28365): gomote proxy error: Put http://10.240.0.16/writetgz?dir=: net/http: request canceled while waiting for connection now run: gomote run user-thanm-android-386-emu-0 /bin/bash build-on-gomote.sh Error running run: Error trying to execute /bin/bash: buildlet: HTTP status 502 Bad Gateway: (golang.org/issue/28365): gomote proxy error: Post http://10.240.0.16/exec?: net/http: request canceled while waiting for connection

@gopherbot
Copy link

Change https://golang.org/cl/203217 mentions this issue: buildlet, cmd/coordinator: stop using legacy Request.Cancel for cancellation

gopherbot pushed a commit to golang/build that referenced this issue Oct 24, 2019
…llation

Maybe this will solve the golang/go#28365 problems. But at least it
gets us into codepaths that are known & trusted, and removes use of
deprecated API.

Also, add more logging to help debug golang/go#28365.

Updates golang/go#28365

Change-Id: Ibff2b03fd82573cbeedbbc22d12c30ae1a3c3aa0
Reviewed-on: https://go-review.googlesource.com/c/build/+/203217
Reviewed-by: Bryan C. Mills <bcmills@google.com>
codebien pushed a commit to codebien/build that referenced this issue Nov 13, 2019
Go 1.11 added ReverseProxy.ErrorHandler; use it to make the
httputil.ReverseProxy failures print the underlying error back to the
(trusted) client. (Normally the client isn't necessarily trusted
enough to get the full info)

Also, log more to stderr where we can search for it.

Updates golang/go#28365

Change-Id: Iac2d863b159f24fda2e0e6e1f7374ed05434d3e4
Reviewed-on: https://go-review.googlesource.com/c/build/+/200738
Reviewed-by: Bryan C. Mills <bcmills@google.com>
Run-TryBot: Bryan C. Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
codebien pushed a commit to codebien/build that referenced this issue Nov 13, 2019
…llation

Maybe this will solve the golang/go#28365 problems. But at least it
gets us into codepaths that are known & trusted, and removes use of
deprecated API.

Also, add more logging to help debug golang/go#28365.

Updates golang/go#28365

Change-Id: Ibff2b03fd82573cbeedbbc22d12c30ae1a3c3aa0
Reviewed-on: https://go-review.googlesource.com/c/build/+/203217
Reviewed-by: Bryan C. Mills <bcmills@google.com>
@andybons

This comment has been minimized.

@bradfitz

This comment has been minimized.

@andybons

This comment has been minimized.

@dmitshur
Copy link
Contributor

As @bcmills noted in #37001 (comment), there's a chance this is the same issue as #37001, which has been resolved on Feb 6, 2020. /cc @toothrot

The last comments here are from 2019. Has this happened to anyone again since then?

@bcmills
Copy link
Contributor

bcmills commented May 27, 2020

I spent some quality time with a windows-amd64-2016 gomote instance for #39183 over the past week, and haven't seen any disconnects. I think it's fixed.

@bcmills
Copy link
Contributor

bcmills commented Jun 21, 2021

Per above — I think this is fixed. Setting to WaitingForInfo to see if anyone chimes in with recent gomote trouble.

@bcmills bcmills added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Jun 21, 2021
@gopherbot
Copy link

Timed out in state WaitingForInfo. Closing.

(I am just a bot, though. Please speak up if this is a mistake or you have the requested information.)

@mknyszek
Copy link
Contributor

mknyszek commented Dec 3, 2021

I just saw this on an iOS gomote.

Log:

$ gomote run user-mknyszek-ios-arm64-corellium-0 go/src/all.bash
Building Go cmd/dist using /var/root/go-ios-arm64-bootstrap. (go1.17.3 ios/arm64)
Building Go toolchain1 using /var/root/go-ios-arm64-bootstrap.
Building Go bootstrap cmd/go (go_bootstrap) using Go toolchain1.
2021/12/03 18:07:46 user-mknyszek-ios-arm64-corellium-0: timeout after 10s waiting for headers for /status
warning: unable to find runtime/cgo.a
Building Go toolchain2 using go_bootstrap and Go toolchain1.
Building Go toolchain3 using go_bootstrap and Go toolchain2.
2021/12/03 18:10:07 user-mknyszek-ios-arm64-corellium-0: timeout after 10s waiting for headers for /status
Building packages and commands for ios/arm64.
2021/12/03 18:12:36 user-mknyszek-ios-arm64-corellium-0: timeout after 10s waiting for headers for /status
2021/12/03 18:13:44 user-mknyszek-ios-arm64-corellium-0: timeout after 10s waiting for headers for /status

##### Testing packages.
ok  	archive/tar	0.047s
ok  	archive/zip	0.518s
ok  	bufio	0.072s
ok  	bytes	3.312s
ok  	compress/bzip2	0.084s
ok  	compress/flate	1.788s
ok  	compress/gzip	0.037s
ok  	compress/lzw	0.078s
ok  	compress/zlib	0.835s
ok  	constraints	0.022s
ok  	container/heap	0.026s
ok  	container/list	0.026s
ok  	container/ring	0.029s
ok  	context	0.148s
ok  	crypto	0.021s
ok  	crypto/aes	0.038s
ok  	crypto/cipher	0.027s
ok  	crypto/des	0.028s
ok  	crypto/dsa	0.024s
ok  	crypto/ecdsa	0.031s
ok  	crypto/ed25519	0.214s
2021/12/03 18:14:55 user-mknyszek-ios-arm64-corellium-0: timeout after 10s waiting for headers for /status
ok  	crypto/ed25519/internal/edwards25519	12.124s
ok  	crypto/ed25519/internal/edwards25519/field	4.480s
ok  	crypto/elliptic	0.068s
ok  	crypto/elliptic/internal/fiat	0.040s [no tests to run]
ok  	crypto/elliptic/internal/nistec	1.620s
ok  	crypto/hmac	0.040s
ok  	crypto/internal/subtle	0.021s
ok  	crypto/md5	0.030s
ok  	crypto/rand	0.033s
ok  	crypto/rc4	0.051s
ok  	crypto/rsa	0.191s
ok  	crypto/sha1	0.023s
ok  	crypto/sha256	0.028s
ok  	crypto/sha512	0.028s
ok  	crypto/subtle	0.021s
ok  	crypto/tls	1.032s
ok  	crypto/x509	0.533s
ok  	database/sql	1.525s
ok  	database/sql/driver	0.020s
ok  	debug/buildinfo	0.020s
ok  	debug/dwarf	0.042s
ok  	debug/elf	0.036s
ok  	debug/gosym	0.027s
ok  	debug/macho	0.027s
ok  	debug/pe	0.030s
ok  	debug/plan9obj	0.022s
ok  	embed	0.026s [no tests to run]
ok  	embed/internal/embedtest	0.024s
ok  	encoding/ascii85	0.023s
ok  	encoding/asn1	0.023s
ok  	encoding/base32	0.033s
ok  	encoding/base64	0.025s
ok  	encoding/binary	0.024s
ok  	encoding/csv	0.032s
ok  	encoding/gob	0.039s
ok  	encoding/hex	0.023s
ok  	encoding/json	0.232s
ok  	encoding/pem	0.030s
ok  	encoding/xml	0.039s
ok  	errors	0.024s
ok  	expvar	0.029s
ok  	flag	0.037s
ok  	fmt	0.116s
ok  	go/ast	0.030s
ok  	go/build	0.034s
ok  	go/build/constraint	0.023s
ok  	go/constant	0.025s
ok  	go/doc	0.148s
ok  	go/format	0.029s
ok  	go/importer	0.029s
ok  	go/internal/gccgoimporter	0.028s
ok  	go/internal/gcimporter	1.088s
ok  	go/internal/srcimporter	0.028s
ok  	go/parser	0.086s
ok  	go/printer	0.337s
ok  	go/scanner	0.026s
ok  	go/token	0.034s
2021/12/03 18:16:48 user-mknyszek-ios-arm64-corellium-0: timeout after 10s waiting for headers for /status
ok  	go/types	2.613s
ok  	hash	0.023s
ok  	hash/adler32	0.026s
ok  	hash/crc32	0.025s
ok  	hash/crc64	0.024s
ok  	hash/fnv	0.021s
ok  	hash/maphash	0.112s
ok  	html	0.024s
ok  	html/template	0.095s
ok  	image	0.073s
ok  	image/color	0.038s
ok  	image/draw	0.108s
ok  	image/gif	0.475s
ok  	image/jpeg	0.250s
ok  	image/png	0.057s
ok  	index/suffixarray	0.340s
2021/12/03 18:17:38 user-mknyszek-ios-arm64-corellium-0: timeout after 10s waiting for headers for /status
ok  	internal/abi	0.022s
ok  	internal/buildcfg	0.021s
ok  	internal/cpu	0.021s
ok  	internal/execabs	0.030s
ok  	internal/fmtsort	0.023s
ok  	internal/fuzz	0.025s
ok  	internal/godebug	0.026s
ok  	internal/intern	0.160s
ok  	internal/itoa	0.022s
ok  	internal/poll	0.056s
ok  	internal/profile	0.022s
2021/12/03 18:18:19 user-mknyszek-ios-arm64-corellium-0: timeout after 10s waiting for headers for /status
ok  	internal/reflectlite	0.102s
ok  	internal/singleflight	0.034s
ok  	internal/trace	0.052s
ok  	internal/unsafeheader	0.048s
ok  	internal/xcoff	0.032s
ok  	io	0.309s
ok  	io/fs	0.024s
ok  	io/ioutil	0.029s
ok  	log	0.027s
ok  	log/syslog	1.655s
ok  	math	0.027s
2021/12/03 18:19:13 user-mknyszek-ios-arm64-corellium-0: timeout after 10s waiting for headers for /status
2021/12/03 18:19:33 user-mknyszek-ios-arm64-corellium-0: timeout after 10s waiting for headers for /status
ok  	math/big	2.008s
ok  	math/bits	0.026s
ok  	math/cmplx	0.023s
ok  	math/rand	0.425s
ok  	mime	0.026s
ok  	mime/multipart	0.203s
ok  	mime/quotedprintable	0.043s
2021/12/03 18:20:23 user-mknyszek-ios-arm64-corellium-0: timeout after 10s waiting for headers for /status
ok  	net	4.220s
ok  	net/http	14.665s
ok  	net/http/cgi	0.070s
ok  	net/http/cookiejar	0.030s
ok  	net/http/fcgi	0.135s
ok  	net/http/httptest	0.160s
ok  	net/http/httptrace	0.035s
ok  	net/http/httputil	0.624s
ok  	net/http/internal	0.020s
ok  	net/http/internal/ascii	0.021s
ok  	net/http/pprof	5.373s
ok  	net/internal/socktest	0.042s
ok  	net/mail	0.032s
ok  	net/netip	0.036s
ok  	net/rpc	0.114s
ok  	net/rpc/jsonrpc	0.040s
ok  	net/smtp	0.049s
ok  	net/textproto	0.050s
ok  	net/url	0.032s
ok  	os	1.555s
ok  	os/exec	0.040s
ok  	os/exec/internal/fdtest	0.023s
ok  	os/signal	3.340s
ok  	os/user	0.022s
ok  	path	0.022s
ok  	path/filepath	0.060s
ok  	plugin	0.023s
2021/12/03 18:22:32 user-mknyszek-ios-arm64-corellium-0: timeout after 10s waiting for headers for /status
2021/12/03 18:22:52 user-mknyszek-ios-arm64-corellium-0: timeout after 10s waiting for headers for /status
2021/12/03 18:23:12 user-mknyszek-ios-arm64-corellium-0: timeout after 10s waiting for headers for /status
2021/12/03 18:23:12 Buildlet https://farmer.golang.org:443 failed three heartbeats; final error: timeout waiting for headers
Error running run: Error trying to execute go/src/all.bash: Buildlet https://farmer.golang.org:443 failed heartbeat after 10.000250219s; marking dead; err=timeout waiting for headers
$ gomote run user-mknyszek-ios-arm64-corellium-0 go/bin/go test -short runtime
Error running run: Error trying to execute go/bin/go: buildlet: HTTP status 502 Bad Gateway: (golang.org/issue/28365): gomote proxy error: Post "http://iPhone/exec?": revdial.Dialer closed
$ gomote list
user-mknyszek-ios-arm64-corellium-0	ios-arm64-corellium	host-ios-arm64-corellium-ios	expires in 29m57.358647306s
$ gomote run user-mknyszek-ios-arm64-corellium-0 go/bin/go test -short runtime
Error running run: Error trying to execute go/bin/go: buildlet: HTTP status 502 Bad Gateway: (golang.org/issue/28365): gomote proxy error: Post "http://iPhone/exec?": revdial.Dialer closed
$ gomote run user-mknyszek-ios-arm64-corellium-0 go/bin/go test -short runtime
Error running run: Error trying to execute go/bin/go: buildlet: HTTP status 502 Bad Gateway: (golang.org/issue/28365): gomote proxy error: Post "http://iPhone/exec?": revdial.Dialer closed

@mknyszek mknyszek reopened this Dec 3, 2021
@mknyszek mknyszek reopened this Dec 3, 2021
@mknyszek mknyszek removed the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Dec 3, 2021
@aclements
Copy link
Member

I've been running stress tests on a pool of 8 windows-amd64-longtest builders and over the course of about an hour I've gotten dozens of these failures:

2022/01/19 13:06:00 creating windows-amd64-longtest buildlet
2022/01/19 13:06:28 created buildlet user-austin-windows-amd64-longtest-3
2022/01/19 13:06:28 installing go1.4
Error running push: 502 Bad Gateway; body: (golang.org/issue/28365): gomote proxy error: Post "http://10.128.0.55/writetgz?dir=go1.4": read tcp 10.102.128.7:49068->10.128.0.55:80: read: connection reset by peer

One odd thing is that they seem to happen in clusters. My stress testing tool will retry setting up a new gomote 5 times before giving up, and many of the logs show the buildlet failing part way through make.bat and then failing several times in a row with "502 Bad Gateway":

2022/01/19 13:03:05 creating windows-amd64-longtest buildlet
2022/01/19 13:03:33 created buildlet user-austin-windows-amd64-longtest-3
2022/01/19 13:03:34 installing go1.4
2022/01/19 13:03:41 Remote doesn't have "src/syscall/zsysnum_freebsd_386.go"
2022/01/19 13:03:41 Remote doesn't have "test/fixedbugs/issue19548.go"
2022/01/19 13:03:41 Remote doesn't have "src/cmd/compile/internal/ssa/deadcode_t
est.go"
2022/01/19 13:03:41 Remote doesn't have "src/cmd/go/internal/modfetch/sumdb.go"
2022/01/19 13:03:41 Remote doesn't have "src/cmd/vendor/golang.org/x/sys/windows
/race0.go"
2022/01/19 13:03:41 Remote doesn't have 10994 files (only showed 5).
2022/01/19 13:03:41 Remote lacks a VERSION file; sending a fake one
2022/01/19 13:03:47 Uploading 10995 new/changed files; 40413585 byte .tar.gz
Building Go cmd/dist using C:\workdir\go1.4
Building Go toolchain1 using C:\workdir\go1.4.
Building Go bootstrap cmd/go (go_bootstrap) using Go toolchain1.
Building Go toolchain2 using go_bootstrap and Go toolchain1.
Building Go toolchain3 using go_bootstrap and Go toolchain2.
Error running run: Error trying to execute go/src/make.bat: missing Process-Stat
e trailer from HTTP response; buildlet built with old (<= 1.4) Go?
2022/01/19 13:06:00 setup command failed: exit status 1
2022/01/19 13:06:00 creating windows-amd64-longtest buildlet
2022/01/19 13:06:28 created buildlet user-austin-windows-amd64-longtest-3
2022/01/19 13:06:28 installing go1.4
Error running push: 502 Bad Gateway; body: (golang.org/issue/28365): gomote prox
y error: Post "http://10.128.0.55/writetgz?dir=go1.4": read tcp 10.102.128.7:490
68->10.128.0.55:80: read: connection reset by peer
2022/01/19 13:06:29 setup command failed: exit status 1
2022/01/19 13:06:29 creating windows-amd64-longtest buildlet
2022/01/19 13:07:08 created buildlet user-austin-windows-amd64-longtest-3
2022/01/19 13:07:08 installing go1.4
Error running push: 502 Bad Gateway; body: (golang.org/issue/28365): gomote prox
y error: Post "http://10.128.0.70/writetgz?dir=go1.4": read tcp 10.102.128.7:383
34->10.128.0.70:80: read: connection reset by peer
2022/01/19 13:07:10 setup command failed: exit status 1
2022/01/19 13:07:11 creating windows-amd64-longtest buildlet
2022/01/19 13:07:48 created buildlet user-austin-windows-amd64-longtest-3
2022/01/19 13:07:48 installing go1.4
Error running push: 502 Bad Gateway; body: (golang.org/issue/28365): gomote prox
y error: Post "http://10.128.0.76/writetgz?dir=go1.4": read tcp 10.102.128.7:419
10->10.128.0.76:80: read: connection reset by peer
2022/01/19 13:07:50 setup command failed: exit status 1
2022/01/19 13:07:50 creating windows-amd64-longtest buildlet
2022/01/19 13:08:31 created buildlet user-austin-windows-amd64-longtest-3
2022/01/19 13:08:31 installing go1.4
Error running push: 502 Bad Gateway; body: (golang.org/issue/28365): gomote prox
y error: Post "http://10.128.0.82/writetgz?dir=go1.4": read tcp 10.102.128.7:333
78->10.128.0.82:80: read: connection reset by peer
2022/01/19 13:08:33 setup command failed: exit status 1
2022/01/19 13:08:33 giving up after 5 retries
exited: status 1

@aclements
Copy link
Member

aclements commented Jan 20, 2022

To add to my previous message, I've noticed something very strange. I had spun up 25 windows-amd64-2016 builders and they ran fine for a few minutes. Then all but 3 of them died with errors like in my previous message. My gopool program kept trying to recreate the 22 failed builders, but I never saw one that lasted more than 2.1 seconds after creation. All but the 3 kept failing with 502 errors. I believe exactly the same thing had happened in a previous run, though I wasn't watching it as closely.

While this was happening, I saw the following on farmer:

host-windows-amd64-2016: 25 waiting (oldest 9m55s, newest 5s, progress 7s)
   gomote: 22 (oldest 44s, newest 5s)
   try: 3 (oldest 9m55s, newest 9m55s)

@aclements
Copy link
Member

As a data point, I tried running

VM=$(gomote create windows-amd64-2016)
gomote push $VM
gomote destroy $VM

in 12 parallel loops. I expected this to quickly produce 502 errors, but I let it go for about an hour and didn't get any 502 errors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Builders x/build issues (builders, bots, dashboards) NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

9 participants