x/build: monitor/graph GCE instance-create-to-buildlet latencies #21148

Closed
bradfitz opened this issue Jul 24, 2017 · 5 comments
Labels
Builders x/build issues (builders, bots, dashboards) FrozenDueToAge

@bradfitz
Contributor

In the past few days our Windows GCE instances seem to get created, but then the buildlet doesn't come up within 5 minutes.

Why?

Also, we need to monitor & alert on this.

/cc @adams-sarah @cybrcodr @johnsonj

@gopherbot gopherbot added this to the Unreleased milestone Jul 24, 2017
@gopherbot gopherbot added the Builders x/build issues (builders, bots, dashboards) label Jul 24, 2017
@bradfitz
Contributor Author

(The build system does retry, though, and it seems to eventually work. But something's being flaky and thus our builds and trybots are slow.)

@johnsonj
Contributor

+1 on monitor/alert. Looks like the buildlet process starts but then nothing:

Serial console output for buildlet-windows-amd64-2012-rnb5c1b2a

 SeaBIOS (version 1.8.2-20170419_170401-google)
Total RAM Size = 0x00000000e6600000 = 3686 MiB
CPUs found: 4     Max CPUs supported: 4
found virtio-scsi at 0:3
virtio-scsi vendor='Google' product='PersistentDisk' rev='1' type=0 removable=0
virtio-scsi blksize=512 sectors=104857600 = 51200 MiB
drive 0x000f31a0: PCHS=0/0/0 translation=lba LCHS=1024/255/63 s=104857600
Booting from Hard Disk 0...
7/24/2017 7:55:26 PM UTC: GCE Agent started (version 3.5.1.0).
7/24/2017 7:55:28 PM UTC: Starting startup scripts (version 3.5.1.0).
7/24/2017 7:55:33 PM UTC: Finished running startup scripts.
2017/07/24 19:55:51 buildlet starting.

@johnsonj
Contributor

Created a builder and captured console output:

2017/07/24 20:31:07 network is up.
2017/07/24 20:31:07 Downloading https://storage.googleapis.com/go-builder-data/buildlet.windows-amd64 to .\buildlet.exe ...
2017/07/24 20:31:07 Downloaded .\buildlet.exe (7617536 bytes)
fatal error: unexpected signal during runtime execution
[signal 0xc0000005 code=0x0 addr=0xffffffffffffffff pc=0x427e42]

runtime stack:
runtime.throw(0x7620f5, 0x2a)
        /home/bradfitz/go/src/runtime/panic.go:605 +0x9c
runtime.sigpanic()
        /home/bradfitz/go/src/runtime/signal_windows.go:155 +0x184
runtime.netpoll(0xc042019901, 0xc042019901)
        /home/bradfitz/go/src/runtime/netpoll_windows.go:105 +0x332
runtime.findrunnable(0xc042016000, 0x0)
        /home/bradfitz/go/src/runtime/proc.go:2107 +0x610
runtime.schedule()
        /home/bradfitz/go/src/runtime/proc.go:2245 +0x13a
runtime.goexit0(0xc04213a480)
        /home/bradfitz/go/src/runtime/proc.go:2396 +0x24b
runtime.mcall(0x0)
        /home/bradfitz/go/src/runtime/asm_amd64.s:286 +0x5e

goroutine 1 [select]:
net/http.(*Transport).getConn(0x8fe240, 0xc0421720f0, 0x0, 0xc04217e000, 0x4, 0xc04216c120, 0x12, 0x0, 0x0, 0xc042067648)
        /home/bradfitz/go/src/net/http/transport.go:948 +0x5c6
net/http.(*Transport).RoundTrip(0x8fe240, 0xc042182000, 0x8fe240, 0x0, 0x0)
        /home/bradfitz/go/src/net/http/transport.go:400 +0x6ad
net/http.send(0xc042182000, 0x8c92c0, 0x8fe240, 0x0, 0x0, 0x0, 0xc04216e020, 0x100, 0xc0420679c8, 0x1)
        /home/bradfitz/go/src/net/http/client.go:249 +0x1b0
net/http.(*Client).send(0x8f92a0, 0xc042182000, 0x0, 0x0, 0x0, 0xc04216e020, 0x0, 0x1, 0x4)
        /home/bradfitz/go/src/net/http/client.go:173 +0x104
net/http.(*Client).Do(0x8f92a0, 0xc042182000, 0xa, 0x757505, 0x11)
        /home/bradfitz/go/src/net/http/client.go:602 +0x294
cloud.google.com/go/compute/metadata.getETag(0x8f92a0, 0xc04216c100, 0x1c, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        /home/bradfitz/src/cloud.google.com/go/compute/metadata/metadata.go:132 +0x1f7
cloud.google.com/go/compute/metadata.Get(0xc04216c100, 0x1c, 0x14, 0x7543c6, 0x8, 0xc04216c100)
        /home/bradfitz/src/cloud.google.com/go/compute/metadata/metadata.go:107 +0x48
cloud.google.com/go/compute/metadata.InstanceAttributeValue(0x7543c6, 0x8, 0xc042067d98, 0x42b07d, 0x76ce20, 0xc042067da8)
        /home/bradfitz/src/cloud.google.com/go/compute/metadata/metadata.go:405 +0x76
main.metadataValue(0x7543c6, 0x8, 0x0, 0x0)
        /home/bradfitz/src/golang.org/x/build/cmd/buildlet/buildlet.go:329 +0x407
main.defaultListenAddr(0x757af6, 0x12)
        /home/bradfitz/src/golang.org/x/build/cmd/buildlet/buildlet.go:87 +0x4e
main.main()
        /home/bradfitz/src/golang.org/x/build/cmd/buildlet/buildlet.go:139 +0x8ba

goroutine 49 [IO wait]:
internal/poll.runtime_pollWait(0x2d4e40, 0x77, 0xc04218a0b8)
        /home/bradfitz/go/src/runtime/netpoll.go:173 +0x5e
internal/poll.(*pollDesc).wait(0xc04218a158, 0x77, 0xc04216a000, 0x0, 0x0)
        /home/bradfitz/go/src/internal/poll/fd_poll_runtime.go:85 +0xb5
internal/poll.(*ioSrv).ExecIO(0x900e28, 0xc04218a0b8, 0x76c4b8, 0xc04214b1a8, 0xc04214b1b0, 0xc04214b1a0)
        /home/bradfitz/go/src/internal/poll/fd_windows.go:191 +0x126
internal/poll.(*FD).ConnectEx(0xc04218a000, 0x8c9b00, 0xc04216c140, 0xc042162240, 0xc04218a000)
        /home/bradfitz/go/src/internal/poll/fd_windows.go:721 +0x80
net.(*netFD).connect(0xc04218a000, 0x8cdf80, 0xc042162240, 0x0, 0x0, 0x8c9b00, 0xc04216c140, 0x0, 0x0, 0x0, ...)
        /home/bradfitz/go/src/net/fd_windows.go:116 +0x243
net.(*netFD).dial(0xc04218a000, 0x8cdf80, 0xc042162240, 0x8cf240, 0x0, 0x8cf240, 0xc0421721b0, 0xc04214b3a0, 0x56a395)
        /home/bradfitz/go/src/net/sock_posix.go:142 +0xf3
net.socket(0x8cdf80, 0xc042162240, 0x7533a6, 0x3, 0x2, 0x1, 0x0, 0x0, 0x8cf240, 0x0, ...)
        /home/bradfitz/go/src/net/sock_posix.go:93 +0x1c1
net.internetSocket(0x8cdf80, 0xc042162240, 0x7533a6, 0x3, 0x8cf240, 0x0, 0x8cf240, 0xc0421721b0, 0x1, 0x0, ...)
        /home/bradfitz/go/src/net/ipsock_posix.go:141 +0x158
net.doDialTCP(0x8cdf80, 0xc042162240, 0x7533a6, 0x3, 0x0, 0xc0421721b0, 0x920f40, 0x0, 0x0)
        /home/bradfitz/go/src/net/tcpsock_posix.go:62 +0xc0
net.dialTCP(0x8cdf80, 0xc042162240, 0x7533a6, 0x3, 0x0, 0xc0421721b0, 0xbe55b423665a8374, 0x77d9bc38, 0x902ee0)
        /home/bradfitz/go/src/net/tcpsock_posix.go:58 +0xeb
net.dialSingle(0x8cdf80, 0xc042162240, 0xc042180080, 0x8cbd00, 0xc0421721b0, 0x0, 0x0, 0x0, 0x0)
        /home/bradfitz/go/src/net/dial.go:547 +0x3e9
net.dialSerial(0x8cdf80, 0xc042162240, 0xc042180080, 0xc042186090, 0x1, 0x1, 0x0, 0x0, 0x0, 0x0)
        /home/bradfitz/go/src/net/dial.go:515 +0x24e
net.(*Dialer).DialContext(0xc042096120, 0x8cdf40, 0xc04204c078, 0x7533a6, 0x3, 0xc04216c120, 0x12, 0x0, 0x0, 0x0, ...)
        /home/bradfitz/go/src/net/dial.go:397 +0x6f5
net.(*Dialer).Dial(0xc042096120, 0x7533a6, 0x3, 0xc04216c120, 0x12, 0x1240042176120, 0x110, 0x110, 0xc042188000)
        /home/bradfitz/go/src/net/dial.go:320 +0x7c
net.(*Dialer).Dial-fm(0x7533a6, 0x3, 0xc04216c120, 0x12, 0xc042186060, 0xc042117998, 0x403580, 0x60)
        /home/bradfitz/src/cloud.google.com/go/compute/metadata/metadata.go:72 +0x59
net/http.(*Transport).dial(0x8fe240, 0x8cdf40, 0xc04204c078, 0x7533a6, 0x3, 0xc04216c120, 0x12, 0x0, 0x0, 0x0, ...)
        /home/bradfitz/go/src/net/http/transport.go:887 +0x82
net/http.(*Transport).dialConn(0x8fe240, 0x8cdf40, 0xc04204c078, 0x0, 0xc04217e000, 0x4, 0xc04216c120, 0x12, 0x0, 0xc042130120, ...)
        /home/bradfitz/go/src/net/http/transport.go:1060 +0x1d69
net/http.(*Transport).getConn.func4(0x8fe240, 0x8cdf40, 0xc04204c078, 0xc042172120, 0xc042176060)
        /home/bradfitz/go/src/net/http/transport.go:943 +0x7f
created by net/http.(*Transport).getConn
        /home/bradfitz/go/src/net/http/transport.go:942 +0x39a

goroutine 50 [select]:
net.(*netFD).connect.func2(0x8cdf80, 0xc042162240, 0xc04218a000, 0xc0421761e0)
        /home/bradfitz/go/src/net/fd_windows.go:105 +0xf9
created by net.(*netFD).connect
        /home/bradfitz/go/src/net/fd_windows.go:104 +0x218
2017/07/24 20:31:07 Error running buildlet: exit status 2
2017/07/24 20:31:07 (sleeping for 1 minute before failing)

@gopherbot

CL https://golang.org/cl/50880 mentions this issue.

@bradfitz
Contributor Author

A few days ago I'd replaced the Windows buildlet with a Go 1.9-built one.

I've reverted it to a Go 1.8-built one and it now seems to work again.

That's disconcerting, so I'm hoping I had unrelated code changes in there too. I'm going to try to repro in staging. I really hope we don't have Go 1.9-on-Windows/GCE problems.
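One way to make this kind of toolchain difference visible in the serial console is to print the Go version in the startup line. The sketch below is only an illustration and not the buildlet's actual code: `startupBanner` is a hypothetical helper, and the message format is an assumption modeled on the "buildlet starting." line in the logs above.

```go
package main

import (
	"fmt"
	"runtime"
)

// startupBanner (hypothetical) returns a startup log line that records
// which Go toolchain built the running binary and for which platform.
func startupBanner() string {
	return fmt.Sprintf("buildlet starting (built with %s for %s/%s)",
		runtime.Version(), runtime.GOOS, runtime.GOARCH)
}

func main() {
	fmt.Println(startupBanner())
}
```

With a line like this in the console output, a Go 1.8-built and a Go 1.9-built binary could be told apart without knowing which file was last uploaded.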

@golang golang locked and limited conversation to collaborators Jul 31, 2018

3 participants