x/build: maintner.golang.org not-accessible #21383

Closed
paulzhol opened this issue Aug 10, 2017 · 13 comments
Labels
Builders x/build issues (builders, bots, dashboards) FrozenDueToAge
Milestone

Comments

@paulzhol
Member

The freebsd-arm-paulzhol builder has been stuck in a loop since yesterday, trying to build the sys subrepo but failing because it can't download it:

Error: runTests: looking up ref for "sys": rpc error: code = 13 desc = grpc: Post https://maintner.golang.org/apipb.MaintnerService/GetRef: dial tcp 35.188.67.38:443: i/o timeout

Also, the dashboard for the sys build reports HTTP 404. Attached are logs captured from farmer.golang.org:
temporarylogs.txt
temporarylogs2.txt

@josharian
Contributor

cc @kevinburke @andybons

I always suspected maintner was really just Brad on his phone.

@josharian josharian added the Builders x/build issues (builders, bots, dashboards) label Aug 10, 2017
@kevinburke
Contributor

I don't have access to the production infrastructure, unfortunately. My guess is it's crashing on some bad input.

@kevinburke
Contributor

I think @jessfraz and @adams-sarah have access to production as well

@s-mang
Contributor

s-mang commented Aug 10, 2017

maintner is in a crash loop.

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x904f37]

goroutine 521 [running]:
main.tryWorkItem(0x0, 0xc42030d530)
/go/src/golang.org/x/build/maintner/maintnerd/api.go:92 +0x37
main.apiService.GoFindTryWork(0xc42000b000, 0x103bb80, 0xc438f43a10, 0xc452496d60, 0x0, 0x0, 0x0)
/go/src/golang.org/x/build/maintner/maintnerd/api.go:170 +0x4b7
golang.org/x/build/maintner/maintnerd/apipb._MaintnerService_GoFindTryWork_Handler(0xab0de0, 0xc42000b000, 0x103bb80, 0xc438f43a10, 0xc449069260, 0x0, 0x0, 0x0, 0x66cb2b, 0x106a9e0)
/go/src/golang.org/x/build/maintner/maintnerd/apipb/api.pb.go:360 +0x28d
grpc%2ego4%2eorg.(*Server).processUnaryRPC(0xc452847bc0, 0x103df80, 0xc4322b2360, 0xc4460ae200, 0xc4539ce150, 0x1027590, 0x0, 0x0, 0x0)
/go/src/grpc.go4.org/server.go:697 +0xaa0
grpc%2ego4%2eorg.(*Server).handleStream(0xc452847bc0, 0x103df80, 0xc4322b2360, 0xc4460ae200, 0x0)
/go/src/grpc.go4.org/server.go:873 +0x1261
grpc%2ego4%2eorg.(*Server).serveStreams.func1.1(0xc452496cd0, 0xc452847bc0, 0x103df80, 0xc4322b2360, 0xc4460ae200)
/go/src/grpc.go4.org/server.go:456 +0xa9
created by grpc%2ego4%2eorg.(*Server).serveStreams.func1
/go/src/grpc.go4.org/server.go:457 +0xa1

@s-mang
Contributor

s-mang commented Aug 10, 2017

Running to a meeting, but looks like a nil check (cl == nil) in maintnerd/api.go:167 would fix this.

@gopherbot gopherbot added this to the Unreleased milestone Aug 10, 2017
@gopherbot
Contributor

Change https://golang.org/cl/54751 mentions this issue: x/build/maintner/maintnerd: check CL is found before doing work

@kevinburke
Contributor

From Sarah: https://golang.org/cl/54751

maintner is back up actually. wonder if someone rolled the prev cl back? or what.

but regardless, killing this CL for now.

Which makes me wonder if there is some sort of race or ordering problem.

One thing we could do is run this with the race detector on, either in production or in some sort of staging realm that mirrors the same data.

@paulzhol
Member Author

With maintner back up, freebsd-arm-paulzhol finished building the sys and net subrepos on 1.7 and 1.8, but now the builder is not receiving any work:

Reverse pool summary:
host-freebsd-arm-paulzhol: 0/1

Reverse pool machine detail
a20.home.idea-y.com (10.240.0.8:54566) version 15, host-freebsd-arm-paulzhol: connected 6h13m59.5s, idle for 359.5ms

My logs don't indicate anything out of the ordinary:

====================
2017/08/10 22:36:45 buildlet starting.
2017/08/10 22:36:45 Not on GCE; not remounting root filesystem.
2017/08/10 22:36:45 Dialing coordinator farmer.golang.org:443 ...
2017/08/10 22:36:45 Doing TLS handshake with coordinator (verifying hostname "farmer.golang.org")...
2017/08/10 22:36:46 Registering reverse mode with coordinator...
2017/08/10 22:36:46 Connected to coordinator; reverse dialing active
2017/08/11 03:17:12 buildlet reverse mode exiting.
====================
2017/08/11 03:17:59 buildlet starting.
2017/08/11 03:17:59 Not on GCE; not remounting root filesystem.
2017/08/11 03:17:59 Dialing coordinator farmer.golang.org:443 ...
2017/08/11 03:17:59 Doing TLS handshake with coordinator (verifying hostname "farmer.golang.org")...
2017/08/11 03:18:00 Registering reverse mode with coordinator...
2017/08/11 03:18:00 Connected to coordinator; reverse dialing active

@kevinburke
Contributor

@paulzhol, can you open a separate issue for that one?

@paulzhol
Member Author

#21403

@s-mang
Contributor

s-mang commented Aug 11, 2017

Hey @kevinburke, all. Turns out my CL is the right fix.
It's a data issue: it looks like Gerrit sometimes gets ahead of maintner, in which case the cl pointer is nil, and dereferencing it causes the panic.

So the issue only crops up periodically, which matches the maintner outages we are seeing.
I will get this CL in today.
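
For readers following along, here is a minimal sketch of the kind of nil check being described. It is not the code from CL 54751 or from maintnerd/api.go: the `CL`, `TryWorkItem`, and `corpus` types, the `errCLNotFound` error, and the CL numbers are hypothetical stand-ins, and the signatures differ from the real ones. It only illustrates that a lookup can return nil when the local data lags behind Gerrit, and that the caller skips such entries rather than dereferencing them.

```go
package main

import (
	"errors"
	"fmt"
)

// CL is a hypothetical stand-in for a Gerrit change known to the local corpus.
type CL struct {
	Number int32
	Commit string
}

// TryWorkItem is a hypothetical stand-in for one item in the API response.
type TryWorkItem struct {
	ChangeNumber int32
	Commit       string
}

// errCLNotFound is a hypothetical error for a CL that has not been synced yet.
var errCLNotFound = errors.New("CL not yet known to the local corpus")

// corpus is a hypothetical in-memory index of the CLs synced so far.
type corpus map[int32]*CL

// lookup returns nil when the CL has not been synced yet.
func (c corpus) lookup(n int32) *CL { return c[n] }

// tryWorkItem checks for nil before touching any field of cl,
// returning an error instead of panicking.
func tryWorkItem(cl *CL) (*TryWorkItem, error) {
	if cl == nil {
		return nil, errCLNotFound
	}
	return &TryWorkItem{ChangeNumber: cl.Number, Commit: cl.Commit}, nil
}

// findTryWork builds work items for the pending CL numbers and skips
// any CL the corpus does not know about yet.
func findTryWork(c corpus, pending []int32) []*TryWorkItem {
	var items []*TryWorkItem
	for _, n := range pending {
		item, err := tryWorkItem(c.lookup(n))
		if err != nil {
			continue // the CL may appear after the next sync
		}
		items = append(items, item)
	}
	return items
}

func main() {
	c := corpus{1001: &CL{Number: 1001, Commit: "deadbeef"}}
	// 1002 is pending upstream but not yet in the corpus.
	items := findTryWork(c, []int32{1001, 1002})
	fmt.Println("work items:", len(items))
}
```

Skipping the missing CL, rather than failing the whole request, is one possible design choice here; the skipped entry would simply be picked up on a later call once the data catches up.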

@s-mang
Contributor

s-mang commented Aug 11, 2017

Waiting to deploy. Will ping back here when this fix is live.

@s-mang
Contributor

s-mang commented Aug 14, 2017

FYI, this is live as of Friday.

@golang golang locked and limited conversation to collaborators Aug 14, 2018
5 participants