
x/build/maintner/maintnerd: /logs endpoint fails to serve 304 successfully during long poll, ends up serving 502 instead #53569

Closed
dmitshur opened this issue Jun 27, 2022 · 3 comments

@dmitshur
Contributor

The /logs endpoint supports long polling via the waitsizenot parameter. To avoid the request being flagged as a timeout, it tries to serve a 304 in just under a minute (55 seconds, specifically):

// Return a 304 if there's no activity in just under a minute.
// This keeps some occasional activity on the TCP connection
// so we (and any proxies) know it's alive, and can fit
// within reason read/write deadlines on either side.
ctx, cancel := context.WithTimeout(r.Context(), 55*time.Second)
defer cancel()
changed := gl.waitSizeNot(ctx, oldSize)
if !changed {
	w.WriteHeader(http.StatusNotModified)
	return
}

(Source: x/build/maintner/maintnerd/gcslog/gcslog.go#243-253.)

I think this worked fine before, but in practice it seems not to work at all now. Instead it always serves a 502 after 60 seconds:

$ time curl -i 'https://maintner.golang.org/logs?waitsizenot=2270415713'
HTTP/2 502 
content-type: text/html; charset=UTF-8
referrer-policy: no-referrer
content-length: 332
date: Mon, 27 Jun 2022 02:56:31 GMT
alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000


<html><head>
<meta http-equiv="content-type" content="text/html;charset=utf-8">
<title>502 Server Error</title>
</head>
<body text=#000000 bgcolor=#ffffff>
<h1>Error: Server Error</h1>
<h2>The server encountered a temporary error and could not complete your request.<p>Please try again in 30 seconds.</h2>
<h2></h2>
</body></html>
curl -i 'https://maintner.golang.org/logs?waitsizenot=2270415713'  0.02s user 0.02s system 0% cpu 1:00.14 total

This can be reproduced during quiet times (i.e., it requires 60+ seconds of inactivity) by manually computing the current corpus size, or by running maintwatch.
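
For example, here is a minimal sketch (my own, not part of the maintner codebase) of a single timed request that reproduces the symptom during a quiet period. The waitsizenot value is simply the corpus size from the curl transcript above and is only a stand-in; the server answers immediately unless it matches the current size:

package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	// Stand-in size taken from the transcript above; in practice you would
	// compute the current corpus size first (or just run maintwatch).
	const url = "https://maintner.golang.org/logs?waitsizenot=2270415713"

	start := time.Now()
	resp, err := http.Get(url) // default client: no client-side timeout
	if err != nil {
		fmt.Println("request error:", err)
		return
	}
	resp.Body.Close()

	// Expected: a 304 after about 55 seconds. Observed: a 502 after about a minute.
	fmt.Printf("got %d after %v\n", resp.StatusCode, time.Since(start).Round(time.Second))
}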

Perhaps 55 seconds has become too short because of new internal timeouts on the load-balancing side. It would be trivial to try something like 45-50 or 20-25 seconds.

This isn't a big deal for functionality, because most callers retry after a 502 no differently than after a correctly served 304, but it adds some noise and inefficiency.
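
To illustrate that caller behavior, a long-poll loop along these lines (a hypothetical sketch, not maintner's actual client code) makes progress whether a quiet period ends with the intended 304 or the unintended 502; the bug only costs an extra failed round trip per minute of inactivity:

package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	size := int64(2270415713) // stand-in; a real client tracks its own corpus size
	for {
		resp, err := http.Get(fmt.Sprintf("https://maintner.golang.org/logs?waitsizenot=%d", size))
		if err != nil {
			time.Sleep(5 * time.Second) // back off on transport errors, then poll again
			continue
		}
		resp.Body.Close()
		switch resp.StatusCode {
		case http.StatusOK:
			// The log grew: a real client would fetch the new segments,
			// update size, and keep polling.
			fmt.Println("log changed")
			return
		case http.StatusNotModified, http.StatusBadGateway:
			// Quiet minute: whether it ended as a 304 or a 502, just
			// start the next long poll.
		}
	}
}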

@dmitshur added the Builders and NeedsInvestigation labels on Jun 27, 2022
@dmitshur added this to the Unreleased milestone on Jun 27, 2022
@gopherbot

Change https://go.dev/cl/415254 mentions this issue: maintner/maintnerd/gcslog: adjust time before 304

@dmitshur
Contributor Author

A few test queries with the /debug304 endpoint (from CL 415254) show that waiting under 30 seconds results in a 304 being returned, but waiting 30 or more seconds results in a 502 after a minute.

I now suspect it's our ingress configuration that caused this to stop working. I think @cagedmantis ran into something similar in CL 406334, and it seems to align with https://cloud.google.com/kubernetes-engine/docs/how-to/ingress-features#timeout:

You can use a BackendConfig to set a backend service timeout period in seconds. If you do not specify a value, the default value is 30 seconds.

I'll see if it works well to configure it to 60 seconds and keep the current 55-second timeout; otherwise it's fine to go down to 25 seconds.
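
For reference, the GKE-side change amounts to attaching a BackendConfig with a longer timeoutSec to the maintnerd Service. A minimal sketch of what that could look like (the object name here is a placeholder, not the actual config from CL 415354):

apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: maintnerd-backend   # placeholder name
spec:
  timeoutSec: 60            # raise the LB backend timeout above the 55s long poll

The Service then references it via the cloud.google.com/backend-config annotation (for example '{"default": "maintnerd-backend"}').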

@gopherbot

Change https://go.dev/cl/415354 mentions this issue: maintner/maintnerd: set ingress backend timeout to 60 seconds

@dmitshur added the NeedsFix label and removed the NeedsInvestigation label on Jun 30, 2022
@dmitshur self-assigned this on Jun 30, 2022
@dmitshur added this to In Progress in Go Release Team on Jun 30, 2022
Go Release Team automation moved this from In Progress to Done on Jul 6, 2022
@golang locked and limited conversation to collaborators on Jul 6, 2023