
x/build/maintner/maintnerd: /logs endpoint fails to serve 304 successfully during long poll, ends up serving 502 instead #53569

Closed
dmitshur opened this issue Jun 27, 2022 · 3 comments

@dmitshur
Contributor

The /logs endpoint supports long polling via the waitsizenot parameter. To avoid the request being flagged as a timeout, it tries to serve a 304 in just under a minute (55 seconds, specifically):

// Return a 304 if there's no activity in just under a minute.
// This keeps some occasional activity on the TCP connection
// so we (and any proxies) know it's alive, and can fit
// within reason read/write deadlines on either side.
ctx, cancel := context.WithTimeout(r.Context(), 55*time.Second)
defer cancel()
changed := gl.waitSizeNot(ctx, oldSize)
if !changed {
	w.WriteHeader(http.StatusNotModified)
	return
}

(Source: x/build/maintner/maintnerd/gcslog/gcslog.go#243-253.)

I think this worked fine before, but in practice it seems not to work at all now. Instead it always serves a 502 after 60 seconds:

$ time curl -i 'https://maintner.golang.org/logs?waitsizenot=2270415713'
HTTP/2 502 
content-type: text/html; charset=UTF-8
referrer-policy: no-referrer
content-length: 332
date: Mon, 27 Jun 2022 02:56:31 GMT
alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000


<html><head>
<meta http-equiv="content-type" content="text/html;charset=utf-8">
<title>502 Server Error</title>
</head>
<body text=#000000 bgcolor=#ffffff>
<h1>Error: Server Error</h1>
<h2>The server encountered a temporary error and could not complete your request.<p>Please try again in 30 seconds.</h2>
<h2></h2>
</body></html>
curl -i 'https://maintner.golang.org/logs?waitsizenot=2270415713'  0.02s user 0.02s system 0% cpu 1:00.14 total

This can be reproduced during quiet times (i.e., it requires 60+ seconds of inactivity) by manually computing the current corpus size, or by running maintwatch.
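
For example, here is a minimal sketch (my own, not part of the maintner codebase) of a single timed request that reproduces the symptom during a quiet period. The waitsizenot value is simply the corpus size from the curl transcript above and is only a stand-in; the server answers immediately unless it matches the current size:

package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	// Stand-in size taken from the transcript above; in practice you would
	// compute the current corpus size first (or just run maintwatch).
	const url = "https://maintner.golang.org/logs?waitsizenot=2270415713"

	start := time.Now()
	resp, err := http.Get(url) // default client: no client-side timeout
	if err != nil {
		fmt.Println("request error:", err)
		return
	}
	resp.Body.Close()

	// Expected: a 304 after about 55 seconds. Observed: a 502 after about a minute.
	fmt.Printf("got %d after %v\n", resp.StatusCode, time.Since(start).Round(time.Second))
}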

Perhaps 55 seconds has become too short because of new internal timeouts on the load-balancing side. It would be trivial to try something like 45-50 or 20-25 seconds.

This isn't a big deal for functionality, because most callers retry after a 502 no differently than after a correctly served 304, but it adds some noise and inefficiency.
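
To illustrate that caller behavior, a long-poll loop along these lines (a hypothetical sketch, not maintner's actual client code) makes progress whether a quiet period ends with the intended 304 or the unintended 502; the bug only costs an extra failed round trip per minute of inactivity:

package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	size := int64(2270415713) // stand-in; a real client tracks its own corpus size
	for {
		resp, err := http.Get(fmt.Sprintf("https://maintner.golang.org/logs?waitsizenot=%d", size))
		if err != nil {
			time.Sleep(5 * time.Second) // back off on transport errors, then poll again
			continue
		}
		resp.Body.Close()
		switch resp.StatusCode {
		case http.StatusOK:
			// The log grew: a real client would fetch the new segments,
			// update size, and keep polling.
			fmt.Println("log changed")
			return
		case http.StatusNotModified, http.StatusBadGateway:
			// Quiet minute: whether it ended as a 304 or a 502, just
			// start the next long poll.
		}
	}
}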

@dmitshur added the Builders and NeedsInvestigation labels on Jun 27, 2022
@dmitshur added this to the Unreleased milestone on Jun 27, 2022
@gopherbot

Change https://go.dev/cl/415254 mentions this issue: maintner/maintnerd/gcslog: adjust time before 304

@dmitshur
Contributor Author

A few test queries with the /debug304 endpoint (from CL 415254) show that waiting under 30 seconds results in a 304 being returned, but waiting 30 or more seconds results in a 502 after a minute.

I now suspect it's our ingress configuration that caused this to stop working. I think @cagedmantis ran into something similar in CL 406334, and it seems to align with https://cloud.google.com/kubernetes-engine/docs/how-to/ingress-features#timeout:

You can use a BackendConfig to set a backend service timeout period in seconds. If you do not specify a value, the default value is 30 seconds.

I'll see if it works well to configure it to 60 seconds and keep the current 55-second timeout; otherwise it's fine to go down to 25 seconds.
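
For reference, the GKE-side change amounts to attaching a BackendConfig with a longer timeoutSec to the maintnerd Service. A minimal sketch of what that could look like (the object name here is a placeholder, not the actual config from CL 415354):

apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: maintnerd-backend   # placeholder name
spec:
  timeoutSec: 60            # raise the LB backend timeout above the 55s long poll

The Service then references it via the cloud.google.com/backend-config annotation (for example '{"default": "maintnerd-backend"}').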

@gopherbot

Change https://go.dev/cl/415354 mentions this issue: maintner/maintnerd: set ingress backend timeout to 60 seconds

@dmitshur added the NeedsFix label and removed the NeedsInvestigation label on Jun 30, 2022
@dmitshur self-assigned this on Jun 30, 2022
@dmitshur added this to In Progress in Go Release Team on Jun 30, 2022
Go Release Team automation moved this from In Progress to Done on Jul 6, 2022
@golang locked and limited conversation to collaborators on Jul 6, 2023