x/build/devapp: dev.golang.org cert has expired #32272
Comments
We don't use certbot. We use Go's acme autocert package which should handle renewal automatically. |
Ah I see. Well, it's something else then. |
The certificate of https://dev-staging.golang.org expires on Tuesday, May 28, 2019 at 6:20:45 AM Eastern Daylight Time, which is only about 1 day away.
Edit: It's actually powered by |
I've looked at I thought simply redeploying would resolve the immediate problem of expired certificate, so I tried that on the staging instance first. The newly deployed version is currently in a crash-loop. Its logs say:
I got worried thinking https://maintner.golang.org/logs is also expired, but when I checked it, it seems to be using a healthy valid HTTPS cert that expires on July 19, 2019. So I can't explain that devapp error right now. |
FWIW "missing server name" seems normal. It just means the client didn't provide a name in the SNI in ClientHelloInfo. Intentionally or otherwise. |
I was able to deploy the staging instance of devapp using the parent commit of CL 176257, and it resolved the unknown authority error when fetching https://maintner.golang.org/logs. That CL removed the step that copied However, the certificate at https://dev-staging.golang.org hasn't been updated, so that still needs investigation and fixing. |
Found it. It's fallout from my gitlock removal CLs. We're missing ca-certificates. That explains both errors. |
Change https://golang.org/cl/179077 mentions this issue: |
Never mind. The CL I sent only explains the latter error and not the LetsEncrypt cert issue. I had the timeline wrong. In the logs I see:
|
CL 176257 refactored the Dockerfile to remove the use of gitlock but I forgot to include ca-certificates here.

Updates golang/go#32272 (fixes maybe)
Updates golang/go#26872

Change-Id: I7b0e3a756bc9805e81e499b8b7d7c6ed0defb871
Reviewed-on: https://go-review.googlesource.com/c/build/+/179077
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
The LetsEncrypt problem is because we have 2 replicas now. /cc @andybons, who increased it from 1 to 2 in https://golang.org/cl/127036 That means renewals only have a 50% chance of working, as challenge info is kept in memory with x/crypto/acme/autocert. |
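The 50% figure can be seen in a toy model. The sketch below is not autocert code: the `replica` type and its methods are hypothetical stand-ins for two instances that each keep pending challenge tokens in their own memory, so a validation request only succeeds if the load balancer happens to route it to the replica that started the challenge.

```go
package main

import "fmt"

// replica is a toy stand-in for one instance that keeps pending ACME
// challenge tokens only in its own memory.
type replica struct {
	tokens map[string]bool
}

func newReplica() *replica { return &replica{tokens: make(map[string]bool)} }

// beginChallenge records a pending challenge token on this replica only.
func (r *replica) beginChallenge(token string) { r.tokens[token] = true }

// answer reports whether this replica can respond to a validation request.
func (r *replica) answer(token string) bool { return r.tokens[token] }

// successes counts how many replicas could answer a validation for token.
// A load balancer may route the validation request to any of them.
func successes(replicas []*replica, token string) int {
	n := 0
	for _, r := range replicas {
		if r.answer(token) {
			n++
		}
	}
	return n
}

func main() {
	replicas := []*replica{newReplica(), newReplica()}
	replicas[0].beginChallenge("tok") // only the first replica knows the token
	fmt.Printf("%d of %d replicas can answer\n", successes(replicas, "tok"), len(replicas))
	// prints "1 of 2 replicas can answer"
}
```

With a single replica, or with challenge state kept in storage shared by all replicas, every validation request would reach an instance that knows the token.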
As a temporary fix I've reduced it from 2 to 1. There are several better fixes that will take more time. |
Also, the readiness probe added in CL 127036 doesn't work, because we're redirecting it to https. kubectl describe says:
|
Actually I don't know what's up with that /healthz https error. The http-to-https redirect handler does appear to special case (allow) that path over http. But I don't understand this output. It's like we're running some code that's not in the repo and I can't find....
Sending requests to GitHub!? gzdata cache? What is all that? I don't know what code logs that and I can't find it. |
A "make deploy-prod" got the right code running again. When I changed its config from 2 to 1 replicas and re-applied the That's another thing to fix here. |
Well, it's running now and logs look happy, but I can't connect to it. I have to run, though, so no time to debug further. Either @dmitshur can take over or we can wait until tomorrow. |
It looks like we are using https://github.com/golang/build/blob/78beebf19480669724269a7b3ee6c2eed3d2c64b/devapp/devapp.go#L137 If there's some HTTPS redirection bug preventing the "http-01" challenge from working via I can try adding |
Good find. I bet that's it. We should audit all our services for that. |
Looks like it was. I tried it on https://dev-staging.golang.org/ and its cert has now been updated to expire on August 25, 2019 (previously, it was May 28, 2019). I'll apply the same quick fix to dev.golang.org and leave this issue open for all the followup cleanup work here. |
Change https://golang.org/cl/179097 mentions this issue: |
https://dev.golang.org/ is working now. Thanks @bradfitz. Although it didn't work right away. Specifically, I saw the following in logs:
The "refusing to serve autocert on provided domain" error is from our Then later a bunch of:
Then a rate limit error from LE:
Then more EOFs:
And after a while it did renew the cert successfully. I don't know enough about how Let's Encrypt works to explain this off the top of my head. |
We need to add this manually in order to enable the tls-alpn-01 challenge, since we're using GetCertificate directly instead of via Manager.TLSConfig.

We also don't have the http-01 challenge enabled (HTTPHandler isn't being used), so this is the only way for a Let's Encrypt certificate to be acquired now that tls-sni-* challenges have been deprecated.

In the future, this code can probably be simplified by using higher-level APIs of autocert, but this fixes the immediate issue.

Updates golang/go#32272

Change-Id: Ia72bca3e44bc585b0dfe5c7bcd3e4f544272d1ab
Reviewed-on: https://go-review.googlesource.com/c/build/+/179097
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
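A minimal sketch of the kind of change described in that commit message: when a `tls.Config` is built by hand around a `GetCertificate` callback, the ALPN protocol name for tls-alpn-01 has to be listed in `NextProtos` explicitly. To keep the example dependency-free, `stubGetCertificate` stands in for `(*autocert.Manager).GetCertificate` and the string `"acme-tls/1"` is written out instead of importing `acme.ALPNProto` from golang.org/x/crypto/acme; `newTLSConfig` is a hypothetical helper, not the code from the CL.

```go
package main

import (
	"crypto/tls"
	"errors"
	"fmt"
)

// acmeALPNProto is the ALPN protocol name used by the tls-alpn-01 challenge.
// golang.org/x/crypto/acme exports the same value as acme.ALPNProto; it is
// spelled out here only to avoid an external dependency.
const acmeALPNProto = "acme-tls/1"

// stubGetCertificate is a placeholder for (*autocert.Manager).GetCertificate.
func stubGetCertificate(*tls.ClientHelloInfo) (*tls.Certificate, error) {
	return nil, errors.New("stub: no certificate")
}

// newTLSConfig builds a TLS config around a GetCertificate callback and
// advertises the ACME ALPN protocol alongside the usual HTTP protocols.
func newTLSConfig(getCert func(*tls.ClientHelloInfo) (*tls.Certificate, error)) *tls.Config {
	return &tls.Config{
		GetCertificate: getCert,
		NextProtos:     []string{"h2", "http/1.1", acmeALPNProto},
	}
}

func main() {
	cfg := newTLSConfig(stubGetCertificate)
	fmt.Println(cfg.NextProtos) // prints "[h2 http/1.1 acme-tls/1]"
}
```

Without the ACME entry in `NextProtos`, the TLS stack would not negotiate that protocol with the validation client, so the callback would never get the chance to serve the challenge certificate.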
Old. |
Issued On Tuesday, 26 February 2019 at 17:05:27
Expires On Monday, 27 May 2019 at 18:05:27
Probably needs a certbot renew run.
@dmitshur @bradfitz