You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Looking at Let's Encrypt server logs, I've noticed that our highest-traffic clients send a User-Agent header of "Go-http-client/1.1". These clients tend to be requesting the same domains repeatedly at a very high rate (14 requests per second), and typically hitting rate limits. I believe one common reason for this is that some downstream libraries do "on-demand issuance", where they will try to issue a certificate in the middle of a TLS handshake if they don't yet have a certificate to match the SNI header.
Because each incoming handshake runs in a separate goroutine, this means potentially unbounded goroutines independently asking crypto/acme to issue a certificate for the same hostname. This is very inefficient, since any given process needs at most one currently valid certificate per hostname.
I would recommend some sort of global state that prevents more than one certificate request per hostname (or hostname set) to be in progress at once. Even better would be to include error status in that global state, so hostnames that recently failed validation (e.g. because they don't exist) are not retried for some minimum period of time. There are some recommendations about error backoffs at https://letsencrypt.org/docs/integration-guide/, although that page does not specifically address the question of synchronization across goroutines.
Thanks,
Jacob
The text was updated successfully, but these errors were encountered:
Hmm, the broken consumer doesn't appear to be autocert, which has synchronization. Adding internal synchronization to the package around domain requests seems pretty heavy-handed, as there are a variety of use cases for requesting multiple certificates containing the same domain (for example separate RSA+ECDSA certificates) that could require more careful coordination in the client application. In general the package tries to wrap the ACME API pretty directly, with a few helpers for doing things like retries.
There probably aren't too many clients doing on-demand issuance, perhaps we could get them fixed and update the package documentation to clarify best practices?
Most common scenarios will want to use autocert subdirectory instead, which provides automatic access to certificates from Let's Encrypt and any other ACME-based CA.
It seems like kube-lego slightly predates the existence of autocert, so it's possible that's the reason they're not using the higher level library. I'll follow up with them, and reopen this if fixing things there doesn't reduce the issue (which would suggest that there's another popular client out there with a similar problem).
Looking at Let's Encrypt server logs, I've noticed that our highest-traffic clients send a User-Agent header of "Go-http-client/1.1". These clients tend to be requesting the same domains repeatedly at a very high rate (14 requests per second), and typically hitting rate limits. I believe one common reason for this is that some downstream libraries do "on-demand issuance", where they will try to issue a certificate in the middle of a TLS handshake if they don't yet have a certificate to match the SNI header.
Because each incoming handshake runs in a separate goroutine, this means potentially unbounded goroutines independently asking
crypto/acme
to issue a certificate for the same hostname. This is very inefficient, since any given process needs at most one currently valid certificate per hostname.I would recommend some sort of global state that prevents more than one certificate request per hostname (or hostname set) to be in progress at once. Even better would be to include error status in that global state, so hostnames that recently failed validation (e.g. because they don't exist) are not retried for some minimum period of time. There are some recommendations about error backoffs at https://letsencrypt.org/docs/integration-guide/, although that page does not specifically address the question of synchronization across goroutines.
Thanks,
Jacob
The text was updated successfully, but these errors were encountered: