Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/crypto/acme: badNonce errors should be retried automatically #19703

Closed
jameshartig opened this issue Mar 24, 2017 · 7 comments
Closed

x/crypto/acme: badNonce errors should be retried automatically #19703

jameshartig opened this issue Mar 24, 2017 · 7 comments

Comments

@jameshartig
Copy link
Contributor

jameshartig commented Mar 24, 2017

When a call to the Let's Encrypt server returns urn:acme:error:badNonce, it should be retried with the nonce returned with that error. Currently it's up to the caller to handle this and automatically retry, which can be tedious since you'd have to wrap or add code around every call to a acme.Client method.

The reason why you get this error is explained in certbot/certbot#2244 (comment):

Many nonces go unused, which means we have to expire them after a while. More importantly, we currently keep track of nonces per-frontend and therefore per-DC. When we switch traffic between DCs, any inflight sessions will get badNonce errors.

I propose that the postJWS method in acme handle this and automatically retry once if a badNonce error is returned, like what was done for certbot/certbot. It looks like all the calls that utilize nonces use that method. When quickly prototyping a solution, I ran into the issue that postJWS just returns the response and relies on the caller to notice an error and call responseError. To make things easy, I had 4XX and 5XX errors automatically handled inside postJWS and then the badNonce error is checked in there.

According to the spec, the error should be urn:ietf:params:acme:error:badNonce but the error that Let's Encrypt returns is urn:acme:error:badNonce, I wasn't sure which is correct, so in my prototype I just looked for the suffix matching.

Prototype patch: jameshartig/crypto@957374c

@jameshartig jameshartig changed the title golang.org/x/crypto/acme: badNonce errors should be retried automatically x/crypto/acme: badNonce errors should be retried automatically Mar 24, 2017
@gopherbot gopherbot added this to the Unreleased milestone Mar 24, 2017
@marktheunissen
Copy link
Contributor

@fastest963 we're running this patch in production, it fixed the issues well when we first deployed it on Monday, but ever since April 6th (coinciding with Let's Encrypt maintenance), we are seeing "400 urn:acme:error:badNonce: JWS has invalid anti-replay nonce xxxxxxx" again on a regular basis. Are you also encountering the problem again? Any idea if there is a regression on their side?

@FiloSottile
Copy link
Contributor

The feature LGTM. I'd submit the patch to Gerrit for core review. A test would be great.

Also, curious about @marktheunissen's issue. Maybe need to retry more times?

@jameshartig
Copy link
Contributor Author

@marktheunissen I haven't seen the same problem. What could be happening is that there's still stored invalid nonces. I've updated my patch above in the summary to clear out any stored nonces after we get a badNonce error. Can you let me know if that fixes your issue?

@FiloSottile thanks for checking it out! I'll add a test and submit to Gerrit after I hear back from @marktheunissen.

@marktheunissen
Copy link
Contributor

Thanks! Will give it a try on Monday.

@marktheunissen
Copy link
Contributor

We now think it might be something to do with our usage of the lib, rather than a change in Boulder (Let's Encrypt) ... still looking into it.

@jameshartig
Copy link
Contributor Author

I created the following CL https://go-review.googlesource.com/c/40130/

jameshartig added a commit to jameshartig/crypto that referenced this issue Apr 12, 2017
After receiving a badNonce error, the call can be safely retried. Nonce
errors can happen unexpectedly based on an unknown expiration date or
server-side changes. Rather than force the caller handle these errors,
retryPostJWS will keep retrying until success or a different error.

According to the spec, the error returned should be
"urn:ietf:params:acme:error:badNonce", but the error that Let's Encrypt
returns is "urn:acme:error:badNonce" so we just check the suffix.

Fixes golang/go#19703

Change-Id: Id15012dff91e51d28ed8bc54f13a6212186cb7df
@gopherbot
Copy link

CL https://golang.org/cl/40130 mentions this issue.

jameshartig added a commit to jameshartig/crypto that referenced this issue Apr 13, 2017
After receiving a badNonce error, the call can be safely retried. Nonce
errors can happen unexpectedly based on an unknown expiration date or
server-side changes. Rather than force the caller handle these errors,
retryPostJWS will keep retrying until success or a different error.

According to the spec, the error returned should be
"urn:ietf:params:acme:error:badNonce", but the error that Let's Encrypt
returns is "urn:acme:error:badNonce" so we just check the suffix.

Fixes golang/go#19703

Change-Id: Id15012dff91e51d28ed8bc54f13a6212186cb7df
@golang golang locked and limited conversation to collaborators Apr 13, 2018
c-expert-zigbee pushed a commit to c-expert-zigbee/crypto_go that referenced this issue Mar 28, 2022
After receiving a badNonce error, the call can be safely retried. Nonce
errors can happen unexpectedly based on an unknown expiration date or
server-side changes. Rather than force the caller handle these errors,
retryPostJWS will keep retrying until success or a different error.

According to the spec, the error returned should be
"urn:ietf:params:acme:error:badNonce", but the error that Let's Encrypt
returns is "urn:acme:error:badNonce" so we just check the suffix.

Fixes golang/go#19703

Change-Id: Id15012dff91e51d28ed8bc54f13a6212186cb7df
Reviewed-on: https://go-review.googlesource.com/40130
Run-TryBot: Alex Vaghin <ddos@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Alex Vaghin <ddos@google.com>
c-expert-zigbee pushed a commit to c-expert-zigbee/crypto_go that referenced this issue Mar 29, 2022
After receiving a badNonce error, the call can be safely retried. Nonce
errors can happen unexpectedly based on an unknown expiration date or
server-side changes. Rather than force the caller handle these errors,
retryPostJWS will keep retrying until success or a different error.

According to the spec, the error returned should be
"urn:ietf:params:acme:error:badNonce", but the error that Let's Encrypt
returns is "urn:acme:error:badNonce" so we just check the suffix.

Fixes golang/go#19703

Change-Id: Id15012dff91e51d28ed8bc54f13a6212186cb7df
Reviewed-on: https://go-review.googlesource.com/40130
Run-TryBot: Alex Vaghin <ddos@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Alex Vaghin <ddos@google.com>
LewiGoddard pushed a commit to LewiGoddard/crypto that referenced this issue Feb 16, 2023
After receiving a badNonce error, the call can be safely retried. Nonce
errors can happen unexpectedly based on an unknown expiration date or
server-side changes. Rather than force the caller handle these errors,
retryPostJWS will keep retrying until success or a different error.

According to the spec, the error returned should be
"urn:ietf:params:acme:error:badNonce", but the error that Let's Encrypt
returns is "urn:acme:error:badNonce" so we just check the suffix.

Fixes golang/go#19703

Change-Id: Id15012dff91e51d28ed8bc54f13a6212186cb7df
Reviewed-on: https://go-review.googlesource.com/40130
Run-TryBot: Alex Vaghin <ddos@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Alex Vaghin <ddos@google.com>
BiiChris pushed a commit to BiiChris/crypto that referenced this issue Sep 15, 2023
After receiving a badNonce error, the call can be safely retried. Nonce
errors can happen unexpectedly based on an unknown expiration date or
server-side changes. Rather than force the caller handle these errors,
retryPostJWS will keep retrying until success or a different error.

According to the spec, the error returned should be
"urn:ietf:params:acme:error:badNonce", but the error that Let's Encrypt
returns is "urn:acme:error:badNonce" so we just check the suffix.

Fixes golang/go#19703

Change-Id: Id15012dff91e51d28ed8bc54f13a6212186cb7df
Reviewed-on: https://go-review.googlesource.com/40130
Run-TryBot: Alex Vaghin <ddos@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Alex Vaghin <ddos@google.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants