Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/crypto/acme: CreateOrderCert returning Order's status ('valid') is not acceptable for finalization #42278

Open
jameshartig opened this issue Oct 29, 2020 · 8 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@jameshartig
Copy link
Contributor

jameshartig commented Oct 29, 2020

What version of Go are you using (go version)?

$ go version
go version go1.15.2 linux/amd64

Does this issue reproduce with the latest release?

The build that is seeing the error is using:

golang.org/x/crypto (git) - 06a226f Use boolean tag (0x01) for boolean type.

and I don't see any fixes related to acme since that.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/toaster/.cache/go-build"
GOENV="/home/toaster/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/opt/gopath/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/opt/gopath"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build034489393=/tmp/go-build -gno-record-gcc-switches"

What did you do?

After calling WaitOrder for the order to be ready and authorizations to succeed, we call CreateOrderCert with the order's FinalizeURL and a freshly created CSR. We only call CreateOrderCert once.

What did you expect to see?

I expect it to ignore this error when calling the finalize URL and continue to fetch the certificate since the order is Valid.

What did you see instead?

CreateOrderCert is returning:

403 urn:ietf:params:acme:error:orderNotReady: Order's status ('valid') is not acceptable for finalization

I imagine this is because of some retry logic that is not accounting for the fact that it's already valid. This just started happening today, I'm not sure if something changed on Let's Encrypt's end either.

@gopherbot gopherbot added this to the Unreleased milestone Oct 29, 2020
@jameshartig
Copy link
Contributor Author

jameshartig commented Oct 29, 2020

The problem with ignoring this error with something like:

if e, ok := err.(*Error); ok && e.ProblemType == "urn:ietf:params:acme:error:orderNotReady" {
	// ignore error if valid
}

is that we don't have the order to fetch the certificate. CreateOrderCert doesn't have the order's URI to re-fetch the order's status, so should this be up to the caller do handle this?

@dmitshur dmitshur added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Oct 30, 2020
@dmitshur
Copy link
Contributor

CC @rolandshoemaker, @x1ddos, @FiloSottile per owners.

@rolandshoemaker
Copy link
Member

Let's Encrypt will only set the status of an order to 'valid' if the certificate has already been issued. It's plausible if you are doing parallel validations of identical name sets you may be running into orders that are being reused at the server side. Are you checking if Order.CertURL is not empty if WaitOrder returns an order with a 'valid' status?

@rolandshoemaker
Copy link
Member

Oh sorry I misread the initial issue. This seems like expected behavior, if WaitOrder returns a valid order you should be using FetchCert rather than CreateOrderCert.

@jameshartig
Copy link
Contributor Author

jameshartig commented Nov 2, 2020

@rolandshoemaker I'm not doing parallel validations or fetches. I'm doing validations and then calling CreateOrderCert. Before calling CreateOrderCert the order is not valid. My current theory is that the issue is CreateOrderCert is internally retrying the authorization call but not realizing that the "error" ("Order's status ('valid') is not acceptable for finalization") on the second invocation is an okay response.

Also, to clarify, this seems like an edge case, we've been using the same code in production for close to a year and just ran into this problem last week a few times and it has since stopped happening.

@rolandshoemaker
Copy link
Member

CreateOrderCert doesn't do anything fancy, all it does is POST the finalization request, poll for the order status to change to 'valid', and fetch the resulting certificate. You will only get the 'wrong status' error if a. you've already called CreateOrderCert for the finalization URL before, or b. there is an issue on Let's Encrypt's end in computing order statuses.

Note that if you are creating two orders with the same set of names Let's Encrypt will reuse the order which would cause this exact problem (this is regardless of the CSR you use in CreateOrderCert). For example if you try to create two orders for ["example.com", "b.example.com"] before calling CreateOrderCert on the first order then Let's Encrypt will return the same order object, so you'll only be able to call CreateOrderCert once, the second time will return the error you are seeing as the order will have already been finalized.

@jameshartig
Copy link
Contributor Author

CreateOrderCert doesn't do anything fancy, all it does is POST the finalization request, poll for the order status to change to 'valid', and fetch the resulting certificate.

I see inside of the POST for the finalization request that it has some retry logic. Is there any way I can debug what happened in there for future occurrences?

@rolandshoemaker
Copy link
Member

Oh hrm, I guess another possible explanation here is that LE is timing out internally, causing a 500 status code to be returned to the client but still actually completing the issuance. Since we receive a 500 we retry the call, resulting in the unexpected status error because the finalization actually happened despite us never seeing it happen.

I'm still not sure adding a case to ignore the error and refetch the order is the correct solution to this problem, since it introduces a somewhat complex codepath that relies on matching against a textual error that is specific to Let's Encrypt's implementation. I think I'm still of the opinion this is better handled in user applications.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

4 participants