Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/crypto/acme/autocert: add Manager.StapleOCSP #51064

Open
mcrute opened this issue Feb 8, 2022 · 22 comments
Open

x/crypto/acme/autocert: add Manager.StapleOCSP #51064

mcrute opened this issue Feb 8, 2022 · 22 comments
Assignees
Labels
FeatureRequest Proposal Proposal-Accepted Proposal-Crypto Proposal related to crypto packages or other security issues
Milestone

Comments

@mcrute
Copy link

mcrute commented Feb 8, 2022

OSCP stapling is considered a good practice to reduce load on OCSP servers and also to eliminate a round-trip to the CA when a user-agent verifies the certificate. autocert.Manager does not currently staple OCSP responses and it should. I propose that autocert.Manager be updated to staple the OCSP response when the certificate is generated and then treat OCSP expiration as a case of renewal in the domainRenewal logic.

Stapling would likely need to happen in two different places:

  • autocert.Manager.cert which is the cache fetch path for autocert.Manager.GetCertificate. The code should load the certificate, check the expiration of the OCSP staple and in the absence of a staple or if the staple is expired it should attempt to fetch an OCSP response and staple it. On this path there should be a rather short deadline (perhaps 10 seconds) that, upon expiration, returns the certificate without an OCSP staple to prevent OCSP outages from unduly impacting users. The background renewal logic will eventually fix missing staples.
  • autocert.Manager.authorizeCert which is in the cache-miss path for autocert.Manager.GetCertificate. This code should continue to fetch a certificate from the CA and upon success also make an OCSP request and staple that response to the certificate. This is also used by autocert.domainRenewal.do which would then inherit the OCSP stapling logic.

WIthin the domain renewal logic the follow updates would be needed:

  • autocert.domainRenewal.renew performs the renewal of a certificate in the background of an autocert.Manager. This would need to instead check the both the OCSP staple expiration and the certificate expiration and renew whichever one was about to expire.
  • autocert.domainRenewal.next performs the schedule calculation for the renewal loop that runs as a goroutine for each certificate. This would need to be updated to consider upcoming OCSP staple expiration as a reason to trigger a renewal run.

It would also be possible to do OCSP stapling as a separate set of goroutines within autocert.Manager but that seems unnecessary given that the existing renewal logic seems easy to adapt to also handling this use-case.

Additionally, if this proposal is accepted I will add support for fetching OCSP responses and stapling them to a certificate to x/crypto/ocsp so that the functionality can be shared outside of autocert.

If there is consensus that this feature would be accepted and the approach is valid then I will implement this and create a CL.

@mcrute mcrute added the Proposal label Feb 8, 2022
@gopherbot gopherbot added this to the Proposal milestone Feb 8, 2022
@seankhliao seankhliao added Proposal-Crypto Proposal related to crypto packages or other security issues FeatureRequest labels Feb 8, 2022
@seankhliao
Copy link
Member

cc @golang/security

@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) Feb 16, 2022
@rsc
Copy link
Contributor

rsc commented Feb 16, 2022

cc @rolandshoemaker

@rsc
Copy link
Contributor

rsc commented Feb 16, 2022

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc moved this from Incoming to Active in Proposals (old) Feb 16, 2022
@rolandshoemaker
Copy link
Member

This has significant cross-over with #40017 (and #22274), which is on-hold and still doesn't have a fully scoped proposal. Since both implementations are likely to share a significant amount of functionality, I think it makes more sense to focus on the crypto/x509 functionality, with an eye to how it can be extended more generally to a TLS server (vs. a verifying client).

Ideally this should probably be mostly implemented at the crypto/tls level, with the only real change for autocert being adding a bool to Manager which sets something in the tls.Config that enables fetching/stapling.

@mcrute
Copy link
Author

mcrute commented Feb 28, 2022

I don't see the overlap with #40017, as I read that it's primarily about client verification of OCSP and OCSP staples. This proposal is approaching it from the other side and would require a client to fetch an OCSP response but not verify it in the way a client would. I see some overlap in the solution space for #22274 and this issue though as both will require an OCSP client.

How would you feel about tackling these problems independently? In stage one autocert.Manager can learn a StapleOCSP flag which will do the stapling within autocert. This will necessitate the building of an OCSP client that can at least decode a certificate, fetch a response from the CA, and validate that the response is good before stapling to the cert. It would likely also require a background goroutine to watch the manager state and refresh staples as they're coming close to expiration (doing this out-of-band would have the least impact on end-users of a server using autocert in the case of a CA being slow or suffering an outage).

In stage two we can look into a solution for #22274 which would allow us to request the Must-Staple extension from the CA when we request certificates and then build out OCSP fetching within crypto/tls (using the client we built in stage one as the foundation for that fetching logic). In the conclusion of this phase we can drop our local implementation in favor of the crypto/tls implementation.

I expect phase two will take longer than phase one so this gets the functionality into the hands of autocert users quickly and gives us a test-case for OCSP stapling in the standard library that can inform our decisions about crypto/tls changes. Aside from the flag on autocert.Manager the implementation should be entirely invisible to users so we don't have to make any breaking API changes when we later migrate to a platform-level solution in crypto/tls.

What do you think?

@rsc
Copy link
Contributor

rsc commented Mar 9, 2022

Since #40017 and #22274 are on hold, and since @rolandshoemaker said we should wait to address those first, it sounds like we should put this on hold too. Does anyone object?

@rolandshoemaker
Copy link
Member

I'm working on a plan to implement incrementally OCSP stapling, which will need to happen before there is a specific API change for x/crypto/acme/autocert, but I think in general the API proposal here is simply adding a bool to Manager which would enable the behavior, which I'm not opposed to.

@rolandshoemaker
Copy link
Member

@mcrute Looking at the path to getting stapling functionality everywhere, I think this is a reasonable place to start. In general I think we just need a tls.Config.GetCertificate function, which the manager can create, which spawns a goroutine that keeps the response up to date.

Since we're going to want to provide this functionality more broadly down the road though I think this should be written with an eye to being relatively generic, such that it can later be moved into the standard library (in particular the OCSP client logic should probably be separate from the stapling management logic, so that it could feasibly be reused for fetching responses for other purposes).

@mcrute
Copy link
Author

mcrute commented Mar 9, 2022

@rolandshoemaker based on my previous comment would it be worthwhile for me to create a PR that we can discuss more concretely for the manager work?

Agreed on the OCSP client point. Was thinking about introducing a Client struct to x/crypto/ocsp with a long term goal of that making it into the standard library to support your vision of stapling throughout the library. I've already written code to this end in my local workspace and could post that for review if you think it's worthwhile.

@rsc
Copy link
Contributor

rsc commented Mar 16, 2022

@rolandshoemaker It sounds like you have an idea about what API should be added to package autocert. What is that API precisely? Thanks.

@rolandshoemaker
Copy link
Member

@rsc I think the following API diff for autocert would be reasonable

type Manager struct {
    ...

    // EnableOCSPStapling enables OCSP stapling for each certificate it obtains, returning
    // a fresh OCSP response from the GetCertificate method.
    EnableOCSPStapling bool
}

@mcrute that seems like a reasonable approach, if you'd like to send a CL tag me as a reviewer and we can work through what it should look like.

@AGWA
Copy link

AGWA commented Mar 16, 2022

I don't see the overlap with #40017, as I read that it's primarily about client verification of OCSP and OCSP staples. This proposal is approaching it from the other side and would require a client to fetch an OCSP response but not verify it in the way a client would.

Unfortunately, OCSP servers operated by publicly-trusted CAs exhibit a variety of pathological behaviors and are not guaranteed to return valid responses. Therefore, TLS servers need to verify OCSP responses before stapling them, to avoid causing verification errors in clients. https://gist.github.com/sleevi/5efe9ef98961ecfb4da8 is a good writeup that describes what a robust OCSP stapling implementation needs to do.

@mcrute
Copy link
Author

mcrute commented Mar 17, 2022

@rolandshoemaker I'll start cleaning up my local patches and get them into a CL for discussion. Will probably take about a week because of some other obligations I need to attend to this week.

@mcrute
Copy link
Author

mcrute commented Mar 17, 2022

@AGWA thanks for that link, I've excerpted it below because I think there are bits worth talking about in considering this design.

  1. Support for keeping a long-lived (disk) cache of OCSP responses.

In the case of autocert.Manager the staples would be stored in application memory but they could be cached to disk. I'm not sure I see the value of caching to disk for long running applications, though I have no specific opposition to this since Manager already supports a disk cache by default.

  1. Validate the server responses to make sure it is something the client will accept.

The correct way to validate this is laid out in RFC 6960 3.2, which is how my local patches currently work.

  1. Refreshes the response, in the background, with sufficient time before expiration.

My proposal, at least in autocert.Manager, is to handle this as a sub-function of the renewal timer system.

  1. That said, even with background refreshing, such a system should observe the Lightweight OCSP Profile of RFC 5019.

Not something I had considered but I will work this into my client.

  1. As with any system doing background requests on a remote server, don't be a jerk and hammer the server when things are bad.

Yes, exponential back-off and retry with jitter is what I'd consider the best practice here. There's already prior art for this in autocert.Manager so just adapting that to work for our OCSP case should be pretty straightforward.

  1. Distributed or proxiable fetching

A valid point but I think this is an exercise for the user in the case of autocert.Manager integration. Building a distributed cache is rather application specific.

  1. The ability to serve old responses while fetching new responses.

That is, it shouldn't be mutually exclusive - it's not that there is the 'ONE TRUE RESPONSE' - some flexibility for multiple responses is needed.

What is the correct behavior here? RFC 6960 3.2 is not clear on what expiration means and what is the appropriate behavior in the case of expiration and makes weak recommendations such as "sufficiently recent" (where "sufficiently recent" is not specified).

Firefox defines "sufficiently recent" as within 24 hours of the time in the response.

I guess clients could use an expired OCSP response as a hint that the cert is valid but that strikes me as client-by-client behavior. I don't see any reason to staple a response more than 24-hours expired.

@rolandshoemaker WDYT?

  1. Some idea of what to do when "things go bad".

Knowing that something has gone wrong in stapling is the first step to handling the case "when things go bad", which is going to be pretty system and operator specific. Generally the way I handle this in systems is for the Manager equivalent to accept a chan error that is read by a error reporting goroutine which generates metrics for a monitoring system. This isn't a common pattern within the standard library though so I'm interested in what a good solution would be here. My local patch set just discards errors.

  1. Configurable OCSP responder per-certificate-being-checked.

This may be necessary for a general purpose OCSP stapling solution, but I don't think it's all that useful for autocert.Manager integration since there's only one CA and it does embed the responder URL in the certificates.

  1. Staple by default.

This will be configurable in the autocert.Manager API.

@rolandshoemaker
Copy link
Member

Will write up more thoughts tomorrow, but see https://github.com/rolandshoemaker/stapled/blob/master/ocsp/ocsp.go for a partial implementation of a fetcher/verifier that takes into account the post written by Sleevi.

@rsc
Copy link
Contributor

rsc commented Mar 23, 2022

How important is OCSP stapling really, given the considerations in https://blog.apnic.net/2022/03/22/whats-going-on-with-certificate-revocation/ ?

@rolandshoemaker
Copy link
Member

The value of OCSP in general is debatable, but since both Firefox, and (basically) every browser on macOS and iOS, still do OCSP checking, stapling allows clients to avoid making at least one extra request, which in some cases may be considered valuable. Additionally there is a privacy win (although in my opinion somewhat minor) from stapling, since you no longer need to query (in plaintext) an OCSP server revealing the serial number of the certificate you are verifying.

@rsc
Copy link
Contributor

rsc commented Mar 30, 2022

OK, it sounds like people think we should do this, and it sounds like the API is just one extra bool in the Manager struct. Above, Roland wrote:

type Manager struct {
    ...

    // EnableOCSPStapling enables OCSP stapling for each certificate it obtains, returning
    // a fresh OCSP response from the GetCertificate method.
    EnableOCSPStapling bool
}

I might suggest using 'StapleOCSP' instead, to match the title of this proposal and generally be more direct.

Otherwise, does anyone object to adding this?

@rsc
Copy link
Contributor

rsc commented Apr 6, 2022

Let's make the new field StapleOCSP bool. Sounds like people are in favor.

@rsc rsc changed the title proposal: x/crypto/acme/autocert: Staple OCSP proposal: x/crypto/acme/autocert: add Manager.StapleOCSP Apr 6, 2022
@rolandshoemaker
Copy link
Member

Sounds good to me.

@rsc
Copy link
Contributor

rsc commented Apr 13, 2022

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@rsc rsc moved this from Active to Likely Accept in Proposals (old) Apr 13, 2022
@rsc rsc moved this from Likely Accept to Accepted in Proposals (old) May 4, 2022
@rsc
Copy link
Contributor

rsc commented May 4, 2022

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: x/crypto/acme/autocert: add Manager.StapleOCSP x/crypto/acme/autocert: add Manager.StapleOCSP May 4, 2022
@rsc rsc modified the milestones: Proposal, Backlog May 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
FeatureRequest Proposal Proposal-Accepted Proposal-Crypto Proposal related to crypto packages or other security issues
Projects
Status: Accepted
Development

No branches or pull requests

6 participants