proposal: x/crypto/acme/autocert cluster compatibility #42361

mvasi90 · 2020-11-03T17:55:42Z

I'm planning to adapt x/crypto/acme/autocert to work in a cluster with multiple nodes.

Known issues of current autocert

Create a new certificate for each subdomain
This forces you to use an external client to request certificates for other subdomains (like: mail. smtp. imap. etc.)

The SAN field of x509.v3 is needed, in order to avoid external clients and avoid rate limit. (Wildcards aren't a desirable option).
Cache stored in file
This doesn't work in a cluster with multiple nodes. Each node has its own storage.
Decentralized business logic
In a cluster, a node requests the renew of a certificate and the CA server requests the domain verification to another node (internal dynamic DNS, floating IPs, etc.). This doesn't work either.

My approach:

Use the existent database (encrypted on disk and TLS connection: slaves, clients, master) and store all needed credentials, not only the cache. (Certificates, keys, etc.)

I have the following code for testing

func main() {
        routerTest := createRouter()
	hs := make(hosts)
	hs["test.com"] = routerTest
	hs["www.test.com"] = routerTest
	hs["sub1.test.com"] = routerTest
	m := &autocert.Manager{
		//Cache:  TODO
		Email:      "my@email.com",
		Prompt:     autocert.AcceptTOS,
		HostPolicy: autocert.HostWhitelist("test.com", "www.test.com", "sub1.test.com"),
	}
	tlsConfig := m.TLSConfig()
	tlsConfig.PreferServerCipherSuites = true
	tlsConfig.MinVersion = tls.VersionTLS12
	s := &http.Server{
		Handler:        hs,
		Addr:           ":443",
		TLSConfig:      tlsConfig,
		ReadTimeout:    10 * time.Second,
		WriteTimeout:   15 * time.Second,
		MaxHeaderBytes: 1 << 20,
		IdleTimeout:    time.Minute,
	}

	go http.ListenAndServe(":80", m.HTTPHandler(nil))
	if err := s.ListenAndServeTLS("", ""); err != http.ErrServerClosed {
		fmt.Println("tls:", err)
		os.Exit(1) // can be replaced by log.Fatal(), but this is a service (systemd) and the date is displayed twice.
	}
}

What happens now?

Every time a client connects, (in order to get the corresponding certificate) ClientHelloInfo is sent to

func (m *Manager) GetCertificate(hello *tls.ClientHelloInfo) (*tls.Certificate, error)

The next step is to check if the client supprts the ALPN protocol (if true, it uses the stored certificate. But it is secure? Any client can set ALPN as supported protocol and obtain information, while the authorization flow is active):

...
if wantsTokenCert(hello) {
	m.challengeMu.RLock()
	defer m.challengeMu.RUnlock()
	if cert := m.certTokens[name]; cert != nil {
		return cert, nil
	}
	if cert, err := m.cacheGet(ctx, certKey{domain: name, isToken: true}); err == nil {
		return cert, nil
	}
	// TODO: cache error results?
	return nil, fmt.Errorf("acme/autocert: no token cert for %q", name)
}
...

If the client does not support ALPN, then it is a regular domain. (What is a regular domain? it Is expected to be any client other than the CA Server?)

...
// regular domain
ck := certKey{
	domain: strings.TrimSuffix(name, "."), // golang.org/issue/18114
	isRSA:  !supportsECDSA(hello),
}
cert, err := m.cert(ctx, ck)
if err == nil {
	return cert, nil
}
if err != ErrCacheMiss {
	return nil, err
}

// first-time
if err := m.hostPolicy()(ctx, name); err != nil {
	return nil, err
}
cert, err = m.createCert(ctx, ck)
if err != nil {
	return nil, err
}
m.cachePut(ctx, ck, cert)
return cert, nil

My proposal (and my current implementation) is:

Create a CacheDB (similar to CacheDir)
Store cache information in the database (centralized for all nodes). This is easy.
Store more data in the database
Any additional data is needed between nodes?
- certState ?
- httpTokens ?
- certTokens ?
- ...

The main improvement will be the use of the SAN field to add additional subdomains:

Split domain name (from ClientHello) and remove the subdomain (if any). If the domain exists in the whitelist, then add all related subdomains (if any) when generating the CSR.
Store the certificate and key into the database and use it in all nodes.

The text was updated successfully, but these errors were encountered:

toothrot · 2020-11-03T18:11:13Z

I'm not sure if DbCache is useful enough on its own, or if people will end up writing their own implementation using the Cache interface regardless, as I assume the nuances of each application's database are too specific to generalize in this package.

To get around thundering-herd scenarios of cache expiration, I've implemented probabilistic early expiration in a GCS cache in the past with the current API, which is helpful in scenarios with multiple frontends sharing a certificate.

To me, it feels like this proposal is doing too much, and that solving grouped CSR subdomain requests should be separate from any discussion over whether a new DbCache interface is added.

/cc @bradfitz @x1ddos

mvasi90 · 2020-11-03T18:43:44Z

I'm not sure if DbCache is useful enough on its own, or if people will end up writing their own implementation using the Cache interface regardless, as I assume the nuances of each application's database are too specific to generalize in this package.

Yes, you are right, I thought of that. I mention it as my own implementation according to my current configuration. I know that it cannot be generalized, it is not something generic, each one has its own system with or without a database.

Using DNS verification (txt record) is not a solution for everyone. Not all providers offer APIs to manipulate dns records.
Using synchronized storage between nodes wouldn't be a solution either (it is neither efficient nor stable when any node goes down).

To get around thundering-herd scenarios of cache expiration, I've implemented probabilistic early expiration in a GCS cache in the past with the current API, which is helpful in scenarios with multiple frontends sharing a certificate.

This is not a problem for me. I can synchronize the certificates on all nodes (pub/sub) or just use a 24h timer task to check if the certificate is updated (or being updated. The first node will update it, for example using priority values).

To me, it feels like this proposal is doing too much, and that solving grouped CSR subdomain requests should be separate from any discussion over whether a new DbCache interface is added.

The DbCache is the less important for me. It is easy to implement one in any storage model.
Leaving the certificates to be cached (leaving the cache itself), I don't know what other data should be shared between the nodes. autocert has data and acme also has data which is stored in the internal memory of the process (in golang). For instance: Client.kid (keyID) is stored in memory but not in the cache.

rsc · 2020-11-18T18:57:08Z

Talked with @bradfitz a bit about this.
It sounds like we could add a Manager.ChallengeCache field of type Cache?
But it also sounds like we could just start using the existing Manager.Cache field?

Can anyone think of problems with just reusing the existing cache for holding challenges?

mvasi90 · 2020-12-05T16:14:25Z

Today I'm looking at the cache implementation and design. It is poorly designed.

The Cache interface must not be inside DirCache.
DirCache should be named dircache.go instead of cache.go
autocert.go has several wrappers like cacheGet() and cachePut() between it and the cache, but it already prepares the key for DirCache using the suffix "+token", "+http-01" and "+rsa". This must be done within each cache implementation. In other words, cache functions receive parameters and each implementation of the cache (DirCache, DBCache,etc) treats them however it wants. For instance: DirCache prepare the file suffix "+token" and the DBCache set the column token to 1.

For this reason, any modification of the cache interface in autocert "will cause the wheel to accumulate so many patches until it becomes oval".
In my opinion, autocert should be rewritten from scratch, but I'm not going to do it because I'm too busy.

Instead, I'm going to organize the code a bit by separating the cache to be generic and then implementing the missing functions to make it compatible with other future cache implementations. This is frustrating because autocert should already be designed that way. The cache is an interface.
I don't want to be cruel but there should be a filter to select developers and not allow anyone to put their fingers.

Over time you end up wasting more time solving problems than developing libraries from scratch.

ianlancetaylor · 2020-12-06T00:54:00Z

@mvasi90 Thanks, but please do not put patches in GitHub issues. Please put them into Gerrit or GitHub pull requests instead, as described at https://golang.org/doc/contribute.html. That will check that the code has the necessary copyright agreement. Thanks.

mvasi90 · 2020-12-06T01:14:01Z

@ianlancetaylor, sorry for the disturbances.

This has been a guide, the less personal data the third-party algorithms record, the better. The less they know about me, the better.

Here my collaboration ends.

ianlancetaylor · 2020-12-06T01:57:01Z

If you are unable to sign a copyright license agreement, then unfortunately we are unable accept your contribution. Closing this issue.

toothrot changed the title ~~[Proposal] x/crypto/acme/autocert cluster compatibility~~ Proposal: x/crypto/acme/autocert cluster compatibility Nov 3, 2020

gopherbot added this to the Proposal milestone Nov 3, 2020

gopherbot added the Proposal label Nov 3, 2020

ianlancetaylor added the Proposal-Crypto label Nov 11, 2020

rsc changed the title ~~Proposal: x/crypto/acme/autocert cluster compatibility~~ proposal: x/crypto/acme/autocert cluster compatibility Nov 18, 2020

rsc mentioned this issue Nov 19, 2020

proposal: review meeting minutes #33502

Open

ianlancetaylor closed this as completed Dec 6, 2020

golang locked and limited conversation to collaborators Dec 6, 2021

gopherbot added the FrozenDueToAge label Dec 6, 2021

rsc moved this to Declined in Proposals Aug 10, 2022

rsc added this to Proposals Aug 10, 2022

rsc removed this from Proposals Oct 19, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

proposal: x/crypto/acme/autocert cluster compatibility #42361

proposal: x/crypto/acme/autocert cluster compatibility #42361

mvasi90 commented Nov 3, 2020 •

edited

Loading

toothrot commented Nov 3, 2020

mvasi90 commented Nov 3, 2020

rsc commented Nov 18, 2020

mvasi90 commented Dec 5, 2020

ianlancetaylor commented Dec 6, 2020

mvasi90 commented Dec 6, 2020

ianlancetaylor commented Dec 6, 2020

proposal: x/crypto/acme/autocert cluster compatibility #42361

proposal: x/crypto/acme/autocert cluster compatibility #42361

Comments

mvasi90 commented Nov 3, 2020 • edited Loading

toothrot commented Nov 3, 2020

mvasi90 commented Nov 3, 2020

rsc commented Nov 18, 2020

mvasi90 commented Dec 5, 2020

ianlancetaylor commented Dec 6, 2020

mvasi90 commented Dec 6, 2020

ianlancetaylor commented Dec 6, 2020

mvasi90 commented Nov 3, 2020 •

edited

Loading