proposal: do something smart with virtual machine forks & crypto #52544

zx2c4 · 2022-04-25T14:23:31Z

When virtual machines are snapshotted and restored, the RNG usually returns the same values, and existing session keys of various protocols usually stay the same, leading to nonce reuse and sometimes catastrophic cryptographic failure.

In Linux 5.18, I wrote a driver for Microsoft's "vmgenid" virtual device, which is implemented by Hyper-V, QEMU, VMware, and probably Firecracker soon. This reseeds the RNG using some hypervisor supplied unique value, and then notifies in-kernel subscribers about the VM fork event. WireGuard is one such (actually, at the moment, the only) subscriber, and it uses this event to invalidate outgoing session keys to prevent nonce reuse.

For Linux 5.19, I intend to bring those notifications to userspace, by way of a sysctl poll notifier. There might be future magic memory mapped things to get rid of the races inherent, but that's a ways off and uncertain at best, and adding a notifier like this is pretty simple and short in terms of code.

From a userspace perspective, in C, it looks like this:

  struct pollfd fd = { .fd = open("/proc/sys/kernel/random/fork_event", O_RDONLY)  };
  assert(fd.fd >= 0);
  for (;;) {
    assert(poll(&fd, 1, -1) > 0);
    puts("vm fork detected");
  }

Open a file. Poll on it. When poll returns, the VM has forked/snapshotted/resumed/etc.

Since this is a new userspace API, I wanted to make sure that there are paths forward for it in different environments. For little C programs, it's pretty obvious what to do and that it can work, but stepping through how this might work out with a complex runtime like Go seems like it'd be a prudent thing to do.

So this proposal is at the moment rather open ended: how can we use this functionality in Go, and do we even want this functionality in Go?

Some preliminary thoughts:

With regards to crypto/rand, Go no longer buffers, as of 3ae414c, so there's nothing for us to do there.
crypto/tls uses long-lived session keys in CTR mode (AES-GCM, ChaCha20Poly1305, etc), which among other things, should probably be zeroed out on VM fork events.
Other things, such as wireguard-go, don't live in the standard library but would certainly benefit from having this. Do we want to expose an API for it (in the standard library? in x/sys/unix? in x/crypto?), or tell external things to poll on that file themselves?
In general, the risk is probably worse for UDP-based things like WireGuard or DTLS than for TCP-based things like TLS, which will hopefully time out before making the problem too bad.

Focused more on the kernel side of this, I am interested in learning how this discussion pans out from you all regarding Go. There's currently about 4 weeks til the 5.19 merge window opens, so the more Go considerations we can have before then the better.

CC @bradfitz @ianlancetaylor @tklauser @FiloSottile @julieqiu @diagprov @josharian @mdlayher @rsc

The text was updated successfully, but these errors were encountered:

mdlayher · 2022-04-25T15:49:12Z

crypto/tls uses long-lived session keys in CTR mode (AES-GCM, ChaCha20Poly1305, etc), which among other things, should probably be zeroed out on VM fork events.

It seems like it'd make sense for the runtime to expose a hook for these sorts of events (iff GOOS=linux && 5.19+, though it'll be a while before most folks can use it)

Other things, such as wireguard-go, don't live in the standard library but would certainly benefit from having this. Do we want to expose an API for it (in the standard library? in x/sys/unix? in x/crypto?), or tell external things to poll on that file themselves?

It's probably reasonable to expose it in a third party package, I'm not even sure if x/sys makes sense since it's just a combination of opening a file descriptor and polling which is easily done with existing primitives.

I'm excited to see this work happening and am in favor of doing smart things in Go. It'll be easy to do for external code, but I suspect more concrete proposals would be requested for changes to crypto/tls and such.

ianlancetaylor · 2022-04-25T23:12:37Z

CC @golang/security

ianlancetaylor · 2022-04-25T23:15:32Z

Sorry if this is a foolish question, but are there timing issues here? If you suspend the VM just before the program fetches some random numbers, and then restart the VM, is anything going to prevent it from fetching those numbers, and getting duplicate values for each restart, before it gets around to polling the descriptor?

zx2c4 · 2022-04-25T23:31:10Z

Not a foolish question at all. Yes, the whole mechanism is racy, from the core of it on up. The notification begins as an ACPI interrupt which is sent after the system is already restored. So it is, at most, a best effort sort of mitigating thing, but not bulletproof.

Currently the vmgenid device Microsoft specified exposes a 16 byte ID. I would like for them to augment the virtual device to additionally expose a 4 byte counter, so that it can be mmap'd all the way through to userspace, making reading it after every crypto operation (but before transmission) an inexpensive operation. But that doesn't exist now, and it'll be long ways off if it does ever happen, and having to bracket all protocol code with those sorts of checks is way more invasive than responding to an async notification. So unfortunately perfect might be the enemy of not-terrible, so for now this is what we've got.

But hey, 5 minutes of nonce reuse is a lot worse than a few nanoseconds or whatever the propagation latency is.

zx2c4 · 2022-04-25T23:36:16Z

It seems like it'd make sense for the runtime to expose a hook for these sorts of events (iff GOOS=linux && 5.19+, though it'll be a while before most folks can use it)

Practically speaking, /proc/sys/kernel/random/fork_event may or may not exist, not just depending on the kernel version but depending on whether CONFIG_VMGENID is enabled. So probably the way such a notifier would be structured would be that everyone can use the API it exposes, and then various OS-specific backends fill in the implementation.

There's the sysctl file in Linux. Windows exposes an NT object path to userspace that does something similar, whose API I can probably wrangle out of the driver. FreeBSD has a vmgenid driver, but I don't know whether it's exposed to userspace (yet).

ianlancetaylor · 2022-04-25T23:43:20Z

One can imagine something along the lines of

var epoch int32

var startCheckEpoch sync.Once

func checkEpoch() {
    f, err := os.Open("/proc/sys/kernel/random/fork_event")
    if err != nil {
        return // Nothing we ca do.
    }
    for {
        var b [1]byte
        n, err := f.Read(b[:])
        // Check for relevant errors
        atomic.AddInt32(&epoch, 1)
    }
}

func Epoch(int) {
    startCheckEpoch.Do(func() {
        go checkEpoch()
    })
    return int(atomic.LoadInt32(&epoch)
}

Then randomization code can check whether Epoch() has changed. For extra efficiecy split out Epoch and the starting of checkEpoch. This wouldn't need to be in the Go runtime at all. But it does assume that the kernel doesn't just make the descriptor pollable, it actually writes a byte to the descriptor when something has happened.

zx2c4 · 2022-04-25T23:56:19Z

Ahhh, so you've implemented the "check everytime" pattern on top of the "async notifier" pattern. The more natural way, I was thinking, would be to register a callback or a channel read/write like we do for unix signals, so apps can respond to the event. For example, https://git.zx2c4.com/wireguard-linux/tree/drivers/net/wireguard/device.c#n94 wg_vm_notification gets called on a notification event, and then all the session keys get invalidated for sending.

Regarding your mention about read() instead of poll(), it currently doesn't work that way. I could maybe try to retrofit that on top of the sysctl poll api, but right now what the kernel does is just trigger a single event per fd that gets consumed by a call to poll(). It was originally designed to poll on domainname/hostname changes, so subsequent reads will just show whatever the current domainname/hostname is. In that sense, poll() isn't indicating that the file is readable, since it always is for domainname/hostname, but just that it has changed. To that end, it doesn't use POLLOUT but actually POLLERR. Since fork_event doesn't have anything to read ever, reads just return -ENODATA.

This POLLERR thing might have interesting effects to consider with netpoll. If you're curious to play, /proc/sys/kernel/hostname is the hostname notifier that exists in kernels now, which has the same semantics as fork_event (aside from the "ianslaptop" vs -ENODATA difference).

rsc · 2022-05-04T17:52:58Z

It sounds like the main thing being proposed is a way to watch the notifier file, and that there are no changes in the standard library itself? You mentioned TLS but I have a hard time understanding what it means to fork one end of a TCP connection.

What kind of API are you looking for? Would it be something like os/signal.Notify? Or maybe we could make up a new fake signal for that API? What is the name for this?

Calling a single suspend+resume a "fork" (when there's still just one VM) is a bit confusing, for what that's worth. It sounds like the signal is really on VM restart or VM resume or something like that.

Other people are interested in sleep+wakeup events on laptops (well, wakeup). Does it make sense to solve that problem and piggy back this onto it?

What is the threat model exactly? The attacker can make copies of the VM? Or the VM is copied in the normal course of operation and just leaks accidentally?

rsc · 2022-05-04T18:06:37Z

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

zx2c4 · 2022-05-04T20:06:24Z

It sounds like the main thing being proposed is a way to watch the notifier file, and that there are no changes in the standard library itself?

Unless the standard library wants to export this functionality or use this functionality, such as for:

You mentioned TLS but I have a hard time understanding what it means to fork one end of a TCP connection.

Say you're transmitting the wall clock through a TLS connection using AES-GCM just looping and Write()ing time.Now() to a TLS conn. VM is snapshotted and then runs for 30 seconds. VM is then resumed from 30 second old snapshot. Result: TLS reuses the same AES-GCM counter for new wall clock values, until TCP realizes its packets aren't making it and throttles, and then times out. So there is still a nonce-reuse scenario with TCP, just shorter lived.

What kind of API are you looking for? Would it be something like os/signal.Notify? Or maybe we could make up a new fake signal for that API? What is the name for this?

A os/signal.Notify-like thing might be sensible, sure.

Calling a single suspend+resume a "fork" (when there's still just one VM) is a bit confusing, for what that's worth. It sounds like the signal is really on VM restart or VM resume or something like that.

It's not quite a resume or a restart. It's running a snapshot after it's been rewound or copied elsewhere.

Other people are interested in sleep+wakeup events on laptops (well, wakeup). Does it make sense to solve that problem and piggy back this onto it?

There's currently no kernel facility that I'm aware of for this. I don't know if there will be anything like that in the near future; I could ask Rafael (the PM maintainer) and see his thoughts, but I assume it's complicated, because userspaces would have to be able to block suspend while the handlers did their thing, and that's not trivial. There is a dbus thing that one part of systemd broadcasts, which programs can watch for. It'd be nice to have all this under one interface, but I don't know if it'll happen in practice.

What is the threat model exactly? The attacker can make copies of the VM? Or the VM is copied in the normal course of operation and just leaks accidentally?

The attacker here is a network adversary who is looking for things like repeated nonces or the same keys generated twice. The latter case is already handled in the RNG, so from a Go perspective, I think we're mostly concerned about the repeated nonces case.

rsc · 2022-05-11T17:28:01Z

The attacker here is a network adversary who is looking for things like repeated nonces or the same keys generated twice. The latter case is already handled in the RNG, so from a Go perspective, I think we're mostly concerned about the repeated nonces case.

Thanks. That helps clarify this.

Calling a single suspend+resume a "fork" (when there's still just one VM) is a bit confusing, for what that's worth. It sounds like the signal is really on VM restart or VM resume or something like that.

It's not quite a resume or a restart. It's running a snapshot after it's been rewound or copied elsewhere.

I agree that what you want a notification about is running a snapshot after it's been rewound or copied elsewhere. But I don't see how the VM would know that. All it knows is it got resumed. What happened while it was suspended is beyond its ability to know. Isn't it?

This is all probably not terribly relevant, though. It appears that we can already poll /proc/sys/kernel/hostname today:

First os.Open it.
Call f.SyscallConn to get a syscall.Conn
Call conn.Read with a function that returns false once. It will get called back again when the fd is ready for either POLLOUT or POLLERR. No need to do actual I/O on the fd. The second callback indicates that something happened. In fact you can just keep returning false to get each subsequent event. (And conn.Write would presumably work too; they both watch POLLERR.)

So it appears we don't need an API change immediately to support this. If TLS wants to watch this file on Linux, it can do that completely internally.

Maybe in the long term we'll want an API, but it sounds like we should use it and understand it more before then.

@zx2c4 If you agree that we can do without the API for now, then we'll move toward declining this. Let us know what you think.

rsc · 2022-05-18T17:26:16Z

It still seems like we probably don't need any API changes here. It might be worth experimenting in crypto/tls (without API changes, in *_linux.go files) once the kernel API is finalized.

rsc · 2022-05-25T17:27:37Z

@zx2c4 If you agree that we can do without the API for now, then we'll move toward declining this. Let us know what you think.

zx2c4 · 2022-06-01T21:33:45Z

Sorry for the delay. I think it's fine to close this for now. The conversation above, especially #52544 (comment), gives me a good idea of what sort of thing to target.

I think the way this can proceed is, when the kernel part is released, I'll prototype something internal for crypto/tls, for its actual use case. And then based on experience there, we can possibly then revisit whether a generic API makes sense or if it's trivial enough to do out of std.

rsc · 2022-06-08T18:00:35Z

This proposal has been declined as retracted.
— rsc for the proposal review group

zx2c4 added the Proposal label Apr 25, 2022

gopherbot added this to the Proposal milestone Apr 25, 2022

ianlancetaylor added this to Incoming in Proposals (old) Apr 25, 2022

rsc moved this from Incoming to Active in Proposals (old) May 4, 2022

zx2c4 closed this as completed Jun 1, 2022

rsc moved this from Active to Declined in Proposals (old) Jun 8, 2022

golang locked and limited conversation to collaborators Jun 8, 2023

gopherbot added the FrozenDueToAge label Jun 8, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

proposal: do something smart with virtual machine forks & crypto #52544

proposal: do something smart with virtual machine forks & crypto #52544

zx2c4 commented Apr 25, 2022

mdlayher commented Apr 25, 2022

ianlancetaylor commented Apr 25, 2022

ianlancetaylor commented Apr 25, 2022

zx2c4 commented Apr 25, 2022 •

edited

zx2c4 commented Apr 25, 2022

ianlancetaylor commented Apr 25, 2022

zx2c4 commented Apr 25, 2022 •

edited

rsc commented May 4, 2022

rsc commented May 4, 2022

zx2c4 commented May 4, 2022

rsc commented May 11, 2022

rsc commented May 18, 2022

rsc commented May 25, 2022

zx2c4 commented Jun 1, 2022

rsc commented Jun 8, 2022

proposal: do something smart with virtual machine forks & crypto #52544

proposal: do something smart with virtual machine forks & crypto #52544

Comments

zx2c4 commented Apr 25, 2022

mdlayher commented Apr 25, 2022

ianlancetaylor commented Apr 25, 2022

ianlancetaylor commented Apr 25, 2022

zx2c4 commented Apr 25, 2022 • edited

zx2c4 commented Apr 25, 2022

ianlancetaylor commented Apr 25, 2022

zx2c4 commented Apr 25, 2022 • edited

rsc commented May 4, 2022

rsc commented May 4, 2022

zx2c4 commented May 4, 2022

rsc commented May 11, 2022

rsc commented May 18, 2022

rsc commented May 25, 2022

zx2c4 commented Jun 1, 2022

rsc commented Jun 8, 2022

zx2c4 commented Apr 25, 2022 •

edited

zx2c4 commented Apr 25, 2022 •

edited