Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: do something smart with virtual machine forks & crypto #52544

Closed
zx2c4 opened this issue Apr 25, 2022 · 15 comments
Closed

proposal: do something smart with virtual machine forks & crypto #52544

zx2c4 opened this issue Apr 25, 2022 · 15 comments

Comments

@zx2c4
Copy link
Contributor

zx2c4 commented Apr 25, 2022

When virtual machines are snapshotted and restored, the RNG usually returns the same values, and existing session keys of various protocols usually stay the same, leading to nonce reuse and sometimes catastrophic cryptographic failure.

In Linux 5.18, I wrote a driver for Microsoft's "vmgenid" virtual device, which is implemented by Hyper-V, QEMU, VMware, and probably Firecracker soon. This reseeds the RNG using some hypervisor supplied unique value, and then notifies in-kernel subscribers about the VM fork event. WireGuard is one such (actually, at the moment, the only) subscriber, and it uses this event to invalidate outgoing session keys to prevent nonce reuse.

For Linux 5.19, I intend to bring those notifications to userspace, by way of a sysctl poll notifier. There might be future magic memory mapped things to get rid of the races inherent, but that's a ways off and uncertain at best, and adding a notifier like this is pretty simple and short in terms of code.

From a userspace perspective, in C, it looks like this:

  struct pollfd fd = { .fd = open("/proc/sys/kernel/random/fork_event", O_RDONLY)  };
  assert(fd.fd >= 0);
  for (;;) {
    assert(poll(&fd, 1, -1) > 0);
    puts("vm fork detected");
  }

Open a file. Poll on it. When poll returns, the VM has forked/snapshotted/resumed/etc.

Since this is a new userspace API, I wanted to make sure that there are paths forward for it in different environments. For little C programs, it's pretty obvious what to do and that it can work, but stepping through how this might work out with a complex runtime like Go seems like it'd be a prudent thing to do.

So this proposal is at the moment rather open ended: how can we use this functionality in Go, and do we even want this functionality in Go?

Some preliminary thoughts:

  • With regards to crypto/rand, Go no longer buffers, as of 3ae414c, so there's nothing for us to do there.
  • crypto/tls uses long-lived session keys in CTR mode (AES-GCM, ChaCha20Poly1305, etc), which among other things, should probably be zeroed out on VM fork events.
  • Other things, such as wireguard-go, don't live in the standard library but would certainly benefit from having this. Do we want to expose an API for it (in the standard library? in x/sys/unix? in x/crypto?), or tell external things to poll on that file themselves?
  • In general, the risk is probably worse for UDP-based things like WireGuard or DTLS than for TCP-based things like TLS, which will hopefully time out before making the problem too bad.

Focused more on the kernel side of this, I am interested in learning how this discussion pans out from you all regarding Go. There's currently about 4 weeks til the 5.19 merge window opens, so the more Go considerations we can have before then the better.

CC @bradfitz @ianlancetaylor @tklauser @FiloSottile @julieqiu @diagprov @josharian @mdlayher @rsc

@zx2c4 zx2c4 added the Proposal label Apr 25, 2022
@gopherbot gopherbot added this to the Proposal milestone Apr 25, 2022
@mdlayher
Copy link
Member

crypto/tls uses long-lived session keys in CTR mode (AES-GCM, ChaCha20Poly1305, etc), which among other things, should probably be zeroed out on VM fork events.

It seems like it'd make sense for the runtime to expose a hook for these sorts of events (iff GOOS=linux && 5.19+, though it'll be a while before most folks can use it)

Other things, such as wireguard-go, don't live in the standard library but would certainly benefit from having this. Do we want to expose an API for it (in the standard library? in x/sys/unix? in x/crypto?), or tell external things to poll on that file themselves?

It's probably reasonable to expose it in a third party package, I'm not even sure if x/sys makes sense since it's just a combination of opening a file descriptor and polling which is easily done with existing primitives.


I'm excited to see this work happening and am in favor of doing smart things in Go. It'll be easy to do for external code, but I suspect more concrete proposals would be requested for changes to crypto/tls and such.

@ianlancetaylor
Copy link
Contributor

CC @golang/security

@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) Apr 25, 2022
@ianlancetaylor
Copy link
Contributor

Sorry if this is a foolish question, but are there timing issues here? If you suspend the VM just before the program fetches some random numbers, and then restart the VM, is anything going to prevent it from fetching those numbers, and getting duplicate values for each restart, before it gets around to polling the descriptor?

@zx2c4
Copy link
Contributor Author

zx2c4 commented Apr 25, 2022

Not a foolish question at all. Yes, the whole mechanism is racy, from the core of it on up. The notification begins as an ACPI interrupt which is sent after the system is already restored. So it is, at most, a best effort sort of mitigating thing, but not bulletproof.

Currently the vmgenid device Microsoft specified exposes a 16 byte ID. I would like for them to augment the virtual device to additionally expose a 4 byte counter, so that it can be mmap'd all the way through to userspace, making reading it after every crypto operation (but before transmission) an inexpensive operation. But that doesn't exist now, and it'll be long ways off if it does ever happen, and having to bracket all protocol code with those sorts of checks is way more invasive than responding to an async notification. So unfortunately perfect might be the enemy of not-terrible, so for now this is what we've got.

But hey, 5 minutes of nonce reuse is a lot worse than a few nanoseconds or whatever the propagation latency is.

@zx2c4
Copy link
Contributor Author

zx2c4 commented Apr 25, 2022

It seems like it'd make sense for the runtime to expose a hook for these sorts of events (iff GOOS=linux && 5.19+, though it'll be a while before most folks can use it)

Practically speaking, /proc/sys/kernel/random/fork_event may or may not exist, not just depending on the kernel version but depending on whether CONFIG_VMGENID is enabled. So probably the way such a notifier would be structured would be that everyone can use the API it exposes, and then various OS-specific backends fill in the implementation.

There's the sysctl file in Linux. Windows exposes an NT object path to userspace that does something similar, whose API I can probably wrangle out of the driver. FreeBSD has a vmgenid driver, but I don't know whether it's exposed to userspace (yet).

@ianlancetaylor
Copy link
Contributor

One can imagine something along the lines of

var epoch int32

var startCheckEpoch sync.Once

func checkEpoch() {
    f, err := os.Open("/proc/sys/kernel/random/fork_event")
    if err != nil {
        return // Nothing we ca do.
    }
    for {
        var b [1]byte
        n, err := f.Read(b[:])
        // Check for relevant errors
        atomic.AddInt32(&epoch, 1)
    }
}

func Epoch(int) {
    startCheckEpoch.Do(func() {
        go checkEpoch()
    })
    return int(atomic.LoadInt32(&epoch)
}

Then randomization code can check whether Epoch() has changed. For extra efficiecy split out Epoch and the starting of checkEpoch. This wouldn't need to be in the Go runtime at all. But it does assume that the kernel doesn't just make the descriptor pollable, it actually writes a byte to the descriptor when something has happened.

@zx2c4
Copy link
Contributor Author

zx2c4 commented Apr 25, 2022

Ahhh, so you've implemented the "check everytime" pattern on top of the "async notifier" pattern. The more natural way, I was thinking, would be to register a callback or a channel read/write like we do for unix signals, so apps can respond to the event. For example, https://git.zx2c4.com/wireguard-linux/tree/drivers/net/wireguard/device.c#n94 wg_vm_notification gets called on a notification event, and then all the session keys get invalidated for sending.

Regarding your mention about read() instead of poll(), it currently doesn't work that way. I could maybe try to retrofit that on top of the sysctl poll api, but right now what the kernel does is just trigger a single event per fd that gets consumed by a call to poll(). It was originally designed to poll on domainname/hostname changes, so subsequent reads will just show whatever the current domainname/hostname is. In that sense, poll() isn't indicating that the file is readable, since it always is for domainname/hostname, but just that it has changed. To that end, it doesn't use POLLOUT but actually POLLERR. Since fork_event doesn't have anything to read ever, reads just return -ENODATA.

This POLLERR thing might have interesting effects to consider with netpoll. If you're curious to play, /proc/sys/kernel/hostname is the hostname notifier that exists in kernels now, which has the same semantics as fork_event (aside from the "ianslaptop" vs -ENODATA difference).

@rsc
Copy link
Contributor

rsc commented May 4, 2022

It sounds like the main thing being proposed is a way to watch the notifier file, and that there are no changes in the standard library itself? You mentioned TLS but I have a hard time understanding what it means to fork one end of a TCP connection.

What kind of API are you looking for? Would it be something like os/signal.Notify? Or maybe we could make up a new fake signal for that API? What is the name for this?

Calling a single suspend+resume a "fork" (when there's still just one VM) is a bit confusing, for what that's worth. It sounds like the signal is really on VM restart or VM resume or something like that.

Other people are interested in sleep+wakeup events on laptops (well, wakeup). Does it make sense to solve that problem and piggy back this onto it?

What is the threat model exactly? The attacker can make copies of the VM? Or the VM is copied in the normal course of operation and just leaks accidentally?

@rsc
Copy link
Contributor

rsc commented May 4, 2022

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc moved this from Incoming to Active in Proposals (old) May 4, 2022
@zx2c4
Copy link
Contributor Author

zx2c4 commented May 4, 2022

It sounds like the main thing being proposed is a way to watch the notifier file, and that there are no changes in the standard library itself?

Unless the standard library wants to export this functionality or use this functionality, such as for:

You mentioned TLS but I have a hard time understanding what it means to fork one end of a TCP connection.

Say you're transmitting the wall clock through a TLS connection using AES-GCM just looping and Write()ing time.Now() to a TLS conn. VM is snapshotted and then runs for 30 seconds. VM is then resumed from 30 second old snapshot. Result: TLS reuses the same AES-GCM counter for new wall clock values, until TCP realizes its packets aren't making it and throttles, and then times out. So there is still a nonce-reuse scenario with TCP, just shorter lived.

What kind of API are you looking for? Would it be something like os/signal.Notify? Or maybe we could make up a new fake signal for that API? What is the name for this?

A os/signal.Notify-like thing might be sensible, sure.

Calling a single suspend+resume a "fork" (when there's still just one VM) is a bit confusing, for what that's worth. It sounds like the signal is really on VM restart or VM resume or something like that.

It's not quite a resume or a restart. It's running a snapshot after it's been rewound or copied elsewhere.

Other people are interested in sleep+wakeup events on laptops (well, wakeup). Does it make sense to solve that problem and piggy back this onto it?

There's currently no kernel facility that I'm aware of for this. I don't know if there will be anything like that in the near future; I could ask Rafael (the PM maintainer) and see his thoughts, but I assume it's complicated, because userspaces would have to be able to block suspend while the handlers did their thing, and that's not trivial. There is a dbus thing that one part of systemd broadcasts, which programs can watch for. It'd be nice to have all this under one interface, but I don't know if it'll happen in practice.

What is the threat model exactly? The attacker can make copies of the VM? Or the VM is copied in the normal course of operation and just leaks accidentally?

The attacker here is a network adversary who is looking for things like repeated nonces or the same keys generated twice. The latter case is already handled in the RNG, so from a Go perspective, I think we're mostly concerned about the repeated nonces case.

@rsc
Copy link
Contributor

rsc commented May 11, 2022

The attacker here is a network adversary who is looking for things like repeated nonces or the same keys generated twice. The latter case is already handled in the RNG, so from a Go perspective, I think we're mostly concerned about the repeated nonces case.

Thanks. That helps clarify this.

Calling a single suspend+resume a "fork" (when there's still just one VM) is a bit confusing, for what that's worth. It sounds like the signal is really on VM restart or VM resume or something like that.

It's not quite a resume or a restart. It's running a snapshot after it's been rewound or copied elsewhere.

I agree that what you want a notification about is running a snapshot after it's been rewound or copied elsewhere. But I don't see how the VM would know that. All it knows is it got resumed. What happened while it was suspended is beyond its ability to know. Isn't it?

This is all probably not terribly relevant, though. It appears that we can already poll /proc/sys/kernel/hostname today:

  1. First os.Open it.
  2. Call f.SyscallConn to get a syscall.Conn
  3. Call conn.Read with a function that returns false once. It will get called back again when the fd is ready for either POLLOUT or POLLERR. No need to do actual I/O on the fd. The second callback indicates that something happened. In fact you can just keep returning false to get each subsequent event. (And conn.Write would presumably work too; they both watch POLLERR.)

So it appears we don't need an API change immediately to support this. If TLS wants to watch this file on Linux, it can do that completely internally.

Maybe in the long term we'll want an API, but it sounds like we should use it and understand it more before then.

@zx2c4 If you agree that we can do without the API for now, then we'll move toward declining this. Let us know what you think.

@rsc
Copy link
Contributor

rsc commented May 18, 2022

It still seems like we probably don't need any API changes here. It might be worth experimenting in crypto/tls (without API changes, in *_linux.go files) once the kernel API is finalized.

@rsc
Copy link
Contributor

rsc commented May 25, 2022

@zx2c4 If you agree that we can do without the API for now, then we'll move toward declining this. Let us know what you think.

@zx2c4
Copy link
Contributor Author

zx2c4 commented Jun 1, 2022

Sorry for the delay. I think it's fine to close this for now. The conversation above, especially #52544 (comment), gives me a good idea of what sort of thing to target.

I think the way this can proceed is, when the kernel part is released, I'll prototype something internal for crypto/tls, for its actual use case. And then based on experience there, we can possibly then revisit whether a generic API makes sense or if it's trivial enough to do out of std.

@zx2c4 zx2c4 closed this as completed Jun 1, 2022
@rsc rsc moved this from Active to Declined in Proposals (old) Jun 8, 2022
@rsc
Copy link
Contributor

rsc commented Jun 8, 2022

This proposal has been declined as retracted.
— rsc for the proposal review group

@golang golang locked and limited conversation to collaborators Jun 8, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
No open projects
Development

No branches or pull requests

5 participants