proposal: cmd/compile: use Asyncify for switching goroutines on wasm #43033

neelance · 2020-12-06T12:47:29Z

WebAssembly still has no support for jump instructions or funclets. All control flow between basic blocks has to be expressed via if/switch and loop constructs. This means that a CFG with arbitrary control flow cannot be expressed directly in wasm. (I still haven't seen a good explanation for this design choice, except that V8 supposedly would have a hard time supporting arbitrary control flow.)

The advertised solution for this limitation is to use the Relooper algorithm to turn arbitrary control flow back into structured control flow. Unfortunately this approach is incompatible with unwinding and restoring the stack when switching goroutines. This is why Go is not using the Relooper but instead quite naively uses a jump variable and a jump table at the beginning of each function (a loop and a huge switch). The performance overhead of this approach is quite bad.
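To make the "jump variable plus loop and switch" shape concrete, here is a minimal sketch in plain Go. It is not the code cmd/compile emits; the `frame` type, the block numbering, and the "yield every third iteration" rule are all assumptions made for illustration. Each `case` stands for one basic block, and returning `false` stands in for unwinding at a point where the goroutine would be switched out.

```go
package main

import "fmt"

// frame holds what has to survive an unwind: the block to resume at and the
// function's live values. Hypothetical layout, for illustration only.
type frame struct {
	pc  int // jump variable: index of the basic block to run next
	i   int
	sum int
}

// sumTo computes 0+1+...+(n-1). Control flow between blocks is expressed by
// assigning f.pc and going around the loop again. Returning false simulates
// unwinding; calling sumTo again with the same frame resumes at f.pc.
func sumTo(f *frame, n int) (done bool) {
	for {
		switch f.pc {
		case 0: // entry block
			f.i, f.sum = 0, 0
			f.pc = 1
		case 1: // loop header
			if f.i < n {
				f.pc = 2
			} else {
				f.pc = 4
			}
		case 2: // loop body
			f.sum += f.i
			f.i++
			f.pc = 3
		case 3: // possible suspension point (assumed: every third iteration)
			f.pc = 1
			if f.i%3 == 0 {
				return false
			}
		case 4: // exit block
			return true
		}
	}
}

// run drives sumTo to completion and counts how often it was suspended.
func run(n int) (sum, yields int) {
	f := &frame{}
	for !sumTo(f, n) {
		yields++
	}
	return f.sum, yields
}

func main() {
	sum, yields := run(10)
	fmt.Println(sum, yields)
}
```

Every transfer between blocks goes through the dispatch at the top of the loop, which is where the overhead described above comes from.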

In the long term, wasm stack switching might offer a solution that can avoid unwinding the stack. Until then, it has come to my attention that there is an intermediate solution called Asyncify. This is a post-processing pass on a wasm binary that adds stack unwinding but does so in a much more optimized way than the Go compiler currently does.

I propose using Asyncify within the Go compiler. This should give a very significant performance gain. As a consequence, we would also need to implement the Relooper algorithm or a newer alternative called the Stackifier. A downside is that it would add a compile-time dependency on Binaryen's wasm-opt command, which implements Asyncify. This dependency would go away once wasm stack switching is available.

For more context see WebAssembly/design#796, especially WebAssembly/design#796 (comment).


eliasnaur · 2020-12-07T11:56:35Z

> A downside would be that it would add a compile time dependency to Binaryen's wasm-opt command which implements Asyncify. This dependency would go away when wasm stack switching is available.

Do you mean "compile time" as in during go tool compile, or "link time" as in go tool link? If the latter, it may be worth making wasm-opt optional by repurposing -linkmode external to use wasm-opt and -linkmode internal to use the old, inefficient approach.

It would be a pity to lose the ability to build WASM binaries without any dependencies. How much more work is implementing Asyncify in Go, compared to the (I assume) significant work to implement Relooper or Stackifier?

neelance · 2020-12-07T12:49:33Z

> Do you mean "compile time" as in during go tool compile, or "link time" as in go tool link? If the latter, it may be worth making wasm-opt optional by repurposing -linkmode external to use wasm-opt and -linkmode internal to use the old, inefficient approach.

You are right, my wording was not specific enough. It is "link time", right before emitting the final wasm binary.

Regarding supporting both approaches at once: A big maybe. The Asyncify approach is quite different from what we're doing right now, for example it restores wasm locals. So we would need to also extend our current approach to still be able to use the benefits of Asyncify.

> It would be a pity to lose the ability to build WASM binaries without any dependencies. How much more work is implementing Asyncify in Go, compared to the (I assume) significant work to implement Relooper or Stackifier?

They may each use quite a number of helper functions for analysis. I can't say how they compare in terms of total implementation complexity.

What I know for sure is that there is no way around implementing Relooper/Stackifier unless the WebAssembly project reverses course and implements jump/funclets, which seems very unlikely. With Asyncify, however, there is the option to use it as is, and it is quite likely that it (or any work on our own implementation) will become obsolete once wasm stack switching lands. So from a purely practical perspective, I myself am not really willing to spend time on reimplementing Asyncify.

cherrymui · 2020-12-07T21:13:40Z

Last time I looked (quite a long time ago), Asyncify is mostly suitable to handle unwinding/rewinding at specific locations that are known statically. For Go, however, the unwinding/rewinding is more dynamic, or, if we want to mark them statically, potentially nearly all call sites. Does it work well for our use case? It could be that I didn't understand it correctly, or things have changed. Sorry if that's the case.

neelance · 2020-12-07T21:42:19Z

> Another reason Asyncify is fast is that we analyze the entire program’s call graph to see which functions might unwind, so that we can avoid modifying functions that can’t.

Yes, we need to mark all places that can unwind, for example channel operations. Asyncify should then be able to figure out which call sites may also unwind. I am not sure how many call sites would be affected. Do you have some idea?

cherrymui · 2020-12-07T21:58:16Z

A stack switch can happen if a goroutine needs to grow its stack. This can happen at the entry of nearly all functions (except nosplit functions). So I think nearly all call sites would be affected.

neelance · 2020-12-07T22:09:51Z

This is a good question. I hadn't considered it yet. Yes, it switches the Go stack, but do we also have to switch the WebAssembly stack? Couldn't we simply continue after the call to morestack?

cherrymui · 2020-12-07T22:24:50Z

> Couldn't we simply continue after the call to morestack?

That could be a possibility. It would make morestack a bit different from other architectures. For example, currently morestack never returns; it just does a context switch to the goroutine stack. I don't know how complex it would be to change that.

Also, once we have threads, we can have preemptions. Preemption check is also at function entries, and morestack is dual-functioned. In this case, we'll do a real goroutine switch.
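The dual role of the function-entry check can be sketched as follows. This is a toy model in Go, not the runtime's code: the `g` struct, `guardSize`, the concrete `stackPreempt` value, and the string results are assumptions for illustration. The idea it models is that a preemption request replaces the stack guard with a value no real stack pointer can exceed, so the same comparison that detects a nearly full stack also detects the request.

```go
package main

import "fmt"

const (
	guardSize = 0x400 // hypothetical safety margin above the stack's low end
	// stackPreempt is a poison guard value: every stack pointer compares
	// as "too low" against it, forcing the slow path. Value is illustrative.
	stackPreempt = ^uintptr(0)
)

// g is a toy goroutine descriptor with only the fields this sketch needs.
type g struct {
	stackLo    uintptr // lowest address of the current stack
	stackguard uintptr // the prologue compares the stack pointer to this
}

type action string

// prologue models the check at the entry of a non-nosplit function.
func prologue(gp *g, sp uintptr) action {
	if sp > gp.stackguard {
		return "run" // fast path: enough stack and no preemption request
	}
	return morestack(gp)
}

// morestack distinguishes the two reasons the check can fail.
func morestack(gp *g) action {
	if gp.stackguard == stackPreempt {
		// Preemption request: restore the real guard, then reschedule.
		gp.stackguard = gp.stackLo + guardSize
		return "switch goroutine"
	}
	return "grow stack"
}

func main() {
	gp := &g{stackLo: 0x1000, stackguard: 0x1000 + guardSize}
	fmt.Println(prologue(gp, 0x8000)) // plenty of room left
	fmt.Println(prologue(gp, 0x1100)) // stack pointer below the guard
	gp.stackguard = stackPreempt      // another thread requests preemption
	fmt.Println(prologue(gp, 0x8000))
}
```

In this model only the "grow stack" outcome could plausibly continue in the same wasm frame after the call returns; the "switch goroutine" outcome still needs a real unwind.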

neelance · 2020-12-07T22:49:13Z

> Also, once we have threads, we can have preemptions. Preemption check is also at function entries, and morestack is dual-functioned. In this case, we'll do a real goroutine switch.

Yes, I fully agree with this conclusion. Using Asyncify would prevent us from doing preemption like Go usually does.

kripken · 2020-12-10T21:15:26Z

If it would be useful, I could look into porting the Binaryen CFG code (which is ~2,000 lines of C++ that I wrote, and that uses an improved Relooper algorithm) to Go. (I've only just tinkered with Go so far, however, so it would be my first production code...)

neelance · 2020-12-10T22:21:14Z

> If it would be useful, I could look into porting the Binaryen CFG code (which is ~2,000 lines of C++ that I wrote, and that uses an improved Relooper algorithm) to Go.

Thanks for the offer! ~~That way we wouldn't need a dependency on wasm-opt.~~ (edit: I misunderstood, see below)

kripken · 2020-12-10T23:02:47Z

Oh sorry, maybe I wasn't clear: That would just handle the relooping. You would still need the asyncify logic as well. That's much larger - just the asyncify pass itself is almost 2,000 lines, and it depends on the rest of the Binaryen optimizer which is an order of magnitude bigger at least.

A relooper in Go would let you emit good code for the case where you don't need stack switching (can Go emit such programs?), or where you use wasm stack switching in the future. But otherwise you'd still need wasm-opt for asyncify.

kripken · 2022-10-28T04:57:58Z

A relevant update here: there is now a JavaScript Promise Integration (JSPI) proposal for wasm, which basically does what wasm-opt's Asyncify does, but in the wasm VM. That is, Asyncify is like a polyfill for that new feature.

JSPI is not a full wasm stack switching proposal, but it is the first part of that support. It relies on JS Promises, so it is not pure wasm and will have some overhead (as JS must be used to swap between running and paused code). But it has far less overhead than Asyncify (and I believe, the current implementation Go uses in wasm).

JSPI is still experimental, but there is a working prototype in V8 and there is successful integration in Emscripten for example. So this could be a good time to experiment with it.

In particular, if we experiment with that successfully, then the other part of the plan from earlier in this issue could make sense again: We could port a small Relooper implementation to Go, and then that + JSPI should let Go emit pretty efficient wasm I believe. Thoughts?

pulsejet · 2025-03-10T04:35:00Z

I gave JSPI a shot and built a toy scheduler for tinygo (details in tinygo-org/tinygo#4789).

The basic idea seems to work. One caveat I see is that the engine would maintain a full wasm stack for each goroutine, which is large-ish (984kB from my understanding), so starting a lot of goroutines would likely run out of memory quickly. But I think this would still be a massive perf improvement if it can eliminate all the br_table instructions Go currently emits.

Notably, tinygo already uses Asyncify and the produced wasm is tiny (8M vs 1M for me). But the gc implementation is too slow to be usable for us.

neelance added Proposal arch-wasm labels Dec 6, 2020

gopherbot added this to the Proposal milestone Dec 6, 2020

neelance mentioned this issue Dec 7, 2020

Please Support Arbitrary Labels and Gotos. WebAssembly/design#796

Open

lbguilherme mentioned this issue Jul 1, 2021

Plans about WebAssembly crystal-lang/crystal#829

Closed

rsc moved this to Incoming in Proposals Aug 10, 2022

rsc added this to Proposals Aug 10, 2022

rcoreilly mentioned this issue Jul 15, 2024

cmd/compile: incorrect compilation of specific functions in wasm #68156

Open

seankhliao changed the title ~~proposal: wasm: use Asyncify for switching goroutines~~ proposal: cmd/compile: use Asyncify for switching goroutines on wasm Apr 6, 2025
