wasm: improve Efficiency on Init and Constants #26622
Comments
/cc @neelance |
What you see is a workaround for WebAssembly/design#796 (comment). If you want to see this improved, please consider reaching out to the WebAssembly team to get them to take this issue seriously. |
If you mean just the part I said about blocks, then I think it's not a good workaround for inits. Maybe the ssa compiler doesn't give you a choice, but the init function should delegate to functions representing other inits. But that doesn't address size much.

Regardless, my concerns about the tens of thousands of instructions should not be dismissed with a link to wasm's block design. That Go chooses to eagerly instantiate dozens of publicly visible yet possibly unused structs for string usage is annoying. Other runtimes do it too, though they often wait for explicit charset use first. But aside from that, surely the algorithm could be smarter and use, say, the stack instead of updating, then setting, then re-getting the same global.

Surely, if possible, the best approach to building structs from all-constant expressions would be to use the data section and load the fixed struct memory representation into memory instead, especially for slices of these. Though I know recognizing cases where this optimization may apply is extra work, using the data section and doing memory init with fewer instructions would be ideal. There has to be some way to reduce the startup function instruction count.

Even if you were right that it's wasm's fault for their block design, and they had arbitrary jumps, you'd just trade block+end for a label and still have a large jump table. And then you'd still have the tons of other instructions completely unrelated to your link. I understand that you had to use jump tables to have suspension points for your resumable coroutines, but that's orthogonal to what I'm talking about with the other costs of memory and package initialization. |
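To make the data-section idea above a bit more concrete, here is a minimal user-level sketch in Go. It is not what the compiler does or would do. The `packedRanges` blob and the `loadRange16` helper are hypothetical names, and the byte slice only stands in for bytes that a wasm data segment would place in linear memory. The point is that a table of constant structs has a fixed byte layout, so it can be materialized with one bulk decode instead of one store sequence per field.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"unicode"
)

// packedRanges is a hypothetical stand-in for a wasm data segment.
// Each entry is three little-endian uint16 values: Lo, Hi, Stride.
var packedRanges = []byte{
	0x41, 0x00, 0x5a, 0x00, 0x01, 0x00, // 'A'..'Z', stride 1
	0x61, 0x00, 0x7a, 0x00, 0x01, 0x00, // 'a'..'z', stride 1
}

// loadRange16 decodes the whole blob into a slice of unicode.Range16
// with a single bulk read rather than per-field assignments.
func loadRange16(blob []byte) ([]unicode.Range16, error) {
	const entrySize = 6 // three uint16 fields
	if len(blob)%entrySize != 0 {
		return nil, fmt.Errorf("blob length %d is not a multiple of %d", len(blob), entrySize)
	}
	ranges := make([]unicode.Range16, len(blob)/entrySize)
	if err := binary.Read(bytes.NewReader(blob), binary.LittleEndian, ranges); err != nil {
		return nil, err
	}
	return ranges, nil
}

func main() {
	ranges, err := loadRange16(packedRanges)
	if err != nil {
		panic(err)
	}
	// Wrap the decoded ranges in a RangeTable and query it as usual.
	table := &unicode.RangeTable{R16: ranges}
	fmt.Println(unicode.Is(table, 'q'), unicode.Is(table, '7'))
}
```

In a real wasm module the decode step would not be needed at all, since the bytes would already sit in memory in the struct's layout; the sketch only shows that the representation is fixed and compact.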
Sure, there are probably quite a few optimizations specific to WebAssembly that we should add, but I think they should not require very deep changes. This is because the reasoning of the comment I linked above also applies to other issues: The SSA that the Go compiler generates works fine for all other architectures that Go targets. If WebAssembly requires that the Go compiler now has to do something entirely different, then this is primarily a limitation of WebAssembly, not of the Go compiler. I think it would be most helpful if you could make some very concrete suggestions, e.g. pick some portion of the generated wasm code and show how we could simplify it. Then we can talk about the feasibility of such simplification. |
Ok, I will do more in-depth analysis of the ssa insns, wasm insns, and make specific concrete suggestions. I will update when I have more. |
Ok, analysis done...
For the following Go code:

    package main

    import (
        "fmt"
    )

    func main() {
        fmt.Println("Hello, World!")
    }

Here are some quick notes in bulleted form (note, a lot of them are irrelevant or you surely already know them, but interesting maybe to an outsider):
After a lot of trying different things, I have only one concrete suggestion right now: Make a helper function (i.e. what the JVM might call a "synthetic" function) for the call-prep insns. Specifically, these 9 insns can be extracted to a helper function taking a single
There are of course other possibilities whose benefits I could not easily validate:
I did not do much analysis beyond the unicode init, so it's quite possible there are lots of savings elsewhere in more complex code. |
Thanks for your investigation.
Could you please also compare the sizes of the gzipped versions of those two? And please also compare the startup time on the latest Firefox Developer Edition, since this is currently the best WebAssembly runtime and thus a good benchmark. |
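For the gzipped size comparison requested here, a small Go program along these lines could be used. This is only a sketch: `gzippedSize` is a hypothetical helper, and the file paths are whatever two `.wasm` builds you pass on the command line.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"os"
)

// gzippedSize returns the number of bytes data occupies after gzip
// compression at the highest compression level.
func gzippedSize(data []byte) (int, error) {
	var buf bytes.Buffer
	zw, err := gzip.NewWriterLevel(&buf, gzip.BestCompression)
	if err != nil {
		return 0, err
	}
	if _, err := zw.Write(data); err != nil {
		return 0, err
	}
	// Close flushes the remaining compressed bytes and the gzip footer.
	if err := zw.Close(); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}

func main() {
	// Print raw and gzipped sizes for every file named on the command line.
	for _, path := range os.Args[1:] {
		data, err := os.ReadFile(path)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		n, err := gzippedSize(data)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Printf("%s: raw=%d gzip=%d\n", path, len(data), n)
	}
}
```

Repetitive instruction sequences tend to compress well, so a saving measured in raw bytes may shrink once both binaries are gzipped, which is presumably why the comparison is being asked for.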
I was going to use it for non-web use. I wasn't really wanting to do the earlier steps of investigation, or these next steps, or the ones following. I'll understand if y'all close the issue as not having enough evidence of improvement. I'm thinking the 6% saved probably won't show much improvement in download size or startup time for web browsers. I actually think my one concrete suggestion is not enough, and the sheer size of the payload can be tackled from many angles. But at this point, I accept that this issue doesn't have enough real improvement or suggestions. |
It is good that you are thinking about improvements. But WebAssembly and its runtimes are still in an early stage. For example, Firefox Developer Edition is able to load a large WebAssembly binary much more quickly than current Chrome. This is evidence that there are techniques that WebAssembly runtimes can and should apply. Any optimizations of the WebAssembly binary itself should be benchmarked against a sufficiently mature WebAssembly runtime. Otherwise we may optimize for the wrong things. That being said, the goto/jmp limitation is really the most severe one right now. It is also a blocker for some other optimizations that I have in mind. I would really appreciate it if we could somehow get that ball rolling. I'm not yet sure how... |
@ianlancetaylor I think we can close this. |
I toyed around w/ some ideas at https://github.com/cretz/go-wasm-bake. Still nothing concrete to suggest to the compiler. |
The WASM code emitted for unicode.init is ridiculously large and inefficient for what it does (22446 instructions as of this writing). This is undoubtedly due to unicode/tables. There are a ton of blocks, a big table switch, and a bunch of get global + const + store + set global. This should really be fixed IMO.

Ideally, this would leverage the data section. While I have not looked at the ssa specifically, I understand that the WASM compiler is likely just handling it generically as would be expected. There are a few options here:

Change how unicode/tables.go works for all archs to use a more packed format at compile time that doesn't require so much struct/slice creation and has an as-fast-as-today runtime lookup (e.g. an inlined R16() might return a pseudo-pointer to the Range16 arr, which can then be indexed and return Lo on request). I think this could even be an opt-in build tag and could result in faster startup time (not that Go's startup time is slow or anything).
(Breaks backwards compat guarantees, forgot all of these vars are exposed.)

Not sure of the best solution here, but any improvement to WASM startup time and WASM binary size would be welcome. Selfishly, I tried to compile unicode.init to JVM in my non-web WASM impl and exceeded the JVM method size limit.
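The packed-format option described above could look roughly like the following Go sketch. Everything here is hypothetical and is not the `unicode` package's API: `packedTable`, `rangeView`, and the sample data are made up for illustration. The table is a flat slice of integer constants, and the accessor hands back a lightweight view that reads Lo, Hi, and Stride on request instead of materializing a struct per range.

```go
package main

import "fmt"

// packedTable stores ranges as a flat run of uint16 triples
// (Lo, Hi, Stride), sorted by Lo and non-overlapping.
type packedTable []uint16

// rangeView is a pseudo-pointer into a packedTable: it holds only the
// table and an index, and reads fields lazily.
type rangeView struct {
	t packedTable
	i int
}

func (t packedTable) Len() int           { return len(t) / 3 }
func (t packedTable) At(i int) rangeView { return rangeView{t, i} }

func (v rangeView) Lo() uint16     { return v.t[3*v.i] }
func (v rangeView) Hi() uint16     { return v.t[3*v.i+1] }
func (v rangeView) Stride() uint16 { return v.t[3*v.i+2] }

// Contains reports whether r falls in one of the ranges, using a
// binary search over the packed entries.
func (t packedTable) Contains(r uint16) bool {
	lo, hi := 0, t.Len()
	for lo < hi {
		mid := lo + (hi-lo)/2
		v := t.At(mid)
		switch {
		case r < v.Lo():
			hi = mid
		case r > v.Hi():
			lo = mid + 1
		default:
			// Inside [Lo, Hi]; the stride decides membership.
			return (r-v.Lo())%v.Stride() == 0
		}
	}
	return false
}

// sample is made-up data for illustration, not a real Unicode table.
var sample = packedTable{
	0x0041, 0x005a, 1,
	0x0061, 0x007a, 1,
	0x0100, 0x017e, 2,
}

func main() {
	fmt.Println(sample.Contains('q'), sample.Contains('7'))
	fmt.Println(sample.Contains(0x0100), sample.Contains(0x0101))
}
```

As the struck note in the issue says, the exported `unicode` variables make a change like this a compatibility problem for the standard library itself, so the sketch is mainly useful for thinking about what a packed layout would cost at lookup time.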