syscall/js: performance considerations #32591

Open · dmitshur opened this issue Jun 13, 2019 · 23 comments
Labels: arch-wasm (WebAssembly issues), NeedsInvestigation (someone must examine and confirm this is a valid issue and not a duplicate of an existing one), Performance

@dmitshur (Contributor) commented Jun 13, 2019

I was porting some frontend Go code from GopherJS to WebAssembly and noticed a significant drop in performance. The Go code in question makes a lot of DOM manipulation calls and queries, so I decided to benchmark the cost of calling from WebAssembly into the JavaScript APIs via syscall/js.

I found it to be approximately 10x slower than native JavaScript.

Results of running a benchmark in Chrome 75.0.3770.80 on macOS 10.14.5:

  131.212518 ms/op - WebAssembly via syscall/js
   61.850000 ms/op - GopherJS via syscall/js
   12.040000 ms/op - GopherJS via github.com/gopherjs/gopherjs/js
   11.320000 ms/op - native JavaScript

Here's the benchmark code I used, written to be self-contained:

Source Code

main.go

package main

import (
	"fmt"
	"runtime"
	"syscall/js"
	"testing"
	"time"

	"honnef.co/go/js/dom/v2"
)

var document = dom.GetWindow().Document().(dom.HTMLDocument)

func main() {
	loaded := make(chan struct{})
	switch readyState := document.ReadyState(); readyState {
	case "loading":
		document.AddEventListener("DOMContentLoaded", false, func(dom.Event) { close(loaded) })
	case "interactive", "complete":
		close(loaded)
	default:
		panic(fmt.Errorf("internal error: unexpected document.ReadyState value: %v", readyState))
	}
	<-loaded

	for i := 0; i < 10000; i++ {
		div := document.CreateElement("div")
		div.SetInnerHTML(fmt.Sprintf("foo <strong>bar</strong> baz %d", i))
		document.Body().AppendChild(div)
	}

	time.Sleep(time.Second)

	runBench(BenchmarkGoSyscallJS, WasmOrGJS+" via syscall/js")
	if runtime.GOARCH == "js" { // GopherJS-only benchmark.
		runBench(BenchmarkGoGopherJS, "GopherJS via github.com/gopherjs/gopherjs/js")
	}
	runBench(BenchmarkNativeJavaScript, "native JavaScript")

	document.Body().Style().SetProperty("background-color", "lightgreen", "")
}

func runBench(f func(*testing.B), desc string) {
	r := testing.Benchmark(f)
	msPerOp := float64(r.T) * 1e-6 / float64(r.N)
	fmt.Printf("%f ms/op - %s\n", msPerOp, desc)
}

func BenchmarkGoSyscallJS(b *testing.B) {
	var total float64
	for i := 0; i < b.N; i++ {
		total = 0
		divs := js.Global().Get("document").Call("getElementsByTagName", "div")
		for j := 0; j < divs.Length(); j++ {
			total += divs.Index(j).Call("getBoundingClientRect").Get("top").Float()
		}
	}
	_ = total
}

func BenchmarkNativeJavaScript(b *testing.B) {
	js.Global().Set("NativeJavaScript", js.Global().Call("eval", nativeJavaScript))
	b.ResetTimer()
	js.Global().Get("NativeJavaScript").Invoke(b.N)
}

const nativeJavaScript = `(function(N) {
	var i, j, total;
	for (i = 0; i < N; i++) {
		total = 0;
		var divs = document.getElementsByTagName("div");
		for (j = 0; j < divs.length; j++) {
			total += divs[j].getBoundingClientRect().top;
		}
	}
	var _ = total;
})`

wasm.go

// +build wasm

package main

import "testing"

const WasmOrGJS = "WebAssembly"

func BenchmarkGoGopherJS(b *testing.B) {}

gopherjs.go

// +build !wasm

package main

import (
	"testing"

	"github.com/gopherjs/gopherjs/js"
)

const WasmOrGJS = "GopherJS"

func BenchmarkGoGopherJS(b *testing.B) {
	var total float64
	for i := 0; i < b.N; i++ {
		total = 0
		divs := js.Global.Get("document").Call("getElementsByTagName", "div")
		for j := 0; j < divs.Length(); j++ {
			total += divs.Index(j).Call("getBoundingClientRect").Get("top").Float()
		}
	}
	_ = total
}

I know syscall/js is documented as "Its current scope is only to allow tests to run, but not yet to provide a comprehensive API for users", but I wanted to open this issue to discuss the future. Performance is important for Go applications that need to make a lot of calls into the JavaScript world.

What is the current state of syscall/js performance, and are there known opportunities to improve it?

/cc @neelance @cherrymui @hajimehoshi

@dmitshur added the arch-wasm (WebAssembly issues), NeedsInvestigation, and Performance labels on Jun 13, 2019
@dmitshur added this to the Go1.14 milestone on Jun 13, 2019
@agnivade (Contributor)

It would also be good to benchmark with Firefox and see the results.

IIUC, you are just benchmarking DOM manipulation. And since DOM manipulation happens outside wasm anyway, it is really about the cost of the context jump from wasm land to browser land and back. In that case, I wonder whether this is even within the control of syscall/js rather than the underlying wasm engine.

It would also be good to benchmark equivalent code written in Rust and C. That may be a better apples-to-apples comparison of syscall/js performance against other languages.

@cherrymui (Member)

As @agnivade said, probably worth trying Firefox. V8 is known to have some performance problems with the Wasm code generated by the Go compiler.

@dmitshur (Contributor, Author) commented Jun 13, 2019

It would also be good to benchmark with Firefox and see the results.

Agreed. I'll do this later and share results.

IIUC, you are just benchmarking DOM manipulation. And since DOM manipulation happens outside wasm anyway, it is really about the cost of the context jump from wasm land to browser land and back. In that case, I wonder whether this is even within the control of syscall/js rather than the underlying wasm engine.

Yes. When I said syscall/js, I meant the entire performance cost of jumping from Wasm to the browser APIs and back. It's what the user sees when they use the API to interact with the JavaScript world.

It would also be good to benchmark equivalent code written in Rust and C. That may be a better apples-to-apples comparison of syscall/js performance against other languages.

Agreed, that would be good and more representative of the actual WebAssembly <-> JS call overhead. Doing that would give us more information. I won't have a chance to do this, but if someone else can, it'd be helpful.
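
As a point of reference for what "call overhead" means here, below is a minimal sketch of a benchmark that isolates the raw wasm <-> JS boundary crossing without any DOM layout work. It reuses the testing.Benchmark harness from main.go above; the choice of Date.now as a cheap JavaScript call is only illustrative.

package main

import (
	"syscall/js"
	"testing"
)

// BenchmarkBoundaryCall performs one method call per iteration on a
// pre-fetched JavaScript object, so the measured time is dominated by the
// wasm <-> JS boundary rather than by browser layout work.
func BenchmarkBoundaryCall(b *testing.B) {
	date := js.Global().Get("Date")
	for i := 0; i < b.N; i++ {
		date.Call("now")
	}
}

Comparing such a number against the DOM benchmark above would indicate how much of the ~10x gap is boundary cost versus actual DOM work.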

@eliasnaur (Contributor)

Perhaps it's not worth doing anything substantial here before something like WASI is standardized. @neelance even did a WIP implementation at #31105.

@dmitshur (Contributor, Author) commented Jun 14, 2019

I've tried the benchmark again with recent development versions of 3 browsers:

Chrome Canary
Version 77.0.3824.0 (Official Build) canary (64-bit)

    114.154496 ms/op - WebAssembly via syscall/js
     63.350000 ms/op - GopherJS via syscall/js
     11.740000 ms/op - GopherJS via github.com/gopherjs/gopherjs/js
     11.360000 ms/op - native JavaScript

Firefox Nightly
69.0a1 (2019-06-13) (64-bit)

     94.150003 ms/op - WebAssembly via syscall/js
     85.300000 ms/op - GopherJS via syscall/js
      7.695000 ms/op - GopherJS via github.com/gopherjs/gopherjs/js
      7.405000 ms/op - native JavaScript

Safari Technology Preview
Release 85 (Safari 13.0, WebKit 14608.1.28.1)

     57.249996 ms/op - WebAssembly via syscall/js
     42.866666 ms/op - GopherJS via syscall/js
      5.536666 ms/op - GopherJS via github.com/gopherjs/gopherjs/js
      5.073333 ms/op - native JavaScript

The results are pretty consistent across the 3 browsers in that doing lots of DOM queries via WebAssembly was about 10x slower than with pure JavaScript.

@hajimehoshi (Member)

Could you share the benchmark code that produces these values (ms/op)?

@agnivade (Contributor)

Thanks for the tests @dmitshur. I would have thought that after https://hacks.mozilla.org/2018/10/calls-between-javascript-and-webassembly-are-finally-fast-%F0%9F%8E%89/, the DOM access overhead in Firefox would have been reduced. And it is interesting that Safari is much faster for DOM access than Firefox.

Tests with Rust/C should give us a better idea of what exactly can be improved on the Go side. If anybody can post results for that, that would be great.

@dmitshur (Contributor, Author)

@hajimehoshi Sure. I've updated the source code in the original post.

@gopherbot

Change https://golang.org/cl/183457 mentions this issue: runtime,syscall/js: reuse wasm memory DataView

@eliasnaur (Contributor) commented Aug 1, 2019

@martisch suggested that I add a "real-world" example that demonstrates the performance hit of WebAssembly compared to running natively. A good example is the "gophers" demo from Gio (gioui.org). With modules enabled and using Go 1.13 (tip), you can build and serve the demo with the following commands:

    $ export GO111MODULE=on
    $ go run gioui.org/cmd/gio -target js gioui.org/apps/gophers -stats # for building gophers
    $ go run github.com/shurcooL/goexec 'http.ListenAndServe(":8080", http.FileServer(http.Dir("gophers")))' # for serving gophers on localhost:8080

Then, open http://localhost:8080 in a browser. The target frame time is ~16.7ms (60 Hz), but on my MacBook Pro it almost never hits the target.

Running the example natively,

     $ go run gioui.org/apps/gophers -stats

it easily hits the 60 Hz target.

In both Chrome and Firefox, the built-in profiler is a great way to see where the time goes. I've attached a screenshot of a single frame from Chrome's "Performance" tab. The frame time is 24ms.

[Screenshot: Chrome Performance profile of a single 24ms frame]

Unfortunately, the function names are all mangled ("wasm-function[]"), which makes it much harder to discern which functions take up time. (Fixed by not passing -w -s to ldflags.)

CC @cherrymui who recently optimized wasm.

@agnivade (Contributor) commented Aug 1, 2019

Thanks @eliasnaur. I have sent CL 183457, which should alleviate the DOM overhead to some extent. Would you be able to try it and check if it helps at all? Note that the CL only optimizes the DOM overhead, so if your app is heavy on computation in wasm land itself, it might not help very much.

Regarding profiles, yes, the Chrome profiler is a great tool. The wasm-function[] naming is indeed a bother (see the bug I filed with the Chrome team). Until that is fixed, may I suggest using wasmbrowsertest? It was mentioned in @johanbrandhorst's GopherCon talk. With it you can take CPU profiles for wasm just as you would for amd64: it automatically converts wasm-function[] entries to their proper names, and you can analyze the profiles directly with go tool pprof 🙂. That should give you better insight into what's going on in your app and whether there is a possibility to optimize hot functions.
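
For anyone who wants to try it, a hedged sketch of the invocation pattern (the install step and binary location follow the defaults of the time; adjust paths as needed):

    $ go get github.com/agnivade/wasmbrowsertest                                   # installs the test runner into $GOPATH/bin
    $ GOOS=js GOARCH=wasm go test -exec ~/go/bin/wasmbrowsertest -bench . ./...    # runs the wasm benchmarks inside a browser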

@eliasnaur (Contributor)

Thanks @agnivade, wasmbrowsertest is definitely useful for running benchmarks and standalone tests on wasm. However, the full drawing and rendering to a window doesn't lend itself to that model yet.

Fortunately, I figured out how to bring back the function names: the gioui.org/cmd/gio command passed -ldflags="-w -s", which as a side effect stripped the function names shown by browser debuggers. I've removed the flags, which didn't save much space anyway.
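
As a general illustration of this trade-off when building a wasm binary directly with the Go toolchain (leaving the gio tool's own flags aside), the difference looks roughly like this:

    $ GOOS=js GOARCH=wasm go build -o main.wasm .                    # keeps symbol/name information; profilers can show real function names
    $ GOOS=js GOARCH=wasm go build -ldflags="-w -s" -o main.wasm .   # -w drops DWARF, -s strips the symbol table; smaller binary, but profiles may only show wasm-function[N]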

Finally, I updated my earlier comment to add the -stats flag, which enables profiling without pressing Ctrl-P.

@agnivade (Contributor) commented Aug 2, 2019

Re: function names, it seems to be a cold-cache phenomenon as far as I understand: the first time, the entries come up as wasm-function[], and on subsequent reloads the names show up. It is hard to reproduce, though. See the bug I filed.

Anyway, I see some syscall/js.ValueCall entries in the profile, so my CL should™ be able to help. Feel free to give it a try whenever you have a chance.

@eliasnaur (Contributor)

I tested with your CL 183457, which seems to help: the frame times are lower and more consistent. This is an example of a 17ms frame (the profile above had frame times above 20ms):

[Screenshot: Chrome Performance profile of a 17ms frame with CL 183457 applied]

However, the CPU usage still seems too high. According to the profile, almost 10ms of CPU time is spent building the vector shape for the frame timer in the top right corner. The text layout code is definitely CPU heavy and unoptimized, but 10ms seems excessive.

To verify the profile, I cut out the rendering of the statistics label and redid the profile:

[Screenshot: Chrome Performance profile with the statistics label rendering removed]

Firefox also misses the frame target:

[Screenshot: Firefox Performance profile, still missing the frame target]

It looks like CPU-heavy code is faster in Firefox, whereas DOM calls are slower. Perhaps the DOM calls are only slower because Firefox's WebGL implementation is slower.

In summary, the demo looks CPU bound, which supports the claim that Go generates inefficient WebAssembly.

I'll work on preparing a benchmark that can run in wasmbrowsertest and that skips all rendering/DOM calls.

@agnivade (Contributor) commented Aug 2, 2019

Great stuff! I think we are getting somewhere. Yes, the wasm code generation could use some love. I have a couple of CLs that apply rewrite optimizations that exist for amd64 but are absent for wasm; they should go in when the tree opens.

But it would be great if you could prepare a standalone benchmark. That would allow us to compare the generated code with amd64 and see if there are obvious places for improvement.

@eliasnaur (Contributor) commented Aug 2, 2019

I split the UI update from its rendering and added a benchmark. To see the difference, I ran:

    $ go test -bench . -count 8 -cpu 1 gioui.org/apps/gophers > native.bench
    $ GOOS=js GOARCH=wasm go test -exec ~/go/bin/wasmbrowsertest -bench . -count 8 gioui.org/apps/gophers > wasm.bench
    $ benchstat native.bench wasm.bench
    name  old time/op  new time/op   delta
    UI    14.9µs ± 1%  216.5µs ±22%  +1354.01%  (p=0.000 n=7+8)

So more than 10 times slower on wasm compared to native code, at least on my 2014 MBP.

@agnivade (Contributor)

I investigated the profiles and started looking at the GOSSAFUNC output of some hot functions. The amd64 code showed lots of (MUL/DIV)SS. The wasm code, however, showed something interesting: there were lots of F32DemoteF64 and F64PromoteF32 instructions in the generated code. For example:

v395 00419 (14) F32Load	"".ctrl1+32(SP)
v395 00420 (14) F64PromoteF32
v395 00421 (14) F64Sub
v395 00422 (14) F64Const	$(0.5)

And in fact, several times, code like this was generated:

v403 00474 (213) I32WrapI64
v403 00475 (213) F32Load	""..autotmp_318-64(SP)
v403 00476 (213) F64PromoteF32
v403 00477 (213) F32DemoteF64
v403 00478 (213) F32Store	$0

This means all 32-bit FP values are being promoted to 64-bit, worked on, and then demoted back to 32-bit before being written to memory.

A quick look into WasmOps.go revealed that the 32-bit FP instructions were missing, and then I understood why: all the FP registers (F0-F15) are treated as 64-bit registers.

Now here is where my speculation begins. Since Go SSA works only with registers, these virtual registers were created to make SSA work. But in the generated code, all references to registers are rewritten to local.(get|set|tee). So theoretically it should be possible to construct another set of 32-bit registers and add 32-bit FP instructions that deal only with them, avoiding this 32-64 round trip.

@neelance / @cherrymui - Is this analysis correct? If so, how would you recommend extending the F0-F15 register set to include 32-bit registers as well? I have a local CL where I have already added the 32-bit instructions; now I just need to make these local.(get|set|tee) references work with 32-bit values.
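
For reference, a hedged sketch of how this kind of listing can be produced with a standard Go toolchain; the function name here is purely illustrative:

    $ GOSSAFUNC=quadTo GOOS=js GOARCH=wasm go build gioui.org/apps/gophers
    # writes ssa.html for the named function, showing each SSA pass and the final wasm assembly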

@neelance (Member)

The F64PromoteF32+F32DemoteF64 combination should only happen if rounding to 32 bits is actually necessary. In many cases the Go spec allows using 64-bit precision for float32 values.

Yes, it is possible to add registers for 32-bit floats, but I'm not sure how much this would affect performance, because I suspect that CPUs are not faster on 32-bit floats than on 64-bit floats (I might be wrong).

@agnivade (Contributor)

The F64PromoteF32+F32DemoteF64 combination should only happen if rounding to 32 bits is actually necessary.

I think you are referring to this:

case ssa.OpWasmLoweredRound32F:
		getValue64(s, v.Args[0])
		s.Prog(wasm.AF32DemoteF64)
		s.Prog(wasm.AF64PromoteF32)

I actually found another code path, in case ssa.OpWasmF32Store, where getValue64 generates an F64PromoteF32 and then, because of if v.Op == ssa.OpWasmF32Store {, another AF32DemoteF64 gets added. I did not look much deeper into it, though.

Yes, it is possible to add registers for 32-bit floats, but I'm not sure how much this would affect performance,

Sure, if there is no perf boost, then there is no use. But I would like to try it and check the benchmarks. What is the right way to add 32-bit registers? Just add F16-F32? Or is there another way?

@neelance (Member)

Is the F64PromoteF32 the one emitted by case ssa.OpLoadReg: of ssaGenValueOnStack? If yes, then this is indeed something we could optimize.

What is the right way to add 32-bit registers? Just add F16-F32? Or is there another way?

This is not easy to describe in a few words...

@ghost

ghost commented Aug 18, 2019

@agnivade, does your CL contain fixes for 32-bit integer instructions? It also amazed me to see such 32-64-32 int/FP conversions.

@agnivade (Contributor)

I have not sent any CL yet. And no, I have not looked into 32-bit integer instructions.

gopherbot pushed a commit that referenced this issue Aug 28, 2019
Currently, every call to mem() creates a new DataView object. This was necessary
because the wasm linear memory could grow at any time.

Now, whenever the memory grows, we make a call to the front-end. This allows us to
reuse the existing DataView object and create a new one only when the memory actually grows.

This gives us a boost in performance during DOM operations, while incurring an extra
trip to the front-end when memory grows. However, since GrowMemory calls are meant to decrease
over the runtime of an application, this is a good tradeoff in the long run.

The benchmarks were run inside a browser (Google Chrome 75.0.3770.90 (Official Build) (64-bit)).
It is hard to get stable numbers for DOM operations since the jumps make the timing very unreliable,
but overall it shows a clear gain.

name  old time/op  new time/op  delta
DOM    135µs ±26%    84µs ±10%  -37.22%  (p=0.000 n=10+9)

Go1 benchmarks do not show any noticeable degradation:
name                   old time/op    new time/op    delta
BinaryTree17              22.5s ± 0%     22.5s ± 0%     ~     (p=0.743 n=8+9)
Fannkuch11                15.1s ± 0%     15.1s ± 0%   +0.17%  (p=0.000 n=9+9)
FmtFprintfEmpty           324ns ± 1%     303ns ± 0%   -6.64%  (p=0.000 n=9+10)
FmtFprintfString          535ns ± 1%     515ns ± 0%   -3.85%  (p=0.000 n=10+10)
FmtFprintfInt             609ns ± 0%     589ns ± 0%   -3.28%  (p=0.000 n=10+10)
FmtFprintfIntInt          938ns ± 0%     920ns ± 0%   -1.92%  (p=0.000 n=9+10)
FmtFprintfPrefixedInt     950ns ± 0%     924ns ± 0%   -2.72%  (p=0.000 n=10+9)
FmtFprintfFloat          1.41µs ± 1%    1.43µs ± 0%   +1.01%  (p=0.000 n=10+10)
FmtManyArgs              3.66µs ± 1%    3.46µs ± 0%   -5.43%  (p=0.000 n=9+10)
GobDecode                38.8ms ± 1%    37.8ms ± 0%   -2.50%  (p=0.000 n=10+8)
GobEncode                26.3ms ± 1%    26.3ms ± 0%     ~     (p=0.853 n=10+10)
Gzip                      1.16s ± 1%     1.16s ± 0%   -0.37%  (p=0.008 n=10+9)
Gunzip                    210ms ± 0%     208ms ± 1%   -1.01%  (p=0.000 n=10+10)
JSONEncode               48.0ms ± 0%    48.1ms ± 1%   +0.29%  (p=0.019 n=9+9)
JSONDecode                348ms ± 1%     326ms ± 1%   -6.34%  (p=0.000 n=10+10)
Mandelbrot200            6.62ms ± 0%    6.64ms ± 0%   +0.37%  (p=0.000 n=7+9)
GoParse                  23.9ms ± 1%    24.7ms ± 1%   +2.98%  (p=0.000 n=9+9)
RegexpMatchEasy0_32       555ns ± 0%     561ns ± 0%   +1.10%  (p=0.000 n=8+10)
RegexpMatchEasy0_1K      3.94µs ± 1%    3.94µs ± 0%     ~     (p=0.906 n=9+8)
RegexpMatchEasy1_32       516ns ± 0%     524ns ± 0%   +1.51%  (p=0.000 n=9+10)
RegexpMatchEasy1_1K      4.39µs ± 1%    4.40µs ± 1%     ~     (p=0.171 n=10+10)
RegexpMatchMedium_32     25.1ns ± 0%    25.5ns ± 0%   +1.51%  (p=0.000 n=9+8)
RegexpMatchMedium_1K      196µs ± 0%     203µs ± 1%   +3.23%  (p=0.000 n=9+10)
RegexpMatchHard_32       11.2µs ± 1%    11.6µs ± 1%   +3.62%  (p=0.000 n=10+10)
RegexpMatchHard_1K        334µs ± 1%     348µs ± 1%   +4.21%  (p=0.000 n=9+10)
Revcomp                   2.39s ± 0%     2.41s ± 0%   +0.78%  (p=0.000 n=8+9)
Template                  385ms ± 1%     336ms ± 0%  -12.61%  (p=0.000 n=10+9)
TimeParse                2.18µs ± 1%    2.18µs ± 1%     ~     (p=0.424 n=10+10)
TimeFormat               2.28µs ± 1%    2.22µs ± 1%   -2.30%  (p=0.000 n=10+10)

name                   old speed      new speed      delta
GobDecode              19.8MB/s ± 1%  20.3MB/s ± 0%   +2.56%  (p=0.000 n=10+8)
GobEncode              29.1MB/s ± 1%  29.2MB/s ± 0%     ~     (p=0.810 n=10+10)
Gzip                   16.7MB/s ± 1%  16.8MB/s ± 0%   +0.37%  (p=0.007 n=10+9)
Gunzip                 92.2MB/s ± 0%  93.2MB/s ± 1%   +1.03%  (p=0.000 n=10+10)
JSONEncode             40.4MB/s ± 0%  40.3MB/s ± 1%   -0.28%  (p=0.025 n=9+9)
JSONDecode             5.58MB/s ± 1%  5.96MB/s ± 1%   +6.80%  (p=0.000 n=10+10)
GoParse                2.42MB/s ± 0%  2.35MB/s ± 1%   -2.83%  (p=0.000 n=8+9)
RegexpMatchEasy0_32    57.7MB/s ± 0%  57.0MB/s ± 0%   -1.09%  (p=0.000 n=8+10)
RegexpMatchEasy0_1K     260MB/s ± 1%   260MB/s ± 0%     ~     (p=0.963 n=9+8)
RegexpMatchEasy1_32    62.1MB/s ± 0%  61.1MB/s ± 0%   -1.53%  (p=0.000 n=10+10)
RegexpMatchEasy1_1K     233MB/s ± 1%   233MB/s ± 1%     ~     (p=0.190 n=10+10)
RegexpMatchMedium_32   39.8MB/s ± 0%  39.1MB/s ± 1%   -1.74%  (p=0.000 n=9+10)
RegexpMatchMedium_1K   5.21MB/s ± 0%  5.05MB/s ± 1%   -3.09%  (p=0.000 n=9+10)
RegexpMatchHard_32     2.86MB/s ± 1%  2.76MB/s ± 1%   -3.43%  (p=0.000 n=10+10)
RegexpMatchHard_1K     3.06MB/s ± 1%  2.94MB/s ± 1%   -4.06%  (p=0.000 n=9+10)
Revcomp                 106MB/s ± 0%   105MB/s ± 0%   -0.77%  (p=0.000 n=8+9)
Template               5.04MB/s ± 1%  5.77MB/s ± 0%  +14.48%  (p=0.000 n=10+9)

Updates #32591

Change-Id: Id567e14a788e359248b2129ef1cf0adc8cc4ab7f
Reviewed-on: https://go-review.googlesource.com/c/go/+/183457
Run-TryBot: Agniva De Sarker <agniva.quicksilver@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Richard Musiol <neelance@gmail.com>
@rsc modified the milestones: Go1.14, Backlog on Oct 9, 2019
@BenLubar

In my experience, a large portion of the time spent in trivial js.Value.*() calls is decoding the UTF-8 encoded strings, such as the member name.

As of May 2021, in cases like function calls, it might be significantly faster to cache the result of this.Get("functionName").Call("bind", this) and then use Invoke instead of Call.
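
A minimal sketch of that caching pattern, assuming document.getElementById as the function being bound (the names here are illustrative, not part of the original comment):

package main

import "syscall/js"

// Pre-bind the function once so its member name only has to be decoded and
// looked up a single time; later calls go through Invoke on the cached value.
var (
	document       = js.Global().Get("document")
	getElementByID = document.Get("getElementById").Call("bind", document)
)

func lookupByID(id string) js.Value {
	// Equivalent to document.Call("getElementById", id), but without the
	// per-call decode and lookup of the "getElementById" member name.
	return getElementByID.Invoke(id)
}

Whether this wins in practice depends on how hot the call site is; the per-call saving is the repeated member-name handling that Call performs.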
