
syscall/js: increase performance of Call, Invoke, and New by not allowing new slices to escape onto the heap #39740

Closed
finnbear opened this issue Jun 21, 2020 · 69 comments
Labels: arch-wasm (WebAssembly issues), NeedsFix (The path to resolution is known, but the work has not been done), Performance

@finnbear

finnbear commented Jun 21, 2020

What version of Go are you using (go version)?

$ go version
go version go1.14.4 linux/amd64

Does this issue reproduce with the latest release?

Yes, in go1.15beta1

What did you do?

I used js.Call, js.New, and js.Invoke with multiple arguments, hundreds of times per frame at 60 frames per second, to implement my WebGL game.

What did you expect to see?

The above functions are absolutely essential for getting work done in js/wasm, as they are analogous to a simple function call. They should be optimized as such: no excess Go heap memory should be allocated, and no extra Go garbage collection should be triggered. After implementing a hack to fix the issue I describe below, here is a CPU profile of memory allocation:
[screenshot: CPU profile of memory allocation after the hack]

Importantly, memory allocation is 8% of the total time and does not include makeArgs allocating any slices that end up on the heap.

What did you see instead?

2+ new slices allocated on the heap per call to one of the above three functions, as the first few lines of the makeArgs function...

go/src/syscall/js/js.go

Lines 361 to 363 in 60f7876

func makeArgs(args []interface{}) ([]Value, []ref) {
argVals := make([]Value, len(args))
argRefs := make([]ref, len(args))

...possibly are assumed to escape to the heap via...

go/src/syscall/js/js.go

Lines 405 to 406 in 60f7876

func valueCall(v ref, m string, args []ref) (ref, bool)

...or valueNew/valueInvoke, or if Go's escape analysis optimization doesn't work on the caller's stack.

A CPU profile shows the significant penalty associated with allocating too many slices (mallocgc now accounts for over 20%), and that doesn't even include the extra garbage collector load that claims a few FPS.
[screenshot: CPU profile without the hack, showing mallocgc at over 20%]

I thought of adding //go:noescape before each of the above and potentially other functions implemented in JavaScript, but I didn't get any improvement right away. Only my hack to makeArgs worked consistently:

const maxArgs = 9 // A better hack/fix would grow the slices automatically
var argVals = make([]Value, 0, maxArgs)
var argRefs = make([]ref, 0, maxArgs)

func makeArgs(args []interface{}) ([]Value, []ref) {
	for i := range argVals {
		argVals[i] = Value{}
	}
	for i := range argRefs { // I don't think this is strictly necessary
		argRefs[i] = 0
	}

	argVals = argVals[:len(args)]
	argRefs = argRefs[:len(args)]

	for i, arg := range args {
		v := ValueOf(arg)
		argVals[i] = v
		argRefs[i] = v.ref
	}
	return argVals, argRefs
}

Another potential solution, building on the above hack, would be to make makeArgs take destination slices for the values/refs as arguments (instead of returning new slices). The caller could then handle the memory efficiently. Maybe this could help Go's escape analysis, in conjunction with //go:noescape.
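
For illustration, that variant might look roughly like this (a sketch only; fillArgs is a hypothetical name, and Value, ref, and ValueOf are the package's internal types):

func fillArgs(args []interface{}, argVals []Value, argRefs []ref) {
	// The caller owns the destination slices, so nothing here forces a heap allocation.
	for i, arg := range args {
		v := ValueOf(arg)
		argVals[i] = v
		argRefs[i] = v.ref
	}
}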

Side note: Ideally, a solution would address the heap allocation of an interface for certain arguments. Hundreds of thousands get allocated before the garbage collector runs, causing the garbage collector to take 12-20ms.

@cagedmantis cagedmantis added this to the Backlog milestone Jun 24, 2020
@cagedmantis cagedmantis added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance labels Jun 24, 2020
@cagedmantis
Contributor

/cc @neelance @cherrymui @hajimehoshi

@cagedmantis cagedmantis changed the title syscall/js: Increase performance of Call, Invoke, and New by not allowing new slices to escape onto the heap syscall/js: increase performance of Call, Invoke, and New by not allowing new slices to escape onto the heap Jun 24, 2020
@neelance
Member

@cherrymui Would this be a good case for sync.Pool? Would it be okay to restrict the maximum number of arguments so we can have a pool of slices with a fixed size?
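
A minimal sketch of what such a pooled makeArgs could look like (assuming the package's internal Value, ref, and ValueOf, an import of "sync", and a fallback to fresh slices above an arbitrary size cap; names are hypothetical):

const maxPooledArgs = 16

type argBuf struct {
	vals [maxPooledArgs]Value
	refs [maxPooledArgs]ref
}

var argPool = sync.Pool{
	New: func() interface{} { return new(argBuf) },
}

// makeArgsPooled fills a pooled buffer. The caller must argPool.Put(buf) once
// the JS call has completed and the Values are no longer needed; a nil buf
// means the slices were freshly heap-allocated because args was too long.
func makeArgsPooled(args []interface{}) (buf *argBuf, argVals []Value, argRefs []ref) {
	if len(args) > maxPooledArgs {
		argVals = make([]Value, len(args))
		argRefs = make([]ref, len(args))
	} else {
		buf = argPool.Get().(*argBuf)
		argVals = buf.vals[:len(args)]
		argRefs = buf.refs[:len(args)]
	}
	for i, arg := range args {
		v := ValueOf(arg)
		argVals[i] = v
		argRefs[i] = v.ref
	}
	return buf, argVals, argRefs
}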

finnbear added a commit to finnbear/go that referenced this issue Aug 13, 2020
finnbear added a commit to finnbear/go that referenced this issue Aug 13, 2020
@finnbear
Author

finnbear commented Aug 13, 2020

I've formulated 3 fully complete solutions to the problem:

  1. Using sync.Pool as suggested by @neelance: master...finnbear:makeargs-syncpool
  • Performance: ~50% improvement to time spent in malloc related to makeArgs
  • Garbage collection: Some (mainly interfaces)
  • Benefits: Using sync.Pool is standard, safe, and future-proof (multithreaded WASM)
  • Issues: Interface allocations undo half of the performance gain of not allocating slices; doesn't help performance for more than X arguments (X = 16 in my example)
  2. Using a slice as a pool: master...finnbear:makeargs-slicepool
  • Performance: ~99.5% improvement to time spent in malloc related to makeArgs
  • Garbage collection: Very little
  • Benefits: Near-perfect performance and is safe
  • Issues: Doesn't help performance for more than X arguments (X = 16 in my example), and not future-proof if multiple threads are to be supported
  3. Using a single slice (like my original hack): master...finnbear:makeargs-singleslice
  • Performance: ~99.999% improvement to time spent in malloc related to makeArgs
  • Garbage collection: Almost none
  • Benefits: Perfect performance, simple, probably safe
  • Issues: I logically came to the conclusion that this is 100% safe, but it may not seem that way (in the case of re-entrant recursive calls). It is also not future-proof if multiple threads are to be supported.
    • In my understanding, JS/WASM uses a single, unified call tree
    • I don't think it's possible for execution to re-enter the code at a position that is unsafe

My suggestion is for you to:

  1. Investigate and implement solution 3
  2. Investigate and implement solution 2
  3. Try to get //go:noescape to work (I could not)
  4. Investigate and implement solution 1
  5. Do nothing

For what it's worth, I have validated all of the above solutions in my WebAssembly game to get an idea of their performance, and observed no crashing/panicking/issues.

@neelance @cherrymui @hajimehoshi

@finnbear
Author

finnbear commented Jan 24, 2021

I've been looking for a solution to this ever since I filed the issue, and ended up with a patch that solves the problem (both the slice problem and the interface-escaping-to-the-heap problem) for me. Specifically, heap allocations per frame went from ~1500-40,000 (almost all related to syscall/js) to ~60-300 (which are unrelated to syscall/js). Note that this makes ValueOf unsafe to call except during Call, Invoke, and New. If this were being integrated into the standard library, one solution would be to leave ValueOf as-is and add a second, unexported version with the noescape change.

patch -u $GOROOT/src/syscall/js/js.go -i see_below.patch
--- js.go.dist	2021-01-23 15:50:00.931644132 -0800
+++ js.go	2021-01-23 17:31:54.938784167 -0800
@@ -145,6 +145,11 @@
 	return valueGlobal
 }
 
+func noescape(foo interface{}) interface{} {
+	bar := uintptr(unsafe.Pointer(&foo))
+	return *((*interface{})(unsafe.Pointer(bar ^ 0)))
+}
+
 // ValueOf returns x as a JavaScript value:
 //
 //  | Go                     | JavaScript             |
@@ -159,7 +164,8 @@
 //  | map[string]interface{} | new object             |
 //
 // Panics if x is not one of the expected types.
-func ValueOf(x interface{}) Value {
+func ValueOf(a interface{}) Value {
+	x := noescape(a)
 	switch x := x.(type) {
 	case Value: // should precede Wrapper to avoid a loop
 		return x
@@ -215,11 +221,12 @@
 			o.Set(k, v)
 		}
 		return o
-	default:
-		panic("ValueOf: invalid value")
 	}
+	runtime.KeepAlive(a)
+	panic("ValueOf: invalid value")
 }
 
+//go:noescape
 func stringVal(x string) ref
 
 // Type represents the JavaScript type of a Value.
@@ -303,6 +310,7 @@
 	return r
 }
 
+//go:noescape
 func valueGet(v ref, p string) ref
 
 // Set sets the JavaScript property p of value v to ValueOf(x).
@@ -317,6 +325,7 @@
 	runtime.KeepAlive(xv)
 }
 
+//go:noescape
 func valueSet(v ref, p string, x ref)
 
 // Delete deletes the JavaScript property p of value v.
@@ -329,6 +338,7 @@
 	runtime.KeepAlive(v)
 }
 
+//go:noescape
 func valueDelete(v ref, p string)
 
 // Index returns JavaScript index i of value v.
@@ -342,6 +352,7 @@
 	return r
 }
 
+//go:noescape
 func valueIndex(v ref, i int) ref
 
 // SetIndex sets the JavaScript index i of value v to ValueOf(x).
@@ -356,11 +367,24 @@
 	runtime.KeepAlive(xv)
 }
 
+//go:noescape
 func valueSetIndex(v ref, i int, x ref)
 
-func makeArgs(args []interface{}) ([]Value, []ref) {
-	argVals := make([]Value, len(args))
-	argRefs := make([]ref, len(args))
+var (
+	argValsSlice []Value
+	argRefsSlice []ref
+)
+
+func makeArgs(args []interface{}) (argVals []Value, argRefs []ref) {
+	for i, _ := range argValsSlice {
+		argValsSlice[i] = Value{}
+	}
+	if len(args) > cap(argValsSlice) {
+		argValsSlice = make([]Value, 0, len(args))
+		argRefsSlice = make([]ref, 0, len(args))
+	}
+	argVals = argValsSlice[:len(args)]
+	argRefs = argRefsSlice[:len(args)]
 	for i, arg := range args {
 		v := ValueOf(arg)
 		argVals[i] = v
@@ -380,6 +404,7 @@
 	return r
 }
 
+//go:noescape
 func valueLength(v ref) int
 
 // Call does a JavaScript call to the method m of value v with the given arguments.
@@ -402,6 +427,7 @@
 	return makeValue(res)
 }
 
+//go:noescape
 func valueCall(v ref, m string, args []ref) (ref, bool)
 
 // Invoke does a JavaScript call of the value v with the given arguments.
@@ -421,6 +447,7 @@
 	return makeValue(res)
 }
 
+//go:noescape
 func valueInvoke(v ref, args []ref) (ref, bool)
 
 // New uses JavaScript's "new" operator with value v as constructor and the given arguments.
@@ -440,6 +467,7 @@
 	return makeValue(res)
 }
 
+//go:noescape
 func valueNew(v ref, args []ref) (ref, bool)
 
 func (v Value) isNumber() bool {
@@ -539,8 +567,10 @@
 	return string(b)
 }
 
+//go:noescape
 func valuePrepareString(v ref) (ref, int)
 
+//go:noescape
 func valueLoadString(v ref, b []byte)
 
 // InstanceOf reports whether v is an instance of type t according to JavaScript's instanceof operator.
@@ -551,6 +581,7 @@
 	return r
 }
 
+//go:noescape
 func valueInstanceOf(v ref, t ref) bool
 
 // A ValueError occurs when a Value method is invoked on
@@ -577,6 +608,7 @@
 	return n
 }
 
+//go:noescape
 func copyBytesToGo(dst []byte, src ref) (int, bool)
 
 // CopyBytesToJS copies bytes from src to dst.
@@ -591,4 +623,5 @@
 	return n
 }
 
+//go:noescape
 func copyBytesToJS(dst ref, src []byte) (int, bool)

@hajimehoshi
Member

I'm excited about this suggestion and change, @finnbear!

@hajimehoshi
Member

Issues: I logically came to the conclusion that this is 100% safe but it may not seem that way (in the case of re-entrant recursive calls). It is also not future proof if multiple threads are to be supported.

I believe so too: the argument slice is used only when a callback is invoked from the Go side, and this invoking should never happen recursively. @neelance what do you think?

I don't have any insights about go:noescape, but do we really need them?

@neelance
Member

The documentation of go:noescape says:

The //go:noescape directive must be followed by a function declaration without a body (meaning that the function has an implementation not written in Go). It specifies that the function does not allow any of the pointers passed as arguments to escape into the heap or into the values returned from the function. This information can be used during the compiler's escape analysis of Go code calling the function.

If I understand this correctly, then //go:noescape makes no difference for func valueIndex(v ref, i int) ref since neither ref nor int are pointers.

For a case like func copyBytesToGo(dst []byte, src ref) (int, bool) it should be fine to apply //go:noescape because the pointer of []byte is only accessed inside of copyBytesToGo and not stored anywhere else.

It might also apply to string parameters.

I don't understand what's going on with x := noescape(a) but my gut feeling is bad. 😄
Please investigate which lines exactly are responsible for moving the value to the heap.

@hajimehoshi
Member

I was still wondering if we really need //go:noescape if 3. (using a single slice) could work. Or, am I missing something?

@finnbear
Author

finnbear commented Jan 25, 2021

I'm excited about this suggestion and change, @finnbear!

Me too!

I don't have any insights about go:noescape, but do we really need them?

Yes, at minimum for valueNew, valueCall, valueInvoke, and copyBytesToJS, because those are called quite often in my use case and take slices. Without this directive, those functions might retain pointers to their arguments for all the escape analysis knows, meaning that those arguments (slices and the contents of those slices) must be stored on the heap.

If I understand this correctly, then //go:noescape makes no difference for func valueIndex(v ref, i int) ref since neither ref nor int are pointers.

You're 100% right. I just added it to all functions on the basis that none should cause pointers to escape, but it would have no effect when there are no pointers involved. It should be applied in the minimal number of cases though, so that one should be removed.

I believe so too: the argument slice is used only when a callback is invoked from the Go side, and this invoking should never happen recursively.

To elaborate on this, I think it is safe even if the call is re-entrant. The chronology of events would look like this:

  1. js.Global().Call("foo", 1)
  2. makeArgs sets global slice to [1]
  3. call to valueCall
  4. loadSliceOfValues transfers [1] to a JS copy
  5. call to user function "foo" with the JS copy
  6. user function calls Go function
  7. Go function calls js.Global().Call("bar", 2)
  8. makeArgs sets global slice to [2]
  9. call to valueCall
  10. loadSliceOfValues transfers [2] to a JS copy
  11. call to user function "bar" with the JS copy

I might be missing something but I don't think there is an issue with the global slice approach (as long as wasm/js is single threaded). Also, maybe //go:nosplit could be used to guarantee no weird edge case happens?

For a case like func copyBytesToGo(dst []byte, src ref) (int, bool) it should be fine to apply //go:noescape because the pointer of []byte is only accessed inside of copyBytesToGo and not stored anywhere else.

It might also apply to string parameters.

Yes, and yes.

I don't understand what's going on with x := noescape(a) but my gut feeling is bad. 😄
Please investigate which lines exactly are responsible for moving the value to the heap.

Yeah, I can see why that looks bad.

What it does: The noescape function disassociates the parameter a from the variable x from an escape analysis perspective. Despite the fact that the parameter can escape to the heap (such as in the case Value), it can only do so for the duration that it is on the caller's stack (if called from New, Call, or Invoke).

How it works: XOR-ing a pointer type with 0 doesn't change the pointer, but Go can't track that the pointer is the same.
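
For comparison, the Go runtime uses essentially the same idiom internally, just on unsafe.Pointer rather than interface{} (a minimal reproduction of the helper in runtime/stubs.go; requires importing unsafe):

// noescape hides a pointer from escape analysis. The XOR with 0 is a no-op
// at run time, but the compiler can no longer prove that the result aliases p.
//go:nosplit
func noescape(p unsafe.Pointer) unsafe.Pointer {
	x := uintptr(p)
	return unsafe.Pointer(x ^ 0)
}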

Why it's needed: First, we need a command that shows us when and why things escape in src/syscall/js

Basic 'does it escape' check: GOOS=js GOARCH=wasm go build -gcflags="-m"

Running this, it can be determined that too many variables called args, as well as the parameter to ValueOf, escape to the heap.

Advanced 'why does it escape' check: GOOS=js GOARCH=wasm go build -gcflags="-m -m"

Running this, it can be determined that there are two ways things escape.

  1. Through functions implemented in JS where Go doesn't know any better, hence //go:noescape
  2. Through ValueOf. It seems like the three cases responsible for this are Value (direct escape), Wrapper (an interface/virtual function call, through which Go can't know whether the pointer escapes), and string (the JS-implemented stringVal, which may or may not be fixable with //go:noescape there)

How it could be made better/more "safe":

  • The new ValueOf is unsafe to call if its return value is stored somewhere, so we can't let users get their hands on it. This is something I ignored for my project, since I am the only user and I know what not to do. One solution would be to create a copy of ValueOf that lets its parameter escape, and an unexported version that doesn't.
  • Maybe the noescape() function could be moved into New, Call, and Invoke. This would fix the above issue of needing to duplicate ValueOf, but I don't think it would work unless ValueOf were inlined into New, Call, and Invoke.
  • Moving runtime.KeepAlive() from the end of the internal version of ValueOf to the end of Call, Invoke, and New.
  • I might be missing an easier way to get the same effect. Maybe some compiler developers/implementers have a better way.

I was still wondering if we really need //go:noescape if 3. (using a single slice) could work. Or, am I missing something?

There are two types of heap allocations, and slices are only one of them. In my original proposal, I didn't propose a way to eliminate the second type of allocation: interfaces. As long as the slices contain pointers related to interfaces, and the ValueOf function goes unchanged, any parameters passed to Call, Invoke, and New will be heap allocated. Even word-sized values (int64, float64, etc.) and sub-word-sized values (byte, etc.) are affected, due to the lack of a compiler optimization (these types are common in a WebGL app). Getting rid of slice allocations is a good and easy first step, but both are issues. (Note: another possibility for reducing interface allocations is to support pointer parameters, i.e. both *float64 and float64. This would allow the caller to maintain a buffer of interfaces of those pointers, and reassign them before passing them to Call, Invoke, New, and ValueOf. This is 'a' solution but not something I would prefer as the caller.)
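
To make that parenthetical concrete, here is a purely hypothetical caller-side sketch of the pointer-parameter idea (Call/ValueOf do not accept *float64 today, and drawFrame/ctx are stand-ins); the point is that the interface boxing happens once, not on every call:

var (
	x, y float64
	// Boxed into interface{} once and reused for every call.
	xArg interface{} = &x
	yArg interface{} = &y
)

func drawFrame(ctx js.Value) {
	// Update the values through the pointers; no new boxing per frame.
	x, y = 100, 200
	ctx.Call("moveTo", xArg, yArg) // hypothetical: would require *float64 support in ValueOf
}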

I'd be happy to investigate more (how the patch could be made better, safer, and more broadly applicable) and answer other questions, as I want to get something like this into the standard library so that something as basic as a function call doesn't allocate. Also, I've deployed the patch to my game and will continue to monitor it (so far no issues, bearing in mind I know not to call the new ValueOf function anywhere else. I already mentioned how much allocations improved, according to runtime.ReadMemStats).

Thanks @hajimehoshi and @neelance for developing wasm/js and considering this proposal as a way to improve it!

@neelance
Member

Okay, I think I slowly manage to wrap my head around this...

The x := noescape(a) is a hack; I think it is very unlikely for this to go upstream, as it can easily have bugs. We should rather aim for a variant of ValueOf(x interface{}) Value with which Go can properly determine that x does not escape. This variant should probably return ref instead of Value.

You have mentioned the three problematic cases:

  • case Value: Returning x.ref should easily solve this.
  • case string: This actually needs to allocate a new ref which needs garbage collection. However, we can explicitly garbage collect after valueCall so we don't need the implicit garbage collection of makeValue and the gcPtr field.
  • case Wrapper: I am not sure here. An interface value is not mutable, so maybe we can just copy it somehow? Maybe it would suffice to do y := x and then y.JSValue()?
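
For illustration (not the actual change), such an unexported, ref-returning variant for use by Call/Invoke/New might look roughly like this, assuming the package's internal ref, stringVal, and valueNull; refOf is a hypothetical name and only a few cases are shown:

func refOf(x interface{}) ref {
	switch x := x.(type) {
	case Value:
		return x.ref // a plain integer, so nothing escapes
	case nil:
		return valueNull.ref
	case string:
		// Allocates a new JS-side ref, which would be released explicitly
		// after the call instead of relying on makeValue's finalizer.
		return stringVal(x)
	default:
		panic("refOf: invalid value")
	}
}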

@finnbear
Author

finnbear commented Jan 25, 2021

@neelance

The x := noescape(a) is a hack

You're not wrong 😉

I think it is very unlikely for this to go upstream as it can easily have bugs

I see what you mean. However, I have read runtime source code and seen similar hacks in performance-critical segments like facilitating function calls and garbage collection. If your other idea doesn't work, it might be worth consulting other Go developers to see whether the correctness of a solution can be proven despite its use of a hack.

This variant should probably return ref instead of Value.

I did not consider that. Sounds good, since ref isn't a pointer.

we can explicitly garbage collect after valueCall

I hope you don't mean calling runtime.GC() on every single valueCall. That would "solve" the allocation while killing the performance. I think I understand what you mean: you can explicitly collect temporary refs created for a single function call (by neglecting to do this, I was leaking strings 😢)

An interface value is not mutable, so maybe we can just copy it somehow?

You're right that the interface's pointer is immutable. However, the relevant issue is where that pointer points (heap vs. off-heap). Escape analysis doesn't really know whether the JSValue implementation of an arbitrary Wrapper type looks like this:

type BadWrapper struct {
    Value js.Value
}

var escapeRoute *BadWrapper

// Implements js.Wrapper
func (w *BadWrapper) JSValue() js.Value {
    escapeRoute = w
    return w.Value
}

so it assumes the worst. This is why a solution specific to Call, Invoke, and New may be warranted, where the argument is known to be on the caller's stack (unless Wrapper were changed to return a ref and not a Value).

Side note: As far as I know, the only type I use that implements js.Wrapper is js.Func. If Func were used instead of Wrapper in the switch statement, maybe Go could figure out that the receiver is not maliciously escaping the pointer.

Maybe it would suffice to do y := x and then y.JSValue()?

I doubt this will work (I think GC will see right through it) but I can test it.

@neelance
Member

I doubt this will work (I think GC will see right through it) but I can test it.

Right, it won't work. I thought the problematic pointer was the interface itself, but it is the value that the interface contains.

@neelance
Member

The only option I see is to get rid of Wrapper...

I would be okay with the tradeoff. syscall/js is a low level package, so performance is more important than ease of use.

@neelance
Member

@dennwc What's your take on this? You contributed http://golang.org/cl/141644 which we would need to revert.

@dennwc
Contributor

dennwc commented Jan 26, 2021

I'm all for performance here, so feel free to revert.

That's unfortunate though, since it was quite useful to allow custom objects to be passed to native syscall/js functions. Without it high-level libraries had to wrap every method of syscall/js. This might be fine if performance is at stake here.

Is there any way we can solve this? E.g. make a private interface method on Wrapper, so that only syscall/js.Value can implement it, but still allow it to be embedded into a custom type. Would that solve the issue?

@finnbear
Author

finnbear commented Jan 26, 2021

Is there any way we can solve this? E.g. make a private interface method on Wrapper, so only the syscall/js.Value can implement it, but still allow to embed it into custom type. Would it solve an issue?

Let's test it!

As a baseline, with no syscall/js optimizations, my game does 708 allocations per frame (I am benchmarking in a very constant state in game, so the numbers will be comparable later. Since I'm testing on a development site, I don't have access to the game state that generated 40,000 allocations per frame, but any improvement that follows should be more than proportional).

I'll start by applying //go:noescape to stringVal, because without that, we won't get anywhere with ValueOf's parameter x not escaping to the heap.

Here's the optimization output before your idea:

./js.go:162:14: parameter x leaks to {heap} with derefs=0:
./js.go:162:14:   flow: {temp} = x:
./js.go:162:14:   flow: x = {temp}:
./js.go:162:14:     from case Wrapper: return x.JSValue() (switch case) at ./js.go:166:2
./js.go:162:14:   flow: {heap} = x:
./js.go:162:14:     from x.JSValue() (call parameter) at ./js.go:167:19

And here it is after simply un-exporting the Wrapper JS value method:

./js.go:162:14: parameter x leaks to ~r1 with derefs=1:
./js.go:162:14:   flow: {temp} = x:
./js.go:162:14:   flow: x = *{temp}:
./js.go:162:14:     from case Value: return x (switch case) at ./js.go:164:2
./js.go:162:14:   flow: ~r1 = x:
./js.go:162:14:     from return x (return) at ./js.go:165:3 // case Value: return x

In my understanding, ~r1 is the return value, which is far better than the heap. Now the question is: is this still sufficient to get the same level of allocation reduction as my earlier hack approach?

I'll continue testing by applying my makeArgs single-slice patch to get rid of slice allocations. My game now runs at 336 allocations per frame.

Now I'll //go:noescape all JS-implemented functions that take pointers. Still 336 allocations per frame. Here's the optimization output for Value.New:

./js.go:447:20: parameter args leaks to {heap} with derefs=1:
./js.go:447:20:   flow: args = args:
./js.go:447:20:     from makeArgs(args) (call parameter) at ./js.go:448:30
./js.go:447:20:   flow: {temp} = args:
./js.go:447:20:   flow: arg = *{temp}:
./js.go:447:20:     from for loop (range-deref) at ./js.go:380:16
./js.go:447:20:   flow: x = arg:
./js.go:447:20:     from ValueOf(arg) (call parameter) at ./js.go:381:15
./js.go:447:20:   flow: {temp} = x:
./js.go:447:20:   flow: x = {temp}:
./js.go:447:20:     from case Wrapper: return x.jsValue() (switch case) at ./js.go:166:2
./js.go:447:20:   flow: {heap} = x:
./js.go:447:20:     from x.jsValue() (call parameter) at ./js.go:167:19

So maybe Wrapper is a problem after all... let's try removing it entirely and putting the concrete type Func in the switch case. Wow... only 59 allocations per frame. That's about how many allocations go on outside of syscall/js. And we didn't even have to use a hack.

Unfortunately, unless there is something I'm missing or you are willing to use the hack method, getting rid of Wrapper is what is required.

Here are the patches that ended up eliminating the syscall/js allocations:
js.patch.txt
func.patch.txt

Edit: There is some weirdness going on. Although allocations are definitely at 59 per frame, the optimization output from Value.New still says args escapes to the heap:

./js.go:436:20: parameter args leaks to {heap} with derefs=2:
./js.go:436:20:   flow: args = args:
./js.go:436:20:     from makeArgs(args) (call parameter) at ./js.go:437:30
./js.go:436:20:   flow: {temp} = args:
./js.go:436:20:   flow: arg = *{temp}:
./js.go:436:20:     from for loop (range-deref) at ./js.go:369:16
./js.go:436:20:   flow: x = arg:
./js.go:436:20:     from ValueOf(arg) (call parameter) at ./js.go:370:15
./js.go:436:20:   flow: {temp} = x:
./js.go:436:20:   flow: x = *{temp}:
./js.go:436:20:     from case Func: return x.Value (switch case) at ./js.go:155:2
./js.go:436:20:   flow: ~r1 = x:
./js.go:436:20:     from x.Value (dot) at ./js.go:156:11
./js.go:436:20:     from return x.Value (return) at ./js.go:156:3
./js.go:436:20:   flow: v = ~r1:
./js.go:436:20:     from v := ValueOf(arg) (assign) at ./js.go:370:5
./js.go:436:20:   flow: {heap} = v:
./js.go:436:20:     from argVals[i] = v (assign) at ./js.go:371:14

And Invoke, which I call a lot, looks like this (the escape analysis also claims args escapes to the heap):

./js.go:416:23: parameter args leaks to {heap} with derefs=2:
./js.go:416:23:   flow: {heap} = **args:
./js.go:416:23:     from makeArgs(args) (call parameter) at ./js.go:417:30

Looks like they both "escape" via my single slice optimization in makeArgs.

There are a few possibilities:

  1. The compiler figures out that the arguments don't escape when compiling them into a larger app like my game
  2. There are multiple types of escaping to the heap and some don't cause excessive memory allocations

@neelance @hajimehoshi @dennwc

@neelance
Member

Unfortunately, unless there is something I'm missing or you are willing to use the hack method, getting rid of Wrapper is what is required.

The "hack" is not a proper solution for upstream, right? Because as far as I can see, the hack is not a workaround to a shortcoming of the compiler. The complier is actually right that Wrapper can escape, so circumventing this check with a hack means that it can then break at runtime.

Removing Wrapper seems like the only way forward. I believe that this change needs to go through the proposal process. @finnbear Would you mind creating a small proposal for the removal of Wrapper with a summary of our discussion and post it as a new issue?

@finnbear
Author

finnbear commented Jan 28, 2021

The compiler is actually right that Wrapper can escape

If we un-exported it and implemented it in a way that it can't, that might not be the case. However, from what I have tried so far, even though it doesn't escape to the heap, the allocations still happen somewhere in the call stack.

The "hack" is not a proper solution for upstream, right?

Not as written, no. It might be able to be adapted into a proper solution though.

Would you mind creating a small proposal for the removal of Wrapper

I would be happy to (after I spend about 30 more minutes trying different combinations of ways to keep Wrapper and eliminate allocations, just to make sure removing it is the only way)!

@finnbear
Author

@neelance proposal submitted: #44006

Note that, as part of the proposal, I propose yet another way to solve this very issue. It intentionally makes no effort to make the argument/ref slices global in any way. Here's a snippet:

makeArgs gets replaced with:

func storeArgs(args []interface{}, argValsDst []Value, argRefsDst []ref) {
	for i, arg := range args {
		v := ValueOf(arg)
		argValsDst[i] = v
		argRefsDst[i] = v.ref
	}
}

// This function must be inlined
func makeArgs(args []interface{}) (argVals []Value, argRefs []ref) {
	const maxStackArgs = 16
	if len(args) <= maxStackArgs {
		// Allocated on the stack
		argVals = make([]Value, len(args), maxStackArgs)
		argRefs = make([]ref, len(args), maxStackArgs)
	} else {
		// Allocated on the heap
		argVals = make([]Value, len(args))
		argRefs = make([]ref, len(args))
	}
	return
}

Calls to makeArgs get replaced with:

argVals, argRefs := makeArgs(args)
storeArgs(args, argVals, argRefs)
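
For context, a Call built on that pair might look roughly like this (a sketch of the direction only, not the final CL; the receiver type check and error handling of the real implementation are abbreviated):

func (v Value) Call(m string, args ...interface{}) Value {
	// Both slices stay on the stack when len(args) <= 16 and makeArgs inlines.
	argVals, argRefs := makeArgs(args)
	storeArgs(args, argVals, argRefs)
	res, ok := valueCall(v.ref, m, argRefs)
	runtime.KeepAlive(v)
	runtime.KeepAlive(argVals)
	if !ok {
		panic(Error{makeValue(res)})
	}
	return makeValue(res)
}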

finnbear referenced this issue in BenLubar-PR/golang-go Aug 9, 2021
The existence of the Wrapper interface causes unavoidable heap
allocations in syscall/js.ValueOf. See the linked issue for a full
explanation.

Fixes golang#44006.
@neelance
Member

@finnbear I just looked into your patch some more. The makeArgs/storeArgs solution is really clever. 🎯

Do the //go:noescape on valuePrepareString and valueLoadString make any difference?

@abrander

abrander commented Dec 21, 2021

I'm not sure if this is the correct place to raise this issue; forgive me if it's not.

I can understand the reasoning for removing the Wrapper case from ValueOf. But why remove the interface itself? It's incredibly useful for types embedding js.Value.

The Wrapper type makes it possible to do stuff like this:

package a

import (
	"syscall/js"
)

func DoStuffInJavascriptLand(obj js.Wrapper) {
	js.Global().Call("some_js_function", obj.JSValue())
}
package b

import (
	"syscall/js"
)

type Thingie struct {
	js.Value
}


type AnotherThingie struct {
	js.Value
}

func NewThingie() *Thingie {
	return &Thingie{Value: js.Global().New("something")}
}

func NewAnotherThingie() *AnotherThingie {
	return &AnotherThingie{Value: js.Global().New("something")}
}
package main

import (
	"a"
	"b"
)

func main() {
	t1 := b.NewThingie()
	t2 := b.NewAnotherThingie()

	a.DoStuffInJavascriptLand(t1)
	a.DoStuffInJavascriptLand(t2)
}

I would imagine we will see a lot of type Wrapper interface {JSValue() js.Value} scattered across packages after the removal of js.Wrapper.

Did I miss something?

@finnbear
Author

But why remove the interface itself? It's incredible useful for types embedding js.Value.

The reason is not a technical limitation, because as you point out, as long as the case is removed from ValueOf, the performance issue is resolved.

I think it has more to do with how confusing it would have been to keep the Wrapper interface without allowing it to work in Call, New, and Invoke. At that point, it would be more of a suggestion of how to organize code, without making the code any simpler.

Also, code using Wrapper will now refuse to compile, alerting the programmer that they have to do some refactoring. If Wrapper was kept, the code would compile but panic at runtime, which seems undesirable.

I would imagine we will see a lot of type Wrapper interface {JSValue() js.Value} scattered across packages after the removal of js.Wrapper.

You might be right, and if this happens, you could write a proposal to re-instate Wrapper.

However, if you briefly assume that Wrapper never existed and was never part of the Call, New, or Invoke apis, would it make sense to add it to the standard library (without it working in Call, New, or Invoke)? I doubt it would make sense.

@abrander

I think it has more to do with how confusing it would have been to keep the Wrapper interface without allowing it to work in Call, New, and Invoke. At that point, it would be more of a suggestion of how to organize code, without making the code any simpler.

Maybe. But just for the record: ValueOf accepting a Wrapper was never documented anywhere as far as I know. If the documentation were your only guide, you wouldn't even know that ValueOf accepted a Wrapper.

Also, code using Wrapper will now refuse to compile, alerting the programmer that they have to do some refactoring. If Wrapper was kept, the code would compile but panic at runtime, which seems undesirable.

Code implicitly using js.Wrapper will refuse to compile, so far we agree. But all the code that relied upon the (undocumented) feature of ValueOf accepting a js.Wrapper will experience a runtime panic. The (non-)presence of the Wrapper interface changes nothing about this.

I would imagine we will see a lot of type Wrapper interface {JSValue() js.Value} scattered across packages after the removal of js.Wrapper.
You might be right, and if this happens, you could write a proposal to re-instate Wrapper.

I wish I knew how :-)

I now realize it's worse than I imagined. The JSValue() method disappeared too. We can't even fix this by adding a Wrapper in every package.

However, if you briefly assume that Wrapper never existed and was never part of the Call, New, or Invoke apis, would it make sense to add it to the standard library (without it working in Call, New, or Invoke)? I doubt it would make sense.

I'm uncertain. It served a purpose: recognizing types backed by a js.Value, either by embedding a js.Value or by satisfying the interface in another way. How can packages express, in any meaningful way, that they accept a js.Value wrapped in anything, without a Wrapper-like interface post-1.18? That's how I saw js.Wrapper used.

My suggestion as of now would be to write a small wrapper for syscall/js itself, adding JSValue() and Wrapper. But that won't help inter-project calling much.
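
For what it's worth, such a wrapper package might look roughly like this (a sketch; the package name jsx is made up):

package jsx

import "syscall/js"

// Wrapper stands in for the removed js.Wrapper.
type Wrapper interface {
	JSValue() js.Value
}

// Value embeds js.Value and restores the JSValue method, so types that embed
// jsx.Value satisfy Wrapper the way types embedding js.Value used to.
type Value struct{ js.Value }

func (v Value) JSValue() js.Value { return v.Value }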

TLDR: js.Wrapper gave the whole community an interface for "everything backed by a javascript value". We don't have that anymore.

@finnbear
Author

I wish I knew how :-)

The good news is that submitting a proposal is as simple as creating a GitHub issue with a title similar to proposal: syscall/js: Reinstate js.Wrapper interface. The issue description would include a brief justification for the change to add back js.Wrapper.

You make some good points, so there is a chance your proposal would be accepted :)

@garet90
Contributor

garet90 commented Oct 20, 2022

Any updates on this and #49799 ?

@finnbear
Author

finnbear commented Oct 21, 2022

Any updates on this and #49799 ?

AFAIK the PR happened to coincide with a release freeze.

I haven't touched Go since submitting it, and it looks like a fairly trivial merge conflict materialized (interface{} -> any). I might attempt to fix it, but I have deleted the Go beta toolchain I used for the PR to save disk space, so hopefully testing locally isn't required.

If anyone wants to help move the PR forward, I'd be happy to grant access to my fork.

@garet90

@garet90
Contributor

garet90 commented Dec 21, 2022

Sure, we're currently in a release freeze right now but I'd be happy to help push it forward after

@finnbear
Author

Sure, we're currently in a release freeze right now but I'd be happy to help push it forward after

Cool, I've invited you to be a collaborator on my go fork, which should probably allow you to edit the PR (https://github.com/finnbear/go/tree/makeargs-stackslice). Feel free to pursue other options like making your own fork/PR with the same changes if it is easier.

@Pryz Pryz added the arch-wasm WebAssembly issues label Mar 15, 2023
@mokiat

mokiat commented Apr 16, 2023

I am hitting this same problem. I am happy to see that someone has reported it already, though sad to see that it has lingered for nearly 3 years 😢

In my case, I am making a large number of WebGL calls and the JS heap jumps from 2MB to 4MB in the timespan of 200ms-400ms. It gets GCed and then all over again, resulting in a jagged heap chart. This, combined with #54444 , causes the whole game to stutter and the CPU to spike. In contrast, the native implementation of the same game has a steady heap profile.

I tried to follow the thread above and as far as I can tell the #49799 MR should help fix this. I am not sure I understood how the js.Wrapper fits into all of this, though.

Is there any chance that with #57968 this would get more traction soon? I would be happy to lend my help, though I fear my understanding of the low-level inner workings of the Go runtime might not be up to the task yet.

EDIT: After a bit more troubleshooting, it appears that while this issue is related, the majority of the allocations might actually be occurring inside the wasm_exec.js file (syscall/js.valueInvoke and syscall/js.copyBytesToJS->subarray).

@unitoftime

unitoftime commented Apr 1, 2024

Hi I am also running into this same issue and would love to see it fixed.

This is from an 'alloc_space' memory profile in WASM (go version go1.22.1 linux/amd64): I'm seeing ~39% of my allocations happening in js.makeArgs, specifically at the two slice allocations flagged in the original issue description:
[screenshots: alloc_space profile showing the two slice allocations in js.makeArgs]

Happy to try and help, if I can! Also would be happy if there was a workaround to reduce my allocations by calling different APIs, but I couldn't find anything.

@finnbear
Author

finnbear commented Apr 1, 2024

@mokiat @unitoftime AFAIK, someone just needs to revive this PR which unfortunately landed during a release freeze. I've since migrated to Rust and don't necessarily have the Go toolchains installed or the relevant details in mind (e.g. the meaning of //go:wasmimport gojs syscall/js.funcName). Consider making a new PR with similar changes! I recommend applying the diff to your installed Go toolchain and recompiling your application to test.

@unitoftime

Cool. Thanks for the info and initial research! I'll take a more in depth look next chance I get.

unitoftime added a commit to unitoftime/go that referenced this issue Apr 4, 2024
The existing implementation causes unnecessary heap allocations for
javascript syscalls: Call, Invoke, and New. The new change seeks
to hint the Go compiler to allocate arg slices with length <=16
to the stack.

Fixes golang#39740
@gopherbot

Change https://go.dev/cl/576575 mentions this issue: syscall/js: allocate arg slices on stack for small numbers of args

unitoftime pushed a commit to unitoftime/go that referenced this issue Apr 7, 2024
The existing implementation causes unnecessary heap allocations for
javascript syscalls: Call, Invoke, and New. The new change seeks to
hint the Go compiler to allocate arg slices with length <=16 to the
stack.

Original Work: CL 367045
- Calling a JavaScript function with 16 arguments or fewer will not
induce two additional heap allocations, at least with the current Go
compiler.
- Using syscall/js features with slices and strings of
statically-known length will not cause them to be escaped to the heap,
at least with the current Go compiler.
- The reduction in allocations has the additional benefit that the
garbage collector runs less often, blocking WebAssembly's one and only
thread less often.

Fixes golang#39740
@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Apr 18, 2024
@dmitshur dmitshur modified the milestones: Backlog, Go1.23 Apr 18, 2024