Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: wasm GC causes stop the world for a very long time #54444

Open
mokiat opened this issue Aug 13, 2022 · 4 comments
Open

runtime: wasm GC causes stop the world for a very long time #54444

mokiat opened this issue Aug 13, 2022 · 4 comments
Assignees
Labels
arch-wasm WebAssembly issues compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@mokiat
Copy link

mokiat commented Aug 13, 2022

What version of Go are you using (go version)?

$ go version
go version go1.19 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/user/.cache/go-build"
GOENV="/home/user/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/user/.go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/user/.go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/lib/go-1.19"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go-1.19/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.19"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build306969192=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I am working on a hobby game engine that has both Native (OpenGL) and WASM (WebGL2) support.

When running a larger scene in WebAssembly I notice that whenever a GC is triggered it causes everything to freeze for about a second or more.

Actually, it happens with smaller scenes too but just less often (due to GC triggering less often). I think that the duration of the freeze is proportional to the number of active pointer references but this needs more testing to verify.

I try not to allocate much memory but it happens that once a frame I might allocate a few KBs here and there (I have a custom react-style Go UI framework that is hard to make allocation-free due to tree invalidation logic and besides, one should be able to make use of Go's memory management). This means that after 15s-60s everything just freezes.

I managed to write a dummy program that reproduces the issue:

var (
	statsRunning int32
	gcStats      = new(debug.GCStats)
	cached       []byte
)

func main() {
	// Force the GC to trigger more often. If you remove this, you might need
	// to wait up to a minute in some cases.
	debug.SetGCPercent(5)

	aLotOfData := make([]int, 20_000_000)
	aLotOfRefData := make([]*int, 20_000_000)
	for i := range aLotOfRefData {
		aLotOfRefData[i] = &aLotOfData[i]
	}

	callbackFunc := js.FuncOf(func(this js.Value, args []js.Value) any {
		frameStart := time.Now()

		// Check stats once in a while.
		if atomic.CompareAndSwapInt32(&statsRunning, 0, 1) {
			go func() {
				debug.ReadGCStats(gcStats)
				if len(gcStats.Pause) > 0 {
					log.Printf("GC STATS! Total: %d; Time since last GC: %s; Pause total: %s; Pause last: %s",
						gcStats.NumGC,
						time.Since(gcStats.LastGC),
						gcStats.PauseTotal,
						gcStats.Pause[0],
					)
				}
				atomic.StoreInt32(&statsRunning, 0)
			}()
		}

		processingTime := time.Now()

		// Do some processing so that the frame takes some time and so that the
		// runtime is forced to go through a lot of pointers (not sure if relevant).
		for i := 0; i < len(aLotOfRefData); i++ {
			*aLotOfRefData[i]++
		}

		// Couldn't figure out a better way to force the compiler not to optimize
		// the slice allocation. Hence keep it in a variable and use it next frame.
		if cached == nil {
			cached = make([]byte, 1_000_000)
		} else {
			for j := range cached {
				cached[j]++
			}
			cached = nil
		}

		processingDuration := time.Since(processingTime)

		log.Printf("Frame time: %s (processing: %s)", time.Since(frameStart), processingDuration)
		return nil
	})

	intervalFunc := js.Global().Get("setInterval")
	intervalFunc.Invoke(callbackFunc, 1)

	<-make(chan struct{})
}

What did you expect to see?

Smooth and consistent logging in the Browser Console.

What did you see instead?

A stop the world situation once in a while (every 2 seconds or so). The Total counter (indicating the number of GC runs) always increases immediatelly after the freeze, showing that it is related to GC.

What's interesting is that none of the stats (even the frame time) indicate that GC took more time than indicated but the fact is that the Browser freezes. Could it be that Go is doing something that JS or the Browser does not like and needs to adjust to once setInterval callback has completed (e.g. memory shuffling or something of the kind)?

This is reproducible with both Chrome and Firefox. Have not tried other browsers.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Aug 13, 2022
@seankhliao seankhliao added Performance arch-wasm WebAssembly issues labels Aug 13, 2022
@joedian joedian added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Aug 15, 2022
@joedian joedian added this to the Go1.20 milestone Aug 15, 2022
@mknyszek mknyszek self-assigned this Aug 17, 2022
@mknyszek mknyszek modified the milestones: Go1.20, Backlog Aug 17, 2022
@mknyszek
Copy link
Contributor

Sorry for the incredibly late response to this. I assigned it to myself and then forgot. :(

I'm frankly a bit surprised that you're seeing freezes of up to 2 seconds or more. On Wasm, the Go runtime operates in a purely cooperative manner, and the GC is basically fully incremental. There's a worker goroutine that might start running when there's nothing else to run, but for the most part the main application thread will just do GC work as it allocates memory. The GC is almost fully concurrent, so the metrics you're reading aren't going to fully represent the time the GC took away from the main application thread. (The pauses are relevant, but the time since the last GC isn't really all that useful.)

There's the runtime/metrics package, and we could try dumping some of those stats. I suppose it's possible that if the application runs for quite a bit of time before yielding to the background GC goroutine that that goroutine might end up hogging some time. Now that I think of it, that does seem somewhat plausible.

Unfortunately I'm having some trouble reproducing locally with Node.js. It looks like the setInterval callback isn't firing? (Nothing is failing; I did have to replace the blocking channel recv at the end there with a Gosched loop though, otherwise the deadlock detector fires since it's not aware of the callback... I would've figured we disabled that for Wasm.)

@mokiat
Copy link
Author

mokiat commented Jul 21, 2023

Hi @mknyszek ,

It's been a while since I have reported this. Since then I have reduced my memory allocation during frames a lot and so this has not been visible much anymore. The problems I have been facing since are more related to #39740.

That said, I will try to reproduce this in the following days and will write back with information. It is possible that something has changed in Go or in browser's handling of WebAssembly and that this issue is not valid anymore.

@mokiat
Copy link
Author

mokiat commented Jul 30, 2023

@mknyszek ,

I am still able to reproduce it with the example code I have provided above. Running it and looking at the code, I can always see a freeze around when a GC occurs. I have adjusted the code to print millisecond precision.

Following is a screenshot of the logs when running in Firefox:
gc-freeze-firefox

Following is a screenshot of the logs when running in Chrome:
gc-freeze-chrome

In both cases a freeze of around one second can be observed, though the GC stats don't indicate as much.

I will run some browser performance analysis to see if something is reported there.

@mokiat
Copy link
Author

mokiat commented Jul 30, 2023

Here are a few browser measurements, though I admit I have a hard time reasoning about some of those.

Bottom-Up Self Time:
bottom-up-self

Bottom-Up Total Time:
bottom-up-total

Memory Summary:
memory-summary

This one has spikes that are at an interval similar to the freezes.

Call Tree:
call-tree

Pause history:
gc-pause-times

The last one shows only ~1 millisecond times. Nothing close to the 1+ second freeze I experience.

I will try and plug into runtime/metrics one of the upcoming days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arch-wasm WebAssembly issues compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
Development

No branches or pull requests

5 participants