Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/mobile: rendering freeze #48434

Open
changkun opened this issue Sep 17, 2021 · 8 comments
Open

x/mobile: rendering freeze #48434

changkun opened this issue Sep 17, 2021 · 8 comments
Labels
mobile Android, iOS, and x/mobile NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@changkun
Copy link
Member

changkun commented Sep 17, 2021

What version of Go are you using (go version)?

$ go version
go version devel go1.18-9133245be7 Sat Sep 4 20:35:25 2021 +0000 darwin/arm64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="auto"
GOARCH="arm64"
GOBIN=""
GOCACHE="/Users/changkun/Library/Caches/go-build"
GOENV="/Users/changkun/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="arm64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/changkun/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/changkun/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/Users/changkun/sdk/gotip"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/Users/changkun/sdk/gotip/pkg/tool/darwin_arm64"
GOVCS=""
GOVERSION="devel go1.18-9133245be7 Sat Sep 4 20:35:25 2021 +0000"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/changkun/dev/changkun.de/fyne/iosrend/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/g2/6fmr1qzx0ns3shq74zrp6bd40000gn/T/go-build130131505=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

This issue is observed in practice, see hajimehoshi/ebiten#1332, hajimehoshi/ebiten#1322, fyne-io/fyne#950.

While working on an (almost) reproducer. The problem seems to happen more often on iOS/Android and was not observed on the desktop.

A recent debugging on the rendering freeze seems to suggest gl package might block forever when using golang.org/x/mobile/app and golang.org/x/mobile/gl packages, as the rendering loop, stops processing GL calls after a while and the workAvaliable fails to receive more events.

Furthermore, according to a discussion in CL 350212, it seems #48433 might be related.

What did you expect to see?

No rendering freeze

What did you see instead?

Rendering freeze

@gopherbot gopherbot added the mobile Android, iOS, and x/mobile label Sep 17, 2021
@gopherbot gopherbot added this to the Unreleased milestone Sep 17, 2021
@gopherbot
Copy link

Change https://golang.org/cl/350212 mentions this issue: gl: fix dead lock between package app and gl

@cagedmantis cagedmantis added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Sep 20, 2021
@cagedmantis
Copy link
Contributor

/cc @hyangah

@gopherbot
Copy link

Change https://golang.org/cl/351769 mentions this issue: gl: fix rendering freeze between app and gl

@changkun
Copy link
Member Author

I simplified the DoWork function a little bit to the following (Removed cgo batching):

func (ctx *context) DoWork() {
	for {
		select {
		case w := <-ctx.work:
			ret := C.processFn((*C.struct_fnargs)(unsafe.Pointer(&w.args)), (*C.char)(w.parg))
			if w.blocking {
				ctx.retvalue <- ret
			}
		default:
			return
		}
	}
}

And also captured some information from stack traces, it looks the problem is caused by a cgo call that has not yet returned.

(dlv) gr 5
Switched from 0 to 5 (thread 2652429)
(dlv) frame 0
> runtime.systemstack_switch() ./sdk/gotip/src/runtime/asm_amd64.s:311 (PC: 0x4066560)
Warning: debugging optimized function
Frame 0: ./sdk/gotip/src/runtime/asm_amd64.s:311 (PC: 4066560)
   306: // of the G stack. We need to distinguish the routine that
   307: // lives at the bottom of the G stack from the one that lives
   308: // at the top of the system stack because the one at the top of
   309: // the system stack terminates the stack walk (see topofstack()).
   310: TEXT runtime·systemstack_switch(SB), NOSPLIT, $0-0
=> 311:         RET
   312:
   313: // func systemstack(fn func())
   314: TEXT runtime·systemstack(SB), NOSPLIT, $0-8
   315:         MOVQ    fn+0(FP), DI    // DI = fn
   316:         get_tls(CX)
(dlv) up
> runtime.systemstack_switch() ./sdk/gotip/src/runtime/asm_amd64.s:311 (PC: 0x4066560)
Warning: debugging optimized function
Frame 1: ./sdk/gotip/src/runtime/cgocall.go:167 (PC: 40054e5)
   162:         // sync preemption will succeed (though this is not a matter
   163:         // of correctness).
   164:         osPreemptExtEnter(mp)
   165:
   166:         mp.incgo = true
=> 167:         errno := asmcgocall(fn, arg)
   168:
   169:         // Update accounting before exitsyscall because exitsyscall may
   170:         // reschedule us on to a different M.
   171:         mp.incgo = false
   172:         mp.ncgo--

From the goroutine stack trace side, it looks like the goroutine is not yet returned from a Cgo call.
The relevant thread backtrace (from darwin/amd64) is:

  thread #8
    frame #0: 0x00007fff204bb2ba libsystem_kernel.dylib`mach_msg_trap + 10
    frame #1: 0x00007fff204bb62c libsystem_kernel.dylib`mach_msg + 60
    frame #2: 0x00007fff22caa793 IOKit`io_connect_method + 384
    frame #3: 0x00007fff22caa5e9 IOKit`IOConnectCallMethod + 244
    frame #4: 0x00007fff2846ceb1 IOAccelerator`IOAccelResourceCreate + 124
    frame #5: 0x00007fff4643c077 libGPUSupportMercury.dylib`gpusGetKernelTexture + 380
    frame #6: 0x000000003a63312b AMDRadeonX4000GLDriver`glrATI_SI_LoadSysTextureStandard + 5982
    frame #7: 0x000000003a63012e AMDRadeonX4000GLDriver`glrUpdateTexture + 724
    frame #8: 0x00007fff4643bc58 libGPUSupportMercury.dylib`gldModifyTexSubImage + 180
    frame #9: 0x00007fff6be14465 GLEngine`glTexImage2D_Exec + 1873
    frame #10: 0x00007fff6bdf3195 libGL.dylib`glTexImage2D + 55
    frame #11: 0x000000000439211f iosrend`processFn + 639
    frame #12: 0x0000000004391e81 iosrend`_cgo_647ba9049754_Cfunc_processFn + 33
    frame #13: 0x0000000004068484 iosrend`runtime.asmcgocall.abi0 + 100
    frame #14: 0x00000000040665c9 iosrend`runtime.systemstack.abi0 + 73
    frame #15: 0x00007fff2031a82f libsystem_malloc.dylib`szone_size + 45
    frame #16: 0x00000000042567ed iosrend`golang.org/x/mobile/gl._Cfunc_processFn.abi0 + 77
    frame #17: 0x0000000004256b05 iosrend`golang.org/x/mobile/gl.(*context).DoWork + 229
    frame #18: 0x0000000004373a2a iosrend`golang.org/x/mobile/app.(*app).loop + 202
    frame #19: 0x0000000004373b8a iosrend`golang.org/x/mobile/app.startloop.func1 + 42
    frame #20: 0x00000000040687a1 iosrend`runtime.goexit.abi0 + 1
    frame #21: 0x0000000004256b05 iosrend`golang.org/x/mobile/gl.(*context).DoWork + 229
    frame #22: 0x0000000004373a2a iosrend`golang.org/x/mobile/app.(*app).loop + 202
    frame #23: 0x0000000004373b8a iosrend`golang.org/x/mobile/app.startloop.func1 + 42
    frame #24: 0x00000000040687a1 iosrend`runtime.goexit.abi0 + 1
    frame #25: 0x0000000004373a2a iosrend`golang.org/x/mobile/app.(*app).loop + 202
    frame #26: 0x0000000004373b8a iosrend`golang.org/x/mobile/app.startloop.func1 + 42
    frame #27: 0x00000000040687a1 iosrend`runtime.goexit.abi0 + 1
    frame #28: 0x0000000004373b8a iosrend`golang.org/x/mobile/app.startloop.func1 + 42
    frame #29: 0x00000000040687a1 iosrend`runtime.goexit.abi0 + 1
    frame #30: 0x00000000040687a1 iosrend`runtime.goexit.abi0 + 1

The relevant thread backtrace from darwin/arm64:

  thread #4
    frame #0: 0x00000001bb04d9b4 libsystem_kernel.dylib`mach_msg_trap + 8
    frame #1: 0x00000001bb04dd60 libsystem_kernel.dylib`mach_msg + 76
    frame #2: 0x00000001bdb28dd0 IOKit`io_connect_method + 440
    frame #3: 0x00000001bdb28bec IOKit`IOConnectCallMethod + 236
    frame #4: 0x00000001d5b1b970 IOGPU`IOGPUResourceCreate + 224
    frame #5: 0x00000001d5b159f8 IOGPU`-[IOGPUMetalResource initWithDevice:remoteStorageResource:options:args:argsSize:] + 476
    frame #6: 0x00000001d5b18ab8 IOGPU`-[IOGPUMetalTexture initWithBuffer:descriptor:sysMemOffset:sysMemRowBytes:vidMemSize:vidMemRowBytes:args:argsSize:isStrideTexture:] + 616
    frame #7: 0x00000001d5b1925c IOGPU`-[IOGPUMetalTexture initWithPrimaryBuffer:heapIndex:bufferIndex:bufferOffset:length:descriptor:sysMemRowBytes:vidMemSize:vidMemRowBytes:args:argsSize:] + 96
    frame #8: 0x000000021c45bc10 AGXMetal13_3`___lldb_unnamed_symbol4663$$AGXMetal13_3 + 468
    frame #9: 0x0000000134553aa8 AppleMetalOpenGLRenderer`GLDTextureRec::loadObj() + 3156
    frame #10: 0x000000013458a960 AppleMetalOpenGLRenderer`gldModifyTexSubImage + 160
    frame #11: 0x000000021bd9a7e8 GLEngine`glTexImage2D_Exec + 1944
    frame #12: 0x000000021bd7b8d0 libGL.dylib`glTexImage2D + 80
    frame #13: 0x00000001047563b4 iosrend`processFn(args=0x000001400039dd38, parg=<unavailable>) at work.c:136:3 [opt]

As far as I can confirm this occurs in all arch of macOS (amd64, arm64), iOS (arm64). I first thought this might be a driver issue, but since Android is also reported in arm64, by @hajimehoshi. It seems unlikely to have a driver issue on two different platforms (?), especially the backend if different (darwin/arm64 is using Metal as backend, apparently Android can't use that).

From the stack traces, the io_connect_method is waiting for something, the GPU???

Any hint would be highly appreciated...

@changkun
Copy link
Member Author

This is a feedback from Apple (9665054):

Getting a hang, or long pauses here, usually means there's a Metal resource, or GL texture leak. The kernel side of IOGPUFamily will eventually start stalling for long periods of time once we reach the per-process limit on the number of resources created by the process (which is in the tens of thousands).

@hajimehoshi
Copy link
Member

#52116 Isn't this related?

@changkun
Copy link
Member Author

changkun commented Aug 6, 2022

@hajimehoshi could you maybe elaborate a bit more on how they are related?

@hajimehoshi
Copy link
Member

NVM, #52116 was a crash rather than freezing so this was not related

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
mobile Android, iOS, and x/mobile NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

4 participants