runtime: "fatal: morestack on g0" when messing with the stack #39411

Closed
88hcsif opened this issue Jun 4, 2020 · 5 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.

@88hcsif

88hcsif commented Jun 4, 2020

What version of Go are you using (go version)?

$ go version
go version go1.14.2 linux/arm

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="arm"
GOBIN=""
GOCACHE="/home/pi/.cache/go-build"
GOENV="/home/pi/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="arm"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/pi/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_arm"
GCCGO="gccgo"
GOARM="6"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -marm -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build677601937=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I am trying to expand an existing go application that runs games and emulators written following the libretro API.
Libretro games and emulators are C libraries that are dynamically loaded by a "frontend" application.
Once a game starts, the frontend enters a loop in which it calls an API from the library to run a single "step" in the game/emulator and then it does some processing (like collecting device inputs) and keeps doing this until the game is stopped. The main logic is therefore single threaded.

In particular, I am trying to get a Nintendo64 emulator to work with the go frontend.
This emulator (and probably a few others) is a bit peculiar, because it does not quite run on the same thread as the frontend.
It actually uses a library to dump the thread state (stack + registers) to memory and switch between its own thread state and the frontend thread state.
This makes it easy to convert a game/emulator that was not designed for the libretro API and instead was designed to run continuously.
The game/emulator can relinquish control to the frontend at any point in the code and then come back to it when the frontend runs the next step.

Now, the problem is that when the library tries to swap the frontend state for the emulator state for the very first time, go tends to crash with the following error: fatal: morestack on g0.
This does not happen every single time, and when the first switch goes smoothly all subsequent switches between frontend thread state and emulator thread state also work fine.
By default, the emulator sets up a thread stack of 4 MiB (on a 32-bit system).
I tried lowering this size to as little as 32 KiB, and I got the impression that the crash happened less often, but it still happens from time to time.

I am fairly new to Go and I understand that Go uses dynamically allocated stacks that start at 2KiB and increase as needed.
I understand that messing around with the stack size may confuse the Go runtime environment.
And I admit this is a pretty strange setup but...

  1. is there anything that I can change in the application to avoid this crash?
  2. or is there anything that should be changed in the Go runtime to allow for this use case?

What did you expect to see?

Frontend application and emulator run fine.

What did you see instead?

Application crashes with: fatal: morestack on g0.

@randall77
Contributor

I think we need more information.
How are Go and the libretro library communicating? Are you using cgo? Pipes? dlopen?
Does libretro make its own OS threads? How (e.g. pthreads)? Or does it coopt the thread that calls into it?

dmitshur added the WaitingForInfo and NeedsInvestigation labels Jun 5, 2020
dmitshur added this to the Backlog milestone Jun 5, 2020
@88hcsif
Author

88hcsif commented Jun 5, 2020

Thanks for the interest!

The libretro library is loaded using dlopen:
h = C.dlopen(C.CString(pathWithExt), C.RTLD_LAZY)
The run step function is loaded using dlsym:
retroRun = C.dlsym(h, C.CString("retro_run"))
And it is later called using cgo:
C.bridge_retro_run(retroRun)
where the bridge is defined as:

void bridge_retro_run(void *f) {
    return ((void (*)(void))f)();
}

The libretro library may also create its own threads using pthreads. I can see various mentions in the code, but I am not sure whether those code paths are actually used.
What I am certain of is that the "main" emulator logic coopts the thread that calls into it.
I think one of the reasons for that is OpenGL.
This libretro library uses OpenGL and hardware acceleration to render graphics.
The OpenGL context is created by the frontend and then shared with the libretro library.
My limited understanding of OpenGL is that OpenGL operations only work on the single thread on which the OpenGL context is created.
For this reason I am also running with this thread locked, even though the rest of the Go application uses goroutines.

As evidence, and to give a bit more detail, here is the call stack right before the libretro library coopts the thread (using gdb):

Thread 6 "worker" hit Breakpoint 1, co_switch (handle=0xa161b800) at libretro-common/libco/armeabi.c:100
100	   cothread_t co_previous_handle = co_active();
(gdb) bt
#0  co_switch (handle=0xa161b800) at libretro-common/libco/armeabi.c:100
#1  0x934fe064 in retro_run () at libretro/libretro.c:1503
#2  0x006c3778 in bridge_retro_run (f=0x934fdfbc )
    at /home/pi/workspace/cloud-game/pkg/emulator/libretro/nanoarch/cfuncs.go:62
#3  0x006c3fc8 in _cgo_e181eea56bf5_Cfunc_bridge_retro_run (v=0x1456dbc) at cgo-gcc-prolog:166
#4  0x000a54e8 in runtime.asmcgocall () at /usr/local/go/src/runtime/asm_arm.s:607
#5  0x000a3898 in runtime.mcall () at /usr/local/go/src/runtime/asm_arm.s:285
#6  0x006c0b80 in crosscall_arm1 () at gcc_arm.S:30
#7  0x00000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Here is the call stack right after the cooption happened for the first time:

Thread 6 "worker" hit Breakpoint 2, EmuThreadFunction () at libretro/libretro.c:464
464	    log_cb(RETRO_LOG_DEBUG, CORE_NAME ": [EmuThread] M64CMD_EXECUTE\n");
(gdb) bt
#0  EmuThreadFunction () at libretro/libretro.c:464
#1  0x93527468 in co_switch (handle=)
    at libretro-common/libco/armeabi.c:101
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Here is the segmentation fault that happens intermittently:

fatal: morestack on g0

Thread 6 "worker" received signal SIGSEGV, Segmentation fault.
runtime.abort () at /usr/local/go/src/runtime/asm_arm.s:794
794		MOVW	(R0), R1
(gdb) bt
#0  runtime.abort () at /usr/local/go/src/runtime/asm_arm.s:794
#1  0x00076440 in runtime.write (fd=, p=0x0, n=9654726)
    at /usr/local/go/src/runtime/time_nofake.go:30
#2  runtime.badmorestackg0 () at /usr/local/go/src/runtime/proc.go:438
#3  0x00077ec0 in runtime.startTheWorldWithSema (emitTraceEvent=false, ~r1=)
    at :1
Backtrace stopped: Cannot access memory at address 0xe

Also, the crash does not happen immediately after the thread is coopted. When it happens, it is a few calls in, which I am guessing is when the stack grows past some limit.

Let me know if you need more details!

@randall77
Contributor

I don't think the Go runtime will be happy with a C library coopting one of its OS threads. My guess is that both libretro and the Go runtime are trying to use the same stack at the same time.
I would recommend implementing a stub in C, possibly in your bridge_retro_run function, that makes a new thread, switches to it, and calls retroRun on that new thread.
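One way to read this suggestion, as a rough sketch rather than the fix that was actually applied: keep a dedicated pthread that the Go runtime never sees, and have the bridge hand each call to it and wait for the result. All names here (`core_thread_start`, `core_thread_call`, `core_thread_stop`, `demo_step`) are hypothetical, `demo_step` is only a stand-in for the pointer returned by `dlsym`, and the sketch assumes a single caller at a time. A persistent thread is used instead of a new thread per call on the assumption that the library's saved thread state needs to stay valid between steps.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*core_fn)(void);

static pthread_t worker;
static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static core_fn pending; /* call waiting to be executed, or NULL */
static bool done;       /* true once the last submitted call has returned */
static bool quit;       /* asks the worker to exit */

/* Stand-in for retro_run, used only to exercise the sketch. */
static int demo_calls;
static pthread_t demo_thread;
static void demo_step(void) {
    demo_calls++;
    demo_thread = pthread_self();
}

/* Body of the dedicated thread: run submitted calls until asked to quit. */
static void *worker_main(void *unused) {
    (void)unused;
    pthread_mutex_lock(&mu);
    for (;;) {
        while (pending == NULL && !quit)
            pthread_cond_wait(&cv, &mu);
        if (pending == NULL)
            break; /* quit requested and nothing left to run */
        core_fn fn = pending;
        pending = NULL;
        pthread_mutex_unlock(&mu);
        fn(); /* the library may switch stacks freely on this thread */
        pthread_mutex_lock(&mu);
        done = true;
        pthread_cond_broadcast(&cv);
    }
    pthread_mutex_unlock(&mu);
    return NULL;
}

/* Start the worker; stack_size 0 keeps the platform default.
 * Returns 0 on success or a pthread error number. */
int core_thread_start(size_t stack_size) {
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    if (stack_size != 0)
        rc = pthread_attr_setstacksize(&attr, stack_size);
    if (rc == 0)
        rc = pthread_create(&worker, &attr, worker_main, NULL);
    pthread_attr_destroy(&attr);
    return rc;
}

/* Run f on the worker and block until it returns (single caller assumed). */
void core_thread_call(void *f) {
    assert(f != NULL);
    pthread_mutex_lock(&mu);
    pending = (core_fn)f;
    done = false;
    pthread_cond_broadcast(&cv);
    while (!done)
        pthread_cond_wait(&cv, &mu);
    pthread_mutex_unlock(&mu);
}

/* Ask the worker to exit and wait for it. */
int core_thread_stop(void) {
    pthread_mutex_lock(&mu);
    quit = true;
    pthread_cond_broadcast(&cv);
    pthread_mutex_unlock(&mu);
    return pthread_join(worker, NULL);
}
```

With something like this, the bridge would call `core_thread_call(f)` instead of invoking `f` directly, so the stack switch happens on a thread the Go runtime does not manage. Any OpenGL context the library uses would presumably have to be made current on that worker thread as well.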

@88hcsif
Author

88hcsif commented Jun 6, 2020

Thanks for the suggestion! It works!
(I have not seen the crash in a while, even after restoring the libretro library's original stack size, and I do not expect to.)
Feel free to close this issue with the appropriate tag.

@randall77
Contributor

Ok. Good luck.
