Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/cgo: "fatal: morestack on g0" when interacting with PHP Fibers #62130

Open
dunglas opened this issue Aug 18, 2023 · 27 comments
Open

cmd/cgo: "fatal: morestack on g0" when interacting with PHP Fibers #62130

dunglas opened this issue Aug 18, 2023 · 27 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Milestone

Comments

@dunglas
Copy link
Contributor

dunglas commented Aug 18, 2023

What version of Go are you using (go version)?

$ go version
go version go1.20.5 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/slim/.cache/go-build"
GOENV="/home/slim/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/slim/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/slim/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org/,direct"
GOROOT="/usr/local/go"
GOSUMDB="[sum.golang.org](http://sum.golang.org/)"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.20.5"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/slim/pub/src/frankenphp/go.mod"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build3097512463=/tmp/go-build -gno-record-gcc-switches"

What did you do?

FrankenPHP is a Go module that embeds the PHP executor in Go programs such as the Caddy web server, and handles HTTP requests received by "net/http" with PHP scripts executed directly in the Go process, similarly to Apache mod_php.

To do so PHP is compiled as a C library and executed using cgo.

Under certain conditions, if the script uses PHP fibers, the program crashes with the message "fatal: morestack on g0" (see backtraces and reproducer below).

PHP Fibers do low-level stack manipulation and are likely badly interacting Go and cgo own stack management.
Under the hood, PHP Fibers can use the boost.context library (by default) or ucontext.h (if PHP is compiled with --disable-fiber-asm). The problem occurs with both implementations.
Fibers code is in https://github.com/php/php-src/blob/master/Zend/zend_fibers.c.

The problem seems to only randomly occur and only on Linux with x86 processors (we're not able to reproduce this on Mac or on ARM).

The problem seems to occur when the C code executed in a PHP Fiber (with its own stack) calls a Go function using cgo.

We currently have no idea if it's a problem in Go, in PHP, or "just" a bad interaction of their respective stack management systems.

What did you expect to see?

No error.

What did you see instead?

fatal: morestack on g0

Here are some stack traces collected by @slim with GDB:

(gdb) backtrace
#0  0x0000000000470106 in ?? ()
#1  0x000000000046e165 in runtime.morestack () at /usr/local/go/src/runtime/asm_amd64.s:542
#2  0x000000000042b8dc in runtime.(*mheap).alloc.func1 () at /usr/local/go/src/runtime/mheap.go:955
#3  0x000000000046e0e9 in runtime.systemstack () at /usr/local/go/src/runtime/asm_amd64.s:496
#4  0x00007fffa7ffec38 in ?? ()
#5  0x00007fffa4bff770 in ?? ()
#6  0x000000c000480000 in ?? ()
#7  0x00007fffa4bff770 in ?? ()
#8  0x00000000005c6bbd in crosscall2 () at /usr/local/go/src/runtime/cgo/asm_amd64.s:30
#9  0x0000000000758a0e in go_add_header (rh=23, cString=0x0, length=12175376) at _cgo_export.c:135
#10 0x00000000007596ff in frankenphp_send_headers (sapi_headers=0x7fff3801d800) at frankenphp.c:461
#11 frankenphp_send_headers (sapi_headers=0x7fff3801d800) at frankenphp.c:446
#12 0x00007ffff73fc665 in sapi_send_headers () at /home/slim/pub/src/php-franken/main/SAPI.c:883
#13 0x00007ffff734367b in php_header () at /home/slim/pub/src/php-franken/ext/standard/head.c:72
#14 0x00007ffff74086b2 in php_output_header () at /home/slim/pub/src/php-franken/main/output.c:122
#15 0x00007ffff740b208 in php_output_op (op=0, str=0x7fffc6202068 "Fiber 54", len=8) at /home/slim/pub/src/php-franken/main/output.c:1077
#16 0x00007ffff7408ee2 in php_output_write (str=0x7fffc6202068 "Fiber 54", len=8) at /home/slim/pub/src/php-franken/main/output.c:261
#17 0x00007ffff750a407 in ZEND_ECHO_SPEC_TMPVAR_HANDLER () at /home/slim/pub/src/php-franken/Zend/zend_vm_execute.h:14221
#18 0x00007ffff7566b69 in execute_ex (ex=0x7fffc6269070) at /home/slim/pub/src/php-franken/Zend/zend_vm_execute.h:57509
#19 0x00007ffff748c221 in zend_call_function (fci=0x7fffc625bcb8, fci_cache=0x7fffc625bcf8) at /home/slim/pub/src/php-franken/Zend/zend_execute_API.c:949
#20 0x00007ffff75b4fc3 in zend_fiber_execute (transfer=0x7fffa4bfffc0) at /home/slim/pub/src/php-franken/Zend/zend_fibers.c:504
#21 0x00007ffff75b46b2 in zend_fiber_trampoline (data=...) at /home/slim/pub/src/php-franken/Zend/zend_fibers.c:299
#22 0x00007ffff73e772f in make_fcontext () at /home/slim/pub/src/php-franken/Zend/asm/make_x86_64_sysv_elf_gas.S:135
#23 0x00007fffa4c00040 in ?? ()
#24 0x00007fffa4c00000 in ?? ()
#25 0x00007fffa4c00000 in ?? ()
#26 0x0000009200000172 in ?? ()
#27 0x0000000000000000 in ?? ()
[New Thread 0x7fffa07f8700 (LWP 28725)]
fatal: morestack on g0

Thread 11 "thpool-0" received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread 0x7fffa3fff700 (LWP 28718)]
0x0000000000471366 in ?? ()
(gdb) backtrace
#0  0x0000000000471366 in ?? ()
#1  0x000000000046f3c5 in runtime.morestack.abi0 ()
#2  0x000000000042c9fc in runtime.(*mheap).alloc.func1 ()
#3  0x000000000046f349 in runtime.systemstack.abi0 ()
#4  0x00007fffa3ffec38 in ?? ()
#5  0x00007fff9f9ff770 in ?? ()
#6  0x000000c000400400 in ?? ()
#7  0x00007fff9f9ff770 in ?? ()
#8  0x00000000005c7e1d in crosscall2 ()
#9  0x000000000070b780 in ?? ()
#10 0x00007fff9f9ff780 in ?? ()
#11 0x0000000000000000 in ?? ()

fatal: morestack on g0

Thread 10 "thpool-0" received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread 0x7fffc4918700 (LWP 28744)]
0x0000000000471366 in ?? ()
(gdb) backtrace
#0  0x0000000000471366 in ?? ()
#1  0x000000000046f3c5 in runtime.morestack.abi0 ()
#2  0x000000000042c9fc in runtime.(*mheap).alloc.func1 ()
#3  0x000000000046f349 in runtime.systemstack.abi0 ()
#4  0x00007fffc4917c38 in ?? ()
#5  0x00007fff83fff770 in ?? ()
#6  0x000000c000380400 in ?? ()
#7  0x00007fff83fff770 in ?? ()
#8  0x00000000005c7e1d in crosscall2 ()
#9  0x000000000070b780 in ?? ()
#10 0x00007fff83fff780 in ?? ()
#11 0x0000000000000000 in ?? ()

fatal: morestack on g0

Thread 11 "thpool-1" received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread 0x7fffabfff700 (LWP 28775)]
0x0000000000471366 in ?? ()
(gdb) backtrace
#0  0x0000000000471366 in ?? ()
#1  0x000000000046f3c5 in runtime.morestack.abi0 ()
#2  0x000000000042c9fc in runtime.(*mheap).alloc.func1 ()
#3  0x000000000046f349 in runtime.systemstack.abi0 ()
#4  0x00007fffabffec38 in ?? ()
#5  0x00007fff93fff770 in ?? ()
#6  0x000000c000300400 in ?? ()
#7  0x00007fff93fff770 in ?? ()
#8  0x00000000005c7e1d in crosscall2 ()
#9  0x000000000070b780 in ?? ()
#10 0x00007fff93fff780 in ?? ()
#11 0x0000000000000000 in ?? ()

The issue also always occurs when executing a Fiber in a Docker container running on GitHub Actions, (but oddly, not when running directly in a GHA runner).

A reproducer as well as some extra investigation are available in this PR: dunglas/frankenphp#171

The problem can be reproduced locally by compiling PHP with the appropriate flags needed by FrankenPHP, checking out the fibers branch of FrankenPHP and running go test -run Fiber.

@dmitshur dmitshur changed the title cgo: "fatal: morestack on g0" when interacting with PHP Fibers cmd/cgo: "fatal: morestack on g0" when interacting with PHP Fibers Aug 18, 2023
@dmitshur dmitshur added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Aug 18, 2023
@dmitshur dmitshur added this to the Backlog milestone Aug 18, 2023
@dmitshur
Copy link
Contributor

CC @ianlancetaylor.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Aug 18, 2023
@cherrymui
Copy link
Member

Currently the Go runtime assumes the C stack for each thread is contiguous and never change. It's stack bounds is recorded in the g0 of the thread. When the C (or PHP) code switches stacks, its SP will not be within the boundary of the original stack, therefore causing morestack on g0 error. This is not surprising.

I don't know how to make it work in the current scheme. Maybe we could have a function that the user code can call to tell the runtime that the C code is going to switch stacks therefore disable stack bounds checks on g0 (e.g. by setting its bounds to [0, ^uintptr(0)])? That would require a new API, though.

@withinboredom
Copy link

@cherrymui how does that explain it mostly crashing in virtual/namespaced contexts (vms and docker) but not natively? Does Go do something differently then?

@ianlancetaylor
Copy link
Contributor

I don't know why the difference. But whether it crashes is going to depend on exactly where memory is allocated for fiber stacks. Go doesn't do anything different in virtual contexts, but it is possible that in those contexts memory is allocated at different addresses.

@PettitWesley
Copy link

PettitWesley commented Aug 22, 2023

AWS for Fluent Bit uses Cgo to compile go code as .so that can be loaded with dlopen by Fluent Bit which is in C: https://github.com/aws/aws-for-fluent-bit

Example go plugin: https://github.com/aws/amazon-cloudwatch-logs-for-fluent-bit

Fluent Bit source: https://github.com/fluent/fluent-bit

fluent-bit_1         | fatal: morestack on g0

Previous Go stable version 1.20.7 did not have this issue.

@ me and I can provide anymore details if useful.

PettitWesley added a commit to PettitWesley/aws-for-fluent-bit that referenced this issue Aug 22, 2023
recent upgrade of go stable to 1.21.0 has
broken the Go plugins. Which are failing
in integ with: "morestack on g0"
We have not deeply investigated this yet.
Waiting on Go community
golang/go#62130 (comment)

Signed-off-by: Wesley Pettit <wppttt@amazon.com>
PettitWesley added a commit to aws/aws-for-fluent-bit that referenced this issue Aug 22, 2023
recent upgrade of go stable to 1.21.0 has
broken the Go plugins. Which are failing
in integ with: "morestack on g0"
We have not deeply investigated this yet.
Waiting on Go community
golang/go#62130 (comment)

Signed-off-by: Wesley Pettit <wppttt@amazon.com>
@cherrymui
Copy link
Member

@PettitWesley that looks like a different issue. Does the C code do its own stack management? Does the C code calls into Go, then calls back to C? Thanks.

@PettitWesley
Copy link

@cherrymui The C code does its own stack management and has a smaller than normal stack size IIRC, I can find the details if needed.

Does the C code calls into Go

Yea, the C code invokes a function exposed by the Go code. And that's it, there's no call from Go into C AFAIK.

@mknyszek
Copy link
Contributor

In triage, we think that there isn't a reasonable way we're going to make this work. If C code (or non-Go) is switching the stack, it's going to be a lot of work to make that work with the Go runtime reliably, and we don't plan to work on that any time soon. Apologies.

@mknyszek mknyszek closed this as not planned Won't fix, can't repro, duplicate, stale Aug 23, 2023
@ianlancetaylor
Copy link
Contributor

When we receive a signal, we check for the current stack pointer, and adjust the gsignal stack bounds to match (adjustSignalStack in runtime/signal_unix.go). We could do the same for calls from C to Go: a quick check for whether the stack pointer is within expected bounds, and if not a slower process to fetch the current stack bounds and use them.

@leonardocustodio
Copy link

Is there a reason why for us the exact opposite happens?
php/php-src#12021
I cannot reproduce this on Linux with x86 only in Apple Sillicon...

@ianlancetaylor
Copy link
Contributor

I expect there is a reason but I doubt it's very interesting. This problem will happen in some cases if a C thread changes its stack location, but whether a problem occurs depends on precisely how the stack location is changed. How the stack location is changed may depend on how the program is built and how the kernel behaves.

@gopherbot
Copy link

Change https://go.dev/cl/525455 mentions this issue: runtime: allow update of system stack bounds

@prattmic
Copy link
Member

prattmic commented Sep 8, 2023

@dunglas could you verify that your application works properly with https://go.dev/cl/525455?

(If you aren't familiar with building the toolchain from a CL, you can use gotip download 525455: https://pkg.go.dev/golang.org/dl/gotip)

@dunglas
Copy link
Contributor Author

dunglas commented Sep 8, 2023

@prattmic Unfortunately it still doesn't work, but the error message changed:

Error logs === RUN TestFiberBasic_module logger.go:130: 2023-09-08T20:00:51.252Z DEBUG FrankenPHP started runtime: newstack at runtime.printlock+0x55 sp=0xc0002dbb08 stack=[0xc0002da000, 0xc0002dc000] morebuf={pc:0x408f85 sp:0xc0002dbb10 lr:0x0} sched={pc:0x43f175 sp:0xc0002dbb08 lr:0x0 ctxt:0x0} runtime.callbackUpdateSystemStack(0x408e5b?, 0x7f7bc4dff998, 0x0) /root/sdk/gotip/src/runtime/cgocall.go:237 +0x85 fp=0xc0002dbb70 sp=0xc0002dbb10 pc=0x408f85 runtime.cgocallbackg(0x447c1c?, 0xc0001a76c0?, 0x300000002?) /root/sdk/gotip/src/runtime/cgocall.go:287 +0x7a fp=0xc0002dbc00 sp=0xc0002dbb70 pc=0x40919a runtime.cgocallbackg(0x8a5020, 0x7f7bc4dffa10, 0x0) :1 +0x29 fp=0xc0002dbc28 sp=0xc0002dbc00 pc=0x4772a9 runtime.cgocallback(0xc0002dbc88, 0x408e85, 0x92fb60) /root/sdk/gotip/src/runtime/asm_amd64.s:1035 +0xcc fp=0xc0002dbc50 sp=0xc0002dbc28 pc=0x47446c runtime.systemstack_switch() /root/sdk/gotip/src/runtime/asm_amd64.s:474 +0x8 fp=0xc0002dbc60 sp=0xc0002dbc50 pc=0x4726e8 runtime.cgocall(0x92fb60, 0xc0002dbcc0) /root/sdk/gotip/src/runtime/cgocall.go:175 +0x85 fp=0xc0002dbc98 sp=0xc0002dbc60 pc=0x408e85 github.com/***/frankenphp._Cfunc_frankenphp_execute_script(0x7f7bc80256b0) _cgo_gotypes.go:802 +0x7b fp=0xc0002dbcc0 sp=0xc0002dbc98 pc=0x89c51b github.com/***/frankenphp.go_execute_script(0x1ca9) /go/src/app/frankenphp.go:471 +0x2ce fp=0xc0002dbe18 sp=0xc0002dbcc0 pc=0x89f5ce _cgoexp_1cabc619c1cb_go_execute_script(0x7f7bc7ffea40) _cgo_gotypes.go:957 +0x36 fp=0xc0002dbe30 sp=0xc0002dbe18 pc=0x8a4e76 runtime.cgocallbackg1(0x8a4e40, 0xc0002dbfe0?, 0x0) /root/sdk/gotip/src/runtime/cgocall.go:403 +0x2ce fp=0xc0002dbf00 sp=0xc0002dbe30 pc=0x4095ce runtime.cgocallbackg(0x0?, 0x0?, 0x0?) /root/sdk/gotip/src/runtime/cgocall.go:319 +0x12d fp=0xc0002dbf90 sp=0xc0002dbf00 pc=0x40924d runtime.cgocallbackg(0x8a4e40, 0x7f7bc7ffea40, 0x0) :1 +0x29 fp=0xc0002dbfb8 sp=0xc0002dbf90 pc=0x4772a9 runtime.cgocallback(0x0, 0x0, 0x0) /root/sdk/gotip/src/runtime/asm_amd64.s:1035 +0xcc fp=0xc0002dbfe0 sp=0xc0002dbfb8 pc=0x47446c runtime.goexit({}) /root/sdk/gotip/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0002dbfe8 sp=0xc0002dbfe0 pc=0x4746c1 fatal error: runtime: stack split at bad time

runtime stack:
runtime.throw({0xa21a9f?, 0x0?})
/root/sdk/gotip/src/runtime/panic.go:1018 +0x5c fp=0x7f7bc4dff7e8 sp=0x7f7bc4dff7b8 pc=0x43d45c
runtime.newstack()
/root/sdk/gotip/src/runtime/stack.go:995 +0xa36 fp=0x7f7bc4dff998 sp=0x7f7bc4dff7e8 pc=0x459576
traceback: unexpected SPWRITE function runtime.morestack
runtime.morestack()
/root/sdk/gotip/s

snip

https://github.com/dunglas/frankenphp/actions/runs/6125832955/job/16628619273?pr=171

Here is a reproducer allowing to trigger the error in Docker: dunglas/frankenphp#171
(Just take a look at the php-8-2-bookworm, linux/amd64 line of the matrix, other failures are due to the inability to use gotip when cross-compiling).

@prattmic
Copy link
Member

prattmic commented Sep 8, 2023

Ah, I missed that the initial comment was reporting an issue in 1.20, not 1.21. My CL only fixes the regression in 1.21 (#62440).

To be clear there are two classes here:

  1. Threads created in C that change stacks when there are no Go frames on the stack. e.g.,
  • Thread creation. using stack 1
  • Call from C to Go
  • (Optional) Go to C call
  • C to Go return
  • Go to C return
  • Switch to stack 2
  • Second call to Go

This worked in 1.20, though we never really intended to support it. It broke in 1.21, and https://go.dev/cl/525455 fixes it.

  1. Threads changing stacks when there are Go frames on the stack. This could occur on a C or Go thread:
  • Go to C call on stack 1
  • C changes to stack 2
  • C to Go call on stack 2

I believe this is the case that PHP Fibers are doing. This was not supported on 1.20 and continues to be unsupported even after my CL.

The error you hit is my CL explicitly checking for this case and bailing out.

A follow-up may be able to support this case by allowing stack bounds to change on nested call as long as it is careful to restore the previous stack bounds on return (either before cgocallback returns to C, or after cgocall gets back into Go).

Off the top of my head I can't think of other things that would break from using different system stacks on different Go frames on the same thread, but it is a rather precarious situation, and one I'm still not convinced we should support.

@dunglas
Copy link
Contributor Author

dunglas commented Sep 8, 2023

I believe this is the case that PHP Fibers are doing.

This is indeed the case.

one I'm still not convinced we should support

This would help us a lot to have support for this: it's the last blocker to be able to embed PHP in Go web servers with support for 100% of the native features of the language. Maybe could we make this feature (temporarily) opt-in or mark it as unsafe in some way?

@prattmic
Copy link
Member

prattmic commented Sep 8, 2023

Reopening for now, since we have more information, but we still need to make a decision on this.

@prattmic prattmic reopened this Sep 8, 2023
@prattmic prattmic removed the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Sep 8, 2023
@prattmic prattmic added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Sep 8, 2023
@ianlancetaylor
Copy link
Contributor

I don't see any major difficulty if Go calls C, C changes stack, C calls Go, C changes stack back, C returns to Go. In general we don't care where the C stack is as long as it's there. We have to get the bookeeping right but it seems doable.

That said I wouldn't be surprised if we will be expected to support Go calls C, C saves stack somewhere, a different thread picks up the saved C stack, that different thread returns to Go. I don't think that will work because we will have lost the g pointer. Is that a pattern that PHP Fibers expects to work?

@dunglas
Copy link
Contributor Author

dunglas commented Sep 8, 2023

As far as I understand, no: Fibers may call Go code (basically to send data to the client), but on the same C thread.

@ianlancetaylor
Copy link
Contributor

Thanks. That's not quite what I'm asking. What I'm asking is whether it is possible for Go to call into the fiber library on one thread, for that thread to save the current stack, and for another thread to pick up that stack and start running it. In other words, I'm asking whether Fibers are like goroutines, in that they can move between threads. Or do all fibers always run on the thread on which they started?

@dunglas
Copy link
Contributor Author

dunglas commented Sep 9, 2023

Fibers always run on the same thread. PHP (as well as libraries used for fibers) doesn't start nor manage threads itself, it lets the web server (or other kind of SAPIs) do it. Every PHP requests (including Fibers they start) are "locked" on the same thread.

gopherbot pushed a commit that referenced this issue Sep 11, 2023
Since CL 495855, Ms are cached for C threads calling into Go, including
the stack bounds of the system stack.

Some C libraries (e.g., coroutine libraries) do manual stack management
and may change stacks between calls to Go on the same thread.

Changing the stack if there is more Go up the stack would be
problematic. But if the calls are completely independent there is no
particular reason for Go to care about the changing stack boundary.

Thus, this CL allows the stack bounds to change in such cases. The
primary downside here (besides additional complexity) is that normal
systems that do not manipulate the stack may not notice unintentional
stack corruption as quickly as before.

Note that callbackUpdateSystemStack is written to be usable for the
initial setup in needm as well as updating the stack in cgocallbackg.

Fixes #62440.
For #62130.

Change-Id: I7841b056acea1111bdae3b718345a3bd3961b4a8
Reviewed-on: https://go-review.googlesource.com/c/go/+/525455
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
@ianlancetaylor
Copy link
Contributor

OK, thanks, then at least for Fibers my concern is unfounded.

@gopherbot
Copy link

Change https://go.dev/cl/527715 mentions this issue: runtime: allow update of system stack bounds on callback from C thread

gopherbot pushed a commit that referenced this issue Sep 12, 2023
[This is a redo of CL 525455 with the test fixed on darwin by defining
_XOPEN_SOURCE, and disabled with android, musl, and openbsd, which do
not provide getcontext.]

Since CL 495855, Ms are cached for C threads calling into Go, including
the stack bounds of the system stack.

Some C libraries (e.g., coroutine libraries) do manual stack management
and may change stacks between calls to Go on the same thread.

Changing the stack if there is more Go up the stack would be
problematic. But if the calls are completely independent there is no
particular reason for Go to care about the changing stack boundary.

Thus, this CL allows the stack bounds to change in such cases. The
primary downside here (besides additional complexity) is that normal
systems that do not manipulate the stack may not notice unintentional
stack corruption as quickly as before.

Note that callbackUpdateSystemStack is written to be usable for the
initial setup in needm as well as updating the stack in cgocallbackg.

Fixes #62440.
For #62130.

Change-Id: I0fe0134f865932bbaff1fc0da377c35c013bd768
Reviewed-on: https://go-review.googlesource.com/c/go/+/527715
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
@gopherbot
Copy link

Change https://go.dev/cl/530480 mentions this issue: [release-branch.go1.21] runtime: allow update of system stack bounds on callback from C thread

jsubirat added a commit to newrelic/newrelic-fluent-bit-output that referenced this issue Dec 13, 2023
jsubirat added a commit to newrelic/newrelic-fluent-bit-output that referenced this issue Dec 13, 2023
jsubirat added a commit to newrelic/newrelic-fluent-bit-output that referenced this issue Dec 13, 2023
* Pin Go to 1.20.12 due to golang/go#62130

* Go 1.20.12 image only available in bookworm and bullseye

* Revert back to bullseye

* Finally stuck to 1.20.7 as that's the latest version available through Chocolatey, and I prefer a consistent versioning
@dunglas
Copy link
Contributor Author

dunglas commented Jan 10, 2024

Hi,

if we propose a patch to solve this problem (at least, in the case of PHP Fibers), would you accept it?

@cherrymui
Copy link
Member

I thought this has been fixed by the CLs above? Could you try using Go 1.22rc1 or Go tip? Thanks.

cc @prattmic

gopherbot pushed a commit that referenced this issue Jan 10, 2024
…on callback from C thread

[This cherry-pick combines CL 527715, CL 527775, CL 527797, and
CL 529216.]

[This is a redo of CL 525455 with the test fixed on darwin by defining
_XOPEN_SOURCE, and disabled with android, musl, and openbsd, which do
not provide getcontext.]

Since CL 495855, Ms are cached for C threads calling into Go, including
the stack bounds of the system stack.

Some C libraries (e.g., coroutine libraries) do manual stack management
and may change stacks between calls to Go on the same thread.

Changing the stack if there is more Go up the stack would be
problematic. But if the calls are completely independent there is no
particular reason for Go to care about the changing stack boundary.

Thus, this CL allows the stack bounds to change in such cases. The
primary downside here (besides additional complexity) is that normal
systems that do not manipulate the stack may not notice unintentional
stack corruption as quickly as before.

Note that callbackUpdateSystemStack is written to be usable for the
initial setup in needm as well as updating the stack in cgocallbackg.

For #62440.
For #62130.
For #63209.

Change-Id: I0fe0134f865932bbaff1fc0da377c35c013bd768
Reviewed-on: https://go-review.googlesource.com/c/go/+/527715
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
(cherry picked from commit 4f9fe6d)
(cherry picked from commit e8ba057)
(cherry picked from commit a843991)
(cherry picked from commit d110d7c)
Reviewed-on: https://go-review.googlesource.com/c/go/+/530480
Auto-Submit: Dmitri Shuralyov <dmitshur@google.com>
TryBot-Bypass: Michael Pratt <mpratt@google.com>
@dunglas
Copy link
Contributor Author

dunglas commented Jan 14, 2024

facebook-github-bot pushed a commit to facebook/fbthrift that referenced this issue Feb 2, 2024
Summary:
We might need to rollback to an old Go version.
The first blocker is the usage of maps library that was released in Go 1.21
We are temporarily rolling this back and the issue with golang/go#62130 is solved

Reviewed By: podtserkovskiy

Differential Revision: D53321207

fbshipit-source-id: 4a39f2f139c62fc9a4c7d4d4392f67ec1dd48d1f
facebook-github-bot pushed a commit to facebook/hhvm that referenced this issue Feb 2, 2024
Summary:
We might need to rollback to an old Go version.
The first blocker is the usage of maps library that was released in Go 1.21
We are temporarily rolling this back and the issue with golang/go#62130 is solved

Reviewed By: podtserkovskiy

Differential Revision: D53321207

fbshipit-source-id: 4a39f2f139c62fc9a4c7d4d4392f67ec1dd48d1f
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Projects
Status: In Progress
Development

No branches or pull requests

10 participants