
runtime: crash when >10k C threads call Go after CL 485500 #60004

Closed
prattmic opened this issue May 5, 2023 · 4 comments
Labels: compiler/runtime, FrozenDueToAge, NeedsFix
Milestone: Go1.21

Comments

@prattmic
Member

prattmic commented May 5, 2023

debug.SetMaxThreads has a default limit of 10k threads, which runtime.checkmcount enforces when a new M is created.

https://go.dev/cl/485500 creates an M for every C thread that calls into Go, freeing the M only when the thread exits. Thus, 10k unique C threads calling into Go at least once will trigger the thread exhaustion throw (assuming the threads don't exit). Prior to CL 485500, the 10k threads would all need to call into Go concurrently to trigger the throw.
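As a concrete illustration (a minimal sketch of our own, not code from the CL), the scenario looks like this: C-created threads each call an exported Go function once. The sketch is split into two files because a Go file that uses //export may not define C functions in its cgo preamble; all names and the thread count are arbitrary.

```go
// file: main.go
package main

/*
#cgo LDFLAGS: -pthread
#include <pthread.h>

extern void GoHello(); // provided by export.go via //export

static void *call_go(void *arg) {
	GoHello(); // the first call from this thread allocates an extra M;
	           // after CL 485500 the M stays bound until the thread exits
	return NULL;
}

static void spawn_and_join(int n) {
	for (int i = 0; i < n; i++) {
		pthread_t t;
		pthread_create(&t, NULL, call_go, NULL);
		pthread_join(&t, NULL); // an exiting thread frees its M
	}
}
*/
import "C"

func main() {
	// With >10k threads that never exit, checkmcount would throw here;
	// joined threads release their M, so this small count is safe.
	C.spawn_and_join(100)
}
```

```go
// file: export.go
package main

import "C"

//export GoHello
func GoHello() {}
```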

I don't think we strictly need to do anything. A program with 10k C threads is exceptional, and I think it is reasonable to say that such programs should adjust debug.SetMaxThreads.
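(For reference, raising the limit is a one-line call; the value below is an arbitrary example.)

```go
package main

import "runtime/debug"

func main() {
	// SetMaxThreads returns the previous limit; 50000 is arbitrary.
	prev := debug.SetMaxThreads(50000)
	_ = prev
	// ... now run the many C threads that call into Go ...
}
```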

On the other hand, the primary purpose of this limit is to prevent Go programs from accidentally fork-bombing the system, e.g., by creating 1 million goroutines that all block in system calls, thus requiring 1 million threads. This concern doesn't really apply to threads created by C, since by definition we aren't automatically creating those threads. Thus, I think it would be reasonable to exclude extra Ms from the SetMaxThreads limit.
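To illustrate the failure mode the limit guards against, here is a minimal Unix-only sketch (our own example, not from the issue). Goroutines blocked in raw system calls each pin an OS thread, so enough of them exceed the default limit:

```go
package main

import (
	"syscall"
	"time"
)

func main() {
	var p [2]int
	if err := syscall.Pipe(p[:]); err != nil {
		panic(err)
	}
	// Each goroutine blocks in read(2) on a pipe that is never written to.
	// Every blocked syscall occupies its own OS thread, so this exceeds
	// the default 10k limit and the runtime throws "thread exhaustion".
	for i := 0; i < 20000; i++ {
		go func() {
			buf := make([]byte, 1)
			syscall.Read(p[0], buf)
		}()
	}
	time.Sleep(time.Hour)
}
```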

cc @cherrymui @doujiang24

@prattmic prattmic added the NeedsFix The path to resolution is known, but the work has not been done. label May 5, 2023
@prattmic prattmic added this to the Go1.21 milestone May 5, 2023
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label May 5, 2023
@cherrymui
Member

> I think it would be reasonable to exclude extra Ms from the SetMaxThreads limit.

That sounds reasonable to me. If C code wants to create a lot of threads, the Go runtime probably should not stop it from doing so.

@gopherbot
Contributor

Change https://go.dev/cl/492743 mentions this issue: runtime: exclude extra M's from debug.SetMaxThreads

@gopherbot
Contributor

Change https://go.dev/cl/492742 mentions this issue: runtime: clean up extra M API

gopherbot pushed a commit that referenced this issue May 8, 2023
There are quite a few locations that get/put Ms from the extra M list,
but the API is pretty clumsy to use. Add an easier-to-use getExtraM /
putExtraM API.

There are only two minor semantic changes:

1. dropm no longer calls setg(nil) inside the lockextra critical
   section. It is important that this thread no longer references the G
   (and in turn M) once it is published to the extra M list and another
   thread could acquire it. But there is no reason that needs to happen
   only after lockextra.

2. extraMLength (renamed from extraMCount) is no longer protected by
   lockextra and is instead simply an atomic (though writes are still in
   the critical section). The previous readers all dropped lockextra
   before using the value they read anyway.

For #60004.

Change-Id: Ifca4d6c84d605423855d89f49af400ca07de56f4
Reviewed-on: https://go-review.googlesource.com/c/go/+/492742
Run-TryBot: Michael Pratt <mpratt@google.com>
Commit-Queue: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
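Since the getExtraM / putExtraM helpers themselves aren't quoted above, here is a simplified, self-contained sketch of the pattern the message describes; all names and types are stand-ins, not the runtime's actual code. The list head doubles as a lock by swapping in a sentinel, and the length lives in an atomic that readers can load without taking the lock (semantic change 2):

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// m stands in for the runtime's M; schedlink links the free list.
type m struct {
	id        int
	schedlink *m
}

var (
	lockSentinel = new(m)          // placeholder head meaning "list is locked"
	extraM       atomic.Pointer[m] // head of the list, or lockSentinel
	extraMLength atomic.Int32      // readable without holding the lock
)

// lockExtra spins until it owns the list, returning the real head.
func lockExtra() *m {
	for {
		head := extraM.Load()
		if head == lockSentinel {
			runtime.Gosched() // another thread holds the lock; retry
			continue
		}
		if extraM.CompareAndSwap(head, lockSentinel) {
			return head
		}
	}
}

// unlockExtra publishes the new head; the length is updated while still
// inside the critical section, as the message describes.
func unlockExtra(head *m, delta int32) {
	extraMLength.Add(delta)
	extraM.Store(head)
}

// getExtraM pops an M off the list (a sketch of the helper's shape).
func getExtraM() *m {
	head := lockExtra()
	if head == nil {
		unlockExtra(nil, 0) // list empty; real code can allocate a new M
		return nil
	}
	unlockExtra(head.schedlink, -1)
	return head
}

// putExtraM pushes an M back onto the list.
func putExtraM(mp *m) {
	head := lockExtra()
	mp.schedlink = head
	unlockExtra(mp, +1)
}

func main() {
	putExtraM(&m{id: 1})
	putExtraM(&m{id: 2})
	fmt.Println("len:", extraMLength.Load()) // no lock needed for the read
	fmt.Println("got:", getExtraM().id)
}
```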
@gopherbot
Contributor

Change https://go.dev/cl/495855 mentions this issue: runtime/cgo: store M for C-created thread in pthread key

gopherbot pushed a commit that referenced this issue May 17, 2023
This reapplies CL 485500, with a fix drafted in CL 492987 incorporated.

CL 485500 was reverted due to #60004 and #60007. #60004 is fixed in
CL 492743. #60007 is fixed in CL 492987 (incorporated in this CL).

[Original CL 485500 description]

This reapplies CL 481061, with the followup fixes in CL 482975, CL 485315, and
CL 485316 incorporated.

CL 481061, by doujiang24 <doujiang24@gmail.com>, speeds up C-to-Go
calls by binding the M to the C thread. See below for its
description.
CL 482975 is a followup fix to a C declaration in testprogcgo.
CL 485315 is a followup fix for x_cgo_getstackbound on Illumos.
CL 485316 is a followup cleanup for ppc64 assembly.

CL 479915 passed the G to _cgo_getstackbound for direct updates to
gp.stack.lo. A G can be reused on a new thread after the previous thread
exited. This could trigger the C TSAN race detector because it couldn't
see the synchronization in Go (lockextra) preventing the same G from
being used on multiple threads at the same time.

We work around this by passing the address of a stack variable to
_cgo_getstackbound rather than the G. The stack is generally unique per
thread, so TSAN won't see the same address from multiple threads. Even
if stacks are reused across threads by pthread, C TSAN should see the
synchronization in the stack allocator.

A regression test is added to misc/cgo/testsanitizers.

[Original CL 481061 description]

This reapplies CL 392854, with the followup fixes in CL 479255,
CL 479915, and CL 481057 incorporated.

CL 392854, by doujiang24 <doujiang24@gmail.com>, speeds up C-to-Go
calls by binding the M to the C thread. See below for its
description.
CL 479255 is a followup fix for a small bug in ARM assembly code.
CL 479915 is another followup fix to address C to Go calls after
the C code uses some stack, but that CL is also buggy.
CL 481057, by Michael Knyszek, is a followup fix for a memory leak
bug of CL 479915.

[Original CL 392854 description]

In a C thread, it's necessary to acquire an extra M via needm when invoking a Go function from C. But needm and dropm are expensive due to their signal-related syscalls.
So we stop dropping the M when returning to C, which means binding the extra M to the C thread until it exits, to avoid needm and dropm on each C-to-Go call.
Instead, we only dropm when the C thread exits, so the extra M won't leak.

When invoking a Go function from C:
Allocate a pthread variable using pthread_key_create, only once per shared object, and register a thread-exit-time destructor. Then store the g0 of the current M in the thread-specific value of the pthread key, only once per C thread, so that the destructor will put the extra M back on the extra M list when the C thread exits.

When returning to C:
Skip dropm in cgocallback when the pthread variable has been created, so that the extra M will be reused the next time a Go function is invoked from C.

This is purely a performance optimization. The old behavior, in which needm and dropm happen on each cgo call, is still correct, and we have to keep it on systems that support cgo but not pthreads, like Windows.

This optimization is significant; the exact speedup depends on the OS and CPU, but in general a simple Go function call from a C thread becomes roughly 10x faster.

For the newly added BenchmarkCGoInCThread, some benchmark results:
1. 28x faster, from 3395 ns/op to 121 ns/op, on macOS with an Intel Core i7-9750H @ 2.60GHz
2. 6.5x faster, from 1495 ns/op to 230 ns/op, on Linux with an Intel Xeon E5-2630 @ 2.30GHz
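The pthread-key lifecycle described above can be demonstrated standalone. Below is a hedged sketch with our own names (the runtime's real implementation lives in runtime/cgo): the key is created once per process via pthread_once, the thread-specific value is set once per thread, and the destructor runs at thread exit, which is the hook used to put the extra M back on the list.

```go
package main

/*
#cgo LDFLAGS: -pthread
#include <pthread.h>
#include <stdio.h>

static pthread_key_t key;
static pthread_once_t once = PTHREAD_ONCE_INIT;
static int token; // stands in for the M bound to this thread

// Runs at thread exit for any thread that stored a non-NULL value under
// key; this is where the runtime returns the extra M to the list.
static void destructor(void *val) {
	printf("thread exiting, releasing %p\n", val);
}

static void create_key(void) {
	pthread_key_create(&key, destructor);
}

static void *thread_body(void *arg) {
	pthread_once(&once, create_key);      // once per process
	if (pthread_getspecific(key) == NULL) {
		pthread_setspecific(key, &token); // once per thread
	}
	return NULL;
}

static void run_one_thread(void) {
	pthread_t t;
	pthread_create(&t, NULL, thread_body, NULL);
	pthread_join(&t, NULL); // the destructor fired during thread exit
}
*/
import "C"

func main() {
	C.run_one_thread()
}
```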

[CL 479915 description]

Currently, when C calls into Go the first time, we grab an M
using needm, which sets m.g0's stack bounds using the SP. We don't
know how big the stack is, so we simply assume 32K. Previously,
when the Go function returns to C, we drop the M, and the next
time C calls into Go, we put a new stack bound on the g0 based on
the current SP. After CL 392854, we don't drop the M, and the next
time C calls into Go, we reuse the same g0, without recomputing
the stack bounds. If the C code uses quite a bit of stack space
before calling into Go, the SP may be well below the 32K stack
bound we assumed, so the runtime thinks the g0 stack overflows.

This CL makes needm get a more accurate stack bound from pthread.
(On some platforms this may still be a guess, as we don't know
exactly where we are in the C stack, but it is probably better than
simply assuming 32K.)
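On Linux, such a bound can be obtained from the GNU extension pthread_getattr_np. The sketch below is our own Linux-only example, not the CL's code; other platforms need different calls:

```go
package main

/*
#cgo CFLAGS: -D_GNU_SOURCE
#cgo LDFLAGS: -pthread
#include <pthread.h>
#include <stdio.h>

// Query the current thread's stack bounds; pthread_attr_getstack reports
// the lowest addressable byte plus the size.
static void print_stack_bounds(void) {
	pthread_attr_t attr;
	void *lo;
	size_t size;
	pthread_getattr_np(pthread_self(), &attr);
	pthread_attr_getstack(&attr, &lo, &size);
	pthread_attr_destroy(&attr);
	printf("stack lo=%p hi=%p\n", lo, (char *)lo + size);
}
*/
import "C"

func main() {
	C.print_stack_bounds()
}
```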

[CL 492987 description]

On the first call into Go from a C thread, currently we set the g0
stack's high bound imprecisely based on the SP. With CL 485500, we
keep the M and don't recompute the stack bounds when it calls into
Go again. If the first call is made when the C thread uses some
deep stack, but a subsequent call is made with a shallower stack,
the SP may be above g0.stack.hi.

This is usually okay, as we don't usually check stack.hi. One place
where we do check stack.hi is in the signal handler, in
adjustSignalStack. In particular, C TSAN delivers signals on the
g0 stack (instead of the usual signal stack). If the SP is above
g0.stack.hi, we don't see that it is on the g0 stack, and the
runtime throws.

This CL makes needm get an accurate stack upper bound from the
pthread API (on the platforms where it is available).

Also add some debug printing for the "handler not on signal stack"
throw.

Fixes #51676.
Fixes #59294.
Fixes #59678.
Fixes #60007.

Change-Id: Ie51c8e81ade34ec81d69fd7bce1fe0039a470776
Reviewed-on: https://go-review.googlesource.com/c/go/+/495855
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
@golang golang locked and limited conversation to collaborators May 16, 2024