Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: CL 481061 "runtime/cgo: store M for C-created thread in pthread key" causes C TSAN failures #59678

Closed
prattmic opened this issue Apr 17, 2023 · 14 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@prattmic
Copy link
Member

When built with C TSAN, x_cgo_getstackbound triggers race detection on g->stacklo because the synchronization is in Go, which isn't instrumented.

Hopefully simple repro coming soon.

cc @cherrymui @doujiang24

@prattmic prattmic added the NeedsFix The path to resolution is known, but the work has not been done. label Apr 17, 2023
@prattmic prattmic added this to the Go1.21 milestone Apr 17, 2023
@prattmic prattmic self-assigned this Apr 17, 2023
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Apr 17, 2023
@gopherbot
Copy link

Change https://go.dev/cl/485316 mentions this issue: Revert "Revert "Revert "runtime: consolidate function descriptor definitions on PPC64"""

@gopherbot
Copy link

Change https://go.dev/cl/485317 mentions this issue: Revert "runtime: correct GoCheckBindM's C declaration in EnsureBindM test"

@gopherbot
Copy link

Change https://go.dev/cl/485275 mentions this issue: Revert "runtime/cgo: store M for C-created thread in pthread key"

@gopherbot
Copy link

Change https://go.dev/cl/485315 mentions this issue: Revert "runtime/cgo: use pthread_attr_get_np on Illumos"

gopherbot pushed a commit that referenced this issue Apr 17, 2023
This reverts CL 481795.

Reason for revert: CL 481061 causes C TSAN failures and must be
reverted. See CL 485275. This CL depends on CL 481061.

For #59678.

Change-Id: I5ec1f495154205ebdf19cd44c6e6452a7a3606f0
Reviewed-on: https://go-review.googlesource.com/c/go/+/485315
Auto-Submit: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
gopherbot pushed a commit that referenced this issue Apr 17, 2023
…nitions on PPC64"""

This reverts CL 481075 (a re-apply of previously reverted CL 478917).

Reason for revert: CL 481061 causes C TSAN failures and must be
reverted. See CL 485275. This CL depends on CL 481061.

For #59678.

Change-Id: I4bf7f43d9df1ae28e04cd4065552bcbee82ef13f
Reviewed-on: https://go-review.googlesource.com/c/go/+/485316
Reviewed-by: Than McIntosh <thanm@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
gopherbot pushed a commit that referenced this issue Apr 17, 2023
…test"

This reverts CL 482975.

Reason for revert: CL 481061 causes C TSAN failures and must be
reverted. See CL 485275. This CL depends on CL 481061.

For #59678.

Change-Id: I4599e93d536149bcec94a5a1542533107699514f
Reviewed-on: https://go-review.googlesource.com/c/go/+/485317
Reviewed-by: Than McIntosh <thanm@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
gopherbot pushed a commit that referenced this issue Apr 17, 2023
This reverts CL 481061.

Reason for revert: When built with C TSAN, x_cgo_getstackbound triggers
race detection on `g->stacklo` because the synchronization is in Go,
which isn't instrumented.

For #51676.
For #59294.
For #59678.

Change-Id: I38afcda9fcffd6537582a39a5214bc23dc147d47
Reviewed-on: https://go-review.googlesource.com/c/go/+/485275
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
@prattmic
Copy link
Member Author

This is reproducible with this program:

// Copyright 2023 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

package main

/*
#cgo CFLAGS: -fsanitize=thread
#cgo LDFLAGS: -fsanitize=thread

#include <pthread.h>

void go_callback();

static void *thr(void *arg) {
    go_callback();
    return 0;
}

static void foo() {
    pthread_t th;
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, 256 << 10);
    pthread_create(&th, &attr, thr, 0);
    pthread_join(th, 0);
}
*/
import "C"

import (
        "time"
)

//export go_callback
func go_callback() {
}

func main() {
        for i := 0; i < 2; i++ {
                go func() {
                        for {
                                C.foo()
                        }
                }()
        }

        time.Sleep(1000*time.Millisecond)
}

It only seems to trigger with TSAN v3, which is included with clang but not gcc on my machine:

$ CGO_CFLAGS=-fsanitize=thread CC=clang go build tsan.go
$ TSAN_OPTIONS="halt_on_error=1" ./tsan
==================
WARNING: ThreadSanitizer: data race (pid=423188)
  Write of size 8 at 0x7f368c0064e0 by thread T28:
    #0 x_cgo_getstackbound <null> (tsan14+0x527119) (BuildId: 80075c8ad0035a8addeb81107abcda0c2aa5bce3)
    #1 runtime.asmcgocall.abi0 /usr/local/google/home/mpratt/src/go/src/runtime/asm_amd64.s:902 (tsan14+0x5219e5) (BuildId: 80075c8ad0035a8addeb81107abcda0c2aa5bce3)
    #2 thr tsan14.cgo2.c (tsan14+0x525ec3) (BuildId: 80075c8ad0035a8addeb81107abcda0c2aa5bce3)

  Previous write of size 8 at 0x7f368c0064e0 by thread T29:
    #0 x_cgo_getstackbound <null> (tsan14+0x527119) (BuildId: 80075c8ad0035a8addeb81107abcda0c2aa5bce3)
    #1 runtime.asmcgocall.abi0 /usr/local/google/home/mpratt/src/go/src/runtime/asm_amd64.s:902 (tsan14+0x5219e5) (BuildId: 80075c8ad0035a8addeb81107abcda0c2aa5bce3)
    #2 thr tsan14.cgo2.c (tsan14+0x525ec3) (BuildId: 80075c8ad0035a8addeb81107abcda0c2aa5bce3)

  Thread T28 (tid=423217, running) created by main thread at:
    #0 pthread_create <null> (tsan14+0x45048d) (BuildId: 80075c8ad0035a8addeb81107abcda0c2aa5bce3)
    #1 foo tsan14.cgo2.c (tsan14+0x525e6f) (BuildId: 80075c8ad0035a8addeb81107abcda0c2aa5bce3)
    #2 _cgo_0dfa424d8f6e_Cfunc_foo <null> (tsan14+0x525dfe) (BuildId: 80075c8ad0035a8addeb81107abcda0c2aa5bce3)
    #3 runtime.asmcgocall.abi0 /usr/local/google/home/mpratt/src/go/src/runtime/asm_amd64.s:872 (tsan14+0x5219a7) (BuildId: 80075c8ad0035a8addeb81107abcda0c2aa5bce3)

  Thread T29 (tid=423218, finished) created by thread T2 at:
    #0 pthread_create <null> (tsan14+0x45048d) (BuildId: 80075c8ad0035a8addeb81107abcda0c2aa5bce3)
    #1 foo tsan14.cgo2.c (tsan14+0x525e6f) (BuildId: 80075c8ad0035a8addeb81107abcda0c2aa5bce3)
    #2 _cgo_0dfa424d8f6e_Cfunc_foo <null> (tsan14+0x525dfe) (BuildId: 80075c8ad0035a8addeb81107abcda0c2aa5bce3)
    #3 runtime.asmcgocall.abi0 /usr/local/google/home/mpratt/src/go/src/runtime/asm_amd64.s:872 (tsan14+0x5219a7) (BuildId: 80075c8ad0035a8addeb81107abcda0c2aa5bce3)

SUMMARY: ThreadSanitizer: data race (/usr/local/google/home/mpratt/src/go/misc/cgo/testsanitizers/testdata/tsan14+0x527119) (BuildId: 80075c8ad0035a8addeb81107abcda0c2aa5bce3) in x_cgo_getstackbound
==================

@prattmic
Copy link
Member Author

The "race" occurs on the write to g->stacklo in https://go-review.git.corp.google.com/c/go/+/481061/2/src/runtime/cgo/gcc_stack_unix.c.

My understanding of what happens is:

  1. T1 calls into Go, calls needm.
  2. needm gets M1, and calls cgo_getstackbound, writing M1.g0.stack.lo.
  3. T1 exits, calls dropm via thread destructor, which puts M1 back on the extra list.
  4. T2 calls into Go, calls needm.
  5. needm gets M2 since it is back on the extra list, and calls cgo_getstackbound, writing M1.g0.stack.lo.

I believe that TSAN sees this as a racing write because it is unaware of synchronization in the runtime preventing two threads from using the same M (lockextra).

I tried to resolve this by adding a dummy barrier just prior to the crosscall in pthread_thread_destructor, in an attempt to annotate the barrier that dropm provides, but it did not work.

diff --git a/src/runtime/cgo/gcc_libinit.c b/src/runtime/cgo/gcc_libinit.c
index 9676593211..ec95f90a18 100644
--- a/src/runtime/cgo/gcc_libinit.c
+++ b/src/runtime/cgo/gcc_libinit.c
@@ -142,6 +142,8 @@ pthread_key_destructor(void* g) {
                // We restore g by using the stored g, before dropm in runtime.cgocallback,
                // since the g stored in the TLS by Go might be cleared in some platforms,
                // before this destructor invoked.
+                _cgo_tsan_acquire();
+                _cgo_tsan_release();
                x_crosscall2_ptr(NULL, g, 0, 0);
        }
 }

cc @ianlancetaylor @cherrymui do either of you know if I am misunderstanding the semantics of _cgo_tsan_acquire/_cgo_tsan_release? Or perhaps I missed a case and there really is a race here.

@gopherbot
Copy link

Change https://go.dev/cl/485500 mentions this issue: runtime/cgo: store M for C-created thread in pthread key

@prattmic
Copy link
Member Author

At the moment, https://go.dev/cl/485500 contains the original CL merged with the extra reverted followup CLs, plus a new failing test (based on the snippet aove).

@prattmic
Copy link
Member Author

My initial temptation is to change cgo_getstackbound to take a uintptr_t* argument to place the stack bound, which is just a local variable in the caller, and let the caller write to gp.stack.lo. This avoids having C write to the g at all.

I'm just held back by worry at the back of my head that if I've missed something and there really is a race here then that will just mask it.

@ianlancetaylor
Copy link
Contributor

There's a lot I don't know here, but why can't cgo_getstackbound return a value rather than setting g->stacklo?

I don't think passing a uintptr_t* will help, because TSAN will still cleverly see the memory write.

If cgo_getstackbound does need to set the value, then I think the fix is going to be to call _cgo_tsan_acquire before the memory write and call _cgo_tsan_release after the memory write.

@prattmic
Copy link
Member Author

asmcgocall only returns an int32. That isn't large enough to return the stack address. My proposed uintptr_t* parameter was intended to be an output parameter as a substitute for a return value. The idea is that we will pass the address of a local variable on the stack. Since the system stack is unique to each thread [1], TSAN won't ever see the same address across different threads.

[1] I suppose pthreads could reuse the stack across thread exit, but in that case I think TSAN should have an instrumented barrier in pthread exit.

@ianlancetaylor
Copy link
Contributor

I see. Sure, if you prefer passing a pointer to a local stack variable, that seems like it should work.

@cherrymui
Copy link
Member

Passing a stack local variable would work. I have a plan to use cgo_getstackbound to set the stack bound for the initial C stack in e.g. https://cs.opensource.google/go/go/+/master:src/runtime/cgo/gcc_linux_arm64.c;l=84 . In both places passing a *G seems natural, but passing a stack local works as well.

Agreed that an alternative would be putting _cgo_tsan_acquire and _cgo_tsan_release before and after the write.

@gopherbot
Copy link

Change https://go.dev/cl/495855 mentions this issue: runtime/cgo: store M for C-created thread in pthread key

gopherbot pushed a commit that referenced this issue May 17, 2023
This reapplies CL 485500, with a fix drafted in CL 492987 incorporated.

CL 485500 is reverted due to #60004 and #60007. #60004 is fixed in
CL 492743. #60007 is fixed in CL 492987 (incorporated in this CL).

[Original CL 485500 description]

This reapplies CL 481061, with the followup fixes in CL 482975, CL 485315, and
CL 485316 incorporated.

CL 481061, by doujiang24 <doujiang24@gmail.com>, speed up C to Go
calls by binding the M to the C thread. See below for its
description.
CL 482975 is a followup fix to a C declaration in testprogcgo.
CL 485315 is a followup fix for x_cgo_getstackbound on Illumos.
CL 485316 is a followup cleanup for ppc64 assembly.

CL 479915 passed the G to _cgo_getstackbound for direct updates to
gp.stack.lo. A G can be reused on a new thread after the previous thread
exited. This could trigger the C TSAN race detector because it couldn't
see the synchronization in Go (lockextra) preventing the same G from
being used on multiple threads at the same time.

We work around this by passing the address of a stack variable to
_cgo_getstackbound rather than the G. The stack is generally unique per
thread, so TSAN won't see the same address from multiple threads. Even
if stacks are reused across threads by pthread, C TSAN should see the
synchonization in the stack allocator.

A regression test is added to misc/cgo/testsanitizer.

[Original CL 481061 description]

This reapplies CL 392854, with the followup fixes in CL 479255,
CL 479915, and CL 481057 incorporated.

CL 392854, by doujiang24 <doujiang24@gmail.com>, speed up C to Go
calls by binding the M to the C thread. See below for its
description.
CL 479255 is a followup fix for a small bug in ARM assembly code.
CL 479915 is another followup fix to address C to Go calls after
the C code uses some stack, but that CL is also buggy.
CL 481057, by Michael Knyszek, is a followup fix for a memory leak
bug of CL 479915.

[Original CL 392854 description]

In a C thread, it's necessary to acquire an extra M by using needm while invoking a Go function from C. But, needm and dropm are heavy costs due to the signal-related syscalls.
So, we change to not dropm while returning back to C, which means binding the extra M to the C thread until it exits, to avoid needm and dropm on each C to Go call.
Instead, we only dropm while the C thread exits, so the extra M won't leak.

When invoking a Go function from C:
Allocate a pthread variable using pthread_key_create, only once per shared object, and register a thread-exit-time destructor.
And store the g0 of the current m into the thread-specified value of the pthread key,  only once per C thread, so that the destructor will put the extra M back onto the extra M list while the C thread exits.

When returning back to C:
Skip dropm in cgocallback, when the pthread variable has been created, so that the extra M will be reused the next time invoke a Go function from C.

This is purely a performance optimization. The old version, in which needm & dropm happen on each cgo call, is still correct too, and we have to keep the old version on systems with cgo but without pthreads, like Windows.

This optimization is significant, and the specific value depends on the OS system and CPU, but in general, it can be considered as 10x faster, for a simple Go function call from a C thread.

For the newly added BenchmarkCGoInCThread, some benchmark results:
1. it's 28x faster, from 3395 ns/op to 121 ns/op, in darwin OS & Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
2. it's 6.5x faster, from 1495 ns/op to 230 ns/op, in Linux OS & Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz

[CL 479915 description]

Currently, when C calls into Go the first time, we grab an M
using needm, which sets m.g0's stack bounds using the SP. We don't
know how big the stack is, so we simply assume 32K. Previously,
when the Go function returns to C, we drop the M, and the next
time C calls into Go, we put a new stack bound on the g0 based on
the current SP. After CL 392854, we don't drop the M, and the next
time C calls into Go, we reuse the same g0, without recomputing
the stack bounds. If the C code uses quite a bit of stack space
before calling into Go, the SP may be well below the 32K stack
bound we assumed, so the runtime thinks the g0 stack overflows.

This CL makes needm get a more accurate stack bound from
pthread. (In some platforms this may still be a guess as we don't
know exactly where we are in the C stack), but it is probably
better than simply assuming 32K.

[CL 492987 description]

On the first call into Go from a C thread, currently we set the g0
stack's high bound imprecisely based on the SP. With CL 485500, we
keep the M and don't recompute the stack bounds when it calls into
Go again. If the first call is made when the C thread uses some
deep stack, but a subsequent call is made with a shallower stack,
the SP may be above g0.stack.hi.

This is usually okay as we don't check usually stack.hi. One place
where we do check for stack.hi is in the signal handler, in
adjustSignalStack. In particular, C TSAN delivers signals on the
g0 stack (instead of the usual signal stack). If the SP is above
g0.stack.hi, we don't see it is on the g0 stack, and throws.

This CL makes it get an accurate stack upper bound with the
pthread API (on the platforms where it is available).

Also add some debug print for the "handler not on signal stack"
throw.

Fixes #51676.
Fixes #59294.
Fixes #59678.
Fixes #60007.

Change-Id: Ie51c8e81ade34ec81d69fd7bce1fe0039a470776
Reviewed-on: https://go-review.googlesource.com/c/go/+/495855
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

4 participants