runtime: TestTSAN: fails with signal handler spoils errno
#66427
Comments
Because I suspect this might turn out to be relevant (we would notice if all.bash could not run on Linux, since it runs constantly), could you also supply your Linux and GCC version information? Is there any linker preload or nonstandard path, that sort of thing?
@golang/runtime
OS release info:
GCC info: `gcc -dumpspec` output
The PATH is non-standard because I need the Go paths in it:
We are seeing this on our builders as well, and it seems like a recent regression. For example: https://ci.chromium.org/ui/p/golang/builders/ci/gotip-linux-amd64-staticlockranking/b8752359739889548657/overview
Huh, @muhlemmer reports this on 1.22.1, not tip. On our builders, it looks like a very recent regression. https://ci.chromium.org/ui/test/golang/cmd%2Fcgo%2Finternal%2Ftestsanitizers.TestTSAN%2Ftsan14 shows the first failure was on 2024-03-25 in https://ci.chromium.org/ui/p/golang/builders/try/gotip-linux-amd64-boringcrypto/b8752453947853961217/test-results?sortby=&groupby=. So maybe there is some external factor here?
signal handler spoils errno
Found new dashboard test flakes for:
2024-03-26 23:38 gotip-linux-amd64-staticlockranking go@50dcffb3 cmd/cgo/internal/testsanitizers.TestTSAN/tsan14 (log)
2024-03-27 19:06 gotip-linux-amd64-boringcrypto go@2860e018 cmd/cgo/internal/testsanitizers.TestTSAN/tsan14 (log)
2024-03-30 00:20 gotip-linux-amd64-staticlockranking go@ba9c445f cmd/cgo/internal/testsanitizers.TestTSAN/tsan13 (log)
2024-04-01 12:38 gotip-linux-amd64-newinliner go@6bfaafd3 cmd/cgo/internal/testsanitizers.TestTSAN/tsan14 (log)
2024-04-01 20:28 gotip-linux-amd64 go@cd294f55 cmd/cgo/internal/testsanitizers.TestTSAN/tsan13 (log)
2024-04-02 16:25 gotip-linux-amd64-newinliner go@94dba612 cmd/cgo/internal/testsanitizers.TestTSAN/tsan14 (log)
2024-04-03 22:44 gotip-linux-amd64-staticlockranking go@23fc9170 cmd/cgo/internal/testsanitizers.TestTSAN/tsan14 (log)
2024-04-05 18:02 gotip-linux-amd64-newinliner go@62791eb4 cmd/cgo/internal/testsanitizers.TestTSAN/tsan14 (log)
2024-04-05 22:09 gotip-linux-amd64-staticlockranking go@d186dde8 cmd/cgo/internal/testsanitizers.TestTSAN/tsan14 (log)
2024-04-08 20:51 gotip-linux-amd64-boringcrypto go@8008998b cmd/cgo/internal/testsanitizers.TestTSAN/tsan14 (log)
2024-04-09 14:35 gotip-linux-amd64-newinliner go@de3a3c9e cmd/cgo/internal/testsanitizers.TestTSAN/tsan14 (log)
2024-04-10 16:37 gotip-linux-amd64-boringcrypto go@1bac2528 cmd/cgo/internal/testsanitizers.TestTSAN/tsan14 (log)
2024-04-14 18:17 gotip-linux-amd64-newinliner go@37f48222 cmd/cgo/internal/testsanitizers.TestTSAN/tsan14 (log)
2024-04-16 20:27 gotip-linux-amd64-newinliner go@661f9814 cmd/cgo/internal/testsanitizers.TestTSAN/tsan13 (log)
2024-04-17 16:36 gotip-linux-amd64 go@f367fea8 cmd/cgo/internal/testsanitizers.TestTSAN/tsan14 (log)
2024-04-17 19:54 gotip-linux-amd64-goamd64v3 go@2073b35e cmd/cgo/internal/testsanitizers.TestTSAN/tsan13 (log)
2024-04-17 21:11 gotip-linux-amd64-boringcrypto go@334ce510 cmd/cgo/internal/testsanitizers.TestTSAN/tsan13 (log)
2024-04-17 21:11 gotip-linux-amd64-boringcrypto go@334ce510 cmd/cgo/internal/testsanitizers.TestTSAN/tsan14 (log)
Found new dashboard test flakes for:
2024-04-19 16:08 gotip-linux-amd64-staticlockranking go@d428a638 cmd/cgo/internal/testsanitizers.TestTSAN/tsan14 (log)
2024-04-19 16:31 gotip-linux-amd64-boringcrypto go@1a0b8637 cmd/cgo/internal/testsanitizers.TestTSAN/tsan13 (log)
2024-04-19 16:31 gotip-linux-amd64-boringcrypto go@1a0b8637 cmd/cgo/internal/testsanitizers.TestTSAN/tsan14 (log)
Found new dashboard test flakes for:
2024-04-22 13:29 gotip-linux-amd64-boringcrypto go@2dddc7ef cmd/cgo/internal/testsanitizers.TestTSAN/tsan14 (log)
Found new dashboard test flakes for:
2024-04-22 20:21 gotip-linux-amd64 go@6737f4ce cmd/cgo/internal/testsanitizers.TestTSAN/tsan14 (log)
Found new dashboard test flakes for:
2024-04-23 16:49 gotip-linux-amd64-goamd64v3 go@08e73e61 cmd/cgo/internal/testsanitizers.TestTSAN/tsan14 (log)
2024-04-23 16:49 gotip-linux-amd64-staticlockranking go@08e73e61 cmd/cgo/internal/testsanitizers.TestTSAN/tsan13 (log)
TSAN's implementation of the errno-spoiling check: https://github.com/llvm/llvm-project/blob/d5224b73ccd09a6759759791f58426b6acd4a2e2/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp#L2123
Found new dashboard test flakes for:
2024-04-24 13:36 gotip-linux-amd64-staticlockranking go@62dfa431 cmd/cgo/internal/testsanitizers.TestTSAN/tsan13 (log)
2024-04-24 15:55 gotip-linux-amd64-boringcrypto go@508e7619 cmd/cgo/internal/testsanitizers.TestTSAN/tsan14 (log)
2024-04-24 15:55 gotip-linux-amd64-goamd64v3 go@508e7619 cmd/cgo/internal/testsanitizers.TestTSAN/tsan14 (log)
2024-04-25 01:02 gotip-linux-amd64-goamd64v3 go@4351af68 cmd/cgo/internal/testsanitizers.TestTSAN/tsan14 (log)
Change https://go.dev/cl/581721 mentions this issue:
Change https://go.dev/cl/581722 mentions this issue:
I wrote CL https://go.dev/cl/581721 to check errno at various stages in the signal handler. Here is one of the failures: https://ci.chromium.org/ui/p/golang/builders/try/gotip-linux-amd64/b8749565669963801361/test-results?q=ExactID%3Acmd%2Fcgo%2Finternal%2Ftestsanitizers.TestTSAN%2Ftsan14+VHash%3A59d5ef073852df4a+&sortby=&groupby=
errno is changed on the code path where we get a signal on a non-Go thread that is not executing Go code; in that case our signal handler does nothing except re-raise the signal. Specifically, errno changed after re-raising the signal at this line: https://cs.opensource.google/go/go/+/master:src/runtime/signal_unix.go;l=1096
The signal is 28, a.k.a. SIGWINCH. Where does SIGWINCH come from? I don't know; our test doesn't send SIGWINCH. Maybe the kernel or the builder recently started sending SIGWINCH, and that is what changed? That also explains why I have not been able to reproduce it locally or even on a gomote (where I run the tests in a shell under ssh). The fact that it fails on a non-Go thread also explains why it fails in tsan13 and tsan14 but not the others: only these two tests create threads in C and call back into Go.
Based on that, I wrote another test, in CL https://go.dev/cl/581722, which creates a thread in C and calls into Go in a loop while another thread repeatedly sends SIGWINCH to it. This fails consistently on my machine, as well as on the builders (even non-amd64). Now that I can reproduce it, I'll keep looking into the fix and into how re-raising the signal could clobber errno. Errno 22 is EINVAL. Perhaps re-raising the signal causes some invalid operation, maybe in TSAN's signal handler?
Change https://go.dev/cl/582077 mentions this issue:
More debugging indicates that errno changes when we uninstall our signal handler before re-raising the signal: https://cs.opensource.google/go/go/+/master:src/runtime/signal_unix.go;l=971
This will call the TSAN-intercepted … At least in the case of SIGWINCH, there is no C signal handler installed, and the signal is ignored by default, so re-raising it doesn't seem to do anything. I sent CL https://go.dev/cl/582077 to avoid re-raising signals that are ignored; with it, errno does not change. I'm not sure whether this is the right or best fix. Or maybe the best fix is to snapshot and restore errno around re-raising the signal?
I'm also not sure about the case where a (non-default) C signal handler was installed before the Go signal handler. I think that can only happen if Go is a library (c-archive or c-shared)? In that case we install handlers only for synchronous signals, and we also forward synchronous signals if they land on a non-Go thread and there is a C handler. Maybe that is fine and we won't get to …
Found new dashboard test flakes for:
2024-04-26 21:24 gotip-linux-amd64-staticlockranking go@0e7f5cf3 cmd/cgo/internal/testsanitizers.TestTSAN/tsan14 (log)
2024-04-26 23:07 gotip-linux-amd64-newinliner go@ad22356e cmd/cgo/internal/testsanitizers.TestTSAN/tsan13 (log)
2024-04-26 23:07 gotip-linux-amd64-newinliner go@ad22356e cmd/cgo/internal/testsanitizers.TestTSAN/tsan14 (log)
Found new dashboard test flakes for:
2024-04-29 14:01 gotip-linux-amd64-newinliner go@f6e6b637 cmd/cgo/internal/testsanitizers.TestTSAN/tsan14 (log)
@cherrymui Perhaps you can run the test under … Re: …
I tried
Yeah, that's pretty weird. Maybe something in the builder changed and it now thinks it has a window manager?...
That might be the trigger for the original issue I posted. I have a drop-down terminal (yakuake) that I run most of my shell work in, like the … I do remember that on that particular day I had a lot of shell work to do, and I multitasked in multiple tabs while frequently toggling the drop-down terminal while … How it applies to pipeline tests, I have no idea.
Do the builders use some kind of terminal-to-web emulator that might be sending such signals? |
Found new dashboard test flakes for:
2024-05-04 07:50 gotip-linux-amd64 go@8841f50d cmd/cgo/internal/testsanitizers.TestTSAN/tsan14 (log)
The added program fails consistently with the "signal handler spoils errno" error under TSAN.
For #66427.
Change-Id: Id57b9e62aa30b273a1c793aecd86ec1f211062fc
Reviewed-on: https://go-review.googlesource.com/c/go/+/581722
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Go version
go version go1.22.1 linux/amd64 (go 1.21.7 for bootstrap stage)
Output of `go env` in your module/workspace:
What did you do?
git checkout go1.22.1
cd src
./all.bash
What did you see happen?
The following test fails sometimes (in 30-50% of builds). Usually retrying the all.bash script works. I haven't noticed a similar error on the 1.21 tags, but that might have been pure luck, or something introduced in 1.22 causes this flakiness.
What did you expect to see?
Test to pass each time.
I looked for an existing similar issue, but most issues have to do with segfaults or other panics. I did not find this specific warning.