Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

misc/cgo/testsanitizers: enabling misc/cgo/testsanitizers testcases for ppc64le result in TLS relocation link errors and signal handling errors for cgo #45040

Closed
laboger opened this issue Mar 15, 2021 · 13 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@laboger
Copy link
Contributor

laboger commented Mar 15, 2021

What version of Go are you using (go version)?

$ go version
Fails with latest tip.

Does this issue reproduce with the latest release?

Yes, but the errors are different depending on the system. On a system where the verification of tsan fails in a certain way the test is skipped so the error doesn't occur and the test seems to pass.

This started with CL 297774 since these tests were not run on ppc64le before that.

What operating system and processor architecture are you using (go env)?

ppc64le
Fails consistently on Linux SMP Debian 4.19.160-2

go env Output
$ go env

What did you do?

This first appeared on the build dashboard: https://build.golang.org/log/a07cf12a0667dc38c330f06dc5c457841154ffc4 and then also appeared on a slowbot run: https://storage.googleapis.com/go-build-log/9fc3f56e/linux-ppc64le-power9osu_74ad5e7f.log.
I don't know why it would work sometimes but not others like has happened on the build dashboard.

What did you expect to see?

PASS

What did you see instead?

Various errors, including those shown in the logs above. It does pass sometimes as mentioned above, if the test decides that it shouldn't test tsan then the test seems to pass.

@ianlancetaylor
Copy link
Contributor

Sounds like the thread sanitizer doesn't really work on ppc64le. Let's either fix upcheckCSanitizer to detect this case or just skip the tests on that platform, by adding a t.Skip in misc/cgo/testsanitizers/tsan_test.go.

@laboger
Copy link
Contributor Author

laboger commented Mar 15, 2021

@pmur thinks he has a linker fix for the relocation error. I also got all but one tsan tests it to work by building them with a shared std and -linkshared.

With the linker error fixed, tsan8 still fails because it doesn't forward the Go signal.

But I agree that checkCSanitizer should be fixed.

@laboger
Copy link
Contributor Author

laboger commented Mar 16, 2021

How does this compare to using the -race option other than one is using LLVM and the other gcc?

@cherrymui cherrymui added the NeedsFix The path to resolution is known, but the work has not been done. label Mar 16, 2021
@cherrymui cherrymui added this to the Go1.17 milestone Mar 16, 2021
@laboger
Copy link
Contributor Author

laboger commented Mar 16, 2021

Sounds like the thread sanitizer doesn't really work on ppc64le. Let's either fix upcheckCSanitizer to detect this case or just skip the tests on that platform, by adding a t.Skip in misc/cgo/testsanitizers/tsan_test.go.

Even if we just skip this test, I think this is showing a problem that could happen for other cases where Go and C code is linked together in this way.

@gopherbot
Copy link

Change https://golang.org/cl/302209 mentions this issue: cmd/link: support 32b TLS_LE offsets on PPC64

@ianlancetaylor
Copy link
Contributor

How does this compare to using the -race option other than one is using LLVM and the other gcc?

It's not related to LLVM vs. GCC; both LLVM and GCC support -fsanitize=thread. The tests in misc/cgo/testsanitizers/test_tsan.go are testing that Go code works when the C code is compiled with -fsanitize=thread. That is, it is testing the C thread sanitizer rather than the Go race detector. Now, it happens that the C thread sanitizer and the Go race detector were written by the same person and use the same runtime code. But they are different in that in one case the C compiler is using that runtime code and in the other case the Go compiler is using (a different copy) of that runtime code.

These tests are relevant because people can compile their C code with -fsanitize=thread, and it should be possible for them to call between that C code and Go code and have things more or less work. The support for that on the Go side is in cmd/cgo (search for "tsan" in cmd/cgo/out.go) and runtime/cgo (search for "TSAN" in runtime/cgo/libcgo.h). The tests in misc/cgo/testsanitizer are just there to make sure that this code continues to work.

@laboger
Copy link
Contributor Author

laboger commented Mar 18, 2021

We were looking at those files yesterday and see that some files are only built for amd64 and arm64, so this has never worked on ppc64le. Looks like we need a callCgoSigaction which on arm64 that calls _cgo_sigaction. I tried to add that but then it seemed like there were calls back and forth between the Go code and the tsan functions that need to handle the different register conventions. That's as far as I got in my debugging.

@laboger
Copy link
Contributor Author

laboger commented Mar 22, 2021

We have the issue now that is mentioned in issue #31827 for arm64 and possibly others. The nonvolatile registers are currently not being saved for Power in sigtramp. I tried to add that code it didn't fix the problem.

The failure is a SEGV in the tsan library because it tries to store at an address off R31 but R31 points to code. The address in R31 has the same value as LR, which is interesting since Go code does a mflr to R31. If I run with cpu=1 it works.

@laboger laboger changed the title misc/cgo/testsanitizers: link errors due to recent enablement of tsan testcases on ppc64le misc/cgo/testsanitizers: link errors relocation truncated to fit: R_PPC64_TPREL16 against `runtime.tls_g' when linking against C shared libraries Mar 23, 2021
@laboger
Copy link
Contributor Author

laboger commented Mar 23, 2021

This failure started happening when the testsanitizer tests in misc/cgo were enabled for any system where gcc allows -fsanitize=thread. On some ppc64le systems, the initial test to determine if -fsanitize=thread is allowed on the system might pass, but when building the test the link step fails. The link failure is not limited to tsan libraries -- this TLS error message could occur at link time when linking against a C shared library with large enough amount of TLS.

@laboger laboger changed the title misc/cgo/testsanitizers: link errors relocation truncated to fit: R_PPC64_TPREL16 against `runtime.tls_g' when linking against C shared libraries misc/cgo/testsanitizers: enabling misc/cgo/testsanitizers testcases for ppc64le result in TLS relocation link errors and signal handling errors for cgo Mar 23, 2021
@gopherbot
Copy link

Change https://golang.org/cl/306369 mentions this issue: runtime/cgo,cmd/internal/obj/ppc64: fix signal forwarding with cgo

@laboger
Copy link
Contributor Author

laboger commented Mar 31, 2021

@Helflym My change does not change the way signals are done on AIX. I will leave that for you to change later if you feel it is needed.

@gopherbot
Copy link

Change https://golang.org/cl/315430 mentions this issue: cmd/dist: disable misc/cgo/testsanitizers on ppc64le

gopherbot pushed a commit that referenced this issue May 3, 2021
A while back in this release the sanitizer tests were enabled
for ppc64le, where previously they were never run. This
uncovered some errors in these tests on ppc64le. One linker
fix was made but there are still bugs in how tsan is made to
work within the code, especially in how signals are enabled
with cgo.

Some attempts were made to make this work but intermittent
failures continue to happen with the Trybots so I am just
going to disable this test for ppc64le within cmd/dist.

Updates #45040

Change-Id: I5392368ccecd4079ef568d0c645c9f7c94016d99
Reviewed-on: https://go-review.googlesource.com/c/go/+/315430
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Trust: Cherry Zhang <cherryyz@google.com>
@laboger
Copy link
Contributor Author

laboger commented May 4, 2021

I talked to the person who does the testing for tsan on ppc64le to ask about the intermittent failures on the trybots related to address ranges.

--- FAIL: TestTSAN (51.29s)
    --- FAIL: TestTSAN/tsan7 (1.90s)
        tsan_test.go:53: `/workdir/tmp/TestTSAN409217416/tsan7` exited with exit status 66
            FATAL: ThreadSanitizer: unexpected memory mapping 0x77ef50d00000-0x77ef50e00000
    --- FAIL: TestTSAN/tsan (2.22s)
        tsan_test.go:53: `/workdir/tmp/TestTSAN1958858204/tsan` exited with exit status 66
            FATAL: ThreadSanitizer: unexpected memory mapping 0x7351a4500000-0x7351a4600000
    --- FAIL: TestTSAN/tsan12 (1.67s)
        tsan_test.go:53: `/workdir/tmp/TestTSAN2065846403/tsan12` exited with exit status 66
            FATAL: ThreadSanitizer: unexpected memory mapping 0x7b5eea100000-0x7b5eea200000
    --- FAIL: TestTSAN/tsan11 (1.93s)
        tsan_test.go:53: `/workdir/tmp/TestTSAN2040972630/tsan11` exited with exit status 66
            FATAL: ThreadSanitizer: unexpected memory mapping 0x7c1fe8c00000-0x7c1fe8d00000
    --- FAIL: TestTSAN/tsan10 (2.34s)
        tsan_test.go:53: `/workdir/tmp/TestTSAN900856456/tsan10` exited with exit status 66
            FATAL: ThreadSanitizer: unexpected memory mapping 0x7956d1300000-0x7956d1400000
    --- FAIL: TestTSAN/tsan9 (2.58s)
        tsan_test.go:53: `/workdir/tmp/TestTSAN2088699269/tsan9` exited with exit status 66
            FATAL: ThreadSanitizer: unexpected memory mapping 0x76369a200000-0x76369a300000
    --- FAIL: TestTSAN/tsan8 (2.08s)
        tsan_test.go:53: `/workdir/tmp/TestTSAN942167395/tsan8` exited with exit status 66
            FATAL: ThreadSanitizer: unexpected memory mapping 0x708b4ed00000-0x708b4ee00000
    --- FAIL: TestTSAN/tsan4 (1.65s)
        tsan_test.go:53: `/workdir/tmp/TestTSAN3515582645/tsan4` exited with exit status 66
            FATAL: ThreadSanitizer: unexpected memory mapping 0x7bbbe2500000-0x7bbbe2600000
    --- FAIL: TestTSAN/tsan6 (1.98s)
        tsan_test.go:53: `/workdir/tmp/TestTSAN563443373/tsan6` exited with exit status 66
            FATAL: ThreadSanitizer: unexpected memory mapping 0x72d7e1300000-0x72d7e1400000
    --- FAIL: TestTSAN/tsan5 (2.62s)
        tsan_test.go:53: `/workdir/tmp/TestTSAN690975148/tsan5` exited with exit status 66
            FATAL: ThreadSanitizer: unexpected memory mapping 0x776355b00000-0x776355c00000
    --- FAIL: TestTSAN/tsan3 (2.01s)
        tsan_test.go:53: `/workdir/tmp/TestTSAN2218669895/tsan3` exited with exit status 66
            FATAL: ThreadSanitizer: unexpected memory mapping 0x7a9187d00000-0x7a9187e00000
    --- FAIL: TestTSAN/tsan2 (1.66s)
        tsan_test.go:53: `/workdir/tmp/TestTSAN207966649/tsan2` exited with exit status 66
            FATAL: ThreadSanitizer: unexpected memory mapping 0x7c2ca0400000-0x7c2ca0500000
FAIL

He told me that is most likely due to the small amount of memory available on those systems, which is why it only happens there. Due to ASLR it might end up at different addresses which is why it can be intermittent.

gopherbot pushed a commit that referenced this issue May 10, 2021
Recently some tsan tests were enabled on ppc64le which had not
been enabled before. This resulted in failures on systems with
tsan available, and while debugging it was determined that
there were other issues related to the use of signals with cgo.

Signals were not being forwarded within programs linked against
libtsan because the nocgo sigaction was being called for ppc64le
with or without cgo. Adding callCgoSigaction and calling that
allows signals to be registered so that signal forwarding works.

For linux-ppc64 and aix-ppc64, this won't change. On linux-ppc64
there is no cgo. I can't test aix-ppc64 so those owners can enable
it if they want.

In reviewing comments about sigtramp in sys_linux_arm64 it was
noted that a previous issue in arm64 due to missing callee save
registers could also be a problem on ppc64x, so code was added
to save and restore those.

Also, the use of R31 as a temp register in some cases caused an
issue since it is a nonvolatile register in C and was being clobbered
in cases where the C code expected it to be valid. The code sequences to
load these addresses were changed to avoid the use of R31 when loading
such an address.

To get around a vet error, the stubs_ppc64x.go file in runtime
was split into stubs_ppc64.go and stubs_ppc64le.go.

Updates #45040

Change-Id: Ia4ecff950613cbe1b89471790b1d3819d5b5cfb9
Reviewed-on: https://go-review.googlesource.com/c/go/+/306369
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org>
@golang golang locked and limited conversation to collaborators May 4, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

4 participants