Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

misc/cgo/testtls: skip test if C toolchain exhibits known TLS bugs #24758

Open
mvdan opened this issue Apr 7, 2018 · 20 comments
Open

misc/cgo/testtls: skip test if C toolchain exhibits known TLS bugs #24758

mvdan opened this issue Apr 7, 2018 · 20 comments
Labels
help wanted NeedsFix The path to resolution is known, but the work has not been done. Testing An issue that has been verified to require only test changes, not just a test failure.
Milestone

Comments

@mvdan
Copy link
Member

mvdan commented Apr 7, 2018

fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x1b pc=0xe56e4]

runtime stack:
runtime.throw(0x16739f, 0x2a)
	/workdir/go/src/runtime/panic.go:598 +0x54
runtime.sigpanic()
	/workdir/go/src/runtime/signal_unix.go:372 +0x22c

goroutine 5 [syscall, locked to thread]:
runtime.cgocall(0xe5678, 0x429f78, 0xe5208)
	/workdir/go/src/runtime/cgocall.go:128 +0x64 fp=0x429f60 sp=0x429f48 pc=0x123c0
_/workdir/go/misc/cgo/testtls._Cfunc_getTLS(0x0)
	_cgo_gotypes.go:43 +0x38 fp=0x429f74 sp=0x429f60 pc=0xe5134
_/workdir/go/misc/cgo/testtls.testTLS(0x478120)
	/workdir/go/misc/cgo/testtls/tls.go:21 +0x34 fp=0x429fb0 sp=0x429f74 pc=0xe5214
_/workdir/go/misc/cgo/testtls.TestTLS(0x478120)
	/workdir/go/misc/cgo/testtls/tls_test.go:12 +0x1c fp=0x429fb8 sp=0x429fb0 pc=0xe50ec
testing.tRunner(0x478120, 0x168c34)
	/workdir/go/src/testing/testing.go:791 +0xac fp=0x429fe4 sp=0x429fb8 pc=0xb3074
runtime.goexit()
	/workdir/go/src/runtime/asm_arm.s:840 +0x4 fp=0x429fe4 sp=0x429fe4 pc=0x63184
created by testing.(*T).Run
	/workdir/go/src/testing/testing.go:838 +0x240

goroutine 1 [runnable]:
testing.(*T).Run(0x478120, 0x1604a5, 0x7, 0x168c34, 0x410540)
	/workdir/go/src/testing/testing.go:839 +0x260
testing.runTests.func1(0x478090)
	/workdir/go/src/testing/testing.go:1081 +0x50
testing.tRunner(0x478090, 0x44bef8)
	/workdir/go/src/testing/testing.go:791 +0xac
testing.runTests(0x40c040, 0x206598, 0x1, 0x1, 0x458100)
	/workdir/go/src/testing/testing.go:1079 +0x21c
testing.(*M).Run(0x458100, 0x0)
	/workdir/go/src/testing/testing.go:996 +0x134
main.main()
	_testmain.go:42 +0x12c
exit status 2
FAIL	_/workdir/go/misc/cgo/testtls	0.046s

Some recent occurences:

https://build.golang.org/log/b7e1728e48a73089be64d42ea3b7e581eeae029c
https://build.golang.org/log/56a9b6b774c15433baff5b11a2f2eeb8aaa936a1

@mvdan mvdan added the Testing An issue that has been verified to require only test changes, not just a test failure. label Apr 7, 2018
@josharian
Copy link
Contributor

cc @FiloSottile

Thanks for filing bugs for the flaky tests, @mvdan. Hope we get back to a stable dashboard soon.

@FiloSottile
Copy link
Contributor

This is not my kind of TLS ;) it's Thread Local Storage.

@josharian
Copy link
Contributor

Hahaha. Oops. :P

cc @bcmills @ianlancetaylor

@cherrymui
Copy link
Member

The faulting PC, 0xe56e4, is in C code getTLS:

TEXT getTLS(SB) 
  :0                    0xe56d0                 e59f0014                MOVW 0x14(R15), R0      
  :0                    0xe56d4                 e92d4010                PUSH [R4,R14]           
  :0                    0xe56d8                 e08f0000                ADD R0, R15, R0         
  :0                    0xe56dc                 fa001a0a                BLX 0xebf0c             // __tls_get_addr
  :0                    0xe56e0                 e59f3008                MOVW 0x8(R15), R3       
  :0                    0xe56e4                 e7930000                MOVW (R3)(R0), R0       // <--- faulting PC
  :0                    0xe56e8                 e8bd8010                POP [R4,R15]            
  :0                    0xe56ec                 00120938                AND.S.EQ R8>>R9, R2, R0 
  :0                    0xe56f0                 0000001c                AND.EQ R12<<R0, R0, R0  

Here, we just loaded R3=0x1c from the constant pool. The faulting address is 0x1b, which means R0=-1, which means __tls_get_addr returned -1. I haven't looked into what could cause it. Maybe the TLS was not initialized correctly?

@ianlancetaylor ianlancetaylor added this to the Go1.11 milestone Apr 18, 2018
@ianlancetaylor ianlancetaylor added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Apr 18, 2018
@ianlancetaylor ianlancetaylor changed the title cmd/dist: testtls segmentation fault on a linux-arm builder runtime: testtls segmentation fault on a linux-arm builder Jun 14, 2018
@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor modified the milestones: Go1.11, Go1.12 Jul 10, 2018
@andybons andybons modified the milestones: Go1.12, Go1.13 Feb 12, 2019
@bcmills
Copy link
Contributor

bcmills commented Jun 4, 2019

@andybons andybons modified the milestones: Go1.13, Go1.14 Jul 8, 2019
@bcmills
Copy link
Contributor

bcmills commented Jul 31, 2019

@bcmills
Copy link
Contributor

bcmills commented Aug 27, 2019

@bcmills
Copy link
Contributor

bcmills commented Sep 3, 2019

@bcmills
Copy link
Contributor

bcmills commented Sep 11, 2019

@bcmills
Copy link
Contributor

bcmills commented Sep 11, 2019

Wow — the last non-trivial change to this test was for issue #6992 back in 2014!

@bcmills
Copy link
Contributor

bcmills commented Sep 11, 2019

@cherrymui, do you know who the right person to look into this further would be?

@cherrymui
Copy link
Member

I'll look into this (as soon as I can, maybe not this week though).

@cherrymui
Copy link
Member

First, I spent some time rediscovered what I found last year (#24758 (comment)). I should have read all the comments first...

Then, by attaching strace I found that when the test passes the TLS accesses are made on the main thread, whereas when it fails the TLS accesses are made on a non-main thread. (at least for all the runs I investigated.)

If I change the test so that it always run on a non-main thread, it fails 100%. The same change works fine on Linux/AMD64.

On all the test logs, the failed one is the one with static linking (https://go.googlesource.com/go/+/refs/tags/go1.13/src/cmd/dist/test.go#1072). Here, we build the C code with -fPIC, which will generate __tls_get_addr for TLS access. Then we link with -static. On AMD64, this will make the linker patch __tls_get_addr to static TLS access. This doesn't happen on ARM. From glibc source code (https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/arm/libc-tls.c;h=22ec3b0c194d0315dc87e73126a0f5c71241346e;hb=0f02b6cfc44af73d4d4363c46b3cbb18b8ff9171#l22), it claims that linker patching is not necessary. But it doesn't seem to be the case.

The following C program fails on ARM but works fine on AMD64:

#include <stdio.h>
#include <pthread.h>

static __thread int x;

void* nonmain() {
	printf("nonmain &x=%p\n", &x);
	printf("nonmain x=%d\n", x);
	return 0;
}

int main() {
	pthread_t t;

	printf("main &x=%p\n", &x);
	printf("main x=%d\n", x);
	pthread_create(&t, 0, nonmain, 0);
	pthread_join(t, 0);
	return 0;
}
# cc -fPIC tls.c -pthread -static
# ./a.out 
main &x=0x844dc
main x=0
nonmain &x=0x13
Segmentation fault (core dumped)

So, it seems that -fPIC and -static just don't work together on ARM...

BTW, if would be easier if the builder has gdb installed. Could we install gdb there?

@gopherbot
Copy link

Change https://golang.org/cl/196378 mentions this issue: misc/cgo/testtls: force TLS accesses on a non-main thread

@cherrymui
Copy link
Member

Maybe we could work around this by letting the go command scan ldflags, and if it is linking statically on ARM, force recompiling all the C code in non-PIC mode?

@bcmills
Copy link
Contributor

bcmills commented Sep 24, 2019

@cherrymui, would you say this is a bug in the C compiler? If so, should we try upgrading the C compiler on the linux-arm builder?

Or, could we have misc/cgo/testtls probe for this bug explicitly (by, say, running the C test program) and skip the test (or suppress the error) if the test program fails?

@cherrymui
Copy link
Member

Yeah, I think this is a bug in the C toolchain, probably either the linker or glibc. The compiler seems ok.

I looked at the source code of glibc and the linker (gold) at tip, and I didn't see anything changed that might fix this. I doubt upgrading the C toolchain would help.

@bcmills bcmills added the NeedsFix The path to resolution is known, but the work has not been done. label Sep 24, 2019
@gopherbot gopherbot removed the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Sep 24, 2019
@bcmills bcmills changed the title runtime: testtls segmentation fault on a linux-arm builder misc/cgo/testtls: skip test if C toolchain exhibits known TLS bugs Sep 24, 2019
@bcmills
Copy link
Contributor

bcmills commented Sep 27, 2019

@rsc rsc modified the milestones: Go1.14, Backlog Oct 9, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted NeedsFix The path to resolution is known, but the work has not been done. Testing An issue that has been verified to require only test changes, not just a test failure.
Projects
None yet
Development

No branches or pull requests

9 participants