gccgo: gotools testcase failures on ppc64le #30508

Closed
laboger opened this issue Mar 1, 2019 · 11 comments
Labels: FrozenDueToAge, NeedsInvestigation

@laboger
Contributor

laboger commented Mar 1, 2019

What version of Go are you using (go version)?

$ go version
go version go1.12 gccgo (GCC) 9.0.1 20190226 (experimental) linux/ppc64le

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

$ go env
linux/ppc64le Ubuntu 18.04
We don't see this error on linux/ppc64, although it's possible it is being skipped there.

What did you do?

make check from the gotools directory

What did you expect to see?

No failures

What did you see instead?

There are two failures when running the gotools tests on the latest gccgo on ppc64le.

The first failure, in TestAbort, has been happening since r264546, which was the upgrade to Go 1.11.

=== RUN TestAbort
FAIL: TestAbort
crash_test.go:95: testprog Abort exit status: exit status 2
crash_test.go:683: output does not contain "runtime.abort":
SIGABRT: abort
PC=140704077965708 m=0 sigcode=18446744073709551610

    goroutine 1 [running]:
    runtime.sighandler
            /home/boger/gccgo.work/trunk/bld/../src/libgo/go/runtime/signal_sighandler.go:145
    runtime.sigtrampgo
            /home/boger/gccgo.work/trunk/bld/../src/libgo/go/runtime/signal_unix.go:314
    runtime.sigtramp
            /home/boger/gccgo.work/trunk/bld/../src/libgo/runtime/go-signal.c:131

            :0
    gsignal
            :0

    goroutine 2 [force gc (idle)]:
    runtime.mcall
            /home/boger/gccgo.work/trunk/bld/../src/libgo/runtime/proc.c:344
    runtime.gopark
            /home/boger/gccgo.work/trunk/bld/../src/libgo/go/runtime/proc.go:330
    runtime.goparkunlock
            /home/boger/gccgo.work/trunk/bld/../src/libgo/go/runtime/proc.go:336
    runtime.forcegchelper
            /home/boger/gccgo.work/trunk/bld/../src/libgo/go/runtime/proc.go:279
    runtime.kickoff
            /home/boger/gccgo.work/trunk/bld/../src/libgo/go/runtime/proc.go:1198
    created by runtime.runtime..init3
            /home/boger/gccgo.work/trunk/bld/../src/libgo/go/runtime/proc.go:266 +76

    goroutine 3 [finalizer wait]:
    runtime.mcall
            /home/boger/gccgo.work/trunk/bld/../src/libgo/runtime/proc.c:344
    runtime.gopark
            /home/boger/gccgo.work/trunk/bld/../src/libgo/go/runtime/proc.go:330
    runtime.goparkunlock
            /home/boger/gccgo.work/trunk/bld/../src/libgo/go/runtime/proc.go:336
    runtime.runfinq
            /home/boger/gccgo.work/trunk/bld/../src/libgo/go/runtime/mfinal.go:146
    runtime.kickoff
            /home/boger/gccgo.work/trunk/bld/../src/libgo/go/runtime/proc.go:1198
    created by runtime.SetFinalizer
            /home/boger/gccgo.work/trunk/bld/../src/libgo/go/runtime/mfinal.go:370 +1420

    goroutine 4 [GC sweep wait]:
    runtime.mcall
            /home/boger/gccgo.work/trunk/bld/../src/libgo/runtime/proc.c:344
    runtime.gopark
            /home/boger/gccgo.work/trunk/bld/../src/libgo/go/runtime/proc.go:330
    runtime.goparkunlock
            /home/boger/gccgo.work/trunk/bld/../src/libgo/go/runtime/proc.go:336
    runtime.bgsweep
            /home/boger/gccgo.work/trunk/bld/../src/libgo/go/runtime/mgcsweep.go:72
    runtime.kickoff
            /home/boger/gccgo.work/trunk/bld/../src/libgo/go/runtime/proc.go:1198
    created by runtime.main
            /home/boger/gccgo.work/trunk/bld/../src/libgo/go/runtime/proc.go:218 +724

    r0 0xae
    r1 0x7ff8365efac0
    r2 0x7ff838b37200
    r3 0x0
    r4 0x7ff8365efae8
    r5 0x0
    r6 0x8
    r7 0x7ff83895e950
    r8 0x900000010280f033
    r9 0x0
    r10 0x0
    r11 0x0
    r12 0x0
    r13 0x7ff83a64bf00
    r14 0x0
    r15 0x0
    r16 0x0
    r17 0x0
    r18 0x0
    r19 0x0
    r20 0x0
    r21 0x0
    r22 0x0
    r23 0x0
    r24 0x0
    r25 0x7ff839ca532c
    r26 0x7ff83a494740
    r27 0xc0001ea060
    r28 0x7ff83a5ae608
    r29 0x6
    r30 0x0
    r31 0x7ff8365efae8
    pc  0x7ff83895e98c
    msr 0x900000010280f033
    cr  0x44000444
    lr  0x7ff83895e89c
    ctr 0x0
    xer 0x0

The second failure is a link error which has a bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89227
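A rough way to see what the TestAbort output above is missing: the sketch below lists the frames that follow `runtime.sigtramp` in goroutine 1 of a gccgo-style traceback. The helper name `framesAfter` and the shortened sample trace are made up for illustration, and the assumption that the test is a plain substring search for `runtime.abort` is inferred only from the failure message, not from `crash_test.go` itself.

```go
package main

import (
	"fmt"
	"strings"
)

// framesAfter returns the function-name rows that follow marker inside the
// first goroutine block of a traceback. Blank rows and file:line rows (which
// contain a ':') are skipped. Hypothetical helper, not part of libgo.
func framesAfter(trace, marker string) []string {
	var frames []string
	goroutines := 0
	seen := false
	for _, line := range strings.Split(trace, "\n") {
		s := strings.TrimSpace(line)
		if strings.HasPrefix(s, "goroutine ") {
			goroutines++
			if goroutines > 1 {
				break // only the first goroutine is of interest
			}
			continue
		}
		if goroutines == 0 || s == "" || strings.Contains(s, ":") {
			continue
		}
		if seen {
			frames = append(frames, s)
		} else if s == marker {
			seen = true
		}
	}
	return frames
}

func main() {
	// Shortened copy of goroutine 1 from the failing output (paths trimmed).
	failing := "SIGABRT: abort\n\ngoroutine 1 [running]:\nruntime.sighandler\n\tsignal_sighandler.go:145\n" +
		"runtime.sigtrampgo\n\tsignal_unix.go:314\nruntime.sigtramp\n\tgo-signal.c:131\n\n\t:0\ngsignal\n\t:0\n\n" +
		"goroutine 2 [force gc (idle)]:\nruntime.mcall\n\tproc.c:344\n"

	frames := framesAfter(failing, "runtime.sigtramp")
	fmt.Println(frames)
	fmt.Println(strings.Contains(strings.Join(frames, "\n"), "runtime.abort"))
}
```

On the failing trace the list stops at `gsignal`, so no `runtime.abort` frame is ever printed.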

gopherbot added this to the Gccgo milestone Mar 1, 2019
bcmills added the NeedsInvestigation label Mar 1, 2019
@bcmills
Contributor

bcmills commented Mar 1, 2019

CC @ianlancetaylor @thanm

@ianlancetaylor
Contributor

I can't recreate either of these problems. I'm testing on a machine in the GCC compile farm, gcc112.fsffrance.org. It's running CentOS Linux release 7.6.1810. On that machine make check-gotools passes with no errors.

@laboger
Contributor Author

laboger commented Mar 1, 2019

I just noticed that the linker error does not appear on our gcc-testresults reports for power8 or power9, but I hit it every time on my builds. I will try to find what's different -- how I configure, or the gcc or ld versions on my machine. I've been mostly using newer distros.

The TestAbort failure appears consistently on our gcc-testresults and I hit it consistently.

I will try some older distros and try different configure options and see if the behavior is different.

@laboger
Contributor Author

laboger commented Mar 7, 2019

FYI... the linker bug in the bugzilla has been fixed. TestAbort still fails in more recent distros.

@ianlancetaylor
Contributor

After building the program in libgo/go/runtime/testdata/testprog, the output I see for GOTRACEBACK=system testprog Abort on CentOS 7.6 is as follows.

testprog output
SIGABRT: abort
PC=70367464585968 m=0 sigcode=18446744073709551610

goroutine 1 [running]:
runtime.sighandler
	../../../trunk/libgo/go/runtime/signal_sighandler.go:145
runtime.sigtrampgo
	../../../trunk/libgo/go/runtime/signal_unix.go:314
runtime.sigtramp
	../../../trunk/libgo/runtime/go-signal.c:54

	:0
__GI_raise
	:0
abort
	:0
runtime.abort
	../../../trunk/libgo/runtime/panic.c:46
main.Abort
	/home/ian/trunk/libgo/go/runtime/testdata/testprog/abort.go:21
main.main
	/home/ian/trunk/libgo/go/runtime/testdata/testprog/main.go:34
runtime.main
	../../../trunk/libgo/go/runtime/proc.go:226

goroutine 2 [force gc (idle)]:
runtime.mcall
	../../../trunk/libgo/runtime/proc.c:344
runtime.gopark
	../../../trunk/libgo/go/runtime/proc.go:330
runtime.goparkunlock
	../../../trunk/libgo/go/runtime/proc.go:336
runtime.forcegchelper
	../../../trunk/libgo/go/runtime/proc.go:279
runtime.kickoff
	../../../trunk/libgo/go/runtime/proc.go:1205
created by runtime.runtime..init3
	../../../trunk/libgo/go/runtime/proc.go:266 +56

goroutine 3 [finalizer wait]:
runtime.mcall
	../../../trunk/libgo/runtime/proc.c:344
runtime.gopark
	../../../trunk/libgo/go/runtime/proc.go:330
runtime.goparkunlock
	../../../trunk/libgo/go/runtime/proc.go:336
runtime.runfinq
	../../../trunk/libgo/go/runtime/mfinal.go:146
runtime.kickoff
	../../../trunk/libgo/go/runtime/proc.go:1205
created by runtime.SetFinalizer
	../../../trunk/libgo/go/runtime/mfinal.go:370 +1388

goroutine 4 [GC sweep wait]:
runtime.mcall
	../../../trunk/libgo/runtime/proc.c:344
runtime.gopark
	../../../trunk/libgo/go/runtime/proc.go:330
runtime.goparkunlock
	../../../trunk/libgo/go/runtime/proc.go:336
runtime.bgsweep
	../../../trunk/libgo/go/runtime/mgcsweep.go:72
runtime.kickoff
	../../../trunk/libgo/go/runtime/proc.go:1205
created by runtime.main
	../../../trunk/libgo/go/runtime/proc.go:218 +708

r0 0xfa
r1 0x3fffb186fc40
r2 0x3fffb3d57400
r3 0x0
r4 0x4208
r5 0x6
r6 0x8
r7 0x1
r8 0x0
r9 0x0
r10 0x0
r11 0x0
r12 0x0
r13 0x3fffb564c130
r14 0x0
r15 0x0
r16 0x0
r17 0x0
r18 0x0
r19 0x0
r20 0x0
r21 0x0
r22 0x0
r23 0x0
r24 0x0
r25 0x3fffb4d63130
r26 0x3fffb54b7690
r27 0xc00019a060
r28 0x3fffb55c2b60
r29 0x3fffb186fed9
r30 0x3fffb186fcf8
r31 0x6
pc  0x3fffb3bafaf0
msr 0x900000010280f033
cr  0x42000844
lr  0x3fffb3bb1e6c
ctr 0x0
xer 0x0

It looks like the difference is that on your system libgcc is unable to unwind through the signal handler. The key point is whether the function get_regs in libgcc/config/rs6000/linux-unwind.h is able to detect the signal handler. The code there expects the signal return frame to look like either of

  /* addi r1, r1, 128; li r0, 0x0077; sc  (sigreturn) */
  /* addi r1, r1, 128; li r0, 0x00AC; sc  (rt_sigreturn) */

Run gdb --args testprog Abort, then inside gdb do break 'runtime.sigtramp', then run. You should see a break because of SIGABRT; type cont. gdb should then hit the breakpoint at runtime.sigtramp. At that point do up and x/5i $pc. I see this:

=> 0x3fffb7f90478 <__kernel_sigtramp_rt64>:	addi    r1,r1,128
   0x3fffb7f9047c <__kernel_sigtramp_rt64+4>:	li      r0,172
   0x3fffb7f90480 <__kernel_sigtramp_rt64+8>:	sc      
   0x3fffb7f90484:	.long 0x0
   0x3fffb7f90488:	.long 0x0

This matches the instruction sequence expected by get_regs.
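For reference, the match described above can be sketched in Go. This is only an illustration of the idea, not the code in `linux-unwind.h`: the constants are my own encodings of the mnemonics in the two quoted comments, and `classifyTrampoline` is a hypothetical name.

```go
package main

import "fmt"

// 64-bit PowerPC encodings of the mnemonics from the quoted comments.
// These values are assumptions derived from the instruction formats,
// not copied from libgcc.
const (
	addiR1R1By128  = 0x38210080 // addi r1, r1, 128
	liR0Sigreturn  = 0x38000077 // li r0, 0x0077 (sigreturn)
	liR0RtSigreturn = 0x380000AC // li r0, 0x00AC (rt_sigreturn)
	scInsn         = 0x44000002 // sc
)

// classifyTrampoline reports which signal-return stub, if any, the three
// instruction words at the frame's PC correspond to. An empty string means
// the sequence was not recognized as a signal frame.
func classifyTrampoline(words [3]uint32) string {
	if words[0] != addiR1R1By128 || words[2] != scInsn {
		return ""
	}
	switch words[1] {
	case liR0Sigreturn:
		return "sigreturn"
	case liR0RtSigreturn:
		return "rt_sigreturn"
	}
	return ""
}

func main() {
	// The gdb listing shows li r0,172, and 172 is 0xAC.
	fmt.Println(classifyTrampoline([3]uint32{0x38210080, 0x380000AC, 0x44000002}))
}
```

The `__kernel_sigtramp_rt64` listing corresponds to the `rt_sigreturn` case, which is why the sequence itself does not look like the cause of a failed unwind.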

What do you see?

@laboger
Contributor Author

laboger commented May 29, 2019

I did this on Ubuntu 18.04 and still got the failures.
I did what you suggested above and see the same output from gdb:

(gdb) break 'runtime.sigtramp'
Function "runtime.sigtramp" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 ('runtime.sigtramp') pending.
(gdb) run
Starting program: /tmp/go-build960875584/testprog.exe Abort
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/powerpc64le-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff3f5eff0 (LWP 132459)]
[New Thread 0x7ffff372eff0 (LWP 132460)]
[New Thread 0x7ffff2eceff0 (LWP 132461)]

Thread 1 "testprog.exe" received signal SIGABRT, Aborted.
0x00007ffff630e98c in __libc_signal_restore_set (set=0x7ffff3f9fb18) at ../sysdeps/unix/sysv/linux/nptl-signals.h:80
80	../sysdeps/unix/sysv/linux/nptl-signals.h: No such file or directory.
(gdb) c
Continuing.

Thread 1 "testprog.exe" hit Breakpoint 1, runtime.sigtramp (sig=6, info=0x7ffff3fefd78, context=0x7ffff3fef000)
    at /home/boger/gccgo.work/trunk/bld/../src/libgo/runtime/go-signal.c:73
73		gp = runtime_g();
(gdb) up
#1  <signal handler called>
(gdb) x/5i $pc
=> 0x7ffff7f804d8 <__kernel_sigtramp_rt64>:	addi    r1,r1,128
   0x7ffff7f804dc <__kernel_sigtramp_rt64+4>:	li      r0,172
   0x7ffff7f804e0 <__kernel_sigtramp_rt64+8>:	sc      
   0x7ffff7f804e4:	.long 0x0
   0x7ffff7f804e8:	.long 0x0
(gdb) quit

@ianlancetaylor
Contributor

I'm sorry, I don't know what is happening here. I'm not sure how to make progress if I can't recreate the problem.

@laboger
Contributor Author

laboger commented May 30, 2019

OK, looks like this only fails on the Ubuntu systems I used. I ran on a few RHEL 7.6 systems and those passed. The RHEL 7 systems all had older kernels; I'm not sure if that is the difference. I know this is not a high priority, I just wanted to get it off the list of failures. Looks like our test systems are all Ubuntu.

@laboger
Contributor Author

laboger commented Jun 24, 2019

I'm pretty sure this problem has to do with split stack. The older distros where this works use glibc 2.17, and gccgo doesn't generate split-stack code on glibc 2.17 or before. When split stack is present the test fails.
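To sort test machines by that criterion, a small helper along these lines might do. It simply encodes the "2.17 or before" cutoff from the comment above; `splitStackLikely` is a hypothetical name, and the version string has to be obtained separately (for example from the distro's glibc package).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// splitStackLikely reports whether a glibc version string such as "2.27" is
// newer than 2.17, the cutoff mentioned in this thread. It only compares
// version numbers; it does not inspect the compiler or the binary.
func splitStackLikely(glibc string) (bool, error) {
	parts := strings.SplitN(glibc, ".", 3)
	if len(parts) < 2 {
		return false, fmt.Errorf("unexpected glibc version %q", glibc)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, fmt.Errorf("bad major version in %q: %v", glibc, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return false, fmt.Errorf("bad minor version in %q: %v", glibc, err)
	}
	return major > 2 || (major == 2 && minor > 17), nil
}

func main() {
	// "2.17" is the version named in the thread; "2.27" is an example of a
	// newer glibc and is not taken from the thread.
	for _, v := range []string{"2.17", "2.27"} {
		ok, err := splitStackLikely(v)
		fmt.Println(v, ok, err)
	}
}
```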

@ianlancetaylor
Contributor

Thanks. Perhaps there is something wrong with the unwind or exception handling information in libgcc/config/rs6000/morestack.S.

@laboger
Contributor Author

laboger commented Jan 11, 2021

This has been working for a few months.

laboger closed this as completed Jan 11, 2021
golang locked and limited conversation to collaborators Jan 11, 2022