Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gccgo: regression in gcc 5.2.1 in gccgo testcase 235.go on ppc64le, ppc64 #13662

Closed
laboger opened this issue Dec 17, 2015 · 10 comments
Closed
Milestone

Comments

@laboger
Copy link
Contributor

laboger commented Dec 17, 2015

This regression has been around since 5.2.1, and started failing with r227785 on the gcc release 5 branch. You can see the failures in the gcc-testresults output for ppc64le and ppc64 consistently since then. If I remove that change the testcase passes. It does not fail on trunk.

Here is the information from go.log:

Setting GOARCH=ppc64le
Executing on host: /home/boger/gccgo.work/gcc5.0/bld/gcc/testsuite/go/../../gccgo -B/home/boger/gccgo.work/gcc5.0/bld/gcc/testsuite/go/../../ /home/boger/gccgo.work/gcc5.0/src/gcc/testsuite/go.test/test/235.go -fno-diagnostics-show-caret -fdiagnostics-color=never -I/home/boger/gccgo.work/gcc5.0/bld/powerpc64le-linux/./libgo -w -O2 -g -L/home/boger/gccgo.work/gcc5.0/bld/powerpc64le-linux/./libgo -L/home/boger/gccgo.work/gcc5.0/bld/powerpc64le-linux/./libgo/.libs -lm -o /home/boger/gccgo.work/gcc5.0/bld/gcc/testsuite/go/235.x (timeout = 300)
spawn /home/boger/gccgo.work/gcc5.0/bld/gcc/testsuite/go/../../gccgo -B/home/boger/gccgo.work/gcc5.0/bld/gcc/testsuite/go/../../ /home/boger/gccgo.work/gcc5.0/src/gcc/testsuite/go.test/test/235.go -fno-diagnostics-show-caret -fdiagnostics-color=never -I/home/boger/gccgo.work/gcc5.0/bld/powerpc64le-linux/./libgo -w -O2 -g -L/home/boger/gccgo.work/gcc5.0/bld/powerpc64le-linux/./libgo -L/home/boger/gccgo.work/gcc5.0/bld/powerpc64le-linux/./libgo/.libs -lm -o /home/boger/gccgo.work/gcc5.0/bld/gcc/testsuite/go/235.x^M
PASS: go.test/test/235.go compilation, -O2 -g
Setting LD_LIBRARY_PATH to .:/home/boger/gccgo.work/gcc5.0/bld/powerpc64le-linux/./libgo/.libs:/home/boger/gccgo.work/gcc5.0/bld/gcc:.:/home/boger/gccgo.work/gcc5.0/bld/powerpc64le-linux/./libgo/.libs:/home/boger/gccgo.work/gcc5.0/bld/gcc:/usr/local/gccgo/lib64
spawn [open ...]^M
bad: 0 should be 2
panic: 235

goroutine 16 [running]:
main.main
/home/boger/gccgo.work/gcc5.0/src/gcc/testsuite/go.test/test/235.go:72
created by main
/home/boger/gccgo.work/gcc5.0/bld/../src/libgo/runtime/go-main.c:48

goroutine 19 [chan receive]:
main.$nested0
/home/boger/gccgo.work/gcc5.0/src/gcc/testsuite/go.test/test/235.go:19
created by main.M
/home/boger/gccgo.work/gcc5.0/src/gcc/testsuite/go.test/test/235.go:17

goroutine 20 [chan receive]:
main.$nested0
/home/boger/gccgo.work/gcc5.0/src/gcc/testsuite/go.test/test/235.go:19
created by main.M
/home/boger/gccgo.work/gcc5.0/src/gcc/testsuite/go.test/test/235.go:17

goroutine 21 [chan receive]:
main.$nested0
/home/boger/gccgo.work/gcc5.0/src/gcc/testsuite/go.test/test/235.go:19
created by main.M
/home/boger/gccgo.work/gcc5.0/src/gcc/testsuite/go.test/test/235.go:17
FAIL: go.test/test/235.go execution, -O2 -g

@ianlancetaylor ianlancetaylor added this to the Gccgo milestone Dec 17, 2015
@ianlancetaylor
Copy link
Contributor

That's not good. The only effect of that change is to change when the garbage collector runs. If that causes the test to fail, then I think there must be something wrong with the garbage collector.

@laboger
Copy link
Contributor Author

laboger commented Dec 18, 2015

After further debugging I find that it fails if compiled with -O2 but passes without. Must be an optimization bug, I'll investigate further.

@laboger
Copy link
Contributor Author

laboger commented Dec 18, 2015

I've tried various experiments. It does appear to be a GC issue with freeing something too early. I can't get it to fail on gcc6; on gcc5 it passes without optimization, passes at -O2 if I set GOGC=off. After adding print statements I see that 'F' gets trashed before it gets in the case where it fails.

func main() {
    F := []uint64{2, 3, 5}
    var n = len(F)
    OUT := []uint64{
            2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24, 25, 27, 30, 32, 36,
            40, 45, 48, 50, 54, 60, 64, 72, 75, 80, 81, 90, 96, 100, 108, 120, 125,
            128, 135, 144, 150, 160, 162, 180, 192, 200, 216, 225, 240, 243, 250,
            256, 270, 288, 300, 320, 324, 360, 375, 384, 400, 405, 432, 450, 480,
            486, 500, 512, 540, 576, 600, 625, 640, 648, 675, 720, 729, 750, 768,
            800, 810, 864, 900, 960, 972, 1000, 1024, 1080, 1125, 1152, 1200, 1215,
            1250, 1280, 1296, 1350, 1440, 1458, 1500, 1536, 1600}

    x := uint64(1)
    ins := make([]T, n)
    outs := make([]T, n)
    xs := make([]uint64, n)
    fmt.Printf("F[0]: %d F[1]: %d F[2]: %d\n", F[0], F[1], F[2])

./235
F[0]: 0 F[1]: 0 F[2]: 0
bad: 0 should be 2

If I add a print statement right after F is initialized, then the values are good and the test passes.

I don't know enough about what GC does or how it decides it shouldn't free something. Must be something unique about this testcase or else we'd be seeing lots of GC problems?

@laboger
Copy link
Contributor Author

laboger commented Jan 4, 2016

I've done some investigation on this and there are two things that don't seem right for the gcc 5 branch with gccgo on Power.

First, the failure started happening with commit 227785, with the change to subtract off the value of runtime_stacks_sys from the next_gc value.
// conservatively set next_gc to high value assuming that everything is live
// concurrent/lazy sweep will reduce this number while discovering new garbage
mstats.next_gc = mstats.heap_alloc+(mstats.heap_alloc-runtime_stacks_sys)*gcpercent/100;

On Power in gcc5, split stack support was not backported so the value of runtime_stacks_sys is a large value, which means next_gc is not being set to a high value as described in the comments. (In other words, the value of runtime_stacks_sys is likely to be a large portion of the total heap at the beginning). Instead the next run of GC is done soon. The problem doesn't happen on gcc6 because of the split stack implementation or the effect of this patch https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03503.html, both of which keep runtime_stacks_sys as a low value.

But as was noted above, running GC more frequently shouldn't cause a failure so there also seems to be a problem related to the use of GC with optimized code. I found that if the pointer returned from a heap allocation is kept in a register and never stored then it is being freed too early by GC causing the failure in this test. When next_gc is a low value and the next GC is done soon this problem is more likely to occur.

   F := []uint64{2, 3, 5}
10002248:   39 f8 ff 4b     bl      10001a80 <000000bf.plt_call.__go_new_nopointers>
1000224c:   18 00 41 e8     ld      r2,24(r1)
10002250:   fe ff e2 3c     addis   r7,r2,-2
10002254:   60 6c e7 e8     ld      r7,27744(r7)
10002258:   fe ff 02 3d     addis   r8,r2,-2
1000225c:   68 6c 08 e9     ld      r8,27752(r8)
10002260:   fe ff 22 3d     addis   r9,r2,-2
10002264:   70 6c 29 e9     ld      r9,27760(r9)
./src/gcc/testsuite/go.test/test/235.go:42
    var n = len(F)
    OUT := []uint64{
10002268:   20 03 80 38     li      r4,800
 ./src/gcc/testsuite/go.test/test/235.go:40
    return m
}


func main() {
    F := []uint64{2, 3, 5}
1000226c:   78 1b 7b 7c     mr      r27,r3    <=== r3 is not stored
10002270:   00 00 e3 f8     std     r7,0(r3)
10002274:   08 00 03 f9     std     r8,8(r3)
10002278:   f8 ff 7b 3b     addi    r27,r27,-8
1000227c:   10 00 23 f9     std     r9,16(r3)
./src/gcc/testsuite/go.test/test/235.go:42
    var n = len(F)
    OUT := []uint64{
10002280:   ff ff 62 3c     addis   r3,r2,-1
10002284:   e0 78 63 38     addi    r3,r3,30944
10002288:   f9 f7 ff 4b     bl      10001a80 <000000bf.plt_call.__go_new_nopointers>
1000228c:   18 00 41 e8     ld      r2,24(r1)
10002290:   20 03 a0 38     li      r5,800
10002294:   18 00 95 38     addi    r4,r21,24
10002298:   08 01 61 f8     std     r3,264(r1)
1000229c:   95 f9 ff 4b     bl      10001c30 <000000bf.plt_call.memcpy@@GLIBC_2.17>

By the time the call is made to memcpy, the storage containing the values that should be copied have been garbage collected and the values are 0. When compiled without optimization, the pointer is stored and the failure does not occur.

@ianlancetaylor
Copy link
Contributor

Thanks for looking into this. It shouldn't matter that r3 is not stored, because the value was copied into r27. Presumably the fmt.Printf, where the problem appears, is using r27 to retrieve the values.

What puzzles me, looking at the partial disassembly, is the addi r27,r27,-8. Why is that there? That means that r27 is not pointing into the buffer allocated by __go_new_nopointers. It appears that nothing is pointing into that buffer, which is presumably why the buffer is being freed. What does the instruction sequence look like for the fmt.Printf? Where does 8 get added back to r27?

@ianlancetaylor
Copy link
Contributor

I think the problem can be avoided by using the -fno-ivopts option, but it would be unfortunate to require that for all Go code.

@ianlancetaylor
Copy link
Contributor

I asked a question about this on the GCC mailing list: https://gcc.gnu.org/ml/gcc/2016-01/msg00023.html .

@laboger
Copy link
Contributor Author

laboger commented Jan 6, 2016

FYI...

Original problem was reported on gccgo from gcc 5.2.1, and did not fail
with gccgo built from trunk in my original experiments; however:

If I build the test with gccgo from trunk using -O2 and use GOGC=10 then
it also fails.
As you probably know, adding -fno-ivopts fixes it.

On 01/06/2016 09:19 AM, Ian Lance Taylor wrote:

I asked a question about this on the GCC mailing list:
https://gcc.gnu.org/ml/gcc/2016-01/msg00023.html .


Reply to this email directly or view it on GitHub
#13662 (comment).

@ianlancetaylor ianlancetaylor self-assigned this Jan 27, 2016
@ianlancetaylor
Copy link
Contributor

This is fixed on trunk.

Leaving open for now for the possibility of a backport to the GCC 5 branch.

@laboger
Copy link
Contributor Author

laboger commented Oct 18, 2016

We don't care about a backport of this to the gcc5 branch, so if you want to close it that's fine.

@golang golang locked and limited conversation to collaborators Oct 18, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants