runtime: continuing TestSegv/SegvInCgo failures with "unknown pc" #50979

Open
bcmills opened this issue Feb 2, 2022 · 19 comments
Labels: arch-arm, compiler/runtime, NeedsInvestigation


@bcmills
Contributor

bcmills commented Feb 2, 2022

#!watchflakes
post <- pkg == "runtime" && test == "TestSegv" && `SegvInCgo` && `unknown pc`
--- FAIL: TestSegv (0.00s)
    --- FAIL: TestSegv/SegvInCgo (0.15s)
        testenv.go:361: [/var/gobuilder/buildlet/tmp/go-build403866584/testprogcgo.exe SegvInCgo] exit status: exit status 2
        crash_cgo_test.go:596: SIGSEGV: segmentation violation
            PC=0x6e267e74 m=2 sigcode=0
            
            goroutine 0 [idle]:
            runtime: unknown pc 0x6e267e74
            stack: frame={sp:0x5d9cddd8, fp:0x0} stack=[0x5d5ce388,0x5d9cdf88)
            0x5d9cdd58:  0x00122318  0x5d9cddf4  0x5d9cdd84  0x5d9cdd70 
            0x5d9cdd68:  0x6e2eb950  0x6e2e8380  0x6dec0000  0x00000000 
            0x5d9cdd78:  0x5d9cdda4  0x5d9cdd88  0x6e2eba88  0x6e2e90a8 
            0x5d9cdd88:  0x000f4240 <net.(*resolverConfig).tryUpdate+0x00000264>  0x5d9cddec  0x01406f40  0x6df6a000 
            0x5d9cdd98:  0x5d9cdde4  0x5d9cdda8  0x001222a0  0x6e2eba20 
            0x5d9cdda8:  0x5d9cdddc  0x5d9cddb8  0x6e1b3088  0x6e1f7b30 
            0x5d9cddb8:  0x00000001  0x5d9cddf4  0x5d9cde10  0x6df6a000 
            0x5d9cddc8:  0x0021da48  0x0042e240  0x000000f0  0x004025a0 
            0x5d9cddd8: <0x5d9cde3c  0x5d9cdde8  0x00122470  0x6e2e9838 
            0x5d9cdde8:  0x000000f0  0x6dec0000  0x00400000  0x22220002 
            0x5d9cddf8:  0x00000000  0x00000000  0xffffffff  0xffffffff 
            0x5d9cde08:  0xffffffff  0xffffffff  0x00000000  0x00000000 
            0x5d9cde18:  0x00000000  0x00000000  0x00000000  0x5d9e621e 
            0x5d9cde28:  0x00000040  0x00000000  0x0021dbd8  0x5d9cde40 
            0x5d9cde38:  0x0007b888 <runtime.asmcgocall+0x000000ac>  0x001223e8  0x00245974  0x00000000 
            0x5d9cde48:  0x00000001  0x00423901  0x5d9cde5c  0x00000000 
            runtime: unknown pc 0x6e267e74
            stack: frame={sp:0x5d9cddd8, fp:0x0} stack=[0x5d5ce388,0x5d9cdf88)
            0x5d9cdd58:  0x00122318  0x5d9cddf4  0x5d9cdd84  0x5d9cdd70 
            0x5d9cdd68:  0x6e2eb950  0x6e2e8380  0x6dec0000  0x00000000 
            0x5d9cdd78:  0x5d9cdda4  0x5d9cdd88  0x6e2eba88  0x6e2e90a8 
            0x5d9cdd88:  0x000f4240 <net.(*resolverConfig).tryUpdate+0x00000264>  0x5d9cddec  0x01406f40  0x6df6a000 
            0x5d9cdd98:  0x5d9cdde4  0x5d9cdda8  0x001222a0  0x6e2eba20 
            0x5d9cdda8:  0x5d9cdddc  0x5d9cddb8  0x6e1b3088  0x6e1f7b30 
            0x5d9cddb8:  0x00000001  0x5d9cddf4  0x5d9cde10  0x6df6a000 
            0x5d9cddc8:  0x0021da48  0x0042e240  0x000000f0  0x004025a0 
            0x5d9cddd8: <0x5d9cde3c  0x5d9cdde8  0x00122470  0x6e2e9838 
            0x5d9cdde8:  0x000000f0  0x6dec0000  0x00400000  0x22220002 
            0x5d9cddf8:  0x00000000  0x00000000  0xffffffff  0xffffffff 
            0x5d9cde08:  0xffffffff  0xffffffff  0x00000000  0x00000000 
            0x5d9cde18:  0x00000000  0x00000000  0x00000000  0x5d9e621e 
            0x5d9cde28:  0x00000040  0x00000000  0x0021dbd8  0x5d9cde40 
            0x5d9cde38:  0x0007b888 <runtime.asmcgocall+0x000000ac>  0x001223e8  0x00245974  0x00000000 
            0x5d9cde48:  0x00000001  0x00423901  0x5d9cde5c  0x00000000 
            
            goroutine 1 [sleep]:
            time.Sleep(0x3b9aca00)
            	/var/gobuilder/buildlet/go/src/runtime/time.go:194 +0x170
            main.SegvInCgo()
            	/var/gobuilder/buildlet/go/src/runtime/testdata/testprogcgo/segv.go:56 +0xcc
            main.main()
            	/var/gobuilder/buildlet/go/src/runtime/testdata/testprogcgo/main.go:34 +0x158
            
            goroutine 6 [runnable]:
            main._Cfunc_nop()
            	_cgo_gotypes.go:364 +0x38
            main.SegvInCgo.func1()
            	/var/gobuilder/buildlet/go/src/runtime/testdata/testprogcgo/segv.go:47 +0x20
            created by main.SegvInCgo
            	/var/gobuilder/buildlet/go/src/runtime/testdata/testprogcgo/segv.go:44 +0x74
            
            trap    0x0
            error   0x0
            oldmask 0x0
            r0      0x0
            r1      0x0
            r2      0x0
            r3      0x0
            r4      0x0
            r5      0x5d9cde10
            r6      0x6df6a000
            r7      0x21da48
            r8      0x42e240
            r9      0xf0
            r10     0x4025a0
            fp      0x5d9cdde4
            ip      0x6e300674
            sp      0x5d9cddd8
            lr      0x6e2e983c
            pc      0x6e267e74
            cpsr    0x400d0010
            fault   0x0
            
        crash_cgo_test.go:622: unexpectedly saw "runtime: " in output
FAIL
FAIL	runtime	155.487s

greplogs --dashboard -md -l -e '\Anetbsd-.*(?:\n.*)*FAIL: TestSegv/SegvInCgo .*(?:\n .*)*unknown pc' --since=2022-01-07

2022-02-01T16:10:04-93fe469/netbsd-arm-bsiegert

It is not obvious to me whether this has the same underlying cause as #50605.
(See previously #49182; CC @prattmic @cherrymui.)

@bcmills bcmills added arch-arm Issues solely affecting the 32-bit arm architecture. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-NetBSD labels Feb 2, 2022
@bcmills bcmills added this to the Backlog milestone Feb 2, 2022
@bcmills bcmills changed the title runtime: continuing TestSegv/SegvInCGO failures with "unknown pc" on netbsd-arm-bsiegert runtime: continuing TestSegv/SegvInCGO failures with "unknown pc" Feb 7, 2022
@bcmills bcmills removed the OS-NetBSD label Feb 7, 2022
@bcmills
Contributor Author

bcmills commented Feb 7, 2022

This failure mode has now also occurred on linux-amd64-clang, and linux/amd64 is a first-class port.

Given how rare the failure seems to be and how little time is left in the Go 1.18 cycle, marking as release-blocker for Go 1.19 (instead of 1.18).

greplogs --dashboard -md -l -e 'FAIL: TestSegv/SegvInCgo .*(?:\n .*)*runtime: unknown pc' --since=2022-01-07

2022-02-04T22:34:05-f9763a6/linux-amd64-clang
2022-02-01T16:10:04-93fe469/netbsd-arm-bsiegert
2022-01-13T23:35:37-e550c30/linux-mips64le-mengzhuo
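
For readers unfamiliar with the greplogs expression above, here is a minimal Go sketch of what it matches. The pattern string is copied from the command in this comment and happens to compile under Go's `regexp` package; `matchesFlake` and the sample log are hypothetical names and data introduced only for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// flakePattern is the expression from the greplogs command above.
// (?:\n .*)* consumes the indented continuation lines that follow the
// FAIL line, so the "unknown pc" text must belong to the same test's output.
var flakePattern = regexp.MustCompile(`FAIL: TestSegv/SegvInCgo .*(?:\n .*)*runtime: unknown pc`)

// matchesFlake is a hypothetical helper: it reports whether a builder log
// contains the failure mode tracked by this issue.
func matchesFlake(log string) bool {
	return flakePattern.MatchString(log)
}

func main() {
	// Shortened sample based on the log in the first comment.
	log := "    --- FAIL: TestSegv/SegvInCgo (0.15s)\n" +
		"        crash_cgo_test.go:596: SIGSEGV: segmentation violation\n" +
		"            runtime: unknown pc 0x6e267e74\n"
	fmt.Println(matchesFlake(log))
}
```

Because every continuation line must start with a space, an unindented line (such as the trailing `FAIL`) ends the match, which keeps unrelated "unknown pc" output from being attributed to this test.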

@bcmills
Contributor Author

bcmills commented Feb 11, 2022

greplogs --dashboard -md -l -e 'FAIL: TestSegv/SegvInCgo .*(?:\n .*)*runtime: unknown pc' --since=2022-02-07

2022-02-10T15:23:05-452f24a/linux-riscv64-unmatched

@bcmills
Contributor Author

bcmills commented Feb 11, 2022

Four flakes in a month during the testing-lull that is the code freeze makes me think this test is too noisy to leave enabled in the runtime package.

If we aren't going to be able to fix it ahead of the 1.18 release — especially given that one of the failures was observed on a first-class port — I think it at least needs a skip.

@bcmills bcmills modified the milestones: Go1.19, Go1.18 Feb 11, 2022
@gopherbot

Change https://go.dev/cl/385154 mentions this issue: runtime: skip TestSegv/SegvInCgo failures with "runtime: unknown pc"

@bcmills bcmills modified the milestones: Go1.18, Go1.19 Feb 11, 2022
gopherbot pushed a commit that referenced this issue Feb 11, 2022
This test has failed on four different builders in the past month.
Moreover, because every Go program depends on "runtime", it is likely
to be run any time a user runs 'go test all' in their own program.

Since the test is known to be flaky, let's skip it to avoid
introducing testing noise until someone has time to investigate. It
seems like we have enough samples in the builder logs to at least
start with.

For #50979

Change-Id: I9748a82fbb97d4ed95d6f474427e5aa6ecdb023d
Reviewed-on: https://go-review.googlesource.com/c/go/+/385154
Trust: Bryan Mills <bcmills@google.com>
Run-TryBot: Bryan Mills <bcmills@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
@bcmills
Contributor Author

bcmills commented Feb 11, 2022

This failure mode is now skipped, so this is no longer a 1.18 release-blocker. (I'll leave it up to @cherrymui and @prattmic to decide whether to prioritize a fix or move it to the Backlog.)

@bcmills bcmills changed the title runtime: continuing TestSegv/SegvInCGO failures with "unknown pc" runtime: continuing TestSegv/SegvInCgo failures with "unknown pc" Feb 24, 2022
@bcmills
Contributor Author

bcmills commented Mar 9, 2022

@prattmic, it looks like the change to the throw print in CL 390034 caused the skip for this test to no longer take effect. Could you update the skip?

greplogs --dashboard -md -l -e 'FAIL: TestSegv/SegvInCgo .*(?:\n .*)*unknown pc' --since=2022-03-01

2022-03-08T21:16:53-c3c7477/linux-mips64le-mengzhuo
2022-03-07T16:24:54-cc9d3f5/linux-mips64le-mengzhuo

@prattmic
Member

prattmic commented Mar 9, 2022

Apologies, I thought I checked for these references, but didn't do a good job.

@gopherbot

Change https://go.dev/cl/391139 mentions this issue: runtime: fix SegvInCgo skip check

gopherbot pushed a commit that referenced this issue Mar 10, 2022
CL 390034 changed this throw message to add the goid, breaking the
match.

For #50979.

Change-Id: I52d97695484938701e5b7c269e2caf0c87d44d7a
Reviewed-on: https://go-review.googlesource.com/c/go/+/391139
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
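
The breakage described in this commit message is a general hazard of skipping on an exact message. The sketch below is not the code from either CL; `isKnownFlake` and `knownFlakeMarker` are hypothetical names. It only illustrates matching on the short substring that every trace quoted in this thread shares, so that changes to the text around it do not defeat the check.

```go
package main

import (
	"fmt"
	"strings"
)

// knownFlakeMarker is the substring common to the failing traces quoted in
// this thread. Choosing it as the match key is an assumption of this sketch.
const knownFlakeMarker = "unknown pc"

// isKnownFlake is a hypothetical helper: it reports whether test output
// shows the flaky failure mode, without depending on the message prefix.
func isKnownFlake(output string) bool {
	return strings.Contains(output, knownFlakeMarker)
}

func main() {
	fmt.Println(isKnownFlake("runtime: unknown pc 0x6e267e74"))
	fmt.Println(isKnownFlake("SIGSEGV: segmentation violation"))
}
```

In a test, a true result would typically lead to a `t.Skipf` call that names this issue, so that the skip stays discoverable.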
@heschi
Contributor

heschi commented Mar 16, 2022

ping -- what's the status of this issue?

@prattmic
Member

The failures here should be skipped, but we'd still like to investigate. Not a release blocker, though.

@ianlancetaylor
Contributor

Rolling forward to 1.20. Please comment if you disagree. Thanks.

@ianlancetaylor ianlancetaylor modified the milestones: Go1.19, Go1.20 Jun 24, 2022
@mknyszek mknyszek added the compiler/runtime Issues related to the Go compiler and/or runtime. label Aug 11, 2022
@gopherbot

Change https://go.dev/cl/430375 mentions this issue: runtime: Ignore "unknown pc" error in TestSegv/Segv

@bcmills
Contributor Author

bcmills commented Sep 14, 2022

@golang/runtime, this failure mode should either be fixed or skipped, and there is an open CL (https://go.dev/cl/430375) that does the latter.

Marking as release-blocker pending a decision to either investigate and fix or merge the skip. Please don't leave flaky tests running if they aren't actively being worked on.

@bcmills bcmills added release-blocker arch-arm Issues solely affecting the 32-bit arm architecture. and removed arch-arm Issues solely affecting the 32-bit arm architecture. labels Sep 14, 2022
@bcmills
Contributor Author

bcmills commented Sep 14, 2022

Hmm... Actually, looking at that CL, it is for TestSegv/Segv, not TestSegv/SegvInCgo. 🤔

@cherrymui
Member

For the OpenBSD failures:

    --- FAIL: TestSegv/Segv (0.09s)
        crash_test.go:58: /tmp/workdir/tmp/go-build3686268924/testprogcgo.exe Segv: exit status 2
        crash_cgo_test.go:618: fatal error: unexpected signal during runtime execution
            [signal SIGSEGV: segmentation violation code=0x14 addr=0xe0000000b pc=0x2a15a3c4a]
            
            runtime stack:
            runtime.throw({0x248e92?, 0x0?})
            	/tmp/workdir/go/src/runtime/panic.go:1047 +0x5d fp=0x266840070 sp=0x266840040 pc=0x35cd7d
            runtime.sigpanic()
            	/tmp/workdir/go/src/runtime/signal_unix.go:821 +0x3e9 fp=0x2668400d0 sp=0x266840070 pc=0x373889
            runtime.syscall()
            	/tmp/workdir/go/src/runtime/sys_openbsd_amd64.s:466 +0x1f fp=0x2668400e0 sp=0x2668400d0 pc=0x392a3f

            goroutine 1 [syscall]:
            syscall.syscall(0x3b55a0, 0x1f0e, 0xb, 0x0)
            	/tmp/workdir/go/src/runtime/sys_openbsd3.go:24 +0x3b fp=0xc00005be48 sp=0xc00005be28 pc=0x38d77b
            syscall.syscall(0x32e1f8?, 0xc00007c0c0?, 0x0?, 0x43c201?)
            	<autogenerated>:1 +0x26 fp=0xc00005be90 sp=0xc00005be48 pc=0x393406
            syscall.Kill(0xc0ffffffff?, 0x0?)
            	/tmp/workdir/go/src/syscall/zsyscall_openbsd_amd64.go:893 +0x2f fp=0xc00005bec0 sp=0xc00005be90 pc=0x3b412f

sigpanic is called, which means the Go signal handler treats it as a kernel-issued SIGSEGV, instead of a user-sent one.

@golang/openbsd do you know if the OpenBSD kernel may report a user-sent SIGSEGV as kernel-sent (i.e. the signal code being not SI_USER)? Thanks.
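
To make the distinction in this question concrete, here is a minimal sketch of classifying a signal by its code. It is not the runtime's implementation: `classify` and `siUser` are hypothetical names, and the value 0 for SI_USER is an assumption based on the usual definition on Linux and the BSDs.

```go
package main

import "fmt"

// siUser stands in for SI_USER, the code reported for signals sent by
// kill(2). The value 0 is an assumption of this sketch.
const siUser = 0

// classify is a hypothetical helper that mirrors the decision described
// above: a SIGSEGV whose code is not SI_USER is treated as kernel-issued.
func classify(sigcode int32) string {
	if sigcode == siUser {
		return "user-sent"
	}
	return "kernel-issued"
}

func main() {
	fmt.Println(classify(0x14)) // code=0x14 from the OpenBSD trace above
	fmt.Println(classify(0))    // sigcode=0 from the netbsd-arm trace
}
```

Under this model, the `code=0x14` in the OpenBSD trace falls on the kernel-issued side, which is consistent with sigpanic appearing in the stack.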

@cherrymui cherrymui modified the milestones: Go1.20, Go1.21 Jan 13, 2023
@laboger
Contributor

laboger commented Jan 26, 2023

I think this is the same problem as #52963 and @pmur created a CL to skip this failure https://go.dev/cl/430375.

The Go signal handler tries to produce a Go backtrace, but the signal is delivered to a thread running C code, where that backtrace won't work.

@bcmills
Contributor Author

bcmills commented Jan 26, 2023

If the Go signal handler is trying to obtain a backtrace, does the test binary need to call runtime.SetCgoTraceback with some appropriate callback to ensure that it can backtrace a signal delivered to a C stack?

(It looks like the fact that the thread is running C code is an intentional part of the test.)

Or is the problem that the thread is running C code without any Go frames on the stack? In that case, would it help to thread-lock the C.nop() goroutine and use syscall.Tgkill (where available) to deliver the signal to that thread in particular? (Compare #19326.)

@laboger
Contributor

laboger commented Jan 26, 2023

This is not easily reproducible. I think @pmur was able to make it fail and found that it happened on a stack that was running only C code. But the Go stack tracer only works on Go stack frames, at least on PPC64. Slot 0 of the frame is the LR value in Go, but it is the caller's stack pointer in C, and that is why you get the unknown PC error when running C code.

See @cherrymui's suggestion in the CL for fixing the test.
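
The frame-layout mismatch described in this comment can be illustrated with a toy model. Everything below is made up for illustration: the address ranges, the `returnPC` helper, and the frame contents do not correspond to the real runtime or to any real PPC64 stack.

```go
package main

import "fmt"

// textStart and textEnd bound an invented code segment. A real unwinder
// consults the function tables of the binary instead.
const (
	textStart = 0x10000
	textEnd   = 0x20000
)

// returnPC mimics an unwinder that assumes the Go layout described above:
// slot 0 of the frame holds the saved LR. If that word is not a code
// address, the lookup fails, as in the "unknown pc" traces.
func returnPC(frame []uint64) (uint64, error) {
	pc := frame[0]
	if pc < textStart || pc >= textEnd {
		return 0, fmt.Errorf("unknown pc %#x", pc)
	}
	return pc, nil
}

func main() {
	goFrame := []uint64{0x12340, 0, 0}         // slot 0: saved LR, a code address
	cFrame := []uint64{0x7ffff000, 0, 0x12340} // slot 0: caller's SP, a stack address

	pc, err := returnPC(goFrame)
	fmt.Println(pc, err)
	pc, err = returnPC(cFrame)
	fmt.Println(pc, err)
}
```

The C frame still contains a valid return address, just not where a Go-convention unwinder looks for it, which is why the failure only appears when the signal lands on a thread running nothing but C code.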
