Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: crash without a message, potentially in the signal handler #56701

Closed
huskar-t opened this issue Nov 11, 2022 · 11 comments
Closed

runtime: crash without a message, potentially in the signal handler #56701

huskar-t opened this issue Nov 11, 2022 · 11 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Milestone

Comments

@huskar-t
Copy link

What version of Go are you using (go version)?

$ go version
go version go1.19.3 linux/amd64

Does this issue reproduce with the latest release?

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/root/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/root/go"
GOPRIVATE=""
GOPROXY="https://goproxy.cn,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.19.3"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build846302182=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I am running a program using cgo on the centos 7.9-2009 kernel version 3.10.0-1160.71.1.el7.x86_64 machine. This program crashes from time to time. The results of analyzing the core dump file are as follows

gdb
#0 0x0000000000471961 in runtime.raise.abi0()
#1 0x000000000045280d in runtime.raisebadsignal()
#2 0x0000000000000003 in ?? ()
#3 0xffffffffffffbfeff in ?? ()
#4 0x0000000000000000 in ?? ()
(dlv) bt
0  0x0000000000471961 in runtime.raise
   at /usr/local/go/src/runtime/sys_linux_amd64.s:159
1  0x0000000000470001 in runtime.goexit
   at /usr/local/go/src/runtime/asm_amd64.s:1594
(dlv) regs
    Rip = 0x0000000000471961
    Rsp = 0x00007f2a8c2afed8
    Rax = 0x0000000000000000
    Rbx = 0x00007f2a8c2afdd8
    Rcx = 0xffffffffffffffff
    Rdx = 0x0000000000000003
    Rsi = 0x0000000000005c2d
    Rdi = 0x0000000000005c09
    Rbp = 0x00007f2a8c2afef8
     R8 = 0x0000000000000000
     R9 = 0x00007f2a8c2afb90
    R10 = 0x0000000000000008
    R11 = 0x0000000000000246
    R12 = 0x0000000000005c09
    R13 = 0x0000000000801000
    R14 = 0x000000c0021a2d00
    R15 = 0x00007f2a8c2b1700
 Rflags = 0x0000000000000246	[PF ZF IF IOPL=0]
     Es = 0x0000000000000000
     Cs = 0x0000000000000033
     Ss = 0x000000000000002b
     Ds = 0x0000000000000000
     Fs = 0x0000000000000000
     Gs = 0x0000000000000000
Fs_base = 0x00007f2a8c2b1700
Gs_base = 0x0000000000000000

(dlv) thread 23561
Switched from 23597 to 23561
(dlv) bt
0  0x00000000004718fd in runtime.usleep
   at /usr/local/go/src/runtime/sys_linux_amd64.s:140
1  0x0000000000410374 in runtime.notetsleepg
   at /usr/local/go/src/runtime/lock_futex.go:236
2  0x000000000046c5af in os/signal.signal_recv
   at /usr/local/go/src/runtime/sigqueue.go:152
3  0x0000000000aabe39 in os/signal.loop
   at /usr/local/go/src/os/signal/signal_unix.go:23
4  0x0000000000470001 in runtime.goexit
   at /usr/local/go/src/runtime/asm_amd64.s:1594
(dlv) 

Seems to be a crash caused by runtime.usleep ? Please guide me how to proceed

What did you expect to see?

What did you see instead?

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Nov 11, 2022
@mknyszek mknyszek added this to the Backlog milestone Nov 11, 2022
@mknyszek mknyszek added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Nov 11, 2022
@mknyszek
Copy link
Contributor

It's unlikely that the crash is caused by runtime.usleep. The runtime.raise and runtime.raisebadsignal suggest to me something to do with signals or in the signal handler, but it's hard to say. Do you have a way to reproduce this or are you willing/able to share a core dump? (The lack of an error message strongly suggests to me that something goes wrong and then we crash in the signal handler, due to a segfault or something. We've seen that kind of thing before.)

@mknyszek
Copy link
Contributor

CC @golang/runtime

@mknyszek mknyszek changed the title runtime: No specific information about the crash runtime: crash without a message, potentially in the signal handler Nov 11, 2022
@huskar-t
Copy link
Author

@mknyszek
Thank you very much for your quick reply. Unfortunately there is a requirement for the version of the dynamic library that this program depends on, could you please guide me how to use gdb and dlv to further troubleshoot :)

@huskar-t
Copy link
Author

@mknyszek
Sometimes it runs for a long time without crashing. Can you guide me to use dlv and gdb? I can show you the output. This crash problem is really urgent for me

@cherrymui
Copy link
Member

cherrymui commented Nov 11, 2022

raisebadsignal is only called when a signal landed on a non-Go thread, but has Go signal handler installed (possibly because it calls into Go at some point). And the Go signal handler just re-send the same signal, as it doesn't handle it. If the register dump is trustworthy, it looks like it is signal 3 (from Rdx), which is SIGQUIT. Is there any place in your program that it sends SIGQUIT (or, having another process send it)?

@huskar-t
Copy link
Author

@cherrymui
Thank you very much for your attention. My program uses cgo and I set the environment variable GOTRACEBACK=crash. From the core dump file name, the crashed thread should be the thread created by C, but the bt display is runtime.raise. I'm a little curious if go dump core hides C crash information

@cherrymui
Copy link
Member

cherrymui commented Nov 11, 2022

From the core dump file name, the crashed thread should be the thread created by C, but the bt display is runtime.raise. I'm a little curious if go dump core hides C crash information

I think this is because the signal handler runs on the signal stack (sigaltstack), not the C stack. From the raisebadsignal frame maybe you can print the signal context (the c *sigctxt parameter), which would include the PC, SP, and register state when the signal is first landed (which should point to the C stack?).

@huskar-t
Copy link
Author

huskar-t commented Nov 11, 2022

(gdb) info locals 
No symbol table info available.
(gdb) info f
Stack level 1, frame at 0x7f2a8c2afee8:
 rip = 0x45280d in runtime.raisebadsignal; saved rip 0x3
 called by frame at 0x7f2a8c2afef0, caller of frame at 0x7f2a8c2afed8
 Arglist at 0x7f2a8c2afed8, args: 
 Locals at 0x7f2a8c2afed8, Previous frame's sp is 0x7f2a8c2afee8
 Saved registers:
  rip at 0x7f2a8c2afee0

@cherrymui What should I type in gdb to print the signal context?

@huskar-t
Copy link
Author

huskar-t commented Nov 12, 2022

I did some tests to observe the core dump caused by C crash.

package main

/*
#include <stdio.h>
#include <assert.h>

int DIV(const int i,const int v) {
    return i/v;
}
*/
import "C"
import "fmt"

func main() {
        v := C.DIV(C.int(1), C.int(0))
        fmt.Println(v)
}

some output

fatal error: unexpected signal during runtime execution
[signal SIGFPE: floating-point exception code=0x1 addr=0x482433 pc=0x482433]

runtime stack:
runtime.throw({0x49fea1?, 0x120?})
	/usr/local/go/src/runtime/panic.go:1047 +0x5d fp=0x7ffdf68bb808 sp=0x7ffdf68bb7d8 pc=0x432a9d
runtime.sigpanic()
	/usr/local/go/src/runtime/signal_unix.go:819 +0x369 fp=0x7ffdf68bb858 sp=0x7ffdf68bb808 pc=0x446c29

gdb output

(gdb) bt
warning: skipping .debug_frame info of /home/xftan/test/test_core: Found an FDE when not expecting it.
#0  0x000000000045fb41 in runtime.raise.abi0 ()
#1  0x0000000000446ce5 in runtime.dieFromSignal ()
#2  0x0000000000000006 in ?? ()
#3  0xffffffffffffffff in ?? ()
#4  0x000000c00000f2a8 in ?? ()
#5  0x000000000044727e in runtime.sigfwdgo ()
#6  0x0000000000000006 in ?? ()
#7  0x0000000000000000 in ?? ()
(gdb) thread apply all bt

Thread 6 (Thread 0x7ff31b7fe700 (LWP 15197)):
#0  0x00000000004600c3 in runtime.futex.abi0 ()
#1  0x000000000042f096 in runtime.futexsleep ()
#2  0x000000000075a518 in runtime.newmHandoff ()
#3  0x0000000000000080 in ?? ()
#4  0x0000000000000000 in ?? ()

Thread 5 (Thread 0x7ff321493700 (LWP 15194)):
#0  0x00000000004600c3 in runtime.futex.abi0 ()
#1  0x000000000042f096 in runtime.futexsleep ()
#2  0x000000c000060948 in ?? ()
#3  0x0000000000000080 in ?? ()
#4  0x0000000000000000 in ?? ()

Thread 4 (Thread 0x7ff321c94700 (LWP 15193)):
#0  0x00000000004600c3 in runtime.futex.abi0 ()
#1  0x000000000042f10f in runtime.futexsleep ()
#2  0x000000000072c118 in runtime.sched ()
#3  0x0000000000000080 in ?? ()
#4  0x00007ff321c93d78 in ?? ()
#5  0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7ff31bfff700 (LWP 15196)):
#0  0x00000000004600c3 in runtime.futex.abi0 ()
#1  0x000000000042f096 in runtime.futexsleep ()
#2  0x000000c000100148 in ?? ()
#3  0x0000000000000080 in ?? ()
#4  0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7ff320c92700 (LWP 15195)):
#0  0x00000000004600c3 in runtime.futex.abi0 ()
#1  0x000000000042f096 in runtime.futexsleep ()
#2  0x000000c000060d48 in ?? ()
#3  0x0000000000000080 in ?? ()
---Type <return> to continue, or q <return> to quit---
#4  0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7ff349785740 (LWP 15192)):
#0  0x000000000045fb41 in runtime.raise.abi0 ()
#1  0x0000000000446ce5 in runtime.dieFromSignal ()
#2  0x0000000000000006 in ?? ()
#3  0xffffffffffffffff in ?? ()
#4  0x000000c00000f2a8 in ?? ()
#5  0x000000000044727e in runtime.sigfwdgo ()
#6  0x0000000000000006 in ?? ()
#7  0x0000000000000000 in ?? ()

dlv output

(dlv) bt
0  0x000000000045fb41 in runtime.raise
   at /usr/local/go/src/runtime/sys_linux_amd64.s:159
1  0x000000000045c020 in runtime.systemstack_switch
   at /usr/local/go/src/runtime/asm_amd64.s:459
2  0x00000000004045a5 in runtime.cgocall
   at /usr/local/go/src/runtime/cgocall.go:168
3  0x000000000048230c in main._Cfunc_DIV
   at _cgo_gotypes.go:43
4  0x00000000004823c5 in main.main
   at /home/xftan/test/main.go:15
5  0x00000000004352f2 in runtime.main
   at /usr/local/go/src/runtime/proc.go:250
6  0x000000000045e241 in runtime.goexit
   at /usr/local/go/src/runtime/asm_amd64.s:1594
(dlv) threads
* Thread 15192 at 0x45fb41 /usr/local/go/src/runtime/sys_linux_amd64.s:159 runtime.raise
  Thread 15193 at 0x4600c3 /usr/local/go/src/runtime/sys_linux_amd64.s:560 runtime.futex
  Thread 15194 at 0x4600c3 /usr/local/go/src/runtime/sys_linux_amd64.s:560 runtime.futex
  Thread 15195 at 0x4600c3 /usr/local/go/src/runtime/sys_linux_amd64.s:560 runtime.futex
  Thread 15196 at 0x4600c3 /usr/local/go/src/runtime/sys_linux_amd64.s:560 runtime.futex
  Thread 15197 at 0x4600c3 /usr/local/go/src/runtime/sys_linux_amd64.s:560 runtime.futex
(dlv) goroutines
* Goroutine 1 - User: _cgo_gotypes.go:43 main._Cfunc_DIV (0x48230c) (thread 15192) [chan receive]
  Goroutine 2 - User: /usr/local/go/src/runtime/proc.go:364 runtime.gopark (0x4356b6) [force gc (idle)]
  Goroutine 3 - User: /usr/local/go/src/runtime/proc.go:364 runtime.gopark (0x4356b6) [GC sweep wait]
  Goroutine 4 - User: /usr/local/go/src/runtime/proc.go:364 runtime.gopark (0x4356b6) [GC scavenge wait]
  Goroutine 5 - User: /usr/local/go/src/runtime/proc.go:364 runtime.gopark (0x4356b6) [finalizer wait]
[5 goroutines]
package main

/*
#include <stdio.h>
#include <assert.h>

int DIV(const int i,const int v) {
    assert(0);
    return i/v;
}
*/
import "C"
import "fmt"

func main() {
        v := C.DIV(C.int(1), C.int(0))
        fmt.Println(v)
}

output

t: /home/xftan/test/main.go:8: DIV: Assertion `0' failed.
SIGABRT: abort
PC=0x7f90dbfe6387 m=0 sigcode=18446744073709551610
signal arrived during cgo execution

gdb output

(gdb) bt
warning: skipping .debug_frame info of /home/xftan/test/t: Found an FDE when not expecting it.
#0  0x000000000045fb81 in runtime.raise.abi0 ()
#1  0x0000000000446d25 in runtime.dieFromSignal ()
#2  0x0000000000000006 in ?? ()
#3  0xffffffffffffffff in ?? ()
#4  0x000000c00009c428 in ?? ()
#5  0x00000000004472be in runtime.sigfwdgo ()
#6  0x0000000000000006 in ?? ()
#7  0x0000000000000000 in ?? ()
(gdb) thread apply all bt

Thread 6 (Thread 0x7fcb0761d700 (LWP 15913)):
#0  0x000000000045fb1d in runtime.usleep.abi0 ()
#1  0x0000000000000005 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 5 (Thread 0x7fcb07e1e700 (LWP 15912)):
#0  0x000000000045fb1d in runtime.usleep.abi0 ()
#1  0x0000000000000005 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 4 (Thread 0x7fcb06e1c700 (LWP 15914)):
#0  0x000000000045fb1d in runtime.usleep.abi0 ()
#1  0x0000000000000005 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7fcb30110740 (LWP 15910)):
#0  0x000000000045fb1d in runtime.usleep.abi0 ()
#1  0x0000000000000005 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7fcb0861f700 (LWP 15911)):
#0  0x000000000045fb1d in runtime.usleep.abi0 ()
#1  0x0000000000000005 in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7fcb0661b700 (LWP 15915)):
#0  0x000000000045fb81 in runtime.raise.abi0 ()
#1  0x0000000000446d25 in runtime.dieFromSignal ()
#2  0x0000000000000006 in ?? ()
#3  0xffffffffffffffff in ?? ()
#4  0x000000c00009c428 in ?? ()
#5  0x00000000004472be in runtime.sigfwdgo ()
#6  0x0000000000000006 in ?? ()
#7  0x0000000000000000 in ?? ()

dlv output

(dlv) bt
 0  0x000000000045fb81 in runtime.raise
    at /usr/local/go/src/runtime/sys_linux_amd64.s:159
 1  0x0000000000446d25 in runtime.dieFromSignal
    at /usr/local/go/src/runtime/signal_unix.go:870
 2  0x00000000004472be in runtime.sigfwdgo
    at /usr/local/go/src/runtime/signal_unix.go:1086
 3  0x0000000000445a07 in runtime.sigtrampgo
    at /usr/local/go/src/runtime/signal_unix.go:432
 4  0x000000000045fe66 in runtime.sigtramp
    at /usr/local/go/src/runtime/sys_linux_amd64.s:359
 5  0x00007fcb2fd0e630 in ???
    at ?:-1
 6  0x0000000000446807 in runtime.crash
    at /usr/local/go/src/runtime/signal_unix.go:962
 7  0x0000000000446807 in runtime.sighandler
    at /usr/local/go/src/runtime/signal_unix.go:791
 8  0x0000000000445b94 in runtime.sigtrampgo
    at /usr/local/go/src/runtime/signal_unix.go:479
 9  0x000000000045fe66 in runtime.sigtramp
    at /usr/local/go/src/runtime/sys_linux_amd64.s:359
10  0x00007fcb2fd0e630 in ???
    at ?:-1
11  0x000000000040b6e7 in runtime.notesleep
    at /usr/local/go/src/runtime/lock_futex.go:160
12  0x00000000004393f1 in runtime.templateThread
    at /usr/local/go/src/runtime/proc.go:2225
13  0x0000000000437f93 in runtime.mstart1
    at /usr/local/go/src/runtime/proc.go:1425
14  0x0000000000437ed9 in runtime.mstart0
    at /usr/local/go/src/runtime/proc.go:1382
15  0x000000000045bfc5 in runtime.mstart
15  0x000000000045bfc5 in runtime.mstart
    at /usr/local/go/src/runtime/asm_amd64.s:391
16  0x0000000000482de9 in ???
    at ?:-1

It seems that C crashes are extremely difficult to troubleshoot. Although there is some critical information in the output, it cannot be written to the log file(#42888). This result frustrates me.:disappointed:

@mknyszek
Copy link
Contributor

Could you please clarify what the precise issue here is? Is the issue the inability to debug this because gdb and delve can't interpret the stack, or is it the occasional crashes your program gets (i.e. do you think these crashes might be a Go issue?)?

Both are possible issues, but to move forward on the latter we almost certainly are going to need a reproducer or a core dump to investigate ourselves.

@mknyszek mknyszek added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Nov 16, 2022
@huskar-t
Copy link
Author

@mknyszek I eventually found the cause of the problem from the output, and it was a memory error in C . What I need is to write the crash log to a file(#42888).It's better to close this issue.

@golang golang locked and limited conversation to collaborators Nov 17, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Projects
None yet
Development

No branches or pull requests

4 participants