runtime: occasional hard lockup / 100% CPU usage in go applications #56424

Open
ssokol opened this issue Oct 25, 2022 · 21 comments

@ssokol commented Oct 25, 2022

What version of Go are you using (go version)?

go version go1.19.1 linux/arm

Does this issue reproduce with the latest release?

Unknown. I've been having this issue since I started working on this project roughly 24 months ago. Same behavior on all go versions.

What operating system and processor architecture are you using (go env)?

Raspberry Pi OS v10.11 on armv7l

go env Output
GO111MODULE=""
GOARCH="arm"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="arm"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/root/go_path/pkg/mod"
GOOS="linux"
GOPATH="/root/go_path"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/root/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/root/go/pkg/tool/linux_arm"
GOVCS=""
GOVERSION="go1.19.1"
GCCGO="gccgo"
GOARM="6"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -marm -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2944008957=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I'm using go to build a set of services on the Raspberry Pi platform (Pi 4, CM4). Several of the higher duty-cycle services occasionally "run away", consuming upwards of 400% (100% x 4 cores) of available processing power. This usually happens after several hours or even a few days of running. When this happens, all of the goroutines in the application stop running, remaining in "runtime.gopark". (See the stack trace below.)

The program most prone to this behavior is a network broadcaster. It subscribes to a set of Redis pubsub channels and forwards the data it receives to clients over UDP. The typical throughput is low, about 34 KB/s, and it never exceeds 38 KB/s. The service runs at an elevated priority level, but not as "realtime".
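
For illustration, the general shape of such a service, as a minimal sketch (the channel names, addresses, and single hard-coded client are assumptions, not the actual code):

package main

import (
	"log"
	"net"

	"github.com/gomodule/redigo/redis"
)

func main() {
	rc, err := redis.Dial("tcp", "localhost:6379")
	if err != nil {
		log.Fatal(err)
	}
	defer rc.Close()

	// One hard-coded client for the sketch; the real service presumably
	// tracks a dynamic set of UDP clients.
	client, err := net.Dial("udp", "192.168.1.50:4000")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	psc := redis.PubSubConn{Conn: rc}
	if err := psc.Subscribe("traffic", "position"); err != nil { // assumed channel names
		log.Fatal(err)
	}

	// Forward each pubsub payload as a single UDP datagram.
	for {
		switch v := psc.Receive().(type) {
		case redis.Message:
			if _, err := client.Write(v.Data); err != nil {
				log.Printf("udp write: %v", err)
			}
		case error:
			log.Fatal(v)
		}
	}
}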

I've added a watchdog goroutine to try to detect the runaway condition and terminate the service, but the watchdog timer never fires.
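
For illustration, a minimal sketch of that kind of watchdog (an assumed shape, not the service's actual code). It also shows the catch this report demonstrates: the watchdog is itself a goroutine, so if the scheduler wedges, it is parked along with everything else and its timer can never fire.

package main

import "time"

// watchdog panics (crashing the service so a supervisor can restart it)
// if no heartbeat arrives within timeout.
func watchdog(heartbeat <-chan struct{}, timeout time.Duration) {
	t := time.NewTimer(timeout)
	defer t.Stop()
	for {
		select {
		case <-heartbeat:
			if !t.Stop() {
				<-t.C
			}
			t.Reset(timeout)
		case <-t.C:
			panic("watchdog: no heartbeat")
		}
	}
}

func main() {
	hb := make(chan struct{})
	go watchdog(hb, 30*time.Second)
	for { // stand-in for the real work loop
		hb <- struct{}{}
		time.Sleep(time.Second)
	}
}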

I tried pulling a profile with pprof, but when the problem occurs the HTTP listener fails to respond.

I tried tracing with strace, but it returns no data.

I've monitored open files with lsof and the process does not appear to be leaking handles; the open file count holds constant at 369.
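
For reference, those three checks look roughly like this (the PID and the pprof port are assumptions):

$ curl http://localhost:6060/debug/pprof/goroutine?debug=2   # hangs once the service is wedged
$ strace -f -p <pid>                                         # attaches, but shows no syscalls
$ lsof -p <pid> | wc -l                                      # steady at 369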

I kicked it around with the Redigo team, and from what we can tell it's not something in Redigo itself; all the pubsub listeners simply get stuck in their waiting state.

What did you expect to see?

The application runs at its normal, modest CPU load (5% - 8%).

What did you see instead?

The application sucks down as much CPU power as possible (395%)

Stack Trace

Too long to embed in this issue. Here it is as a gist:

https://gist.github.com/ssokol/b168de8b4546efd9b43a9d6af8538de9

@ssokol ssokol changed the title affected/package: Occasional hard lockup / 100% CPU usage in go applications Oct 25, 2022
@seankhliao seankhliao changed the title Occasional hard lockup / 100% CPU usage in go applications runtime: occasional hard lockup / 100% CPU usage in go applications Oct 25, 2022
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Oct 25, 2022
@seankhliao seankhliao added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Oct 25, 2022
@ianlancetaylor (Contributor) commented:

CC @golang/runtime

@prattmic (Member) commented Oct 26, 2022

Could you collect a core dump (ulimit -c unlimited and set GOTRACEBACK=crash)?

Then in gdb running thread apply all bt should show where the threads are stuck.
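
Roughly, the collection steps would look like this (myservice and <pid> are placeholders):

$ ulimit -c unlimited
$ GOTRACEBACK=crash ./myservice &
$ kill -ABRT <pid>              # runtime prints all goroutines, then aborts to produce a core
$ gdb ./myservice core
(gdb) thread apply all bt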

@ssokol (Author) commented Oct 26, 2022

Will do. Just to make sure I'm doing this correctly:

  1. I've updated the systemd service file to export the GOTRACEBACK=crash var
  2. I've updated the systemd service file to set the ulimit to unlimited (the relevant unit settings are sketched below)
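
For the record, the equivalent unit-file settings would look something like this sketch (the unit path is an assumption):

# /etc/systemd/system/myservice.service (excerpt)
[Service]
Environment=GOTRACEBACK=crash
LimitCORE=infinity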

I tested out attaching to it from gdb (see below) and the output from the backtrace is minimal and not very useful as far as I can tell. Is there something else I need to do, either in my application or from gdb, to make the backtrace more verbose / useful?

Here's what I got. Note that this is in the normal state, not the runaway state.

root@pi4cm1(rw):/lib/systemd/system# gdb -p 12519
GNU gdb (Raspbian 8.2.1-2) 8.2.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "arm-linux-gnueabihf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 12519
[New LWP 12520]
[New LWP 12521]
[New LWP 12522]
[New LWP 12523]
[New LWP 12524]
[New LWP 12525]
[New LWP 12534]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
0x0007f2ec in ?? ()
(gdb) thread apply all bt

Thread 8 (Thread 0xa30ff440 (LWP 12534)):
#0  0x0007f2ec in ?? ()
#1  0x000475b8 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 7 (Thread 0xa3afe440 (LWP 12525)):
#0  0x0007f2ec in ?? ()
#1  0x000475b8 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 6 (Thread 0xa42ff440 (LWP 12524)):
#0  0x0007f2ec in ?? ()
#1  0x000475b8 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 5 (Thread 0xa4efe440 (LWP 12523)):
#0  0x0007f2ec in ?? ()
#1  0x000475b8 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 4 (Thread 0xa56ff440 (LWP 12522)):
#0  0x0007f2ec in ?? ()
#1  0x000475b8 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 3 (Thread 0xa60ff440 (LWP 12521)):
#0  0x0007f5a4 in ?? ()
#1  0x0004727c in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 2 (Thread 0xa6ab9440 (LWP 12520)):
#0  0x0007f2f0 in ?? ()
#1  0x000476d8 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 1 (Thread 0xb6f28080 (LWP 12519)):
#0  0x0007f2ec in ?? ()
#1  0x000475b8 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Apologies if this is a basic question, but I'm not at all familiar with using gdb on a Go app.

@prattmic (Member) commented Oct 26, 2022

Will do. Just to make sure I'm doing this correctly:

  1. I've updated the systemd service file to export the GOTRACEBACK=crash var
  2. I've updated the systemd service file to set the ulimit to unlimited

That sounds right. Now when you send a SIGABRT it should generate a core file (though I'm not sure where systemd will put it). That said, if you can simply attach to the PID with gdb (as you did above) when it is stuck, that is fine too.
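
If systemd-coredump is handling cores on this system (an assumption about the Raspberry Pi OS setup), the dump can usually be retrieved afterwards with:

$ coredumpctl list
$ coredumpctl dump <pid> --output=core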

I tested out attaching to it from gdb (see below) and the output from the backtrace is minimal and not very useful as far as I can tell. Is there something else I need to do, either in my application or from gdb, to make the backtrace more verbose / useful?

This is odd; it should show the symbols there. This isn't a stripped binary, is it (e.g., go build -ldflags="-s -w")?

You can at least double-check that these addresses are in the range of the binary (see info proc mappings), and perhaps find the addresses in objdump -d <binary>.
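
Concretely, that check might look like this (the address comes from the backtrace above; the binary path is a placeholder):

(gdb) info proc mappings                       # is 0x7f2ec inside the executable's text mapping?
$ objdump -d ./myservice | grep -w '7f2ec:'    # if so, which function contains it?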

@ssokol (Author) commented Oct 26, 2022

Ah, it was a stripped binary. Rebuilt and running now with all the debugging symbols.

Here's the output from another test (again in the nominal not runaway state). Looks much better:

root@pi4cm1(rw):~/flightview/fvUnisocket# gdb -p 20193
GNU gdb (Raspbian 8.2.1-2) 8.2.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "arm-linux-gnueabihf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 20193
[New LWP 20194]
[New LWP 20195]
[New LWP 20196]
[New LWP 20197]
[New LWP 20198]
[New LWP 20199]
[New LWP 20206]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
runtime.futex () at /root/go/src/runtime/sys_linux_arm.s:466
466		SWI	$0
Loading Go Runtime support.
(gdb) thread apply all bt

Thread 8 (Thread 0xa2eff440 (LWP 20206)):
#0  runtime.futex () at /root/go/src/runtime/sys_linux_arm.s:466
#1  0x000475b8 in runtime.futexsleep (addr=0x2102790, val=0, ns=<error reading variable: access outside bounds of object referenced via synthetic pointer>) at /root/go/src/runtime/os_linux.go:69
#2  0x0001c220 in runtime.notesleep (n=0x2102790) at /root/go/src/runtime/lock_futex.go:160
#3  0x00052f04 in runtime.mPark () at /root/go/src/runtime/proc.go:2239
#4  runtime.stopm () at /root/go/src/runtime/proc.go:2239
#5  0x00054704 in runtime.findRunnable (gp=<optimized out>, inheritTime=<optimized out>, tryWakeP=<optimized out>) at /root/go/src/runtime/proc.go:2866
#6  0x00055cd8 in runtime.schedule () at /root/go/src/runtime/proc.go:3206
#7  0x000564bc in runtime.park_m (gp=0x2002c30) at /root/go/src/runtime/proc.go:3355
#8  0x0007c5cc in runtime.mcall () at /root/go/src/runtime/asm_arm.s:265
#9  0x00000004 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 7 (Thread 0xa38ff440 (LWP 20199)):
#0  runtime.futex () at /root/go/src/runtime/sys_linux_arm.s:466
#1  0x000475b8 in runtime.futexsleep (addr=0x2102550, val=0, ns=<error reading variable: access outside bounds of object referenced via synthetic pointer>) at /root/go/src/runtime/os_linux.go:69
#2  0x0001c220 in runtime.notesleep (n=0x2102550) at /root/go/src/runtime/lock_futex.go:160
#3  0x00052f04 in runtime.mPark () at /root/go/src/runtime/proc.go:2239
#4  runtime.stopm () at /root/go/src/runtime/proc.go:2239
#5  0x00054704 in runtime.findRunnable (gp=<optimized out>, inheritTime=<optimized out>, tryWakeP=<optimized out>) at /root/go/src/runtime/proc.go:2866
#6  0x00055cd8 in runtime.schedule () at /root/go/src/runtime/proc.go:3206
#7  0x000564bc in runtime.park_m (gp=0x21043c0) at /root/go/src/runtime/proc.go:3355
#8  0x0007c5cc in runtime.mcall () at /root/go/src/runtime/asm_arm.s:265
#9  0x003e1690 in runtime.semtable ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 6 (Thread 0xa42ff440 (LWP 20198)):
#0  runtime.futex () at /root/go/src/runtime/sys_linux_arm.s:466
#1  0x000475b8 in runtime.futexsleep (addr=0x3ee154 <runtime.newmHandoff+12>, val=0, ns=<error reading variable: access outside bounds of object referenced via synthetic pointer>)
    at /root/go/src/runtime/os_linux.go:69
#2  0x0001c220 in runtime.notesleep (n=0x3ee154 <runtime.newmHandoff+12>) at /root/go/src/runtime/lock_futex.go:160
#3  0x00052dc0 in runtime.templateThread () at /root/go/src/runtime/proc.go:2217
#4  0x00051648 in runtime.mstart1 () at /root/go/src/runtime/proc.go:1425
#5  0x000515b0 in runtime.mstart0 () at /root/go/src/runtime/proc.go:1382
#6  0x0007c56c in runtime.mstart () at /root/go/src/runtime/asm_arm.s:210

Thread 5 (Thread 0xa4cff440 (LWP 20197)):
#0  runtime.epollwait () at /root/go/src/runtime/sys_linux_arm.s:699
#1  0x0004727c in runtime.netpoll (delay=<error reading variable: access outside bounds of object referenced via synthetic pointer>, ~r0=...) at /root/go/src/runtime/netpoll_epoll.go:126
#2  0x000545ec in runtime.findRunnable (gp=<optimized out>, inheritTime=<optimized out>, tryWakeP=<optimized out>) at /root/go/src/runtime/proc.go:2829
#3  0x00055cd8 in runtime.schedule () at /root/go/src/runtime/proc.go:3206
#4  0x000564bc in runtime.park_m (gp=0x21043c0) at /root/go/src/runtime/proc.go:3355
#5  0x0007c5cc in runtime.mcall () at /root/go/src/runtime/asm_arm.s:265
#6  0x00000004 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 4 (Thread 0xa56ff440 (LWP 20196)):
#0  runtime.futex () at /root/go/src/runtime/sys_linux_arm.s:466
#1  0x000475b8 in runtime.futexsleep (addr=0x202c790, val=0, ns=<error reading variable: access outside bounds of object referenced via synthetic pointer>) at /root/go/src/runtime/os_linux.go:69
#2  0x0001c220 in runtime.notesleep (n=0x202c790) at /root/go/src/runtime/lock_futex.go:160
#3  0x00052f04 in runtime.mPark () at /root/go/src/runtime/proc.go:2239
#4  runtime.stopm () at /root/go/src/runtime/proc.go:2239
#5  0x00054704 in runtime.findRunnable (gp=<optimized out>, inheritTime=<optimized out>, tryWakeP=<optimized out>) at /root/go/src/runtime/proc.go:2866
#6  0x00055cd8 in runtime.schedule () at /root/go/src/runtime/proc.go:3206
#7  0x00056df4 in runtime.goexit0 (gp=0x2105e00) at /root/go/src/runtime/proc.go:3532
#8  0x0007c5cc in runtime.mcall () at /root/go/src/runtime/asm_arm.s:265
#9  0x00000004 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 3 (Thread 0xa60ff440 (LWP 20195)):
--Type <RET> for more, q to quit, c to continue without paging--
#0  runtime.futex () at /root/go/src/runtime/sys_linux_arm.s:466
#1  0x000475b8 in runtime.futexsleep (addr=0x202c550, val=0, ns=<error reading variable: access outside bounds of object referenced via synthetic pointer>) at /root/go/src/runtime/os_linux.go:69
#2  0x0001c220 in runtime.notesleep (n=0x202c550) at /root/go/src/runtime/lock_futex.go:160
#3  0x00052f04 in runtime.mPark () at /root/go/src/runtime/proc.go:2239
#4  runtime.stopm () at /root/go/src/runtime/proc.go:2239
#5  0x00054704 in runtime.findRunnable (gp=<optimized out>, inheritTime=<optimized out>, tryWakeP=<optimized out>) at /root/go/src/runtime/proc.go:2866
#6  0x00055cd8 in runtime.schedule () at /root/go/src/runtime/proc.go:3206
#7  0x000564bc in runtime.park_m (gp=0x2002c30) at /root/go/src/runtime/proc.go:3355
#8  0x0007c5cc in runtime.mcall () at /root/go/src/runtime/asm_arm.s:265
#9  0x00000004 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 2 (Thread 0xa6afa440 (LWP 20194)):
#0  runtime.usleep () at /root/go/src/runtime/sys_linux_arm.s:627
#1  0x0005aef8 in runtime.sysmon () at /root/go/src/runtime/proc.go:5145
#2  0x00051648 in runtime.mstart1 () at /root/go/src/runtime/proc.go:1425
#3  0x000515b0 in runtime.mstart0 () at /root/go/src/runtime/proc.go:1382
#4  0x0007c56c in runtime.mstart () at /root/go/src/runtime/asm_arm.s:210

Thread 1 (Thread 0xb6f69080 (LWP 20193)):
#0  runtime.futex () at /root/go/src/runtime/sys_linux_arm.s:466
#1  0x000475b8 in runtime.futexsleep (addr=0x3dc9b0 <runtime.m0+208>, val=0, ns=<error reading variable: access outside bounds of object referenced via synthetic pointer>) at /root/go/src/runtime/os_linux.go:69
#2  0x0001c220 in runtime.notesleep (n=0x3dc9b0 <runtime.m0+208>) at /root/go/src/runtime/lock_futex.go:160
#3  0x00052f04 in runtime.mPark () at /root/go/src/runtime/proc.go:2239
#4  runtime.stopm () at /root/go/src/runtime/proc.go:2239
#5  0x00054704 in runtime.findRunnable (gp=<optimized out>, inheritTime=<optimized out>, tryWakeP=<optimized out>) at /root/go/src/runtime/proc.go:2866
#6  0x00055cd8 in runtime.schedule () at /root/go/src/runtime/proc.go:3206
#7  0x000564bc in runtime.park_m (gp=0x2002c30) at /root/go/src/runtime/proc.go:3355
#8  0x0007c5cc in runtime.mcall () at /root/go/src/runtime/asm_arm.s:265
#9  0x00000004 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)

@prattmic (Member) commented:

Yup, that looks correct!

@mknyszek mknyszek added this to the Backlog milestone Oct 26, 2022
@thanm thanm added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Oct 26, 2022
@ssokol (Author) commented Oct 29, 2022

It took several days, but the runaway happened again. I tried to attach to the process from gdb without success: gdb simply hung at "Attaching to process 2709" and never gave me the opportunity to request the backtrace.

I also tried sending SIGABRT (kill -6 2709) with no response; the process continued running at 400%. SIGKILL killed it, but no core file was produced.

Any additional suggestions?

@ssokol (Author) commented Oct 31, 2022

Tried again today with gcore. Similar failure:

root@pi4cm1(rw):/# gcore -o ~/lockup.core 22022
Unable to attach: program exited with code 1.
You can't do that without a process to debug.
The program is not being run.
gcore: failed to create /root/lockup.core.22022

Attempting to attach somehow killed the process. What on earth is up with this?

@prattmic (Member) commented:

gdb being unable to attach makes this sound like an OS bug; Go shouldn't do anything that prevents attaching.

You could try using the perf tool to collect a system-wide profile and see where all that CPU time is going.

$ sudo perf record -a -g sleep 5
$ sudo perf report

@ssokol (Author) commented Oct 31, 2022

Ok, here's what perf shows:

root@pi4cm1(rw):/# perf record -a -g sleep 5
Reading /proc/4199/task/4199/maps time out. You may want to increase the time limit by --proc-map-timeout
[ perf record: Woken up 8 times to write data ]
Warning:
1 map information files for pre-existing threads were
not processed, if there are samples for addresses they
will not be resolved, you may find out which are these
threads by running with -v and redirecting the output
to a file.
The time limit to process proc map is too short?
Increase it by --proc-map-timeout
[ perf record: Captured and wrote 6.510 MB perf.data (80088 samples) ]

Samples: 80K of event 'cpu-clock:pppH', Event count (approx.): 20022000000
  Children      Self  Command          Shared Object             Symbol
+   41.96%     0.00%  fvUnisocket      [vectors]                 [.] 0xffff0fc0
+   41.65%    41.65%  fvUnisocket      [vectors]                 [.] 0x00000fc0
+   30.14%    29.94%  fvUnisocket      fvUnisocket               [.] kernelcas
+    5.05%     0.00%  fvUnisocket      [vectors]                 [.] 0xffff0fa0
+    5.04%     5.01%  fvUnisocket      fvUnisocket               [.] runtime/internal/atomic.Cas
+    5.01%     5.01%  fvUnisocket      [vectors]                 [.] 0x00000fa0
+    4.56%     4.53%  fvUnisocket      fvUnisocket               [.] runtime/internal/atomic.goLoad64
+    2.97%     0.00%  fvUnisocket      [vectors]                 [.] 0xffff0fd8
+    2.95%     2.95%  fvUnisocket      [vectors]                 [.] 0x00000fd8
+    2.52%     0.00%  fvUnisocket      [vectors]                 [.] 0xffff0fcc
+    2.50%     2.50%  fvUnisocket      [vectors]                 [.] 0x00000fcc
+    2.26%     0.00%  swapper          [kernel.kallsyms]         [k] cpu_startup_entry
+    2.25%     0.00%  swapper          [kernel.kallsyms]         [k] do_idle
+    2.12%     0.00%  swapper          [kernel.kallsyms]         [k] default_idle_call
+    2.09%     2.09%  swapper          [kernel.kallsyms]         [k] arch_cpu_idle
+    1.74%     0.00%  swapper          [unknown]                 [k] 0x002018ac
+    1.74%     0.00%  swapper          [kernel.kallsyms]         [k] secondary_start_kernel
+    1.48%     1.47%  fvUnisocket      fvUnisocket               [.] cas
+    1.19%     1.19%  fvUnisocket      fvUnisocket               [.] runtime/internal/atomic.goStore64
     0.96%     0.95%  dump1090         dump1090                  [.] convert_uc8_nodc_nopower
+    0.65%     0.00%  fvUnisocket      [kernel.kallsyms]         [k] __irq_usr
+    0.65%     0.00%  fvUnisocket      [kernel.kallsyms]         [k] gic_handle_irq
+    0.65%     0.00%  fvUnisocket      [kernel.kallsyms]         [k] __handle_domain_irq
+    0.65%     0.00%  fvUnisocket      [kernel.kallsyms]         [k] irq_exit
     0.65%     0.17%  fvUnisocket      [kernel.kallsyms]         [k] __softirqentry_text_start
+    0.52%     0.00%  swapper          [kernel.kallsyms]         [k] start_kernel
+    0.52%     0.00%  swapper          [kernel.kallsyms]         [k] arch_call_rest_init
+    0.52%     0.00%  swapper          [kernel.kallsyms]         [k] __noinstr_text_end
     0.38%     0.02%  fvUnisocket      [kernel.kallsyms]         [k] tasklet_action_common.constprop.4
     0.38%     0.00%  fvUnisocket      [kernel.kallsyms]         [k] tasklet_hi_action
     0.36%     0.05%  fvUnisocket      [kernel.kallsyms]         [k] usb_giveback_urb_bh
     0.31%     0.00%  fvUnisocket      [kernel.kallsyms]         [k] __usb_hcd_giveback_urb
     0.29%     0.00%  fvUnisocket      [uvcvideo]                [k] uvc_video_complete
     0.26%     0.00%  swapper          [ipv6].rodata.str1.4      [k] .LC0
     0.24%     0.00%  swapper          [ipv6]_ftrace_events      [k] __event_fib6_table_lookup+0x0
     0.19%     0.00%  perf_5.10        [kernel.kallsyms]         [k] __ret_fast_syscall
     0.18%     0.00%  perf_5.10        [unknown]                 [k] 00000000
     0.18%     0.00%  perf_5.10        libpthread-2.28.so        [.] __libc_write
     0.18%     0.00%  perf_5.10        [kernel.kallsyms]         [k] __se_sys_write
     0.18%     0.00%  perf_5.10        [kernel.kallsyms]         [k] ksys_write
     0.17%     0.00%  perf_5.10        [kernel.kallsyms]         [k] vfs_write
     0.17%     0.00%  perf_5.10        [kernel.kallsyms]         [k] ext4_file_write_iter
     0.17%     0.00%  perf_5.10        [kernel.kallsyms]         [k] ext4_buffered_write_iter
     0.17%     0.00%  perf_5.10        [kernel.kallsyms]         [k] generic_perform_write
     0.17%     0.00%  fvUnisocket      [uvcvideo]                [k] uvc_video_decode_isoc
     0.17%     0.17%  fvUnisocket      [uvcvideo]                [k] uvc_video_decode_start
     0.15%     0.15%  fvUnisocket      [kernel.kallsyms]         [k] _raw_spin_unlock_irqrestore
     0.14%     0.00%  dump1090         [unknown]                 [.] 0x807f7e7f
     0.12%     0.00%  redis-server     [kernel.kallsyms]         [k] __ret_fast_syscall
     0.12%     0.00%  dump1090         [unknown]                 [.] 0x80808180
     0.11%     0.00%  dump1090         [unknown]                 [.] 0x81807e7f
     0.11%     0.00%  dump1090         [unknown]                 [.] 0x7e7f8080
     0.11%     0.00%  dump1090         [unknown]                 [.] 0x807e817e
     0.11%     0.00%  dump1090         [unknown]                 [.] 0x80817f80
     0.10%     0.00%  dump1090         [unknown]                 [.] 0x807e7f7f
     0.10%     0.00%  fvLogger         [unknown]                 [k] 00000000
     0.10%     0.00%  redis-server     [unknown]                 [.] 0x00000011
     0.09%     0.09%  dump1090         dump1090                  [.] demodulate2400
     0.09%     0.00%  fvLogger         [kernel.kallsyms]         [k] __ret_fast_syscall
     0.09%     0.00%  dump1090         [unknown]                 [.] 0x00041002
     0.08%     0.01%  fvUnisocket      [kernel.kallsyms]         [k] usb_submit_urb
     0.08%     0.00%  fvLogger         fvLogger                  [.] 0x000804c4
     0.08%     0.00%  fvHealth         [kernel.kallsyms]         [k] __ret_fast_syscall
     0.08%     0.01%  fvUnisocket      [kernel.kallsyms]         [k] usb_hcd_submit_urb
     0.07%     0.00%  swapper          [kernel.kallsyms]         [k] arch_cpu_idle_exit
     0.07%     0.01%  swapper          [kernel.kallsyms]         [k] ledtrig_cpu
     0.07%     0.01%  fvUnisocket      [dwc2]                    [k] _dwc2_hcd_urb_enqueue
     0.07%     0.00%  swapper          [kernel.kallsyms]         [k] led_trigger_event
     0.06%     0.06%  swapper          [kernel.kallsyms]         [k] _raw_read_unlock_irqrestore

I've no idea what all that means. Any thoughts?

@ssokol (Author) commented Oct 31, 2022

Drilling down into the top item in that list:

Samples: 80K of event 'cpu-clock:pppH', Event count (approx.): 20022000000, Thread: fvUnisocket
  Children      Self  Command      Shared Object      Symbol
-   41.96%     0.00%  fvUnisocket  [vectors]          [.] 0xffff0fc0
       0xffff0fc0
-   41.65%    41.65%  fvUnisocket  [vectors]          [.] 0x00000fc0
       0xffff0fc0
-    2.52%     0.00%  fvUnisocket  [vectors]          [.] 0xffff0fcc
       0xffff0fcc
-    2.50%     2.50%  fvUnisocket  [vectors]          [.] 0x00000fcc
       0xffff0fcc
-    1.48%     1.47%  fvUnisocket  fvUnisocket        [.] cas
       cas
-    0.65%     0.00%  fvUnisocket  [kernel.kallsyms]  [k] __irq_usr
       __irq_usr
       gic_handle_irq
       __handle_domain_irq
     - irq_exit
          0.65% __softirqentry_text_start
-    0.65%     0.00%  fvUnisocket  [kernel.kallsyms]  [k] gic_handle_irq
       gic_handle_irq
       __handle_domain_irq
     - irq_exit
          0.65% __softirqentry_text_start
-    0.65%     0.00%  fvUnisocket  [kernel.kallsyms]  [k] __handle_domain_irq
       __handle_domain_irq
     - irq_exit
          0.65% __softirqentry_text_start
-    0.65%     0.00%  fvUnisocket  [kernel.kallsyms]  [k] irq_exit
     - irq_exit
          0.65% __softirqentry_text_start
     0.65%     0.17%  fvUnisocket  [kernel.kallsyms]  [k] __softirqentry_text_start
     0.15%     0.15%  fvUnisocket  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
     0.06%     0.00%  fvUnisocket  [kernel.kallsyms]  [k] run_rebalance_domains
     0.05%     0.00%  fvUnisocket  [kernel.kallsyms]  [k] update_blocked_averages
     0.04%     0.03%  fvUnisocket  [kernel.kallsyms]  [k] run_timer_softirq

@prattmic (Member) commented:

Interesting, it seems to be spinning on some atomics.

Could you expand the runtime/internal/atomic.Cas and runtime/internal/atomic.goLoad64 nodes to see what is calling them?

@ssokol (Author) commented Oct 31, 2022

I think so - not at all familiar with perf, but I'll try... Here you go:

Samples: 80K of event 'cpu-clock:pppH', Event count (approx.): 20022000000, Thread: fvUnisocket
  Children      Self  Command      Shared Object      Symbol
+    5.04%     5.01%  fvUnisocket  fvUnisocket        [.] runtime/internal/atomic.Cas
+    4.56%     4.53%  fvUnisocket  fvUnisocket        [.] runtime/internal/atomic.goLoad64
+    2.97%     0.00%  fvUnisocket  [vectors]          [.] 0xffff0fd8
+    2.95%     2.95%  fvUnisocket  [vectors]          [.] 0x00000fd8
     0.00%     0.00%  fvUnisocket  [kernel.kallsyms]  [k] packet_rcv
     0.00%     0.00%  fvUnisocket  [kernel.kallsyms]  [k] call_timer_fn
     0.00%     0.00%  fvUnisocket  [kernel.kallsyms]  [k] tcp_write_timer
     0.00%     0.00%  fvUnisocket  [kernel.kallsyms]  [k] tcp_write_timer_handler
     0.00%     0.00%  fvUnisocket  [kernel.kallsyms]  [k] tcp_send_loss_probe
     0.00%     0.00%  fvUnisocket  [kernel.kallsyms]  [k] __tcp_retransmit_skb
     0.00%     0.00%  fvUnisocket  [kernel.kallsyms]  [k] __tcp_transmit_skb
     0.00%     0.00%  fvUnisocket  [kernel.kallsyms]  [k] ip_queue_xmit
     0.00%     0.00%  fvUnisocket  [kernel.kallsyms]  [k] __ip_queue_xmit
     0.00%     0.00%  fvUnisocket  [kernel.kallsyms]  [k] ip_local_out
     0.00%     0.00%  fvUnisocket  [kernel.kallsyms]  [k] ip_output
     0.00%     0.00%  fvUnisocket  [kernel.kallsyms]  [k] ip_finish_output
     0.00%     0.00%  fvUnisocket  [kernel.kallsyms]  [k] __ip_finish_output
     0.00%     0.00%  fvUnisocket  [kernel.kallsyms]  [k] ip_finish_output2
     0.00%     0.00%  fvUnisocket  [kernel.kallsyms]  [k] dev_queue_xmit
     0.00%     0.00%  fvUnisocket  [kernel.kallsyms]  [k] __dev_queue_xmit
     0.00%     0.00%  fvUnisocket  [kernel.kallsyms]  [k] __qdisc_run
     0.00%     0.00%  fvUnisocket  [kernel.kallsyms]  [k] dev_hard_start_xmit
     0.00%     0.00%  fvUnisocket  [kernel.kallsyms]  [k] sch_direct_xmit

If that's not what you need, please let me know and I'll try again. Specific instructions would be great!

@ssokol (Author) commented Oct 31, 2022

I can also upload the perf.data file if that would help.

@prattmic (Member) commented:

If that's not what you need, please let me know and I'll try again. Specific instructions would be great!

Yeah, it doesn't look quite right. The perf UI is honestly pretty confusing w.r.t. expanding entries.

Try this:

$ sudo perf report --stdio --comm fvUnisocket --percent-limit 0.5

--stdio dumps the entire expanded call graph directly to stdout rather than opening the UI. The --comm and --percent-limit options just limit the output to the top results in fvUnisocket, so you can copy the entire output into an attachment.

If for some reason that doesn't match results, just remove the --comm and --percent-limit options and copy the top of the output (everything before cpu_startup_entry).

@ssokol (Author) commented Oct 31, 2022

Here's what that outputs:

# To display the perf.data header info, please use --header/--header-only options.
#
# comm: fvUnisocket
#
# Total Lost Samples: 0
#
# Samples: 80K of event 'cpu-clock:pppH'
# Event count (approx.): 20022000000
#
# Children      Self  Shared Object      Symbol                               
# ........  ........  .................  .....................................
#
    41.96%     0.00%  [vectors]          [.] 0xffff0fc0
            |
            ---0xffff0fc0

    41.65%    41.65%  [vectors]          [.] 0x00000fc0
            |
            ---0xffff0fc0

    30.14%    29.94%  fvUnisocket        [.] kernelcas
            |
            ---kernelcas

     5.05%     0.00%  [vectors]          [.] 0xffff0fa0
            |
            ---0xffff0fa0

     5.04%     5.01%  fvUnisocket        [.] runtime/internal/atomic.Cas
            |
            ---runtime/internal/atomic.Cas

     5.01%     5.01%  [vectors]          [.] 0x00000fa0
            |
            ---0xffff0fa0

     4.56%     4.53%  fvUnisocket        [.] runtime/internal/atomic.goLoad64
            |
            ---runtime/internal/atomic.goLoad64

     2.97%     0.00%  [vectors]          [.] 0xffff0fd8
            |
            ---0xffff0fd8

     2.95%     2.95%  [vectors]          [.] 0x00000fd8
            |
            ---0xffff0fd8

     2.52%     0.00%  [vectors]          [.] 0xffff0fcc
            |
            ---0xffff0fcc

     2.50%     2.50%  [vectors]          [.] 0x00000fcc
            |
            ---0xffff0fcc

     1.48%     1.47%  fvUnisocket        [.] cas
            |
            ---cas

     1.19%     1.19%  fvUnisocket        [.] runtime/internal/atomic.goStore64
            |
            ---runtime/internal/atomic.goStore64

     0.65%     0.00%  [kernel.kallsyms]  [k] __irq_usr
            |
            ---__irq_usr
               gic_handle_irq
               __handle_domain_irq
               irq_exit
               |          
                --0.65%--__softirqentry_text_start

     0.65%     0.00%  [kernel.kallsyms]  [k] gic_handle_irq
            |
            ---gic_handle_irq
               __handle_domain_irq
               irq_exit
               |          
                --0.65%--__softirqentry_text_start

     0.65%     0.00%  [kernel.kallsyms]  [k] __handle_domain_irq
            |
            ---__handle_domain_irq
               irq_exit
               |          
                --0.65%--__softirqentry_text_start

     0.65%     0.00%  [kernel.kallsyms]  [k] irq_exit
            |
            ---irq_exit
               |          
                --0.65%--__softirqentry_text_start

     0.65%     0.17%  [kernel.kallsyms]  [k] __softirqentry_text_start


# Samples: 0  of event 'dummy:HG'
# Event count (approx.): 0
#
# Children      Self  Shared Object  Symbol
# ........  ........  .............  ......
#


#
# (Tip: To show assembler sample contexts use perf record -b / perf script -F +brstackinsn --xed)
#

@prattmic (Member) commented:

I can also upload the perf.data file if that would help.

Unfortunately the perf.data file doesn't include the symbol information (perf report finds the right binaries and reads them), so it is difficult to send them around. Sorry for all the questions.

Here's what that outputs:

Hrm, it seems perf didn't find/record the callers for some reason.

Could you try a few things?

  1. Attach the output of sudo perf report --stdio --dso fvUnisocket. Perhaps we can see something further down the list than 0.5% that looks like a caller.

  2. Rebuild your application with GOARM=7 (i.e., GOARM=7 go build). The presence of these symbols indicates a GOARM<7 build, even though your CPU is ARMv7. In theory this shouldn't make a difference, but it would be interesting to know whether it makes the problem go away.

  3. If/when the program hangs again, try SIGABRT / gdb again, perhaps those aren't unconditionally broken.

@ssokol (Author) commented Oct 31, 2022

Not a problem. Just want to get to the bottom of this if possible.

  1. Here you go:
root@pi4cm1(rw):/# perf report --stdio --dso fvUnisocket
Warning:
1 map information files for pre-existing threads were
not processed, if there are samples for addresses they
will not be resolved, you may find out which are these
threads by running with -v and redirecting the output
to a file.
The time limit to process proc map is too short?
Increase it by --proc-map-timeout
# To display the perf.data header info, please use --header/--header-only options.
#
# dso: fvUnisocket
#
# Total Lost Samples: 0
#
# Samples: 80K of event 'cpu-clock:pppH'
# Event count (approx.): 20022000000
#
# Children      Self  Command      Symbol
# ........  ........  ...........  .....................................
#
     5.04%     5.01%  fvUnisocket  [.] runtime/internal/atomic.Cas
            |
            ---runtime/internal/atomic.Cas

     4.56%     4.53%  fvUnisocket  [.] runtime/internal/atomic.goLoad64
            |
            ---runtime/internal/atomic.goLoad64

     1.48%     1.47%  fvUnisocket  [.] cas
            |
            ---cas

     1.19%     1.19%  fvUnisocket  [.] runtime/internal/atomic.goStore64
            |
            ---runtime/internal/atomic.goStore64



# Samples: 0  of event 'dummy:HG'
# Event count (approx.): 0
#
# Children      Self  Command  Symbol
# ........  ........  .......  ......
#


#
# (Tip: List events using substring match: perf list <keyword>)
#
  2. Will do.
  3. Will do.

@ssokol (Author) commented Nov 7, 2022

I rebuilt with "GOARM=7" and that seems to have resolved the issue. I've run the rebuilt version for the past week without a lockup / runaway. The only problem is, I'm not sure if this truly fixed the problem, or simply made it much less likely to happen. The theory from a friend who's much better versed in the internals of Linux:

Here is my speculation. There is a latent locking issue somewhere in the runtime and the much slower workaround implementation that goes through kuser helper makes the problem more statistically likely because the probability of contention is higher. The native version will make the lock action much faster and reduce the probability but if you want to be a gracious go citizen get a stack dump. Likely if you change to v7 it may go away but you'll never be sure :)

Unfortunately, I was never able to get a core dump from the process. His theory on that:

The reason for sigabort and friends not working is that the user helper may be considered as a syscall or being in the kernel so it will abort as soon as it leaves that call but if it spins forever....

Not sure if this should be closed or left open for someone with better tools for debugging to take a look. At the very least it might be good to note this compatibility step (the "GOARM=7" bit) somewhere.

@prattmic (Member) commented Nov 7, 2022

The theory from a friend who's much better versed in the internals of Linux:

That is certainly possible. It is also possible that there is a bug in our GOARM<7 locking path which uses the kuser mappings, or in the kuser code itself. What kernel version is this system using (uname -a)?
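
For context, the kuser helpers are kernel-provided routines mapped at fixed addresses into every 32-bit ARM process, and the hot [vectors] entries in the earlier profile line up with them (addresses as documented in the kernel's ARM kuser helpers notes):

0xffff0fa0  __kuser_memory_barrier   # the ~5% [vectors] entries
0xffff0fc0  __kuser_cmpxchg          # the ~42% [vectors] entries
0xffff0fe0  __kuser_get_tls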

@ssokol (Author) commented Nov 7, 2022

root@pi4cm1(rw):~# uname -a
Linux pi4cm1 5.10.63-v7l+ #1457 SMP Tue Sep 28 11:26:14 BST 2021 armv7l GNU/Linux

@joedian joedian removed the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Nov 15, 2022