Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: deadlock in runtime.findrunnable #35532

Closed
vadyus opened this issue Nov 12, 2019 · 16 comments
Closed

runtime: deadlock in runtime.findrunnable #35532

vadyus opened this issue Nov 12, 2019 · 16 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker
Milestone

Comments

@vadyus
Copy link

vadyus commented Nov 12, 2019

What version of Go are you using (go version)?

$ go version
go version devel +99957b6 Tue Nov 12 05:35:33 2019 +0000 linux/amd64

Does this issue reproduce with the latest release?

No

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/root/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/root/sdk/gotip"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/root/sdk/gotip/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build330945311=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Built production app with latest go tip and run it on production traffic. App is highly concurrent and uses timers a lot.

What did you expect to see?

Serving requests as usual

What did you see instead?

Deadlock. App not accepting connections and not responding to os signals. All threads stuck on

dlv
(dlv) threads
* Thread 13055 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13056 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13057 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13058 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13059 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13060 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13061 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13062 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13063 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13072 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13073 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13074 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13075 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13076 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13077 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13078 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13079 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13080 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13081 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13086 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13087 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13088 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13089 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13090 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13091 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13093 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13094 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13095 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13096 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13097 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13098 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13099 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13100 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13101 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13102 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13103 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13104 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13106 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13107 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13108 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13110 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13111 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13112 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13113 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13118 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13122 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13126 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13127 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13128 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13129 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13130 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13131 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13132 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13133 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13134 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13135 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13136 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13137 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13138 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13150 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13151 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13152 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13153 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13154 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13155 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13156 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13157 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13158 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13159 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13160 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13166 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13167 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13168 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13173 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13174 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13223 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13224 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13227 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13228 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13229 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13233 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13292 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13293 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13300 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13305 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13327 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13328 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13514 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13515 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13516 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13517 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13518 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13519 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13520 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
  Thread 13791 at 0x469bb3 /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563 runtime.futex
(dlv) tr 13157
Switched from 13055 to 13157
(dlv) bt
0  0x0000000000469bb3 in runtime.futex
   at /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563
1  0x00000000004322d6 in runtime.futexsleep
   at /root/sdk/gotip/src/runtime/os_linux.go:44
2  0x000000000040cdff in runtime.notesleep
   at /root/sdk/gotip/src/runtime/lock_futex.go:151
3  0x000000000043bcf0 in runtime.stopm
   at /root/sdk/gotip/src/runtime/proc.go:1856
4  0x000000000043d2ed in runtime.findrunnable
   at /root/sdk/gotip/src/runtime/proc.go:2388
5  0x000000000043de2c in runtime.schedule
   at /root/sdk/gotip/src/runtime/proc.go:2548
6  0x000000000043e27d in runtime.park_m
   at /root/sdk/gotip/src/runtime/proc.go:2688
7  0x000000000046594b in runtime.mcall
   at /root/sdk/gotip/src/runtime/asm_amd64.s:318

Deadlock happens in random amount of time/served requests. Didn't find any patterns of when it happens. Increasing load (just setting higher weight on load balancer for that backend) helps trigger deadlock faster

@agnivade
Copy link
Contributor

Your go version and go env output does not match. go env says it's Go 1.10 and go version says it's at tip. Which is it ?

Also, does this issue happen in 1.13.x also ?

@vadyus
Copy link
Author

vadyus commented Nov 12, 2019

Binary was built on a build server, go version output is from it. Than it was deployed to production server, go env is from it.

And no, latest release works fine

@agnivade
Copy link
Contributor

Thanks. Your go env output should be from the build server also. Showing the output from the prod server does not serve any purpose.

Copying some runtime people @aclements @mknyszek

@agnivade agnivade changed the title Deadlock in runtime.findrunnable runtime: deadlock in runtime.findrunnable Nov 12, 2019
@agnivade agnivade added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Nov 12, 2019
@vadyus
Copy link
Author

vadyus commented Nov 12, 2019

Thanks. Your go env output should be from the build server also. Showing the output from the prod server does not serve any purpose.

Fixed that, thanks.

@bcmills
Copy link
Contributor

bcmills commented Nov 12, 2019

App is highly concurrent and uses timers a lot.

That suggests a possible connection to the timer refactoring that @ianlancetaylor did this cycle.

@bcmills bcmills added this to the Go1.14 milestone Nov 12, 2019
@bcmills
Copy link
Contributor

bcmills commented Nov 12, 2019

See previously #35375.

@bcmills
Copy link
Contributor

bcmills commented Nov 12, 2019

Commit 99957b6 was just this morning. Was that the same version used to build the deadlocking binary? (Several sources of deadlocks have been fixed over the past week, so the precise version is important.)

@vadyus
Copy link
Author

vadyus commented Nov 12, 2019

Yes, version in first message is precise

@ianlancetaylor
Copy link
Contributor

Is there a way that we can recreate the problem ourselves?

Are you sure that your program doesn't have a deadlock itself, such that there are no runnable goroutines? What leads you to think that this is a runtime problem?

Can you try running the program with GODEBUG=schedtrace=1,scheddetail=1 and show us the output as the program deadlocks? Thanks.

@vadyus
Copy link
Author

vadyus commented Nov 12, 2019

Is there a way that we can recreate the problem ourselves?

Just found it while trying new timers, will try to provide reproducible example. Might not be easy because tests/benchmarks are fine and I didn't find anything that might trigger this deadlock yet.

Are you sure that your program doesn't have a deadlock itself, such that there are no runnable goroutines? What leads you to think that this is a runtime problem?

Pretty much sure, it's an app serving traffic in our advert network (30+ instances, around 20B requests daily), and this issue doesnt reproduce on 1.13.4. The are a lot of background running routines that update data, send metrics, etc, that stopped too.

Can you try running the program with GODEBUG=schedtrace=1,scheddetail=1 and show us the output as the program deadlocks? Thanks.

Will do, post here when it's ready.

@vadyus
Copy link
Author

vadyus commented Nov 13, 2019

No luck of getting deadlock with schedtrace (it slows down app a lot, maybe thats why it's not triggered)

A little more digging into locked app. It still shows a little cpu usage. bt of threads that actually using some cpu:

 0  0x0000000000469bb3 in runtime.futex
    at /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563
 1  0x0000000000465960 in runtime.systemstack_switch
    at /root/sdk/gotip/src/runtime/asm_amd64.s:330
 2  0x000000000041cd24 in runtime.gcStart
    at /root/sdk/gotip/src/runtime/mgc.go:1307
 3  0x000000000040e496 in runtime.mallocgc
    at /root/sdk/gotip/src/runtime/malloc.go:1129
 4  0x000000000044d5bc in runtime.makeslice
    at /root/sdk/gotip/src/runtime/slice.go:49
...
 0  0x0000000000469bb3 in runtime.futex
    at /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563
 1  0x00000000004322d6 in runtime.futexsleep
    at /root/sdk/gotip/src/runtime/os_linux.go:44
 2  0x000000000040cb7a in runtime.lock
    at /root/sdk/gotip/src/runtime/lock_futex.go:102
 3  0x000000000045530f in runtime.addInitializedTimer
    at /root/sdk/gotip/src/runtime/time.go:344
 4  0x0000000000455b2f in runtime.modtimer
    at /root/sdk/gotip/src/runtime/time.go:614
 5  0x0000000000430db2 in internal/poll.runtime_pollSetDeadline
    at /root/sdk/gotip/src/runtime/netpoll.go:264
 6  0x00000000004d1e79 in internal/poll.setDeadlineImpl
    at /root/sdk/gotip/src/internal/poll/fd_poll_runtime.go:155
 7  0x000000000062d5e9 in net.(*conn).SetReadDeadline
....

Examining more threads shows some other backtraces that hang locked

0  0x0000000000469bb3 in runtime.futex
   at /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563
1  0x00000000004322d6 in runtime.futexsleep
   at /root/sdk/gotip/src/runtime/os_linux.go:44
2  0x000000000040cdff in runtime.notesleep
   at /root/sdk/gotip/src/runtime/lock_futex.go:151
3  0x000000000043bcf0 in runtime.stopm
   at /root/sdk/gotip/src/runtime/proc.go:1856
4  0x000000000043c714 in runtime.gcstopm
   at /root/sdk/gotip/src/runtime/proc.go:2056
5  0x000000000043dfc7 in runtime.schedule
   at /root/sdk/gotip/src/runtime/proc.go:2497
6  0x000000000043ec66 in runtime.goexit0
   at /root/sdk/gotip/src/runtime/proc.go:2844
7  0x000000000046594b in runtime.mcall
   at /root/sdk/gotip/src/runtime/asm_amd64.s:318
 0  0x0000000000469bb3 in runtime.futex
    at /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563
 1  0x00000000004322d6 in runtime.futexsleep
    at /root/sdk/gotip/src/runtime/os_linux.go:44
 2  0x000000000040cb7a in runtime.lock
    at /root/sdk/gotip/src/runtime/lock_futex.go:102
 3  0x000000000043176a in runtime.netpolldeadlineimpl
    at /root/sdk/gotip/src/runtime/netpoll.go:459
 4  0x0000000000431a0d in runtime.netpollReadDeadline
    at /root/sdk/gotip/src/runtime/netpoll.go:503
 5  0x0000000000456c64 in runtime.runOneTimer
    at /root/sdk/gotip/src/runtime/time.go:1128
 6  0x0000000000456895 in runtime.runtimer
    at /root/sdk/gotip/src/runtime/time.go:1030
 7  0x000000000043e100 in runtime.checkTimers
    at /root/sdk/gotip/src/runtime/proc.go:2626
 8  0x000000000043ca6a in runtime.findrunnable
    at /root/sdk/gotip/src/runtime/proc.go:2200
 9  0x000000000043de2c in runtime.schedule
    at /root/sdk/gotip/src/runtime/proc.go:2548
10  0x000000000043e27d in runtime.park_m
    at /root/sdk/gotip/src/runtime/proc.go:2688
11  0x000000000046594b in runtime.mcall
    at /root/sdk/gotip/src/runtime/asm_amd64.s:318
0  0x0000000000469bb3 in runtime.futex
   at /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563
1  0x00000000004322d6 in runtime.futexsleep
   at /root/sdk/gotip/src/runtime/os_linux.go:44
2  0x000000000040cdff in runtime.notesleep
   at /root/sdk/gotip/src/runtime/lock_futex.go:151
3  0x000000000043bcf0 in runtime.stopm
   at /root/sdk/gotip/src/runtime/proc.go:1856
4  0x000000000043c714 in runtime.gcstopm
   at /root/sdk/gotip/src/runtime/proc.go:2056
5  0x000000000043dfc7 in runtime.schedule
   at /root/sdk/gotip/src/runtime/proc.go:2497
6  0x000000000043e476 in runtime.goschedImpl
   at /root/sdk/gotip/src/runtime/proc.go:2703
7  0x000000000043e6c4 in runtime.gopreempt_m
   at /root/sdk/gotip/src/runtime/proc.go:2731
8  0x000000000044f9be in runtime.newstack
   at /root/sdk/gotip/src/runtime/stack.go:1025
9  0x0000000000465aaf in runtime.morestack
   at /root/sdk/gotip/src/runtime/asm_amd64.s:449
0  0x0000000000469bb3 in runtime.futex
   at /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563
1  0x00000000004322d6 in runtime.futexsleep
   at /root/sdk/gotip/src/runtime/os_linux.go:44
2  0x000000000040cdff in runtime.notesleep
   at /root/sdk/gotip/src/runtime/lock_futex.go:151
3  0x000000000043bcf0 in runtime.stopm
   at /root/sdk/gotip/src/runtime/proc.go:1856
4  0x000000000043f9c1 in runtime.exitsyscall0
   at /root/sdk/gotip/src/runtime/proc.go:3257
5  0x000000000046594b in runtime.mcall
   at /root/sdk/gotip/src/runtime/asm_amd64.s:318
0  0x0000000000469bb3 in runtime.futex
   at /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563
1  0x00000000004322d6 in runtime.futexsleep
   at /root/sdk/gotip/src/runtime/os_linux.go:44
2  0x000000000040cdff in runtime.notesleep
   at /root/sdk/gotip/src/runtime/lock_futex.go:151
3  0x000000000043bc12 in runtime.templateThread
   at /root/sdk/gotip/src/runtime/proc.go:1834
4  0x000000000043a8c3 in runtime.mstart1
   at /root/sdk/gotip/src/runtime/proc.go:1125
5  0x000000000043a7de in runtime.mstart
   at /root/sdk/gotip/src/runtime/proc.go:1072
6  0x00000000004019bc in ???
   at ?:-1
7  0x00007f065da76e88 in ???
   at ?:-1
0  0x0000000000469bb3 in runtime.futex
   at /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563
1  0x00000000004322d6 in runtime.futexsleep
   at /root/sdk/gotip/src/runtime/os_linux.go:44
2  0x000000000040ced6 in runtime.notetsleep_internal
   at /root/sdk/gotip/src/runtime/lock_futex.go:174
3  0x000000000040d0dc in runtime.notetsleepg
   at /root/sdk/gotip/src/runtime/lock_futex.go:228
4  0x000000000044d3ac in os/signal.signal_recv
   at /root/sdk/gotip/src/runtime/sigqueue.go:147
5  0x00000000007a62d2 in os/signal.loop
   at /root/sdk/gotip/src/os/signal/signal_unix.go:23
6  0x0000000000467a71 in runtime.goexit
   at /root/sdk/gotip/src/runtime/asm_amd64.s:1375
0  0x0000000000469bb3 in runtime.futex
   at /root/sdk/gotip/src/runtime/sys_linux_amd64.s:563
1  0x00000000004322d6 in runtime.futexsleep
   at /root/sdk/gotip/src/runtime/os_linux.go:44
2  0x000000000040cb7a in runtime.lock
   at /root/sdk/gotip/src/runtime/lock_futex.go:102
3  0x0000000000456fa0 in runtime.timeSleepUntil
   at /root/sdk/gotip/src/runtime/time.go:1255
4  0x0000000000442991 in runtime.sysmon
   at /root/sdk/gotip/src/runtime/proc.go:4478
5  0x000000000043a8c3 in runtime.mstart1
   at /root/sdk/gotip/src/runtime/proc.go:1125
6  0x000000000043a7de in runtime.mstart
   at /root/sdk/gotip/src/runtime/proc.go:1072
7  0x00000000004019bc in ???
   at ?:-1
   error: input/output error
(truncated)

Will try to dig deeper. Tell me if there anything i could try to get closer to problem

@vadyus
Copy link
Author

vadyus commented Nov 13, 2019

upd: running with GODEBUG=asyncpreemptoff=1 has no effect, still deadlock

@ianlancetaylor
Copy link
Contributor

Thanks, I think I see it.

internal/poll.runtime_pollSetDeadline acquires pd.lock and calls resettimer which can acquire timersLock for some P.

checkTimers acquires timersLock for some P and calls runtimer which calls runOneTimer which executes a timer function. The timer function can be netpollReadDeadline which calls netpolldeadlineimpl which acquires pd.lock.

So if your program changes the deadline of some network connection exactly when the previous deadline expires, a deadlock seems possible at least in theory.

Fortunately CL 207348 which I sent earlier today should fix this problem.

@gopherbot
Copy link

Change https://golang.org/cl/207348 mentions this issue: runtime: release timersLock while running timer

@vadyus
Copy link
Author

vadyus commented Nov 18, 2019

Running tip 2bde3c1 with CL 207348 looks fine. No deadlock for 5+ hours (got it triggered in less than an hour without patch)

@ianlancetaylor
Copy link
Contributor

Thanks for testing it.

@golang golang locked and limited conversation to collaborators Nov 18, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker
Projects
None yet
Development

No branches or pull requests

5 participants