-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime: unexpected FD destruction #45236
Comments
On my M1 Mac I got this test to pass: $ go test -v -run=Watchdog/default -count=1 ./wgengine
go: downloading github.com/tailscale/wireguard-go v0.0.0-20210324165952-2963b66bc23a
go: downloading golang.org/x/crypto v0.0.0-20210317152858-513c2a44f670
go: downloading golang.org/x/net v0.0.0-20210226172049-e18ecbb05110
go: downloading golang.org/x/sys v0.0.0-20210317225723-c4fcb01b228e
go: downloading golang.org/x/time v0.0.0-20210220033141-f8bda1e9f3ba
go: downloading golang.org/x/sync v0.0.0-20210220032951-036812b2e83c
=== RUN TestWatchdog
=== PAUSE TestWatchdog
=== CONT TestWatchdog
=== RUN TestWatchdog/default_watchdog_does_not_fire
=== PAUSE TestWatchdog/default_watchdog_does_not_fire
=== CONT TestWatchdog/default_watchdog_does_not_fire
userspace.go:178: Starting userspace wireguard engine (with fake TUN device)
userspace.go:255: link state: interfaces.State{defaultRoute=en0 ifs={en0:[192.168.0.197/24]} v4=true v6global=false}
userspace.go:366: Creating wireguard device...
userspace.go:372: Creating router...
userspace.go:399: Bringing wireguard device up...
wglog.go:51: [v2] UDP bind has been updated
wglog.go:51: [v2] Interface state was Down, requested Up, now Up
userspace.go:401: Bringing router up...
router_fake.go:24: [v1] warning: fakeRouter.Up: not implemented.
userspace.go:409: Clearing router settings...
router_fake.go:29: [v1] warning: fakeRouter.Set: not implemented.
userspace.go:413: Starting link monitor...
userspace.go:415: Starting magicsock...
userspace.go:418: Starting resolver...
userspace.go:422: Engine created.
magicsock.go:584: [v1] magicsock: starting endpoint update (initial)
magicsock.go:1007: portmapper: write udp4 0.0.0.0:55111->192.168.0.1:5351: use of closed network connection
magicsock.go:623: [v1] magicsock: ignoring pre-DERP map, STUN-less endpoint update: [192.168.0.197:53375]
router_fake.go:34: [v1] warning: fakeRouter.Close: not implemented.
wglog.go:51: [v2] Device closing
wglog.go:51: [v2] Device closed
--- PASS: TestWatchdog (0.00s)
--- PASS: TestWatchdog/default_watchdog_does_not_fire (0.00s)
PASS
ok tailscale.com/wgengine 0.010s Then I installed Little Snitch and activated its system extension that includes a content filter and it failed: $ go test -v -run=Watchdog/default -count=1 ./wgengine
=== RUN TestWatchdog
=== PAUSE TestWatchdog
=== CONT TestWatchdog
=== RUN TestWatchdog/default_watchdog_does_not_fire
=== PAUSE TestWatchdog/default_watchdog_does_not_fire
=== CONT TestWatchdog/default_watchdog_does_not_fire
userspace.go:178: Starting userspace wireguard engine (with fake TUN device)
userspace.go:255: link state: interfaces.State{defaultRoute=en0 ifs={en0:[192.168.0.197/24]} v4=true v6global=false}
userspace.go:366: Creating wireguard device...
userspace.go:372: Creating router...
userspace.go:399: Bringing wireguard device up...
wglog.go:51: [v2] UDP bind has been updated
wglog.go:51: [v2] Interface state was Down, requested Up, now Up
userspace.go:401: Bringing router up...
router_fake.go:24: [v1] warning: fakeRouter.Up: not implemented.
userspace.go:409: Clearing router settings...
router_fake.go:29: [v1] warning: fakeRouter.Set: not implemented.
userspace.go:413: Starting link monitor...
userspace.go:415: Starting magicsock...
userspace.go:418: Starting resolver...
userspace.go:422: Engine created.
magicsock.go:584: [v1] magicsock: starting endpoint update (initial)
watchdog.go:62: wgengine watchdog stacks:
goroutine profile: total 41
8 @ 0x102803a70 0x1027d3068 0x1027d2d68 0x102968a28 0x102837604
# 0x102968a27 github.com/tailscale/wireguard-go/device.(*Device).RoutineDecryption+0xe7 /Users/cadey/go/pkg/mod/github.com/tailscale/wireguard-go@v0.0.0-20210324165952-2963b66bc23a/device/receive.go:213
8 @ 0x102803a70 0x1027d3068 0x1027d2d68 0x102968cf4 0x102837604
# 0x102968cf3 github.com/tailscale/wireguard-go/device.(*Device).RoutineHandshake+0xd3 /Users/cadey/go/pkg/mod/github.com/tailscale/wireguard-go@v0.0.0-20210324165952-2963b66bc23a/device/receive.go:245
8 @ 0x102803a70 0x1027d3068 0x1027d2d68 0x10296bc3c 0x102837604
# 0x10296bc3b github.com/tailscale/wireguard-go/device.(*Device).RoutineEncryption+0xeb /Users/cadey/go/pkg/mod/github.com/tailscale/wireguard-go@v0.0.0-20210324165952-2963b66bc23a/device/send.go:372
1 @ 0x102803a70 0x1027d3068 0x1027d2d28 0x1028b6188 0x1028b2350 0x102837604
# 0x1028b6187 testing.tRunner.func1+0x207 /usr/local/go/src/testing/testing.go:1160
# 0x1028b234f testing.tRunner+0x10f /usr/local/go/src/testing/testing.go:1198
1 @ 0x102803a70 0x1027d3068 0x1027d2d28 0x1028b6188 0x1028b2350 0x1028b3b34 0x1028b2ca4 0x102ad847c 0x10280366c 0x102837604
# 0x1028b6187 testing.tRunner.func1+0x207 /usr/local/go/src/testing/testing.go:1160
# 0x1028b234f testing.tRunner+0x10f /usr/local/go/src/testing/testing.go:1198
# 0x1028b3b33 testing.runTests+0x283 /usr/local/go/src/testing/testing.go:1510
# 0x1028b2ca3 testing.(*M).Run+0x1b3 /usr/local/go/src/testing/testing.go:1418
# 0x102ad847b main.main+0x14b _testmain.go:47
# 0x10280366b runtime.main+0x26b /usr/local/go/src/runtime/proc.go:225
1 @ 0x102803a70 0x1027d3068 0x1027d2d28 0x1028b6430 0x102837604
# 0x1028b642f testing.runTests.func1.1+0x2f /usr/local/go/src/testing/testing.go:1517
1 @ 0x102803a70 0x1027d3068 0x1027d2d28 0x102973e60 0x102974504 0x102837604
# 0x102973e5f tailscale.com/wgengine/tstun.(*fakeTUN).Read+0x2f /Users/cadey/Code/tailscale/wgengine/tstun/faketun.go:40
# 0x102974503 tailscale.com/wgengine/tstun.(*TUN).poll+0xc3 /Users/cadey/Code/tailscale/wgengine/tstun/tun.go:194
1 @ 0x102803a70 0x1027d3068 0x1027d2d68 0x10296d7b4 0x102837604
# 0x10296d7b3 github.com/tailscale/wireguard-go/device.(*Device).RoutineTUNEventReader+0x83 /Users/cadey/go/pkg/mod/github.com/tailscale/wireguard-go@v0.0.0-20210324165952-2963b66bc23a/device/tun.go:20
1 @ 0x102803a70 0x1027d3068 0x1027d2d68 0x102ad5ab0 0x102837604
# 0x102ad5aaf tailscale.com/wgengine.newUserspaceEngine.func5+0x5f /Users/cadey/Code/tailscale/wgengine/userspace.go:381
1 @ 0x102803a70 0x102814b8c 0x10295bcb0 0x102837604
# 0x10295bcaf github.com/tailscale/wireguard-go/ratelimiter.(*Ratelimiter).Init.func1+0x8f /Users/cadey/go/pkg/mod/github.com/tailscale/wireguard-go@v0.0.0-20210324165952-2963b66bc23a/ratelimiter/ratelimiter.go:70
1 @ 0x102803a70 0x102814b8c 0x102974cfc 0x10296b3b4 0x102837604
# 0x102974cfb tailscale.com/wgengine/tstun.(*TUN).Read+0xcb /Users/cadey/Code/tailscale/wgengine/tstun/tun.go:278
# 0x10296b3b3 github.com/tailscale/wireguard-go/device.(*Device).RoutineReadFromTUN+0xc3 /Users/cadey/go/pkg/mod/github.com/tailscale/wireguard-go@v0.0.0-20210324165952-2963b66bc23a/device/send.go:226
1 @ 0x102803a70 0x102814b8c 0x102a778dc 0x102837604
# 0x102a778db tailscale.com/wgengine/monitor.(*Mon).debounce+0x16b /Users/cadey/Code/tailscale/wgengine/monitor/monitor.go:225
1 @ 0x102803a70 0x102815530 0x102815521 0x1028338a8 0x102842344 0x102970710 0x102837604
# 0x1028338a7 sync.runtime_Semacquire+0x37 /usr/local/go/src/runtime/sema.go:56
# 0x102842343 sync.(*WaitGroup).Wait+0x73 /usr/local/go/src/sync/waitgroup.go:130
# 0x10297070f github.com/tailscale/wireguard-go/device.newOutboundQueue.func1+0x2f /Users/cadey/go/pkg/mod/github.com/tailscale/wireguard-go@v0.0.0-20210324165952-2963b66bc23a/device/channels.go:32
1 @ 0x102803a70 0x102815530 0x102815521 0x1028338a8 0x102842344 0x102970770 0x102837604
# 0x1028338a7 sync.runtime_Semacquire+0x37 /usr/local/go/src/runtime/sema.go:56
# 0x102842343 sync.(*WaitGroup).Wait+0x73 /usr/local/go/src/sync/waitgroup.go:130
# 0x10297076f github.com/tailscale/wireguard-go/device.newInboundQueue.func1+0x2f /Users/cadey/go/pkg/mod/github.com/tailscale/wireguard-go@v0.0.0-20210324165952-2963b66bc23a/device/channels.go:50
1 @ 0x102803a70 0x102815530 0x102815521 0x1028338a8 0x102842344 0x1029707d0 0x102837604
# 0x1028338a7 sync.runtime_Semacquire+0x37 /usr/local/go/src/runtime/sema.go:56
# 0x102842343 sync.(*WaitGroup).Wait+0x73 /usr/local/go/src/sync/waitgroup.go:130
# 0x1029707cf github.com/tailscale/wireguard-go/device.newHandshakeQueue.func1+0x2f /Users/cadey/go/pkg/mod/github.com/tailscale/wireguard-go@v0.0.0-20210324165952-2963b66bc23a/device/channels.go:68
1 @ 0x102803a70 0x102815530 0x102815521 0x1028338f8 0x102860de4 0x102908b88 0x102914e28 0x102a8d50c 0x102837604
# 0x1028338f7 internal/poll.runtime_Semacquire+0x37 /usr/local/go/src/runtime/sema.go:61
# 0x102860de3 internal/poll.(*FD).Close+0x83 /usr/local/go/src/internal/poll/fd_unix.go:115
# 0x102908b87 net.(*netFD).Close+0x47 /usr/local/go/src/net/fd_posix.go:37
# 0x102914e27 net.(*conn).Close+0x57 /usr/local/go/src/net/net.go:207
# 0x102a8d50b tailscale.com/net/portmapper.closeCloserOnContextDone.func2+0x8b /Users/cadey/Code/tailscale/net/portmapper/portmapper.go:218
1 @ 0x102803a70 0x102833b48 0x102833b15 0x10283ef38 0x102aab388 0x102acf790 0x102ad65b8 0x102ad6548 0x102837604
# 0x102833b14 sync.runtime_notifyListWait+0xb4 /usr/local/go/src/runtime/sema.go:513
# 0x10283ef37 sync.(*Cond).Wait+0xb7 /usr/local/go/src/sync/cond.go:56
# 0x102aab387 tailscale.com/wgengine/magicsock.(*Conn).Close+0x197 /Users/cadey/Code/tailscale/wgengine/magicsock/magicsock.go:2506
# 0x102acf78f tailscale.com/wgengine.(*userspaceEngine).Close+0x2af /Users/cadey/Code/tailscale/wgengine/userspace.go:1244
# 0x102ad65b7 tailscale.com/wgengine.(*watchdogEngine).watchdog.func1+0x27 /Users/cadey/Code/tailscale/wgengine/watchdog.go:70
# 0x102ad6547 tailscale.com/wgengine.(*watchdogEngine).watchdogErr.func1+0x27 /Users/cadey/Code/tailscale/wgengine/watchdog.go:52
1 @ 0x102831ae0 0x1028e88a4 0x1028e86b0 0x1028e5c68 0x102ad28cc 0x102ad2ac4 0x102ad3a48 0x102ad7090 0x1028b2320 0x102837604
# 0x102831adf runtime/pprof.runtime_goroutineProfileWithLabels+0x4f /usr/local/go/src/runtime/mprof.go:716
# 0x1028e88a3 runtime/pprof.writeRuntimeProfile+0xb3 /usr/local/go/src/runtime/pprof/pprof.go:724
# 0x1028e86af runtime/pprof.writeGoroutine+0x8f /usr/local/go/src/runtime/pprof/pprof.go:684
# 0x1028e5c67 runtime/pprof.(*Profile).WriteTo+0x377 /usr/local/go/src/runtime/pprof/pprof.go:331
# 0x102ad28cb tailscale.com/wgengine.(*watchdogEngine).watchdogErr+0x16b /Users/cadey/Code/tailscale/wgengine/watchdog.go:61
# 0x102ad2ac3 tailscale.com/wgengine.(*watchdogEngine).watchdog+0x73 /Users/cadey/Code/tailscale/wgengine/watchdog.go:69
# 0x102ad3a47 tailscale.com/wgengine.(*watchdogEngine).Close+0x97 /Users/cadey/Code/tailscale/wgengine/watchdog.go:134
# 0x102ad708f tailscale.com/wgengine.TestWatchdog.func1+0x1df /Users/cadey/Code/tailscale/wgengine/watchdog_test.go:33
# 0x1028b231f testing.tRunner+0xdf /usr/local/go/src/testing/testing.go:1194
1 @ 0x1028344cc 0x102846ae4 0x102860cf4 0x1028602c0 0x102861d28 0x102908f84 0x10291ca54 0x10291b8e4 0x102a8bbe0 0x102a9fec8 0x102a9cc88 0x102837604
# 0x1028344cb syscall.syscall+0x1b /usr/local/go/src/runtime/sys_darwin.go:19
# 0x102846ae3 syscall.Close+0x43 /usr/local/go/src/syscall/zsyscall_darwin_arm64.go:519
# 0x102860cf3 internal/poll.(*FD).destroy+0x43 /usr/local/go/src/internal/poll/fd_unix.go:83
# 0x1028602bf internal/poll.(*FD).writeUnlock+0x4f /usr/local/go/src/internal/poll/fd_mutex.go:250
# 0x102861d27 internal/poll.(*FD).WriteTo+0x1b7 /usr/local/go/src/internal/poll/fd_unix.go:351
# 0x102908f83 net.(*netFD).writeTo+0x53 /usr/local/go/src/net/fd_posix.go:79
# 0x10291ca53 net.(*UDPConn).writeTo+0xa3 /usr/local/go/src/net/udpsock_posix.go:80
# 0x10291b8e3 net.(*UDPConn).WriteTo+0x63 /usr/local/go/src/net/udpsock.go:167
# 0x102a8bbdf tailscale.com/net/portmapper.(*Client).CreateOrGetMapping+0xb9f /Users/cadey/Code/tailscale/net/portmapper/portmapper.go:305
# 0x102a9fec7 tailscale.com/wgengine/magicsock.(*Conn).determineEndpoints+0x147 /Users/cadey/Code/tailscale/wgengine/magicsock/magicsock.go:1003
# 0x102a9cc87 tailscale.com/wgengine/magicsock.(*Conn).updateEndpoints+0xe7 /Users/cadey/Code/tailscale/wgengine/magicsock/magicsock.go:586
1 @ 0x1028344cc 0x1029249fc 0x102a77f54 0x102a77f2d 0x102a7762c 0x102837604
# 0x1028344cb syscall.syscall+0x1b /usr/local/go/src/runtime/sys_darwin.go:19
# 0x1029249fb golang.org/x/sys/unix.read+0x5b /Users/cadey/go/pkg/mod/golang.org/x/sys@v0.0.0-20210317225723-c4fcb01b228e/unix/zsyscall_darwin_arm64.go:1691
# 0x102a77f53 golang.org/x/sys/unix.Read+0x43 /Users/cadey/go/pkg/mod/golang.org/x/sys@v0.0.0-20210317225723-c4fcb01b228e/unix/syscall_unix.go:157
# 0x102a77f2c tailscale.com/wgengine/monitor.(*darwinRouteMon).Receive+0x1c /Users/cadey/Code/tailscale/wgengine/monitor/monitor_darwin.go:55
# 0x102a7762b tailscale.com/wgengine/monitor.(*Mon).pump+0x7b /Users/cadey/Code/tailscale/wgengine/monitor/monitor.go:203
watchdog.go:63: wgengine: watchdog timeout on Close
--- FAIL: TestWatchdog (0.00s)
--- FAIL: TestWatchdog/default_watchdog_does_not_fire (0.16s)
FAIL
FAIL tailscale.com/wgengine 0.166s
FAIL |
I think this is a red herring and the fd is getting (legitimately) closed somewhere else. I still don't understand why the close syscall blocks for so long with Little Snitch, and that may be relevant to some of our other problems, but I don't think this particular bug report is well-founded. |
(broken out from #45211)
On my new M1 mac, a previously reliable test has started failing. Reproduction (works fairly reliably, but only for me): check out github.com/tailscale/tailscale at 28af46fb3bf8e00afe5d3206b50f76cb2442584f and run
go test -v -run=Watchdog/default -count=1 ./wgengine
using 1.16.2.The test fails because a call to the close syscall blocks for over a second. At that point, the test dumps all goroutine stacks. The interesting stack is this one:
The bottom two frames of the stack trace are uninteresting. Then there's a UDP packet write: https://github.com/tailscale/tailscale/blob/28af46fb3bf8e00afe5d3206b50f76cb2442584f/net/portmapper/portmapper.go#L305. The FD for that write should still be live after that WriteTo call. Yet, working up the stack, that WriteTo call results in the FD being destroyed. (And then for reasons that are unclear to me, the close syscall blocks.)
cc @bradfitz @ianlancetaylor
Full test output
--- FAIL: TestWatchdog (0.00s)
--- FAIL: TestWatchdog/default_watchdog_does_not_fire (0.16s)
FAIL
FAIL tailscale.com/wgengine 0.305s
FAIL
The text was updated successfully, but these errors were encountered: