
net: Conn.Read() stuck in runtime_pollWait after internet connection loss and recovery #41549

Closed
leventov opened this issue Sep 22, 2020 · 9 comments
Labels: FrozenDueToAge, NeedsInvestigation
Milestone: Backlog

Comments

@leventov

What version of Go are you using (go version)?

go1.15

What operating system and processor architecture are you using (go env)?

Raspberry Pi Compute Module 3+, Linux 4.19.88 #1 SMP Fri Jul 17 09:42:11 UTC 2020 armv7l GNU/Linux.

Additionally, the process runs within a Docker container.

What did you do?

Internet connection broke and then recovered.

After the connection recovered, tls.Conn.Read() remained stuck in runtime_pollWait:

1 @ 0x48608 0x412e8 0x75ffc 0xde6a8 0xdf670 0xdf655 0x1d1ab0 0x1e24e8 0x21a5ec 0x10f724 0x21a834 0x2180a4 0x21d074 0x21d07d 0xda238 0x4cd528 0x4cd4fd 0x4e54e8 0x7ad2c
#	0x75ffb		internal/poll.runtime_pollWait+0x43				runtime/netpoll.go:220
#	0xde6a7		internal/poll.(*pollDesc).wait+0x2f				internal/poll/fd_poll_runtime.go:87
#	0xdf66f		internal/poll.(*pollDesc).waitRead+0x17b			internal/poll/fd_poll_runtime.go:92
#	0xdf654		internal/poll.(*FD).Read+0x160					internal/poll/fd_unix.go:159
#	0x1d1aaf	net.(*netFD).Read+0x37						net/fd_posix.go:55
#	0x1e24e7	net.(*conn).Read+0x63						net/net.go:182
#	0x21a5eb	crypto/tls.(*atLeastReader).Read+0x77				crypto/tls/conn.go:779
#	0x10f723	bytes.(*Buffer).ReadFrom+0xa3					bytes/buffer.go:204
#	0x21a833	crypto/tls.(*Conn).readFromUntil+0xc3				crypto/tls/conn.go:801
#	0x2180a3	crypto/tls.(*Conn).readRecordOrCCS+0xfb				crypto/tls/conn.go:608
#	0x21d073	crypto/tls.(*Conn).readRecord+0x14f				crypto/tls/conn.go:576
#	0x21d07c	crypto/tls.(*Conn).Read+0x158					crypto/tls/conn.go:1252
#	0xda237		io.ReadAtLeast+0x6b						io/io.go:314
#	0x4cd527	io.ReadFull+0x67						io/io.go:333
#	0x4cd4fc	github.com/eclipse/paho.mqtt.golang/packets.ReadPacket+0x3c	github.com/eclipse/paho.mqtt.golang@v1.2.0/packets/packets.go:105
#	0x4e54e7	github.com/eclipse/paho%2emqtt%2egolang.incoming+0xe7		github.com/eclipse/paho.mqtt.golang@v1.2.0/net.go:132

This might be related to #27752.

@davecheney
Contributor

davecheney commented Sep 22, 2020

This is expected if a timeout has not been set on the connection. Has a timeout been set before calling Read?

@leventov
Author

So there should probably be a SetReadDeadline() call before this line?
https://github.com/eclipse/paho.mqtt.golang/blob/ba85050a1f239f4e954dc95920213db51f937df1/net.go#L119

Still, I would expect a read call (even one without a deadline) to fail with an "internet disconnected" error when the connection drops, or to become unstuck once the connection recovers, rather than just staying stuck.

@davecheney
Contributor

Yup, if it’s important, it needs a timeout.

> Still, I would expect a read call (even one without a deadline) to fail with an "internet disconnected" error when the connection drops, or to become unstuck once the connection recovers, rather than just staying stuck.

If the operating system has not signalled that the TCP connection has been closed or reset, there’s not much the runtime can do from user space.

@leventov
Author

leventov commented Sep 22, 2020

So do you think this is a kernel/Docker problem (the socket is not closed when the internet connection drops), or no one's problem at all?

The runtime could probably detect the internet disconnection event and fail all outstanding Reads.

@davecheney
Contributor

> The runtime could probably detect the internet disconnection event and fail all outstanding Reads.

The network fd is handled by epoll (on Linux), and if no event is received from the kernel, there's nothing the runtime can do.

@networkimprov

networkimprov commented Sep 23, 2020

See also #31490 re TCP keepalive problems.

TCP keepalive is on by default for both client and server net.Conns.

cagedmantis changed the title from "tls.Conn.Read() stuck in runtime_pollWait after internet connection loss and recovery" to "tls: Conn.Read() stuck in runtime_pollWait after internet connection loss and recovery" Sep 28, 2020
cagedmantis changed the title from "tls: Conn.Read() stuck in runtime_pollWait after internet connection loss and recovery" to "crypto/tls: Conn.Read() stuck in runtime_pollWait after internet connection loss and recovery" Sep 28, 2020
cagedmantis added the NeedsInvestigation label Sep 28, 2020
cagedmantis added this to the Backlog milestone Sep 28, 2020
@cagedmantis
Contributor

/cc @FiloSottile

FiloSottile changed the title from "crypto/tls: Conn.Read() stuck in runtime_pollWait after internet connection loss and recovery" to "net: Conn.Read() stuck in runtime_pollWait after internet connection loss and recovery" Oct 5, 2020
@FiloSottile
Contributor

Doesn't look like a crypto/tls specific issue, please tag me back in if I'm wrong.

@ianlancetaylor
Member

I don't think there is anything we can change in the Go standard library here, so I'm going to close the issue.

Please comment if you disagree.

golang locked and limited conversation to collaborators Oct 5, 2021
7 participants