net: FileConn can yield blocking descriptors, leading to livelock #61205
Labels
NeedsInvestigation
WaitingForInfo
What version of Go are you using (`go version`)?
go1.20.5
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (`go env`)?
Linux/amd64
What did you do?
We use `net.FileConn` to construct a `net.Conn` from a file descriptor passed in from another process via a unix socket. `net.FileConn` calls `internal/poll.FD.Init(net, true)`, which then makes the descriptor pollable only if poll configuration succeeds.

The trouble here is that `fd.pd.init(fd)` can produce an error when the number of poll watches exceeds `/proc/sys/fs/epoll/max_user_watches`, and thus the non-blocking file descriptor is incorrectly labelled as blocking.

The consequence of this is that calls to `write(2)`, `read(2)`, etc. that return `EAGAIN` do not invoke the netpoller and instead loop forever (or for as long as the operation would block), which consumes 100% of the CPU time. Here's a flame graph I captured from a profile:

[flame graph image]

I think the correct fix here is to have `poll.FD.Init` produce a real error when netpoller registration fails.