net: spurious EADDRINUSE from connect syscall on FreeBSD when connecting from IPv6 wildcard to IPv4 address #34264

Open
bcmills opened this issue Sep 12, 2019 · 7 comments
Labels: ExpertNeeded, NeedsInvestigation, OS-FreeBSD

@bcmills
Contributor

bcmills commented Sep 12, 2019

Observed in a freebsd-amd64-12_0 TryBot: https://storage.googleapis.com/go-build-log/2e4c3b9c/freebsd-amd64-12_0_d1b5be43.log

--- FAIL: TestDialerLocalAddr (0.00s)
    dial_test.go:647: tcp [::]:0->127.0.0.1: got dial tcp [::]:0->127.0.0.1:22464: connect: address already in use; want <nil>
FAIL
FAIL	net	54.481s

This test is also flaky on macOS (#22019), but the symptom on FreeBSD is different from the one observed on macOS (connect: address already in use vs. getsockopt: operation timed out).

Note that the freebsd-arm-paulzhol flake reported in #22019 (comment) matches this one.

CC @ianlancetaylor @mikioh

bcmills added the OS-FreeBSD and NeedsInvestigation labels Sep 12, 2019
bcmills added this to the Go1.14 milestone Sep 12, 2019
rsc modified the milestones: Go1.14, Backlog Oct 9, 2019
bcmills changed the title net: TestDialerLocalAddr flaky on FreeBSD to net: TestDialerLocalAddr is flaky on FreeBSD and Dragonfly Feb 21, 2020
bcmills changed the title net: TestDialerLocalAddr is flaky on FreeBSD and Dragonfly to net: TestDialerLocalAddr is flaky on FreeBSD Feb 21, 2020
@bcmills
Contributor Author

bcmills commented Jun 8, 2021

bcmills changed the title net: TestDialerLocalAddr is flaky on FreeBSD to net: TestDialerLocalAddr failures with "connect: address already in use" on FreeBSD Jun 8, 2021
@bcmills
Contributor Author

bcmills commented Nov 29, 2021

greplogs --dashboard -md -l -e 'FAIL: TestDialerLocalAddr.*\n.*address already in use' --since=2021-06-09

2021-11-25T00:07:28-f7e34e7/freebsd-amd64-11_4
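The greplogs query above uses a pattern that spans two lines of log output: in RE2 syntax `.` does not match a newline, so the explicit `\n` is what ties the FAIL line to the error line after it. As a rough illustration only (this approximates, and is not, what greplogs does internally; matchesFlake is a hypothetical name), the same pattern can be applied with Go's regexp package to the excerpt from the original report.

```go
package main

import (
	"fmt"
	"regexp"
)

// flakeRE is the pattern from the greplogs query. Because '.' does not
// match '\n' by default, "address already in use" must appear on the
// line immediately following the FAIL line.
var flakeRE = regexp.MustCompile(`FAIL: TestDialerLocalAddr.*\n.*address already in use`)

// matchesFlake is a hypothetical helper reporting whether a build log
// contains this failure signature.
func matchesFlake(log string) bool {
	return flakeRE.MatchString(log)
}

func main() {
	log := "--- FAIL: TestDialerLocalAddr (0.00s)\n" +
		"    dial_test.go:647: tcp [::]:0->127.0.0.1: got dial tcp [::]:0->127.0.0.1:22464: connect: address already in use; want <nil>\n" +
		"FAIL\n"
	fmt.Println(matchesFlake(log)) // true
}
```

The macOS symptom from #22019 (getsockopt: operation timed out) does not match this pattern, which keeps the two flakes separate in the search results.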

@bcmills
Contributor Author

bcmills commented Dec 3, 2021

These failures always appear to involve the specific pair [::]:0->127.0.0.1.

The connect: prefix in the error suggests that the error originates at one of these two sites:
https://cs.opensource.google/go/go/+/master:src/net/fd_unix.go;l=83;drc=c4aae23d6426442402b3de0e5f7de1ef8da3842a
https://cs.opensource.google/go/go/+/master:src/net/fd_unix.go;l=165;drc=c4aae23d6426442402b3de0e5f7de1ef8da3842a

This FreeBSD bug (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=210726) seems to match the symptom, but it was believed to be fixed as of FreeBSD 12-STABLE.

@gopherbot

Change https://golang.org/cl/369157 mentions this issue: net: skip IPv6-to-v4 case in TestDialerLocalAddr on certain FreeBSD builders

gopherbot pushed a commit that referenced this issue Dec 10, 2021
The failure mode in #34264 appears to match
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=210726.

That bug was supposed to have been fixed in FreeBSD 12, but we're
still observing failures specifically for the 6-to-4 case on FreeBSD
12.2. It is not clear to me whether FreeBSD 13.0 is also affected.

For #34264

Change-Id: Iba7c7fc57676ae628b13c0b8fe43ddf2251c3637
Reviewed-on: https://go-review.googlesource.com/c/go/+/369157
Trust: Bryan Mills <bcmills@google.com>
Run-TryBot: Bryan Mills <bcmills@google.com>
Trust: Dmitri Shuralyov <dmitshur@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
bcmills changed the title net: TestDialerLocalAddr failures with "connect: address already in use" on FreeBSD to net: spurious EADDRINUSE from connect syscall on FreeBSD when connecting from IPv6 wildcard to IPv4 address Dec 10, 2021
@bcmills
Contributor Author

bcmills commented Dec 10, 2021

The TestDialerLocalAddr failure is now mitigated (by ignoring the spurious error). However, this could still use attention from a FreeBSD maintainer to diagnose what appears to be a lingering kernel bug.

@dmgk
Member

dmgk commented Jul 3, 2022

Unfortunately, this is still reproducible on all FreeBSD versions from 12.3-RELEASE to 14.0-CURRENT. I'll look at writing a small reproducer and will reopen the corresponding FreeBSD bug.

@dmgk
Member

dmgk commented Jul 8, 2022

I've looked at this some more, and it seems that this failure is caused by two separate issues:

  • net.inet.tcp.nolocaltimewait defaults to 0 in all FreeBSD versions, so local connections are left behind in the TIME_WAIT state; this makes it easy to run out of ephemeral ports during testing. In 14-CURRENT this OID was changed to a boolean that defaults to true, but the change has not yet been MFC-ed to the -STABLE branches [1]. As a workaround, it might make sense to set net.inet.tcp.nolocaltimewait=1 on all FreeBSD builders. The Linux kernel appears to have had an equivalent feature enabled since 2018 [2].

  • When the ephemeral port range is exhausted and the connection goes from the IPv6 wildcard to an IPv4 address, connect(2) unexpectedly returns EADDRINUSE, instead of bind(2) failing with EADDRNOTAVAIL. I wasn't able to figure out the exact cause, so I've filed a separate bug for this [3].

[1] freebsd/freebsd-src@92b3e07
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=79e9fed460385a3d8ba0b5782e9e74405cb199b1
[3] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=265064
