x/net/context/ctxhttp: test failures due to ENOBUFS on netbsd-386-9_0 #56810

Open
gopherbot opened this issue Nov 17, 2022 · 13 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-NetBSD
@gopherbot

gopherbot commented Nov 17, 2022

#!watchflakes
post <- pkg == "golang.org/x/net/context/ctxhttp" && `no buffer space available`

Issue created automatically to collect these failures.

Example (log):

--- FAIL: TestGo17Context (0.00s)
    ctxhttp_test.go:28: error received from client: Get "http://127.0.0.1:65524": dial tcp 127.0.0.1:65524: socket: no buffer space available <nil>

watchflakes

@gopherbot gopherbot added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Nov 17, 2022
@gopherbot gopherbot added this to the Unreleased milestone Nov 17, 2022
@gopherbot
Author

Found new dashboard test flakes for:

#!watchflakes
post <- pkg == "golang.org/x/net/context/ctxhttp" && test == "TestGo17Context"
2022-11-16 23:25 netbsd-386-9_0 net@0833b635 go@fdd8f021 x/net/context/ctxhttp.TestGo17Context (log)
--- FAIL: TestGo17Context (0.00s)
    ctxhttp_test.go:28: error received from client: Get "http://127.0.0.1:65524": dial tcp 127.0.0.1:65524: socket: no buffer space available <nil>

watchflakes

@bcmills bcmills changed the title x/net/context/ctxhttp: TestGo17Context failures x/net/context/ctxhttp: test failures due to ENOBUFS Nov 17, 2022
@bcmills
Contributor

bcmills commented Nov 17, 2022

@golang/netbsd, it's not clear to me how a TCP dial could result in ENOBUFS — generally I only expect to see that with UDP sockets. Any idea what's up with this?

@gopherbot
Author

Sorry, but there were parse errors in the watch flakes script.
The script I found was:

#!watchflakes
post <- pkg == "golang.org/x/net/context/ctxhttp" && "no buffer space available"

And the problems were:

script:2.54: unexpected quoted string no buffer space available

See https://go.dev/wiki/Watchflakes for details.

watchflakes

@bcmills bcmills changed the title x/net/context/ctxhttp: test failures due to ENOBUFS x/net/context/ctxhttp: test failures due to ENOBUFS on netbsd-386-9_0 Nov 17, 2022
@riastradh

What is the syscall that fails?

Have you checked the numbers for SOCK_STREAM, SOCK_DGRAM, &c., to verify they match NetBSD?

Is there a quick reproducer? Can you get a ktrace of the process that fails?

@bcmills
Contributor

bcmills commented Nov 17, 2022

The only information I have is what's in the attached log, which has only been seen once on the builders.

(If more information is needed for debugging, please do send a CL to add that information to the errors on the affected codepaths.)

@riastradh

What are the affected code paths? Where do I find the source code to this? I grepped through the go git repository's master branch for TestGo17Context and error received from client, and searched for files with ctxhttp in the name, but nothing came up.

@bcmills
Contributor

bcmills commented Nov 17, 2022

The call that returned the error is this one:

That tracks through (*net/http.Client).Do:

The dial: fragment of the error message is almost certainly coming from one of the OpError returns in net/dial.go:

Unfortunately, the error message doesn't include enough information to pin down which of those paths is being taken. 🤦‍♂️

@riastradh

It looks like this must have come out of the socket system call, judging by the message and by https://cs.opensource.google/go/go/+/master:src/net/net.go;l=473-494;drc=6b45863e47ad1a27ba3051ce0407f0bdc7b46113 -- there's only one colon formatted by OpError.Error() itself, and other similar errors are things like dial tcp 192.168.1.100:3000: connect: host is down, so I assume the syscall name is formatted as part of the constituent error message socket: no buffer space available.

Can you pinpoint the call to socket to find what arguments Go is passing to it?

@bcmills
Contributor

bcmills commented Nov 17, 2022

I only see one SyscallError with the prefix socket in the standard library that would affect a netbsd build:
https://cs.opensource.google/go/go/+/master:src/net/sock_cloexec.go;l=22;drc=35d02791b990082fe80da54352050bd095ebd1e7

(Full search: https://cs.opensource.google/search?q=%5C%22socket%5C%22&ss=go%2Fgo)

@riastradh

riastradh commented Nov 17, 2022

> I only see one SyscallError with the prefix socket in the standard library that would affect a netbsd build: https://cs.opensource.google/go/go/+/master:src/net/sock_cloexec.go;l=22;drc=35d02791b990082fe80da54352050bd095ebd1e7
>
> (Full search: https://cs.opensource.google/search?q=%5C%22socket%5C%22&ss=go%2Fgo)

What is socketFunc?

Likely call chain(?):

* https://cs.opensource.google/go/go/+/master:src/net/dial.go;l=580;drc=b7662047aedc5f2c512911eb59d514ce75b16e18

* https://cs.opensource.google/go/go/+/master:src/net/tcpsock_posix.go;l=64;drc=b7662047aedc5f2c512911eb59d514ce75b16e18

* https://cs.opensource.google/go/go/+/master:src/net/tcpsock_posix.go;l=74;drc=b7662047aedc5f2c512911eb59d514ce75b16e18

* https://cs.opensource.google/go/go/+/master:src/net/ipsock_posix.go;l=142;drc=8c17505da792755ea59711fc8349547a4f24b5c5

How does this one get "socket" in the error message? It doesn't seem to go through the sysSocket call you quoted earlier?


I did a quick search for ENOBUFS in the NetBSD kernel reachable from socket(AF_INET, SOCK_STREAM|..., IPPROTO_TCP). Possible call stack (written caller->callee top to bottom):

So this could happen from a low-memory situation, or because of the socket buffer rlimit, or because of kern.sbmax.

Can you show the rlimits you run this process with? Can you try raising them if they aren't already unlimited, or can you try raising kern.sbmax?

@bcmills
Contributor

bcmills commented Nov 17, 2022

> What is socketFunc?

It's an indirection for syscall.Socket, intended for use in (possibly misguided?) net tests:
https://cs.opensource.google/go/go/+/master:src/net/main_unix_test.go;l=25;drc=b7662047aedc5f2c512911eb59d514ce75b16e18

> How does this one get "socket" in the error message? It doesn't seem to go through the sysSocket call you quoted earlier?

socket calls sysSocket here:
https://cs.opensource.google/go/go/+/master:src/net/sock_posix.go;l=19;drc=b7662047aedc5f2c512911eb59d514ce75b16e18

> So this could happen from a low-memory situation, or because of the socket buffer rlimit, or because of kern.sbmax.

Given that it's a 386 builder, could it possibly be due to running out of address space?

> Can you show the rlimits you run this process with?

Assuming that they're the same as for a login shell on a gomote:

buildlet-netbsd-386-9-0-rn98ccc15# ulimit -a
time          (-t seconds    ) unlimited
file          (-f blocks     ) unlimited
data          (-d kbytes     ) 3145728
stack         (-s kbytes     ) 2048
coredump      (-c blocks     ) unlimited
memory        (-m kbytes     ) 3071632
locked memory (-l kbytes     ) 1023877
thread        (-r threads    ) 1024
process       (-p processes  ) 1024
nofiles       (-n descriptors) 3404
vmemory       (-v kbytes     ) unlimited
sbsize        (-b bytes      ) unlimited
buildlet-netbsd-386-9-0-rn98ccc15# sysctl kern.sbmax
kern.sbmax = 262144

> Can you try raising them if they aren't already unlimited, or can you try raising kern.sbmax?

I don't think I can feasibly do that on the Go builders, no. (And even if I could, with only N=1 failures I wouldn't be able to determine whether that fixed anything.)

@riastradh

> > So this could happen from a low-memory situation, or because of the socket buffer rlimit, or because of kern.sbmax.
>
> Given that it's a 386 builder, could it possibly be due to running out of address space?

It could be, and that could happen even if the address space shortage is only transient, because of the use of PR_NOWAIT. We've been working to minimize the use of PR_NOWAIT to avoid this kind of low-probability error on transient memory shortage; a reasonably small fixed-size allocation like this is a candidate for conversion to PR_WAITOK so socket will wait for memory to become available rather than fail.

> > Can you try raising them if they aren't already unlimited, or can you try raising kern.sbmax?
>
> I don't think I can feasibly do that on the Go builders, no. (And even if I could, with only N=1 failures I wouldn't be able to determine whether that fixed anything.)

Probably not much more we can do to diagnose this, then.
