What did you do?
Implement a pooled TCP client that concurrently writes messages across up to 4 individual TCP connections, and use the client to write 100 messages to netcat on port 9000.
Depending on whether there are queued messages that have yet to be written to the TCP client's destination address, more than one TCP connection may be established and pooled to write the 100 messages.
What did you expect to see?
Netcat is known to only support accepting one incoming connection. So I expected that, should more than one TCP connection be established while writing 100 messages to netcat, net.DialContext() would return an error.
What did you see instead?
Instead, a deadlock on the connect() syscall (syscall.Connect()) itself.
(dlv) grs
Goroutine 1 - User: /nix/store/9g87plb4z4xgpg8jm6ch94j32yi6ycbn-go-1.16.4/share/go/src/runtime/sema.go:56 sync.runtime_Semacquire (0x472645) [semacquire 450440h8m53.582824256s]
Goroutine 2 - User: /nix/store/9g87plb4z4xgpg8jm6ch94j32yi6ycbn-go-1.16.4/share/go/src/runtime/proc.go:337 runtime.gopark (0x440755) [force gc (idle) 450440h8m53.582861217s]
Goroutine 3 - User: /nix/store/9g87plb4z4xgpg8jm6ch94j32yi6ycbn-go-1.16.4/share/go/src/runtime/proc.go:337 runtime.gopark (0x440755) [GC sweep wait]
Goroutine 4 - User: /nix/store/9g87plb4z4xgpg8jm6ch94j32yi6ycbn-go-1.16.4/share/go/src/runtime/proc.go:337 runtime.gopark (0x440755) [sleep]
Goroutine 5 - User: /nix/store/9g87plb4z4xgpg8jm6ch94j32yi6ycbn-go-1.16.4/share/go/src/runtime/proc.go:337 runtime.gopark (0x440755) [finalizer wait 450440h8m53.582910923s]
* Goroutine 19 - User: /nix/store/9g87plb4z4xgpg8jm6ch94j32yi6ycbn-go-1.16.4/share/go/src/syscall/asm_linux_amd64.s:26 syscall.Syscall (0x492e9b) (thread 440468) [chan receive]
Goroutine 3035 - User: /nix/store/9g87plb4z4xgpg8jm6ch94j32yi6ycbn-go-1.16.4/share/go/src/runtime/proc.go:337 runtime.gopark (0x440755) [GC worker (idle)]
Goroutine 3214 - User: /nix/store/9g87plb4z4xgpg8jm6ch94j32yi6ycbn-go-1.16.4/share/go/src/runtime/proc.go:337 runtime.gopark (0x440755) [GC worker (idle)]
Goroutine 3238 - User: /nix/store/9g87plb4z4xgpg8jm6ch94j32yi6ycbn-go-1.16.4/share/go/src/runtime/proc.go:337 runtime.gopark (0x440755) [GC worker (idle) 450440h8m53.582973511s]
Goroutine 3239 - User: /nix/store/9g87plb4z4xgpg8jm6ch94j32yi6ycbn-go-1.16.4/share/go/src/runtime/proc.go:337 runtime.gopark (0x440755) [GC worker (idle)]
Goroutine 3240 - User: /nix/store/9g87plb4z4xgpg8jm6ch94j32yi6ycbn-go-1.16.4/share/go/src/runtime/proc.go:337 runtime.gopark (0x440755) [GC worker (idle)]
Goroutine 3241 - User: /nix/store/9g87plb4z4xgpg8jm6ch94j32yi6ycbn-go-1.16.4/share/go/src/runtime/proc.go:337 runtime.gopark (0x440755) [GC worker (idle)]
Goroutine 3242 - User: /nix/store/9g87plb4z4xgpg8jm6ch94j32yi6ycbn-go-1.16.4/share/go/src/runtime/proc.go:337 runtime.gopark (0x440755) [GC worker (idle)]
Goroutine 3243 - User: /nix/store/9g87plb4z4xgpg8jm6ch94j32yi6ycbn-go-1.16.4/share/go/src/runtime/proc.go:337 runtime.gopark (0x440755) [GC worker (idle)]
[14 goroutines]
(dlv) bt
0 0x0000000000492e9b in syscall.Syscall
at /nix/store/9g87plb4z4xgpg8jm6ch94j32yi6ycbn-go-1.16.4/share/go/src/syscall/asm_linux_amd64.s:26
1 0x00000000004922f8 in syscall.connect
at /nix/store/9g87plb4z4xgpg8jm6ch94j32yi6ycbn-go-1.16.4/share/go/src/syscall/zsyscall_linux_amd64.go:1425
2 0x0000000000490ab3 in syscall.Connect
at /nix/store/9g87plb4z4xgpg8jm6ch94j32yi6ycbn-go-1.16.4/share/go/src/syscall/syscall_unix.go:262
3 0x00000000005062e5 in net.(*netFD).connect
at /nix/store/9g87plb4z4xgpg8jm6ch94j32yi6ycbn-go-1.16.4/share/go/src/net/fd_unix.go:59
4 0x0000000000521b69 in net.(*netFD).dial
at /nix/store/9g87plb4z4xgpg8jm6ch94j32yi6ycbn-go-1.16.4/share/go/src/net/sock_posix.go:149
5 0x0000000000520ec5 in net.socket
at /nix/store/9g87plb4z4xgpg8jm6ch94j32yi6ycbn-go-1.16.4/share/go/src/net/sock_posix.go:70
6 0x0000000000513c0d in net.internetSocket
at /nix/store/9g87plb4z4xgpg8jm6ch94j32yi6ycbn-go-1.16.4/share/go/src/net/ipsock_posix.go:141
7 0x000000000052436e in net.(*sysDialer).doDialTCP
at /nix/store/9g87plb4z4xgpg8jm6ch94j32yi6ycbn-go-1.16.4/share/go/src/net/tcpsock_posix.go:65
8 0x000000000052418c in net.(*sysDialer).dialTCP
at /nix/store/9g87plb4z4xgpg8jm6ch94j32yi6ycbn-go-1.16.4/share/go/src/net/tcpsock_posix.go:61
9 0x00000000004fb7f2 in net.(*sysDialer).dialSingle
at /nix/store/9g87plb4z4xgpg8jm6ch94j32yi6ycbn-go-1.16.4/share/go/src/net/dial.go:580
10 0x00000000004facc5 in net.(*sysDialer).dialSerial
at /nix/store/9g87plb4z4xgpg8jm6ch94j32yi6ycbn-go-1.16.4/share/go/src/net/dial.go:548
11 0x00000000004f9465 in net.(*Dialer).DialContext
at /nix/store/9g87plb4z4xgpg8jm6ch94j32yi6ycbn-go-1.16.4/share/go/src/net/dial.go:425
12 0x0000000000534da5 in main.(*Client).spawnOrWaitForAvailableConnection.func1.1.1
at ./cmd/main.go:111
13 0x0000000000535610 in main.(*Client).spawnOrWaitForAvailableConnection.func1.1
at ./cmd/main.go:112
14 0x0000000000535c96 in main.(*Client).spawnOrWaitForAvailableConnection.func1
at ./cmd/main.go:180
15 0x00000000004759c1 in runtime.goexit
at /nix/store/9g87plb4z4xgpg8jm6ch94j32yi6ycbn-go-1.16.4/share/go/src/runtime/asm_amd64.s:1371
To reproduce, run netcat on port 9000 via nc -l 9000 and run the program below. Debugging via delve unravels the call stack, showcasing the deadlock on syscall.Connect.
package main

import (
	"bytes"
	"context"
	"fmt"
	"net"
	"strconv"
	"sync"
	"time"
)

type ClientConn struct {
	conn *net.TCPConn
}

type Client struct {
	address string
	mu      sync.Mutex
	wg      sync.WaitGroup
	closed  chan struct{}
	waiters []chan error
	pool    [4]*ClientConn
	len     uint8
	queue   chan []byte
}

func NewClient(address string) Client {
	return Client{
		address: address,
		queue:   make(chan []byte, 1024),
	}
}

func (c *Client) Close() {
	c.close()
	c.wg.Wait()
	close(c.queue)
}

func (c *Client) close() {
	c.mu.Lock()
	defer c.mu.Unlock()
	for _, waiter := range c.waiters {
		waiter <- context.Canceled
		close(waiter)
	}
	c.waiters = c.waiters[:0]
	c.closed = make(chan struct{})
	close(c.closed)
	for _, conn := range c.pool[:c.len] {
		if conn.conn != nil {
			conn.conn.Close()
		}
	}
}
// spawnOrWaitForAvailableConnection returns a waiter channel if we are waiting for a connection
// in our pool (which may or may not have just been spawned) to be ready, or nil otherwise. It
// executes a pooling strategy.
//
// if any connected, pool isn't full, write queue isn't empty, spawn a new connection
// if any connected, pool isn't full, write queue is empty, don't spawn a new connection
// if any connected, pool is full, don't spawn a new connection
// if none connected, pool isn't full, write queue isn't empty, spawn a new connection
// if none connected, pool isn't full, write queue is empty, spawn a new connection
// if none connected, pool is full, don't spawn a new connection
func (c *Client) spawnOrWaitForAvailableConnection() (chan error, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	if c.closed != nil {
		return nil, context.Canceled
	}

	pool := c.pool[0:c.len:cap(c.pool)]

	connected := false
	for _, conn := range pool {
		if conn.conn != nil {
			connected = true
			break
		}
	}

	if len(pool) < cap(c.pool) && (!connected || len(c.queue) > 0) {
		cc := &ClientConn{}
		c.pool[c.len] = cc
		c.len = c.len + 1

		c.wg.Add(1)
		go func() {
			defer c.wg.Done()

			for {
				loop := func() bool {
					var d net.Dialer

					conn, err := func() (net.Conn, error) {
						ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
						defer cancel()
						return d.DialContext(ctx, "tcp", c.address)
					}()

					if err != nil {
						if !c.reportConnectionError(cc, err) {
							return false
						}
						return true
					}

					if !c.reportConnected(cc, conn.(*net.TCPConn)) {
						conn.Close()
						return false
					}

					writerDone := make(chan error)
					writerStop := make(chan struct{})
					readerDone := make(chan error)

					go func() {
						writerDone <- func() error {
							for {
								select {
								case <-writerStop:
									return nil
								case buf, open := <-c.queue:
									if !open {
										return context.Canceled
									}
									_, err := cc.conn.Write(buf)
									if err != nil {
										return err
									}
								}
							}
						}()
					}()

					go func() {
						readerDone <- func() error {
							var buf [1024]byte
							for {
								n, err := cc.conn.Read(buf[0:])
								if err != nil {
									return err
								}
								if n == 0 {
									return nil
								}
								fmt.Printf("Got: '%s'\n", bytes.TrimSpace(buf[0:n]))
							}
						}()
					}()

					select {
					case <-writerDone:
						conn.Close()
						<-readerDone
					case <-readerDone:
						conn.Close()
						close(writerStop)
						<-writerDone
					}

					return !c.reportDisconnected(cc)
				}()

				if !loop {
					break
				}
			}
		}()
	}

	if !connected {
		waiter := make(chan error, 1)
		c.waiters = append(c.waiters, waiter)
		return waiter, nil
	}

	return nil, nil
}
func (c *Client) tryRemoveWaiter(waiter chan error) bool {
	// Attempt to remove our waiter entry from the waiters list. It is possible
	// that our waiter has already been removed from the waiter list.
	// If we did not find our waiter entry, then that means that a connection is available.
	// This is needed because we are racing whether or not a pool connection is available
	// against ctx.Done().
	c.mu.Lock()
	defer c.mu.Unlock()

	waiters := c.waiters[:0]
	found := false
	for _, w := range c.waiters {
		if w == waiter {
			found = true
			continue
		}
		waiters = append(waiters, w)
	}
	c.waiters = waiters

	if !found {
		return false
	}
	close(waiter)
	return true
}
func (c *Client) reportConnected(cc *ClientConn, dialed *net.TCPConn) bool {
	c.mu.Lock()
	defer c.mu.Unlock()

	if c.closed != nil {
		pool := c.pool[:0]
		for _, conn := range c.pool {
			if cc == conn {
				continue
			}
			pool = append(pool, conn)
		}
		c.len = c.len - 1
		return false
	}

	cc.conn = dialed

	for _, waiter := range c.waiters {
		close(waiter)
	}
	c.waiters = c.waiters[:0]

	return true
}
func (c *Client) reportConnectionError(cc *ClientConn, err error) bool {
	c.mu.Lock()
	defer c.mu.Unlock()

	// If cc is not the last connection left, do not report
	// the error to the existing waiters.
	// Remove client connection entry from the pool.
	if c.len > 1 {
		pool := c.pool[:0]
		for _, conn := range c.pool {
			if cc == conn {
				continue
			}
			pool = append(pool, conn)
		}
		c.len = c.len - 1
		return false
	}

	for _, waiter := range c.waiters {
		waiter <- err
		close(waiter)
	}
	c.waiters = c.waiters[:0]

	return true
}
func (c *Client) reportDisconnected(cc *ClientConn) bool {
	c.mu.Lock()
	defer c.mu.Unlock()

	cc.conn = nil

	if c.closed != nil || c.len > 1 {
		pool := c.pool[:0]
		for _, conn := range c.pool {
			if cc == conn {
				continue
			}
			pool = append(pool, conn)
		}
		c.len = c.len - 1
		return true
	}

	return false
}
func (c *Client) EnsureConnectionAvailable(ctx context.Context) error {
	waiter, err := c.spawnOrWaitForAvailableConnection()
	if err != nil {
		return err
	}
	if waiter == nil {
		return nil
	}

	select {
	case <-c.closed:
		if !c.tryRemoveWaiter(waiter) {
			return nil
		}
		return context.Canceled
	case <-ctx.Done():
		if !c.tryRemoveWaiter(waiter) {
			return nil
		}
		return ctx.Err()
	case err := <-waiter:
		return err
	}
}
func (c *Client) Write(ctx context.Context, buf []byte) error {
	if err := c.EnsureConnectionAvailable(ctx); err != nil {
		return err
	}

	select {
	case <-c.closed:
		return context.Canceled
	case <-ctx.Done():
		return ctx.Err()
	case c.queue <- buf:
		return nil
	}
}
func main() {
	client := NewClient("127.0.0.1:9000")
	defer client.Close()

	for i := 0; i < 100; i++ {
		func() {
			ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
			defer cancel()
			if err := client.Write(ctx, append([]byte(strconv.FormatInt(int64(i), 10)), '\n')); err != nil {
				panic(err)
			}
		}()
	}
}
lithdew changed the title from "net: deadlock connecting to listener that only supports accepting one client" to "net: deadlock connecting to erroneous listener" on May 26, 2021.
What version of Go are you using (go version)?

go1.16.4

Does this issue reproduce with the latest release?

Yes.