Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: TestUnixAndUnixpacketServer failed with EOF #42720

Closed
mengzhuo opened this issue Nov 19, 2020 · 12 comments
Closed

net: TestUnixAndUnixpacketServer failed with EOF #42720

mengzhuo opened this issue Nov 19, 2020 · 12 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@mengzhuo
Copy link
Contributor

mengzhuo commented Nov 19, 2020

What version of Go are you using (go version)?

$ go version 
59202c4204205ee019cf25efcb814b72c8703f4b

Does this issue reproduce with the latest release?

Yes.

What did you do?

The host-linux-mipsle-mengzhuo has this bug occasionally.

--- FAIL: TestUnixAndUnixpacketServer (0.00s)
    server_test.go:205: #2: EOF
FAIL
FAIL	net	23.436s
ok  	net/http	21.584s
ok  	net/http/cgi	1.146s
FAIL
go tool dist: Failed: exit status 1

https://storage.googleapis.com/go-build-log/11438245/linux-386_67c9929f.log
https://build.golang.org/log/c24b96e2c6ffdf46d29f6f588b9909c44e9cb3fe linux-mips64le
https://build.golang.org/log/01d39e71c813268a45b4d2923727357b457d1e6f linux-amd64
https://build.golang.org/log/1631a4e22febcfb4ddf941537973b681cf4e12be linux-ppc64
https://build.golang.org/log/6ff5f40e8dc400629c17c6ef7a81207fa03b8dff linux-ppc64
https://build.golang.org/log/73a9b496727ef70e96c534c52d252c7a853f017f linux-386
https://build.golang.org/log/895e752f114ea5e0b3e13598c5615957274122a6 linux-s390x
https://build.golang.org/log/9dd1bfe89dc17fed5444ddf41f89b8e74ac6e1f6 linux-amd64-race
https://build.golang.org/log/bd3fa7c299bbe966ccac50fcc5e24239ef85dc8a linux-386
https://build.golang.org/log/c0d0b0d5ba109f2be4b7e42469e2073b9fc48c20 linux-386-387
https://build.golang.org/log/faa4560674d8e7d406b7031bfe985db237b9860a linux-386-clang

What did you expect to see?

PASS

CC @cherrymui

@mengzhuo
Copy link
Contributor Author

However after enable race on mips64, I found that it's a internal/poll race issue.

==================                                               
WARNING: DATA RACE                                               
Write at 0xc000154098 by goroutine 44:                           
  internal/poll.(*pollDesc).close()                              
      /root/godev/src/internal/poll/fd_poll_runtime.go:56 +0x14c 
  internal/poll.(*FD).destroy()                                  
      /root/godev/src/internal/poll/fd_unix.go:76 +0x34          
  internal/poll.(*FD).readUnlock()                               
      /root/godev/src/internal/poll/fd_mutex.go:232 +0x7c        
  internal/poll.(*FD).Accept()                                   
      /root/godev/src/internal/poll/fd_unix.go:411 +0x204        
  net.(*netFD).accept()                                          
      /root/godev/src/net/fd_unix.go:172 +0x44                   
  net.(*TCPListener).accept()                                    
      /root/godev/src/net/tcpsock_posix.go:139 +0x54             
  net.(*TCPListener).Accept()                                    
      /root/godev/src/net/tcpsock.go:261 +0xa0                   
  net.transponder()                                              
      /root/godev/src/net/mockserver_test.go:216 +0x110          
  net.TestTCPConnSpecificMethods.func1()                         
      /root/godev/src/net/protoconn_test.go:75 +0x78             
  net.(*localServer).buildup.func1()                             
      /root/godev/src/net/mockserver_test.go:95 +0x6c            
                                                                 
Previous read at 0xc000154098 by goroutine 58:                   
  internal/poll.(*pollDesc).evict()                              
      /root/godev/src/internal/poll/fd_poll_runtime.go:61 +0x64  
  internal/poll.(*FD).Close()                                    
      /root/godev/src/internal/poll/fd_unix.go:102 +0x124        
  net.(*netFD).Close()                                           
      /root/godev/src/net/fd_posix.go:37 +0x68
  net.(*TCPListener).close()
      /root/godev/src/net/tcpsock_posix.go:156 +0xac
  net.(*TCPListener).Close()
      /root/godev/src/net/tcpsock.go:274 +0xbc
  net.(*localServer).teardown()
      /root/godev/src/net/mockserver_test.go:106 +0x168          
  runtime.call32()
      /root/godev/src/runtime/asm_mips64x.s:373 +0x8c                 
  net/internal/socktest.(*Switch).sockso()
      /root/godev/src/net/internal/socktest/switch_unix.go:13 +0x50   
  net/internal/socktest.(*Switch).Connect()
      /root/godev/src/net/internal/socktest/sys_unix.go:75 +0x50      
  net/internal/socktest.(*Switch).Connect-fm()
      /root/godev/src/net/internal/socktest/sys_unix.go:74 +0x80      
  net.(*netFD).connect()
      /root/godev/src/net/fd_unix.go:59 +0xb0
  net.(*netFD).dial()
      /root/godev/src/net/sock_posix.go:149 +0xf4
  net.socket()
      /root/godev/src/net/sock_posix.go:70 +0x248
  net.internetSocket()
      /root/godev/src/net/ipsock_posix.go:141 +0xe8
  net.(*sysDialer).doDialTCP()
      /root/godev/src/net/tcpsock_posix.go:65 +0xf0
  net.(*sysDialer).dialTCP()
      /root/godev/src/net/tcpsock_posix.go:61 +0x154
  net.DialTCP()
      /root/godev/src/net/tcpsock.go:216 +0x370
  net.TestTCPConnSpecificMethods()
      /root/godev/src/net/protoconn_test.go:89 +0x518                 
  testing.tRunner()
      /root/godev/src/testing/testing.go:1173 +0x1b0

Goroutine 44 (running) created at:
  net.(*localServer).buildup()
      /root/godev/src/net/mockserver_test.go:94 +0x5c                 
  net.TestTCPConnSpecificMethods()
      /root/godev/src/net/protoconn_test.go:81 +0x390                 
  testing.tRunner()
      /root/godev/src/testing/testing.go:1173 +0x1b0

Goroutine 58 (running) created at:
  testing.(*T).Run()
      /root/godev/src/testing/testing.go:1218 +0x4fc
  testing.runTests.func1()
      /root/godev/src/testing/testing.go:1491 +0xac
  testing.tRunner()
      /root/godev/src/testing/testing.go:1173 +0x1b0
  testing.runTests()
      /root/godev/src/testing/testing.go:1489 +0x624
  testing.(*M).Run()
      /root/godev/src/testing/testing.go:1397 +0x360
  net.TestMain()
      /root/godev/src/net/main_test.go:52 +0x4c
  main.main()
      _testmain.go:589 +0x288
==================

@gopherbot
Copy link

Change https://golang.org/cl/271500 mentions this issue: internal/poll: atomic pd.runtimeCtx

@ianlancetaylor
Copy link
Contributor

ianlancetaylor commented Nov 19, 2020

The race detector should be architecture-independent. Why is this race reported on mips64 and not on other targets?

@mengzhuo
Copy link
Contributor Author

Hi Ian
Thanks for your review, @zhangfannie found that this issue might affect linux-386 too.

https://storage.googleapis.com/go-build-log/11438245/linux-386_67c9929f.log

@ianlancetaylor
Copy link
Contributor

Thanks. I don't doubt that there is a bug, I just don't yet see why https://golang.org/cl/271500 is the right fix.

@mengzhuo
Copy link
Contributor Author

Thanks. I don't doubt that there is a bug, I just don't yet see why https://golang.org/cl/271500 is the right fix.

I will dig deeper for the root cause, please ignore that CL for now.

@mengzhuo
Copy link
Contributor Author

Hi, Ian
Is it possible that rwunlock isn't a barrier as it should be?
I've draw a flow chart about what is going on IMHO.
rwunlock_block?

@ianlancetaylor
Copy link
Contributor

Nice diagram.

The rwunlock method can only return after a successful call to atomic.CompareAndSwapUint64. As far as I can see, that has to be a memory barrier. I agree that if that is not a memory barrier then something might break. In that case, what we need to do is fix atomic.CompareAndSwapUint64.

The implementation of CompareAndSwapUint64 in sync/atomic calls Cas64 in runtime/internal/atomic, and (for mips64) that function uses a SYNC instruction. So although I don't know the mips64 instruction set very well, that looks like a memory barrier.

It's more complicated because these functions are implemented directly in the compiler. But I also see a SYNC instruction there, in cmd/compile/internal/mips64/ssa.go.

So as far as I can tell, it really is a memory barrier.

@mengzhuo
Copy link
Contributor Author

@mengzhuo
Copy link
Contributor Author

I've figured it out, it's the transponder in mockserver_test.go closed the socket while tranceiver still waiting for the data.
Idealy write to the connection is after close, as diagram below shows.

+-------------+      +---------------+
| transceiver |      |  transponder  |
+-------+-----+      +-------+-------+
        |                    |        +----------+
        |                    +--------+  Accept  |
        |                    |        +----------+
        +------Dial--------->+        +-----------+
        |                    +--------+  NewConn  |
        +------Write-------->+        +-----------+
        |                    |        +---------------+
        +<-----Write--------XXX-------+  defer Close  |
        |                    |        +---------------+
        +<-----Close---------+
        |                    |

However, There is no channel or mutexes to block server from defer Closing. When scheduler runs Close first which will cause client receive EOF.

func transponder(ln Listener, ch chan<- error) {
defer close(ch)
switch ln := ln.(type) {
case *TCPListener:
ln.SetDeadline(time.Now().Add(someTimeout))
case *UnixListener:
ln.SetDeadline(time.Now().Add(someTimeout))
}
c, err := ln.Accept()
if err != nil {
if perr := parseAcceptError(err); perr != nil {
ch <- perr
}
ch <- err
return
}
defer c.Close()

+-------------+      +---------------+
| transceiver |      |  transponder  |
+-------+-----+      +-------+-------+
        |                    |        +----------+
        |                    +--------+  Accept  |
        |                    |        +----------+
        +------Dial--------->+        +-----------+
        |                    +--------+  NewConn  |
        +------Write-------->+        +-----------+
        |                    |        +---------------+
        +<-----Close--------XXX-------+  defer Close  |
        |                    |        +---------------+
        +<-----Write---------+
        |                    |

@gopherbot
Copy link

Change https://golang.org/cl/273672 mentions this issue: net: close connection in localServer teardown

@cagedmantis cagedmantis added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Dec 2, 2020
@cagedmantis cagedmantis added this to the Backlog milestone Dec 2, 2020
@mengzhuo
Copy link
Contributor Author

Another failure: https://build.golang.org/log/f083831b16b42b31bbe3fc6855e1966d65c68374
@bradfitz Could you review the CL? Thanks.

@golang golang locked and limited conversation to collaborators Dec 16, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

4 participants