net: TestFilePacketConn fails on Scaleway #10730

Closed
bradfitz opened this issue May 6, 2015 · 15 comments
Labels
FrozenDueToAge Testing An issue that has been verified to require only test changes, not just a test failure.

@bradfitz
Contributor

bradfitz commented May 6, 2015

On a Scaleway ARM host (where we're trying to move the ARM builders), the net package fails with:

--- FAIL: TestFilePacketConn (0.00s)
        file_test.go:113: write ip 127.0.0.1->127.0.0.1: sendto: bad address

Debug:

root@scw-105acb:~/go/src# cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.10
DISTRIB_CODENAME=utopic
DISTRIB_DESCRIPTION="Ubuntu 14.10"
root@scw-105acb:~/go/src# ifconfig 
docker0   Link encap:Ethernet  HWaddr 56:84:7a:fe:97:99  
          inet addr:172.17.42.1  Bcast:0.0.0.0  Mask:255.255.0.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

eth0      Link encap:Ethernet  HWaddr 00:07:cb:03:76:44  
          inet addr:10.1.34.160  Bcast:10.1.35.255  Mask:255.255.254.0
          inet6 addr: fe80::207:cbff:fe03:7644/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:357998 errors:0 dropped:0 overruns:0 frame:0
          TX packets:108129 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:532 
          RX bytes:352772865 (352.7 MB)  TX bytes:2078718437 (2.0 GB)
          Interrupt:24 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:20563 errors:0 dropped:0 overruns:0 frame:0
          TX packets:20563 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:66891220 (66.8 MB)  TX bytes:66891220 (66.8 MB)

root@scw-105acb:~/go/src# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.1.34.1       0.0.0.0         UG    0      0        0 eth0
10.1.34.0       0.0.0.0         255.255.254.0   U     0      0        0 eth0
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0

Note that this machine has a Docker daemon running, but I'm not yet running the build inside a container. This failure was from running on the host machine, as part of evaluating the speed of these machines.

/cc @mikioh, @davecheney, @crawshaw, @adg

@bradfitz bradfitz added this to the Go1.5 milestone May 6, 2015
@adg adg changed the title net: net: TestFilePacketConn fails on Scaleway May 6, 2015
@bradfitz
Contributor Author

bradfitz commented May 6, 2015

And the strace:

[pid 15756] socket(PF_INET, SOCK_RAW|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_ICMP) = 3
[pid 15756] setsockopt(3, SOL_SOCKET, SO_BROADCAST, [1], 4) = 0
[pid 15756] bind(3, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
[pid 15756] epoll_ctl(4, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=3058795512, u64=3058795512}}) = 0
[pid 15756] getsockname(3, {sa_family=AF_INET, sin_port=htons(1), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
[pid 15756] getpeername(3, 0x10649bac, [112]) = -1 ENOTCONN (Transport endpoint is not connected)
[pid 15756] fcntl(3, F_DUPFD_CLOEXEC, 0) = 5
[pid 15756] fcntl(5, F_GETFL)           = 0x802 (flags O_RDWR|O_NONBLOCK)
[pid 15756] fcntl(5, F_SETFL, O_RDWR)   = 0
[pid 15756] fcntl(5, F_DUPFD_CLOEXEC, 0) = 6
[pid 15756] fcntl(6, F_GETFL)           = 0x2 (flags O_RDWR)
[pid 15756] fcntl(6, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 15756] getsockopt(6, SOL_SOCKET, SO_TYPE, [3], [4]) = 0
[pid 15756] getsockname(6, {sa_family=AF_INET, sin_port=htons(1), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
[pid 15756] getpeername(6, 0x10649bdc, [112]) = -1 ENOTCONN (Transport endpoint is not connected)
[pid 15756] epoll_ctl(4, EPOLL_CTL_ADD, 6, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=3058795392, u64=3058795392}}) = 0
[pid 15756] sendto(6, "", 0, 0, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EFAULT (Bad address)
[pid 15756] clock_gettime(CLOCK_REALTIME, {1430954968, 213392054}) = 0
[pid 15756] write(1, "--- FAIL: TestFilePacketConn (0."..., 529--- FAIL: TestFilePacketConn (0.04s)
        file_test.go:113: write ip 127.0.0.1->127.0.0.1: sendto: bad address
) = 529
[pid 15756] write(1, "FAIL\n", 5FAIL
)       = 5
[pid 15756] close(3)                    = 0
[pid 15756] exit_group(1)               = ?
[pid 15758] +++ exited with 1 +++
[pid 15757] +++ exited with 1 +++
+++ exited with 1 +++

@bradfitz
Contributor Author

bradfitz commented May 6, 2015

The sendto EFAULT seems wrong.

       EFAULT An invalid user space address was specified for an argument.

The man page says:

       ssize_t sendto(int sockfd, const void *buf, size_t len, int flags,
                      const struct sockaddr *dest_addr, socklen_t addrlen);

Is a NULL buf (the const void * argument) okay, even with len 0?

@bradfitz
Contributor Author

bradfitz commented May 7, 2015

Actually, the syscall package already tries hard to avoid passing a NULL buffer pointer:

// Single-word zero for use when we need a valid pointer to 0 bytes.
// See mksyscall.pl.
var _zero uintptr

func sendto(s int, buf []byte, flags int, to unsafe.Pointer, addrlen _Socklen) (err error) {
        var _p0 unsafe.Pointer
        if len(buf) > 0 {
                _p0 = unsafe.Pointer(&buf[0])
        } else {
                _p0 = unsafe.Pointer(&_zero)
        }
        _, _, e1 := Syscall6(SYS_SENDTO, uintptr(s), uintptr(_p0), uintptr(len(buf)), uintptr(flags), uintptr(to), uintptr(addrlen))
        if e1 != 0 {
                err = errnoErr(e1)
        }
        return
}

... yet &_zero (which should be non-nil) ends up as zero according to the strace.

Is Syscall6 doing the right thing?

This machine FWIW has 4 of these:

# cat /proc/cpuinfo 
processor       : 0
model name      : ARMv7 Processor rev 2 (v7l)
BogoMIPS        : 1332.01
Features        : half thumb fastmult vfp edsp thumbee vfpv3 tls idiva idivt vfpd32 lpae 
CPU implementer : 0x56
CPU architecture: 7
CPU variant     : 0x2
CPU part        : 0x584
CPU revision    : 2

/cc @minux @ianlancetaylor @rsc @davecheney @josharian

@mikioh
Contributor

mikioh commented May 7, 2015

Maybe dup of #7299? (correction s/7229/7299)

@bradfitz
Contributor Author

bradfitz commented May 7, 2015

No, I just can't read. The buf pointer is indeed non-zero. I was off by one reading all the empty values. And strace in raw mode (as well as some printlns in the syscall package) confirms:

[pid 16133] write(2, "sendto zero ", 12sendto zero ) = 12
[pid 16133] write(2, " ", 1 )            = 1
[pid 16133] write(2, "0x3af964", 80x3af964)     = 8
[pid 16133] write(2, " ", 1 )            = 1
[pid 16133] write(2, "0x3af964", 80x3af964)     = 8
[pid 16133] write(2, "\n", 1
)           = 1
[pid 16133] write(2, "sendto ", 7sendto )      = 7
[pid 16133] write(2, " ", 1 )            = 1
[pid 16133] write(2, "290", 3290)          = 3
[pid 16133] write(2, " ", 1 )            = 1
[pid 16133] write(2, "6", 16)            = 1
[pid 16133] write(2, " ", 1 )            = 1
[pid 16133] write(2, "3864932", 73864932)      = 7
[pid 16133] write(2, " ", 1 )            = 1
[pid 16133] write(2, "0", 10)            = 1
[pid 16133] write(2, " ", 1 )            = 1
[pid 16133] write(2, "0", 10)            = 1
[pid 16133] write(2, " ", 1 )            = 1
[pid 16133] write(2, "275330280", 9275330280)    = 9
[pid 16133] write(2, " ", 1 )            = 1
[pid 16133] write(2, "16", 216)           = 2
[pid 16133] write(2, "\n", 1
)           = 1
[pid 16133] sendto(0x6, 0x3af964, 0, 0, 0x106934e8, 0x10) = -1 (errno 14)

So it's only len and flags which are zero.

Still no clue about the EFAULT, though.

@minux
Member

minux commented May 7, 2015 via email

@mikioh
Contributor

mikioh commented May 7, 2015

Ah, if the error you are seeing is only

write ip 127.0.0.1->127.0.0.1: sendto: bad address

I'll take this issue. It seems to happen only in the top/middle-half of the ICMP stack.

@bradfitz
Contributor Author

bradfitz commented May 7, 2015

Not sure what that means but happy for a fix. (ICMP has three halves? :))

@mikioh
Contributor

mikioh commented May 7, 2015

As a matter of convenience, I usually think of it as three parts: a socket-interface adaptation layer (or service access point layer), a protocol layer, and a transport (in this case IP) adaptation layer. I believe the root cause of this issue is simply that the test passes a corrupted ICMP packet to the kernel. The four-year-old test cases certainly need to be updated for recent, more restrictive kernels.

In addition, as of Go 1.5, the full-stack test cases for IPConn have moved to the following packages:
golang.org/x/net/ipv4
golang.org/x/net/ipv6
golang.org/x/net/icmp

I'd be happy if the buildbots could eventually run the x/net tests with administrator privileges.
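
The strace earlier in the thread shows the test calling sendto with a zero-length payload on a raw ICMP socket. For contrast, a well-formed ICMPv4 echo request carries at least an 8-byte header with a valid checksum. The sketch below builds one using only the standard library. `checksum` and `echoRequest` are illustrative names, the payload string is arbitrary, and nothing here is confirmed to be the change later made to the test.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// checksum computes the 16-bit ones' complement Internet checksum over b.
func checksum(b []byte) uint16 {
	var sum uint32
	for i := 0; i+1 < len(b); i += 2 {
		sum += uint32(binary.BigEndian.Uint16(b[i:]))
	}
	if len(b)%2 == 1 {
		// Pad a trailing odd byte with a zero low byte.
		sum += uint32(b[len(b)-1]) << 8
	}
	for sum>>16 != 0 {
		sum = (sum & 0xffff) + (sum >> 16)
	}
	return ^uint16(sum)
}

// echoRequest builds an ICMPv4 echo request: type 8, code 0,
// checksum, identifier, sequence number, then the payload.
func echoRequest(id, seq uint16, data []byte) []byte {
	b := make([]byte, 8+len(data))
	b[0] = 8 // type: echo request
	b[1] = 0 // code
	binary.BigEndian.PutUint16(b[4:], id)
	binary.BigEndian.PutUint16(b[6:], seq)
	copy(b[8:], data)
	// The checksum is computed with the checksum field still zero.
	binary.BigEndian.PutUint16(b[2:], checksum(b))
	return b
}

func main() {
	pkt := echoRequest(1, 1, []byte("HELLO-R-U-THERE"))
	// Re-running the checksum over a valid packet yields 0.
	fmt.Println(len(pkt), checksum(pkt) == 0) // 23 true
}
```

Sending such a packet over a raw socket still requires elevated privileges, which is the reason for wanting the buildbots to run x/net tests as administrator.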

@mikioh mikioh self-assigned this May 7, 2015
@bradfitz
Contributor Author

bradfitz commented May 7, 2015

I'm going to just delete that test for now, then. You can re-enable it later when you identify how the test is broken.

@bradfitz
Contributor Author

bradfitz commented May 7, 2015

Kernel is 3.19.1-181, FWIW.

bradfitz added a commit that referenced this issue May 7, 2015
To be fixed later.

Updates #10730

Change-Id: Icac19f48c9e035dce192c97943b77b60411a3ea2
Reviewed-on: https://go-review.googlesource.com/9797
Reviewed-by: Mikio Hara <mikioh.mikioh@gmail.com>
@mikioh mikioh added the Testing An issue that has been verified to require only test changes, not just a test failure. label May 8, 2015
@moul

moul commented May 11, 2015

Subscribing, I'm from the Scaleway team

@gopherbot

CL https://golang.org/cl/10090 mentions this issue.

@gopherbot

CL https://golang.org/cl/10134 mentions this issue.

@mikioh mikioh modified the milestones: Go1.6, Go1.5 May 21, 2015
gopherbot pushed a commit to golang/net that referenced this issue May 22, 2015
This change splits the existing ping test into non-privileged and
privileged tests to cover IPConn full stack test on behalf of the
standard library.

Updates golang/go#10730.

Change-Id: I5d2e00c0b42b857045414eb8e0efca393967742e
Reviewed-on: https://go-review.googlesource.com/10090
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@mikioh mikioh removed their assignment Sep 23, 2015
@gopherbot

CL https://golang.org/cl/17476 mentions this issue.

@mikioh mikioh closed this as completed in 6a1c2a5 Dec 14, 2015
@golang golang locked and limited conversation to collaborators Dec 14, 2016