net: Uncatchable panic in net.Dial caused by dns failure #6232

Closed
gopherbot opened this issue Aug 23, 2013 · 15 comments
@gopherbot

by levtchenko:

panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x20 pc=0x49b075]

goroutine 93699344 [running]:
net.cgoLookupIPCNAME(0xc2ab607980, 0xa, 0x0, 0x0, 0x0, ...)
        net/_obj/_cgo_gotypes.go:188 +0x2a5
net.cgoLookupIP(0xc2ab607980, 0xa, 0x0, 0x0, 0x0, ...)
        net/_obj/_cgo_gotypes.go:228 +0x67
net.cgoLookupHost(0xc2ab607980, 0xa, 0x0, 0x0, 0x0, ...)
        net/_obj/_cgo_gotypes.go:106 +0x79
net.lookupHost(0xc2ab607980, 0xa, 0x0, 0x0, 0x0, ...)
        /usr/lib/go/src/pkg/net/lookup_unix.go:56 +0x61
net.func·019()
        /usr/lib/go/src/pkg/net/lookup.go:42 +0x34
created by net.lookupHostDeadline
        /usr/lib/go/src/pkg/net/lookup.go:44 +0x22f

go version go1.1.2 linux/amd64

Linux 3.3.1-gentoo #22 SMP Mon Apr 8 00:22:25 MSK 2013 x86_64 Intel(R) Xeon(R) CPU
E3-1230 V2 @ 3.30GHz GenuineIntel GNU/Linux
@robpike
Contributor

robpike commented Aug 24, 2013

Comment 2:

Can you provide a complete example to reproduce the failure?

Labels changed: added priority-later, removed priority-triage.

Status changed to Accepted.

@gopherbot
Author

Comment 3 by tomheinan:

Just chiming in to note that I've just run into this as well. My app spins up several
thousand goroutines, each of which does a DNS lookup every ten seconds or so, and the
app is reliably panicking after a few minutes of uptime.
go version go1.1.2 linux/amd64
Linux 3.9.3-x86_64-linode33 #1 SMP Mon May 20 10:22:57 EDT 2013 x86_64 x86_64 x86_64
GNU/Linux
The error in question:
    goroutine 63910 [running]:
    net.cgoLookupIPCNAME(0xc2006a65a0, 0x12, 0x0, 0x0, 0x0, ...)
        net/_obj/_cgo_gotypes.go:188 +0x2a5
    net.cgoLookupIP(0xc2006a65a0, 0x12, 0x0, 0x0, 0x0, ...)
        net/_obj/_cgo_gotypes.go:228 +0x67
    net.cgoLookupHost(0xc2006a65a0, 0x12, 0x0, 0x0, 0x0, ...)
        net/_obj/_cgo_gotypes.go:106 +0x79
    net.lookupHost(0xc2006a65a0, 0x12, 0x0, 0x0, 0x0, ...)
        /usr/local/go/src/pkg/net/lookup_unix.go:56 +0x61
    net.func·019()
        /usr/local/go/src/pkg/net/lookup.go:42 +0x34
    created by net.lookupHostDeadline
        /usr/local/go/src/pkg/net/lookup.go:44 +0x22f
The actual code that's doing the DNS lookup:
    // parse the server into host/port
    host, port, err := net.SplitHostPort(server)
    if err != nil {
        // we weren't given a port; try to find one via dns
        _, addrs, srvErr := net.LookupSRV("minecraft", "udp", server)
        if srvErr != nil {
            _, addrs, srvErr = net.LookupSRV("minecraft", "tcp", server)
        }
    
        if srvErr != nil {
            host = server
            port = "25565"
        } else {
            addr := addrs[0]
            host = strings.TrimRight(addr.Target, ".")
            port = strconv.FormatInt(int64(addr.Port), 10)
        }
    }
    
    conn, err := net.DialTimeout("udp", net.JoinHostPort(host, port), 3 * time.Second)
    if err != nil {
        return nil, err
    }
    .. etc ..
I ran into this this evening while doing a server migration - I thought perhaps it was
due to using a newer version of go, so I downgraded back to 1.1, but it's still crashing
after a few minutes, so it doesn't look like that's the issue.

@mikioh
Contributor

mikioh commented Sep 6, 2013

Comment 4:

Could you please try a build made with CGO_ENABLED=0, if possible?
It looks like the cgo lookup code in Dial does something wrong.

@gopherbot
Author

Comment 5 by tomheinan:

I wiped out the /bin and /pkg directories and rebuilt the app with CGO_ENABLED=0, but it
still seems to be failing on the cgo lookup. Is there some other flag I need to set for
the compiler to obey the CGO_ENABLED flag?
In the meantime, I'll try cross-compiling locally and see if that makes any difference.

@davecheney
Contributor

Comment 6:

Please try to produce a code sample which reproduces the problem.
To build Go from source:
1. ensure you have removed every version of Go you may have on your system
2. hg clone -r release https://code.google.com/p/go 
3. export CGO_ENABLED=0 
4. cd go/src
5. ./make.bash
6. ensure go/bin is in your path.

Status changed to WaitingForReply.

@gopherbot
Author

Comment 7 by arnaud.lb:

I can reproduce this when the process exceeds its open files limit:
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x20 pc=0x4457c5]
goroutine 21 [running]:
net.cgoLookupIPCNAME(0x505db0, 0xb, 0x0, 0x0, 0x0, ...)
        net/_obj/_cgo_gotypes.go:188 +0x2a5
net.cgoLookupIP(0x505db0, 0xb, 0x0, 0x0, 0x0, ...)
        net/_obj/_cgo_gotypes.go:228 +0x67
net.cgoLookupHost(0x505db0, 0xb, 0x0, 0x0, 0x0, ...)
        net/_obj/_cgo_gotypes.go:106 +0x79
net.lookupHost(0x505db0, 0xb, 0x0, 0x0, 0x0, ...)
        /usr/local/go/src/pkg/net/lookup_unix.go:56 +0x61
net.lookupHostDeadline(0x505db0, 0xb, 0x0, 0x0, 0x0, ...)
        /usr/local/go/src/pkg/net/lookup.go:19 +0xd3
net.resolveInternetAddr(0x4ffb00, 0x2, 0x505db0, 0xb, 0x0, ...)
        /usr/local/go/src/pkg/net/ipsock.go:210 +0x405
net.ResolveIPAddr(0x4ffb00, 0x2, 0x505db0, 0xb, 0x0, ...)
        /usr/local/go/src/pkg/net/iprawsock.go:42 +0x14d
main.func·001()
The attached bug.go reproduces this.
Without CGO, it doesn't crash.

Attachments:

  1. bug.go (407 bytes)

@gopherbot
Author

Comment 8 by arnaud.lb:

Apparently getaddrinfo() can return EAI_SYSTEM while leaving errno set to 0.
So err is sometimes nil here, even though gerrno is non-zero (in net/cgo_unix.go):
    gerrno, err := C.getaddrinfo(h, nil, &hints, &res)                          
If err is nil and gerrno is EAI_SYSTEM, there is a nil pointer dereference:
    if gerrno != 0 {                                                            
        var str string                                                          
        if gerrno == C.EAI_NONAME {                                             
            str = noSuchHost                                                    
        } else if gerrno == C.EAI_SYSTEM {                                      
            str = err.Error()

@gopherbot
Author

Comment 9 by levtchenko:

It looks like this is a glibc bug: getaddrinfo is not thread-safe.
http://sourceware.org/bugzilla/show_bug.cgi?id=13271

@ianlancetaylor
Contributor

Comment 10:

According to the bug, getaddrinfo is only not thread-safe if your program changes the
environment while concurrently calling getaddrinfo; does your program do that?
Even then I don't see how the reported results (gerrno == EAI_SYSTEM && errno == 0)
would occur.

@ianlancetaylor
Contributor

Comment 11:

arnaud.lb: can you reproduce those results with tip so that we get a good file/line for
the panic?
Also, what system are you running on?

@gopherbot
Author

Comment 12 by arnaud.lb:

I'm running this on a Debian box, libc6 package 2.17-92.
The previously attached bug.go doesn't reproduce on tip; I've attached another one
which works on tip.
Stacktrace:
panic: runtime error: invalid memory address or nil pointer dereference         
[signal 0xb code=0x1 addr=0x0 pc=0x451270]                                      
                                                                                
goroutine 59 [running]:                                                         
net.cgoLookupIPCNAME(0x51ee30, 0xb, 0x0, 0x0, 0x0, ...)                         
        /home/arnaud/dev/go-hg/src/pkg/net/cgo_unix.go:102 +0x380               
net.cgoLookupIP(0x51ee30, 0xb, 0x0, 0x0, 0x0, ...)                              
        /home/arnaud/dev/go-hg/src/pkg/net/cgo_unix.go:138 +0x9c                
net.lookupIP(0x51ee30, 0xb, 0x0, 0x0, 0x0, ...)                                 
        /home/arnaud/dev/go-hg/src/pkg/net/lookup_unix.go:64 +0x90              
net.func·019(0x0, 0x0, 0x0, 0x0)                                                
        /home/arnaud/dev/go-hg/src/pkg/net/lookup.go:41 +0x5a                   
net.(*singleflight).Do(0x62aeb0, 0x51ee30, 0xb, 0x7f9d7ea3ed70, 0x0, ...)       
        /home/arnaud/dev/go-hg/src/pkg/net/singleflight.go:45 +0x273            
net.lookupIPMerge(0x51ee30, 0xb, 0x0, 0x0, 0x0, ...)                            
        /home/arnaud/dev/go-hg/src/pkg/net/lookup.go:42 +0x11a                  
net.lookupIPDeadline(0x51ee30, 0xb, 0x0, 0x0, 0x0, ...)                         
        /home/arnaud/dev/go-hg/src/pkg/net/lookup.go:57 +0x11e                  
net.resolveInternetAddr(0x5186a0, 0x2, 0x51ee30, 0xb, 0x0, ...)                 
        /home/arnaud/dev/go-hg/src/pkg/net/ipsock.go:285 +0x3ff                 
net.ResolveIPAddr(0x5186a0, 0x2, 0x51ee30, 0xb, 0x0, ...)                       
        /home/arnaud/dev/go-hg/src/pkg/net/iprawsock.go:49 +0x17b               
main.func·001()                                                                 
        /home/arnaud/dev/go/src/bug6232/bug.go:17 +0x6a                         
created by main.lookup                                                          
        /home/arnaud/dev/go/src/bug6232/bug.go:20 +0xba
The attached bug.c confirms that getaddrinfo() sometimes leaves errno set to 0 when it
returns an error.

Attachments:

  1. bug.go (439 bytes)
  2. bug.c (804 bytes)

@mikioh
Contributor

mikioh commented Sep 6, 2013

Comment 13:

Labels changed: added go1.2, removed priority-later.

Status changed to Accepted.

@rsc
Contributor

rsc commented Sep 11, 2013

Comment 14:

https://golang.org/cl/13532045

Status changed to Started.

@rsc
Contributor

rsc commented Sep 11, 2013

Comment 15:

This issue was closed by revision 382738a.

Status changed to Fixed.

@gopherbot
Author

Comment 16 by tomheinan:

Fix looks good on my end; I've been running it for about half an hour and it's working
normally. Thanks for your hard work, it's much appreciated!

@rsc rsc added this to the Go1.2 milestone Apr 14, 2015
@rsc rsc removed the go1.2 label Apr 14, 2015
@golang golang locked and limited conversation to collaborators Jun 25, 2016
This issue was closed.