sync: panic unlock mutex from sync.Once #16762

Closed
cpuguy83 opened this issue Aug 17, 2016 · 4 comments
Labels
FrozenDueToAge, WaitingForInfo (Issue is not actionable because of missing required information, which needs to be provided.)

Comments

@cpuguy83

Please answer these questions before submitting your issue. Thanks!

  1. What version of Go are you using (go version)?
    1.5.4
  2. What operating system and processor architecture are you using (go env)?
    linux/amd64
  3. What did you do?
    Normal usage of Docker.
    Docker is (was) using a sync.Once in a function that handles DNS requests. Naturally this function can get called often and with a lot of concurrency depending on the user.
  4. What did you expect to see?
    sync.Once should never panic
  5. What did you see instead?
    During operation a user reported this panic:
Aug 17 15:14:26 docker-07 docker[27936]: panic: sync: unlock of unlocked mutex
Aug 17 15:14:26 docker-07 docker[27936]: goroutine 11703956 [running]:
Aug 17 15:14:26 docker-07 docker[27936]: sync.(*Mutex).Unlock(0xc82263da58)
Aug 17 15:14:26 docker-07 docker[27936]: /usr/local/go/src/sync/mutex.go:108 +0x9f
Aug 17 15:14:26 docker-07 docker[27936]: sync.(*Once).Do(0xc82263da58, 0xc8233e1d18)
Aug 17 15:14:26 docker-07 docker[27936]: /usr/local/go/src/sync/once.go:46 +0xe5
Aug 17 15:14:26 docker-07 docker[27936]: github.com/docker/libnetwork.(*resolver).ServeDNS(0xc82263da00, 0x7f65a87be938, 0xc82371afc0, 0xc82230ce10)
Aug 17 15:14:26 docker-07 docker[27936]: /root/rpmbuild/BUILD/docker-engine/vendor/src/github.com/docker/libnetwork/resolver.go:355 +0x812
Aug 17 15:14:26 docker-07 docker[27936]: github.com/miekg/dns.(*Server).serve(0xc820bd7520, 0x7f65b02bb988, 0xc821be9440, 0x7f65b0238948, 0xc82263da00, 0xc822
Aug 17 15:14:26 docker-07 docker[27936]: /root/rpmbuild/BUILD/docker-engine/vendor/src/github.com/miekg/dns/server.go:535 +0x7c1
Aug 17 15:14:26 docker-07 docker[27936]: created by github.com/miekg/dns.(*Server).serveUDP
Aug 17 15:14:26 docker-07 docker[27936]: /root/rpmbuild/BUILD/docker-engine/vendor/src/github.com/miekg/dns/server.go:489 +0x3d5
Aug 17 15:14:26 docker-07 docker[27936]: goroutine 1 [chan receive, 2996 minutes]:
Aug 17 15:14:26 docker-07 docker[27936]: main.(*DaemonCli).CmdDaemon(0xc8204258c0, 0xc82000a0c0, 0x8, 0x8, 0x0, 0x0)
Aug 17 15:14:26 docker-07 docker[27936]: /root/rpmbuild/BUILD/docker-engine/docker/daemon.go:337 +0x23e1
Aug 17 15:14:26 docker-07 docker[27936]: reflect.callMethod(0xc8204177a0, 0xc821703c78)
Aug 17 15:14:26 docker-07 docker[27936]: /usr/local/go/src/reflect/value.go:628 +0x1fc
Aug 17 15:14:26 docker-07 docker[27936]: reflect.methodValueCall(0xc82000a0c0, 0x8, 0x8, 0x1, 0xc8204177a0, 0x0, 0x0, 0xc8204177a0, 0x0, 0x47ccb4, ...)
Aug 17 15:14:26 docker-07 docker[27936]: /usr/local/go/src/reflect/asm_amd64.s:29 +0x36
Aug 17 15:14:26 docker-07 docker[27936]: github.com/docker/docker/cli.(*Cli).Run(0xc820417710, 0xc82000a0b0, 0x9, 0x9, 0x0, 0x0)
Aug 17 15:14:26 docker-07 docker[27936]: /root/rpmbuild/BUILD/docker-engine/.gopath/src/github.com/docker/docker/cli/cli.go:89 +0x383
Aug 17 15:14:26 docker-07 docker[27936]: main.main()
Aug 17 15:14:26 docker-07 docker[27936]: /root/rpmbuild/BUILD/docker-engine/docker/docker.go:63 +0x43c

Note that this panic (unlock of unlocked mutex) happens as a result of a call to once.Do(func() {...}) on the slow path, where Do does a defer o.m.Unlock().
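
For context, the slow path of sync.Once.Do at the time looked roughly like this (a paraphrased sketch, not an exact copy of sync/once.go):

func (o *Once) Do(f func()) {
        if atomic.LoadUint32(&o.done) == 1 {
                return
        }
        // Slow path: take the Once's internal mutex and run f at most once.
        o.m.Lock()
        defer o.m.Unlock()
        if o.done == 0 {
                defer atomic.StoreUint32(&o.done, 1)
                f()
        }
}

The deferred o.m.Unlock() is the mutex.go frame in the trace above; for it to see an unlocked mutex, the mutex inside the Once must have changed out from under Do between the Lock and the Unlock.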

@bradfitz
Copy link
Contributor

Can you point us at the code, at the revision matching the stack trace above?

I suspect it's just a locking bug or data race in Docker because I've never seen this bug reported, and this code is exercised heavily by everybody.

Is this easily reproducible? Have you seen multiple reports of this?

@bradfitz bradfitz added the WaitingForInfo label Aug 17, 2016
@josharian josharian changed the title panic unlock mutex from sync.Once sync: panic unlock mutex from sync.Once Aug 17, 2016
@cpuguy83
Author

Line is from here: https://github.com/docker/docker/blob/v1.11.2/vendor/src/github.com/docker/libnetwork/resolver.go#L355

AFAICT the panic is happening from the defer Unlock in sync.Once.
I cannot easily reproduce it; this is the first time I've seen such a panic.

@bradfitz
Contributor

The code looks a little suspicious.

There's code in resolver.go to reset the same Once you're finding crashing:

func (r *resolver) FlushExtServers() {
        for i := 0; i < maxExtDNS; i++ {
                if r.extDNSList[i].extConn != nil {
                        r.extDNSList[i].extConn.Close()
                }

                r.extDNSList[i].extConn = nil
                r.extDNSList[i].extOnce = sync.Once{}
        }
}

And that is part of the Resolver interface:

        // FlushExtServers clears the cached UDP connections to external
        // nameservers
        FlushExtServers()

And it is called by endpoint.go via:

func (ep *endpoint) Join(sbox Sandbox, options ...EndpointOption) error {

I'm going to close this bug for now. I think you need to do an audit of libnetwork's locking.
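
Purely for illustration (hypothetical names, not libnetwork's actual code or fix), a reset that avoids overwriting a Once that a concurrent Do may still hold could look roughly like this:

package resolver

import (
        "net"
        "sync"
)

// extDNSEntry is a simplified, hypothetical version of the per-nameserver
// state: extOnce is a pointer that is swapped under mu rather than being
// overwritten in place while a Do may hold its internal mutex.
type extDNSEntry struct {
        mu      sync.Mutex
        extOnce *sync.Once
        extConn net.Conn
}

func newExtDNSEntry() *extDNSEntry {
        return &extDNSEntry{extOnce: new(sync.Once)}
}

func (e *extDNSEntry) flush() {
        e.mu.Lock()
        defer e.mu.Unlock()
        if e.extConn != nil {
                e.extConn.Close()
        }
        e.extConn = nil
        e.extOnce = new(sync.Once) // allocate a fresh Once; leave the old one alone
}

// dialOnce reads extOnce under the same mu so it never races with flush.
func (e *extDNSEntry) dialOnce(dial func()) {
        e.mu.Lock()
        once := e.extOnce
        e.mu.Unlock()
        once.Do(dial)
}

The point of the sketch is only that nothing ever assigns over a sync.Once (or anything containing a sync.Mutex) that another goroutine might be inside.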

@j7b

j7b commented Apr 1, 2017

I've made this mistake myself; it's not a data race, and it's more about getting types wrong than getting locks wrong. A convoluted repro:

https://play.golang.org/p/xPULnrVKiY

fatal error: sync: unlock of unlocked mutex

Do() takes the address of the Once, locks m, and invokes f(); if f replaces the value the *Once points at, the deferred Unlock panics because that m is no longer the m that was locked.
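
A minimal version of that hazard, with assumed names (depending on the Go version the failure is reported as a panic or a fatal error):

package main

import "sync"

type server struct {
        once sync.Once
}

func main() {
        s := &server{}
        s.once.Do(func() {
                // Overwriting *s also overwrites once's internal mutex, which
                // Do still holds; the deferred Unlock then fails with
                // "sync: unlock of unlocked mutex".
                *s = server{}
        })
}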

It would be nice if the race detector also looked for "static hazard" problems like this.

Since go vet already carps when you copy a mutex (or a value containing one) that isn't behind a pointer, it might be worthwhile to have it also complain when such a value is assigned to with the assignment operator, as there's probably never a case where that expresses what's intended.
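
Roughly the distinction being proposed, as a hypothetical snippet (exact vet behavior may differ by version):

package main

import "sync"

var a, b sync.Mutex

func main() {
        b = a            // go vet's copylocks check reports this copy of a lock value
        a = sync.Mutex{} // silently resets a's state even if it is currently locked;
                         // the suggestion is that vet could warn on this assignment too
}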

@golang golang locked and limited conversation to collaborators Apr 1, 2018