Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gccgo, runtime: panic with checkdead: find g 19 in status 1 #11406

Closed
laboger opened this issue Jun 25, 2015 · 5 comments
Closed

gccgo, runtime: panic with checkdead: find g 19 in status 1 #11406

laboger opened this issue Jun 25, 2015 · 5 comments
Milestone

Comments

@laboger
Copy link
Contributor

laboger commented Jun 25, 2015

Using gccgo from latest gcc5 branch. Building Docker from their master branch on ppc64le trying to provide continuous integration. This had been working without issue up until sometime in the past 2 weeks, where there is a failure during the Docker build. This same failure occurs when running one of the Docker unit tests after we've built Docker this way.

Failure occurs when building on Ubuntu 14.04 and 14.10 ppc64le.

Here is and example of the stack information for the Docker build failure:

runtime: checkdead: find g 19 in status 1
fatal error: checkdead: runnable g

runtime stack:
runtime_dopanic
../../../src/libgo/runtime/panic.c:135
runtime_throw
../../../src/libgo/runtime/panic.c:193
checkdead
../../../src/libgo/runtime/proc.c:2866
mput
../../../src/libgo/runtime/proc.c:3113
stopm
../../../src/libgo/runtime/proc.c:1456
findrunnable
../../../src/libgo/runtime/proc.c:1764
schedule
../../../src/libgo/runtime/proc.c:1847
park0
../../../src/libgo/runtime/proc.c:1912
runtime_mstart
../../../src/libgo/runtime/proc.c:1076

goroutine 16 [chan receive]:
main.mainDaemon
/go/src/github.com/docker/docker/docker/daemon.go:186
main.main
/go/src/github.com/docker/docker/docker/docker.go:99
created by main
../../../src/libgo/runtime/go-main.c:48

goroutine 18 [finalizer wait]:
created by runtime_createfing
../../../src/libgo/runtime/mgc0.c:2572

goroutine 19 [runnable]:
syscall.Exitsyscall
../../../src/libgo/runtime/proc.c:2122
os_signal.signal_recv
../../../src/libgo/runtime/sigqueue.goc:110
signal.loop
../../../src/libgo/go/os/signal/signal_unix.go:21
created by os_signal..import
../../../src/libgo/go/os/signal/signal_unix.go:25

goroutine 25 [chan receive]:
github_com_docker_docker_api_server.ServeApi.pN42_github_com_docker_docker_api_server.Server
/go/src/github.com/docker/docker/api/server/server.go:110
main.$nested2
/go/src/github.com/docker/docker/docker/daemon.go:136
created by main.mainDaemon
/go/src/github.com/docker/docker/docker/daemon.go:135

goroutine 26 [chan receive]:
daemon.$nested20
/go/src/github.com/docker/docker/daemon/debugtrap.go:17
created by daemon.setupSigusr1Trap
/go/src/github.com/docker/docker/daemon/debugtrap.go:16

goroutine 63 [chan receive]:
database_sql.connectionOpener.pN15_database_sql.DB
../../../src/libgo/go/database/sql/sql.go:589

We can also generate a failure in one of the unit tests with the same message, and the stack trace is not quite the same:

  • go test -test.timeout=180m github.com/docker/docker/pkg/archive
    runtime: checkdead: find g 20 in status 1
    fatal error: checkdead: runnable g

runtime stack:
runtime_dopanic
../../../src/libgo/runtime/panic.c:135
runtime_throw
../../../src/libgo/runtime/panic.c:193
checkdead
../../../src/libgo/runtime/proc.c:2866
mput
../../../src/libgo/runtime/proc.c:3113
stopm
../../../src/libgo/runtime/proc.c:1456
exitsyscall0
../../../src/libgo/runtime/proc.c:2205
runtime_mstart
../../../src/libgo/runtime/proc.c:1076

goroutine 16 [chan receive]:
testing.RunTests
../../../src/libgo/go/testing/testing.go:556
testing.Run.pN9_testing.M
../../../src/libgo/go/testing/testing.go:485
main.main
/tmp/go-build271522865/github.com/docker/docker/pkg/archive/_test/_testmain.go:180
created by main
../../../src/libgo/runtime/go-main.c:48

goroutine 18 [finalizer wait]:
created by runtime_createfing
../../../src/libgo/runtime/mgc0.c:2572

goroutine 73 [runnable]:
syscall.Exitsyscall
../../../src/libgo/runtime/proc.c:2122
syscall.Getdents
../../../src/libgo/go/syscall/libcall_linux.go:236
syscall.ReadDirent
../../../src/libgo/go/syscall/libcall_linux.go:250
archive.readdirnames
/go/src/github.com/docker/docker/pkg/archive/changes_linux.go:236
github_com_docker_docker_pkg_archive.walk.pN43_github_com_docker_docker_pkg_archive.walker
/go/src/github.com/docker/docker/pkg/archive/changes_linux.go:120
archive.collectFileInfoForChanges
/go/src/github.com/docker/docker/pkg/archive/changes_linux.go:53
github_com_docker_docker_pkg_archive.ChangesDirs
/go/src/github.com/docker/docker/pkg/archive/changes.go:309
archive.checkNoChanges
/go/src/github.com/docker/docker/pkg/archive/archive_test.go:553
github_com_docker_docker_pkg_archive.TestTarFiles
/go/src/github.com/docker/docker/pkg/archive/archive_test.go:521
testing.tRunner
../../../src/libgo/go/testing/testing.go:447
created by testing.RunTests
../../../src/libgo/go/testing/testing.go:555

goroutine 74 [runnable]:
sync.runtime_Syncsemacquire
../../../src/libgo/runtime/sema.goc:262
sync.Wait.pN9_sync.Cond
../../../src/libgo/go/sync/cond.go:62
io.write.pN7_io.pipe
../../../src/libgo/go/io/pipe.go:94
io.Write.pN13_io.PipeWriter
../../../src/libgo/go/io/pipe.go:161
bufio.flush.pN12_bufio.Writer
../../../src/libgo/go/bufio/bufio.go:530
bufio.Flush.pN12_bufio.Writer
../../../src/libgo/go/bufio/bufio.go:519
pools.$nested3
/go/src/github.com/docker/docker/pkg/pools/pools.go:102
github_com_docker_docker_pkg_ioutils.Close.pN55_github_com_docker_docker_pkg_ioutils.writeCloserWrapper
/go/src/github.com/docker/docker/pkg/ioutils/writers.go:31
archive.$nested0
/go/src/github.com/docker/docker/pkg/archive/archive.go:476
created by github_com_docker_docker_pkg_archive.TarWithOptions
/go/src/github.com/docker/docker/pkg/archive/archive.go:393

Tests failed: ./pkg/archive

Interesting that in both cases there is a stack marked as runnable with syscall.Exitsyscall at the top.

I have been looking at the code in checkdead where the message is coming from. In order for this to correctly detect a deadlock, the counts found in the runtime_sched structure must be in sync (i.e. must reflect the same state) with the entries runtime_allg list and I don't see how that is enforced. Seems like the run count could be computed but before the runtime_allg list is processed, a new goroutine could be added, resulting in a new run count, identifying a deadlock when there isn't one.

   run = runtime_sched.mcount - runtime_sched.nmidle - runtime_sched.nmidlelocked - 1 - countextra();
    if(run > 0)
            return;
    // If we are dying because of a signal caught on an already idle thread,
    // freezetheworld will cause all running threads to block.
    // And runtime will essentially enter into deadlock state,
    // except that there is a thread that will call runtime_exit soon.
    if(runtime_panicking > 0)
            return;
    if(run < 0) {
            runtime_printf("runtime: checkdead: nmidle=%d nmidlelocked=%d mcount=%d\n",
                    runtime_sched.nmidle, runtime_sched.nmidlelocked, runtime_sched.mcount);
            runtime_throw("checkdead: inconsistent counts");
    }
    grunning = 0;
    runtime_lock(&allglock);
    for(i = 0; i < runtime_allglen; i++) {
            gp = runtime_allg[i];
            if(gp->isbackground)
                    continue;
            s = gp->status;
            if(s == Gwaiting)
                    grunning++;
            else if(s == Grunnable || s == Grunning || s == Gsyscall) {
                    runtime_unlock(&allglock);
                    runtime_printf("runtime: checkdead: find g %D in status %d\n", gp->goid, s);
                    runtime_throw("checkdead: runnable g");
            }
    }
    runtime_unlock(&allglock);

We will continue to try to reproduce this problem in a smaller testcase and do some more debugging.

@ianlancetaylor ianlancetaylor added this to the Gccgo milestone Jun 28, 2015
@laboger
Copy link
Contributor Author

laboger commented Jun 30, 2015

We have a testcase that reproduces the problem, which shows that it is due to the Entersyscall and Exitsyscall within libgo/go/syscall/libcall_linux.go function Getdents. It should not be in Getdents, since Entersyscall and Exitsyscall are called within the call to Syscall(). Calling Exitsyscall on this path is what causes the panic in checkdead.

I have a patch to fix this here: https://go-review.googlesource.com/#/c/11761/

I built did a bootstrap build on ppc64le and tested it.

Testcase to reproduce the problem with gccgo from gcc5 branch on both ppc64le and x86 is below. You must do mkdir temp to cause the panic to occur:

package main

import (
        "fmt"
        "os"
        "sort"
        "syscall"
        "unsafe"
)

type nameIno struct {
        name string
        ino  uint64
}

type nameInoSlice []nameIno

func (s nameInoSlice) Len() int           { return len(s) }
func (s nameInoSlice) Swap(i, j int)      { s[i], s[j] = s[j], s[i] }
func (s nameInoSlice) Less(i, j int) bool { return s[i].name < s[j].name }

func readdirnames(dirname string) (names []nameIno, err error) {
        var (
                size = 100
                buf  = make([]byte, 4096)
                nbuf int
                bufp int
                nb   int
        )

        f, err := os.Open(dirname)
        if err != nil {
                return nil, err
        }
        defer f.Close()

        names = make([]nameIno, 0, size) // Empty with room to grow.
        for {
                // Refill the buffer if necessary
                if bufp >= nbuf {
                        bufp = 0
                        nbuf, err = syscall.ReadDirent(int(f.Fd()), buf) // getdents on linux
                        if nbuf < 0 {
                                nbuf = 0
                        }
                        if err != nil {
                                return nil, os.NewSyscallError("readdirent", err)
                        }
                        if nbuf <= 0 {
                                break // EOF
                        }
                }

                // Drain the buffer
                nb, names = parseDirent(buf[bufp:nbuf], names)
                bufp += nb
        }

        sl := nameInoSlice(names)
        sort.Sort(sl)
        return sl, nil
}

// parseDirent is a minor modification of syscall.ParseDirent (linux version)
// which returns {name,inode} pairs instead of just names.
func parseDirent(buf []byte, names []nameIno) (consumed int, newnames []nameIno) {
        origlen := len(buf)
        for len(buf) > 0 {
                dirent := (*syscall.Dirent)(unsafe.Pointer(&buf[0]))
                buf = buf[dirent.Reclen:]
                if dirent.Ino == 0 { // File absent in directory.
                        continue
                }
                bytes := (*[10000]byte)(unsafe.Pointer(&dirent.Name[0]))
                var name = string(bytes[0:clen(bytes[:])])
                if name == "." || name == ".." { // Useless names
                        continue
                }
                names = append(names, nameIno{name, dirent.Ino})
        }
        return origlen - len(buf), names
}

func clen(n []byte) int {
        for i := 0; i < len(n); i++ {
                if n[i] == 0 {
                        return i
                }
        }
        return len(n)
}

func main() {
        names, _ := readdirnames("temp/")
        fmt.Println("list %s\n", names)
} 

@laboger laboger closed this as completed Jun 30, 2015
@laboger laboger reopened this Jun 30, 2015
@laboger
Copy link
Contributor Author

laboger commented Jun 30, 2015

I did not intend to close it, sorry.

@gopherbot
Copy link

CL https://golang.org/cl/11761 mentions this issue.

@ianlancetaylor
Copy link
Contributor

Thanks for sorting this out. Your patch has been committed to GCC mainline and GCC 4.9 branch. Waiting for RM approval for GCC 5 branch.

KangDroid pushed a commit to KangDroid/gcc that referenced this issue Jul 13, 2015
    
    The syscall for Getdents in syscall/libcall_linux.go
    called Entersyscall and Exitsyscall, causing the
    runtime_sched counts for goroutines to be incorrect.
    Inconsistent counts caused checkdead in runtime/proc.c
    to panic.
    
    Fixes golang/go#11406
    
    Reviewed-on: https://go-review.googlesource.com/11761


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@225745 138bc75d-0d04-0410-961f-82ee72b054a4
hubot pushed a commit to gcc-mirror/gcc that referenced this issue Jul 13, 2015
    
    The syscall for Getdents in syscall/libcall_linux.go
    called Entersyscall and Exitsyscall, causing the
    runtime_sched counts for goroutines to be incorrect.
    Inconsistent counts caused checkdead in runtime/proc.c
    to panic.
    
    Fixes golang/go#11406
    
    Reviewed-on: https://go-review.googlesource.com/11761


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-4_9-branch@225746 138bc75d-0d04-0410-961f-82ee72b054a4
@ianlancetaylor
Copy link
Contributor

Also committed to GCC 5 branch.

hubot pushed a commit to gcc-mirror/gcc that referenced this issue Jul 14, 2015
    
    The syscall for Getdents in syscall/libcall_linux.go
    called Entersyscall and Exitsyscall, causing the
    runtime_sched counts for goroutines to be incorrect.
    Inconsistent counts caused checkdead in runtime/proc.c
    to panic.
    
    Fixes golang/go#11406
    
    Reviewed-on: https://go-review.googlesource.com/11761


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-5-branch@225773 138bc75d-0d04-0410-961f-82ee72b054a4
nbkolchin pushed a commit to Wolpike/gcc that referenced this issue Aug 25, 2015
    
    The syscall for Getdents in syscall/libcall_linux.go
    called Entersyscall and Exitsyscall, causing the
    runtime_sched counts for goroutines to be incorrect.
    Inconsistent counts caused checkdead in runtime/proc.c
    to panic.
    
    Fixes golang/go#11406
    
    Reviewed-on: https://go-review.googlesource.com/11761


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-4_9-branch@225746 138bc75d-0d04-0410-961f-82ee72b054a4
@golang golang locked and limited conversation to collaborators Jul 13, 2016
carstene1ns pushed a commit to pspdev/gcc that referenced this issue Jul 9, 2020
    
    The syscall for Getdents in syscall/libcall_linux.go
    called Entersyscall and Exitsyscall, causing the
    runtime_sched counts for goroutines to be incorrect.
    Inconsistent counts caused checkdead in runtime/proc.c
    to panic.
    
    Fixes golang/go#11406
    
    Reviewed-on: https://go-review.googlesource.com/11761

From-SVN: r225746
asiekierka pushed a commit to WonderfulToolchain/gcc-ia16 that referenced this issue May 16, 2022
    
    The syscall for Getdents in syscall/libcall_linux.go
    called Entersyscall and Exitsyscall, causing the
    runtime_sched counts for goroutines to be incorrect.
    Inconsistent counts caused checkdead in runtime/proc.c
    to panic.
    
    Fixes golang/go#11406
    
    Reviewed-on: https://go-review.googlesource.com/11761

From-SVN: r225745
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants