database/sql, net/http: deadlock hangs on linux #15901

Closed
F21 opened this issue May 31, 2016 · 4 comments
@F21

F21 commented May 31, 2016

Please answer these questions before submitting your issue. Thanks!

  1. What version of Go are you using (go version)?
go version go1.6.2 linux/amd64
  2. What operating system and processor architecture are you using (go env)?

Ubuntu 16.04 64-bit and alpine 3.3

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/user/work"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GO15VENDOREXPERIMENT="1"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0"
CXX="g++"
CGO_ENABLED="1"
  3. What did you do?

I am building a database/sql driver. I noticed that on Linux machines (tested with Ubuntu and Alpine), the go test tool deadlocks and only exits when it hits the test timeout (after 10 minutes).

The code is available in this repo: https://github.com/F21/deadlock

driver.go:

package deadlock

import (
    "database/sql"
    "database/sql/driver"
    "errors"
    "fmt"
    "net/http"
)

type httpClient struct {
    httpClient *http.Client // <- Do not need to wait for timeout if we remove this struct member and do not import net/http
}

type conn struct{}

func (c *conn) Prepare(query string) (driver.Stmt, error) {
    return nil, errors.New("prepare() not supported")
}

func (c *conn) Close() error {
    return nil
}

func (c *conn) Begin() (driver.Tx, error) {
    return nil, errors.New("Begin() not supported")
}

func (c *conn) Exec(query string, args []driver.Value) (driver.Result, error) {

    items := []int{1, 2}
    fmt.Println(items[2]) //Should cause a panic

    return nil, errors.New("error!")
}

type Driver struct{}

func (d *Driver) Open(dsn string) (driver.Conn, error) {

    conn := &conn{}

    return conn, nil
}

func init() {
    sql.Register("deadlock", &Driver{})
}

driver_test.go:

package deadlock

import (
    "database/sql"
    "testing"
)

func TestExecDeadlock(t *testing.T) {

    db, err := sql.Open("deadlock", "someserver:1234")

    if err != nil {
        t.Fatalf("error connecting: %s", err.Error())
    }

    defer db.Close()

    db.Exec("some statement")
}

tool/main.go:

package main

import (
    "database/sql"
    "fmt"
    _ "github.com/F21/deadlock"
)

func main() {
    db, err := sql.Open("deadlock", "someserver:1234")

    if err != nil {
        fmt.Println(err)
    }

    defer db.Close()

    db.Exec("some statement")

    fmt.Println("Done!")
}

driver.go contains a bug on line 32 that will cause a panic.

If we run go test in the root of the repo, it hangs and only terminates once it reaches the 10-minute timeout.

If we run the command: go run tool/main.go we also see that it hangs forever.

The interesting thing is that if we remove lines 11 to 13 in driver.go and remove the net/http import, it fails immediately and gives us an error: fatal error: all goroutines are asleep - deadlock!

On Windows (tested on Windows 10 64-bit), doing the above does not hang and it fails immediately.

  4. What did you expect to see?
    The program should definitely not hang; it should fail fast, as it does on Windows. However, rather than an error about a deadlock, I was expecting an error about the panic.
  5. What did you see instead?
    The program hangs on Linux and fails immediately on Windows. After timing out on Linux, we get this output:
SIGQUIT: quit
PC=0x460891 m=0

goroutine 0 [idle]:
runtime.futex(0x9deba8, 0x0, 0x0, 0x0, 0x0, 0x9de150, 0x0, 0x0, 0x40f3c4, 0x9deba8, ...)
    /usr/local/go/src/runtime/sys_linux_amd64.s:306 +0x21
runtime.futexsleep(0x9deba8, 0x0, 0xffffffffffffffff)
    /usr/local/go/src/runtime/os1_linux.go:40 +0x53
runtime.notesleep(0x9deba8)
    /usr/local/go/src/runtime/lock_futex.go:145 +0xa4
runtime.stopm()
    /usr/local/go/src/runtime/proc.go:1538 +0x10b
runtime.findrunnable(0xc82001c000, 0x0)
    /usr/local/go/src/runtime/proc.go:1976 +0x739
runtime.schedule()
    /usr/local/go/src/runtime/proc.go:2075 +0x24f
runtime.goexit0(0xc820001680)
    /usr/local/go/src/runtime/proc.go:2210 +0x1f9
runtime.mcall(0x7ffea7a9f320)
    /usr/local/go/src/runtime/asm_amd64.s:233 +0x5b

goroutine 1 [chan receive]:
testing.RunTests(0x8a7148, 0x9d59a0, 0x1, 0x1, 0xc82008d201)
    /usr/local/go/src/testing/testing.go:583 +0x8d2
testing.(*M).Run(0xc82004bef8, 0x8a7998)
    /usr/local/go/src/testing/testing.go:515 +0x81
main.main()
    github.com/F21/deadlock/_test/_testmain.go:54 +0x117

goroutine 17 [syscall, locked to thread]:
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1998 +0x1

goroutine 5 [semacquire]:
sync.runtime_Semacquire(0xc82001a1e4)
    /usr/local/go/src/runtime/sema.go:47 +0x26
sync.(*Mutex).Lock(0xc82001a1e0)
    /usr/local/go/src/sync/mutex.go:83 +0x1c4
database/sql.(*driverConn).closeDBLocked(0xc82001a1c0, 0x0)
    /usr/local/go/src/database/sql/sql.go:326 +0x3e
database/sql.(*DB).Close(0xc8200cc160, 0x0, 0x0)
    /usr/local/go/src/database/sql/sql.go:528 +0x195
panic(0x7894c0, 0xc82000a0d0)
    /usr/local/go/src/runtime/panic.go:443 +0x4e9
github.com/F21/deadlock.(*conn).Exec(0x9fada8, 0x80e2d0, 0xe, 0x9fada8, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
    /home/user/work/src/github.com/F21/deadlock/driver.go:32 +0x242
database/sql.(*DB).exec(0xc8200cc160, 0x80e2d0, 0xe, 0x0, 0x0, 0x0, 0xc820029e01, 0x0, 0x0, 0x0, ...)
    /usr/local/go/src/database/sql/sql.go:1035 +0x2c2
database/sql.(*DB).Exec(0xc8200cc160, 0x80e2d0, 0xe, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
    /usr/local/go/src/database/sql/sql.go:1009 +0xbe
github.com/F21/deadlock.TestExecDeadlock(0xc820018120)
    /home/user/work/src/github.com/F21/deadlock/driver_test.go:18 +0x1e8
testing.tRunner(0xc820018120, 0x9d59a0)
    /usr/local/go/src/testing/testing.go:473 +0x98
created by testing.RunTests
    /usr/local/go/src/testing/testing.go:582 +0x892

rax    0xca
rbx    0x0
rcx    0x460893
rdx    0x0
rdi    0x9deba8
rsi    0x0
rbp    0x1
rsp    0x7ffea7a9f178
r8     0x0
r9     0x0
r10    0x0
r11    0x286
r12    0x8
r13    0x8a5d3d
r14    0x9
r15    0x8
rip    0x460891
rflags 0x286
cs     0x33
fs     0x0
gs     0x0
*** Test killed with quit: ran too long (10m0s).
FAIL    github.com/F21/deadlock 600.006s
@davecheney
Contributor

Thank you for providing a detailed repo. I believe there is a shadowing error in sql.exec around line 1030:

        if execer, ok := dc.ci.(driver.Execer); ok {
                dargs, err := driverArgs(nil, args) // shadows outer err
                if err != nil {
                        return nil, err
                }
                dc.Lock()
                resi, err := execer.Exec(query, dargs)
                dc.Unlock()
                if err != driver.ErrSkip {
                        if err != nil {
                                return nil, err
                        }
                        return driverResult{dc, resi}, nil
                }
        }

This shadowing may be preventing the deferred putConn from working properly.
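To illustrate the kind of shadowing described above, here is a minimal, self-contained sketch. It is not the database/sql code: `step`, `errStep`, and `observed` are hypothetical names, and the deferred closure only stands in for any deferred cleanup that inspects an outer `err`. It shows that an `err` declared with `:=` inside a block is a different variable from the outer `err`, so a deferred function reading the outer one never sees the inner error.

```go
package main

import (
	"errors"
	"fmt"
)

// errStep is a hypothetical error returned by step.
var errStep = errors.New("step failed")

// step is a hypothetical operation that always fails.
func step() (int, error) { return 0, errStep }

// observed returns the value of the outer err as seen by a deferred
// function. With shadow=true the inner call uses := and declares a new
// err; with shadow=false it assigns to the outer err.
func observed(shadow bool) (seen error) {
	var err error
	defer func() { seen = err }() // reads the OUTER err only

	if shadow {
		_, err := step() // new variable, shadows the outer err
		if err != nil {
			return
		}
	} else {
		_, err = step() // plain assignment to the outer err
		if err != nil {
			return
		}
	}
	return
}

func main() {
	fmt.Println("shadowed:", observed(true))  // prints "shadowed: <nil>"
	fmt.Println("assigned:", observed(false)) // prints "assigned: step failed"
}
```

Whether this pattern actually affects the deferred putConn in sql.go is the hypothesis raised in this comment; the sketch only demonstrates the language behaviour.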

@F21
Author

F21 commented May 31, 2016

#13677 may also be related.

@ianlancetaylor ianlancetaylor added this to the Go1.7Maybe milestone May 31, 2016
@ianlancetaylor
Contributor

@F21 Thanks for the clear test case. I agree with @davecheney that the shadowing is problematic, but I don't think that's the problem here. I think the deadlock/hang is occurring because of the defer db.Close(). That is run when the panic happens. It tries to close all the connections. Unfortunately, in doing so, it tries to lock them, but the connection is already locked. I will send a CL.
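The mechanism described above can be sketched in isolation. This is an illustrative model, not the database/sql implementation and not necessarily the change made in the CL: `conn`, `execUnsafe`, `execSafe`, and `heldAfterPanic` are hypothetical names. It assumes Go 1.18 or later, because it uses `sync.Mutex.TryLock` to probe the mutex without blocking; a real deferred `Close` that called `Lock` would block forever.

```go
package main

import (
	"fmt"
	"sync"
)

// conn is a hypothetical stand-in for a connection guarded by a mutex.
type conn struct {
	mu sync.Mutex
}

// execUnsafe mirrors a Lock / call / Unlock sequence. If fn panics,
// Unlock is skipped and the mutex stays held while the panic unwinds.
func (c *conn) execUnsafe(fn func()) {
	c.mu.Lock()
	fn()
	c.mu.Unlock()
}

// execSafe releases the mutex in a deferred call, so it is unlocked
// even when fn panics.
func (c *conn) execSafe(fn func()) {
	c.mu.Lock()
	defer c.mu.Unlock()
	fn()
}

// heldAfterPanic runs exec with a panicking callback and reports whether
// the mutex is still held when an outer deferred function runs, which is
// the point at which a deferred Close would try to lock it.
func heldAfterPanic(c *conn, exec func(func())) (held bool) {
	defer func() {
		_ = recover() // swallow the simulated driver panic
		if c.mu.TryLock() {
			c.mu.Unlock()
			held = false
		} else {
			held = true
		}
	}()
	exec(func() { panic("simulated driver bug") })
	return false
}

func main() {
	a := &conn{}
	fmt.Println("unsafe pattern leaves mutex held:", heldAfterPanic(a, a.execUnsafe)) // true
	b := &conn{}
	fmt.Println("safe pattern leaves mutex held:", heldAfterPanic(b, b.execSafe)) // false
}
```

Unlocking in a defer is one common way to keep a panicking callback from leaving a lock held. It costs slightly more than an explicit Unlock, but a later cleanup step can then acquire the lock and the original panic can surface.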

@ianlancetaylor
Contributor

The CL has too many changes, and this case was already broken. Postponing until 1.8.

@ianlancetaylor ianlancetaylor modified the milestones: Go1.8, Go1.7Maybe May 31, 2016
@quentinmit quentinmit changed the title database/sql and net/http deadlock hangs on linux database/sql, net/http: deadlock hangs on linux Jun 28, 2016
@golang golang locked and limited conversation to collaborators Aug 29, 2017