Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

database/sql: panic in database/sql.(*connRequestSet).deleteIndex #68949

Closed
nklaassen opened this issue Aug 19, 2024 · 9 comments
Closed

database/sql: panic in database/sql.(*connRequestSet).deleteIndex #68949

nklaassen opened this issue Aug 19, 2024 · 9 comments
Labels
FixPending Issues that have a fix which has not yet been reviewed or submitted. NeedsFix The path to resolution is known, but the work has not been done. release-blocker
Milestone

Comments

@nklaassen
Copy link
Contributor

nklaassen commented Aug 19, 2024

Go version

go version go1.23.0 darwin/arm64

Output of go env in your module/workspace:

go version go1.23.0 darwin/arm64
Nics-MacBook-Pro:teleport nic$ go env
GO111MODULE=''
GOARCH='arm64'
GOBIN='/Users/nic/go/bin'
GOCACHE='/Users/nic/Library/Caches/go-build'
GOENV='/Users/nic/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/nic/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/nic/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/Users/nic/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.23.0.darwin-arm64'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/Users/nic/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.23.0.darwin-arm64/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.23.0'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/Users/nic/Library/Application Support/go/telemetry'
GCCGO='gccgo'
GOARM64='v8.0'
AR='ar'
CC='clang'
CXX='clang++'
CGO_ENABLED='1'
GOMOD='/Users/nic/src/teleport/go.mod'
GOWORK='/Users/nic/src/teleport/go.work'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/x_/33kp49n52rg8hs97k5m96rnw0000gn/T/go-build289495391=/tmp/go-build -gno-record-gcc-switches -fno-common'

What did you do?

One of our unit tests seems to start a transaction concurrently with closing the database, this sometimes causes a panic here:

*(s.s[idx].curIdx) = -1

I haven't been able to reproduce this with db.Conn db.Close calls yet, but I do have a unit test in #68953 using database/sql.(*connRequestSet).CloseAndRemoveAll and database/sql.(*connRequestSet).Delete that reliably reproduces the panic.

Update: reproduced with exported methods on database/sql.DB here: https://go.dev/play/p/fHZFq6zgAV4

What did you see happen?

This looks like a race condition in database/sql.(*DB).conn:

  1. In (*DB).conn here delHandle is set to the result of db.connRequests.Add(req), and then db.mu is unlocked
  2. A concurrent call to db.Close will call db.connRequests.CloseAndRemoveAll here
  3. (*DB).conn re-locks db.mu here and calls db.connRequests.Delete(delHandle)
  4. delHandle.idx is positive and no bounds checking is done before trying to index into the slice which was set to nil by (*connRequestSet).CloseAndRemoveAll, resulting in the panic.

Here's a full stack trace (from https://github.com/gravitational/teleport/actions/runs/10427126860/job/28881284822):

panic: runtime error: index out of range [0] with length 0

goroutine 5494 [running]:
database/sql.(*connRequestSet).deleteIndex(...)
	/opt/go/src/database/sql/sql.go:3644
database/sql.(*connRequestSet).Delete(...)
	/opt/go/src/database/sql/sql.go:3638
database/sql.(*DB).conn(0xc001569380, {0x15fa19c0, 0xc0031d4000}, 0x1)
	/opt/go/src/database/sql/sql.go:1366 +0x1625
database/sql.(*DB).begin(0xc001569380, {0x15fa19c0, 0xc0031d4000}, 0x0, 0x1)
	/opt/go/src/database/sql/sql.go:1883 +0x57
database/sql.(*DB).BeginTx.func1(0x1)
	/opt/go/src/database/sql/sql.go:1866 +0x7a
database/sql.(*DB).retry(0x7fa4fef225c8?, 0xc002c0b2c8)
	/opt/go/src/database/sql/sql.go:1568 +0x4b
database/sql.(*DB).BeginTx(0xc001569380, {0x15fa19c0, 0xc0031d4000}, 0x0)
	/opt/go/src/database/sql/sql.go:1865 +0x108
github.com/gravitational/teleport/lib/backend/lite.(*Backend).inTransaction(0xc0033cde40, {0x15fa19c0, 0xc0031d4000}, 0xc002c0b498)
	/__w/teleport/teleport/lib/backend/lite/lite.go:949 +0x155
github.com/gravitational/teleport/lib/backend/lite.(*Backend).Get(0xc0033cde40, {0x15fa19c0, 0xc0031d4000}, {0xc003a9d440, 0x13, 0x18})
	/__w/teleport/teleport/lib/backend/lite/lite.go:598 +0x1e9
github.com/gravitational/teleport/lib/auth/storage.(*ProcessStorage).GetState(0xc0000575c0, {0x15fa19c0, 0xc0031d4000}, {0x14ac87f5, 0x5})
	/__w/teleport/teleport/lib/auth/storage/storage.go:91 +0x264
github.com/gravitational/teleport/lib/service.(*TeleportProcess).initAuthService.func5()
	/__w/teleport/teleport/lib/service/service.go:2486 +0x2db
github.com/gravitational/teleport/lib/srv.(*Heartbeat).fetch(0xc002010ea0)
	/__w/teleport/teleport/lib/srv/heartbeat.go:339 +0x56
github.com/gravitational/teleport/lib/srv.(*Heartbeat).fetchAndAnnounce(0xc002010ea0)
	/__w/teleport/teleport/lib/srv/heartbeat.go:624 +0x2f
github.com/gravitational/teleport/lib/srv.(*Heartbeat).Run(0xc002010ea0)
	/__w/teleport/teleport/lib/srv/heartbeat.go:282 +0xe8
github.com/gravitational/teleport/lib/service.(*LocalService).Serve(0xc0022c37c0)
	/__w/teleport/teleport/lib/service/supervisor.go:588 +0x36
github.com/gravitational/teleport/lib/service.(*LocalSupervisor).serve.func1()
	/__w/teleport/teleport/lib/service/supervisor.go:313 +0x489
created by github.com/gravitational/teleport/lib/service.(*LocalSupervisor).serve in goroutine 4831
	/__w/teleport/teleport/lib/service/supervisor.go:302 +0x105

What did you expect to see?

I expected no panic. sql.DB is advertised as "safe for concurrent use by multiple goroutines"

@nklaassen
Copy link
Contributor Author

Looks like this code was introduced in 190d0d3 by @bradfitz

nklaassen added a commit to nklaassen/go that referenced this issue Aug 19, 2024
The current implementation has a panic when the database is closed
concurrently with a new connection attempt.
`database/sql.(*connRequestSet).CloseAndRemoveAll` sets
`connRequestSet.s` to a `nil` slice.
If this happens at the line indicated below, we get a panic:

```go
db.mu.Lock()
delHandle := db.connRequests.Add(req)
db.mu.Unlock()

// CloseAndRemoveAll here causes a panic.

db.mu.Lock()
deleted := db.connRequests.Delete(delHandle)
db.mu.Unlock()
```

Fixes golang#68949
@cherrymui
Copy link
Member

cc @bradfitz @kardianos @kevinburke

@cherrymui cherrymui added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Aug 19, 2024
@cherrymui cherrymui added this to the Go1.24 milestone Aug 19, 2024
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/607238 mentions this issue: database/sql: fix panic with concurrent Conn and Close

@nklaassen
Copy link
Contributor Author

Update: I managed to reproduce the panic very reliably in the go playground using only exported methods https://go.dev/play/p/fHZFq6zgAV4

nklaassen added a commit to nklaassen/go that referenced this issue Aug 22, 2024
The current implementation has a panic when the database is closed
concurrently with a new connection attempt.

connRequestSet.CloseAndRemoveAll sets connRequestSet.s to a nil slice.
If this happens between calls to connRequestSet.Add and
connRequestSet.Delete, there is a panic when trying to write to the nil
slice. This is sequence is likely to occur in DB.conn, where the mutex
is released between calls to db.connRequests.Add and
db.connRequests.Delete

This change updates connRequestSet.CloseAndRemoveAll to set the curIdx
to -1 for all pending requests before setting its internal slice to nil.
CloseAndRemoveAll already iterates the full slice to close all the request
channels. It seems appropriate to set curIdx to -1 before deleting the
slice for 3 reasons:
1. connRequestSet.deleteIndex also sets curIdx to -1
2. curIdx will not be relevant to anything after the slice is set to nil
3. connRequestSet.Delete already checks for negative indices

Fixes golang#68949
@bradfitz
Copy link
Contributor

@cherrymui, this probably warrants a backport to the 1.23 branch.

@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. FixPending Issues that have a fix which has not yet been reviewed or submitted. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Aug 23, 2024
@cherrymui
Copy link
Member

@gopherbot please backport this to Go 1.23. This is a regression in 1.23. Thanks.

@gopherbot
Copy link
Contributor

Backport issue(s) opened: #69041 (for 1.23).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/609255 mentions this issue: [release-branch.go1.23] database/sql: fix panic with concurrent Conn and Close

gopherbot pushed a commit that referenced this issue Aug 28, 2024
…and Close

The current implementation has a panic when the database is closed
concurrently with a new connection attempt.

connRequestSet.CloseAndRemoveAll sets connRequestSet.s to a nil slice.
If this happens between calls to connRequestSet.Add and
connRequestSet.Delete, there is a panic when trying to write to the nil
slice. This is sequence is likely to occur in DB.conn, where the mutex
is released between calls to db.connRequests.Add and
db.connRequests.Delete

This change updates connRequestSet.CloseAndRemoveAll to set the curIdx
to -1 for all pending requests before setting its internal slice to nil.
CloseAndRemoveAll already iterates the full slice to close all the request
channels. It seems appropriate to set curIdx to -1 before deleting the
slice for 3 reasons:
1. connRequestSet.deleteIndex also sets curIdx to -1
2. curIdx will not be relevant to anything after the slice is set to nil
3. connRequestSet.Delete already checks for negative indices

For #68949
Fixes #69041

Change-Id: I6b7ebc5a71b67322908271d13865fa12f2469b87
GitHub-Last-Rev: 7d26691
GitHub-Pull-Request: #68953
Reviewed-on: https://go-review.googlesource.com/c/go/+/607238
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
Commit-Queue: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
(cherry picked from commit 08707d6)
Reviewed-on: https://go-review.googlesource.com/c/go/+/609255
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
FixPending Issues that have a fix which has not yet been reviewed or submitted. NeedsFix The path to resolution is known, but the work has not been done. release-blocker
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants