Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

database/sql: idle connections are not cleaned up properly #39471

Closed
stevenh opened this issue Jun 9, 2020 · 13 comments
Closed

database/sql: idle connections are not cleaned up properly #39471

stevenh opened this issue Jun 9, 2020 · 13 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@stevenh
Copy link
Contributor

stevenh commented Jun 9, 2020

What version of Go are you using (go version)?

go version devel +e64675a79f Mon Jun 8 22:06:39 2020 +0000 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
GO111MODULE=""
GOARCH="amd64"
GOBIN="/data/go/bin"
GOCACHE="/home/steveh/.cache/go-build"
GOENV="/home/steveh/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/data/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/data/go:/home/steveh/code"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/steveh/code/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/steveh/code/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build544771756=/tmp/go-build -gno-record-gcc-switches"

What did you do?

  1. Use a larger number of DB connections with SetConnMaxIdleTime configured.
  2. Reduce the connection number of down to a lower number.
  3. Monitor to check if idle connections were reduced to the lower number after ConnMaxIdleTime

What did you expect to see?

DB connections to drop to ConnMaxIdleTime

What did you see instead?

Very little change in used connections even though large number of connections were showing as idle.

Investigation

After investigating the issue I identified the problem is that connects are not reused and maintained in idle time order in db.freeConn slice. This causes connections to be randomly reused and hence prevents the ConnMaxIdleTime from identifying the unneeded connections.

@gopherbot
Copy link

Change https://golang.org/cl/237337 mentions this issue: database/sql: Fix idle connection reuse

@stevenh
Copy link
Contributor Author

stevenh commented Jun 9, 2020

@kardianos as author of SetConnMaxIdleTime I thought you might be interested in this.

@kardianos
Copy link
Contributor

@stevenh Thank you for taking time to test this feature. There is quite a bit of change going on here. I'm open to this or something like this but it will take me a little bit to go through this change.

We are late in the Go release cycle: https://github.com/golang/go/wiki/Go-Release-Cycle (currently the tree is frozen) so I think the soonest it could get merged would be mid-July after the Go1.15 release.

I just want to confirm the issue you are trying to address. While the max idle time does work correctly today (technically), because some amount of connections get used (picked at random), then few connections actually idle out because one of them is always picked prior to idle timeout. Your solution was to pick connection from the free list starting with the most recent return time: select 1 conn from free_list order by return_at desc limit 1;. This will FIFO queue I think.

There is another issue #31708 which would like to make the free list a LIFO queue.

I'll need to inspect these two issues at the same time. Again, we can work on this with the aim of getting this ready to be merged at the top of the window when it opens in July. Just don't expect it to go into the Go1.15 release.

@stevenh
Copy link
Contributor Author

stevenh commented Jun 10, 2020

What triggered my investigation was we saw a spike in load and connections never returned to normal levels even with our 2m ConnMaxIdleTime, as it turned the connection reuse was resetting returnedAt for all connections hence not allowing any connections to be closed down.

We ended up having to restart the app to bring connection limits and hence the DB memory usage back under control.

Given this I think its import to get this issue fixed before 1.15

@stevenh
Copy link
Contributor Author

stevenh commented Jun 10, 2020

Here's a graph of what we saw with ConnMaxIdleTime = 2m, the vertical blue line is the spike in traffic which occurred for just a few seconds. As you can see idle connections didn't clear back down.
image

@toothrot toothrot added this to the Backlog milestone Jun 12, 2020
@toothrot toothrot added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jun 12, 2020
@stevenh
Copy link
Contributor Author

stevenh commented Aug 12, 2020

Now go 1.15 is released I wanted to pick this back up as the current implementation which people will start using is impacted by this.

@kardianos what will it take to get https://golang.org/cl/237337 over the line and into the next 1.15 patch release?

@tzeejay
Copy link

tzeejay commented Sep 28, 2020

Wanted to quickly leave a note for you that I think we hit this by accident. I had written quite inefficient code that is run very frequently. Every once in a while we'd hit a spike and used the max connections limit basically instantly making every other query fail.

Inefficient code is cleaned up and taken care of but the randomness and the instant spikes look similar to the reported issue here.

For reference:

$ go version
go version go1.15.2 darwin/amd64

I am cross compiling the project to Linux/x86-64

@kardianos
Copy link
Contributor

I like the effect that the user (blocking) code no longer copies the freeConn list when getting a new connection; it is only re-sliced. The freeConn list is only copied in a background goroutine.

@rsc I would be okay with this change; whats the current process for a proposal?

@ianlancetaylor
Copy link
Contributor

Does this need a proposal? It doesn't sound like an API change to me.

@stevenh
Copy link
Contributor Author

stevenh commented Sep 29, 2021

Correct this is just a fix to the expected behaviour.

@kardianos
Copy link
Contributor

Thanks Ian. Sorry, I'm a bit out of practice. @stevenh: I'm going to do another pass on the CL, but it it looking good.

@agnivade
Copy link
Contributor

agnivade commented Oct 3, 2021

Related: #43173

@stevenh
Copy link
Contributor Author

stevenh commented Oct 20, 2021

Looks like we have under 2 weeks until the 1.18 freeze, would love to have this in by then if possible @kardianos so if there's anything you need from me please let me know.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

7 participants