Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

database/sql: Is it possible to release expired connections as soon as possible #39449

Open
chanxuehong opened this issue Jun 7, 2020 · 11 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@chanxuehong
Copy link
Contributor

chanxuehong commented Jun 7, 2020

What version of Go are you using (go version)?

$ go version
go version go1.14.4 darwin/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/chan/Library/Caches/go-build"
GOENV="/Users/chan/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOOS="darwin"
GOPATH="/Users/chan/gopath"
GOPROXY="https://goproxy.cn,https://goproxy.io,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/qw/ddvmdjjd35xcj6823_nz3z780000gn/T/go-build941747274=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

At present, database/sql closes expired conn at a fixed time interval, so when ConnMaxLifeTime is the same as the database configuration or greater than half of the database side, the database may close the conn earlier than the client, so that when the client obtains the conn A driver.ErrBadConn error will be generated, and an error may be generated when the expired conn is closed

So can we set the time interval to half of the original?

package main

import (
	"context"
	"database/sql"
	"fmt"
	"time"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	dsn := fmt.Sprintf("%s:%s@tcp(%s)/%s", "root", "123456", "127.0.0.1:3306", "test_db")
	dsn += "?collation=utf8mb4_bin&clientFoundRows=false&interpolateParams=true&loc=Asia%2FShanghai&maxAllowedPacket=0&multiStatements=false&parseTime=true&timeout=5000ms&time_zone=%27Asia%2FShanghai%27"
	dsn += "&wait_timeout=15"

	fmt.Println("start time:", time.Now().Format("2006/01/02 15:04:05"))
	DB, err := sql.Open("mysql", dsn)
	if err != nil {
		fmt.Println(err)
		return
	}
	defer DB.Close()
	DB.SetMaxOpenConns(100)
	DB.SetMaxIdleConns(100)
	DB.SetConnMaxLifetime(10 * time.Second)

	var name string
	var value string
	if err := DB.QueryRow("show session variables where variable_name='wait_timeout'").Scan(&name, &value); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(name, value)

	time.Sleep(4 * time.Second)
	// Get the existing conn
	conn, err := DB.Conn(context.Background())
	if err != nil {
		fmt.Println(err)
		return
	}
	// Create a new conn
	fmt.Println("conn2 created time:", time.Now().Format("2006/01/02 15:04:05"))
	conn2, err := DB.Conn(context.Background())
	if err != nil {
		fmt.Println(err)
		return
	}
	_ = conn.Close()
	_ = conn2.Close()

	time.Sleep(30 * time.Second)
}

What did you expect to see?

no closing bad idle connection: EOF

What did you see instead?

start time: 2020/06/07 16:55:00
wait_timeout 15
conn2 created time: 2020/06/07 16:55:04
[mysql] 2020/06/07 16:55:20 packets.go:123: closing bad idle connection: EOF
@agnivade
Copy link
Contributor

agnivade commented Jun 7, 2020

/cc @kardianos

@agnivade agnivade added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jun 7, 2020
@agnivade agnivade added this to the Backlog milestone Jun 7, 2020
@kardianos
Copy link
Contributor

Thanks @chanxuehong this explains the PR you sent a little more.

Currently your PR has an issue where the timer will fire twice as fast every time it runs. This will consume more and more CPU and is probably not what you want.

You may have a good general point here. That is, you may want to run the connection cleaner faster then the shortest timeout. There is some validity here. However, I'm guessing it isn't your core issue. I suspect that your issue will be solved in the next release with:

#32530

https://go-review.googlesource.com/c/go/+/216197

@chanxuehong
Copy link
Contributor Author

chanxuehong commented Jun 8, 2020

Thanks @kardianos for your comment

Thanks @chanxuehong this explains the PR you sent a little more.

Currently your PR has an issue where the timer will fire twice as fast every time it runs. This will consume more and more CPU and is probably not what you want.

Indeed, this will happen, in fact, I also considered it, but I personally think that this is not a problem, because under normal circumstances, we will set the ConnMaxLifeTime to be relatively large, such as 30 minutes, then it runs 48 times a day, I think I Can accept this. In particular, if we speed up the collection of conn that has expired, we will may save the retry overhead caused by driver.ErrBadConn

You may have a good general point here. That is, you may want to run the connection cleaner faster then the shortest timeout. There is some validity here. However, I'm guessing it isn't your core issue. I suspect that your issue will be solved in the next release with:

#32530

https://go-review.googlesource.com/c/go/+/216197

For https://go-review.googlesource.com/c/go/+/216197, I think this does not significantly reduce the probability of driver.ErrBadConn because most of the time the strategy is cachedOrNewConn

Now there are many errors like the following in our error log. In order to avoid this error, we set wait_timeout to 2 times of ConnMaxLifeTime

...
[mysql] 2020/06/01 22:29:57 packets.go:122: closing bad idle connection: EOF
[mysql] 2020/06/01 22:29:57 packets.go:122: closing bad idle connection: EOF
[mysql] 2020/06/01 22:29:57 packets.go:122: closing bad idle connection: EOF
[mysql] 2020/06/01 23:29:57 packets.go:122: closing bad idle connection: EOF
[mysql] 2020/06/01 23:29:57 packets.go:122: closing bad idle connection: EOF
[mysql] 2020/06/01 23:29:57 packets.go:122: closing bad idle connection: EOF
[mysql] 2020/06/02 01:12:49 packets.go:122: closing bad idle connection: EOF
[mysql] 2020/06/02 01:29:57 packets.go:122: closing bad idle connection: EOF
[mysql] 2020/06/02 01:29:57 packets.go:122: closing bad idle connection: EOF
[mysql] 2020/06/02 17:34:22 packets.go:122: closing bad idle connection: EOF
[mysql] 2020/06/02 17:34:22 packets.go:122: closing bad idle connection: EOF
[mysql] 2020/06/02 18:34:22 packets.go:122: closing bad idle connection: EOF
[mysql] 2020/06/02 19:12:29 packets.go:122: closing bad idle connection: EOF
[mysql] 2020/06/03 16:46:23 packets.go:122: closing bad idle connection: EOF
[mysql] 2020/06/03 18:02:39 packets.go:122: closing bad idle connection: EOF
[mysql] 2020/06/03 18:46:23 packets.go:122: closing bad idle connection: EOF
[mysql] 2020/06/03 18:58:57 packets.go:122: closing bad idle connection: EOF
[mysql] 2020/06/03 19:46:23 packets.go:122: closing bad idle connection: EOF
[mysql] 2020/06/03 19:46:23 packets.go:122: closing bad idle connection: EOF
[mysql] 2020/06/03 19:46:23 packets.go:122: closing bad idle connection: EOF
[mysql] 2020/06/03 19:46:23 packets.go:122: closing bad idle connection: EOF
[mysql] 2020/06/03 20:46:23 packets.go:122: closing bad idle connection: EOF
[mysql] 2020/06/03 20:46:23 packets.go:122: closing bad idle connection: EOF
[mysql] 2020/06/03 21:36:31 packets.go:122: closing bad idle connection: EOF
[mysql] 2020/06/03 21:46:23 packets.go:122: closing bad idle connection: EOF
[mysql] 2020/06/03 22:09:41 packets.go:122: closing bad idle connection: EOF
[mysql] 2020/06/03 22:46:23 packets.go:122: closing bad idle connection: EOF
[mysql] 2020/06/04 02:08:24 packets.go:122: closing bad idle connection: EOF
[mysql] 2020/06/04 02:08:24 packets.go:122: closing bad idle connection: EOF
[mysql] 2020/06/04 02:08:24 packets.go:122: closing bad idle connection: EOF
[mysql] 2020/06/04 13:25:51 packets.go:122: closing bad idle connection: EOF
[mysql] 2020/06/04 13:30:45 packets.go:122: closing bad idle connection: EOF
[mysql] 2020/06/04 13:44:31 packets.go:122: closing bad idle connection: EOF
[mysql] 2020/06/04 14:00:19 packets.go:122: closing bad idle connection: EOF
[mysql] 2020/06/04 14:02:02 packets.go:122: closing bad idle connection: EOF
[mysql] 2020/06/04 14:08:31 packets.go:122: closing bad idle connection: EOF
[mysql] 2020/06/04 14:12:51 packets.go:122: closing bad idle connection: EOF
[mysql] 2020/06/04 14:25:37 packets.go:122: closing bad idle connection: EOF
[mysql] 2020/06/04 16:38:37 packets.go:122: closing bad idle connection: EOF
[mysql] 2020/06/04 17:08:31 packets.go:122: closing bad idle connection: EOF
[mysql] 2020/06/04 20:01:58 packets.go:122: closing bad idle connection: EOF
[mysql] 2020/06/04 20:18:27 packets.go:122: closing bad idle connection: EOF
...

@pigletfly
Copy link

pigletfly commented Jun 29, 2020

I think https://go-review.googlesource.com/c/go/+/145758/ fixed this issue, you can control maxIdleTime which should be less than your db server config.

@chanxuehong
Copy link
Contributor Author

I think https://go-review.googlesource.com/c/go/+/145758/ fixed this issue, you can control maxIdleTime which should be less than your db server config.

Thanks for your comment

I think that SetConnMaxIdleTime can’t solve my problem, in fact I don’t want the following to happen

Expired connections may be closed lazily before reuse.

@mastertheknife
Copy link

I and also seeing these errors in my logs all day long, with the latest version of go-mysql-driver:

[mysql] 2020/10/02 20:35:01 packets.go:122: closing bad idle connection: EOF
[mysql] 2020/10/02 20:35:01 connection.go:158: driver: bad connection

This error is fairly innocent as of go-mysql-driver v1.5.0. It happens when getting a connection from the pool. On the first write, go-mysql-driver performs a non-blocking read, to detect EOF (connection closed), prior to writing on a possibly closed socket. When it detects this, it returns driver.ErrBadConn. This is fairly harmless, because database/sql will immediately retry on a new connection.

However, i don't think believe that setting max life time or max idle time is the correct solution for silencing these, because it requires knowledge of the server configuration, and changing these every time we point to a different database server. So instead of just connecting the host, user, password and such, we now have to take care of configuring max life time and max idle time.

I believe the solution is fairly simple: Only log these if we're interested in such logging, e.g. debug level.
I will consider creating a pull request for this in the go-mysql-driver project.

@methane
Copy link
Contributor

methane commented Oct 3, 2020

This error is fairly innocent as of go-mysql-driver v1.5.0.

This error is not innocent. This error message is saying the connection is closed not by the driver. This is dangerous because if the connection is closed when you are sending query, your application can not know your query is executed or not. There is no way to handle it safely. You should avoid this error message as possible, instead of ignoring it.

However, i don't think believe that setting max life time or max idle time is the correct solution for silencing these

It is the correct solution to avoid connection closed by not the driver.

it requires knowledge of the server configuration, and changing these every time we point to a different database server. So instead of just connecting the host, user, password and such, we now have to take care of configuring max life time and max idle time.

Yes, it is highly recommended to configure lifetime properly. I suggest "3min" or "less than 5min" for default lifetime, because I have seen connections are closed by 5min in some environment. (See docker/for-mac#2442).

@mastertheknife
Copy link

This error is fairly innocent as of go-mysql-driver v1.5.0.

This error is not innocent. This error message is saying the connection is closed not by the driver. This is dangerous because if the connection is closed when you are sending query, your application can not know your query is executed or not. There is no way to handle it safely. You should avoid this error message as possible, instead of ignoring it.

It is, but only for this specific case: When getting a connection from the pool and running the EOF check to see if the connection is stale.
Only at this case, it can be safely silenced, because nothing was sent to the database at this point. It was a simple non-blocking read to detect EOF.

I should have been more clear about this, i am not saying that this error should be always silenced, its not, but for this specific case, it can be safely silenced.
But should it be silenced? By silencing it, everything appears to work more naturally, with no unexpected messages in the logs. Most production servers have non-default wait time (8 hours) configured, because its a common practice. By silencing it, like you said, it is possible that we will be masking other possible causes for the connection close, something other than wait_timeout.

@methane
Copy link
Contributor

methane commented Oct 4, 2020

It is, but only for this specific case: When getting a connection from the pool and running the EOF check to see if the connection is stale.
Only at this case, it can be safely silenced, because nothing was sent to the database at this point. It was a simple non-blocking read to detect EOF.

I know it but I just don't describe it is safe.
This error message is like gunshot. When someone hear gunshot but the bullet didn't hit them, do you think they can safely ignore the gunshot even thoug next bullet may hit them?
So I don't think "no data lost in this specific time" doesn't mean "error message can be silenced safely".

By silencing it, like you said, it is possible that we will be masking other possible causes for the connection close, something other than wait_timeout.

Yes.

@manoj-00601
Copy link

Hi,
Is this issue resolved ?

@methane
Copy link
Contributor

methane commented Jan 6, 2021

This is not an issue. You should use ConnMaxLifeTime.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

7 participants