
proposal: database/sql: add SetMaxBadConnRetries #52886

Closed
jinzhu opened this issue May 13, 2022 · 23 comments

@jinzhu
Contributor

jinzhu commented May 13, 2022

This method adds the ability to set the maximum number of retries when the driver returns driver.ErrBadConn.

Currently, database/sql has a hardcoded maxBadConnRetries, set to 2, and it reruns the SQL statement whenever the driver returns a bad-connection error, which causes problems in many cases.

For example, with a long-running INSERT, the first attempt might be killed by a TCP timeout (or for some other reason) after it has already taken effect, and the automatic retry then inserts duplicate data, which is confusing to end users.

With this option, we can set the maximum number of bad-connection retries to 0 and write our own retry logic to avoid the issue. It also opens the opportunity for optimizations such as https://cacm.acm.org/magazines/2013/2/160173-the-tail-at-scale/fulltext
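
For illustration, a minimal sketch of how the proposed API could be used; SetMaxBadConnRetries is the method proposed here (not an existing API), and the driver/DSN/query are placeholders:

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/go-sql-driver/mysql" // driver and DSN are placeholders
)

func main() {
	db, err := sql.Open("mysql", "user:pass@tcp(127.0.0.1:3306)/app")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Proposed: disable the built-in bad-connection retries entirely, so a
	// statement is never silently re-run after the driver returns driver.ErrBadConn.
	db.SetMaxBadConnRetries(0)

	// Retry decisions are now left to the application, e.g. only for
	// statements known to be idempotent.
	if _, err := db.Exec("INSERT INTO events(name) VALUES (?)", "signup"); err != nil {
		log.Fatal(err)
	}
}
```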

Related issues:

#48309
#15608

Implementation:

I have tried to submit a possible implementation: https://go-review.googlesource.com/c/go/+/404935

@gopherbot

Change https://go.dev/cl/404935 mentions this issue: database/sql: add SetMaxBadConnRetries

@ianlancetaylor
Contributor

CC @kardianos

@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) May 13, 2022
@kardianos
Contributor

To prevent a lock on max retries, I would prefer this to be set in OpenDB from the Connector. Maybe the connector could have an optional configuration interface. Unsure. However, the associated CL is not correct: it doesn't lock correctly, and I think it also changes the BadConn error.

Should we do this? Probably. I had ideally wanted to create a v2 sql package, but realistically my time is very limited. So if this were set when the DB is created, so that no lock is required, I'd be alright with it.

Another option would be two retries or no retries.

I'm okay with accepting this, on the condition that a good design can be found.
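
As an illustration of the "optional configuration interface" idea above (the interface name and method are hypothetical, not from the CL):

```go
package sqlsketch

// BadConnRetryConfigurer is a hypothetical optional interface a
// driver.Connector could implement, so that sql.OpenDB can read the retry
// limit once, at construction time, with no locking afterwards.
type BadConnRetryConfigurer interface {
	MaxBadConnRetries() int
}

// Sketch of what OpenDB could do with it (names are illustrative):
//
//	retries := defaultMaxBadConnRetries // today the hardcoded const 2
//	if c, ok := connector.(BadConnRetryConfigurer); ok {
//		retries = c.MaxBadConnRetries()
//	}
//	db.maxBadConnRetries = retries // fixed for the lifetime of the DB
```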

@rsc
Contributor

rsc commented May 18, 2022

We can avoid the lock by setting this value atomically, I think.
So the API should be OK either way.

@rsc rsc moved this from Incoming to Active in Proposals (old) May 18, 2022
@rsc
Contributor

rsc commented May 18, 2022

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@jinzhu
Contributor Author

jinzhu commented May 19, 2022

Thank you for your review; I have fixed the lock issue with an atomic value.

@MatthewJamesBoyle

This seems like a good proposal to me.

A couple of thoughts:

  • Does it make sense to have an upper bound on the number of retries we allow a user to configure?
  • As far as I can see, the retry strategies today do not perform any exponential backoff/jitter. This was likely fine for most use cases since the maximum was 2, but as users may put higher numbers here, we may want to do more to prevent a thundering herd problem (a rough backoff sketch follows below).
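
Purely to show the shape of what that could look like, a small backoff-with-jitter helper (nothing like this is in the proposal or the CL):

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// backoff returns how long to wait before retry number attempt (0-based),
// using capped exponential growth plus full jitter to avoid a thundering herd.
func backoff(attempt int) time.Duration {
	const base = 10 * time.Millisecond
	const cap = 1 * time.Second
	d := base << uint(attempt) // 10ms, 20ms, 40ms, ...
	if d <= 0 || d > cap {
		d = cap // cap the growth (and guard against shift overflow)
	}
	return time.Duration(rand.Int63n(int64(d))) // random wait in [0, d)
}

func main() {
	for i := 0; i < 5; i++ {
		fmt.Printf("attempt %d: wait %v\n", i, backoff(i))
	}
}
```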

@rsc
Contributor

rsc commented May 25, 2022

The CL would be a lot clearer if the code was not being refactored at the same time. Can you instead add a db.maxBadConnRetries() method and replace all the i < maxBadConnRetries with i < db.maxBadConnRetries() ?
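
A sketch of that shape, with the limit stored atomically as suggested above; the field name is only illustrative and DB stands in for database/sql's type:

```go
package sqlsketch

import "sync/atomic"

// DB stands in for database/sql's DB; only the retry-limit plumbing is shown.
type DB struct {
	badConnRetries int64 // read/written atomically; the default would stay at 2
}

// maxBadConnRetries is the accessor the retry loops would call in place of the
// package-level const, i.e. `for i := 0; i < db.maxBadConnRetries(); i++ { ... }`.
func (db *DB) maxBadConnRetries() int {
	return int(atomic.LoadInt64(&db.badConnRetries))
}

// SetMaxBadConnRetries is the proposed setter; storing the value atomically
// avoids taking a lock on every query.
func (db *DB) SetMaxBadConnRetries(n int) {
	if n < 0 {
		n = 0
	}
	atomic.StoreInt64(&db.badConnRetries, int64(n))
}
```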

@jinzhu
Contributor Author

jinzhu commented May 26, 2022

The CL would be a lot clearer if the code was not being refactored at the same time. Can you instead add a db.maxBadConnRetries() method and replace all the i < maxBadConnRetries with i < db.maxBadConnRetries() ?

Ok, will do that.

@jinzhu
Contributor Author

jinzhu commented May 26, 2022

The CL would be a lot clearer if the code was not being refactored at the same time. Can you instead add a db.maxBadConnRetries() method and replace all the i < maxBadConnRetries with i < db.maxBadConnRetries() ?

But with the old style of code, we can't change maxBadConnRetries to 0, or it will never reuse a connection from the pool. Is that OK?

@jinzhu
Contributor Author

jinzhu commented Jun 3, 2022

The CL would be a lot clearer if the code was not being refactored at the same time. Can you instead add a db.maxBadConnRetries() method and replace all the i < maxBadConnRetries with i < db.maxBadConnRetries() ?

Hello @rsc, which connReuseStrategy should we use if maxBadConnRetries is set to 0? alwaysNewConn or cachedOrNewConn?

@rsc
Contributor

rsc commented Jun 8, 2022

@kardianos Do you have any suggestions or preferences here?

@kardianos
Contributor

@jinzhu Can you explain to me when you would want a connection retry count other than the default (2) or zero?

This "retry" is to try to take a connection out of the connection pool, then if it is not a valid connection, return an error with BadConn. It is fundamentally about connection pooling, though the side effect might be to re-run a query if a driver mis-reports the connection to be bad, when it actually had side effects.

Turning off this connection retry I can understand: the driver is (now) provided many ways to report a connection as bad before we actually attempt a query, and we want to give the user an option not to retry.

Alternatively, the option to not retry could be implemented as a query parameter, say const NoRetry ..., which would then be passed in as a parameter to the query.

So questions:

  1. Can you justify any other option besides Default and Off?
  2. Would the use case always be for the entire connection pool? Or would this be better per query?

To answer your question:

Hello @rsc, which connReuseStrategy should we use if maxBadConnRetries is set to 0? alwaysNewConn or cachedOrNewConn?

If max retries is set to zero, you must use cachedOrNewConn. If you used alwaysNewConn, it would negate the entire point of a connection pool.
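
For context, the existing retry shape in database/sql looks roughly like the sketch below; this is a simplified paraphrase that refers to unexported internals, not the exact source. Each top-level call tries the pool up to maxBadConnRetries times with cachedOrNewConn and only then falls back to one attempt with alwaysNewConn, which is why a retry count of zero should still use cachedOrNewConn:

```go
// Simplified paraphrase of the retry loop used by DB.ExecContext and friends;
// db.exec, cachedOrNewConn, and alwaysNewConn are unexported internals.
func (db *DB) execWithRetry(ctx context.Context, query string, args []any) (Result, error) {
	var res Result
	var err error
	for i := 0; i < maxBadConnRetries; i++ {
		// Prefer a pooled connection; the driver may report it as bad.
		res, err = db.exec(ctx, query, args, cachedOrNewConn)
		if !errors.Is(err, driver.ErrBadConn) {
			return res, err
		}
	}
	// Last resort: dial a brand-new connection instead of retrying the pool again.
	return db.exec(ctx, query, args, alwaysNewConn)
}
```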

@jinzhu
Contributor Author

jinzhu commented Jun 22, 2022

Hi @kardianos ,

Thank you for your response

@jinzhu Can you explain to me when you would want a connection retry count other than the default (2) or zero?

For my use case, default or zero should be enough.

Alternatively, the option to not retry could be implemented as a query parameter, say const NoRetry ..., which would then be passed in as a parameter to the query.

That sounds like it would make things even more tricky ;)

Compared to a per-query solution, I would prefer to set up two database pools with different retry options in this case, which works perfectly for read-write splitting.
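
A sketch of that two-pool setup, assuming the proposed setter existed; the driver and DSNs are placeholders:

```go
package pools

import (
	"database/sql"

	_ "github.com/go-sql-driver/mysql" // driver and DSNs are placeholders
)

// openPools opens one pool for reads (keeping the default bad-connection
// retries) and one for writes with retries disabled, so a write is never
// silently re-run. SetMaxBadConnRetries is the proposed method, not an
// existing API.
func openPools(readDSN, writeDSN string) (readDB, writeDB *sql.DB, err error) {
	readDB, err = sql.Open("mysql", readDSN)
	if err != nil {
		return nil, nil, err
	}
	writeDB, err = sql.Open("mysql", writeDSN)
	if err != nil {
		readDB.Close()
		return nil, nil, err
	}
	writeDB.SetMaxBadConnRetries(0)
	return readDB, writeDB, nil
}
```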

@kardianos
Contributor

kardianos commented Jun 22, 2022 via email

@jinzhu
Contributor Author

jinzhu commented Jun 22, 2022

So it sounds like you do have a case where some connections (reads) would retry, and write would not.

Yes, that sounds reasonable in some cases, but in our case we would like to disable retries entirely, for https://cacm.acm.org/magazines/2013/2/160173-the-tail-at-scale/fulltext

Do you mean tricky from a db user or implementation?

For a db user

@jinzhu
Contributor Author

jinzhu commented Jun 27, 2022

Hi @kardianos

For the implementation (https://go-review.googlesource.com/c/go/+/404935), do you have any suggestions?

@rsc
Contributor

rsc commented Jul 20, 2022

ping @kardianos

@kardianos
Contributor

@jinzhu I like how your CL simplifies existing code. I can see the use case.

Upon examining the code further, only queries made directly on the DB and on a Stmt use maxBadConnRetries; Conn and Tx do not (because once the connection is acquired, it is assumed to be good). As such, if you need to ensure no retries, you can pull a dedicated *Conn or start a Tx. Most DB systems have a way to automatically roll back a transaction if the underlying network connection is terminated.

  • I would be okay merging your CL without the configuration on the DB, just as a simplification where the retry logic takes the retry count from the const rather than a DB atomic.
  • I would be okay with possibly adding a *Conn.StmtContext(*Stmt) *Stmt to allow calling a stmt on a specific Conn.
  • I would consider maybe a switch just to turn off the bad-conn retries for a pool; I don't think I would want to allow an arbitrary N to be configured.

Can you describe why we would want to make the retry configurable (on/off), rather than starting a Tx or grabbing an explicit Conn?
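
For reference, a sketch of the dedicated-*Conn approach with today's API; DB.Conn and Conn.ExecContext are existing methods, while the driver, DSN, and query are placeholders:

```go
package main

import (
	"context"
	"database/sql"
	"log"

	_ "github.com/go-sql-driver/mysql" // driver and DSN are placeholders
)

// insertOnce runs the statement on a dedicated connection. As described above,
// once the *Conn is acquired, statements on it are not subject to the
// bad-connection retry loop, so the INSERT is not silently re-run.
func insertOnce(ctx context.Context, db *sql.DB) error {
	conn, err := db.Conn(ctx)
	if err != nil {
		return err
	}
	defer conn.Close()

	_, err = conn.ExecContext(ctx, "INSERT INTO events(name) VALUES (?)", "signup")
	return err
}

func main() {
	db, err := sql.Open("mysql", "user:pass@tcp(127.0.0.1:3306)/app")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if err := insertOnce(context.Background(), db); err != nil {
		log.Fatal(err)
	}
}
```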

@jinzhu
Contributor Author

jinzhu commented Jul 25, 2022

Hi @kardianos

if you need to ensure no retries, you can pull a dedicated *Conn or start a Tx. Most DB systems have a way to automatically roll back a transaction if the underlying network connection is terminated.

Yeah, this should work for me. 👍

Then I will go with the first option; I have just created another CL, thank you for your review.

https://go-review.googlesource.com/c/go/+/419182

@rsc
Contributor

rsc commented Jul 27, 2022

If we're down to CL 419182, it sounds like there are no API changes being proposed anymore, in which case we can close this issue as retracted. Do I have that right?

@jinzhu
Contributor Author

jinzhu commented Jul 28, 2022

If we're down to CL 419182, it sounds like there are no API changes being proposed anymore, in which case we can close this issue as retracted. Do I have that right?

Yes, I'm going to close this one. Thank you all for the reviews.

@jinzhu jinzhu closed this as completed Jul 28, 2022
@rsc rsc moved this from Active to Declined in Proposals (old) Aug 3, 2022
@rsc
Contributor

rsc commented Aug 3, 2022

This proposal has been declined as retracted.
— rsc for the proposal review group

@golang golang locked and limited conversation to collaborators Aug 3, 2023