
database/sql: Exec() never returns when IP is not reachable and maxOpenConns is reached #10886

Closed
joew1 opened this issue May 17, 2015 · 10 comments

@joew1

joew1 commented May 17, 2015

With go1.4.2, maxOpenConns set to 2, and an unreachable IP:
- the first two Exec() calls return an error
- but the third Exec() never returns

import "database/sql"
import "log"
import _ "github.com/go-sql-driver/mysql"
...
db, err := sql.Open("mysql", "root:password@tcp(1.2.3.4:3306)/mysql?timeout=15s")
if err != nil {
	log.Fatal(err)
}
db.SetMaxOpenConns(2)
db.SetMaxIdleConns(0)
for i := 0; i < 3; i++ {
	go func() {
		_, err := db.Exec("SELECT SLEEP(5)")
		if err != nil {
			log.Println(err)
		}
		log.Println("Done")
	}()
}
...

Adding a call to db.maybeOpenNewConnections() after database/sql/sql.go:667 seems to fix this issue. But I'm not sure whether this is the correct fix or not. Please help. Thanks.

664     ci, err := db.driver.Open(db.dsn)
665     if err != nil {
666             db.mu.Lock()
667             db.numOpen-- // correct for earlier optimism
                db.maybeOpenNewConnections() // <-- proposed addition
668             db.mu.Unlock()
669             return nil, err
670     }
@johto
Contributor

johto commented May 19, 2015

The fix looks reasonable to me.

@johto
Contributor

johto commented May 19, 2015

Hmm... though on further thought, I think that might open up a race condition. Consider:

  1. numOpen = 1, and conn() decides to open a connection
  2. The connection attempt fails and maybeOpenNewConnections() writes into db.openerCh
  3. Another goroutine calls conn() again, sees numOpen = 1 and decides to open a connection
  4. connectionOpener receives on db.openerCh and opens a third connection

Now you have three connections open, even though maxOpenConns is two.

I'm not sure whether a similar race condition already exists in the code, or what the best way to fix it is. The simplest fix appears to be to change this code in conn:

    if db.maxOpen > 0 && db.numOpen >= db.maxOpen

to also take db.pendingOpens into account. But that might break some code path which already works today.

@ianlancetaylor ianlancetaylor added this to the Go1.5Maybe milestone Jun 3, 2015
@rsc rsc modified the milestones: Unplanned, Go1.5Maybe Jul 14, 2015
@ChrisHines
Contributor

I have encountered this problem. Can we add it to the Go1.6 milestone? I can help work on the fix if that adds weight to my request.

@potocnyj
Contributor

potocnyj commented Sep 2, 2015

I have a potential fix for this sitting in my local checkout. It's a little ugly, but I'd be happy to mail a CL if someone wants to review and propose edits.

@minux
Member

minux commented Sep 2, 2015 via email

@potocnyj
Contributor

potocnyj commented Sep 2, 2015

Of course - the changes proposed above cover the case of db.maxOpen+1 connection attempts being made while we are unable to connect to the database, but in order to cover db.maxOpen+2 or more, I found that there's an additional check required in the connectionOpener routine that processes queued connections. I came up with something like this:

 func (db *DB) connectionOpener() {
        for range db.openerCh {
                db.openNewConnection()
+
+               db.mu.Lock()
+               if db.numOpen == 0 {
+                       // If there are no open connections,
+                       // check to make sure we don't have any connection requests
+                       // waiting for this one to be freed.
+                       // This is done in a separate goroutine so that there is no chance
+                       // for us to cause a deadlock.
+                       go func() {
+                               db.mu.Lock()
+                               db.maybeOpenNewConnections()
+                               db.mu.Unlock()
+                       }()
+               }
+               db.mu.Unlock()
        }
 }

As I said this is kind of ugly, but as far as I can see we need to have some sort of check here if we want to preserve the ability to open connections in parallel.

@ChrisHines
Contributor

I have written a test that reproduces this issue. The issue still exists in 1.5.1.

The proposed fix in DB.conn shown below allows the test to pass for db.maxOpen+2 connection attempts when added to the Go 1.5.1 tree.

    db.numOpen++ // optimistically
    db.mu.Unlock()
    ci, err := db.driver.Open(db.dsn)
    if err != nil {
        db.mu.Lock()
        db.numOpen-- // correct for earlier optimism
        db.maybeOpenNewConnections()        // <--- fix
        db.mu.Unlock()
        return nil, err
    }

@johto Although the scenario you describe does allow a third connection to be opened, the change made in cce127a causes the extra connection to be closed before it is ever added to the pool or services any queries. Your suggestion to change

if db.maxOpen > 0 && db.numOpen >= db.maxOpen

to

if db.maxOpen > 0 && (db.numOpen+db.pendingOpens) >= db.maxOpen

seems correct to me. All database/sql tests still pass when I make that change.

@potocnyj I don't understand how the proposed fix doesn't handle db.maxOpen+2. db.maybeOpenNewConnections can queue up multiple requests per call if needed. Could you walk us through the sequence that doesn't work?

I will submit a CL with the new test and the above changes if there are no objections.

@potocnyj
Contributor

@ChrisHines I appear to have misspoken - without the additional change I proposed db.maxOpen+2 works in my test program, but db.maxOpen+3 hangs on the last connection. Adding my change (in addition to what was already proposed) works though, no matter how many connections I open simultaneously.

For reference, the full code I am using to test the failure is here: https://gist.github.com/potocnyj/fe05406010ad73660889

@ChrisHines
Contributor

@potocnyj OK, thanks for the correction. Yes, I understand now. Opening db.maxOpen*2+n (for n>0) will not work. Once db.openNewConnection becomes the sole source of new connections—and they fail—we aren't triggering additional attempts to service the still pending requests.

I have additions to the proposed fix that work in this scenario, and it actually simplifies the logic.

I propose we consolidate the numOpen and pendingOpens counters into numOpen and always increment numOpen optimistically. Whenever an attempted connection fails we decrement numOpen and call db.maybeOpenNewConnections. This change simplifies type DB by removing a field. It also simplifies and harmonizes the logic in db.conn, db.openNewConnection and db.maybeOpenNewConnections.

During testing I also uncovered a panic that occurred when DB.Close was called while len(db.connRequests) > 0. DB.Close ranges over db.connRequests and closes each response channel. This resulted in the goroutines waiting on those channels to unblock and operate on a zero value sql.connRequest, which contained both a nil *driverConn and error. When that was returned to db.exec it caused a nil pointer panic. The fix is to use the comma-ok idiom to detect channel closure and return errDBClosed.

With these changes all database/sql tests pass, including the condition @potocnyj highlighted. I think this is good enough to mail a CL, and the changes are difficult to fully appreciate when described here.

@gopherbot

CL https://golang.org/cl/14547 mentions this issue.

@golang golang locked and limited conversation to collaborators Oct 17, 2016