database/sql: add extra locking when calling driver #21117

blinsay · 2017-07-21T20:31:33Z

What version of Go are you using (`go version`)?

go version go1.8.3 darwin/amd64

What did you do?

Unlike Conn or Stmt there's no guidance on Tx indicating how the stdlib uses implementations. This seems to cause bugs in driver implementations, as a Tx is expected to handle concurrent calls from multiple goroutines unlike most of the other interfaces in the package.

This came up as I was debugging a panic inside of lib/pq and discovered it was a data race due to the authors not assuming that their Tx would have to deal with calls from multiple goroutines (the repro is here. It slowly adjusts timeouts to try and make the Context time out as results are coming back). Out of curiosity, I looked at go-sql-driver/mysql and the lack of a race there seems like a happy accident - the Context expiring occasionally causes an internal error to bubble up, since the authors appear to aggressively check their invariants.

I'm not sure whether this is a documentation problem or if there's something database/sql could do to make life easier for driver implementors.

The text was updated successfully, but these errors were encountered:

odeke-em · 2017-07-21T20:49:56Z

/cc @kardianos @bradfitz

chrispassas · 2017-08-30T03:38:12Z

In my case I have a pool of database connections. I open a transaction and run queries. If I use BeginTx() I get an crash panic overtime.

Part of why I need a transaction is so I can use postgresql statement_timeout. The only way to use it with a connection pool is to use a transaction.

If I just use Begin() without context it never crashes. In my case though I need to be able to cancel the transaction if conditions are met in another goroutine.

blinsay · 2017-08-30T04:05:46Z

Hey y'all,

It looks like driver libs are still tossing out panics in Go 1.9. The underlying problem with the way the stdlib and the drivers are misunderstanding each other seems like it's still there. This comment has some more info (race detector output, panic traces, and a reliable repro) on how lib/pq and database/sql interact to panic.

kardianos · 2017-08-31T17:25:08Z

This might not be a documentation issue but a driver issue.
row.Next locks on closemu, but doesn't lock on the driver connection lock. If DB.Next operations also locked on the dc, then we would probably correct this issue. Could someone apply the following diff locally and see if it fixes it? I need to do a fuller evaluation of what is going on as well.

In database/sql/sql.go

diff --git a/src/database/sql/sql.go b/src/database/sql/sql.go
index c609fe4..4f2c9f9 100644
--- a/src/database/sql/sql.go
+++ b/src/database/sql/sql.go
@@ -2454,6 +2454,9 @@ func (rs *Rows) nextLocked() (doClose, ok bool) {
        if rs.lastcols == nil {
                rs.lastcols = make([]driver.Value, len(rs.rowsi.Columns()))
        }
+       rs.dc.Lock()
+       defer rs.dc.Unlock()
+
        rs.lasterr = rs.rowsi.Next(rs.lastcols)
        if rs.lasterr != nil {
                // Close the connection if there is a driver error.
@@ -2503,6 +2506,8 @@ func (rs *Rows) NextResultSet() bool {
                doClose = true
                return false
        }
+       rs.dc.Lock()
+       defer rs.dc.Unlock()
        rs.lasterr = nextResultSet.NextResultSet()
        if rs.lasterr != nil {
                doClose = true

I need to do a full evaluation of this solution. But you could try it out and see if the issue goes away.

blinsay · 2017-09-01T01:39:29Z

I applied the patch and that seems to fix the panic and satisfy the race detector!

Now that the log lines aren't mostly race detector output/panic stacktrace I've started noticing that quite a lot of the errors that come back are sql.ErrNoRows. That seems surprising to me.

Here's some sample data after running the repro for a few minutes:

$ ~/oss/go/bin/go run -race repro.go &> repro.log

# wait a while, grab some coffee

$ grep -E -o "error=.*" < repro.log | sort | uniq -c
  46 error="context deadline exceeded"
   1 error="driver: bad connection"
3017 error="pq: canceling statement due to user request"
  42 error="sql: Rows are closed"
1369 error="sql: no rows in result set"

kardianos · 2017-09-01T15:31:19Z

@blinsay Good catch. Next step will be to determine why it is reporting no rows sometimes. I suspect it may be driver issue of returning the wrong error (NoRows rather then ctx.Error), but I haven't looked.

gopherbot · 2017-09-23T22:55:18Z

Change https://golang.org/cl/65731 mentions this issue: database/sql: prevent race in driver by locking dc in Next

chrispassas · 2017-09-25T15:09:40Z

Once fixed can this be released in go 1.9.1 or 1.9.2? This is causing me problems in my application and waiting for 1.10 would be hard.

Thank you

chrispassas · 2017-10-02T13:17:25Z

I'm not that familiar with the Go release process. This issue appears to be fixed but sitting in Code Review. What comes next? Is there any hope of getting this in a 1.9.x release?

https://go-review.googlesource.com/c/go/+/65731

ianlancetaylor · 2017-10-03T00:59:03Z

I've marked this for consideration for a 1.9 release.

rsc · 2017-10-13T18:39:16Z

Every time we touch the locking in database/sql it seems like we introduce a deadlock. What is the condition under which the problem being fixed happens? Are we sure that the expected benefit here outweighs the very real chance of introducing a deadlock in some other case?

rsc · 2017-10-13T18:40:19Z

Also, is this a regression from Go 1.8? If not, then we really should leave it alone and let the ordinary Go 1.10 testing happen.

kardianos · 2017-10-13T19:03:15Z

@rsc I don't believe this is a regression from go1.8.

The benefit is it prevents real races in real programs.

In this case the risk of deadlock seems non-existent. Unlike previous issues, this just using the existing locking in *driverConn as it is meant to be used, locking for the short period of time around driver connection calls. The only way this could introduce a deadlock is if a driver calls some public sql API referencing the same connection from within the driver.Next, which would be a blatant driver issue.

I offer no opinion on when it should be released.

gopherbot · 2017-10-17T23:02:14Z

Change https://golang.org/cl/71450 mentions this issue: Revert "database/sql: prevent race in driver by locking dc in Next"

gopherbot · 2017-10-17T23:18:22Z

Change https://golang.org/cl/71433 mentions this issue: database/sql: ensure all driver interfaces are called under single lock

This reverts commit 897080d. Reason for revert: Fails to fix all the locking issues. Updates #21117 Change-Id: I6fc9cb7897244d6e1af78c089a2bf383258ec049 Reviewed-on: https://go-review.googlesource.com/71450 Reviewed-by: Ian Lance Taylor <iant@golang.org>

gopherbot · 2017-10-18T02:02:03Z

Change https://golang.org/cl/71510 mentions this issue: [release-branch.go1.9] database/sql: prevent race in driver by locking dc in Next

rsc · 2017-10-19T03:08:01Z

@blinsay and @chrispassas and anyone else seeing these crashes: I am assuming you are using Linux. Can you please try https://golang-release-staging.storage.googleapis.com/go1.9.2/go1.9.2rc2.linux-amd64.tar.gz and see if the problem disappears? Thank you.

chrispassas · 2017-10-19T03:38:44Z

@rsc I just tried with 1.9.1 and can reproduce the crash reliably in a few seconds.

Using 1.9.2rc2 I've let it run for a few minutes 3 times with 0 crashes.

rsc · 2017-10-19T03:41:22Z

@chrispassas, thanks very much. Based on that, I'm comfortable with including that fix in the final Go 1.9.2.

chrispassas · 2017-10-19T03:43:45Z

@rsc Thank you, I'm looking forward to having it!

rsc · 2017-10-19T23:54:55Z

OK, I am going to approve the cherry-pick to Go 1.9.2 over in #22314 and leave this bug open to track additional locking fixes in Go 1.10.

…g dc in Next Database drivers should be called from a single goroutine to ease driver's design. If a driver chooses to handle context cancels internally it may do so. The sql package violated this agreement when calling Next or NextResultSet. It was possible for a concurrent rollback triggered from a context cancel to call a Tx.Rollback (which takes a driver connection lock) while a Rows.Next is in progress (which does not tack the driver connection lock). The current internal design of the sql package is each call takes roughly two locks: a closemu lock which prevents an disposing of internal resources (assigning nil or removing from lists) and a driver connection lock that prevents calling driver code from multiple goroutines. Fixes #21117 Change-Id: Ie340dc752a503089c27f57ffd43e191534829360 Reviewed-on: https://go-review.googlesource.com/65731 Reviewed-by: Ian Lance Taylor <iant@golang.org> Reviewed-on: https://go-review.googlesource.com/71510 Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Daniel Theophanes <kardianos@gmail.com>

cespare · 2017-12-21T05:03:43Z

I'm reopening this because the commit that auto-closed it was reverted afterwards by 292366e

Reason for revert: Fails to fix all the locking issues.

AFAICT this issue has not been addressed for 1.10. Feel free to close if that's not correct. (I was brought here by some serious Tx+context races/crashes in both 1.9.2 and 1.10beta1.)

chrispassas · 2017-12-21T05:07:56Z

The fix that went into version 1.9.2 did fix the crashes for QueryContext. If I use ExecContext and PrepareContext I do get crashes fairly often. It looked like right after 1.9.2 was tagged they went back and fixed the muteness for 1.10.

If I can get some time this week I'll do some test with 1.10 beta 1 to see if it still crashes.

kardianos · 2017-12-21T17:37:28Z

The fix for this did get in shortly after that revert. Yes, please verify @chrispassas , but as far as I know this is fixed.

cespare · 2017-12-21T18:47:08Z

@kardianos can you please link the fix here?

kardianos · 2017-12-21T19:18:32Z

1126d14
https://golang.org/cl/71433

cespare · 2017-12-22T18:02:56Z

@kardianos thanks, I missed that.

After running my crashing code with on tip + CL 85175 (for #23208) I don't see any issues.

I'm going to re-close this as we don't seem to have any reason to believe that there's still a locking issue here. We can revisit if @chrispassas finds something.

chrispassas · 2017-12-22T20:02:03Z

I have tested version 1.9.2 and QueryContext is good but ExecContext will trigger a depending how often your calling it.

I still need to test that 1.10-beta1 ExecContext is OK.

gopherbot added the Documentation label Jul 21, 2017

odeke-em changed the title ~~unclear docs causing races in sql/driver implementations~~ database/sql: unclear docs for Tx cause races in sql/driver implementations Jul 21, 2017

kardianos self-assigned this Jul 21, 2017

blinsay mentioned this issue Aug 29, 2017

panic when a Context is cancelled lib/pq#614

Closed

ianlancetaylor added NeedsFix and removed Documentation labels Oct 3, 2017

gopherbot added the Documentation label Oct 3, 2017

ianlancetaylor added this to the Go1.9.1 milestone Oct 3, 2017

RJPercival mentioned this issue Oct 3, 2017

Possible race condition in storage Rollback() v begin() google/trillian#617

Closed

rsc modified the milestones: Go1.9.1, Go1.9.2 Oct 4, 2017

gopherbot closed this as completed in 897080d Oct 13, 2017

rsc reopened this Oct 13, 2017

rsc added NeedsDecision and removed NeedsFix Documentation labels Oct 13, 2017

rsc changed the title ~~database/sql: unclear docs for Tx cause races in sql/driver implementations~~ database/sql: add extra locking when calling driver Oct 13, 2017

rsc mentioned this issue Oct 15, 2017

all: Go 1.9.2 release status #22277

Closed

rsc mentioned this issue Oct 18, 2017

database/sql: missing locking in certain Tx calls [Go 1.9] #22314

Closed

rsc modified the milestones: Go1.9.2, Go1.10 Oct 19, 2017

gopherbot closed this as completed in 1126d14 Oct 25, 2017

cespare reopened this Dec 21, 2017

cespare mentioned this issue Dec 21, 2017

database/sql: concurrent (*Tx).Stmt calls may crash when Tx is canceled #23208

Closed

cespare closed this as completed Dec 22, 2017

methane mentioned this issue Jan 4, 2018

Datarace on context cancellation go-sql-driver/mysql#728

Closed

felixge mentioned this issue Jan 23, 2018

database/sql: QueryContext() and Scan() data race #23519

Closed

golang locked and limited conversation to collaborators Dec 22, 2018

gopherbot added the FrozenDueToAge label Dec 22, 2018

rsc unassigned kardianos Jun 23, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

database/sql: add extra locking when calling driver #21117

database/sql: add extra locking when calling driver #21117

blinsay commented Jul 21, 2017 •

edited

Loading

odeke-em commented Jul 21, 2017

chrispassas commented Aug 30, 2017

blinsay commented Aug 30, 2017 •

edited

Loading

kardianos commented Aug 31, 2017

blinsay commented Sep 1, 2017

kardianos commented Sep 1, 2017

gopherbot commented Sep 23, 2017

chrispassas commented Sep 25, 2017 •

edited

Loading

chrispassas commented Oct 2, 2017

ianlancetaylor commented Oct 3, 2017

rsc commented Oct 13, 2017

rsc commented Oct 13, 2017

kardianos commented Oct 13, 2017

gopherbot commented Oct 17, 2017

gopherbot commented Oct 17, 2017

gopherbot commented Oct 18, 2017

rsc commented Oct 19, 2017

chrispassas commented Oct 19, 2017

rsc commented Oct 19, 2017

chrispassas commented Oct 19, 2017

rsc commented Oct 19, 2017

cespare commented Dec 21, 2017

chrispassas commented Dec 21, 2017

kardianos commented Dec 21, 2017

cespare commented Dec 21, 2017

kardianos commented Dec 21, 2017

cespare commented Dec 22, 2017

chrispassas commented Dec 22, 2017

database/sql: add extra locking when calling driver #21117

database/sql: add extra locking when calling driver #21117

Comments

blinsay commented Jul 21, 2017 • edited Loading

What version of Go are you using (go version)?

What did you do?

odeke-em commented Jul 21, 2017

chrispassas commented Aug 30, 2017

blinsay commented Aug 30, 2017 • edited Loading

kardianos commented Aug 31, 2017

blinsay commented Sep 1, 2017

kardianos commented Sep 1, 2017

gopherbot commented Sep 23, 2017

chrispassas commented Sep 25, 2017 • edited Loading

chrispassas commented Oct 2, 2017

ianlancetaylor commented Oct 3, 2017

rsc commented Oct 13, 2017

rsc commented Oct 13, 2017

kardianos commented Oct 13, 2017

gopherbot commented Oct 17, 2017

gopherbot commented Oct 17, 2017

gopherbot commented Oct 18, 2017

rsc commented Oct 19, 2017

chrispassas commented Oct 19, 2017

rsc commented Oct 19, 2017

chrispassas commented Oct 19, 2017

rsc commented Oct 19, 2017

cespare commented Dec 21, 2017

chrispassas commented Dec 21, 2017

kardianos commented Dec 21, 2017

cespare commented Dec 21, 2017

kardianos commented Dec 21, 2017

cespare commented Dec 22, 2017

chrispassas commented Dec 22, 2017

blinsay commented Jul 21, 2017 •

edited

Loading

What version of Go are you using (`go version`)?

blinsay commented Aug 30, 2017 •

edited

Loading

chrispassas commented Sep 25, 2017 •

edited

Loading