Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

database/sql: Using RawBytes and cancelling query context is inherently racy #53970

Closed
mhr3 opened this issue Jul 20, 2022 · 23 comments
Closed
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@mhr3
Copy link

mhr3 commented Jul 20, 2022

What version of Go are you using (go version)?

$ go version
go version go1.18.3 darwin/arm64

Does this issue reproduce with the latest release?

haven't tried 1.19

What did you do?

something along these lines 馃斀

	rows, err := stmt.QueryContext(ctx)
	if err != nil {
		return info, nil, err
	}
	defer rows.Close()

	var data        sql.RawBytes

	for rows.Next() {
		err = rows.Scan(&data)
		if err != nil {
			return info, sel, err
		}

		// if ctx is canceled around this point, rows.Close() will be called and the data can be overwritten, causing a data race
		doWork(data)
	}

What did you expect to see?

No data races, no matter whether the context is cancelled or not.

What did you see instead?

We noticed this when using lib/pq, but upon looking deeper into it, there's not much lib/pq can do, it just sees a call to driver.Rows.Close(), and can't distinguish whether this is coming from within the iteration goroutine or because of context cancellation.
Therefore it reuses the buffer that was previously passed to Scan's RawBytes param and causes a data race because doWork is still reading the data.

@seankhliao
Copy link
Member

I think this is working as designed:
sql.RawBytes is explicitly a reference to database owned memory, and it's the user code's responsibility to ensure it's not used beyond the valid lifetime:

After a Scan into a RawBytes, the slice is only valid until the next call to Next, Scan, or Close.

@mhr3
Copy link
Author

mhr3 commented Jul 20, 2022

But it's the standard library that calls the Close, not user code. Which means the user code has no control over when it needs to stop using the data.

rs.close(ctx.Err())

@seankhliao
Copy link
Member

The user code passes in a context to control the lifetime, it needs to be aware of the lifetime too.

@mhr3
Copy link
Author

mhr3 commented Jul 20, 2022

Are you saying using sql.RawBytes means you cannot pass an externally controlled context into QueryContext()? If so that should be properly documented.

@seankhliao
Copy link
Member

I'm saying that context needs to be propagated into your doWork and the context deadline followed in there too.

cc @kardianos

@mhr3
Copy link
Author

mhr3 commented Jul 20, 2022

RawBytes is just a []byte, there's no way to disallow reading from a slice with a context, so propagating it into doWork() wouldn't help.

@toothrot
Copy link
Contributor

A better place to ask this kind of question is golang-nuts. See https://golang.org/wiki/Questions.

Closing, as this isn't a bug in Go or the standard library. This is working as intended. Feel free to ask on one of the resources linked in Questions for help.

@stevenh
Copy link
Contributor

stevenh commented Jul 25, 2022

As mentioned on golang-nuts, I believe @mhr3 has identified a legitimate issue here, as there is currently no way to use sql.RawBytes safely in code which uses also uses context.

@ianlancetaylor
Copy link
Contributor

CC @kardianos

@cherrymui cherrymui added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jul 25, 2022
@cherrymui cherrymui added this to the Backlog milestone Jul 25, 2022
@kardianos
Copy link
Contributor

@stevenh

I think you might be correct if calling a query directly from *DB. For when a query is cancelled, that connection and associated buffers may be re-used by another connection.

I believe it would be correct that this is not racy if using RawBytes from a Conn or Tx, assuming the application itself is written correctly. Would you agree?

@stevenh
Copy link
Contributor

stevenh commented Jul 26, 2022

To confirm @kardianos does your second case include code like original example?

func doQuery(ctx context.Context, stmt *sql.Stmt) error {
	rows, err := stmt.QueryContext(ctx)
	if err != nil {
		return err
	}
	defer rows.Close()

	var data  sql.RawBytes
	for rows.Next() {
		if err = rows.Scan(&data); err != nil {
			return err
		}

		// This is racey if ctx is cancelled after rows.Scan complete and before
                // we complete due to the close in QueryContext which can free / reuse
                // the driver memory.
		fmt.Println(data)
	}
        return rows.Err()
}

@kardianos
Copy link
Contributor

On that note, it is difficult to use a Stmt on Tx and maybe not possible to run an existing Stmt on a specific Conn. Given that stmt will pull a random conn from the pool, this will be an issue. We may want to add *Conn.StmtContext where it will use a stmt on a specific conn, like you can today with Tx.

@stevenh
Copy link
Contributor

stevenh commented Jul 26, 2022

That may be useful, but I don't believe it's needed to solve this problem.

@stevenh
Copy link
Contributor

stevenh commented Jul 26, 2022

Looks like both DB.BeginTx and Conn.BeginTx have the same issue as they call go tx.awaitDone() which calls close.

Conn.QueryContext looks OK.

@kardianos
Copy link
Contributor

@stevenh Yes and no. The BeginTx context will trigger a rollback if canceled. So if the context provided by BeginTx is deterministic, that is either context.Background() or WithCancel where the cancel is scoped. Once running in a Tx, the per query context should be alright, because the connection won't be reused.

I can see two solutions to this:

  1. Update the documentation on RawBytes on the specific cases where this is safe.
  2. Provide a new API that prevents the context from closing during an operation. RawBytes documentation could be modified to instruct the user to only use RawBytes within WithLock.

This new API could look like:

// WithLock prevents any additional calls to the database driver.
// Scan may be called within WithLock, but Next may not be.
// Scan RawBytes in WithLock and consider the RawBytes invalid outside of WithLock.
func (r *Rows) WithLock(f func() error) error {
    r.dc.Lock()
    defer r.dc.Unlock()
    return f()
}

Then the pattern for use would be:

var raw sql.RawBytes
for r.Next() {
	err := r.WithLock(func() {
		err := r.Scan(&raw)
		if err != nil {
			return err
		}
		fmt.Println(data)
	})
	if err != nil {
		return err
	}
}

I think this should work: Next calls closemu.RLock and dc.Lock, but r.Scan only calls closemu.RLock. By locking the calls to the driver, the buffer provided by the driver can remain un-touched.

@stevenh
Copy link
Contributor

stevenh commented Sep 5, 2022

While that seems like a potential, I suspect it would lead to continued issues because its a fragile API for consumers, relying on knowledge of the buffer type which may not be know depending on how abstractions are constructed.

Instead I would suggest we investigate changing the flow so that close triggered by context calls such as QueryContext can only occur before the function returns, if that is even desirable?

After that it should be the callers responsibility, which matches the current docs as I can't find any reference in these methods that tells the uses rows could be closed under the hood except when interaction completes.

@kardianos
Copy link
Contributor

@stevenh Only canceling while in the QueryContext function is not something I would support.

On SQL DB Servers, query execution context is extremely important. If you have an operation that effectively select * from TBLongTable; the query (in a good DB) will return quickly, but the rows will continue on for a long time. Rows must be able to hook into the same cancellation policy. Furthermore, because the context is passed into the driver, this makes it difficult to not let it be canceled right away.

@stevenh
Copy link
Contributor

stevenh commented Sep 6, 2022

To clarify I wasn't suggesting not cancelling, as you said that wouldn't work, the intent was for us to specifically to not call Close.

If consider the following:

func doQuery(ctx context.Context, stmt *sql.Stmt) error {
	rows, err := stmt.QueryContext(ctx)
	if err != nil {
		return err
	}

	defer rows.Close()
       ...
}

If QueryContext doesn't return an error the user is not expecting rows.Close() to be called irrespective of ctx being cancelled. If we can ensure that is the case I believe we'll solve the issue at hand.

@rsc
Copy link
Contributor

rsc commented May 19, 2023

Closing as a duplicate of #60304, which has a crisper presentation of the problem. Apologies for not understanding this problem better when it was filed.

@rsc rsc closed this as not planned Won't fix, can't repro, duplicate, stale May 19, 2023
@bradfitz
Copy link
Contributor

Sorry, I should've searched & found this bug before I filed my dup.

@gopherbot
Copy link

Change https://go.dev/cl/497675 mentions this issue: database/sql: fix use of Rows with context + RawBytes

gopherbot pushed a commit that referenced this issue May 24, 2023
sql.RawBytes was added the very first Go release, Go 1. Its docs
say:

> RawBytes is a byte slice that holds a reference to memory owned by
> the database itself. After a Scan into a RawBytes, the slice is only
> valid until the next call to Next, Scan, or Close.

That "only valid until the next call" bit was true at the time,
until contexts were added to database/sql in Go 1.8.

In the past ~dozen releases it's been unsafe to use QueryContext with
a context that might become Done to get an *sql.Rows that's scanning
into a RawBytes. The Scan can succeed, but then while the caller's
reading the memory, a database/sql-managed goroutine can see the
context becoming done and call Close on the database/sql/driver and
make the caller's view of the RawBytes memory no longer valid,
introducing races, crashes, or database corruption. See #60304
and #53970 for details.

This change does the minimal surgery on database/sql to make it safe
again: Rows.Scan was already acquiring a mutex to check whether the
rows had been closed, so this change make Rows.Scan notice whether
*RawBytes was used and, if so, doesn't release the mutex on exit
before returning. That mean it's still locked while the user code
operates on the RawBytes memory and the concurrent context-watching
goroutine to close the database still runs, but if it fires, it then
gets blocked on the mutex until the next call to a Rows method (Next,
NextResultSet, Err, Close).

Updates #60304
Updates #53970 (earlier one I'd missed)

Change-Id: Ie41c0c6f32c24887b2f53ec3686c2aab73a1bfff
Reviewed-on: https://go-review.googlesource.com/c/go/+/497675
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
@stevenh
Copy link
Contributor

stevenh commented May 24, 2023

Great to see this fixed, thanks @bradfitz!

bradfitz added a commit to tailscale/go that referenced this issue May 25, 2023
sql.RawBytes was added the very first Go release, Go 1. Its docs
say:

> RawBytes is a byte slice that holds a reference to memory owned by
> the database itself. After a Scan into a RawBytes, the slice is only
> valid until the next call to Next, Scan, or Close.

That "only valid until the next call" bit was true at the time,
until contexts were added to database/sql in Go 1.8.

In the past ~dozen releases it's been unsafe to use QueryContext with
a context that might become Done to get an *sql.Rows that's scanning
into a RawBytes. The Scan can succeed, but then while the caller's
reading the memory, a database/sql-managed goroutine can see the
context becoming done and call Close on the database/sql/driver and
make the caller's view of the RawBytes memory no longer valid,
introducing races, crashes, or database corruption. See golang#60304
and golang#53970 for details.

This change does the minimal surgery on database/sql to make it safe
again: Rows.Scan was already acquiring a mutex to check whether the
rows had been closed, so this change make Rows.Scan notice whether
*RawBytes was used and, if so, doesn't release the mutex on exit
before returning. That mean it's still locked while the user code
operates on the RawBytes memory and the concurrent context-watching
goroutine to close the database still runs, but if it fires, it then
gets blocked on the mutex until the next call to a Rows method (Next,
NextResultSet, Err, Close).

Updates golang#60304
Updates golang#53970 (earlier one I'd missed)

Change-Id: Ie41c0c6f32c24887b2f53ec3686c2aab73a1bfff
Reviewed-on: https://go-review.googlesource.com/c/go/+/497675
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
bradfitz added a commit to tailscale/go that referenced this issue May 25, 2023
sql.RawBytes was added the very first Go release, Go 1. Its docs
say:

> RawBytes is a byte slice that holds a reference to memory owned by
> the database itself. After a Scan into a RawBytes, the slice is only
> valid until the next call to Next, Scan, or Close.

That "only valid until the next call" bit was true at the time,
until contexts were added to database/sql in Go 1.8.

In the past ~dozen releases it's been unsafe to use QueryContext with
a context that might become Done to get an *sql.Rows that's scanning
into a RawBytes. The Scan can succeed, but then while the caller's
reading the memory, a database/sql-managed goroutine can see the
context becoming done and call Close on the database/sql/driver and
make the caller's view of the RawBytes memory no longer valid,
introducing races, crashes, or database corruption. See golang#60304
and golang#53970 for details.

This change does the minimal surgery on database/sql to make it safe
again: Rows.Scan was already acquiring a mutex to check whether the
rows had been closed, so this change make Rows.Scan notice whether
*RawBytes was used and, if so, doesn't release the mutex on exit
before returning. That mean it's still locked while the user code
operates on the RawBytes memory and the concurrent context-watching
goroutine to close the database still runs, but if it fires, it then
gets blocked on the mutex until the next call to a Rows method (Next,
NextResultSet, Err, Close).

Updates golang#60304
Updates golang#53970 (earlier one I'd missed)

Change-Id: Ie41c0c6f32c24887b2f53ec3686c2aab73a1bfff
Reviewed-on: https://go-review.googlesource.com/c/go/+/497675
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
bradfitz added a commit to tailscale/go that referenced this issue May 25, 2023
sql.RawBytes was added the very first Go release, Go 1. Its docs
say:

> RawBytes is a byte slice that holds a reference to memory owned by
> the database itself. After a Scan into a RawBytes, the slice is only
> valid until the next call to Next, Scan, or Close.

That "only valid until the next call" bit was true at the time,
until contexts were added to database/sql in Go 1.8.

In the past ~dozen releases it's been unsafe to use QueryContext with
a context that might become Done to get an *sql.Rows that's scanning
into a RawBytes. The Scan can succeed, but then while the caller's
reading the memory, a database/sql-managed goroutine can see the
context becoming done and call Close on the database/sql/driver and
make the caller's view of the RawBytes memory no longer valid,
introducing races, crashes, or database corruption. See golang#60304
and golang#53970 for details.

This change does the minimal surgery on database/sql to make it safe
again: Rows.Scan was already acquiring a mutex to check whether the
rows had been closed, so this change make Rows.Scan notice whether
*RawBytes was used and, if so, doesn't release the mutex on exit
before returning. That mean it's still locked while the user code
operates on the RawBytes memory and the concurrent context-watching
goroutine to close the database still runs, but if it fires, it then
gets blocked on the mutex until the next call to a Rows method (Next,
NextResultSet, Err, Close).

Updates golang#60304
Updates golang#53970 (earlier one I'd missed)

Change-Id: Ie41c0c6f32c24887b2f53ec3686c2aab73a1bfff
Reviewed-on: https://go-review.googlesource.com/c/go/+/497675
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
@gopherbot
Copy link

Change https://go.dev/cl/498398 mentions this issue: database/sql: fix regression from earlier RawBytes fix

bradfitz added a commit to tailscale/go that referenced this issue May 25, 2023
The earlier CL 497675 for golang#60304 introduced a behavior change
that, while not strictly a bug, caused a bunch of test failures
in a large codebase. Rather than add behavior changes in a 10 year
old package, revert to the old behavior: a context cancelation
between Rows.Next reporting false and a call to Rows.Err should
not result in Rows.Err returning the context error.

That behavior was accidentally added in CL 497675 as part of changing
how contexts and Rows iteration worked.

Updates golang#60304
Updates golang#53970

Change-Id: I22f8a6a6b0b5a94b430576cf50e015efd01ec652
gopherbot pushed a commit that referenced this issue May 26, 2023
The earlier CL 497675 for #60304 introduced a behavior change
that, while not strictly a bug, caused a bunch of test failures
in a large codebase. Rather than add behavior changes in a 10 year
old package, revert to the old behavior: a context cancelation
between Rows.Next reporting false and a call to Rows.Err should
not result in Rows.Err returning the context error.

That behavior was accidentally added in CL 497675 as part of changing
how contexts and Rows iteration worked.

Updates #60304
Updates #53970

Change-Id: I22f8a6a6b0b5a94b430576cf50e015efd01ec652
Reviewed-on: https://go-review.googlesource.com/c/go/+/498398
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

10 participants