Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

database/sql: cancelled context in a transaction gives a variable error #43507

Open
natefinch opened this issue Jan 5, 2021 · 6 comments
Open
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@natefinch
Copy link
Contributor

natefinch commented Jan 5, 2021

Summary

When you're running a transaction with a context, and the context get cancelled, the error you get back when you go to commit is not always the same. Sometimes you see sql.ErrTxDone and sometimes you see context.Cancelled. If you commit after the context is cancelled but before database/sql rolls back the transaction in response to the context being done, then you get context.Cancelled. If database/sql finishes the rollback before you call Commit, then you get sql.ErrTxDone.

This is very confusing, because context.Cancelled is normal and expected in API handlers, but sql.ErrTxDone usually means there's a bug in the code (i.e. trying to use a transaction that's already been finished). We spent way too long at work worrying over this "bug" in our code before finally figuring out that this is just a different error sometimes returned when a context gets cancelled.

The big problem is that when you get sql.ErrTxDone, there's no way to distinguish between "hey dummy, you already called Commit or Rollback, that's a bug" and "the context you were using got cancelled, so we rolled back, no big deal".

Ideally I'd like the sql.Tx to record when it rolls back a transaction due to a cancelled context, so when I get sql.ErrTxDone, I can check that and know for sure if someone's introduced a bug, or if it's just a cancelled context.

i.e.

err := tx.Commit()
if err == sql.ErrTxDone && tx.WasCancelled() {
    return context.Cancelled
}
return err

(or some such.... I mean, ideally I would have just had tx.Commit return context.Cancelled in the first place, but that would be a behavior change that I presume is Not Allowed)

What version of Go are you using (go version)?

1.15.6

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/finchnat/Library/Caches/go-build"
GOENV="/Users/finchnat/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/finchnat/pkg/mod"
GONOPROXY="github.com/Mattel/*"
GONOSUMDB="github.com/Mattel/*"
GOOS="darwin"
GOPATH="/Users/finchnat"
GOPRIVATE="github.com/Mattel/*"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/Users/finchnat/sdk/go1.15.6"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/Users/finchnat/sdk/go1.15.6/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/finchnat/mattel/mcpp/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/ch/4d1vgsv17jq0b3yfmnkl9thdk0jm7d/T/go-build240587215=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

func main() {
	if err := run(); err != nil {
		log.Fatal(err)
	}
}

func run() error {
	db, err := sql.Open("postgres", "your connection string here")
	if err != nil {
		return fmt.Errorf("error opening database: %w", err)
	}
	defer db.Close()
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return fmt.Errorf("error beginning transaction: %w", err)
	}
	// this is ok because rollback after a commit becomes a noop.
	defer tx.Rollback()
	count := 0
	err = tx.QueryRowContext(ctx, "select count(1) from users").Scan(&count)
	if err != nil {
		return fmt.Errorf("error from QueryRowContext: %w", err)
	}
	cancel()
	time.Sleep(time.Second)
	if err := tx.Commit(); err != nil {
		return fmt.Errorf("error from commit: %w", err)
	}
	return nil
}

Note that if you remove the time.Sleep, you get context.Cancelled and if you leave it in there you get sql.ErrTxDone.

What did you expect to see?

"context cancelled"

What did you see instead?

"transaction already committed or rolled back"

@natefinch natefinch changed the title database/sql: cancelled context in a transaction gives a variable error message database/sql: cancelled context in a transaction gives a variable error Jan 5, 2021
@toothrot toothrot added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jan 5, 2021
@toothrot toothrot added this to the Backlog milestone Jan 5, 2021
@toothrot
Copy link
Contributor

toothrot commented Jan 5, 2021

/cc @bradfitz @kardianos

@kardianos
Copy link
Contributor

It would be hard to change this behavior without breaking existing code. Not only that, this may be loose information that is needed. In other words, it is meaningful to know if the Tx has committed or not. You may not care, others surely will.

@natefinch
Copy link
Contributor Author

Sure sure... that's why my solution was to add a field/method to sql.Tx that would record the fact that it got rolled back via the context finishing.

@mcholismalik
Copy link

is there any solution for this ?

@molon
Copy link

molon commented Jul 20, 2023

go/src/database/sql/sql.go

Lines 2249 to 2263 in 5fe3f0a

func (tx *Tx) Commit() error {
// Check context first to avoid transaction leak.
// If put it behind tx.done CompareAndSwap statement, we can't ensure
// the consistency between tx.done and the real COMMIT operation.
select {
default:
case <-tx.ctx.Done():
if tx.done.Load() {
return ErrTxDone
}
return tx.ctx.Err()
}
if !tx.done.CompareAndSwap(false, true) {
return ErrTxDone
}

I am also puzzled by this.
Why check tx.done.Load() after ctx.Done ? Is there any advantage to this?

Generally, after ctx.Cancelled , the caller will not continue to care about the internal information of tx.
If it is necessary to get whether tx has been committed at this time, why not expose the isDone method?

@mfridman
Copy link

Sure sure... that's why my solution was to add a field/method to sql.Tx that would record the fact that it got rolled back via the context finishing.

I think extending sql.Tx is the most sane solution to propagate this information to the caller while maintaining backward compatibility.

There's code in the wild that discards the error conflating it with a cancellation, but unfortunately, this masks potentially real issues. (Especially ones where you shot yourself in the foot and ended up with an aborted transaction).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

6 participants