database/sql: cancelled context in a transaction gives a variable error #43507

natefinch · 2021-01-05T04:18:14Z

Summary

When you're running a transaction with a context, and the context get cancelled, the error you get back when you go to commit is not always the same. Sometimes you see sql.ErrTxDone and sometimes you see context.Cancelled. If you commit after the context is cancelled but before database/sql rolls back the transaction in response to the context being done, then you get context.Cancelled. If database/sql finishes the rollback before you call Commit, then you get sql.ErrTxDone.

This is very confusing, because context.Cancelled is normal and expected in API handlers, but sql.ErrTxDone usually means there's a bug in the code (i.e. trying to use a transaction that's already been finished). We spent way too long at work worrying over this "bug" in our code before finally figuring out that this is just a different error sometimes returned when a context gets cancelled.

The big problem is that when you get sql.ErrTxDone, there's no way to distinguish between "hey dummy, you already called Commit or Rollback, that's a bug" and "the context you were using got cancelled, so we rolled back, no big deal".

Ideally I'd like the sql.Tx to record when it rolls back a transaction due to a cancelled context, so when I get sql.ErrTxDone, I can check that and know for sure if someone's introduced a bug, or if it's just a cancelled context.

i.e.

err := tx.Commit()
if err == sql.ErrTxDone && tx.WasCancelled() {
    return context.Cancelled
}
return err

(or some such.... I mean, ideally I would have just had tx.Commit return context.Cancelled in the first place, but that would be a behavior change that I presume is Not Allowed)

What version of Go are you using (`go version`)?

1.15.6

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (`go env`)?

go env Output

$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/finchnat/Library/Caches/go-build"
GOENV="/Users/finchnat/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/finchnat/pkg/mod"
GONOPROXY="github.com/Mattel/*"
GONOSUMDB="github.com/Mattel/*"
GOOS="darwin"
GOPATH="/Users/finchnat"
GOPRIVATE="github.com/Mattel/*"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/Users/finchnat/sdk/go1.15.6"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/Users/finchnat/sdk/go1.15.6/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/finchnat/mattel/mcpp/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/ch/4d1vgsv17jq0b3yfmnkl9thdk0jm7d/T/go-build240587215=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

func main() {
	if err := run(); err != nil {
		log.Fatal(err)
	}
}

func run() error {
	db, err := sql.Open("postgres", "your connection string here")
	if err != nil {
		return fmt.Errorf("error opening database: %w", err)
	}
	defer db.Close()
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return fmt.Errorf("error beginning transaction: %w", err)
	}
	// this is ok because rollback after a commit becomes a noop.
	defer tx.Rollback()
	count := 0
	err = tx.QueryRowContext(ctx, "select count(1) from users").Scan(&count)
	if err != nil {
		return fmt.Errorf("error from QueryRowContext: %w", err)
	}
	cancel()
	time.Sleep(time.Second)
	if err := tx.Commit(); err != nil {
		return fmt.Errorf("error from commit: %w", err)
	}
	return nil
}

Note that if you remove the time.Sleep, you get context.Cancelled and if you leave it in there you get sql.ErrTxDone.

What did you expect to see?

"context cancelled"

What did you see instead?

"transaction already committed or rolled back"

The text was updated successfully, but these errors were encountered:

toothrot · 2021-01-05T18:00:04Z

/cc @bradfitz @kardianos

kardianos · 2021-01-05T19:30:09Z

It would be hard to change this behavior without breaking existing code. Not only that, this may be loose information that is needed. In other words, it is meaningful to know if the Tx has committed or not. You may not care, others surely will.

natefinch · 2021-01-05T21:41:17Z

Sure sure... that's why my solution was to add a field/method to sql.Tx that would record the fact that it got rolled back via the context finishing.

mcholismalik · 2022-12-08T09:08:04Z

is there any solution for this ?

molon · 2023-07-20T03:31:23Z

go/src/database/sql/sql.go

Lines 2249 to 2263 in 5fe3f0a

    
           func (tx *Tx) Commit() error { 
        
           	// Check context first to avoid transaction leak. 
        
           	// If put it behind tx.done CompareAndSwap statement, we can't ensure 
        
           	// the consistency between tx.done and the real COMMIT operation. 
        
           	select { 
        
           	default: 
        
           	case <-tx.ctx.Done(): 
        
           		if tx.done.Load() { 
        
           			return ErrTxDone 
        
           		} 
        
           		return tx.ctx.Err() 
        
           	} 
        
           	if !tx.done.CompareAndSwap(false, true) { 
        
           		return ErrTxDone 
        
           	}

I am also puzzled by this.
Why check tx.done.Load() after ctx.Done ? Is there any advantage to this?

Generally, after ctx.Cancelled , the caller will not continue to care about the internal information of tx.
If it is necessary to get whether tx has been committed at this time, why not expose the isDone method?

mfridman · 2024-01-25T03:06:04Z

Sure sure... that's why my solution was to add a field/method to sql.Tx that would record the fact that it got rolled back via the context finishing.

I think extending sql.Tx is the most sane solution to propagate this information to the caller while maintaining backward compatibility.

There's code in the wild that discards the error conflating it with a cancellation, but unfortunately, this masks potentially real issues. (Especially ones where you shot yourself in the foot and ended up with an aborted transaction).

natefinch changed the title ~~database/sql: cancelled context in a transaction gives a variable error message~~ database/sql: cancelled context in a transaction gives a variable error Jan 5, 2021

toothrot added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jan 5, 2021

toothrot added this to the Backlog milestone Jan 5, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

database/sql: cancelled context in a transaction gives a variable error #43507

database/sql: cancelled context in a transaction gives a variable error #43507

natefinch commented Jan 5, 2021 •

edited

toothrot commented Jan 5, 2021

kardianos commented Jan 5, 2021

natefinch commented Jan 5, 2021

mcholismalik commented Dec 8, 2022

molon commented Jul 20, 2023

mfridman commented Jan 25, 2024

database/sql: cancelled context in a transaction gives a variable error #43507

database/sql: cancelled context in a transaction gives a variable error #43507

Comments

natefinch commented Jan 5, 2021 • edited

Summary

What version of Go are you using (go version)?

Does this issue reproduce with the latest release?

What operating system and processor architecture are you using (go env)?

What did you do?

What did you expect to see?

What did you see instead?

toothrot commented Jan 5, 2021

kardianos commented Jan 5, 2021

natefinch commented Jan 5, 2021

mcholismalik commented Dec 8, 2022

molon commented Jul 20, 2023

mfridman commented Jan 25, 2024

natefinch commented Jan 5, 2021 •

edited

What version of Go are you using (`go version`)?

What operating system and processor architecture are you using (`go env`)?