Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

database/sql: Scan() appends data to RawBytes #65201

Closed
methane opened this issue Jan 22, 2024 · 14 comments
Closed

database/sql: Scan() appends data to RawBytes #65201

methane opened this issue Jan 22, 2024 · 14 comments
Labels
NeedsFix The path to resolution is known, but the work has not been done.

Comments

@methane
Copy link
Contributor

methane commented Jan 22, 2024

Go version

go1.22

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/methane/.cache/go-build'
GOENV='/home/methane/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/methane/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/methane/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/methane/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/home/methane/local/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.21.3'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/home/methane/work/go-mysql/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build2142438049=/tmp/go-build -gno-record-gcc-switches'

What did you do?

See go-sql-driver/mysql#1539

This is very complex to reproduce in real code because it is relating mysql protocol and go-mysql double buffering.

So I explain what is happening in this simple code and comments.

var buf sql.RawBytes
rows := tx.Query(tx, "SELECT 'hoge'")
rows.Next()      // driver returns []byte("hoge") that references internal buffer.
rows.Scan(&buf)  // buf now points to driver's internal buffer.
rows.Close()

rows = tx.Query(tx, "SELECT 42")
rows.Next()      // driver returns int64(42).
rows.Scan(&buf)  // sql does `buf = strconv.AppendInt([]byte(buf)[:0], 42)`. That write "42" to the internal buffer.

The place database/sql breaks the buffer is here.

case *RawBytes:
sv = reflect.ValueOf(src)
if b, ok := asBytes([]byte(*d)[:0], sv); ok {
*d = RawBytes(b)

func asBytes(buf []byte, rv reflect.Value) (b []byte, ok bool) {
switch rv.Kind() {
case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
return strconv.AppendInt(buf, rv.Int(), 10), true

What did you see happen?

Reusing RawBytes breaks internal buffer.

What did you expect to see?

Reusing RawBytes variable should be safe.
This is very hard to debug pitfall.

@methane methane changed the title database/sql: Scan-ning into RawBytes would break previous data database/sql: Scan() appends data to RawBytes Jan 22, 2024
@seankhliao
Copy link
Member

I believe this is working as intended:

RawBytes documents: https://pkg.go.dev/database/sql#RawBytes

After a Scan into a RawBytes, the slice is only valid until the next call to Next, Scan, or Close.

@methane
Copy link
Contributor Author

methane commented Jan 23, 2024

I don't think this is the intended behavior.

The document says "RawBytes is a byte slice that holds a reference to memory owned by the database itself".
buf = strconv.AppendInt(buf, 42, 10) is not documented at all.

And users don't know about this behavior.
Users cannot expect that passing the same RawBytes to Scan() multiple times violates the "the slice is only valid until..." constraint.
This is very hard to debug for normal gophers.

Stop appending to RawBytes saves users from this pitfall without any downside.

@methane
Copy link
Contributor Author

methane commented Jan 23, 2024

CC @bradfitz @kardianos as owner.

@neild
Copy link
Contributor

neild commented Jan 23, 2024

Not clear to me what's intended here, but the actual behavior is surprising and easy to misuse.

Consider:

var raw sql.RawBytes
for rows.Next() {
  rows.Scan(&raw)
}

Rows.Scan converts a value provided by the database driver into a value provided by the user (a sql.RawBytes in this case).

If the database driver provides a []byte, Scan places this value directly into the RawBytes. This allows returning this data without copying.

If the database driver provides some other type, Scan reuses any existing byte slice in the RawBytes. This allows reusing a single underlying array across multiple calls to Scan. For example, if the above loop is reading an int32 value provided by the database driver, then the loop only needs to allocate a single []byte of length 4 for all iterations.

The latter case says to me that the above loop is written as intended: We reuse the same RawBytes across every Scan call to avoid allocations.

Where things go wrong is if we make two queries with the same RawBytes, mixing the two modes. The first query places a reference to a driver-owned []byte into the RawBytes. The second query overwrites this data.

Perhaps this is user error, but the documentation doesn't make it clear that a RawBytes may not be reused between queries, and the fact that it looks like you're expected to reuse a RawBytes within a single query makes this quite surprising.

One fix might be to drop the optimization for the second case and never write to the data contained within a RawBytes. However, this optimization has been present since 2014 (CL 50240044), so I'm not sure reverting it now is the right choice.

Another might be to zero out any RawBytes scan parameters on the first Scan, to avoid carrying over references to driver-internal memory from a previous query.

I suspect this behavior hasn't been noticed before now since it's unusual to reuse a single query variable for scanning two different columns of different types.

@neild neild reopened this Jan 23, 2024
@gopherbot
Copy link

Change https://go.dev/cl/557917 mentions this issue: database/sql: avoid clobbering driver-owned memory in RawBytes

@seankhliao seankhliao added the NeedsFix The path to resolution is known, but the work has not been done. label Jan 24, 2024
@seankhliao
Copy link
Member

ah, apologies i didn't understand before how the append was observed without api misuse.

@methane
Copy link
Contributor Author

methane commented Jan 24, 2024

RawBytes has two usage (zero copy and reusing buffer).
One of them (reusing buffer) is undocumented. And mixing two makes hard to find data corruption bug.

How to fix this?

  1. Add new type like RawBytes that intend to reuse buffer and RawBytes is only about zero copy.
  2. Just document the "reusing buffer" usage and warn about silent data corruption when mixing two usages.

@methane
Copy link
Contributor Author

methane commented Jan 24, 2024

Another might be to zero out any RawBytes scan parameters on the first Scan, to avoid carrying over references to driver-internal memory from a previous query.

But this makes allocation of []byte.

@methane
Copy link
Contributor Author

methane commented Jan 24, 2024

However, this optimization has been present since 2014 (CL 50240044), so I'm not sure reverting it now is the right choice.

I doubt that this optimization is important.

  • Almost no one knows it. (I, the driver maintainer didn't know it.)
  • Using int64, float64 etc doesn't need allocation&stringification.
  • Using any need allocation, but no stringification.

How do you think? @bradfitz.

@dakusan
Copy link

dakusan commented Jan 29, 2024

I was the original reporter of this problem, and I agree that this is an unintended and undocumented behavior. The documentation makes it sound like a RawBytes will always receive a buffer pointer. One that either points directly into the raw sql buffer, or a newly allocated buffer with the data.

Overwriting the data that is currently in the buffer does not make sense and can be a security issue. There are 3 places in sql/convert.go that write into the buffer that RawBytes currently holds. These should instead create a new buffer and set that on the RawBytes.

@dakusan
Copy link

dakusan commented Apr 11, 2024

I was going to ask on the commit but you blocked me from commenting. Just a thought, but on line 240 of sql/convert.go, which was modified in this commit, wouldn't it make sense to use an unsafe operation and pass the string bytes directly? Or does that go against go conventions? It would limit the new rows.raw member to holding only converted integral and time types, so basically after the first row was scanned in, the rows.raw buffer would likely never change sizes.

*d = s2b(s)
func s2b(s string) (b []byte) {
	str := (*reflect.StringHeader)(unsafe.Pointer(&s))
	sh := (*reflect.SliceHeader)(unsafe.Pointer(&b))
	sh.Data = str.Data
	sh.Len = str.Len
	sh.Cap = str.Len
	return b
}

@ianlancetaylor
Copy link
Contributor

I blocked comments on the commit because we don't use GitHub for code review. GitHub is a mirror, and almost nobody sees comments on a GitHub commit. Comments about commits should be on the change review, which in this case is https://go.dev/cl/557917. Thanks.

@ianlancetaylor
Copy link
Contributor

To answer your question, we don't use unsafe unless there is a very significant benefit. And in this case I don't actually see any benefit at all; append supports appending a string directly to a []byte, which is what the code is doing now.

@dakusan
Copy link

dakusan commented Apr 11, 2024

I understand. I was just thinking the RawBytes functionality, meant for speed optimization, in which read-only use-once buffers are returned, kind of matched a use case for returning the direct bytes of a string. Thanks for answering though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

6 participants