Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compress/gzip: multistream reading fails on os.File #30230

Closed
pbnjay opened this issue Feb 14, 2019 · 3 comments
Closed

compress/gzip: multistream reading fails on os.File #30230

pbnjay opened this issue Feb 14, 2019 · 3 comments
Labels
FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Milestone

Comments

@pbnjay
Copy link
Contributor

pbnjay commented Feb 14, 2019

What version of Go are you using (go version)?

$ go version
go version go1.11.5 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/jeremy/Library/Caches/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/jeremy"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/sn/3cktm0557lbc6wvwltjl9cwm0000gn/T/go-build634703979=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

While implementing a multi-stream gzip reader for a specific bioinformatics file format (BAM), (*gzip.Reader).Reset(*os.File) fails to reset the stream correctly. It appears to be due to read-ahead somewhere, as substituting os.File with a bytes.Buffer with the same contents works correctly.

I've attached a somewhat minimal example code and a truncated data file to give a test case showing the above (attached). The data files are typically in the multi-gigabyte range so streaming from disk is preferred to loading everything into memory.

bug_repro.zip

2019/02/14 10:31:39 ------- LOADING FROM BUFFERED FILE -------------
2019/02/14 10:31:39 Should end in +20059
2019/02/14 10:31:39 Ended at  20059
2019/02/14 10:31:39 Should end in +15247
2019/02/14 10:31:39 Ended at  35306
2019/02/14 10:31:39 Should end in +14446
2019/02/14 10:31:39 Ended at  49752
2019/02/14 10:31:39 Should end in +15601
2019/02/14 10:31:39 Ended at  65353
2019/02/14 10:31:39 EOF
2019/02/14 10:31:39 ------- LOADING FROM FILE DIRECTLY -------------
2019/02/14 10:31:39 Should end in +20059
2019/02/14 10:31:39 Ended at 20480
2019/02/14 10:31:39 gzip: invalid header

What did you expect to see?

Loading from a file should work the same as loading from a buffer.

What did you see instead?

The file is advanced too far, so the subsequent read fails to find the header of the next gzip chunk.

@pbnjay
Copy link
Contributor Author

pbnjay commented Feb 14, 2019

Following up my own investigation: If I wrap the file in a bufio.Reader manually before passing to Reset it reuses it correctly. So the issue appears to be at https://github.com/golang/go/blob/master/src/compress/gzip/gunzip.go#L108

There's no way to get the internal wrapped reader for reuse. I will submit a tiny PR with a possible workaround that should be backward-compatible.

@gopherbot
Copy link

Change https://golang.org/cl/162737 mentions this issue: compress/gzip: allow multi-stream io.Reader reuse

@katiehockman katiehockman changed the title compress/gzip: Multistream reading fails on os.File compress/gzip: multistream reading fails on os.File Feb 15, 2019
@katiehockman
Copy link
Contributor

/cc @dsnet

@katiehockman katiehockman added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 15, 2019
@katiehockman katiehockman added this to the Unplanned milestone Feb 15, 2019
@dsnet dsnet added NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Feb 17, 2019
@pbnjay pbnjay closed this as completed Feb 17, 2019
@golang golang locked and limited conversation to collaborators Feb 17, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Projects
None yet
Development

No branches or pull requests

4 participants