compress/bzip2: Unexpected File Signature #42057

Closed
varunravi98 opened this issue Oct 19, 2020 · 7 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

@varunravi98

What version of Go are you using (go version)?

$ go version
go version go1.15 darwin/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/varun.ravichandran/Library/Caches/go-build"
GOENV="/Users/varun.ravichandran/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/varun.ravichandran/Documents/Code/atlasproxy/.gopath/pkg/mod"
GONOPROXY="github.com/10gen"
GONOSUMDB="github.com/10gen"
GOOS="darwin"
GOPATH="/Users/varun.ravichandran/Documents/Code/atlasproxy/.gopath"
GOPRIVATE="github.com/10gen"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.15/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.15/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/9r/_mg8g99s00x2fg7psqphzqt40000gp/T/go-build794568920=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

I was working on a project that uses compress/bzip2 to read files compressed with bzip2. I noticed that the bzip2FileMagic constant is set to 0x425a, or "BZ". My understanding is that the reader checks every file passed to it for that header, to verify that the file was compressed with bzip2 before decompressing it.
However, this source indicates that the file signature for bzip2 is 0x425a68, or "BZh". Is there a source from which the shortened header used by compress/bzip2 was derived?

What did you expect to see?

I expected to see const bzip2FileMagic to be set to 0x425a68, or "BZh".

What did you see instead?

I saw the const bzip2FileMagic set to 0x425a, or "BZ".

@cagedmantis cagedmantis added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Oct 22, 2020
@cagedmantis cagedmantis added this to the Backlog milestone Oct 22, 2020
@cagedmantis
Contributor

The source mentions what was used as a reference:

// There's no RFC for bzip2. I used the Wikipedia page for reference and a lot
// of guessing: https://en.wikipedia.org/wiki/Bzip2
// The source code to pyflate was useful for debugging:
// http://www.paul.sladen.org/projects/pyflate

@dsnet

@dsnet
Member

dsnet commented Oct 22, 2020

See https://github.com/dsnet/compress/raw/master/doc/bzip2-format.pdf, which is a reverse engineered specification from the C source code. Section 2.2.2. describes the stream header, which indicates that "BZ" is the magic for the file, and "h" is technically the version flag.

@varunravi98
Author

varunravi98 commented Oct 23, 2020

Interesting, so are we confident that the original bzip version effectively does not exist anymore? According to the spec, it has a version flag of 0. If I wanted to ensure that I support only bzip2 and not any older version, would it be better to include the version flag in the file magic?

@dsnet
Member

dsnet commented Oct 23, 2020

I'm not sure what this issue is trying to get at. While the internal implementation doesn't treat h as part of the magic number, it does have an explicit check that the subsequent version flag is always h. Thus, even though h isn't treated as part of the magic number in the implementation, the reader still effectively requires it to be present.
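
For anyone who wants to see that behavior from the outside, here is a minimal sketch that feeds two malformed headers to compress/bzip2 and reports whether the first read fails with a bzip2.StructuralError. The helper name rejectsHeader and the sample inputs are made up for illustration; the sketch deliberately does not rely on the exact error messages.

```go
package main

import (
	"compress/bzip2"
	"errors"
	"fmt"
	"strings"
)

// rejectsHeader is a hypothetical helper: it reports whether the first Read
// from a bzip2 reader over raw fails with a bzip2.StructuralError.
func rejectsHeader(raw string) bool {
	r := bzip2.NewReader(strings.NewReader(raw))
	_, err := r.Read(make([]byte, 1))
	var se bzip2.StructuralError
	return errors.As(err, &se)
}

func main() {
	// "BZ09": correct two-byte magic, but the version flag is '0', not 'h'.
	// "XXh9": wrong magic, although the version flag would be acceptable.
	for _, raw := range []string{"BZ09 padding", "XXh9 padding"} {
		fmt.Printf("%q rejected: %t\n", raw[:4], rejectsHeader(raw))
	}
}
```

Both inputs should be rejected, which matches the point above: a stream only gets past the header checks if it starts with "BZ" followed by "h".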

@varunravi98
Author

Okay, understood. I've been working on an application that should be able to read many different types of files, and I just added support for bzip2. I manually check the file signatures to determine which reader to use, and I wanted to be sure that the signature I check before launching the bzip2 reader is consistent with the one the reader itself enforces. My understanding now is that the bzip2 reader checks for {'B', 'Z'} as the magic number and then checks that the subsequent character, the version flag, is 'h'. As a result, any file that the reader accepts must begin with {'B', 'Z', 'h'}, and I'll use those as the bytes to check before launching the bzip2 reader.
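
A rough sketch of that kind of check, assuming the input is wrapped in a bufio.Reader so the first three bytes can be peeked without consuming them. The names bzip2Signature, hasBzip2Signature, openMaybeBzip2, and sniffString are hypothetical and not part of compress/bzip2.

```go
package main

import (
	"bufio"
	"bytes"
	"compress/bzip2"
	"fmt"
	"io"
	"strings"
)

// bzip2Signature is the two-byte magic "BZ" followed by the version flag 'h'.
var bzip2Signature = []byte{'B', 'Z', 'h'}

// hasBzip2Signature peeks at the start of br without consuming any bytes.
// Inputs shorter than the signature are treated as not bzip2.
func hasBzip2Signature(br *bufio.Reader) bool {
	head, err := br.Peek(len(bzip2Signature))
	return err == nil && bytes.Equal(head, bzip2Signature)
}

// openMaybeBzip2 returns a decompressing reader when the signature matches,
// and the buffered input unchanged otherwise.
func openMaybeBzip2(r io.Reader) io.Reader {
	br := bufio.NewReader(r)
	if hasBzip2Signature(br) {
		return bzip2.NewReader(br)
	}
	return br
}

// sniffString is a small convenience wrapper for trying the check on literals.
func sniffString(s string) bool {
	return hasBzip2Signature(bufio.NewReader(strings.NewReader(s)))
}

func main() {
	for _, in := range []string{"BZh9...", "BZ0...", "plain text", "BZ"} {
		fmt.Printf("%q looks like bzip2: %t\n", in, sniffString(in))
	}
}
```

Because Peek leaves the bytes in the buffer, the bzip2 reader still sees the full header and performs its own magic and version checks on top of this one.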

@dsnet
Member

dsnet commented Oct 23, 2020

Sounds like you learned what you need.

Is there anything actionable to do for this issue or can we close it?

@varunravi98
Author

Yep, that's all, thank you! I'll close it.

@golang golang locked and limited conversation to collaborators Oct 23, 2021