Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compress/flate: reader is not RFC1951 compliant #11030

Closed
dsnet opened this issue Jun 2, 2015 · 12 comments
Closed

compress/flate: reader is not RFC1951 compliant #11030

dsnet opened this issue Jun 2, 2015 · 12 comments
Milestone

Comments

@dsnet
Copy link
Member

dsnet commented Jun 2, 2015

Running the following produces an error:

package main

import "bytes"
import "compress/flate"

func main() {
    b := bytes.NewBuffer([]byte{
        0x05, 0xc0, 0x07, 0x06, 0x00, 0x00, 0x00, 0x80, 0x40, 0x0f, 0xff, 0x37, 0xa0, 0xca,
    })
    r := flate.NewReader(b)
    _, err := r.Read(nil)
    if err != nil {
        panic(err)
    }
}

The byte slice above represents a single valid DEFLATE block compressed with a dynamic huffman tree, consisting of 257 literal/length codes and 1 distance codes. It should decompresses to nothing.

In the last byte (0xCA), the bit sequence (with LSB on left) is 01010011. The 11 at the end is the EOM marker in DEFLATE; the 0 to the left of that is the single code used for the distance tree.

According to RFC1951 section 3.2.7, it says the following:

One distance code of zero bits means that there are no distance codes used at all (the data is all literals).

As can be seen above, the tree section terminates immediately with an EOM marker (11) and never uses the distance codes. Thus, this should properly decompress.

@bradfitz
Copy link
Contributor

bradfitz commented Jun 2, 2015

/cc @nigeltao

@minux minux changed the title compress/flate reader is not RFC1951 compliant compress/flate: reader is not RFC1951 compliant Jun 2, 2015
@minux minux added this to the Go1.5Maybe milestone Jun 2, 2015
@dsnet
Copy link
Member Author

dsnet commented Jun 2, 2015

FYI, the issue is in the init() function in inflate.go

The max == 0 check should only be performed for the HCLEN tree and HLIT tree, but not for the HDIST tree.

@minux
Copy link
Member

minux commented Jun 2, 2015 via email

@nigeltao
Copy link
Contributor

nigeltao commented Jun 2, 2015

Thanks for the analysis, which looks correct to me. @dsnet, can you say how you generated the input for that test? I think it'd be a stronger test case if the decompressed string isn't empty but is instead something like "IJK".

I can't remember exactly how flate works, but @mdempsky also recently fixed an infinite loop bug in https://go-review.googlesource.com/#/c/8893/
I don't think that skipping the max == 0 check for the HDIST tree will re-open that particular bug, but does skipping that open us up to another infinite loop bug, or will we then panic on other garbage input somehow?

Also, I think that skipping the max ==0 check means that the HDIST table doesn't pass the sanity check (change the "const sanity = false" line to enable sanity checks).

What does the C zlib library do? Is it obvious in its source code that it special cases the HDIST tree?

@dsnet, @mdempsky: I'm happy to poke around some more at this, but I'll have to remember once again how flate's Huffman tables work. If you're already up to speed and are happy to poke around yourself, that will save me some time. Thanks.

Note to myself (or whoever wants to propose a patch): any change to inflate.go's huffmanDecoder.init method should probably also make the same change to gen.go's copy, and the re-run "go generate" to check that its output doesn't change. Strictly speaking, I don't expect patching gen.go to be necessary, but it'd be nice to keep the two implementations in sync.

@dsnet
Copy link
Member Author

dsnet commented Jun 2, 2015

The input test I posted above is a bitstream I generated by hand as part of some research into encoding meta information into the Huffman tree definition. When I tested it on other DEFLATE decoders (java, zlib), it failed only on Go's implementation.

The occurrence of #11033 is interestingly caused by this issue. I left a detailed analysis of the DEFLATE stream at the bit level in the comments for that issue. I suspect that the data stream may have been generated using the Z_HUFFMAN_ONLY option in the C zlib library.

I can attest that the C zlib library handles this case. Looking at the source, we see that it returns 0 (ok) if the tree is empty, and it relies on error reporting later on when actually decoding the symbols in the payload. I haven't studied Go's implementation enough to see if it will panic on garbage input, but I can do a little of research into that.

The C implementation doesn't special case HDIST specifically (since it relies on error reporting later on for this case), but from RFC1951, we can infer that HLIT and HCLEN trees must be non empty. The rational is: every dynamic block must be ended by an EOM marker. This marker must be represented somehow, so it implies that the HLIT tree has at least one entry. In order for the HLIT tree to be generated, it implies that the HCLEN tree is non-empty (otherwise, how would you encode HLIT?).

@dsnet
Copy link
Member Author

dsnet commented Jun 4, 2015

I took a brief look at Go's inflate implementation and found some other issues.

The length codes 286 and 287 are marked as reserved in RFC1951, but the implementation does not check for those values and defaults everything above 285 as 285.

Thus, a malformed stream, specially constructed to use reserved values passes Go's implementation:

[
    "1", "01",           # Last, fixed block
    "00000110",          # Symbol '0' literal
    "11100011", "00000", # Reserved symbol 287 (Go treats this as Length: 258, Distance: 1)
    "0000000",           # EOM marker
]
# Composes into "\x33\x18\x07\x00"

Instead, this value should crash:

python -c 'import zlib; print zlib.decompress("\x33\x18\x07\x00", -15);'
# zlib.error: Error -3 while decompressing data: invalid literal/length code

I've never contributed to Go before, but I'd be interested in scrubbing the inflate implementation and make sure it works as expected (both for valid inputs and invalid inputs).

@nigeltao
Copy link
Contributor

nigeltao commented Jun 4, 2015

Sounds like another legit bug to me, or perhaps another legit facet of this bug. Happy to answer any questions about the contribution process, over and above http://golang.org/doc/contribute.html

@mdempsky
Copy link
Member

mdempsky commented Jun 4, 2015

Yeah, sounds legit to me too. Seems like we should just relax (*huffmanDecoder).init to allow empty trees like how we already allow single-code trees.

@dsnet Do you want to work on a CL for this? If not, I can prepare one.

And thanks for the analysis and test case!

@dsnet
Copy link
Member Author

dsnet commented Jun 6, 2015

@mdempsky I apologize for this beginner question, but what is a CL? I'm interested in working on this, and can probably do it over this weekend.

@minux
Copy link
Member

minux commented Jun 6, 2015 via email

@gopherbot
Copy link

CL https://golang.org/cl/11000 mentions this issue.

@dsnet
Copy link
Member Author

dsnet commented Jun 12, 2015

FYI, the unit tests in the CL exercises every one of the edge cases listed here and more. As far as I was able to test it, it's behavior matches the C zlib library.

@mikioh mikioh modified the milestones: Go1.5, Go1.5Maybe Jun 17, 2015
@golang golang locked and limited conversation to collaborators Jun 25, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

7 participants