compress/flate: reader is not RFC1951 compliant #11030

dsnet · 2015-06-02T04:12:11Z

Running the following produces an error:

package main

import "bytes"
import "compress/flate"

func main() {
    b := bytes.NewBuffer([]byte{
        0x05, 0xc0, 0x07, 0x06, 0x00, 0x00, 0x00, 0x80, 0x40, 0x0f, 0xff, 0x37, 0xa0, 0xca,
    })
    r := flate.NewReader(b)
    _, err := r.Read(nil)
    if err != nil {
        panic(err)
    }
}

The byte slice above represents a single valid DEFLATE block compressed with a dynamic huffman tree, consisting of 257 literal/length codes and 1 distance codes. It should decompresses to nothing.

In the last byte (0xCA), the bit sequence (with LSB on left) is 01010011. The 11 at the end is the EOM marker in DEFLATE; the 0 to the left of that is the single code used for the distance tree.

According to RFC1951 section 3.2.7, it says the following:

One distance code of zero bits means that there are no distance codes used at all (the data is all literals).

As can be seen above, the tree section terminates immediately with an EOM marker (11) and never uses the distance codes. Thus, this should properly decompress.

bradfitz · 2015-06-02T04:20:26Z

/cc @nigeltao

dsnet · 2015-06-02T04:28:10Z

FYI, the issue is in the init() function in inflate.go

The max == 0 check should only be performed for the HCLEN tree and HLIT tree, but not for the HDIST tree.

minux · 2015-06-02T04:36:22Z

Indeed. $ python2 -c 'import zlib; print repr(zlib.decompress("\x05\xc0\x07\x06\x00\x00\x00\x80\x40\x0f\xff\x37\xa0\xca", -15));' ''

nigeltao · 2015-06-02T10:23:29Z

Thanks for the analysis, which looks correct to me. @dsnet, can you say how you generated the input for that test? I think it'd be a stronger test case if the decompressed string isn't empty but is instead something like "IJK".

I can't remember exactly how flate works, but @mdempsky also recently fixed an infinite loop bug in https://go-review.googlesource.com/#/c/8893/
I don't think that skipping the max == 0 check for the HDIST tree will re-open that particular bug, but does skipping that open us up to another infinite loop bug, or will we then panic on other garbage input somehow?

Also, I think that skipping the max ==0 check means that the HDIST table doesn't pass the sanity check (change the "const sanity = false" line to enable sanity checks).

What does the C zlib library do? Is it obvious in its source code that it special cases the HDIST tree?

@dsnet, @mdempsky: I'm happy to poke around some more at this, but I'll have to remember once again how flate's Huffman tables work. If you're already up to speed and are happy to poke around yourself, that will save me some time. Thanks.

Note to myself (or whoever wants to propose a patch): any change to inflate.go's huffmanDecoder.init method should probably also make the same change to gen.go's copy, and the re-run "go generate" to check that its output doesn't change. Strictly speaking, I don't expect patching gen.go to be necessary, but it'd be nice to keep the two implementations in sync.

dsnet · 2015-06-02T11:19:39Z

The input test I posted above is a bitstream I generated by hand as part of some research into encoding meta information into the Huffman tree definition. When I tested it on other DEFLATE decoders (java, zlib), it failed only on Go's implementation.

The occurrence of #11033 is interestingly caused by this issue. I left a detailed analysis of the DEFLATE stream at the bit level in the comments for that issue. I suspect that the data stream may have been generated using the Z_HUFFMAN_ONLY option in the C zlib library.

I can attest that the C zlib library handles this case. Looking at the source, we see that it returns 0 (ok) if the tree is empty, and it relies on error reporting later on when actually decoding the symbols in the payload. I haven't studied Go's implementation enough to see if it will panic on garbage input, but I can do a little of research into that.

The C implementation doesn't special case HDIST specifically (since it relies on error reporting later on for this case), but from RFC1951, we can infer that HLIT and HCLEN trees must be non empty. The rational is: every dynamic block must be ended by an EOM marker. This marker must be represented somehow, so it implies that the HLIT tree has at least one entry. In order for the HLIT tree to be generated, it implies that the HCLEN tree is non-empty (otherwise, how would you encode HLIT?).

dsnet · 2015-06-04T00:52:59Z

I took a brief look at Go's inflate implementation and found some other issues.

The length codes 286 and 287 are marked as reserved in RFC1951, but the implementation does not check for those values and defaults everything above 285 as 285.

Thus, a malformed stream, specially constructed to use reserved values passes Go's implementation:

[
    "1", "01",           # Last, fixed block
    "00000110",          # Symbol '0' literal
    "11100011", "00000", # Reserved symbol 287 (Go treats this as Length: 258, Distance: 1)
    "0000000",           # EOM marker
]
# Composes into "\x33\x18\x07\x00"

Instead, this value should crash:

python -c 'import zlib; print zlib.decompress("\x33\x18\x07\x00", -15);'
# zlib.error: Error -3 while decompressing data: invalid literal/length code

I've never contributed to Go before, but I'd be interested in scrubbing the inflate implementation and make sure it works as expected (both for valid inputs and invalid inputs).

nigeltao · 2015-06-04T03:51:32Z

Sounds like another legit bug to me, or perhaps another legit facet of this bug. Happy to answer any questions about the contribution process, over and above http://golang.org/doc/contribute.html

mdempsky · 2015-06-04T23:19:18Z

Yeah, sounds legit to me too. Seems like we should just relax (*huffmanDecoder).init to allow empty trees like how we already allow single-code trees.

@dsnet Do you want to work on a CL for this? If not, I can prepare one.

And thanks for the analysis and test case!

dsnet · 2015-06-06T01:37:25Z

@mdempsky I apologize for this beginner question, but what is a CL? I'm interested in working on this, and can probably do it over this weekend.

minux · 2015-06-06T01:40:15Z

Please see our contribution guide: tip.golang.org/doc/contribute.html A CL is change list, basically a patch proposed to be included into the tree.

gopherbot · 2015-06-12T23:00:26Z

CL https://golang.org/cl/11000 mentions this issue.

dsnet · 2015-06-12T23:03:07Z

FYI, the unit tests in the CL exercises every one of the edge cases listed here and more. As far as I was able to test it, it's behavior matches the C zlib library.

minux changed the title ~~compress/flate reader is not RFC1951 compliant~~ compress/flate: reader is not RFC1951 compliant Jun 2, 2015

minux added this to the Go1.5Maybe milestone Jun 2, 2015

toqueteos mentioned this issue Jun 2, 2015

compress/zlib: error due to compress/flate #11033

Closed

nigeltao closed this as completed in b03129a Jun 17, 2015

mikioh modified the milestones: Go1.5, Go1.5Maybe Jun 17, 2015

toqueteos mentioned this issue Jul 28, 2015

Integrate golang/go#11030 klauspost/compress#2

Closed

golang locked and limited conversation to collaborators Jun 25, 2016

gopherbot added the FrozenDueToAge label Jun 25, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

compress/flate: reader is not RFC1951 compliant #11030

compress/flate: reader is not RFC1951 compliant #11030

dsnet commented Jun 2, 2015

bradfitz commented Jun 2, 2015

dsnet commented Jun 2, 2015

minux commented Jun 2, 2015 via email

nigeltao commented Jun 2, 2015

dsnet commented Jun 2, 2015

dsnet commented Jun 4, 2015

nigeltao commented Jun 4, 2015

mdempsky commented Jun 4, 2015

dsnet commented Jun 6, 2015

minux commented Jun 6, 2015 via email

gopherbot commented Jun 12, 2015

dsnet commented Jun 12, 2015

compress/flate: reader is not RFC1951 compliant #11030

compress/flate: reader is not RFC1951 compliant #11030

Comments

dsnet commented Jun 2, 2015

bradfitz commented Jun 2, 2015

dsnet commented Jun 2, 2015

minux commented Jun 2, 2015 via email

nigeltao commented Jun 2, 2015

dsnet commented Jun 2, 2015

dsnet commented Jun 4, 2015

nigeltao commented Jun 4, 2015

mdempsky commented Jun 4, 2015

dsnet commented Jun 6, 2015

minux commented Jun 6, 2015 via email

gopherbot commented Jun 12, 2015

dsnet commented Jun 12, 2015