New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
compress/flate: reader is not RFC1951 compliant #11030
Comments
/cc @nigeltao |
FYI, the issue is in the init() function in inflate.go The max == 0 check should only be performed for the HCLEN tree and HLIT tree, but not for the HDIST tree. |
Indeed.
$ python2 -c 'import zlib; print
repr(zlib.decompress("\x05\xc0\x07\x06\x00\x00\x00\x80\x40\x0f\xff\x37\xa0\xca",
-15));'
''
|
Thanks for the analysis, which looks correct to me. @dsnet, can you say how you generated the input for that test? I think it'd be a stronger test case if the decompressed string isn't empty but is instead something like "IJK". I can't remember exactly how flate works, but @mdempsky also recently fixed an infinite loop bug in https://go-review.googlesource.com/#/c/8893/ Also, I think that skipping the max ==0 check means that the HDIST table doesn't pass the sanity check (change the "const sanity = false" line to enable sanity checks). What does the C zlib library do? Is it obvious in its source code that it special cases the HDIST tree? @dsnet, @mdempsky: I'm happy to poke around some more at this, but I'll have to remember once again how flate's Huffman tables work. If you're already up to speed and are happy to poke around yourself, that will save me some time. Thanks. Note to myself (or whoever wants to propose a patch): any change to inflate.go's huffmanDecoder.init method should probably also make the same change to gen.go's copy, and the re-run "go generate" to check that its output doesn't change. Strictly speaking, I don't expect patching gen.go to be necessary, but it'd be nice to keep the two implementations in sync. |
The input test I posted above is a bitstream I generated by hand as part of some research into encoding meta information into the Huffman tree definition. When I tested it on other DEFLATE decoders (java, zlib), it failed only on Go's implementation. The occurrence of #11033 is interestingly caused by this issue. I left a detailed analysis of the DEFLATE stream at the bit level in the comments for that issue. I suspect that the data stream may have been generated using the Z_HUFFMAN_ONLY option in the C zlib library. I can attest that the C zlib library handles this case. Looking at the source, we see that it returns 0 (ok) if the tree is empty, and it relies on error reporting later on when actually decoding the symbols in the payload. I haven't studied Go's implementation enough to see if it will panic on garbage input, but I can do a little of research into that. The C implementation doesn't special case HDIST specifically (since it relies on error reporting later on for this case), but from RFC1951, we can infer that HLIT and HCLEN trees must be non empty. The rational is: every dynamic block must be ended by an EOM marker. This marker must be represented somehow, so it implies that the HLIT tree has at least one entry. In order for the HLIT tree to be generated, it implies that the HCLEN tree is non-empty (otherwise, how would you encode HLIT?). |
I took a brief look at Go's inflate implementation and found some other issues. The length codes 286 and 287 are marked as reserved in RFC1951, but the implementation does not check for those values and defaults everything above 285 as 285. Thus, a malformed stream, specially constructed to use reserved values passes Go's implementation: [
"1", "01", # Last, fixed block
"00000110", # Symbol '0' literal
"11100011", "00000", # Reserved symbol 287 (Go treats this as Length: 258, Distance: 1)
"0000000", # EOM marker
]
# Composes into "\x33\x18\x07\x00" Instead, this value should crash: python -c 'import zlib; print zlib.decompress("\x33\x18\x07\x00", -15);'
# zlib.error: Error -3 while decompressing data: invalid literal/length code I've never contributed to Go before, but I'd be interested in scrubbing the inflate implementation and make sure it works as expected (both for valid inputs and invalid inputs). |
Sounds like another legit bug to me, or perhaps another legit facet of this bug. Happy to answer any questions about the contribution process, over and above http://golang.org/doc/contribute.html |
Yeah, sounds legit to me too. Seems like we should just relax (*huffmanDecoder).init to allow empty trees like how we already allow single-code trees. @dsnet Do you want to work on a CL for this? If not, I can prepare one. And thanks for the analysis and test case! |
@mdempsky I apologize for this beginner question, but what is a CL? I'm interested in working on this, and can probably do it over this weekend. |
Please see our contribution guide: tip.golang.org/doc/contribute.html
A CL is change list, basically a patch proposed to be included into the
tree.
|
CL https://golang.org/cl/11000 mentions this issue. |
FYI, the unit tests in the CL exercises every one of the edge cases listed here and more. As far as I was able to test it, it's behavior matches the C zlib library. |
Running the following produces an error:
The byte slice above represents a single valid DEFLATE block compressed with a dynamic huffman tree, consisting of 257 literal/length codes and 1 distance codes. It should decompresses to nothing.
In the last byte (0xCA), the bit sequence (with LSB on left) is 01010011. The 11 at the end is the EOM marker in DEFLATE; the 0 to the left of that is the single code used for the distance tree.
According to RFC1951 section 3.2.7, it says the following:
As can be seen above, the tree section terminates immediately with an EOM marker (11) and never uses the distance codes. Thus, this should properly decompress.
The text was updated successfully, but these errors were encountered: