Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compress/lzw: compress/decompress corrupts data #11142

Closed
dvyukov opened this issue Jun 10, 2015 · 5 comments
Closed

compress/lzw: compress/decompress corrupts data #11142

dvyukov opened this issue Jun 10, 2015 · 5 comments
Milestone

Comments

@dvyukov
Copy link
Member

dvyukov commented Jun 10, 2015

The following program fails with the panic:

package main

import (
    "bytes"
    "compress/lzw"
    "fmt"
    "io/ioutil"
)

func main() {
    uncomp := []byte("a")
    buf := new(bytes.Buffer)
    w := lzw.NewWriter(buf, lzw.LSB, 2)
    _, err := w.Write(uncomp)
    if err != nil {
        panic(err)
    }
    if err := w.Close(); err != nil {
        panic(err)
    }
    r1 := lzw.NewReader(buf, lzw.LSB, 2)
    uncomp1, err := ioutil.ReadAll(r1)
    if err != nil {
        panic(err)
    }
    if !bytes.Equal(uncomp, uncomp1) {
        fmt.Printf("data0: %q\n", uncomp)
        fmt.Printf("data0: %q\n", uncomp1)
        panic("data differs")
    }
}
data0: "a"
data0: "\x01"
panic: data differs

go version devel +b0532a9 Mon Jun 8 05:13:15 2015 +0000 linux/amd64

@dvyukov
Copy link
Member Author

dvyukov commented Jun 10, 2015

Is it because of width?
Experiments show that width is the number of bits encoded from every byte.

@ianlancetaylor ianlancetaylor added this to the Go1.5Maybe milestone Jun 10, 2015
@dsnet
Copy link
Member

dsnet commented Jun 16, 2015

I don't know too much about lzw, but comments say that the litWidth value controls the "number of bits to use for literal codes". Thus, if the value is set to 2, doesn't that mean you can only encode the literals 0x00, 0x01, 0x02, and 0x03?

In fact, this seems to be what's happening since the above code works when uncomp is set to \x00, \x01, \x02, or \x03. It also seems that the incorrect output value is the input value modulo 4.

If the encoder/decoder is working properly, maybe Write should output an error if the user tries to encode data with literals that are too large? In the horrendous off-chance that other formats depend on this degenerate behavior, then the library should at least document it?

@dvyukov
Copy link
Member Author

dvyukov commented Jun 17, 2015

@dsnet Yes, this is my current understanding that there is no bug in the code.
I don't know whether it worth a runtime check or not, maybe it is meant to be obvious for anybody using the package. However, the docs are quite cryptic ("number of bits to use for literal codes"). When I read it first time, I interpreted it as some parameter of compression algorithm.

@gopherbot
Copy link

CL https://golang.org/cl/11063 mentions this issue.

@gopherbot
Copy link

CL https://golang.org/cl/11227 mentions this issue.

@mikioh mikioh modified the milestones: Go1.5, Go1.5Maybe Jun 18, 2015
nigeltao added a commit that referenced this issue Jun 18, 2015
Fixes #11142.

Change-Id: Id772c4364c47776d6afe86b0939b9c6281e85edc
Reviewed-on: https://go-review.googlesource.com/11227
Reviewed-by: Russ Cox <rsc@golang.org>
@golang golang locked and limited conversation to collaborators Jun 25, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants