compress/flate: Allow private dict/history buffer to be retrieved from flate.Reader #18930

Closed
as opened this issue Feb 3, 2017 · 2 comments

Comments

@as
Contributor

as commented Feb 3, 2017

Original Discussion

This issue was submitted as a result of this thread:
https://groups.google.com/forum/#!topic/golang-dev/UND0JhtsV9s

Summary

The compress/flate package allows you to initialize a decompressing reader with an existing dictionary by calling flate.NewReaderDict(r io.Reader, dict []byte). This is nice. What would also be nice is a method that returns the state of this []byte after some decompression has taken place.

The issue is that the history buffer cannot be retrieved. Hopefully this can serve as a discussion of whether the dictionary / history buffer should be exposed in the stdlib. There's at least one use case for this that I know of, but I'd like to enumerate and discuss other potential uses.

Use Case

I'm using Go to implement a cabinet (cab) file extractor. The compression format used in many cab files is mszip. Mszip slightly modifies the block semantics of the flate algorithm by preserving the "history buffer" across block boundaries. Each block is decompressed by a complete inflate operation, which requires a new or reset reader per block, with one difference: every block except the first needs the history buffer carried over from the previous blocks.

By adding this function to the flate package in the standard library (for testing purposes only; I don't suggest this as the way to expose the history)

func Dict(r io.ReadCloser) []byte {
    return r.(*decompressor).dict.hist
}

I am able to implement mszip without any other changes. Although I'm skeptical that this is a good stdlib feature, and compression internals are not my area of expertise, I am also not looking forward to maintaining a fork of compress/flate just for this function. Is there a reason that dict.hist is private other than the usual "nobody has needed it public"?

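// Example usage of the proposed flate.Dict.

var (
    dict      []byte
    zbuf      = new(bytes.Buffer) // compressed input
    plaintext = new(bytes.Buffer) // decompressed output
)

// zbuf is filled with blocks here.

for zbuf.Len() > 0 {
    // Each block gets a fresh reader primed with the previous history.
    zr := flate.NewReaderDict(zbuf, dict)
    if _, err := plaintext.ReadFrom(zr); err != nil {
        log.Fatalln(err)
    }
    dict = flate.Dict(zr)
}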

What version of Go are you using (go version)?

go1.8rc3.windows-amd64

@dsnet
Member

dsnet commented Feb 4, 2017

As mentioned in the linked discussion, there is a workaround which is to wrap the io.ReadCloser with a memoryReader:

// memoryReader wraps an io.Reader and remembers up to the
// last 32KiB of bytes read.
type memoryReader struct {
	io.Reader
	Dict []byte
}

func (mr *memoryReader) Read(b []byte) (int, error) {
	const maxWindow = 1 << 15 // Maximum size of a DEFLATE window
	n, err := mr.Reader.Read(b)
	mr.Dict = append(mr.Dict, b[:n]...)
	if len(mr.Dict) > maxWindow {
		mr.Dict = mr.Dict[len(mr.Dict)-maxWindow:]
	}
	return n, err
}

The dictionary is simply the last bytes decompressed up to that point (at most 32KiB), so recording those bytes is all that is needed.
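A minimal sketch of how this wrapper might be applied to the block loop from the original report. The helpers compressChained and decompressChained are hypothetical names, and the input is a set of plain DEFLATE streams built for the demo rather than real mszip data; any cab-specific framing is not handled here. Seeding the wrapper's Dict with the previous history is also an assumption of this sketch, made so that short blocks do not shrink the window.

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"io"
)

const maxWindow = 1 << 15 // maximum size of a DEFLATE window

// memoryReader is the wrapper from the comment above.
type memoryReader struct {
	io.Reader
	Dict []byte
}

func (mr *memoryReader) Read(b []byte) (int, error) {
	n, err := mr.Reader.Read(b)
	mr.Dict = append(mr.Dict, b[:n]...)
	if len(mr.Dict) > maxWindow {
		mr.Dict = mr.Dict[len(mr.Dict)-maxWindow:]
	}
	return n, err
}

// compressChained (hypothetical) deflates each chunk as its own stream,
// using the preceding plaintext as the preset dictionary.
func compressChained(chunks [][]byte) ([][]byte, error) {
	var dict []byte
	var blocks [][]byte
	for _, c := range chunks {
		var buf bytes.Buffer
		zw, err := flate.NewWriterDict(&buf, flate.BestCompression, dict)
		if err != nil {
			return nil, err
		}
		if _, err := zw.Write(c); err != nil {
			return nil, err
		}
		if err := zw.Close(); err != nil {
			return nil, err
		}
		blocks = append(blocks, buf.Bytes())
		dict = append(dict, c...)
		if len(dict) > maxWindow {
			dict = dict[len(dict)-maxWindow:]
		}
	}
	return blocks, nil
}

// decompressChained (hypothetical) inflates the blocks in order, carrying
// the history from one block to the next through a memoryReader.
func decompressChained(blocks [][]byte) ([]byte, error) {
	var plaintext bytes.Buffer
	var dict []byte
	for _, blk := range blocks {
		zr := flate.NewReaderDict(bytes.NewReader(blk), dict)
		// Seed with the old history so it survives short blocks.
		mr := &memoryReader{Reader: zr, Dict: dict}
		if _, err := plaintext.ReadFrom(mr); err != nil {
			return nil, err
		}
		if err := zr.Close(); err != nil {
			return nil, err
		}
		dict = mr.Dict
	}
	return plaintext.Bytes(), nil
}

func main() {
	chunks := [][]byte{[]byte("hello hello hello "), []byte("hello again, hello")}
	blocks, err := compressChained(chunks)
	if err != nil {
		fmt.Println("compress:", err)
		return
	}
	got, err := decompressChained(blocks)
	if err != nil {
		fmt.Println("decompress:", err)
		return
	}
	fmt.Printf("%s\n", got)
}
```

The wrapper sits outside the flate reader, so nothing in compress/flate has to change; the cost is one extra copy of each decompressed byte.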

Any API addition to the Reader will inevitably lead to nasty package-level functions or interfaces whose usage is not obvious to users (e.g., I find the flate.Resetter interface really ugly).

If we want to add this functionality, I vote that we revisit this in Go2, when the API can be made cleaner.

@as
Contributor Author

as commented Feb 4, 2017

The workaround works as expected and is documented on the issue tracker. I think we should consider this for Go2.

@as as closed this as completed Feb 4, 2017
@golang golang locked and limited conversation to collaborators Feb 4, 2018