compress/flate: Allow private dict/history buffer to be retrieved from flate.Reader #18930

as · 2017-02-03T22:43:03Z

Original Discussion

This issue was submitted as a result of this thread:
https://groups.google.com/forum/#!topic/golang-dev/UND0JhtsV9s

Summary

The compress/flate package allows you to initialize a decompressing reader with an existing dictionary by calling flate.NewReaderDict(r io.Reader, dict []byte). This is nice. What would also be nice is a way to retrieve the state of this []byte after some decompression has taken place.

The issue is that the history buffer cannot be retrieved. Hopefully this can serve as a discussion of whether the dictionary/history buffer should be exposed in the standard library. There's at least one use case for this that I know of, but I'd like to enumerate and discuss other potential uses.
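For context, a preset dictionary only works when the writer and the reader are primed with the same bytes. A minimal round-trip sketch using the exported flate.NewWriterDict and flate.NewReaderDict follows; the roundTrip helper and the sample dictionary are made up for illustration, and io.ReadAll needs Go 1.16 or later.

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"io"
	"log"
)

// roundTrip compresses data with a preset dictionary and then
// decompresses it with flate.NewReaderDict using the same dictionary.
func roundTrip(data, dict []byte) ([]byte, error) {
	var zbuf bytes.Buffer
	zw, err := flate.NewWriterDict(&zbuf, flate.BestCompression, dict)
	if err != nil {
		return nil, err
	}
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}

	// The reader must be primed with exactly the dictionary the writer used.
	zr := flate.NewReaderDict(&zbuf, dict)
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	dict := []byte("hello, gopher! ")
	out, err := roundTrip([]byte("hello, gopher! hello again"), dict)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s\n", out)
}
```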

Use Case

I'm using Go to implement a cabinet (cab) file extractor. The compression format used in many cab files is MSZIP. MSZIP slightly modifies the block semantics of the DEFLATE algorithm by preserving the "history buffer" across block boundaries. This means each block must be decompressed by a complete inflate operation, requiring a new or reset reader per block, with one difference: every block except the first needs the history buffer carried over from the previous blocks.
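A rough sketch of that carry-over using only the exported compress/flate API: each block is encoded as its own DEFLATE stream primed with the last 32KiB of earlier plaintext, and decoded the same way, with the decoder tracking the history by hand because flate.Reader does not expose it. Real cabinet/MSZIP framing is omitted, and compressBlocks, decompressBlocks and window are hypothetical helper names.

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"io"
	"log"
)

// maxWindow is the largest distance a DEFLATE back-reference can reach.
const maxWindow = 1 << 15

// window returns the trailing (up to) 32KiB of b: the history a new
// DEFLATE stream needs in order to continue where the previous one ended.
func window(b []byte) []byte {
	if len(b) > maxWindow {
		return b[len(b)-maxWindow:]
	}
	return b
}

// compressBlocks encodes each block as a separate DEFLATE stream whose
// dictionary is the plaintext of all earlier blocks.
func compressBlocks(blocks [][]byte) ([][]byte, error) {
	var history []byte
	var out [][]byte
	for _, blk := range blocks {
		var buf bytes.Buffer
		zw, err := flate.NewWriterDict(&buf, flate.DefaultCompression, window(history))
		if err != nil {
			return nil, err
		}
		if _, err := zw.Write(blk); err != nil {
			return nil, err
		}
		if err := zw.Close(); err != nil {
			return nil, err
		}
		out = append(out, buf.Bytes())
		history = append(history, blk...)
	}
	return out, nil
}

// decompressBlocks inverts compressBlocks: a fresh reader per block,
// primed with the window of everything decoded so far.
func decompressBlocks(zblocks [][]byte) ([]byte, error) {
	var plaintext []byte
	for _, zb := range zblocks {
		zr := flate.NewReaderDict(bytes.NewReader(zb), window(plaintext))
		blk, err := io.ReadAll(zr)
		zr.Close()
		if err != nil {
			return nil, err
		}
		plaintext = append(plaintext, blk...)
	}
	return plaintext, nil
}

func main() {
	blocks := [][]byte{
		[]byte("the quick brown fox "),
		[]byte("the quick brown fox jumps"),
	}
	zblocks, err := compressBlocks(blocks)
	if err != nil {
		log.Fatal(err)
	}
	out, err := decompressBlocks(zblocks)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s\n", out)
}
```

Keeping the whole plaintext in memory is only for brevity; a real extractor would keep just the trailing window.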

By adding the following function to the flate package in the standard library (for testing purposes only; I don't suggest this as the way to access the history):

func Dict(r io.ReadCloser) []byte {
    return r.(*decompressor).dict.hist
}

I am able to implement MSZIP without any other changes. Compression internals are not my domain of expertise, and I'm skeptical that this is a good standard-library feature, but I am also not looking forward to maintaining a fork of compress/flate just for this one function. Is there a reason that dict.hist is private other than the usual "nobody has needed it public"?

// example usage of flate.Dict

var (
	dict      []byte
	zbuf      = new(bytes.Buffer) // compressed blocks
	plaintext = new(bytes.Buffer) // decompressed output
)

// zbuf filled with blocks

for zbuf.Len() > 0 {
	zr := flate.NewReaderDict(zbuf, dict)
	if _, err := plaintext.ReadFrom(zr); err != nil {
		log.Fatalln(err)
	}
	dict = flate.Dict(zr)
}

What version of Go are you using (`go version`)?

go1.8rc3.windows-amd64


dsnet · 2017-02-04T00:18:08Z

As mentioned in the linked discussion, there is a workaround: wrap the io.ReadCloser with a memoryReader:

// memoryReader wraps an io.Reader and remembers up to 32KiB
// of the last bytes read.
type memoryReader struct {
	io.Reader
	Dict []byte
}

func (mr *memoryReader) Read(b []byte) (int, error) {
	const maxWindow = 1 << 15 // Maximum size of a DEFLATE window
	n, err := mr.Reader.Read(b)
	mr.Dict = append(mr.Dict, b[:n]...)
	if len(mr.Dict) > maxWindow {
		mr.Dict = mr.Dict[len(mr.Dict)-maxWindow:]
	}
	return n, err
}

The dictionary is simply the last (up to 32KiB of) bytes decompressed so far, so recording those bytes is all that is needed.
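A minimal sketch of how the memoryReader workaround might drive a multi-block decoder. The type is repeated so the example compiles on its own; encode, decodeAll and the sample blocks are hypothetical. The key design choice is reusing one memoryReader and swapping only its embedded Reader, so Dict keeps the trailing window across block boundaries instead of restarting per block.

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"io"
	"log"
)

// memoryReader wraps an io.Reader and remembers up to 32KiB
// of the last bytes read (copied from the workaround above).
type memoryReader struct {
	io.Reader
	Dict []byte
}

func (mr *memoryReader) Read(b []byte) (int, error) {
	const maxWindow = 1 << 15 // Maximum size of a DEFLATE window
	n, err := mr.Reader.Read(b)
	mr.Dict = append(mr.Dict, b[:n]...)
	if len(mr.Dict) > maxWindow {
		mr.Dict = mr.Dict[len(mr.Dict)-maxWindow:]
	}
	return n, err
}

// encode compresses one block as its own DEFLATE stream primed with dict.
// Hypothetical helper standing in for blocks read from a cab file.
func encode(block, dict []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw, err := flate.NewWriterDict(&buf, flate.DefaultCompression, dict)
	if err != nil {
		return nil, err
	}
	if _, err := zw.Write(block); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodeAll inflates consecutive blocks. A single memoryReader is reused,
// so mr.Dict always holds the trailing window of everything decoded so far.
func decodeAll(zblocks [][]byte) ([]byte, error) {
	var plaintext bytes.Buffer
	mr := new(memoryReader)
	for _, zb := range zblocks {
		// NewReaderDict copies the dictionary, so mr.Dict may keep growing.
		mr.Reader = flate.NewReaderDict(bytes.NewReader(zb), mr.Dict)
		if _, err := plaintext.ReadFrom(mr); err != nil {
			return nil, err
		}
	}
	return plaintext.Bytes(), nil
}

func main() {
	first := []byte("to be or not to be, ")
	second := []byte("to be or not to be, that is the question")

	z1, err := encode(first, nil)
	if err != nil {
		log.Fatal(err)
	}
	z2, err := encode(second, first) // second block may refer back into first
	if err != nil {
		log.Fatal(err)
	}

	out, err := decodeAll([][]byte{z1, z2})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s\n", out)
}
```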

Any API additions to the Reader will inevitably lead to nasty package-level functions or interfaces whose usage is not obvious to users (e.g., I find the flate.Resetter interface really ugly).

If we want to add this functionality, I vote that we revisit this in Go2, when the API can be made cleaner.

as · 2017-02-04T11:00:07Z

The workaround works as expected and is documented on the issue tracker. I think we should consider this for Go2.

as closed this as completed Feb 4, 2017

golang locked and limited conversation to collaborators Feb 4, 2018

gopherbot added the FrozenDueToAge label Feb 4, 2018
