The compress/flate package allows you to initialize a compressed reader with an existing dictionary by calling flate.NewReaderDict(r io.Reader, dict []byte). This is nice. What would also be nice is a method that returns the state of this []byte after some decompression has taken place.
The issue is that the history buffer cannot be retrieved. Hopefully this can serve as a discussion as to whether the dictionary / history buffer should be exposed in stdlib. There's at least one use case for this that I know of, but I'd like to enumerate and discuss other potential uses.
Use Case
I'm using Go to implement a cabinet (cab) file extractor. The compression format used in many cab files is mszip. Mszip slightly modifies the block semantics of the flate algorithm by preserving the "history buffer" across block boundaries. This means each decompressed block is the result of an entire inflate operation, requiring a new or reset reader for each block, with one difference: every block (except the first) needs the history buffer carried over from the previous blocks.
By adding this function to the flate package in the standard library (for testing purposes only; I don't suggest this as the solution for accessing the history), I am able to implement mszip without any other changes. Although I'm skeptical that this is a good stdlib feature, and compression internals are not in my domain of expertise, I am also not looking forward to maintaining a fork of compress/flate just for this function. Is there a reason that dict.hist is private other than the usual "nobody has needed it public"?
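// example usage of flate.Dict
// (flate.Dict is the function proposed above; it is not part of compress/flate)
var (
	dict []byte
	zbuf = new(bytes.Buffer)
)
// zbuf filled with blocks
for zbuf.Len() > 0 {
	// each block is inflated by a fresh reader primed with the carried-over history
	zr := flate.NewReaderDict(zbuf, dict)
	_, err := plaintext.ReadFrom(zr)
	if err != nil {
		log.Fatalln(err)
	}
	// retrieve the history buffer for the next block
	dict = flate.Dict(zr)
}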
What version of Go are you using (go version)?
go1.8rc3.windows-amd64
As mentioned in the linked discussion, there is a workaround: wrap the io.ReadCloser returned by flate.NewReaderDict in a memoryReader:
// memoryReader wraps an io.Reader and remembers up to 32KiB
// of the last bytes read.
type memoryReader struct {
	io.Reader
	Dict []byte
}

func (mr *memoryReader) Read(b []byte) (int, error) {
	const maxWindow = 1 << 15 // Maximum size of a DEFLATE window
	n, err := mr.Reader.Read(b)
	mr.Dict = append(mr.Dict, b[:n]...)
	if len(mr.Dict) > maxWindow {
		mr.Dict = mr.Dict[len(mr.Dict)-maxWindow:]
	}
	return n, err
}
The dictionary is literally the last (up to 32KiB) few bytes decompressed up until that point, so recording what those bytes were is all that is needed.
Any API additions to the Reader will inevitably lead to nasty package-level functions or interfaces whose usage is not obvious to users (e.g., I find the flate.Resetter interface really ugly).
If we want to add this functionality, I vote that we revisit this in Go2, when the API can be made cleaner.
Original Discussion
This issue was submitted as a result of this thread:
https://groups.google.com/forum/#!topic/golang-dev/UND0JhtsV9s