
compress: add AppendCompress and AppendDecompress #54078

Open
dsnet opened this issue Jul 27, 2022 · 11 comments
@dsnet
Member

dsnet commented Jul 27, 2022

There are situations where we have the entire input in memory and want to compress or decompress it into a buffer we already have. I propose adding the following functions to gzip, flate, and zlib:

// AppendEncoded appends the compressed form of src to the end of dst
// using the specified compression level.
// It returns an error only if the specified level is invalid.
func AppendEncoded(dst, src []byte, level int) ([]byte, error)

// AppendDecoded appends the decompressed form of src to the end of dst.
// If the decompressed output exceeds maxSize, it returns an error.
// If maxSize is negative, then there is no limit on the output size.
func AppendDecoded(dst, src []byte, maxSize int) ([]byte, error)

At face value, this appears to be yet another proposal for convenience functions (which have been proposed and rejected before: #16504). However, there are CPU and memory performance benefits to this API that are impossible to achieve otherwise.

Compression is the combination of LZ77 and Huffman encoding. The memory used by LZ77 is essentially the last 32KiB of uncompressed data. When operating through an io.Reader or io.Writer, the compressor/decompressor is responsible for maintaining its own copy of the LZ77 window, which costs a fair amount of copying and extra memory. The advantage of the APIs above is that no work is needed to maintain the LZ77 window separately, since AppendEncoded's src and AppendDecoded's dst are the LZ77 windows. When decompressing, the only memory required is for the Huffman decoder, which is relatively small. When compressing, the only memory required is for the LZ77 search structure and the Huffman encoder, which is sizeable, but still smaller than the entire LZ77 window.

Other considerations: the flate and zlib APIs provide the ability to specify a preset dictionary, and the proposed APIs have no means of doing so. Dictionaries appear to be rarely used, so adding them to this API is probably not worth it. Code that uses dictionaries can continue to use the io.Reader- and io.Writer-based APIs.

@dsnet dsnet added the Proposal label Jul 27, 2022
@gopherbot gopherbot added this to the Proposal milestone Jul 27, 2022
@ianlancetaylor
Contributor

I don't see anywhere to keep state here. Is this the correct usage? In particular, are we required to always keep passing the same destination slice to each call? Is there really no other state required?

    buf := make([]byte, 4096)
    var dst []byte
    for {
        n, err := r.Read(buf)
        if err != nil && err != io.EOF {
            return nil, err
        }
        if n == 0 {
            break
        }
        dst, err = gzip.AppendEncoded(dst, buf[:n], 9)
        if err != nil {
            return nil, err
        }
    }
    return dst, nil

@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) Jul 27, 2022
@dsnet
Member Author

dsnet commented Jul 27, 2022

The intention of the proposed API is for use when you already have the entire input as a []byte and want to continue dealing with it as a []byte. Any time you start or end with an io.Reader or io.Writer, the current API is superior.

For your example, you could do:

b, err := io.ReadAll(r) // converts io.Reader to []byte
if err != nil {
    return nil, err
}
return gzip.AppendEncoded(nil, b, 9) // stays in the []byte world

but I'm not sure if that's any faster or slower than doing:

var bb bytes.Buffer
zw := gzip.NewWriter(&bb) // stays in the io.Writer world
if _, err := io.Copy(zw, r); err != nil { // connects io.Reader and io.Writer together
    return nil, err
}
if err := zw.Close(); err != nil { // flushes any buffered data
    return nil, err
}
return bb.Bytes(), nil // converts the io.Writer's contents into a []byte

Calling gzip.AppendEncoded within each iteration of a read loop would be a fairly odd use of the API. It is technically correct, since each call would append the compressed form of buf[:n] as an individual gzip file to the destination, and a sequence of gzip files concatenated together is still a valid gzip file per RFC 1952. However, since the chunk size is 4KiB, I would not expect the compression ratio to be particularly good, as we are effectively limiting the compression window to 4KiB.

@rsc
Contributor

rsc commented Jul 27, 2022

This API would remove the allocation of the window. How much can we remove of the other allocated state? Can these be implemented with no allocation other than the append? (That is, can we get everything to be stack-allocated? Or were you planning on a sync.Pool?)

@dsnet
Member Author

dsnet commented Jul 27, 2022

I was planning on using a sync.Pool. With some work, I'm fairly certain the Huffman coders can be stack-allocated (since they're of fairly constant size); I'm not sure about the LZ77 search state.
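A sketch of how a sync.Pool-backed implementation might look today in user code, using the existing Writer.Reset to reuse compressor state (the function name appendEncoded is hypothetical, and a real implementation inside the package could pool lower-level state directly):

```go
package main

import (
	"bytes"
	"compress/gzip"
	"sync"
)

// writerPool amortizes the allocation of the compressor state
// (LZ77 search structures, Huffman encoder) across calls, the way
// an AppendEncoded implementation might do internally.
var writerPool = sync.Pool{
	New: func() any { return gzip.NewWriter(nil) },
}

func appendEncoded(dst, src []byte) ([]byte, error) {
	buf := bytes.NewBuffer(dst) // writes append to dst's contents
	zw := writerPool.Get().(*gzip.Writer)
	defer writerPool.Put(zw)
	zw.Reset(buf) // reuse pooled state against the new destination
	if _, err := zw.Write(src); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	out, err := appendEncoded(nil, []byte("some data"))
	if err != nil || len(out) == 0 {
		panic("compression failed")
	}
}
```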

@rsc
Contributor

rsc commented Jul 27, 2022

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc moved this from Incoming to Active in Proposals (old) Jul 27, 2022
@ianlancetaylor
Contributor

Is Encoded really the right word here? Should these be named AppendCompressed and AppendDecompressed?

@dsnet
Member Author

dsnet commented Aug 3, 2022

I'm fine with either AppendEncoded/AppendDecoded or AppendCompressed/AppendDecompressed. I see "compression" as a particular type of "encoding". Decoded is shorter than Decompressed and the fact that we're dealing with compression is implied by the package name: gzip.AppendDecoded.

@rsc
Contributor

rsc commented Aug 10, 2022

strconv.AppendQuote is not AppendQuoted (no d).
So it seems like we can use AppendCompress and AppendDecompress, which will be very clear.
Quoting is a kind of encoding too, of course, but it's probably clearer to say the encoding than just 'Encoded'.

Does anyone object to AppendCompress and AppendDecompress?

@rsc rsc changed the title proposal: compress: add AppendEncoded and AppendDecoded proposal: compress: add AppendCompress and AppendDecompress Aug 10, 2022
@rsc
Contributor

rsc commented Aug 17, 2022

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@rsc
Contributor

rsc commented Sep 7, 2022

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: compress: add AppendCompress and AppendDecompress compress: add AppendCompress and AppendDecompress Sep 7, 2022
@rsc rsc modified the milestones: Proposal, Backlog Sep 7, 2022
@gopherbot

Change https://go.dev/cl/429736 mentions this issue: compress/gzip: add AppendEncoded
