Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: compress/gzip: add compressing reader and decompressing writer #51092

Closed
iangudger opened this issue Feb 9, 2022 · 20 comments
Closed

Comments

@iangudger
Copy link
Contributor

Building a reader out of a writer and visa versa can be error prone and inefficient.

@gopherbot gopherbot added this to the Proposal milestone Feb 9, 2022
@ianlancetaylor
Copy link
Contributor

Sorry, I don't know if I understand what you are asking for. Are you looking for code like

// CompressingReader returns a reader that reads the data in r after it has been compressed.
func CompressingReader(r io.Reader) io.Reader {
    pr, pw := io.Pipe()
    go func() { io.Copy(gzip.NewWriter(pw), r) }()
    return pr
}

@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) Feb 9, 2022
@iangudger
Copy link
Contributor Author

In terms of functionality, yes. In terms of implementation, no. I was suggesting a fork of gzip.Writer that operates on io.Reader directly.

Why?

First, I claim that the approach you suggest is error prone. For example, I am pretty sure that the code you wrote is incorrect as it does not close the gzip writer which may leave buffered data and will skip the GZIP footer.

Second, I claim that it is inefficient. In addition to overhead of the goroutine and the io.Copy (which won't be able to use either the io.ReaderFrom or io.WriterTo fast paths), my read of the gzip.Writer implementation is that a version which operates on io.Reader directly would actually be slightly more efficient than the io.Writer version. This is due to the caller provided output buffer.

@ianlancetaylor
Copy link
Contributor

When does this case come up? Is it worth the cost of adding to, and maintaining in, the standard library?

(I agree that my sample code was not production quality, but the general idea might be sound.)

@iangudger
Copy link
Contributor Author

The case that motivated me filing this issue is wanting to compress the contents of a file and send it as an HTTP request body.

I did some research and it looks like at least one other person has run into something similar and thought the solution was worth sharing:
https://gist.github.com/tomcatzh/cf8040820962e0f8c04700eb3b2f26be

Finally, I believe gzip performance is more likely to be a concern than other stream operations.

@dsnet
Copy link
Member

dsnet commented Feb 9, 2022

Second, I claim that it is inefficient.

You're probably right about inefficiency, but is the gains worth the complexity? It will probably lead to significant duplication of code in the compress/flate package.

When compressing, the io.Writer API has the significant advantage that discrete chunks can be emitted by the compressor, knowing that Write will handle it all. When using io.Reader, there is always the possibility that the Read method was not provided a large enough buffer to read entire chunk. For compliance with the io.Reader interface, our implementation would have to be able to handle torn state in more ways than what the implementation has to deal with this today.

Either we add this extra state tracking to the current implementation and slow down the implementation for io.Writer or fork the implementation do one-off special for io.Reader.

@earthboundkid
Copy link
Contributor

The case that motivated me filing this issue is wanting to compress the contents of a file and send it as an HTTP request body.

FWIW, I wrote a helper function that can do this. I think it is correct, but I would love a bug report if it is not.

@rsc
Copy link
Contributor

rsc commented Feb 9, 2022

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc moved this from Incoming to Active in Proposals (old) Feb 9, 2022
@rsc
Copy link
Contributor

rsc commented Feb 16, 2022

It sounds like the use case here is that http.Client uses an http.Request, and to send a body you fill in http.Request.Body, which is an io.Reader. And so if you want to send a compressed request body, compressing on the fly, it would be nice to be able to do

req.Body = CompressingReader(uncompressedDataReader)

The http.Client's Transport guarantees to call req.Body.Close, so it seems like an io.Pipe-based solution (needing Close to stop a goroutine) should be OK.

@dsnet, what do you think of adding some API backed by an io.Pipe and a goroutine to satisfy this use case? (The name CompressingReader is a placeholder above.)

@rsc
Copy link
Contributor

rsc commented Feb 16, 2022

Is there a use case for the decompressing writer?

@dsnet
Copy link
Member

dsnet commented Feb 17, 2022

I'm not quite convinced this needs to be in the gzip package. It seems more appropriate for this to be in the io (or even http) package, since we still have the same io.Reader <-> io.Writer inversion problem if users are trying to leverage some other transport encoding (e.g., brotli or zstd). Thus, this does not seem gzip specific.

If it were in the io package, usage could look something like:

req.Body = io.NewReaderFromWriter(uncompressedDataReader, gzip.NewWriter)

where you pass in the constructor for the writer.

However, there is no standard interface for constructors, so actual usage may be awkward. The API in io might look like:

func NewReaderFromWriter(r io.Reader, new func(io.Writer) io.WriteCloser) io.WriteCloser

Actual callsite would look like:

req.Body = io.NewReaderFromWriter(uncompressedDataReader, func(w io.Writer) io.WriteCloser {
    return gzip.NewWriter(w)
})

You need to wrap gzip.NewWriter in a closure to get the types to match up.

With generics, we could probably avoid this wrapping by changing the signature to:

func NewReaderFromWriter[AnyWriteCloser io.WriteCloser](r io.Reader, new func(io.Writer) AnyWriteCloser) io.WriteCloser

@rsc
Copy link
Contributor

rsc commented Feb 23, 2022

This general form seems pretty awkward. Maybe there is a clearer presentation, but it took me quite a while to wrap my head around why there is a func in the API.

req.Body = gzip.CompressingReader(r)

seems much clearer, and zstd and brotli could add their own too. Even if we did have a primitive in io, it might probably be worth having a function in package gzip, so others don't have to puzzle through how to use it.

@ianlancetaylor
Copy link
Contributor

I'm not yet convinced there is anything to do here.

I think that @iangudger is asking for a duplication of the gzip compression code to work on an io.Reader rather than an io.Writer. @dsnet points out some problems with that (#51092 (comment)). I agree with @dsnet that this does not seem to be a desirable approach.

If we avoid the duplication, then all we need is code that pipes from an io.WriteCloser to a io.Reader. When the io.WriteCloser is closed, the io.Reader should return io.EOF. This is exactly what io.Pipe does. The caller has to pass the pipe writer to gzip.NewWriter, write their data to the gzip.Writer, and then close both writers. I don't see any particular need for the standard library to have this code. If we think that there is a more efficient mechanism than io.Pipe, that too can live outside of the standard library.

So I think are choices are to extend gzip in the standard library, which I don't think we should do, or use io.Pipe or an equivalent, which doesn't have to be in the standard library.

@jonjohnsonjr
Copy link
Contributor

We implemented this ourselves and found the error handling to be very error-prone. We ended up adding a lot of synchronization because calling Write and Close on a gzip.Writer can cause a panic, which you'll hit if you naively try to propagate errors. See this comment thread.

@gopherbot
Copy link

Change https://go.dev/cl/389514 mentions this issue: compress/gzip: add example of compressing reader

@ianlancetaylor
Copy link
Contributor

In https://go.dev/cl/389514 I wrote an example of how to write a compressing reader to use with an HTTP PUT request. This will appear in compress/gzip/example_test.go. Comments welcome. Thanks.

@rsc
Copy link
Contributor

rsc commented Mar 9, 2022

@dsnet, what do you think? Is this a common enough need to add code like Ian's example as a function like CompressingReader to gzip?

@dsnet
Copy link
Member

dsnet commented Mar 9, 2022

I like Ian's approach of adding an example. While the example is in the gzip package it could very well be used for zstd, brotli, etc.

gopherbot pushed a commit that referenced this issue Mar 15, 2022
For #51092

Change-Id: If0a233651ac75f113569ddfffd056084f6092564
Reviewed-on: https://go-review.googlesource.com/c/go/+/389514
Trust: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Joseph Tsai <joetsai@digital-static.net>
TryBot-Result: Gopher Robot <gobot@golang.org>
@rsc
Copy link
Contributor

rsc commented Mar 16, 2022

OK, it sounds like we're happy with the example for now.

@rsc rsc moved this from Active to Likely Decline in Proposals (old) Mar 16, 2022
@rsc
Copy link
Contributor

rsc commented Mar 16, 2022

Based on the discussion above, this proposal seems like a likely decline.
— rsc for the proposal review group

@rsc rsc moved this from Likely Decline to Declined in Proposals (old) Mar 23, 2022
@rsc
Copy link
Contributor

rsc commented Mar 23, 2022

No change in consensus, so declined.
— rsc for the proposal review group

@golang golang locked and limited conversation to collaborators Mar 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
No open projects
Development

No branches or pull requests

7 participants