Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compress/gzip: provide user access to size trailer (when possible) #32539

Open
jcorbin opened this issue Jun 10, 2019 · 4 comments
Open

compress/gzip: provide user access to size trailer (when possible) #32539

jcorbin opened this issue Jun 10, 2019 · 4 comments
Labels
FeatureRequest NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@jcorbin
Copy link

jcorbin commented Jun 10, 2019

The current gzip.Reader reads and verifies the trailer crc and size at the end, but the user currently has no way of reading (to pre-allocate output resources) the size.

How much appetite do we have for adding something like a func ReadSize(io.ReadSeeker) (uint32, error) utility to the compress/gzip package?

For concreteness, a case that I really have in mind is decompression in Shopify/sarama, leading to dynamic bytes.Buffer reallocation growth under ioutil.ReadAll.

Using that example, perhaps another / a better option would be something like func Decode([]byte) ([]byte, error) like both snappy and zstd provide?

However there's some appeal to being able to do the classic "seek to the end, read the size, pre-allocate, seek back to 0 and start streaming" idiom...

@ianlancetaylor
Copy link
Contributor

CC @dsnet

@dsnet
Copy link
Member

dsnet commented Jun 10, 2019

Regarding the exposure of the footer size:

  • Using the size in the footer to pre-allocate buffers seems like a somewhat dangerous operation if the GZIP file can come from untrusted sources (which I suspect is a pretty common situation). An attacker can trivially set a large number there and waste precious memory.
  • Setting aside malformed files, the size in the footer is only accurate up to math.MaxUint32.

As such, I'm not sure exposing the size is a good idea. For the cases where it's helpful (i.e., in trusted environments with smallish GZIP files), the size is relatively easy to obtain by reading the last 4 bytes.

@dsnet
Copy link
Member

dsnet commented Jun 10, 2019

Regarding func Decode([]byte) ([]byte, error), I'm personally not opposed to API like this, but it has been proposed and rejected before. See #16504.

@jcorbin
Copy link
Author

jcorbin commented Jun 12, 2019

See #16504.

Interesting, since that one basically got "closed because ioutil.ReadAll" where-as my saga starts with "production use of ioutil.ReadAll considered harmful"; it's a common source of GC pressure due to dynamic reallocation growth in its bytes.Buffer.

@FiloSottile FiloSottile added FeatureRequest NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Jun 14, 2019
@FiloSottile FiloSottile added this to the Unplanned milestone Jun 14, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
FeatureRequest NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

4 participants