Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compress/gzip: allow multi-stream io.Reader reuse #30234

Closed
wants to merge 2 commits into from
Closed

compress/gzip: allow multi-stream io.Reader reuse #30234

wants to merge 2 commits into from

Conversation

pbnjay
Copy link
Contributor

@pbnjay pbnjay commented Feb 14, 2019

Fixes #30230

@googlebot googlebot added the cla: yes Used by googlebot to label PRs as having a valid CLA. The text of this label should not change. label Feb 14, 2019
@gopherbot
Copy link

This PR (HEAD: 6cc6410) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/go/+/162737 to see it.

Tip: You can toggle comments from me using the comments slash command (e.g. /comments off)
See the Wiki page for more info

@gopherbot
Copy link

Message from Brad Fitzpatrick:

Patch Set 1:

This would need a test.


Please don’t reply on this GitHub thread. Visit golang.org/cl/162737.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link

Message from Jeremy Jay:

Patch Set 1:

Patch Set 1:

This would need a test.

I can do that.
Should I duplicate the existing multistream test replacing with the nil io.Reader?
And/or include a test using a testdata file?


Please don’t reply on this GitHub thread. Visit golang.org/cl/162737.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link

This PR (HEAD: 241093b) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/go/+/162737 to see it.

Tip: You can toggle comments from me using the comments slash command (e.g. /comments off)
See the Wiki page for more info

@gopherbot
Copy link

Message from Brad Fitzpatrick:

Patch Set 2:

Patch Set 1:

Patch Set 1:

This would need a test.

I can do that.
Should I duplicate the existing multistream test replacing with the nil io.Reader?
And/or include a test using a testdata file?

I don't know the existing tests from memory at least. Ideally no testdata if possible (doesn't seem required?), and don't modify any existing tests. Just add new tests, copy/pasting if needed.


Please don’t reply on this GitHub thread. Visit golang.org/cl/162737.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link

Message from Jeremy Jay:

Patch Set 2:

Patch Set 2:

Patch Set 1:

Patch Set 1:

This would need a test.

I can do that.
Should I duplicate the existing multistream test replacing with the nil io.Reader?
And/or include a test using a testdata file?

I don't know the existing tests from memory at least. Ideally no testdata if possible (doesn't seem required?), and don't modify any existing tests. Just add new tests, copy/pasting if needed.

That's what I did with patch set 2. Should be ready to go!


Please don’t reply on this GitHub thread. Visit golang.org/cl/162737.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link

Message from Joe Tsai:

Patch Set 2:

In isolation, I understand this change, but it makes Reader.Reset and Writer.Reset asymmetrical, since Writer.Reset still doesn't permit taking in nil. Also, using nil to signal reusing the underlying reader is a somewhat odd API.

As you mentioned in the issue, there is a workaround for this. The documentation for Reader.Multistream currently states:

If the underlying reader implements io.ByteReader, it will be left positioned just after the gzip stream. To start the next stream, call z.Reset(r) followed by z.Multistream(false). If there is no next stream, z.Reset(r) will return io.EOF.

Thus, it explicitly calls out the fact that users of this feature need to provide an io.ByteReader to properly use this feature.


Please don’t reply on this GitHub thread. Visit golang.org/cl/162737.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link

Message from Jeremy Jay:

Patch Set 2:

Patch Set 2:

In isolation, I understand this change, but it makes Reader.Reset and Writer.Reset asymmetrical, since Writer.Reset still doesn't permit taking in nil. Also, using nil to signal reusing the underlying reader is a somewhat odd API.

I agree it's odd. Certainly easy to mimic on Writer I'd the asymmetry is problematic, though I don't see any point to that either. Open to other options...

As you mentioned in the issue, there is a workaround for this. The documentation for Reader.Multistream currently states:

If the underlying reader implements io.ByteReader, it will be left positioned just after the gzip stream. To start the next stream, call z.Reset(r) followed by z.Multistream(false). If there is no next stream, z.Reset(r) will return io.EOF.

Thus, it explicitly calls out the fact that users of this feature need to provide an io.ByteReader to properly use this feature.

While this is true, it also doesn't explicitly mention that the underlying io.Reader is wrapped and this behavior is expected. It does not state that passing the original io.Reader to Reset will not work UNLESS it implements ByteReader.


Please don’t reply on this GitHub thread. Visit golang.org/cl/162737.
After addressing review feedback, remember to publish your drafts!

@gopherbot
Copy link

Message from Joe Tsai:

Patch Set 2:

I thought about this some more and I'm more hesitant to use Reset(nil) to signal that the underlying reader should be re-used. Seeing that used in code does not make it clear what's happening without reading the documentation. In fact, it's the opposite of what the call seems to be doing.

Furthermore, this would make gzip different from the other compress packages where Reset(nil) explicitly clears the underlying reader and is sometimes done to help the Go GC be able to reclaim the memory for the underlying reader. This is useful if a compression Reader is cached within a larger data structure.

it also doesn't explicitly mention that the underlying io.Reader is wrapped and this behavior is expected. It does not state that passing the original io.Reader to Reset will not work UNLESS it implements ByteReader.

Actually, it does. The documentation for gzip.NewReader says:

If r does not also implement io.ByteReader, the decompressor may read more data than necessary from r.

This implies some form of wrapping or internal buffering.

Open to other options...

Alternatives:

  • A gzip.ReuseReader sentinel reader value that can be passed to Reset to explicitly signal that the underlying reader should be reused. Thus, this would be more readable as it would read as Reset(gzip.ReuseReader). Even with this, iterating over each stream is still clunky, since you have to call Reader.Multistream(false) after each Reset.
  • An additional Reader.NextStream method that is explicitly called to advance to the next stream. This is better than the previous option since it doesn't require calling Reader.Multistream(false) after every stream.

Overall, I'm inclined to make no changes. In my opinion, dealing with individual GZIP files in a multistream file is fundamentally clunky and there is a workaround as you noted in the issue.


Please don’t reply on this GitHub thread. Visit golang.org/cl/162737.
After addressing review feedback, remember to publish your drafts!

@pbnjay pbnjay closed this Feb 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes Used by googlebot to label PRs as having a valid CLA. The text of this label should not change.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants