Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compress/gzip: expose sub-file information #6486

Closed
rsc opened this issue Sep 26, 2013 · 16 comments
Closed

compress/gzip: expose sub-file information #6486

rsc opened this issue Sep 26, 2013 · 16 comments
Milestone

Comments

@rsc
Copy link
Contributor

rsc commented Sep 26, 2013

See https://groups.google.com/d/msg/golang-dev/nA3QRGIvrOE/JmABTgizDAsJ and following
messages.

Should do this for compress/bzip2 too.
May want to expose compressed input offset in compress/flate.
@rsc
Copy link
Contributor Author

rsc commented Dec 4, 2013

Comment 1:

Labels changed: added release-go1.3.

@rsc
Copy link
Contributor Author

rsc commented Dec 4, 2013

Comment 2:

Labels changed: removed go1.3.

@rsc
Copy link
Contributor Author

rsc commented Dec 4, 2013

Comment 3:

Labels changed: added repo-main.

@bradfitz
Copy link
Contributor

Comment 4:

This requires API changes. Probably too late for Go 1.3. (It was also too late for Go
1.2 when it was proposed, though, and is arguably already "started" since a patch
exists).
Russ?

@rsc
Copy link
Contributor Author

rsc commented Apr 16, 2014

Comment 5:

Yes, I guess so. Best laid plans and all.

Labels changed: added release-go1.4, removed release-go1.3.

@kortschak
Copy link
Contributor

Comment 6:

Is the started CL available for this? If it's not going into 1.3, I'd like to be able to
work on a fork that is likely to match what goes into the standard so that when it does
it is not much more than a import change.

@kortschak
Copy link
Contributor

Comment 7:

Sorry, that is obviously the diff quoted at
https://groups.google.com/d/msg/golang-dev/nA3QRGIvrOE/cXv7ygTsU8IJ. What I'm interested
in knowing is if the change that comes is likely to be in line with what was tentatively
proposed there. I am asking because I have been holding off work on my package waiting
for this issue, so now that it looks like if will possibly be another three to six
months (depending on how the release cycle is dealt with this round), I don't think I
can wait for the standard library any more.

@rsc
Copy link
Contributor Author

rsc commented Apr 16, 2014

Comment 8:

I assumed you had forked the package already. Please do.

@rsc
Copy link
Contributor Author

rsc commented Apr 16, 2014

Comment 9:

To answer your original question, I don't know what the API will be like. My message is
one option, but I'm not 100% thrilled with it. If I were sure that was the right API I
would just do it now.

@kortschak
Copy link
Contributor

Comment 10:

Thanks, Russ. I should clarify. I have a fork that uses my approach to handling the
member header to deal with blocked decoding, but maintaining this while bringing in
changes from compress/gzip (and keeping public-facing types and fields interoperable
with that package) while not difficult it painful.
I was hoping to retire that fork when the fix for this issue and the (otherwise
unrelated) CL13512052 goes in,  before moving on to deal with the long term
implementation of the BGZF reader which will be doing partially parallel blocked read
ahead with block caching. This requires careful handling of gzip member header fields
that I'm only just able to keep my head around in a stable world, changing API is likely
to blow away my capacity to handle that.

@kortschak
Copy link
Contributor

Comment 11:

I have come back to this code since I have found some time with clear thought. I can now
see that I can achieve exactly what I want to do with the API as is stands at the
moment. Apologies for confusion.
To explain, the current gzip.Reader Read() method returns with a nil error without
filling the parameter slice when the call to z.decompressor.Read(p) reads across a
member boundary (because err != io.EOF since the decompressor return nil here). In my
case I intend to already know the start of the next member, so a gzip.Reader Reset() on
the underlying Reader Seek'd to the appropriate location will give me the next Header.
The upshot is that a test of n < len(p) && err == nil will tell you if you have
reached a boundary. 
After a brief reading of the WARC format mentioned by Daniel Krech in the original
thread, I think they should also be able to use this approach.

@bradfitz
Copy link
Contributor

Comment 12:

That doesn't sound like a reliable test. An io.Reader is always allowed to return fewer
bytes than requested.
Future code changes may break your test.

@kortschak
Copy link
Contributor

Comment 13:

That's true, but the flate.Reader can't read through to the next member since there are
non-flate components to be handled and it could be specified in the gzip docs that a
single Read will not cross a member boundary, formalising the current behaviour.

@kortschak
Copy link
Contributor

Comment 14:

Sorry, you're right Brad; my tests pass due to their small size. Working with real data,
I can guarantee they will fail since the flate.decompressor hist is half as long as the
block size I am using.

@gopherbot
Copy link

Comment 15:

CL https://golang.org/cl/159120044 mentions this issue.

@rsc
Copy link
Contributor Author

rsc commented Oct 21, 2014

Comment 16:

This issue was closed by revision 70f2f1b.

Status changed to Fixed.

@rsc rsc added fixed labels Oct 21, 2014
@rsc rsc self-assigned this Oct 21, 2014
@rsc rsc added this to the Go1.4 milestone Apr 14, 2015
@rsc rsc removed the release-go1.4 label Apr 14, 2015
@golang golang locked and limited conversation to collaborators Jun 25, 2016
wheatman pushed a commit to wheatman/go-akaros that referenced this issue Jun 25, 2018
Allows parsing some file formats that assign special
meaning to which stream data is found in.

Will do the same for compress/bzip2 once this is
reviewed and submitted.

Fixes golang#6486.

LGTM=nigeltao
R=nigeltao, dan.kortschak
CC=adg, bradfitz, golang-codereviews, r
https://golang.org/cl/159120044
wheatman pushed a commit to wheatman/go-akaros that referenced this issue Jun 26, 2018
Allows parsing some file formats that assign special
meaning to which stream data is found in.

Will do the same for compress/bzip2 once this is
reviewed and submitted.

Fixes golang#6486.

LGTM=nigeltao
R=nigeltao, dan.kortschak
CC=adg, bradfitz, golang-codereviews, r
https://golang.org/cl/159120044
wheatman pushed a commit to wheatman/go-akaros that referenced this issue Jul 9, 2018
Allows parsing some file formats that assign special
meaning to which stream data is found in.

Will do the same for compress/bzip2 once this is
reviewed and submitted.

Fixes golang#6486.

LGTM=nigeltao
R=nigeltao, dan.kortschak
CC=adg, bradfitz, golang-codereviews, r
https://golang.org/cl/159120044
wheatman pushed a commit to wheatman/go-akaros that referenced this issue Jul 20, 2018
Allows parsing some file formats that assign special
meaning to which stream data is found in.

Will do the same for compress/bzip2 once this is
reviewed and submitted.

Fixes golang#6486.

LGTM=nigeltao
R=nigeltao, dan.kortschak
CC=adg, bradfitz, golang-codereviews, r
https://golang.org/cl/159120044
wheatman pushed a commit to wheatman/go-akaros that referenced this issue Jul 30, 2018
Allows parsing some file formats that assign special
meaning to which stream data is found in.

Will do the same for compress/bzip2 once this is
reviewed and submitted.

Fixes golang#6486.

LGTM=nigeltao
R=nigeltao, dan.kortschak
CC=adg, bradfitz, golang-codereviews, r
https://golang.org/cl/159120044
@rsc rsc removed their assignment Jun 22, 2022
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants