New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
archive/tar: add Reader.NextRaw method to read only one raw header #17657
Comments
Thanks for filing this. I explored this somewhat and I'm opposed to providing this functionality for the following reasons:
|
On Sat, Oct 29, 2016 at 02:46:22PM -0700, Joe Tsai wrote:
I agree that tar.Header.Bytes is hacky and am open to the TeeWriter
Forking is so drastic, when NextRaw seems pretty straightforward.
I agree that this is a bad fit for the stdlib for the reasons you |
I think a
An API to add
|
On Wed, Nov 09, 2016 at 01:55:26PM -0800, Joe Tsai wrote:
Agreed, although the TeeWriter approach depends on NextRaw or similar
These are good points. The main issue with Bytes access and tar // Next2 is just like Next, but if callback is non-nil it is called type HeaderCallback(tar *Reader, header *Header) error That seems reasonable to me, although the “don't break the reader” type HeaderCallback(header *Header) error but folks might want to use the Reader reference in a non-desctuctive |
Before we dig deeper into this API (which seems fairly complex), I want to understand the use case more. How does |
a possible option (and one that I would love to reconcile with golang upstream) is the approach taken for https://github.com/vbatts/tar-split. (coincidentally I discovered this thread while preparing to open the discussion for pushing the |
On Mon, Jan 16, 2017 at 01:54:05PM -0800, Vincent Batts wrote:
See the docs:
https://godoc.org/github.com/vbatts/tar-split/archive/tar The only
additions to the `archive/tar` API is the `RawAccounting bool` field
and the `RawBytes()` func on the `Reader` struct.
Does this collect *all* the bytes (from both the headers and payloads)
between RawBytes calls when RawAccounting is set? If so, it sounds a
lot like the io.TeeWriter approach [1]. Is it just a more conventient
API for that behavior, or does it give you a way to trigger on tar
headers that are currently elided by the upstream Go library (e.g. pax
x typeflag records)?
[1]: #17657 (comment)
|
It does not seem like the As @wking mention, it seems that the It seems that this could be accomplished with a modified However, my modification falls short in the following two areas:
If those are the only two deficiencies (correct me if I'm wrong), I would much rather see the following happen instead:
The first change requires no API change and is something I would want to see anyways. The later API change at least provides benefit for another functionality more popularly requested of the |
@dsnet understandable, but the need was in accessing only the raw headers and padding, and not the body of each entry, as the tar.Reader.Read() already provides this. @wking in this way, it does allow for collecting all the bytes. That's the purpose of tar-split, such that all the non-payload bytes can be preserved for re-assembling the cryptographic-equivalent original. There is nothing for triggering on tar headers. Though a parser of the RawBytes ought not be too terrible. Additionally, that RawBytes approach does not affect the tar.Writer at all. It is only for reading. |
Use TeeReader instead: https://play.golang.org/p/xXOIJ0d63A There's an interesting proposal currently open golang/go#17657 Signed-off-by: Simarpreet Singh <simar@linux.com>
The current master implementation elides records, for example pax
x
typeflag records are collapsed into the following record with the extended headers being merged into the normal file's headers. This is useful for most consumers, who are only interested in consuming the tar content. But two use cases would benefit from more direct access to the lower-level tar details:tar.Header
does not currently expose pax extended headers outside of these names and prefixes. There are plans to rectify this limitation (archive/tar: add support for arbitrary pax vendor extensions #14472), but until the tar structures expose all of these sorts of details it will be useful to have a way to directly access the header and file data. This will allow users to add their own parsers for any extension they need which is not yet supported by the stdlib, without requiring them to fork archive/tar or replace it with a completely novel parser.Next()
never returns thex
typeflag headers).I think we can address both of these cases without breaking the existing API with two changes:
tar.Reader.NextRaw()
that works liketar.Reader.Next()
, but always returns the next record regardless of its typeflag.tar.Header.Bytes
exposing a[]byte
slice of the raw header data, to allow clients to access header fields that are not yet mapped to othertar.Header
attributes. Alternatively, this could be a methodtar.Header.Bytes()
(or an attribute exposing astring
or whatever) to make the data clearly read-only.Folks who wish to control writing at a low level may want similar changes on the
tar.Writer
side, but I think that is orthogonal enough to punt to a separate issue (if anyone wants it).CC @dsnet.
The text was updated successfully, but these errors were encountered: