-
Notifications
You must be signed in to change notification settings - Fork 18k
archive/tar: add support for arbitrary pax vendor extensions #14472
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I'm not the owner of If the package allows for arbitrary vendor extensions, there are some edge cases to think about. For example #13548 requests sparse file support, which can be accomplished using the GNU.sparse.* extended headers. Since these extended headers have semantic meaning in regards to how to read the file, what happens when the user overwrites GNU.sparse.realsize or something? More than likely, the user will end up creating a corrupt file. This will take a little thought, but it's certainly solvable. If I have time, I can try and tackle this in the go1.7 cycle. |
Yeah, I thought about that, too, but I don't have a good solution. The same might be said e.g. for SCHILY.acl.*, if someone adds support for POSIX ACLs to Header directly. If the caller to WriteHeader() fills out both the specific Header field and an entry in the PAX extension map, which takes precedence? Thanks for considering this change. For now, I'm going to have to fork archive/tar to add the necessary functionality, but I'd like to converge back on the official version if/when this feature makes it in. |
I think you are now. :-) Anyway, it sounds like you are in favor and so a CL just needs to be written. This isn't going to block the release, but if you want to write a CL or someone else wants to write it and send it to you for review, that sounds fine. |
Pushing to Go 1.9. I'm going to close off all Reader related issues in this cycle and let Go 1.9 be about Writer. |
Another way to deal with this (for readers anyway) is to add: func (tr *Reader) NextRaw() (*Header, error) (or some similar name) that gets you the next entry without invoking the current automatic pax-extention-header consumption and similar. Users who want to handle custom extended headers could call |
If I understand your suggestion correctly, you are proposing that we try to read the archive as straight USTAR format (since PAX is really an extension built ontop of USTAR)? What happens when the underlying format is the old GNU format, which does not conform to USTAR? It also seems pretty painful, that'd you would have to parse all of the PAX extended headers yourself.
Is this a real use-case that you have? Validating that a tar file is of a given format is actually pretty hard. Do you validate it according to strict conformance to the POSIX specification or GNU manual? Do you validate it according to what the GNU implementation actually does? Do you validate it according to what the world seems to believe those formats are? If there is a problem with implementations properly conforming to HTTP specifications, then the conformance of tar implementations to "specifications" is even worse. I'm not sure the standard library should provide this feature. On the other hand, I'm willing to explore providing an option where the Writer guarantees to output only specific formats; please file another issue if you care for it. |
On Fri, Oct 28, 2016 at 03:19:18PM -0700, Joe Tsai wrote:
Yeah.
I'm not particularly familiar with the old GNU format; can you spell
At the moment, I'm just looking at typeflag values (not having realize I also have the same “I want to access a pax extended header that Go
I agree that the standard library does not need to provide tar
I'll see how getting low-level extension points goes for readers ;). |
@wking, I glanced through your references and I want to make sure that we are addressing your issues appropriately. It seems from that you have two separate issues:
The first problem is irrelevant to this ticket and I am somewhat opposed to providing that amount of low-level control over the consumed and generated tar ball. I know I'm not really responding to your points. We can discuss more, but please file a separate issue for that. The second problem is exactly what this issue is about and I think there is a sufficient need that this should be addressed. I'll present a preliminary design shortly that can possibly make it into Go 1.9. |
@jstarks and @wking. Your thoughts on the following API? type Header struct {
...
// Records stores key-value records for a given file to be encoded as a
// series of PAX extended header records.
//
// The key must take the following form:
// VENDOR.keyword
// Where VENDOR is some namespace in all uppercase and keyword is any
// arbitrary UTF-8 encoded string not containing the '=' character
// (e.g., "GOLANG.pkg.version").
//
// The value has no restrictions.
Records map[string]string
// Xattrs stores attributes for a file under the SCHILY namespace.
//
// Recording a key-value pair as:
// hdr.Xattrs["key"] = "value"
// Is equivalent to doing the following:
// hdr.Records["SCHILY.xattr.key"] = "value"
//
// Deprecated: Use Records instead.
Xattrs map[string]string
} As far as I know, the only common PAX header that has semantic meaning for the interpretation of the file is Changes to ReaderWhen reading a file, all PAX records will be read. All records will be copied into
All of the records in Changes to WriterWhen writing a file, all records in The records in
If those condition passes, we will write that information into the tar file. Note that the restrictions (except for the ones regarding
In other places:
|
This looks good to me. I think this would cover the MSWINDOWS.* extensions I created for Windows metadata. Looking through my fork of archive/tar, the other thing I needed to change was the handling of time. Basically I need to guarantee subsecond time granularity, which is anticipated by this comment in writer.go:
To avoid confusing this issue, I can open another issue to request some mechanism to encode time with subsecond resolution, if you think it's worthy of consideration. |
On Fri, Oct 28, 2016 at 05:01:50PM -0700, Joe Tsai wrote:
Filed as #17657. |
On Fri, Oct 28, 2016 at 05:02:14PM -0700, Joe Tsai wrote:
This looks good to me, but I'd rather have it be PaxExtendedHeaders or
This is stricter than the spec, which you quote. Here it is with a An extended header shall consist of one or more records, each
The extended header records shall be encoded according to the … Keywords consisting entirely of lowercase letters, digits, and And then later down, they suggest something that's closer to your The standard developers have reserved keyword name space for vendor
where VENDOR is the name of the vendor or organization in all So I think it's a good idea for users to follow those suggestions, but And rather than recapping all of that in the comment, I'd suggest
Looks good to me.
I'm not sure about “common”, but SCHILY.filetype [3](which I'd like
I'd copy anything that we don't consume into Records. For example,
Is “reject the header” “exit WriteHeader with an error”? That makes
I think these are too strict; see above. For cases where the user |
It seems that Sparse file support is complex enough that it makes sense to build that into the
The strictness of my proposed API is entirely intentional. We can always relax the rules in future versions, but we can never make them strict again due to the compatibility agreement. The reason I chose an intentionally strict API is because I didn't want to deal with all the edge cases that occur when member fields of the I believe the number of users who even need arbitrary PAX headers are relatively low and it seems my current proposal is sufficient for all the known use cases. It seems to me that the use cases for complete reign over the PAX headers are mostly hypothetical. Should a need arise in the future, people are welcome to fork the One listed header that people may want access to that my API currently doesn't provide is the I believe the proposed API with all its strictness satisfies the real use-cases that people have without complicating the API significantly for the typical user of the |
On Fri, Oct 28, 2016 at 10:48:44PM -0700, Joe Tsai wrote:
This is currently true with your Xattrs / Records split. If someone How about replacing your Records with: func (h *Header) PaxHeaders() map[string]string SetPaxHeader would parse known headers (e.g. “atime”) into the PaxHeaders() would construct the headers as writeHeader does now, header.SetPaxHeader("atime", "1084839148.1212") Then callers can update via either the Header attribute or via And with that sort of consistency guarantee, I think it would be One interesting case (that I'm ok with but which is probably worth header.SetPaxHeader("atime", "1084839148") I'm fine with the explicitly-set header “going away”, and if that |
@wking Or we can just make it so that if a key is set in |
On Wed, Nov 09, 2016 at 05:41:11AM -0800, Aleksa Sarai wrote:
With an entry that sets SCHILY.xattr.key to foo, all of the proposals header, err = tarReader.Next() With @dsnet's original proposal 1, folks trying to use the new API header, err = tarReader.Next() will get an error from the WriteHeader call (“If the key-value already header, err = tarReader.Next() That makes actually removing Xattrs (in some future Go release after a With your proposal (not erroring out on conflicts and preferring With my proposal 2, callers can still use the old Xattrs-only header, err = tarReader.Next() That doesn't involve the deprecated Xattrs API, so you can transition If SetPaxHeader is not the API folks want (for whatever reason), PaxHeaders map[string]string // or Records, or whatever // Create PaxHeaders entries for anything in in Xattrs and empty Xattrs. // Deprecated. Call PaxHeadersOptIn() and use PaxHeaders Callers interested in the new API could: header, err = tarReader.Next() Then when Xattrs is eventually dropped, PaxHeadersOptIn becomes a |
I like the idea of SetPaxHeader. I do not like PaxHeadersOptIn. I think SetPaxHeader is better than exposing a map because it sidesteps the possible collisions. In this proposal, how do you remove a header? Is an empty value the same as removing, or does that potentially have semantic meaning (in which case I guess we need DeletePaxHeader?) |
On Wed, Nov 09, 2016 at 11:49:31AM -0800, Quentin Smith wrote:
Ah, good point. I don't see a “ must be non-empty” requirement func (h *Header) SetPaxHeader(keyword, value *string) error where nil means “delete the header”. I don't have a strong preference |
I'm not opposed to adding methods to Here are my reasons why:
I'm not opposed to a Responses to other statements made:
@wking, The semantic meaning of my proposal is only if the existing value differs. Thus, the following is valid: header, err = tarReader.Next()
header.Records["SCHILY.xattr.key"] = "bar"
n, err = tarWriter.WriteHeader(header)
@quentinmit, It doesn't sidestep collisions, they still occur. The specific policy chosen (i.e., always replace) is not specific to the
@wking, Actually, an empty value means deletion: |
On Wed, Nov 09, 2016 at 01:39:51PM -0800, Joe Tsai wrote:
I'm not familiar with “aliased field”, can you unpack that for me? Do
Every time you call it. Folks who need to access a number of headers paxHeaders := header.PaxHeaders() I agree that it would be nice to avoid this copy cost, but it's a
I think the local-variable approach mitigates this, but I'm also
With the atime examples I floated in 3, the newer value clobbers the
It keeps all the fields in sync, since it will be updating Xattrs
Because “returns the pax headers that WriteHeaders will write” is
As soon as you allow an exported map, you lose the ability to
But the existing value was “bar” (as loaded by Next() 1).
Hmm. An empty-string value will have a length of one (“The
Writing gets tricky. There doesn't seem to be a tailored API for a. Set an empty-string value (e.g. value = ""; SetPaxHeader("foo", &value)). So my current preferred API is: // PaxHeaders returns a copy of the pax extended headers which would // SetPaxHeader sets a pax extended header. If the keyword // DeletePaxHeader removes a pax header from future PaxHeaders or |
On Wed, Nov 09, 2016 at 02:28:56PM -0800, W. Trevor King wrote:
Another important distinction is that WriteHeader is not the only
With the SetPaxHeader approach, $VALUE will be the new “bar”. With To avoid confusion with the Records approach, as soon as one element An alternative Xattrs → Records migration plan would be the parallel |
Change https://golang.org/cl/58390 mentions this issue: |
Feature request: archive/tar should support reading and writing arbitrary pax vendor extensions. This could be done either with additions to tar.Header, or with new functions on tar.Reader and tar.Writer. This would allow go code to work with metadata that archive/tar does not yet support.
Here's the motivation: Docker (written in go) uses tar as the archive format for container images in Linux. We'd like to make sure this works for Windows, too, but doing so requires extending the tar format to include some non-POSIX Windows metadata (security descriptor, creation time, etc.).
The natural way to extend the tar format is to encode this additional metadata in pax vendor extensions, e.g. "MSWINDOWS.sd" for the security descriptor. Currently, however, archive/tar does not expose pax vendor extensions other than those that start with SCHILY.xattr (used for POSIX extended attributes), which are exposed in Header.Xattrs. It's probably not reasonable to ask archive/tar to support these specific "MSWINDOWS" extensions that are neither formally defined nor used anywhere today, but archive/tar could expose the ability to read and write arbitrary vendor extensions.
This seems like it would be useful outside of this use case as well, e.g. to support POSIX ACLs (SCHILY.acl.*).
@dsnet
The text was updated successfully, but these errors were encountered: