Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

archive/tar: provide the ability to select the format #18710

Closed
dsnet opened this issue Jan 18, 2017 · 3 comments
Closed

archive/tar: provide the ability to select the format #18710

dsnet opened this issue Jan 18, 2017 · 3 comments
Milestone

Comments

@dsnet
Copy link
Member

dsnet commented Jan 18, 2017

There should be some way to inform the Writer what tar format to write in (and also never to write in). For symmetry, the Reader could also provide insight into what format the file is in.

Example use case: Debian packages are an ar archive with 2 other tar archives within them. However, the dpkg tool does not support parsing the PAX format. In the case of creating Debian packages, we should prioritize using the GNU format over the PAX format. If the PAX format was used, then the debian package is rejected by dpkg as being corrupted (when it is infact a valid PAX format tar file).

The fix for this is not as simple as simply making GNU the priority over PAX since some readers have the opposite problem where they can handle PAX and not GNU.

\cc @neild

@dsnet dsnet added this to the Go1.9Maybe milestone Jan 18, 2017
@dsnet dsnet self-assigned this Jan 18, 2017
@dsnet dsnet changed the title archive/tar: provide the ability to select which the format archive/tar: provide the ability to select the format Jan 18, 2017
@dsnet dsnet modified the milestones: Go1.9, Go1.9Maybe Jan 21, 2017
@dsnet dsnet modified the milestones: Go1.10, Go1.9 May 22, 2017
@gopherbot
Copy link

Change https://golang.org/cl/55231 mentions this issue: archive/tar: implement specialized logic for PAX format

gopherbot pushed a commit that referenced this issue Aug 14, 2017
Rather than going through writeHeader, which attempts to handle all formats,
implement writePAXHeader, which only has an understanding of the PAX format.

In PAX, the USTAR header is filled out in a best-effort manner.
Thus, we change logic of formatString and formatOctal to try their best to
output something (possibly truncated) in the event of an error.

The new implementation of PAX headers causes several tests to fail.
An investigation into the new output reveals that the new behavior is correct,
while the tests had actually locked in incorrect behavior before.

A dump of the differences is listed below (-before, +after):

<< writer-big.tar >>

This change is due to fact that we changed the Header.Devminor to force the
tar.Writer to choose the GNU format over the PAX one.
The ability to control the output is an open issue (see #18710).
- 00000150  00 30 30 30 30 30 30 30  00 00 00 00 00 00 00 00  |.0000000........|
+ 00000150  00 ff ff ff ff ff ff ff  ff 00 00 00 00 00 00 00  |................|

<< writer-big-long.tar>>

The previous logic generated the GNU magic values for a PAX file.
The new logic correctly uses the USTAR magic values.
- 00000100  00 75 73 74 61 72 20 20  00 00 00 00 00 00 00 00  |.ustar  ........|
- 00000500  00 75 73 74 61 72 20 20  00 67 75 69 6c 6c 61 75  |.ustar  .guillau|
+ 00000100  00 75 73 74 61 72 00 30  30 00 00 00 00 00 00 00  |.ustar.00.......|
+ 00000500  00 75 73 74 61 72 00 30  30 67 75 69 6c 6c 61 75  |.ustar.00guillau|

The previous logic tried to use the specified timestmap in the PAX headers file,
but this is problematic as this timestamp can overflow, defeating the point
of using PAX, which is intended to extend tar.
The new logic uses the zero timestamp similar to what GNU and BSD tar do.
- 00000080  30 30 30 30 32 33 32 00  31 32 33 33 32 37 37 30  |0000232.12332770|
+ 00000080  30 30 30 30 32 35 36 00  30 30 30 30 30 30 30 30  |0000256.00000000|

The previous logic populated the devminor and devmajor fields.
The new logic leaves them zeroed just like what GNU and BSD tar do.
- 00000140  00 00 00 00 00 00 00 00  00 30 30 30 30 30 30 30  |.........0000000|
- 00000150  00 30 30 30 30 30 30 30  00 00 00 00 00 00 00 00  |.0000000........|
+ 00000140  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
+ 00000150  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

The previous logic uses PAX headers, but fails to add a record for the size.
The new logic does properly add a record for the size.
- 00000290  31 36 67 69 67 2e 74 78  74 0a 00 00 00 00 00 00  |16gig.txt.......|
- 000002a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
+ 00000290  31 36 67 69 67 2e 74 78  74 0a 32 30 20 73 69 7a  |16gig.txt.20 siz|
+ 000002a0  65 3d 31 37 31 37 39 38  36 39 31 38 34 0a 00 00  |e=17179869184...|

The previous logic encoded the size as a base-256 field,
which is only valid in GNU, but the previous PAX headers implies this should
be a PAX file. This result in a strange hybrid that is neither GNU nor PAX.
The new logic uses PAX headers to store the size.
- 00000470  37 35 30 00 30 30 30 31  37 35 30 00 80 00 00 00  |750.0001750.....|
- 00000480  00 00 00 04 00 00 00 00  31 32 33 33 32 37 37 30  |........12332770|
+ 00000470  37 35 30 00 30 30 30 31  37 35 30 00 30 30 30 30  |750.0001750.0000|
+ 00000480  30 30 30 30 30 30 30 00  31 32 33 33 32 37 37 30  |0000000.12332770|

<< ustar.issue12594.tar >>

The previous logic used the specified timestamp for the PAX headers file.
The new logic just uses the zero timestmap.
- 00000080  30 30 30 30 32 33 31 00  31 32 31 30 34 34 30 32  |0000231.12104402|
+ 00000080  30 30 30 30 32 33 31 00  30 30 30 30 30 30 30 30  |0000231.00000000|

The previous logic populated the devminor and devmajor fields.
The new logic leaves them zeroed just like what GNU and BSD tar do.
- 00000140  00 00 00 00 00 00 00 00  00 30 30 30 30 30 30 30  |.........0000000|
- 00000150  00 30 30 30 30 30 30 30  00 00 00 00 00 00 00 00  |.0000000........|
+ 00000140  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
+ 00000150  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

Change-Id: I33419eb1124951968e9d5a10d50027e03133c811
Reviewed-on: https://go-review.googlesource.com/55231
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@gopherbot
Copy link

Change https://golang.org/cl/58230 mentions this issue: archive/tar: support reporting and selecting the format

@gopherbot
Copy link

Change https://golang.org/cl/58310 mentions this issue: archive/tar: return better WriteHeader errors

gopherbot pushed a commit that referenced this issue Aug 25, 2017
WriteHeader may fail to encode a header for any number of reasons,
which can be frustrating for the user when trying to create a tar archive.
As we validate the Header, we generate an informative error message
intended for human consumption and return that if and only if no
format can be selected.

This allows WriteHeader to return informative errors like:
    tar: cannot encode header: invalid PAX record: "linkpath = \x00hello"
    tar: cannot encode header: invalid PAX record: "SCHILY.xattr.foo=bar = baz"
    tar: cannot encode header: Format specifies GNU; and only PAX supports Xattrs
    tar: cannot encode header: Format specifies GNU; and GNU cannot encode ModTime=1969-12-31 15:59:59.0000005 -0800 PST
    tar: cannot encode header: Format specifies GNU; and GNU supports sparse files only with TypeGNUSparse
    tar: cannot encode header: Format specifies USTAR; and USTAR cannot encode ModTime=292277026596-12-04 07:30:07 -0800 PST
    tar: cannot encode header: Format specifies USTAR; and USTAR does not support sparse files
    tar: cannot encode header: Format specifies PAX; and only GNU supports TypeGNUSparse

Updates #18710

Change-Id: I82a498d6f29d02c4e73bce47b768eb578da8499c
Reviewed-on: https://go-review.googlesource.com/58310
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@golang golang locked and limited conversation to collaborators Aug 24, 2018
@rsc rsc unassigned dsnet Jun 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

2 participants