Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

archive/tar: do not use ustar long filename encoding and binary number encoding together #9683

Closed
lyonel opened this issue Jan 25, 2015 · 16 comments

Comments

@lyonel
Copy link

lyonel commented Jan 25, 2015

Go version: go1.3.3 linux/amd64 on Fedora 20

When creating an archive (with tar.NewWriter), long names (> 100 chars) that don't fit into standard tar headers need to be encoded differently.
An optimisation in archive/tar/writer.go: writeHeader() tries to use a ustar header when only the name is too long creates files that are misinterpretated by other tar implementations (but read correctly by archive/tar)

For example, /home/support/.openoffice.org/3/user/uno_packages/cache/registry/com.sun.star.comp.deployment.executable.PackageRegistryBackend becomes com.sun.star.comp.deployment.executable.PackageRegistryBackend for external tar commands (tested with GNU tar, BSD tar and star)

Modifying archive/tar/writer.go and forcing preferPax to true in NewWriter fixes the issue

@minux minux changed the title archive/tar ustar-style long filenames not understood by GNU tar / BSD tar / star archive/tar: ustar-style long filenames not understood by GNU tar / BSD tar / star Jan 25, 2015
@bradfitz
Copy link
Contributor

I see there were archive/tar changes between Go 1.3 and Go 1.4. Please confirm with this is a problem with Go 1.4 and we can re-open this bug.

@lyonel
Copy link
Author

lyonel commented Jan 25, 2015

Actually there has been no code change affecting archive creation since 2013, please re-open.

@minux
Copy link
Member

minux commented Jan 25, 2015

Could you please provide a short program that creates a tar file that
common tar(1) commands couldn't handle?

I tried but failed to produce such an example.
http://play.golang.org/p/atdcetMiDV

@lyonel
Copy link
Author

lyonel commented Jan 26, 2015

it's actually quite tricky: the name has to be long enough (>100 chars), but not too long (<250 or so) and must contain slashes

here is an example that will untar a 'readme.txt' file instead of /path/to/readme/readme/.../readme.txt:
http://play.golang.org/p/sok5w4xal0

@minux
Copy link
Member

minux commented Jan 26, 2015

I tried your example on both Linux and OS X, but the results are both
correct.

PS: if recreating the bug requires a trick test case, please provide the
example program in the original report, that will save us a lot of time.
Thanks.

@lyonel
Copy link
Author

lyonel commented Jan 26, 2015

you're correct, I'll need to extract the exact code that exhibits the issue... stay tuned.

@lyonel
Copy link
Author

lyonel commented Jan 26, 2015

It's indeed tricky: it only happens when archive/tar omits the ustar version (instead of setting it to '00'); this is triggered by "big" user ids (10000000)

cf. http://play.golang.org/p/pTFShmA0bZ

package main

import (
    "archive/tar"
    "log"
    "os"
    "strings"
)

func main() {
    // Create a buffer to write our archive to.
    buf, _ := os.Create("broken.tar")

    // Create a new tar archive.
    tw := tar.NewWriter(buf)

    // Add some files to the archive.
    var files = []struct {
        Name, Body string
    }{
        {"shortname.txt", "short file content"},
        {"path/to/" + strings.Repeat("readme/", 15) + "readme.txt", "This archive contains some text files."},
        {"b/shortname.txt", "short file content"},
    }
    for _, file := range files {
        hdr := &tar.Header{
            Name:       file.Name,
            Uid:        1000000000,
            Uname:      "root",
            Gname:      "root",
            Linkname:   ".",
            Mode:       0660,
            Size:       int64(len(file.Body)),
        }
        if err := tw.WriteHeader(hdr); err != nil {
            log.Fatalln(err)
        }
        if _, err := tw.Write([]byte(file.Body)); err != nil {
        log.Fatalln(err)
        }
    }
    // Make sure to check the error on Close.
    if err := tw.Close(); err != nil {
        log.Fatalln(err)
    }
    buf.Close()
}

@minux minux reopened this Jan 27, 2015
@minux
Copy link
Member

minux commented Jan 27, 2015

The problem is that if the uid/gid is bigger than 2^21-1 (that is, can't be
epresented in 8 octal digits), archive/tar will use binary representation
in the tar header.

When using that extension, it uses "ushar\x20\x20" as header magic,
instead of the normal "ushar". However, according to GNU tar source,
the long file name will only be considered if it's TMAGIC ("ushar").

By always using "ushar" fixed the problem, and both GNU tar and BSD
tar read the tar file correctly. I'm not sure why we're using
"ushar\x20\x20"
(OLDGNU_MAGIC) in the first place. It seems all tar readers can handle
binary-encoded uid/gid just fine with TMAGIC.

@minux
Copy link
Member

minux commented Jan 27, 2015

According to libarchive source code,
https://github.com/libarchive/libarchive/blob/master/libarchive/archive_write_set_format_ustar.c
http://nxr.netbsd.org/xref/src-freebsd/contrib/libarchive/libarchive/archive_write_set_format_gnutar.c

Non-strict mode ustar support binary (base-256) uid/gid.
However, gnu mode (actually OLDGNU_MAGIC) doesn't support base-256 uid/gid.

According to GNU tar source code, once uid/gid is bigger than 2^21-1, it
will use pax header.

Thus we have at least two possible fixes:

  1. if uid/gid is too large to be written in octal, always use PAX.
  2. if uid/gid is too large, and there is nothing that prevents ushar format
    otherwise, use
    "ushar". This will generate a file not strictly compliant to the ushar
    format, but the majority
    of tar implementions (GNU tar and those based on libarchive) will accept it
    without issue.

the current way of using OLDGNU_MAGIC doesn't look right, as GNU tar itself
doesn't
use base-256 encoding when generating OLDGNU_MAGIC files.

/cc @dsymonds

@lyonel
Copy link
Author

lyonel commented Jan 27, 2015

Thanks @minux for your analysis!
Just 2 remarks:

  • any "binary-encoded" value will make the archive unreadable (especially the size, that's actually what happened in my initial bug report)
  • I concur with you, PAX headers are probably the best option (forcing preferPax to true did the trick in my case)

@rsc rsc changed the title archive/tar: ustar-style long filenames not understood by GNU tar / BSD tar / star archive/tar: do not use ustar long filename encoding and binary number encoding together Apr 10, 2015
@rsc rsc added this to the Go1.5 milestone Apr 10, 2015
@rsc
Copy link
Contributor

rsc commented Jun 29, 2015

Too late for Go 1.5.

@dsnet
Copy link
Member

dsnet commented Sep 11, 2015

If #12594 is addressed, then this issue goes away.

@dsnet
Copy link
Member

dsnet commented Sep 4, 2016

Closing this as duplicate of #12594

@gopherbot
Copy link

CL https://golang.org/cl/32234 mentions this issue.

gopherbot pushed a commit that referenced this issue Nov 2, 2016
The proper fix for the Writer is too involved to be done in time
for Go 1.8. Instead, we do a localized fix that simply disables the
prefix encoding logic. While this will prevent some legitimate uses
of prefix, it will ensure that we don't keep outputting invalid
GNU format files that have the prefix field populated.

For headers with long filenames that could have used the prefix field,
they will be promoted to use the PAX format, which ensures that we
will still be able to encode all headers that we were able to do before.

Updates #12594
Fixes #17630
Fixes #9683

Change-Id: Ia97b524ac69865390e2ae8bb0dfb664d40a05add
Reviewed-on: https://go-review.googlesource.com/32234
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@gopherbot
Copy link

Change https://golang.org/cl/54433 mentions this issue: archive/tar: check for permissible output formats first

gopherbot pushed a commit that referenced this issue Aug 11, 2017
The current logic in writeHeader attempts to encode the Header in one
format and if it discovered that it could not it would attempt to
switch to a different format mid-way through. This makes it very
hard to reason about what format will be used in the end and whether
it will even be a valid format.

Instead, we should verify from the start what formats are allowed
to encode the given input Header. If no formats are possible,
then we can return immediately, rejecting the Header.

For now, we continue on to the hairy logic in writeHeader, but
a future CL can split that logic up and specialize them for each
format now that we know what is possible.

Update #9683
Update #12594

Change-Id: I8406ea855dfcb8b478a03a7058ddf8b2b09d46dc
Reviewed-on: https://go-review.googlesource.com/54433
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@gopherbot
Copy link

Change https://golang.org/cl/55574 mentions this issue: archive/tar: re-implement USTAR path splitting

gopherbot pushed a commit that referenced this issue Aug 15, 2017
The logic for USTAR was disabled because a previous implementation of
Writer had a wrong understanding of the differences between USTAR and GNU,
causing the prefix field is incorrectly be populated in GNU files.

Now that this issue has been fixed, we can re-enable the logic for USTAR
path splitting, which allows Writer to use the USTAR for a wider range
of possible inputs.

Updates #9683
Updates #12594
Updates #17630

Change-Id: I9fe34e5df63f99c6dd56fee3a7e7e4d6ec3995c9
Reviewed-on: https://go-review.googlesource.com/55574
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@golang golang locked and limited conversation to collaborators Aug 15, 2018
@rsc rsc unassigned dsnet Jun 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants