Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

archive/tar: tar archives created with paths longer than 100 characters are not valid tar files #17630

Closed
nmiyake opened this issue Oct 27, 2016 · 10 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@nmiyake
Copy link
Contributor

nmiyake commented Oct 27, 2016

What version of Go are you using (go version)?

go version go1.7.3 darwin/amd64

What did you do?

Used tar.FileInfoHeader and tarWriter.WriteHeader to write an entry with a name (path in tar) longer than 100 characters.

What did you expect to see?

The generated tar file should be valid, and it should be possible to use tools like tar to expand the created archive.

What did you see instead?

Expanding the archive using tar -xf on the host system failed -- tar reported an error, and the file in the archive which was supposed to be a file nested in many directories was written to the working directory of the expansion.

Details, repro and possible fix

  • The archive does seem to expand properly when read using Go libraries
  • There's a deterministic point (around 100 characters) at which this starts to happen -- the behavior is fine under 100 characters, but breaks in this manner afterwards
  • This seems to have something to do with the header.Uid and header.Gid fields, as setting header.Uid = 0 and header.Gid = 0 before writing the header to the tar file fixes the issue

The following test demonstrates the bug and the work-around. It requires tar to be present on the running system, as it uses exec.Command("tar", ...) to perform the expansion (and thus cannot be run in the Go Playground).

package archive_test

import (
    "archive/tar"
    "fmt"
    "io"
    "io/ioutil"
    "os"
    "os/exec"
    "path"
    "testing"
)

func TestArchive(t *testing.T) {
    const tarName = "test.tar"
    tmpDir, err := ioutil.TempDir("", "test-src")
    defer os.RemoveAll(tmpDir)
    if err != nil {
        panic(err)
    }

    for i, currCase := range []struct {
        filePath string
    }{
        // ok
        {filePath: "0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20/21/22/23/24/25/26/27/28/29/30/file.txt"},
        // ok
        {filePath: "0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20/21/22/23/24/25/26/27/28/29/30/31/32/33/file.txt"},
        // fails: path len = 101 characters
        {filePath: "0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20/21/22/23/24/25/26/27/28/29/30/31/32/333/file.txt"},
        // fails
        {filePath: "0/1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19/20/21/22/23/24/25/26/27/28/29/30/31/32/33/34/35/36/37/38/39/40/file.txt"},
        // fails
        {filePath: "0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000/file.txt"},
    } {
        srcDir, err := ioutil.TempDir(tmpDir, fmt.Sprintf("case-%d-input-", i))
        if err != nil {
            panic(err)
        }

        srcFile := path.Join(srcDir, currCase.filePath)
        err = os.MkdirAll(path.Dir(srcFile), 0755)
        if err != nil {
            panic(err)
        }

        err = ioutil.WriteFile(srcFile, []byte("Hello, world!"), 0644)
        if err != nil {
            panic(err)
        }

        outDir, err := ioutil.TempDir(tmpDir, fmt.Sprintf("case-%d-output-", i))
        if err != nil {
            panic(err)
        }

        tarPath := path.Join(outDir, tarName)

        func() {
            out, err := os.Create(tarPath)
            if err != nil {
                panic(err)
            }
            defer out.Close()

            tarWriter := tar.NewWriter(out)
            defer tarWriter.Close()

            srcInfo, err := os.Stat(srcFile)
            if err != nil {
                panic(err)
            }

            header, err := tar.FileInfoHeader(srcInfo, srcFile)
            if err != nil {
                panic(err)
            }
            header.Name = currCase.filePath

            // Adding these lines fixes the issue (both lines required)-------
            //header.Uid = 0
            //header.Gid = 0
            // ---------------------------------------------------------------

            err = tarWriter.WriteHeader(header)
            if err != nil {
                panic(err)
            }

            if header.Typeflag == tar.TypeReg {
                file, err := os.Open(srcFile)
                if err != nil {
                    panic(err)
                }
                defer file.Close()

                _, err = io.CopyN(tarWriter, file, srcInfo.Size())
                if err != nil {
                    panic(err)
                }
            }
        }()

        // expand using 'tar' command
        cmd := exec.Command("tar", "-C", outDir, "-xf", tarName)
        cmd.Dir = outDir
        output, err := cmd.CombinedOutput()
        if err != nil {
            t.Errorf("Case %d: command %v failed: %s", i, cmd.Args, string(output))
        }

        // long file in tar should be expanded properly
        _, err = os.Stat(path.Join(outDir, currCase.filePath))
        if err != nil {
            t.Errorf("Case %d: file did not exist at path %s", i, path.Join(outDir, currCase.filePath))
        }

        // stray file should not exist
        _, err = os.Stat(path.Join(outDir, "file.txt"))
        if err == nil {
            t.Errorf("Case %d: unexpected file at %s", i, path.Join(outDir, "file.txt"))
        }
    }
}
@nmiyake
Copy link
Contributor Author

nmiyake commented Oct 27, 2016

@jonsyu1 helped with the investigation of this issue, and thinks this may be related to #12594. The symptoms also seem consistent with
http://ftp.gnu.org/old-gnu/Manuals/tar/html_node/tar_117.html.

@nmiyake nmiyake changed the title archive/tar: archives not created correctly for long paths archive/tar: cannot create archives containing paths longer than 100 characters Oct 27, 2016
@nmiyake
Copy link
Contributor Author

nmiyake commented Oct 27, 2016

Although setting header.Uid = 0 and header.Gid = 0 seems to fix this issue, we're not sure if that has further ramifications so are hesitant to do so on production code without understanding the issue further. If anyone knows more about what effects that might have/whether it is safe to do, the information would be much appreciated.

@bradfitz bradfitz added this to the Go1.8Maybe milestone Oct 27, 2016
@bradfitz
Copy link
Contributor

@dsnet, I'm marking this Go1.8Maybe in case it's easy, but feel free to punt to Go 1.9 if not.

@nmiyake nmiyake changed the title archive/tar: cannot create archives containing paths longer than 100 characters archive/tar: tar archives created with paths longer than 100 characters are not valid tar files Oct 27, 2016
@jlzhao27
Copy link

jlzhao27 commented Oct 27, 2016

This is basically our issue: #9683

The linked duplicate bug is tagged for go1.9 and seemed like a large refactor of archive/tar. I feel like there is probably a smaller fix for this immediate bug.

I quote @minux here:

The problem is that if the uid/gid is bigger than 2^21-1 (that is, can't be
epresented in 8 octal digits), archive/tar will use binary representation
in the tar header.

When using that extension, it uses "ushar\x20\x20" as header magic,
instead of the normal "ushar". However, according to GNU tar source,
the long file name will only be considered if it's TMAGIC ("ushar").

By always using "ushar" fixed the problem, and both GNU tar and BSD
tar read the tar file correctly. I'm not sure why we're using
"ushar\x20\x20"
(OLDGNU_MAGIC) in the first place. It seems all tar readers can handle
binary-encoded uid/gid just fine with TMAGIC.

Edit: ah read through #12594, seems like this is a pretty involved fix with writer header formatting :( @dsnet, do you have any recommendations on how to proceed before the writer fixes how it writes ustar/pax headers?

@dsnet
Copy link
Member

dsnet commented Oct 27, 2016

Let me think about this. There may be an easy fix we can do in Go1.8 to at least disable the buggy behavior and do the full fix for Go1.9.

@nmiyake
Copy link
Contributor Author

nmiyake commented Oct 27, 2016

Thanks for the extra info @jlz27 .

Based on the details there, can confirm that this does seem to be fixed by always forcing usage of PAX headers in tar.Writer. Adding preferPax: true in writer.go seems to resolve the issue:

// NewWriter creates a new Writer writing to w.
func NewWriter(w io.Writer) *Writer {
    return &Writer{w: w, preferPax: true}
}

From looking around the tar package, it doesn't look like preferPax is set anywhere, and it's a private field so it shouldn't be possible to set externally either. This change would obviously alter the output format for all created tar files, but seems correct -- however, I don't know enough to assess relative trade-offs to this.

However, since this does seem to work and be "safe", as a work-around on our end we may just vendor archive/tar with this modification to work around the bug for now.

@nmiyake
Copy link
Contributor Author

nmiyake commented Oct 27, 2016

This may not be a viable work-around in core because it adds to the API surface, but we've worked around this creating a paxtar package that is a copy of pkg/tar and just adds a NewPaxWriter function: https://github.com/nmiyake/paxtar/blob/develop/writer.go#L53

In our own tar writing libraries, we use the paxtar library and the NewPaxWriter function to write our tar files, and this works for us.

Hope to see similar flexibility in the standard library at some point, but noting this work-around for now.

@dsnet
Copy link
Member

dsnet commented Oct 27, 2016

As an alternative, you can consider using https://github.com/dsnet/tar. Please don't "go get" it, but just vendor the entire package. All of the fixes I have been making to the standard library are pretty much being copied from that repo.

@gopherbot
Copy link

CL https://golang.org/cl/32234 mentions this issue.

@gopherbot
Copy link

Change https://golang.org/cl/55574 mentions this issue: archive/tar: re-implement USTAR path splitting

gopherbot pushed a commit that referenced this issue Aug 15, 2017
The logic for USTAR was disabled because a previous implementation of
Writer had a wrong understanding of the differences between USTAR and GNU,
causing the prefix field is incorrectly be populated in GNU files.

Now that this issue has been fixed, we can re-enable the logic for USTAR
path splitting, which allows Writer to use the USTAR for a wider range
of possible inputs.

Updates #9683
Updates #12594
Updates #17630

Change-Id: I9fe34e5df63f99c6dd56fee3a7e7e4d6ec3995c9
Reviewed-on: https://go-review.googlesource.com/55574
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@golang golang locked and limited conversation to collaborators Aug 15, 2018
@rsc rsc unassigned dsnet Jun 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

5 participants