Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

archive/tar: Does not handle extended pax values with nulls #32264

Open
keegancsmith opened this issue May 27, 2019 · 5 comments
Open

archive/tar: Does not handle extended pax values with nulls #32264

keegancsmith opened this issue May 27, 2019 · 5 comments
Labels
NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Milestone

Comments

@keegancsmith
Copy link
Contributor

What version of Go are you using (go version)?

go version go1.12.5 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What did you do?

Read a tar file generated by git archive:

$ git clone https://github.com/SSW-SCIENTIFIC/NNDD.git
$ cd NNDD
$ git archive --format tar c21b98da2ca7f007230e696b2eda5da6589fe137 > nndd.tar

Alternatively you can fetch this archive directly from github: https://github.com/SSW-SCIENTIFIC/NNDD/tarball/c21b98da2ca7f007230e696b2eda5da6589fe137

If you try to read this with golang's archive/tar you will get a tar.ErrHeader on one of the entries. Here is a reproduction program:

package main

import (
	"archive/tar"
	"compress/gzip"
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	// Read tar from file if specified, otherwise fetch directly from github
	var r io.Reader
	if len(os.Args) > 1 {
		f, err := os.Open(os.Args[1])
		if err != nil {
			log.Fatal(err)
		}
		defer f.Close()
		r = f
	} else {
		resp, err := http.Get("https://github.com/SSW-SCIENTIFIC/NNDD/tarball/c21b98da2ca7f007230e696b2eda5da6589fe137")
		if err != nil {
			log.Fatal(err)
		}
		defer resp.Body.Close()
		gr, err := gzip.NewReader(resp.Body)
		if err != nil {
			log.Fatal(err)
		}
		defer gr.Close()
		r = gr
	}

	tr := tar.NewReader(r)
	for {
		_, err := tr.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
	}
	fmt.Println("Successfully read tar")
}

If you try read the same file with bsdtar or gnutar they both read the whole tarball. bsdtar (depending on version) will log a warning to stderr but still extract the rest of the tarball.

The archive contains a symbolic link which instead of a path for value, contains a large value which includes null. The validatePAXRecord explicitly disallows null from appearing in the value for PAX values for links. This leads to tar.Reader.Next() returning tar.ErrHeader.

https://github.com/golang/go/blob/go1.12.5/src/archive/tar/strconv.go#L321-L322

I reported this to the git mailing list: https://public-inbox.org/git/CAMVcy0Q0TL6uEGR2NeudJrOiXdQ87XcducL0EwMidWucjk5XYw@mail.gmail.com/T/#u Read that issue for more details on the "bad" entry in the archive.

The opinion there seems to be that generating values that aren't C-strings for links is valid. Not all tarballs are destined for filesystems. And the latest bsdtar and gnutar do not complain about the value. gnutar will write the symbolic link value upto the first null.

What did you expect to see?

I think any of the follow behaviours would be better:

  • Be able to read the rest of the tar if encountering a recoverable error.
  • Allow nulls in all PAX values.
  • Truncate up to first null in PAX value. (consistent with gnutar)

What did you see instead?

tar.ErrHeader on a tarball both bsdtar and gnutar can read.

@agnivade
Copy link
Contributor

@dsnet

@agnivade agnivade added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label May 27, 2019
@agnivade agnivade added this to the Go1.14 milestone May 27, 2019
@keegancsmith
Copy link
Contributor Author

FYI I am happy to contribute a fix for this, just need guidance on what is the appropriate fix.

@dsnet
Copy link
Member

dsnet commented May 27, 2019

The specification for PAX regarding linkpath says:

The pathname of a link being created to another file, of any type, previously archived. This record shall override the linkname field in the following ustar header block(s). The following ustar header block shall determine the type of link created. If typeflag of the following header block is 1, it shall be a hard link. If typeflag is 2, it shall be a symbolic link and the linkpath value shall be the contents of the symbolic link. The pax utility shall translate the name of the link (contents of the symbolic link) from the encoding in the header to the character set appropriate for the local file system. When used in write or copy mode, pax shall include a linkpath extended header record for each link whose pathname cannot be represented entirely with the members of the portable character set other than NUL.

This suggest that NUL is not a valid character to appear the extended record for linkpath.

@dsnet dsnet added NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels May 27, 2019
@dsnet
Copy link
Member

dsnet commented May 27, 2019

Be able to read the rest of the tar if encountering a recoverable error.

This seems reasonable, but it requires a scrub through of the entire implementation to identify fatal vs. non-fatal errors. It also needs an API design to surface non-fatal errors.

One problem is that the implementation has always returned tar.ErrHeader for all types of parse errors (both fatal and non-fatal ones). The compatibility agreement technically requires us to continue to return the same sentinel error in the same circumstances. Who uses this obscure feature, I have no idea.

@dsnet dsnet removed this from the Go1.14 milestone May 27, 2019
@keegancsmith
Copy link
Contributor Author

When used in write or copy mode, pax shall include a linkpath extended header record for each link whose pathname cannot be represented entirely with the members of the portable character set other than NUL.

This suggest that NUL is not a valid character to appear the extended record for linkpath.

I read over that bit of the spec a few times. I believe it is saying use an extended header record if set(linkpath) - (set(portable_characters) - set(\0)) != set(). Using the set notation to try and clear up the ambiguity. With that interpretation it is fine to include nul.

However, the spec also states UTF-8. But the archive/tar implementation doesn't validate that due to many archives not following spec. I think the same case can be made here. Especially given gnutar and bsdtar do not choke on this. For example the CR which introduced the validatePAXRecord cited what gnutar and bsdtar does in practice for its implementation.

Be able to read the rest of the tar if encountering a recoverable error.

This seems reasonable, but it requires a scrub through of the entire implementation to identify fatal vs. non-fatal errors. It also needs an API design to surface non-fatal errors.

One problem is that the implementation has always returned tar.ErrHeader for all types of parse errors (both fatal and non-fatal ones). The compatibility agreement technically requires us to continue to return the same sentinel error in the same circumstances. Who uses this obscure feature, I have no idea.

I agree, I think that rules out this possibility without further investigation. Given what gnutar and bsdtar do, I would be tempted to just remove the contains nul validation. If that was done, the effect on existing use cases:

  • Writers: Could now generate tars which previously would of failed. When the tars are read how it is handled would depend on implementation. I think this is fine, since it is the same as not writing valid utf-8. Additionally I doubt anyone relied on the behaiour of failing with nul in linkpath.
  • Readers: Instead of failing, we would pass on linkpaths with null in them. This could lead to failures in later parts of a program. This seems acceptable to me, since it would of failed anyways. Additionally the error is more likely to be more descriptive than the error value tar.ErrHeader.

@seankhliao seankhliao added this to the Unplanned milestone Aug 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Projects
None yet
Development

No branches or pull requests

4 participants