Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

archive/zip: sanitize the FileHeader.Name to remove path traversal ("../../") from zip files? #25849

Closed
bradfitz opened this issue Jun 12, 2018 · 22 comments
Assignees
Labels
early-in-cycle A change that should be done early in the 3 month dev cycle. FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. Security
Milestone

Comments

@bradfitz
Copy link
Contributor

bradfitz commented Jun 12, 2018

Go isn't directly affected by path traversal attacks in archives but programs written in Go might be.

In particular, if a Go program reads a malicious zip file, the archive/zip package will return the malicious filename in the FileHeader.FileName.

Perhaps we should sanitize it?

If we really need the raw/unsanitized version, we could copy it to a new field FileHeader.InsecureFileName or RawFilename or something.

/cc @dsnet @ianlancetaylor @FiloSottile @andybons @rsc

@bradfitz bradfitz added Security NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. labels Jun 12, 2018
@bradfitz
Copy link
Contributor Author

/cc @mholt too, author of github.com/mholt/archiver (pointed out by @dsnet), which properly sanitizes paths on its own. But maybe that shouldn't be its job.

@dsnet
Copy link
Member

dsnet commented Jun 12, 2018

I'm of the opinion that it's the users responsibility to sanitize the path, as I have seen some crazy (and very rare) use-cases in the past actually depending on this behavior. We should document this though.

As a data point, a popular package that wraps the standard archive package is github.com/mholt/archiver, which handles it properly.

@dsnet
Copy link
Member

dsnet commented Jun 12, 2018

Hmmm, as a counter-point, it seems that the logic in archiver was added precisely because of zip-slip:
mholt/archiver#65

@bradfitz
Copy link
Contributor Author

I think we should be safe by default but provide access to the unsanitized value for the "crazy" use cases you mention.

@dsnet
Copy link
Member

dsnet commented Jun 12, 2018

One proposal is to return a distinguishable error when an invalid path is encountered, but still parse out (or write) the full header. Odd use cases can check for that specific semantic error and ignore it, but most users will be protected.

@FiloSottile
Copy link
Contributor

+1 on sanitizing FileName and adding InsecureFileName. Looks like this applies to archive/tar as well. We'll also have to make absolute paths relative.

However, note that it's not enough to prevent all path traversal attacks. For example, allowing symlinks with arbitrary targets (like ../../) in a tar archive can still results in an escape even if filenames are clean. Sanitizing symlink targets might break more things than filenames.

See https://blog.filippo.io/so-i-lost-the-password-of-my-nas/ section 3 for example.

@FiloSottile
Copy link
Contributor

And it was already reported against github.com/mholt/archiver: mholt/archiver#65 (comment)

@bradfitz
Copy link
Contributor Author

One proposal is to return a distinguishable error when an invalid path is encountered, but still parse out (or write) the full header.

I like that.

It'd mean only adding a new error variable (or type) to the package, and some docs on two functions.

However, note that it's not enough to prevent all path traversal attacks. For example, allowing symlinks with arbitrary targets (like ../../) in a tar archive can still results in an escape even if filenames are clean. Sanitizing symlink targets might break more things than filenames.

This is more work, but not terrible. We could track them all and still detect when another name would reference a path element with a link. Without that, though, fortunately the number of affected callers is probably fewer because you need to go out of your way to support symlinks. If the caller program is, say, a webapp letting users upload a zip of JPEGs, they probably didn't add in the symlink creation support.

Let's start with some more documentation for Go 1.11.

@gopherbot
Copy link

Change https://golang.org/cl/118335 mentions this issue: archive/zip: warn about FileHeader.Name being unvalidated on read

@slrz
Copy link

slrz commented Jun 12, 2018

+1 on sanitizing FileName and adding InsecureFileName. Looks like this applies to archive/tar as well. We'll also have to make absolute paths relative.

I'm not comfortable with doing this for archive/tar. Absolute pathnames are common usage there and writing to /etc/passwd is often intended. Returning an error where there was none before is going to break existing programs.

@mholt
Copy link

mholt commented Jun 12, 2018

I probably should not have open-sourced my archiver package, since I only use it to compress, and only in non-adversarial environments. Sorry if it caused any trouble. (I don't have much time to maintain it lately, so if anyone wants to do so, I'll make you a collaborator.)

Thanks for thinking of this aspect of security, though, from the standard library perspective. Admittedly it's hard to do path sanitization right, since there are some use cases, as others have mentioned, where "unsafe" sanitization may be desired. And I keep getting path cleanup wrong myself (did at least 3 times in Caddy too).

Whatever you decide, I vote for sane defaults with the option to shoot yourself in the foot if need be, with strongly-worded warnings in the godocs. :)

gopherbot pushed a commit that referenced this issue Jun 13, 2018
Updates #25849

Change-Id: I09ee928b462ab538a9d38c4e317eaeb8856919f2
Reviewed-on: https://go-review.googlesource.com/118335
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
dna2github pushed a commit to dna2fork/go that referenced this issue Jun 14, 2018
Updates golang#25849

Change-Id: I09ee928b462ab538a9d38c4e317eaeb8856919f2
Reviewed-on: https://go-review.googlesource.com/118335
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
@ianlancetaylor ianlancetaylor added this to the Go1.11 milestone Jun 19, 2018
@odeke-em
Copy link
Member

/cc @mikesamuel too

@mikesamuel
Copy link
Contributor

It seems that there are 2 use cases:

  • We have a tarball from a source that we trust with arbitrary current-user file-system access
  • We have a tarball of which we are suspicious but we can interact with carefully as long as it's self-contained.

IIUC, it seems rare that part of a tarball would be trusted more than another part.

If so, we probably know this when we open the tarball.

Instead of adding more file information to individual entries, could we treat this as an optional is self contained check?

An individual entry might be treated as invalid if the tarball is supposed to be self contained and either

  • unpacking the tarball into an empty subdirectory would place the entries content out of the subdirectory or
  • an entry has the symlink bit set and unpacking the entry would cause the referent (whether it currently exists) to be outside the directory.

@mikesamuel
Copy link
Contributor

@odeke-em

How hard is the symlink problem to disentangle?

Resolving symlink targets might require isDirectory checks so it might require accessing multiple entries to tell whether a symlink entry reaches outside the directory to which entry . refers.

That means that you have to detect and reject symlink cycles.

Are isDirectory checks the only kind of transitive analysis required?

isDirectory becomes easier to reason about if you also assume that symlink targets that don't map to any entry are not directories. Otherwise, you might have to do a worst-case analysis.

I did some experimenting with symlink chains below. I think this means that we can do self-containedness checks even where an attacker can cause two or more tarballs to untar into the same containing directory since an attacker doesn't defeat any analysis by constructing a symlink chain where even elements are from one tarball and odd from another.

A symlink chain that reaches past `.` must have one link that reaches past `.`

Question: Given a tarball with a symlink structure like the below, would untarring into ${UNTAR_DIR} cause cat ${UNTAR_DIR/c} to access ${UNTAR_DIR}/../../etc/shadow?

a/a/a/a/a -> ../../../etc/shadow
b -> a/a/a/a/
c -> b/a

The answer seems to be no.

$ mkdir /tmp/ln-exp; \
  cd /tmp/ln-exp/; \
  mkdir -p a/a/a/a/; \
  mkdir a/etc/; \
  echo 'self-contained' > a/etc/shadow; \
  cd a/a/a/a; \
  ln -s ../../../etc/shadow a; \
  cd -; \
  ln -s a/a/a/a/ b; \
  ln -s b/a c; \
  cat c
self-contained
$ find . \( -type l -o -type f \) -exec ls -l {} \;
-rw-r--r--  1 msamuel  wheel  15 Jun 25 14:40 ./a/etc/shadow
lrwxr-xr-x  1 msamuel  wheel  19 Jun 25 14:40 ./a/a/a/a/a -> ../../../etc/shadow
lrwxr-xr-x  1 msamuel  wheel  3 Jun 25 14:40 ./c -> b/a
lrwxr-xr-x  1 msamuel  wheel  8 Jun 25 14:40 ./b -> a/a/a/a/

@rsc rsc added early-in-cycle A change that should be done early in the 3 month dev cycle. and removed release-blocker labels Jun 25, 2018
@rsc rsc modified the milestones: Go1.11, Go1.12 Jun 25, 2018
@rsc
Copy link
Contributor

rsc commented Jun 25, 2018

I'd like to see an answer that covers archive/zip, archive/tar, and anything else in a consistent way. We can take our time. This is a very very very old "problem".

@mikesamuel
Copy link
Contributor

How about this:

archive/tar

  • fork NewReader into
    • NewReader which creates a reader with an internal bit set indicating it should be self contained
    • NewTrustedArchiveReader which creates a reader with that bit unset.
  • Change func (tr *Reader) Read(b []byte) (int, error) so that it reports an error for any entry that violates self containment as defined in archive/zip: sanitize the FileHeader.Name to remove path traversal ("../../") from zip files? #25849 (comment)
  • Make sure that the Sys() is nil for anyos.FileInfo from FileInfo() for an entry that violates self-containment. Alternatively, produce an error.

archive/zip

  • same as archive/tar
  • fork other Open methods and/or possibly associate a trustedness bit with File

@rsc
Copy link
Contributor

rsc commented Sep 26, 2018

Separate "NewTrustedArchiveReader" etc seems like overkill. A new InsecureName or RawName, as proposed above, combined with sanitizing the current Name field, seems like enough. But then you need to define what "sanitized" means. Is it just ".." or do we care about other things, like files named COM1 or with backslashes or ... ?

@dsnet
Copy link
Member

dsnet commented Sep 26, 2018

I'm not a fan of duplicated header fields.

  • When writing, it is not clear whether you are required to populate both Name and InsecureName.
  • In the case of archive/tar you would need both InsecureName and InsecureLinkname.
  • When reading, how does duplicated fields help the security issue? If I'm reading a file entry, and it is not "self-contained", does the reader just put an empty Name in the header and populate the InsecureName instead? Is it my responsibility to check that Name is not set, but that InsecureName is? If the suggestion is a "sanitized" name, how could that work for "../foo.txt"?

The more I think about this, I like my suggestion of a non-fatal distinguishable error (#25849 (comment)).

@rsc
Copy link
Contributor

rsc commented Nov 28, 2018

This seems OK to leave until Go 1.13.

@rsc rsc removed this from the Go1.12 milestone Nov 28, 2018
@rsc rsc added this to the Go1.13 milestone Nov 28, 2018
@bradfitz bradfitz modified the milestones: Go1.13, Go1.14 Apr 29, 2019
@rsc rsc modified the milestones: Go1.14, Backlog Oct 9, 2019
@ALTree
Copy link
Member

ALTree commented Jul 16, 2021

Note: this issue was referenced in https://blog.ryotak.me/post/cdnjs-remote-code-execution-en/, and if my reading of the blog post is correct, the current archive/tar behaviour is what enabled the exploit.

@ianlancetaylor
Copy link
Contributor

Anybody interested in this, please see the proposal at #55356. Please comment on that proposal if it would not help. Thanks.

@gopherbot
Copy link

Change https://go.dev/cl/449937 mentions this issue: archive/tar, archive/zip: return ErrInsecurePath for unsafe paths

@dmitshur dmitshur modified the milestones: Backlog, Go1.20 Nov 21, 2022
@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. labels Nov 21, 2022
@golang golang locked and limited conversation to collaborators Nov 21, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
early-in-cycle A change that should be done early in the 3 month dev cycle. FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. Security
Projects
None yet
Development

No branches or pull requests

14 participants