Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: archive/zip: add ability to read local file header #40354

Open
dstaley opened this issue Jul 22, 2020 · 5 comments
Open

proposal: archive/zip: add ability to read local file header #40354

dstaley opened this issue Jul 22, 2020 · 5 comments

Comments

@dstaley
Copy link

dstaley commented Jul 22, 2020

In a zip archive, file information is stored in two locations: the central directory header, and a local file header located prior to the file's data. In certain circumstances, a user might want to read information from the local file header as opposed to solely the central directory header. The current implementation relies solely on the central directory header to compute the FileHeader struct.

Instances in which a user might want to have access to the local header include:

  1. comparing the central and local file names to detect potential malicious activity (such as naming a file with a .txt extension in the central directory header, but with .exe in the local file header)
  2. detecting potential archive corruption (such as when a file contains a trailing slash in the central directory header, but the correct character in the local file header)

Currently, if a user wanted to read the local file header, they would need to implement their own zip reader, as no export would allow them the raw access to the underlying reader necessary to read the local file header.

Ideally, the solution would not introduce additional unnecessary reads for users who don't care about the local file header. One potential solution would be the addition of the following function:

func (f *File) ReadLocalHeader() (FileHeader, error) {
	var buf [fileHeaderLen]byte
	if _, err := f.zipr.ReadAt(buf[:], f.headerOffset); err != nil {
		return FileHeader{}, err
	}
	b := readBuf(buf[:])
	if sig := b.uint32(); sig != fileHeaderSignature {
		return FileHeader{}, ErrFormat
	}
	h := FileHeader{}
	h.ReaderVersion = b.uint16()
	h.Flags = b.uint16()
	h.Method = b.uint16()
	h.ModifiedTime = b.uint16()
	h.ModifiedDate = b.uint16()
	h.CRC32 = b.uint32()
	h.CompressedSize = b.uint32()
	h.UncompressedSize = b.uint32()
	h.CompressedSize64 = uint64(h.CompressedSize)
	h.UncompressedSize64 = uint64(h.UncompressedSize)
	filenameLen := int(b.uint16())

	d := make([]byte, filenameLen)
	if _, err := f.zipr.ReadAt(d[:], f.headerOffset+fileHeaderLen); err != nil {
		return FileHeader{}, err
	}
	h.Name = string(d[:filenameLen])

	return h, nil
}

This would allow users to explicitly request a read of the local file header for a given zip.File.

@gopherbot gopherbot added this to the Proposal milestone Jul 22, 2020
@ianlancetaylor
Copy link
Contributor

CC @dsnet

@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) Jul 22, 2020
@dsnet
Copy link
Member

dsnet commented Jul 23, 2020

The duplication of information in ZIP combined with the lack of implementations that verify that the two are consistent has resulted in a large number of ZIP files that have inconsistent local vs central header information. I believe that the de facto state of most implementations is to treat the central header as authoritative anytime they differ.

I don't see a way to provide this feature without a potentially significant change to the API. While I understand the desire for being able to verify ZIP file integrity for security purposes, I'm not sure it's a compelling enough use-case to warrant this additional API. Let's say that we verify consistency of (centrally known) local headers with the central headers, there are still weird corner cases that people can theoretically exploit for malicious purposes. For example, what do you do with local files that do not exist in the central directory? Such a situation still presents a possible security hole. I'm not sure this implementation would want to support such a use case at it would require changing the ZIP decoder from being random-access, to be streaming based.

@dstaley
Copy link
Author

dstaley commented Jul 23, 2020

Just to be absolutely clear, I'm not suggesting that the archive/zip package perform validation by default (or even by request), simply that it provide access to the local file header should the user wish to perform that validation themselves. The provided suggestion is a single new API method, which I think is probably the least-significant modification that could be made and still reasonably support this functionality. An alternative would be providing access to the underlying reader, but I'm not 100% sure you could do that and still have safe concurrent access (though I'd love to be proven wrong there!).

For example, what do you do with local files that do not exist in the central directory?

As of now, I believe it's impossible to even detect this state with the current implementation. However the proposed API (providing access to the local header of a zip.File generated from the central header) also wouldn't provide the information necessary to detect this scenario. I do agree that that specific scenario would likely require major modifications, and likely isn't worth implementing in the standard library.

@martin-sucha
Copy link
Contributor

I would also appreciate if archive/zip allowed getting information from the local file header.

My use case is reading zip files produced by httrack, it writes custom metadata in extra field only in the local file header. I had to fork whole archive/zip inside my package just to be able to get this one piece of metadata. This also means that I can't accept zip.Reader from standard library in the API of my package directly.

Currently archive/zip has File.DataOffset function, but this function can't be easily used to read the local file header. While one could try to scan before the offset for the local file header signature, that seems as a hack as it won't work if the filename or extra field contains the signature value. I would prefer if archive/zip either exposed a function to read the contents of local file header as proposed in the original post (but also including the Extra field) or exposed the offset of the local file header (File.headerOffset field) similar to the existing DataOffset method so that I could decode the local file header manually.

@mjsir911
Copy link

Yeah 👍 on this, had to end up re-implementing for szip

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Incoming
Development

No branches or pull requests

6 participants