
proposal: archive/zip: NewZipWriter should support Ahead-of-time CRC and Sizes #23301

Closed
steve-gray opened this issue Jan 2, 2018 · 15 comments
@steve-gray

Go's archive/zip package does not support pre-calculating the file sizes and CRC-32 values used in the PKZIP local file header, so it must set the 'Content Descriptor' (data descriptor) general-purpose flag and append a descriptor structure after each included file's data. This flag is not compatible with various older software, precluding archive/zip from being used in some application scenarios.

What version of Go are you using (go version)?

1.9

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

OSX High Sierra (Intel x64)

What did you do?

  • When writing zip files, Go always defaults to the 'Content Descriptor' method. Some legacy applications do not understand the 0x08 general-purpose flag, and when the content size is known in advance the descriptor is unnecessary.

What did you expect to see?

  • The ability to set a 'PrecalculateSize' flag (or similar) on the writer, to avoid writing the Content Descriptor structure and instead use pre-computed fields in the file header and the central directory.

What did you see instead?

  • Go's zip output differs from that of WinZip and the OS X/Linux zip command: the size and CRC fields in the local file header are set to 0 and each file is followed by a Content Descriptor.
@odeke-em odeke-em changed the title NewZipWriter - Ahead-of-time CRC and Sizes proposal: archive/zip: NewZipWriter should support Ahead-of-time CRC and Sizes Jan 2, 2018
@gopherbot gopherbot added this to the Proposal milestone Jan 2, 2018
@odeke-em (Member) commented Jan 2, 2018

/cc @dsnet @rsc

@steve-gray (Author) commented

@rsc - A variation on this idea (that works for my use case) is found at https://github.com/hidez8891/zip/

Rather than the implemented modification of CreateHeader/Create, I'd propose a new method (or method pair) that allows non-streaming creation of file entries - leaving the default behaviour as-is, but allowing the new use case.

@dsnet (Member) commented Jan 2, 2018

Any functionality that allows the user to preset the size seems brittle. Three pieces of information need to be known ahead of time: the CRC-32, the compressed size, and the uncompressed size. The compressed size is not known by the user and depends on the underlying compression implementation used.

The only API that seems to make sense to me is:

func (*Writer) AddFile(fh *Header, b []byte) error 

While I understand your use-case, it doesn't seem like this issue is prevalent enough to warrant new API being added.

@rasky (Member) commented Jan 3, 2018

@steve-gray which old software are you aware of (and which version) that doesn't support content descriptors?

@rsc (Contributor) commented Jan 8, 2018

It doesn't have to be ahead of time, we just have to seek back earlier in the file and write it out, right? So maybe we could do this transparently if the underlying writer allows seeking?

@dsnet (Member) commented Jan 8, 2018

Yes and no. If the size is less than 4GB, it's easy to seek back and re-write the fixed-width 4-byte fields. However, if the file written was large enough that it had to be upgraded to ZIP64, the writer would not be able to go back and write ZIP64 local headers. That said, if the purpose of all this is to support old readers, I doubt an old reader would understand ZIP64 anyway.

Secondly, as a user, I would be surprised if the output format differed depending on whether the underlying writer was an io.Writer or an io.WriteSeeker. At minimum, I would expect a different constructor that took in an io.WriteSeeker to make this distinction clear.

EDIT: We could always emit a ZIP64 record conservatively. This leads to an increase of 28B for every file. Old readers that don't support ZIP64 should still ignore the ZIP64 extra data.

@rsc (Contributor) commented Jan 22, 2018

@rasky's question is still unanswered: what programs care about this? It may just not be worth doing.

@mavimo commented Jan 23, 2018

@rsc I'm trying to upload a zip file to the Chrome Web Store (https://chrome.google.com/webstore/developer/dashboard) but it reports that the file is not valid. After fixing it with zip -F mypackage.zip --out mypackage-fixed.zip, it seems to work.

Difference between mypackage.zip and mypackage-fixed.zip: [hex-dump screenshots omitted]

I hope this can help.

@genez commented Jan 23, 2018

following, I have the same issue

@rsc (Contributor) commented Jan 29, 2018

@genez you mean you are using Google Chrome Webstore?

Maybe this should be a bug for Google Chrome Webstore instead. What Go generates is a valid zip file already.

@genez commented Jan 30, 2018

@rsc yes, I could submit a bug for Google Chrome Webstore.
I think it's a gray area. I mean: after "fixing" the zip file with zip -F, the Chrome Web Store successfully ingests it.
So, in my understanding, the zip files generated by Go's standard library are "ok" (I can confirm they open on Mac and Windows with no problem at all), but they could be more "compatible" just by adding some (maybe optional? I don't have deep knowledge of the zip format) information to the header.

I would see this as an improvement rather than a bug

What's your opinion?

@rsc (Contributor) commented Feb 5, 2018

The whole zip.Writer interface and design is predicated on having an io.Writer and generating the output in one pass, and the zip file format is designed to make that possible. It's far from clear that we need to take on the complexity of two passes just for the Chrome Web Store. Perhaps just knowing about zip -F is enough.

Are there any other reasons we should do this?

@steve-gray (Author) commented Feb 5, 2018 via email

@dsnet (Member) commented Feb 5, 2018

The complexity being added is only to support single-pass readers (which Chrome seems to be one). The more I think about it, the more I believe supporting such a use-case is actually a mistake.

There is nothing in the specification that forbids a valid file from being written but omitted from the trailing central directory. In fact, you can "remove" and add files to an existing ZIP file by concatenating data at the end and writing an entirely new central directory that references the newly added files and ignores the "removed" ones. When reading in a streaming manner, it is impossible for a reader to determine whether a file really exists or was "removed" until it hits the directory. Fundamentally, the ZIP format is not streaming-read friendly, and giving users the illusion that it is would be misleading in my opinion.

If we're going to add complexity, #15626 is more in line with actual properties that ZIP guarantees.

@rsc (Contributor) commented Feb 26, 2018

Per @dsnet's comment, I think we should decline to do this.
