archive/zip: Zip reading with prefixed bytes #10464

philwebb · 2015-04-15T06:28:45Z

As discussed here it is possible for Java jar files (zip formatted archives) to include a prefixed bash script that allows them to act like fully executable applications. Many existing tools (java -jar, unzip etc) support zip files with additional prefixed bytes.

Unfortunately go currently doesn't support the reading of such archives which causes problems for the Cloud Foundry cli tool.

The text was updated successfully, but these errors were encountered:

philwebb · 2015-04-15T06:30:24Z

I have a potential patch pushed here, I can try to submit it via gerrit if it looks like a sensible approach.

bradfitz · 2015-04-15T15:18:41Z

Wasn't this already fixed when we added http://tip.golang.org/pkg/archive/zip/#Writer.SetOffset ?

In de573f8 and 1f35bb6 (Issue #8669)

philwebb · 2015-04-15T17:03:18Z

@bradfitz Looking at that issue I think it's specifically about writing files, this one would be for reading files. I added a testcase that appeared to fail before starting.

It's also worth considering that in this particular case, the size of the prefixed bytes isn't known until you read the zip central directory record. This means that an equivalent withOffset reader method wouldn't be particularly useful for most people since they would need to duplicate the code that reads the central directory end record.

philwebb · 2015-04-15T17:05:44Z

Also, this is my first experience writing go, so apologies if the suggested changes aren't using the proper conventions. I'm happy to try and rework things before submitting to gerrit if there is something off about the suggested changes.

bradfitz · 2015-04-15T17:05:57Z

You didn't mention in your bug report that this is about reading. Can you write a self-contained bug report here without referencing other URLs, discussions, or patches?

I see no reason why reading such a file would fail. The zip TOC is at the end, so it doesn't matter what's at the beginning of a zip file if the TOC is correct.

philwebb · 2015-04-15T17:09:48Z

Apologies, I've updated the title. I thought the references would be useful.

Although the TOC is located correctly and can be read, the offsets that are stored in the directory structure are assumed to be relative to the front of the file. This means that when you actually try to read the data you're hitting the incorrect part of the underling file since there are additional bytes at the font that aren't considered.

The suggested fix works out the size of these additional bytes and then includes them when reading directory entries and file content.

bradfitz · 2015-04-15T17:11:57Z

So it's an invalid zip file?

Is there a spec for this? Is there some bit in the TOC that says this adjustment should be done?

philwebb · 2015-04-15T17:16:44Z

I'm guessing that the spec is a little open to interpretation here. Going from https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT descriptions such as:

3.12 Central directory structure:
... relative offset of local header 4 bytes

leave things a little ambiguous as to what the offset is "relative" to.

I do know that such files can be read by unzip and zipinfo and by java -jar.

philwebb · 2015-04-15T17:20:32Z

Here is the output from zipinfo for the archive I added to the test case if that helps:

$ zipinfo archive/zip/testdata/test-prefix-junk.zip 
Archive:  archive/zip/testdata/test-prefix-junk.zip   1209   2
warning [archive/zip/testdata/test-prefix-junk.zip]:  39 extra bytes at beginning or within zipfile
  (attempting to process anyway)
-rw-r--r--  3.0 unx       26 tx defN  4-Sep-10 19:12 test.txt
-rw-r--r--  3.0 unx      785 bx stor  4-Sep-10 22:52 gophercolor16x16.png
2 files, 811 bytes uncompressed, 810 bytes compressed:  0.1%

Note specifically the 39 extra bytes at beginning or within zipfile is presented as a warning but the file can still be read.

philwebb · 2015-04-22T15:04:33Z

@bradfitz Is there anything else I can do for this one or do you consider prefixed bytes to be a spec violation? I'm happy to revisit the code if you want changes.

bradfitz · 2015-04-22T21:02:02Z

I'm curious which recovery strategies other tools use. I don't want to implement a unique one that doesn't agree with the leniency of other tools, in either direction.

philwebb · 2015-04-30T01:53:19Z

I've tried to understand what the stock unzip application does but I'm not that proficient with C code. The source is available at ftp.info-zip.org/pub/infozip/src/ and with version 6 it looks like line 857 of process.c is where it calculates how many extra bytes are at the front of the file.

The Java implementation also delegates to C, I believe that the relevant code is here. This one seems a bit easier to understand.

bradfitz · 2015-04-30T21:31:21Z

I imagine on error (when we missed finding the directory header) we'd just scan forward looking for the directory header (0x02014b50) and for each case of 0x02014b50 we find, remember its offset from the real one, then load all the file headers with that offset, stopping once we find an offset that works.

But this is too late for Go 1.5. The tree closes in 1.5 days and the few people who regularly hack on this package are on vacation or busy.

For now I recommend you just do this loop yourself. You could instead scan forwards from the beginning, looking for file headers (0x04034b50) and whenever you find a 0x04034b50, try chopping off that prefix and opening the zip anew. io.NewSectionReader will give you a ReaderAt view into another one, letting you cheaply remove prefix bytes.

neopaf · 2015-06-30T17:02:53Z

Same story here. I heavily relied go could do this upon quick-checking that TOC -reading code is OK.
Yet it fails to parse the file :(

elvarb · 2017-08-08T12:47:08Z

I'm running into this problem I think with a program that is scanning files to search for certain data, when it encounters a JAR file it is processed as a zip file. Now checking the logs I'm noticing that when the program has been stopping for unknown reasons in most cases its scanning JAR files, and specifically certain rt.jar file that is a part of JRE. Most of the time the program can read the rt.jar file but not always, still looking into why that is.

The program has run on over 1000 machines and by far the most common file it stops on are JAR files.

One example of this happening just sometimes, when processing rt.jar that comes with the TSM BA client for the same version. Two machines have the same versions, one successfully scans the file while the other fails.

EricRobert · 2018-07-05T20:35:41Z

Same issue here.

As suggested, the workaround is to scan for every instances of 0x02014b50 and try to create a reader from that offset until it works. Reading from the end helps if the file is big.

At least one implemenation minizip simply computes the start of the archive in the file from the end descriptor instead of assuming it's 0 which looks like the sensible thing to do.

AxbB36 · 2019-04-26T08:59:10Z

I'm curious which recovery strategies other tools use. I don't want to implement a unique one that doesn't agree with the leniency of other tools, in either direction.

Implementations differ in this respect:

Python zipfile always takes the base offset to be eocd_offset − cd_size − cd_offset, where eocd_offset is the absolute offset of the End of Central Directory (usually 22 bytes before the end of the file), and cd_size and cd_offset are read from the EOCD structure. In other words, Python assumes that there is no slack space between the central directory and the EOCD; whatever slack space might be implied by cd_size and cd_offset should instead be interpreted to come at the beginning of the file. Ref Ref
yauzl always takes the base offset to be 0. Ref
nsZipArchive always takes the base offset to be 0. Ref
Info-ZIP Unzip first does what Python does (as @philwebb noted), but if it doesn't find a central directory header signature there, it tries again with a base offset of 0. See process.c:
```
     /*-----------------------------------------------------------------------
         Compensate for missing or extra bytes, and seek to where the start
         of central directory should be.  If header not found, uncompensate
         and try again (necessary for at least some Atari archives created
         with STZip, as well as archives created by J.H. Holm's ZIPSPLIT 1.1).
       -----------------------------------------------------------------------*/
```
If UnZip succeeds with the first strategy (nonzero base offset), it prints a warning, e.g.: 100 extra bytes at beginning or within zipfile. If it succeeds with the second strategy (zero base offset), it prints no warning. You can see where it expects to find the central directory with zipinfo -v.

I haven't seen any implementations do the scanning idea. They either look in one place or two. Scanning seems like a bad idea because file data or even unused space between files can contain a central directory header signature.

account-login · 2021-01-17T07:04:36Z

Just FYI, zip file with prefixed bytes can be adjusted with zip -A command, making the zip file compatible with golang.

gopherbot · 2022-02-25T03:25:42Z

Change https://go.dev/cl/387976 mentions this issue: archive/zip: permit zip files to have prefixes

mikioh changed the title ~~Zip handling with prefixed bytes~~ archive/zip: Zip handling with prefixed bytes Apr 15, 2015

philwebb changed the title ~~archive/zip: Zip handling with prefixed bytes~~ archive/zip: Zip reading with prefixed bytes Apr 15, 2015

philwebb mentioned this issue Apr 29, 2015

cf push fails with jars generated with the Spring Boot 1.3.0 snapshot release cloudfoundry/cli#411

Closed

bradfitz added this to the Unplanned milestone Apr 30, 2015

camelpunch mentioned this issue Sep 15, 2015

If you are not a space developer, renaming a service fails with an unhelpful error cloudfoundry/cli#438

Closed

fchorney mentioned this issue Jan 16, 2020

Read file backwards to find EOCD offset without conflict fhs/ZipFile.jl#63

Merged

This was referenced Jun 2, 2021

Improve discovery of app.jar packaging (self-executing Jars) anchore/syft#413

Closed

Add support for processing files with prepended bytes before the zip archive anchore/syft#428

Merged

FilipDeVos mentioned this issue Dec 23, 2021

unable to unzip jar files hillu/local-log4j-vuln-scanner#18

Closed

This was referenced Dec 30, 2021

Support executable JARs google/log4jscanner#21

Closed

*: support self-executable JARs google/log4jscanner#22

Merged

ericchiang mentioned this issue Feb 25, 2022

archive/zip: OpenReader(zipfile) simply says "zip: not a valid zip file" for jar files with extra bytes, which Java and unzip handle fine #51337

Closed

gopherbot closed this as completed in df57592 May 8, 2022

golang locked and limited conversation to collaborators May 8, 2023

gopherbot added the FrozenDueToAge label May 8, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

archive/zip: Zip reading with prefixed bytes #10464

archive/zip: Zip reading with prefixed bytes #10464

philwebb commented Apr 15, 2015

philwebb commented Apr 15, 2015

bradfitz commented Apr 15, 2015

philwebb commented Apr 15, 2015

philwebb commented Apr 15, 2015

bradfitz commented Apr 15, 2015

philwebb commented Apr 15, 2015

bradfitz commented Apr 15, 2015

philwebb commented Apr 15, 2015

philwebb commented Apr 15, 2015

philwebb commented Apr 22, 2015

bradfitz commented Apr 22, 2015

philwebb commented Apr 30, 2015

bradfitz commented Apr 30, 2015

neopaf commented Jun 30, 2015

elvarb commented Aug 8, 2017 •

edited

EricRobert commented Jul 5, 2018

AxbB36 commented Apr 26, 2019

account-login commented Jan 17, 2021

gopherbot commented Feb 25, 2022

archive/zip: Zip reading with prefixed bytes #10464

archive/zip: Zip reading with prefixed bytes #10464

Comments

philwebb commented Apr 15, 2015

philwebb commented Apr 15, 2015

bradfitz commented Apr 15, 2015

philwebb commented Apr 15, 2015

philwebb commented Apr 15, 2015

bradfitz commented Apr 15, 2015

philwebb commented Apr 15, 2015

bradfitz commented Apr 15, 2015

philwebb commented Apr 15, 2015

philwebb commented Apr 15, 2015

philwebb commented Apr 22, 2015

bradfitz commented Apr 22, 2015

philwebb commented Apr 30, 2015

bradfitz commented Apr 30, 2015

neopaf commented Jun 30, 2015

elvarb commented Aug 8, 2017 • edited

EricRobert commented Jul 5, 2018

AxbB36 commented Apr 26, 2019

account-login commented Jan 17, 2021

gopherbot commented Feb 25, 2022

elvarb commented Aug 8, 2017 •

edited