Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

archive/zip: Writer produces zip files unreadable by the OS X Finder's extractor #3252

Closed
adg opened this issue Mar 8, 2012 · 12 comments
Closed
Milestone

Comments

@adg
Copy link
Contributor

adg commented Mar 8, 2012

Add the following test to writer_test.go and run "go test -run=Disk". It
produces the file $HOME/test.zip. Try to extract that by opening it from Finder. I get
"Error 2 - no such file or directory."

I can't remember whether this ever worked, or whether it was something that broke
recently. I backed up past some recent changes to fd7f2a4b69a9 and it is still broken.

Any help with this would be most appreciated. Marking as Priority-Go1 but it's not a
release blocker.

func TestWriterDisk(t *testing.T) {
    largeData := make([]byte, 1<<17)
    for i := range largeData {
        largeData[i] = byte(rand.Int())
    }
    writeTests[1].Data = largeData
    defer func() {
        writeTests[1].Data = nil
    }()

    // write a zip file
    f, err := os.Create(os.GetEnv("HOME") + "/test.zip")
    if err != nil {
        t.Fatal(err)
    }
    defer f.Close()
    w := NewWriter(f)

    for _, wt := range writeTests {
        testCreate(t, w, &wt)
    }

    if err := w.Close(); err != nil {
        t.Fatal(err)
    }
}
@bradfitz
Copy link
Contributor

bradfitz commented Mar 9, 2012

Comment 1:

Here's a program which uses archive/zip to read a good zip file (cli.zip) and writes a
bad copy of it (go.zip).
package main
import (
    "archive/zip"
    "io"
    "log"
    "os"
)
func main() {
    f, err := os.Open("cli.zip")
    if err != nil {
        log.Fatalf("open: %v", err)
    }
    fi, _ := f.Stat()
    r, err := zip.NewReader(f, fi.Size())
    if err != nil {
        log.Fatalf("NewReader: %v", err)
    }
    os.Remove("go.zip")
    outf, err := os.Create("go.zip")
    if err != nil {
        log.Fatalf("Create: %v", err)
    }
    zw := zip.NewWriter(outf)
    for _, zf := range r.File {
        w, err := zw.CreateHeader(&zf.FileHeader)
        if err != nil {
            log.Fatalf("CreateHeader: %v", err)
        }
        rc, err := zf.Open()
        if err != nil {
            log.Fatalf("zf.Open: %v", err)
        }
        n, err := io.Copy(w, rc)
        log.Printf("copyed %d bytes", n)
        if err != nil {
            log.Fatalf("io.Open: %v", err)
        }
    }
    zw.Close()
}
Good and bad zips are attached.

Attachments:

  1. cli.zip (314 bytes)
  2. go.zip (330 bytes)

@bradfitz
Copy link
Contributor

bradfitz commented Mar 9, 2012

Comment 2:

Or if you're Andrew and have the zip file format memorized:
bradmac-2:ziptest bradfitz$ hexdump go.zip 
0000000 50 4b 03 04 14 00 08 00 00 00 65 87 68 40 a8 65
0000010 32 7e 04 00 00 00 04 00 00 00 07 00 18 00 66 6f
0000020 6f 2e 74 78 74 55 54 05 00 03 de 55 59 4f 75 78
0000030 0b 00 01 04 f5 01 00 00 04 14 00 00 00 66 6f 6f
0000040 0a a8 65 32 7e 04 00 00 00 04 00 00 00 50 4b 03
0000050 04 14 00 08 00 00 00 66 87 68 40 e9 b3 a2 04 04
0000060 00 00 00 04 00 00 00 07 00 18 00 62 61 72 2e 74
0000070 78 74 55 54 05 00 03 e0 55 59 4f 75 78 0b 00 01
0000080 04 f5 01 00 00 04 14 00 00 00 62 61 72 0a e9 b3
0000090 a2 04 04 00 00 00 04 00 00 00 50 4b 01 02 14 03
00000a0 14 00 08 00 00 00 65 87 68 40 a8 65 32 7e 04 00
00000b0 00 00 04 00 00 00 07 00 18 00 00 00 00 00 00 00
00000c0 00 00 a4 81 00 00 00 00 66 6f 6f 2e 74 78 74 55
00000d0 54 05 00 03 de 55 59 4f 75 78 0b 00 01 04 f5 01
00000e0 00 00 04 14 00 00 00 50 4b 01 02 14 03 14 00 08
00000f0 00 00 00 66 87 68 40 e9 b3 a2 04 04 00 00 00 04
0000100 00 00 00 07 00 18 00 00 00 00 00 00 00 00 00 a4
0000110 81 4d 00 00 00 62 61 72 2e 74 78 74 55 54 05 00
0000120 03 e0 55 59 4f 75 78 0b 00 01 04 f5 01 00 00 04
0000130 14 00 00 00 50 4b 05 06 00 00 00 00 02 00 02 00
0000140 9a 00 00 00 9a 00 00 00 00 00                  
000014a
bradmac-2:ziptest bradfitz$ hexdump cli.zip 
0000000 50 4b 03 04 0a 00 00 00 00 00 65 87 68 40 a8 65
0000010 32 7e 04 00 00 00 04 00 00 00 07 00 1c 00 66 6f
0000020 6f 2e 74 78 74 55 54 09 00 03 de 55 59 4f de 55
0000030 59 4f 75 78 0b 00 01 04 f5 01 00 00 04 14 00 00
0000040 00 66 6f 6f 0a 50 4b 03 04 0a 00 00 00 00 00 66
0000050 87 68 40 e9 b3 a2 04 04 00 00 00 04 00 00 00 07
0000060 00 1c 00 62 61 72 2e 74 78 74 55 54 09 00 03 e0
0000070 55 59 4f e0 55 59 4f 75 78 0b 00 01 04 f5 01 00
0000080 00 04 14 00 00 00 62 61 72 0a 50 4b 01 02 1e 03
0000090 0a 00 00 00 00 00 65 87 68 40 a8 65 32 7e 04 00
00000a0 00 00 04 00 00 00 07 00 18 00 00 00 00 00 01 00
00000b0 00 00 a4 81 00 00 00 00 66 6f 6f 2e 74 78 74 55
00000c0 54 05 00 03 de 55 59 4f 75 78 0b 00 01 04 f5 01
00000d0 00 00 04 14 00 00 00 50 4b 01 02 1e 03 0a 00 00
00000e0 00 00 00 66 87 68 40 e9 b3 a2 04 04 00 00 00 04
00000f0 00 00 00 07 00 18 00 00 00 00 00 01 00 00 00 a4
0000100 81 45 00 00 00 62 61 72 2e 74 78 74 55 54 05 00
0000110 03 e0 55 59 4f 75 78 0b 00 01 04 f5 01 00 00 04
0000120 14 00 00 00 50 4b 05 06 00 00 00 00 02 00 02 00
0000130 9a 00 00 00 8a 00 00 00 00 00                  
000013a

@bradfitz
Copy link
Contributor

bradfitz commented Mar 9, 2012

Comment 3:

bradmac-2:ziptest bradfitz$ unzip -l cli.zip 
Archive:  cli.zip
  Length     Date   Time    Name
 --------    ----   ----    ----
        4  03-08-12 16:59   foo.txt
        4  03-08-12 16:59   bar.txt
 --------                   -------
        8                   2 files
bradmac-2:ziptest bradfitz$ unzip -l go.zip 
Archive:  go.zip
  Length     Date   Time    Name
 --------    ----   ----    ----
        4  03-08-12 16:59   foo.txt
        4  03-08-12 16:59   bar.txt
 --------                   -------
        8                   2 files

@bradfitz
Copy link
Contributor

bradfitz commented Mar 9, 2012

Comment 4:

The file name "foo.txt" is the 7 bytes starting at "66 6f" at the end of the second line:
0000010 32 7e 04 00 00 00 04 00 00 00 07 00 18 00 66 6f
You can see the file header's
        b.uint16(uint16(len(h.Name)))
        b.uint16(uint16(len(h.Extra)))
immediately preceding that.
Note how both correctly have "07 00" as the filename length, but the cli.zip has an
Extra length of 0x1c, whereas go.zip has an extra length of 0x18.
cli's extras are:
55 54 09 00 03 de 55 59 4f de 55 59 4f 75 78 0b 00 01 04 f5 01 00 00 04 14 00 00 00 
go's extras are:
55 54 05 00 03 de 55 59 4f 75 78 0b 00 01 04 f5 01 00 00 04 14 00 00 00 
Not sure what the Extras mean, though.
I'll make a zip with archive/zip directly, without copying cli.zip and see what we make.

@bradfitz
Copy link
Contributor

bradfitz commented Mar 9, 2012

Comment 5:

Searching in this page for "50 4b" shows all the interesting parts in the zip file:
        fileHeaderSignature      = 0x04034b50
    directoryHeaderSignature = 0x02014b50
        directoryEndSignature    = 0x06054b50

@bradfitz
Copy link
Contributor

bradfitz commented Mar 9, 2012

Comment 6:

The file headers start at "50 4b 01 02".  After that is:
        f.CreatorVersion = b.uint16()
        f.ReaderVersion = b.uint16()
        f.Flags = b.uint16()
        f.Method = b.uint16()
        f.ModifiedTime = b.uint16()
        f.ModifiedDate = b.uint16()
    f.CRC32 = b.uint32()
        f.CompressedSize = b.uint32()
        f.UncompressedSize = b.uint32()
        filenameLen := int(b.uint16())
        extraLen := int(b.uint16())
        commentLen := int(b.uint16())
        b = b[4:] // skipped start disk number and internal attributes (2x uint16)                                                             
        f.ExternalAttrs = b.uint32()
    f.headerOffset = int64(b.uint32())
        d := make([]byte, filenameLen+extraLen+commentLen)
    if _, err := io.ReadFull(r, d); err != nil {
                return err
        }
        f.Name = string(d[:filenameLen])
        f.Extra = d[filenameLen : filenameLen+extraLen]
    f.Comment = string(d[filenameLen+extraLen:])

@bradfitz
Copy link
Contributor

bradfitz commented Mar 9, 2012

Comment 7:

Differences:
We write CreatorVersion of "14 03".  zip does "1e 03".
We write ReaderVersion of "14 00". zip does "0a 00".
We write Flags "08 00".  zip does "00 00".
Oh, more scary:
In the file header, we report an extraLen of "18 00" (follows "07 00" filenameLen), just
like the zip tool does, but in our file header above we write "18 00" while zip writes
"1c 00".
That's my leading theory now.

@bradfitz
Copy link
Contributor

bradfitz commented Mar 9, 2012

Comment 8:

The flags 0x80 signals that we include data descriptors, which in ours is at 0x41,
immediately after "foo\n":
a8 65 32 7e 04 00 00 00 04 00 00 00
which is the crc32, uint32(4) compressed size, uint32(4) uncompressed size.

@bradfitz
Copy link
Contributor

bradfitz commented Mar 9, 2012

Comment 9:

Even smaller test.
Writing a header like:
        w, err := zw.CreateHeader(&zip.FileHeader{
            Name:   zf.FileHeader.Name,
            Method: zf.FileHeader.Method,
        })
... not writing the Extra crap that the zip file included:
0000000 50 4b 03 04 14 00 08 00 00 00 00 00 00 00 00 00
0000010 00 00 00 00 00 00 00 00 00 00 07 00 00 00 66 6f
0000020 6f 2e 74 78 74 66 6f 6f 0a a8 65 32 7e 04 00 00
0000030 00 04 00 00 00 50 4b 03 04 14 00 08 00 00 00 00
0000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 07
0000050 00 00 00 62 61 72 2e 74 78 74 62 61 72 0a e9 b3
0000060 a2 04 04 00 00 00 04 00 00 00 50 4b 01 02 14 00
0000070 14 00 08 00 00 00 00 00 00 00 a8 65 32 7e 04 00
0000080 00 00 04 00 00 00 07 00 00 00 00 00 00 00 00 00
0000090 00 00 00 00 00 00 00 00 66 6f 6f 2e 74 78 74 50
00000a0 4b 01 02 14 00 14 00 08 00 00 00 00 00 00 00 e9
00000b0 b3 a2 04 04 00 00 00 04 00 00 00 07 00 00 00 00
00000c0 00 00 00 00 00 00 00 00 00 35 00 00 00 62 61 72
00000d0 2e 74 78 74 50 4b 05 06 00 00 00 00 02 00 02 00
00000e0 6a 00 00 00 6a 00 00 00 00 00                  
00000ea
The zip tool can still read that, but Finder can't.

@bradfitz
Copy link
Contributor

bradfitz commented Mar 9, 2012

Comment 10:

Found it.  When using data descriptors (which the command-line zip tool doesn't write,
even in streaming mode to stdout!), we need to use the unofficial header for them.
The spec says:
      Although not originally assigned a signature, the value 
      0x08074b50 has commonly been adopted as a signature value 
      for the data descriptor record.  Implementers should be 
      aware that ZIP files may be encountered with or without this 
      signature marking data descriptors and should account for
      either case when reading ZIP files to ensure compatibility.
      When writing ZIP files, it is recommended to include the
      signature value marking the data descriptor record.  When
      the signature is used, the fields currently defined for
      the data descriptor record will immediately follow the
      signature.
I verified that this change to archive/zip writer.go fixes it, and Finder can open the
zips then:
        // write data descriptor                                                                                                               
        var buf [dataDescriptorLen + 4]byte
        b := writeBuf(buf[:])
    b.uint32(0x08074b50)

@bradfitz
Copy link
Contributor

bradfitz commented Mar 9, 2012

Comment 11:

Fix is at:
http://golang.org/cl/5787062

Owner changed to @bradfitz.

Status changed to Started.

@bradfitz
Copy link
Contributor

bradfitz commented Mar 9, 2012

Comment 12:

This issue was closed by revision 3cea413.

Status changed to Fixed.

@adg adg added fixed labels Mar 9, 2012
@rsc rsc added this to the Go1 milestone Apr 10, 2015
@rsc rsc removed the priority-go1 label Apr 10, 2015
@golang golang locked and limited conversation to collaborators Jun 24, 2016
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants