proposal: archive/zip: extend the visibility of the countWriter #65569

grdw · 2024-02-07T12:37:54Z

Proposal Details

We're currently using a custom-built zip writer to "flush" zip headers and the EOCD (end of central directory) footer; about 90% of it is identical to the one in writer.go. We use this custom writer to "prepare" a zip file without the actual file data being in it yet, which lets us stream the zip file.

We currently can't use the standard Go zip library because we can't advance the position of the countWriter by hand. Ideally, we'd be able to set w.cw.count at any point, not only before any data has been written (that restriction is why SetOffset() can't be used here, unfortunately).

The suggestion is to add the following helpers, or similar functionality, to advance w.cw.count without SetOffset()'s restriction and to read its current value:

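```go
// Pseudo code: these methods would live in archive/zip/writer.go,
// where the unexported countWriter field w.cw is accessible.

// AdvanceOffset moves the writer's byte count forward by n without
// writing any data, e.g. for file data that is streamed separately.
func (w *Writer) AdvanceOffset(n int64) {
	w.cw.count += n
}

// GetOffset reports how many bytes the writer has accounted for so far.
func (w *Writer) GetOffset() int64 {
	return w.cw.count
}
```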

This would make the standard Go archive/zip library usable for our use case. We would use Flush() as it exists now to emit the intermediate headers, and Close() to emit the central directory and EOCD footer.


ianlancetaylor · 2024-02-07T17:39:16Z

That seems pretty special purpose. I struggle to see how anybody else would use this functionality. Is it really worth adding to the standard library?

It also seems to me that you can increment the offset by calling the Write method with a slice of the appropriate size. You could have the underlying writer discard the data, if necessary.

grdw · 2024-02-08T13:22:15Z

Thanks for the quick reply!

> It also seems to me that you can increment the offset by calling the Write method with a slice of the appropriate size.

Correct, that would also solve this specific use case. The downside is that the files 'squeezed' in between the zip elements (for lack of a better description) can be quite large in our case; they easily exceed 500 GB. To take an extreme but not uncommon example, handling a 1 TiB file would look like this:

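```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"time"
)

func main() {
	fileSize := uint64(1024 * 1024 * 1024 * 1024) // 1 TiB
	buf := new(bytes.Buffer)
	zipWriter := zip.NewWriter(buf)
	w, err := zipWriter.CreateHeader(&zip.FileHeader{
		Name:               "test.mov",
		Modified:           time.Now(),
		CRC32:              25,
		CompressedSize64:   fileSize,
		UncompressedSize64: fileSize,
	})
	if err != nil {
		panic(err)
	}

	// Flush out the local file header.
	if err := zipWriter.Flush(); err != nil {
		panic(err)
	}
	fmt.Printf("Header: %x\n", buf.Bytes())
	buf.Reset()

	// Push fileSize bytes through the writer only to advance its offset.
	// This allocates a 1 TiB slice, and the same bytes are copied into buf.
	if _, err := w.Write(make([]byte, fileSize)); err != nil {
		panic(err)
	}
	if err := zipWriter.Flush(); err != nil {
		panic(err)
	}
	fmt.Printf("Flushed: %d\n", buf.Len())
	buf.Reset()

	// Close writes the central directory and the EOCD footer.
	if err := zipWriter.Close(); err != nil {
		panic(err)
	}
	fmt.Printf("EOCD footer: %x\n", buf.Bytes())
}
```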

This is quite slow and memory-intensive. Not having to do this:

```go
w.Write(make([]byte, fileSize))
```

... would make our lives a lot easier 😅.
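As a partial mitigation for the memory side only, the padding does not need one giant slice; it can be streamed through a small reusable buffer. A minimal sketch, assuming hypothetical helpers `zeroReader` and `padZeros` (not part of archive/zip): `io.CopyN` copies n zero bytes using a fixed-size buffer, so memory stays constant however large n is. It still does O(n) work (and through `CreateHeader`'s writer, a CRC-32 over every byte), so it does not remove the time cost that an `AdvanceOffset` method would.

```go
package main

import (
	"fmt"
	"io"
)

// zeroReader (hypothetical) yields an endless stream of zero bytes.
type zeroReader struct{}

func (zeroReader) Read(p []byte) (int, error) {
	for i := range p {
		p[i] = 0
	}
	return len(p), nil
}

// padZeros (hypothetical) writes n zero bytes to w through io.CopyN's
// fixed-size buffer instead of allocating an n-byte slice.
func padZeros(w io.Writer, n int64) (int64, error) {
	return io.CopyN(w, zeroReader{}, n)
}

// countingWriter counts and discards bytes, standing in for the zip entry
// writer; it also records whether any non-zero byte arrived.
type countingWriter struct {
	n       int64
	nonZero bool
}

func (c *countingWriter) Write(p []byte) (int, error) {
	for _, b := range p {
		if b != 0 {
			c.nonZero = true
		}
	}
	c.n += int64(len(p))
	return len(p), nil
}

func main() {
	var cw countingWriter
	written, err := padZeros(&cw, 10<<20)
	if err != nil {
		panic(err)
	}
	fmt.Println(written, cw.n) // 10485760 10485760
}
```

With the zip entry writer `w` from the example above, `padZeros(w, int64(fileSize))` would take the place of the `make` call.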

grdw added the Proposal label Feb 7, 2024

gopherbot added this to the Proposal milestone Feb 7, 2024
