proposal: archive/zip: extend the visibility of the countWriter #65569

Open
grdw opened this issue Feb 7, 2024 · 2 comments

@grdw

grdw commented Feb 7, 2024

Proposal Details

We're currently using a custom-built zip writer to "flush" zip headers and the EOCD footer; it is about 90% identical to the one in writer.go. The use case for this custom zip writer is to "prepare" a zip file without yet having the actual file data in the archive, which allows the zip file to be streamed.

We currently can't use the standard Go zip library because we can't advance the position of the countWriter by hand. Ideally, we'd be able to set w.cw.count at any point, without the restriction that no data has been written yet (which is why SetOffset() can't be used, unfortunately).

The suggestion here would be to add the following public helpers, or some similar functionality, to advance the w.cw.count variable without the restriction of SetOffset() and to read its value back:

// pseudo code:
func (w *Writer) AdvanceOffset(n int64) {
	w.cw.count += n
}

func (w *Writer) GetOffset() int64 {
	return w.cw.count
}

This would make the standard Go zip library usable for our use case. We would use Flush() as it exists today to get the intermediate headers and the EOCD footer out.

@grdw grdw added the Proposal label Feb 7, 2024
@gopherbot gopherbot added this to the Proposal milestone Feb 7, 2024
@ianlancetaylor
Contributor

That seems pretty special purpose. I struggle to see how anybody else would use this functionality. Is it really worth adding to the standard library?

It also seems to me that you can increment the offset by calling the Write method with a slice of the appropriate size. You could have the underlying writer discard the data, if necessary.

@grdw
Author

grdw commented Feb 8, 2024

Thanks for the quick reply!

It also seems to me that you can increment the offset by calling the Write method with a slice of the appropriate size.

Correct, that would also solve this specific use case. The downside is that the files that get 'squeezed' in between the zip elements (for lack of a better description) can become quite large in our case: we're easily talking about more than 500 GB. To take an extreme but not uncommon example, doing the work for a 1 TiB file would result in the following code snippet:

package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"time"
)

func main() {
	fileSize := uint64(1024 * 1024 * 1024 * 1024) // 1 TiB
	// Named buf rather than io to avoid confusion with the io package.
	buf := new(bytes.Buffer)
	zipWriter := zip.NewWriter(buf)
	w, err := zipWriter.CreateHeader(&zip.FileHeader{
		Name:               "test.mov",
		Modified:           time.Now(),
		CRC32:              25,
		CompressedSize64:   fileSize,
		UncompressedSize64: fileSize,
	})
	if err != nil {
		panic(err)
	}
	// Flush out the header:
	if err := zipWriter.Flush(); err != nil {
		panic(err)
	}
	fmt.Printf("Header: %x\n", buf.Bytes())
	buf.Reset()
	// Flush out the bytes (this allocates and writes the full 1 TiB):
	if _, err := w.Write(make([]byte, fileSize)); err != nil {
		panic(err)
	}
	if err := zipWriter.Flush(); err != nil {
		panic(err)
	}
	fmt.Printf("Flushed: %d\n", buf.Len())
	buf.Reset()
	if err := zipWriter.Close(); err != nil {
		panic(err)
	}
	fmt.Printf("EOCD footer: %x\n", buf.Bytes())
}

This is quite slow and memory-intensive. Not having to do this:

w.Write(make([]byte, fileSize))

... would make our lives a lot easier 😅.
