Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

archive/zip: add (*Writer).AddFS #54898

Closed
c9845 opened this issue Sep 6, 2022 · 42 comments
Closed

archive/zip: add (*Writer).AddFS #54898

c9845 opened this issue Sep 6, 2022 · 42 comments

Comments

@c9845
Copy link

c9845 commented Sep 6, 2022

Zipping a directory is unclear and slightly confusing via the archive/zip package, mostly due to the way paths need to be handled and how directories are created. I suggest adding a func to handle zipping a directory simply to aid in implementation. I have had to implement, or educate others on implementing, similar code for various projects over the years. Furthermore, help via StackOverflow, golang-nuts, etc., isn't very clear either. There is clearly somewhat wide-spread confusion regarding zipping a directory.

Pseudo-code is below.

// Dir creates a zip from sourceDirPath and saves it to zipPath.  If a file at zipPath 
// already exists, an error is returned.
func Dir(sourceDirPath, zipPath string) (err error) {
       // Make sure zip file doesn't already exist at zipPath.
	_, err = os.Stat(zipFileAbs)
	if err == nil {
		return os.ErrExist
	} else if !os.IsNotExist(err) {
		return
	}

	// Create the zip file.
	zipFile, err := os.Create(zipFileAbs)
	if err != nil {
		return
	}
	defer zipFile.Close()

	// Initialize the zip writer.
	z := zip.NewWriter(zipFile)
	defer z.Close()

	// Zip up the sourceDirPath, files and directories, recursively.
	err = filepath.WalkDir(sourceDirPath, func(path string, d fs.DirEntry, err error) error {
		// Error with path.
		if err != nil {
			return err
		}

		// Skip the source directory root. This can be ignored because the source
		// directory root is just the root of the zip file.
		if sourceDir == path {
			return nil
		}

		// Skip directories. Directories will be created automatically from paths to
		// each file to zip up.
		if d.IsDir() {
			return nil
		}

		// Handle formatting path name properly for use in zip file. Paths must be 
                // relative, not start with a slash character, and must use forward slashes,
                // even on Windows.  See: https://pkg.go.dev/archive/zip#Writer.Create
		//
                // Directories are created automatically based on the subdirectories provided
                // in each file's path.
		zipPath := strings.Replace(path, sourceDir, "", 1)
		zipPath = strings.TrimPrefix(zipPath, string(filepath.Separator))
		zipPath = filepath.ToSlash(zipPath)

		// Open the path to read from.
		f, err := os.Open(path)
		if err != nil {
			return err
		}
		defer f.Close()

		// Create the path in the zip.
		w, err := z.Create(zipPath)
		if err != nil {
			return err
		}

		// Write the source file into the zip at path from Create().
		_, err = io.Copy(w, f)
		if err != nil {
			return err
		}

		return nil
	})
	if err != nil {
		return
	}

	// Zip file created.
	err = z.Close()
	if err != nil {
		return
	}

	return
}
@c9845 c9845 added the Proposal label Sep 6, 2022
@gopherbot gopherbot added this to the Proposal milestone Sep 6, 2022
@seankhliao
Copy link
Member

This doesn't look like something that belongs in archive/zip, it shouldn't directly operate on the filesystem, both for reading and writing.
And if we did add something like this, the first thing people will want is control over the file attributes.
Also, the value proposition here is much weaker if you use the io/fs api (error handling elided):

fsys := os.DirFS("src")
zw := zip.NewWriter(w)
defer zw.Close()
fs.WalkDir(fsys, ".", func(p string, d fs.DirEntry, err error) error {
        if err != nil || d.IsDir() {
                return err
        }
        zf, _ := zw.Create(p)
        f, _ := fsys.Open(p)
        defer f.Close()
        _, _ = io.Copy(zf, f)
})

@DeedleFake
Copy link

How about a way to simply hand archive/zip an fs.FS and it'll just insert all of its contents into the in-progress zipfile? Something like

zw := zip.NewWriter(w)
defer zw.Close()
err := zw.AddFS(os.DirFS("src"))
// ...

@ianlancetaylor ianlancetaylor changed the title proposal: archive/zip: Add function for zipping a directory proposal: archive/zip: add Dir to zip a directory Sep 7, 2022
@earthboundkid
Copy link
Contributor

earthboundkid commented Sep 7, 2022

Adding to DeedleFake's proposal, perhaps AddFS(fsys fs.FS, globs ...string) error and it will only add those files matching a glob?

@beoran
Copy link

beoran commented Sep 7, 2022

@carlmjohnson if we want a filter,

AddFS(fsys fs.FS, filters ...func(string) (bool, error)) error

Seems like the more flexible solution.

@c9845
Copy link
Author

c9845 commented Sep 7, 2022

I completely forgot about fs.FS! That makes a lot of sense for this.

Using @DeedleFake's proposed zw.AddFS() also cleans us the func call since the output path to the zip file does not need to be provided to the func argument list.

Regarding filtering the files to include, I can see this going two ways.

  1. Skip filtering and make the user/caller/parent func/etc. filter the directory prior to zipping.
  2. Create an AddFS and AddFSWithFilter func to make the default case simpler. A ...string of regex glob patterns or ...func could work here, both have pluses and minuses.

I have rewritten my proposal pseudo-code below to use fs.FS and skip the filtering part for now.

// AddFS creates a zip archive of a directory by writing files from f to w.
func (w *Writer) AddFS (f fs.FS) (err error) {
	err = fs.WalkDir(f, ".", func(path string, d fs.DirEntry, err error) error {
		// Error with path.
		if err != nil {
			return err
		}

		// Skip directories. Directories will be created automatically from paths to
		// each file to zip up.
		if d.IsDir() {
			return nil
		}

		// Handle formatting path name properly for use in zip file. Paths must
		// use forward slashes, even on Windows. 
                // See: https://pkg.go.dev/archive/zip#Writer.Create
		//
		// Directories are created automatically based on the subdirectories provided
		// in each file's path.
		path = filepath.ToSlash(path)

		// Open the path to read from.
		f, err := f.Open(path)
		if err != nil {
			return err
		}
		defer f.Close()

		// Create the file in the zip.
		w, err := w.Create(path)
		if err != nil {
			return err
		}

		// Write the source file into the zip at path noted in Create().
		_, err = io.Copy(w, f)
		if err != nil {
			return err
		}

		return nil
	})
	if err != nil {
		return
	}

	return
}

@earthboundkid
Copy link
Contributor

If filter is a callback, you don't need to take multiple. Just take one and people can write any filter chaining themselves: AddFS(fsys fs.FS, filter func(string) (bool, error)) error. I think this is probably better than just hard coding glob. The other way to do it might be AddFS(fsys fs.FS, paths []string) error and then you require users to do any path listing/filtering ahead of time. You could special case it so that nil paths means "add everything".

@DeedleFake
Copy link

Adding to DeedleFake's proposal, perhaps AddFS(fsys fs.FS, globs ...string) error and it will only add those files matching a glob?

Maybe this would be useful as a more generalized wrapper for the io/fs package? For example, maybe a func FilteredByGlobs(fsys FS, patterns ...string) FS that returns an FS that treats anything that doesn't match the globs as not existing. There could be one that uses a custom filtering function, too, but I'm not sure if that makes sense over just writing a custom filesystem wrapper to do it if necessary. It's not that hard and it'll give more control.

@seankhliao
Copy link
Member

there's https://pkg.go.dev/io/fs#Glob

@DeedleFake
Copy link

But that just returns a []string of the matches. You can't use it to, for example, filter the contents of an fs.FS before handing it to something that itself wants an fs.FS.

@c9845
Copy link
Author

c9845 commented Sep 7, 2022

I think we should just skip the "filtering files to zip" functionality altogether and just zip an entire fs.FS. This covers 99% of use cases and is typically what is done when you zip a directory in a desktop GUI.

If a user doesn't want certain files to be zipped, then they can manually write the code to filter and zip, or remove the file(s) from the directory to be zipped.

@beoran
Copy link

beoran commented Sep 8, 2022

@seankhliao Using FS.Glob seems like the right idea, indeed. But one can write an FS that filters files internally, so the benefits are debatable.

@earthboundkid
Copy link
Contributor

earthboundkid commented Sep 8, 2022

I think we should just skip the "filtering files to zip" functionality altogether and just zip an entire fs.FS. This covers 99% of use cases and is typically what is done when you zip a directory in a desktop GUI.

If a user doesn't want certain files to be zipped, then they can manually write the code to filter and zip, or remove the file(s) from the directory to be zipped.

I think it's worth it to have some mechanism to filter because otherwise in those cases, you end up with a lot of boilerplate, which is the motivation for this issue. I still think AddFS(fsys fs.FS, paths []string) error where nil means "add everything" is a good balance of simple and extensible. If you want to glob, for example, it would be

matches, err := fs.Glob(fsys, "*.txt")
if err != nil { /**/ }

zw := zip.NewWriter(w)
defer zw.Close()

err := zw.AddFS(fsys, matches)

But one can write an FS that filters files internally, so the benefits are debatable.

The benefit is that making a new FS is a lot of boilerplate. It would be nice though if the fs package had more mechanisms for cut down FSes, beside just fs.Sub.

@apparentlymart
Copy link

apparentlymart commented Sep 9, 2022

Would the availability of a ready-to-use implementation of fs.FS that implements filtering -- whether in the standard library or elsewhere -- render the boilerplate concern to be moot, or is the goal for "add this specific subset of files from an already-existing fs.FS object" to be expressible all as a single statement?

I'm not sure if the concern is "no such filesystem implementation exists yet", "such a filesystem would be impossible to implement robustly in practice", or "the need to use such a filesystem makes this too inconvenient". The latter two would perhaps be justification for building filtering in to the zip API, but the former seems more like justification for a separate proposal to extend the fs package API (assuming there's consensus that it's a common enough need to be in the standard library). 🤔

When I read "an FS that filters files internally" I imagine an API allowing something like the following:

err := zw.AddFS(fs.FilterFS(fsys, func (...) { ... }))

Is that too much boilerplate? Or is it that the fs.FS API makes it tricky to implement fs.FilterFS fully generally, and that it's more reasonable to implement it specifically for the case of walking the filesystem tree and therefore not having to decide what it means to filter other filesystem operations?

@earthboundkid
Copy link
Contributor

It's not clear to me what a good signature for fs.FilterFS would be. func(name string) (ok bool) is simple and solves this usecase, but what about more of a middleware-ish func(name string) (fs.File, error) or maybe func(fsys fs.FS, name string) (fs.File, error)? And should it take another callback just for fs.Stat? I'm not opposed in principle, but I think it's hard to get the exact right balance of abstract, easy to use, and extensible without having more usecases in mind.

@c9845
Copy link
Author

c9845 commented Sep 10, 2022

Pardon my abruptness, but I feel we are going off on quite a tangent with this whole "filtering as fs.FS" thing. Would it not be serve us better to start a new issue/proposal for filtering an FS and keep this issue/proposal strictly focused on zipping an FS? Filtering an FS seems much more general purpose than something just used for zipping and should be a separate discussion to determine if it is really necessary and how it should be implemented.

@c9845
Copy link
Author

c9845 commented Sep 15, 2022

How do we proceed with this? Do I need to write a formal proposal for this?

If we are really interested in implementing a filtering mechanism, we can simply just copy the implementation of html/templates.ParseFS.

@apparentlymart
Copy link

My sense is that we are discussing the possibility of generally filtering filesystems because it it's an important part of deciding whether the archive/zip feature should include its own filtering feature, or whether it can instead rely on something more general being added to the fs package later.

I would agree that the exact details of how the general feature might work are not really relevant to the proposal, but whether such a thing would be possible/desirable at all does seem to be material to this proposal, unless you (as the original proposer) would like to explicitly rule out filtering as being in scope for what you are proposing. I personally think that would be fine; anyone who considers filtering to be a crucial component could then create a parallel proposal for that.


I'm not Go team but since I'm already replying anyway: my sense of the proposals process is that this issue has already been classified as a proposal and so it's in the right state to be visible to the proposal review group. The proposal review group will therefore at some point review this and discuss whether there seems to be a developing consensus. If not, they may ask further questions to try to encourage such consensus.

Other commenters here are currently attempting to help refine the proposal by clarifying the requirements and non-requirements, and discussing some different implementation/design approaches. In other proposal issues the original proposer typically responds to that by either incorporating the feedback into the original proposal text or clarifying exactly what is in scope for the proposal as far as they are concerned.

There is a short doc on The Proposal Process. My sense is that this issue is currently in the early stages of step 2, where others are trying to help develop the proposal in preparation for the proposal group to "triage the proposal".

@c9845
Copy link
Author

c9845 commented Jan 4, 2023

Hello All,
Has there been any progress on this topic? We continue to use, and are expanding our use of, the pseudo-code I provided above, but it would be nice to have this as a standard library feature.

@ianlancetaylor
Copy link
Contributor

This is in the incoming queue for proposal review. I'll bump its priority.

@rsc
Copy link
Contributor

rsc commented Jan 11, 2023

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc
Copy link
Contributor

rsc commented Jan 18, 2023

It sounds like the minimal addition would be

func (*Writer) AddFS(fsys fs.FS) error

and then if you want more filtering or changing of paths or metadata, you write an fs.FS that does that. The implementation of this is also fairly straightforward. In common use

w.AddFS(os.DirFS("/home/you/sources"))

Do I have that right?

edit added error result

@c9845
Copy link
Author

c9845 commented Jan 18, 2023

@rsc Yes, that is exactly what I was thinking. Any filtering would be handled before calling AddFS().

I'm sure you meant it, but I think you would also want to AddFS to return an error.

@rsc rsc changed the title proposal: archive/zip: add Dir to zip a directory proposal: archive/zip: add (*Writer).AddFS Jan 25, 2023
@rsc
Copy link
Contributor

rsc commented Jan 25, 2023

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@c9845
Copy link
Author

c9845 commented Jan 25, 2023

@rsc Would it be worth implementing this same functionality for archive/tar as well? There is an argument that the use case is similar.

@ianlancetaylor
Copy link
Contributor

@c9845 Seems reasonable, want to open a proposal for that? See https://go.dev/s/proposal. Thanks.

@rsc
Copy link
Contributor

rsc commented Jan 31, 2023

I thought I needed this function in a program I was writing, so I wrote it (I've since changed my mind about needing it in this case). Should be something like this.

func zipAddFS(w *zip.Writer, fsys fs.FS) error {
	return fs.WalkDir(fsys, ".", func(name string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			return nil
		}
		info, err := d.Info()
		if err != nil {
			return err
		}
		h, err := zip.FileInfoHeader(info)
		if err != nil {
			return err
		}
		h.Name = name
		h.Method = zip.Deflate
		fw, err := w.CreateHeader(h)
		if err != nil {
			return err
		}
		f, err := fsys.Open(name)
		if err != nil {
			return err
		}
		defer f.Close()
		_, err = io.Copy(fw, f)
		return err
	})
}

@c9845
Copy link
Author

c9845 commented Jan 31, 2023

@rsc Your code is nearly identical to the code I use in production (see #54898 (comment)).

I compared your code against mine and both work the same. That is, they generate a zip file from and fs.FS. No issues running on Ubuntu or Windows with build in zip tools or 7zip.

There are some slight differences in zipped file size and the checksums of the zip files generated by each function do not match, but I believe that is just caused by you using FileInfoHeader and CreateHeader versus I used Create; some file metadata isn't copied via my method.

@c9845
Copy link
Author

c9845 commented Jan 31, 2023

Would it make sense to rename this proposed method to CreateFS instead of AddFS? This would follow the naming convention of Create, CreateHeader, and CreateRaw. This would help with editor auto-completion suggestions and human memory.

@DeedleFake
Copy link

Would it make sense to rename this proposed method to CreateFS instead of AddFS? This would follow the naming convention of Create, CreateHeader, and CreateRaw. This would help with editor auto-completion suggestions and human memory.

That sounds strange to me, like it's creating an FS. Maybe CreateFromFS()?

@earthboundkid
Copy link
Contributor

The Create family all return io.Writers. I don't think this fits that pattern.

@c9845
Copy link
Author

c9845 commented Jan 31, 2023

@DeedleFake Yes, I know it sounds weird. I just figured we should be following the naming convention.

@carlmjohnson I do see that the "Create" family all return io.Writer. So "AddFS" it is I guess.

@rsc
Copy link
Contributor

rsc commented Feb 1, 2023

Leaving likely accept for #58000 to catch up.

@rsc
Copy link
Contributor

rsc commented Feb 9, 2023

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: archive/zip: add (*Writer).AddFS archive/zip: add (*Writer).AddFS Feb 9, 2023
@rsc rsc modified the milestones: Proposal, Backlog Feb 9, 2023
@dsnet
Copy link
Member

dsnet commented Feb 9, 2023

Sorry I'm late to the party. The AddFS method SGTM for both zip and tar.

I apologize for bikeshedding, but if we AddFS we should think about what the equivalent method would be called that could hypothetically extract the contents of an archive into a writeable fs.FS (#45757). What's the opposite verb of Add? It's certainly not Subtract.

Possible suggestions:

  • Writer.ReadFromFS and Reader.WriteToFS
  • Writer.InsertFromFS and Reader.ExtractToFS
  • Writer.CopyFromFS and Reader.CopyToFS

@earthboundkid
Copy link
Contributor

Writer.FS() fs.FS?

@dsnet
Copy link
Member

dsnet commented Feb 9, 2023

I don't think we want the archive packages to ever return an fs.FS as that's significantly more complicated than simply reading from or writing to a caller-provided fs.FS.

Also, how would this work for the equivalent of Reader.FS() fs.FS in package tar? The TAR format does not provide random access to files and contents of files. Implementing the FS method would require buffering the entire contents of the archive.

@c9845
Copy link
Author

c9845 commented Feb 9, 2023

If we are worried about the AddFS name, why not just FS? As in...

f, _ := os.Create(pathToZipFile)

zw := zip.NewWriter(f)
defer zw.Close()

sourceDir := os.DirFS(pathToDirectory)

err := zw.FS(sourceDir)
if err != nil {
    //handle err
}

Another naming could be AddFS and RemoveFS.

@dsnet
Copy link
Member

dsnet commented Feb 9, 2023

I don't see how Reader.RemoveFS works. It's not "removing" a set of files from the reader and placing them in the target FS.

@iand
Copy link
Contributor

iand commented Feb 10, 2023

Writer.ExtractFS() (fs.FS, error)

@c9845
Copy link
Author

c9845 commented Mar 24, 2023

Hello, have there been any progress on this or the related tar functionality?

mauri870 added a commit to mauri870/go that referenced this issue Jul 26, 2023
The method AddFS can be used to add the contents of a fs.FS filesystem
to a zip archive.
This method walks the directory tree starting at the root of the filesystem
and adds each file to the archive.

Fixes: golang#54898
@gopherbot
Copy link

Change https://go.dev/cl/513438 mentions this issue: archive/zip: add AddFS method to zip Writer

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Accepted
Development

Successfully merging a pull request may close this issue.