proposal: io/fs,path/filepath: add iterator form of WalkDir #64341

rogpeppe · 2023-11-22T18:05:05Z

It's often more convenient to use a for loop rather than a callback-based API, and fs.WalkDir and filepath.Walk are both callback-based APIs. I've been using github.com/kr/fs for years to work around this.
The new iterate-over-func functionality (#61405) now makes it straightforward to adapt the stdlib functions into iterators, so perhaps we could provide iterator-based APIs as part of the standard library now.

Here's a straw-man implementation:

package fs

type WalkDirEntry struct {
	Path    string
	Entry   fs.DirEntry
	Err     error
	skipDir *bool
}

// SkipDir causes the iteration to skip the contents
// of the entry. This will have no effect if called outside the iteration
// for this particular entry.
func (entry WalkDirEntry) SkipDir() {
	*entry.skipDir = true
}

// WalkDirIter returns an iterator that can be used to iterate over the contents
// of a directory. It uses WalkDir under the hood. To skip a directory,
// call the SkipDir method on an entry.
func WalkDirIter(fsys fs.FS, root string) func(func(WalkDirEntry) bool) {
	return func(yield func(WalkDirEntry) bool) {
		fs.WalkDir(fsys, root, func(path string, d fs.DirEntry, err error) error {
			skipDir := false
			info := WalkDirEntry{
				Path:    path,
				Entry:   d,
				Err:     err,
				skipDir: &skipDir,
			}
			if !yield(info) {
				return fs.SkipAll
			}
			if skipDir {
				return fs.SkipDir
			}
			return nil
		})
	}
}

The main point of contention from my point of view is the semantics of the SkipDir method. Calling a method on an iterated item to determine the subsequent course of iteration seems somewhat unusual, but I don't see any other decent alternative.

[Edited to remove incorrect remarks about GC behavior].

The text was updated successfully, but these errors were encountered:

mateusz834 · 2023-11-22T19:26:25Z

Any reason not to return the error as a second value? like this:

type WalkDirEntry struct {
	Path    string
	Entry   fs.DirEntry
	skipDir *bool
}

func WalkDirIter(fsys fs.FS, root string) func(func(WalkDirEntry, error) bool) {

Seems like the conclusion of #61405 is to return errors this way. #61405 (comment)

rogpeppe · 2023-11-22T19:34:52Z

Any reason not to return the error as a second value? like this:

That was @Merovius's suggestion too. I wasn't sure, because the Path and Entry fields can be valid even when the error isn't, making the docs a bit awkward to write, which made me think it might not be the correct approach.

Nonetheless, here's a stab at it. Maybe it's not so awkward after all.

// WalkDirIter returns an iterator that can be used to iterate over the contents
// of a directory. The implementation uses [WalkDir].
//
// Each iteration produces an entry and an error that will be non-nil
// for unreadable directories. The [WalkDirEntry.Path] field is always valid;
// the [WalkDirEntry.Entry] field is always valid except for when the
// initial Stat on the root directory fails.
//
// If a directory's ReadDir method fails, the iteration will produce two
// entries, both with the directory's path: first with a nil error,
// giving the body of the loop the chance to stop or call [WalkDirEntry.SkipDir],
// and secondly with the error from ReadDir.
//
// To skip a directory, call the SkipDir method on an entry.
func WalkDirIter(fsys fs.FS, root string) func(func(WalkDirEntry, error) bool) {
	var skipDir bool // outside the loop so we'll only allocate once.
	return func(yield func(WalkDirEntry, error) bool) {
		fs.WalkDir(fsys, root, func(path string, d fs.DirEntry, err error) error {
			info := WalkDirEntry{
				Path:    path,
				Entry:   d,
				skipDir: &skipDir,
			}
			skipDir = false
			if !yield(info, err) {
				return fs.SkipAll
			}
			if skipDir {
				return fs.SkipDir
			}
			return nil
		})
	}
}

mateusz834 · 2023-11-22T19:44:46Z

I am also not a big fan of the SkipDir method, it looks weird to skip something (change the state of the iterator) using the result of the iterator, but on the other hand I don't know how this could be done differently with range-over-func. Was this discussed somewhere in #61405?

Merovius · 2023-11-22T19:54:02Z

It has been mentioned at least once

gazerro · 2023-11-22T20:40:37Z

I've observed (https://sourcegraph.com/search?q=context%3Aglobal+lang%3Ago+fs.WalkDir&patternType=standard&sm=1&groupBy=repo) that most fs.WalkDirFunc functions first check for the existence of an error. This makes me inclined to sympathize with @mateusz834's suggestion. However, what I observe most is that when the error is not checked first, access to the DirEntry argument occurs without verifying that it is not nil. (It is nil if the stat on the root directory has failed.)

This lack of checking obviously leads to a panic. There are also two missing checks in two tests in the Go repository.

I suggest not passing the error as a parameter, that is, leaving only one parameter, but ensuring that WalkDirIter returns an error in case the stat on the root directory fails:

files, err := fs.WalkDirIter(fsys, root)
if err != nil {
      // stat on the root directory failed
      ...
}
for file := range files {
	_ = file.Entry.Name() // this cannot panic because file.Entry is always guaranteed to be non-nil
}

rogpeppe · 2023-11-22T21:21:28Z

@gazerro Yes, I tend to agree, although it's a bit unfortunate that that makes it considerably harder (or perhaps no possible?) to implement WalkDirIter in terms of WalkDir AFAICS. Perhaps it might be preferable to reverse that, because I think it should be easier to implement WalkDir in terms of WalkDirIter with those semantics.

gazerro · 2023-11-22T22:42:59Z

Perhaps it might be preferable to reverse that, because I think it should be easier to implement WalkDir in terms of WalkDirIter with those semantics.

Yes, I agree. It would be more appropriate to implement WalkDir in terms of WalkDirIter and possibly deprecate WalkDir.

gazerro · 2023-11-22T22:44:44Z

I also suggest naming the new function DirWalker (or perhaps simply Entries? Though, this name might be overly simplistic). While WalkDir appropriately denotes an action, the new function simply returns an iterator (or a sequence, as one prefers to say). So it would be used as:

entries, err := fs.DirWalker(fsys, root)
if err != nil {
      // stat on the root directory failed
      ...
}
for entry := range entries {
	// ...
}

earthboundkid · 2023-11-23T02:41:26Z

Individual subdirectories and files can fail, so I don’t know what it saves you to return an error for the initial stat.

gazerro · 2023-11-23T07:39:39Z

@carlmjohnson if the first stat fails, the DirEntry.Entry field is nil. Therefore, before accessing DirEntry.Entry, it is necessary to check that it is not nil. This should be done both within an if err != nil { ... } statement and in cases where the error is not handled because we are not interested in handling it.

I have noticed many cases (https://sourcegraph.com/search?q=context%3Aglobal+lang%3Ago+fs.WalkDir&patternType=standard&sm=1&groupBy=repo) where this check is not done.

In addition, this is my opinion, I find it incorrect for the function to return an iterator on a root directory that does not exist.

rogpeppe added the Proposal label Nov 22, 2023

gopherbot added this to the Proposal milestone Nov 22, 2023

rogpeppe mentioned this issue Mar 25, 2024

proposal: errors: add All and AllAs iterators #66455

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

proposal: io/fs,path/filepath: add iterator form of WalkDir #64341

proposal: io/fs,path/filepath: add iterator form of WalkDir #64341

rogpeppe commented Nov 22, 2023 •

edited

mateusz834 commented Nov 22, 2023

rogpeppe commented Nov 22, 2023 •

edited

mateusz834 commented Nov 22, 2023

Merovius commented Nov 22, 2023

gazerro commented Nov 22, 2023 •

edited

rogpeppe commented Nov 22, 2023

gazerro commented Nov 22, 2023

gazerro commented Nov 22, 2023 •

edited

earthboundkid commented Nov 23, 2023 •

edited

gazerro commented Nov 23, 2023 •

edited

proposal: io/fs,path/filepath: add iterator form of WalkDir #64341

proposal: io/fs,path/filepath: add iterator form of WalkDir #64341

Comments

rogpeppe commented Nov 22, 2023 • edited

mateusz834 commented Nov 22, 2023

rogpeppe commented Nov 22, 2023 • edited

mateusz834 commented Nov 22, 2023

Merovius commented Nov 22, 2023

gazerro commented Nov 22, 2023 • edited

rogpeppe commented Nov 22, 2023

gazerro commented Nov 22, 2023

gazerro commented Nov 22, 2023 • edited

earthboundkid commented Nov 23, 2023 • edited

gazerro commented Nov 23, 2023 • edited

rogpeppe commented Nov 22, 2023 •

edited

rogpeppe commented Nov 22, 2023 •

edited

gazerro commented Nov 22, 2023 •

edited

gazerro commented Nov 22, 2023 •

edited

earthboundkid commented Nov 23, 2023 •

edited

gazerro commented Nov 23, 2023 •

edited