Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: embed: read file contents directly from FS during fs.WalkDir #45815

Closed
colin-sitehost opened this issue Apr 28, 2021 · 9 comments
Closed

Comments

@colin-sitehost
Copy link

colin-sitehost commented Apr 28, 2021

background

Currently there is no ergonomic way to walk an embed.FS and interact with file contents without handling a number of errors that are guaranteed to be nil.

//go:embed static
var static embed.FS

func main() {
        fs.WalkDir(static, ".", func(path string, d fs.DirEntry, err error) error {
                if err != nil {
                        return err
                }
                f, err := static.Open(path)
                if err != nil {
                        return err
                }
                b, err := io.ReadAll(f)
                if err != nil {
                        return err
                }
                fmt.Println(string(b)) // or something more interesting
                return nil
        })
}

Each fs.DirEntry, when using an embed.FS, is actually an embed.file that contains all the data that I want to use in io.Copy, however it is gated behind the data.Open call.

I understand that for a general file system this type of feature may be dangerous or unnecessary, as sometimes things change between stat and open, however for embed.FS these are invariants.

proposal

I can envision a new interface that gives direct access to the raw data:

package embed

func Walk(fsys FS, root string, fn func(path, data string))

I am not sold on this exact syntax, and it would need to document that it is just for ripping through file data, but think that there exists something for this problem space.

//go:embed static
var static embed.FS

func main() {
        embed.Walk(static, ".", func(_, data string) {
                fmt.Println(data) // or something more interesting
        })
}

alternatives

With sufficient testing, this could be implemented in an extended standard library as syntax sugar, but the amount of errors that would be ignored (or panicked) and data buffering scares me:

package x

func EmbedWalk(fsys embed.FS, root string, fn func(path string, data []byte)) {
        fs.WalkDir(fsys, root, func(path string, d DirEntry, _ error) error) {
                if !d.IsDir() {
                        f, _ := fsys.Open(path)
                        b, _ := io.ReadAll(f)
                        fn(path, b)
                }
                return nil
        }
}

Since there likely exist other cases that would benefit from this (immutable, in memory, or loops where you intend to read every file), it may be useful to create a more generic interface in the io/fs package instead of embed. This will require more exported symbols and may not be worth the hassle.

package fs

func WalkRead(fsys ReadFileFS, fn WalkReadFunc) error

type WalkReadFunc func(path string, data []byte, err error) error

Currently, the embed.file.Sys method always returns nil, it may be considered a breaking change to start providing a value, but if that could be something useful like embed.openFile/embed.openDir, you could abuse it like so:

        fs.WalkDir(data, ".", func(path string, d fs.DirEntry, err error) error {
                if err != nil {
                        return err
                }
                io.Copy(os.Stdout, d.(interface{ Sys() interface{} }).Sys().(io.Reader)) // or something more interesting
                return nil
        })

Using the existing interface, I could just panic, but this feels like bad form, and my goal is to simplify the interface:

        fs.WalkDir(data, ".", func(path string, d fs.DirEntry, err error) error {
                if err != nil {
                        return err
                }
                f, err := data.Open(path)
                if err != nil {
                        panic(err)
                }
                io.Copy(os.Stdout, f) // or something more interesting
                return nil
        })
@gopherbot gopherbot added this to the Proposal milestone Apr 28, 2021
@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) Apr 28, 2021
@earthboundkid
Copy link
Contributor

FWIW, here's a runnable version of your code: https://play.golang.org/p/XkgAqsOBcJa

You can use fs.ReadFile whether an fs provides ReadFile or not and it will fallback to io.ReadAll, so I'm not sure why you'd need WalkRead.

The error passed into the WalkDirFunc is from errors trying to open a directory. For an embed.FS, you can just ignore it, or panic if you want, since they can't happen.

Here is a simplified version of your code:

	err := fs.WalkDir(static, ".", func(path string, d fs.DirEntry, err error) error {
		// cannot happen
		if err != nil {
			panic(err)
		}
		if d.IsDir() {
			return nil
		}
		b, err := fs.ReadFile(static, path)
		if err != nil {
			return err  // or panic or ignore
		}
		log.Printf("%q", b)
		return nil
	})

Playing with this code does make me aware that there's no way inside of a WalkDirFunc to see the fs that's being walked, which is a little unfortunate. It means the WalkDirFunc needs to be a closure over fsys. But in practice, I don't think many people are writing reusable WalkDirFuncs anyway, so I don't think it's a real limitation.

FWIW, in my experience, it's usually reads best to have the WalkDirFunc just collect valid path names in a slice and then do the processing after the walking is over. That also makes it easier to fan out goroutines if that's needed for performance. If you're doing it that way, the WalkDirFunc is usually pretty short and mostly just comparing file names to some pattern or another.

@rsc
Copy link
Contributor

rsc commented May 5, 2021

It does seem like in the case where the Open+Read+Close are guaranteed to succeed, it suffices to write

data, _ := fs.ReadFile(static, path)

That's just one line. It doesn't seem worth adding new API to avoid writing one line.

@rsc
Copy link
Contributor

rsc commented May 5, 2021

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc moved this from Incoming to Active in Proposals (old) May 5, 2021
@rsc
Copy link
Contributor

rsc commented May 12, 2021

Based on the discussion above, this proposal seems like a likely decline.
— rsc for the proposal review group

@rsc rsc moved this from Active to Likely Decline in Proposals (old) May 12, 2021
@colin-sitehost
Copy link
Author

data, _ := fs.ReadFile(static, path)

this seems like a really sharp edge. much like #45961 I am certian you or I could convince ourselves of if an error can be ignored, but this is a toy example. what if somebody comes behind and refactors fs.FS to an implementation other than embed.FS?

@earthboundkid
Copy link
Contributor

func EmbedFSOnlyWalkDir(fsys embed.FS, path string, f fs.WalkDirFunc) error {
	return fs.WalkDir(fsys, path, f)
}

@rsc rsc moved this from Likely Decline to Declined in Proposals (old) May 19, 2021
@rsc
Copy link
Contributor

rsc commented May 19, 2021

No change in consensus, so declined.
— rsc for the proposal review group

@colin-sitehost
Copy link
Author

so, just to confirm there is not any solution for performance concerns and the suggested alternative is to simply ignore errors?

@ianlancetaylor
Copy link
Contributor

The suggestion is to use fs.ReadFile. It's fine to check the error result, although at present it will always be nil.

@golang golang locked and limited conversation to collaborators Aug 5, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
No open projects
Development

No branches or pull requests

5 participants