Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: os: new Readdirentries method to read directory entries and efficiently expose file metadata #40352

Open
israel-lugo opened this issue Jul 22, 2020 · 34 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Proposal Proposal-Hold
Milestone

Comments

@israel-lugo
Copy link

israel-lugo commented Jul 22, 2020

API

(Update: edited to turn into a proposal, add suggested API)

func (f *File) Readdirentries(n int) ([]DirEntry, error)

// The file type could not be determined
const ModeUnknown FileMode = xxx

type DirEntry struct {
  // Name is the base name of the file
  Name string
  // ... unexported fields ...
}

// Id returns a number which uniquely identifies the file within the filesystem that contains it.
// This is known as the inode number in Unix-like systems, or file ID in Windows. Under Windows,
// this will require a system call on first call, but not on Unix.
//
// Note this is not guaranteed to be unique in the ReFS file system, introduced with Windows
// Server 2012, since that uses 128-bit identifiers.
func (d *DirEntry) Id() (uint64, error) { ... }

// Type returns the file's type. Depending on the underlying filesystem, this may require an Lstat,
// which will be done internally and cached after first use. If lightweight is true, the Lstat will not
// be done; in that case, if the file type is not immediately known, ModeUnknown will be
// returned. This may be useful e.g.if the caller will be opening the file anyway and would prefer
// to do a Stat of the open file to avoid filename races. 
func (d *DirEntry) Type(lightweight bool) (FileMode, error) { ... }

// Lstat behaves like the normal Lstat. Its result will be cached after the first use, which may have
// occurred from calling Type or even Inode under Windows.
func (d *DirEntry) Lstat() (FileInfo, error) { ... }

This is analogous e.g. to Python's os.scandir, as pointed out by @qingyunha below.

Context

Could we please have a new File-level API to list a directory's entries, which exposes the d_type field (syscall.Dirent.Type) and, ideally, also d_ino (syscall.Dirent.Ino)?

Under Linux, certain filesystems (such as Btrfs, ext4) store the file type information in the direntry itself. This is available via the d_type field, which may be DT_UNKNOWN if the file type could not be determined for some reason (e.g. no filesystem support, or weird quirks such as "." or ".."). According to man readdir(3), some BSDs also support this.

Currently, we have os.(*File).Readdir, which does an lstat on every file and does not make use of the type information, even if it's there. This makes sense given the method's signature, since it needs to find out the file's size, mode, etc.

We also have os.(*File).Readdirnames, which reads the dirent but only returns the name portion.

It would be very useful to have an intermediate method between these two, that returns not only the name, but also the file type (which may of course be DT_UNKNOWN), and ideally anything else it can know from the dirent, such as the file's inode number (d_ino or syscall.Dirent.Ino).

This would make it much easier to implement a fast/scalable file crawler (e.g. for backup software or something else). Given a directory with 100,000 entries, being able to cheaply separate subdirectories from other files while listing the directory itself lets the crawler e.g. batch up regular files for further processing, or choose crawling strategies depending on whether there are 2 subdirectories or 75,000. Especially for the backup case, having the inode number outright would also be useful, as it helps identify hardlinks (which may skip reading the data twice) without the cost of the lstat.

See e.g. this topic in golang-nuts for some speed comparisons. This can make a very big difference.

@ianlancetaylor ianlancetaylor changed the title New File method to read directory entries and expose file type, inode os: new File method to read directory entries and expose file type, inode Jul 22, 2020
@ianlancetaylor
Copy link
Contributor

This is inherently very system dependent, so I think the question is whether it would be better to use golang.org/x/sys/unix.ReadDirents instead.

@toothrot toothrot added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jul 22, 2020
@toothrot toothrot added this to the Backlog milestone Jul 22, 2020
@israel-lugo
Copy link
Author

israel-lugo commented Jul 22, 2020

This is inherently very system dependent, so I think the question is whether it would be better to use golang.org/x/sys/unix.ReadDirents instead.

The thing is, golang.org/x/sys/unix.ReadDirent is terribly low level in its signature. It modifies a byte buffer in place, and requires parsing each entry. The user would need to basically reimplement Readdirnames and ParseDirent; that's some 80 lines of non-trivial and error-prone code, dealing with a bunch of byte offsets etc, versus just being able to use dirInfo, err := f.ReaddirExt().

Regarding the system-dependent point, I would note that POSIX specifies d_ino, i.e. any POSIX system will at the very least tell you a file's inode, along with its name. That information in itself can be very useful, as mentioned above to efficiently identify hard links in a large tree.

Even the file type d_type is widely supported; common filesystems on Linux have long supported this, e.g. ext4, xfs, btrfs. And in case of no support, returning DT_UNKNOWN would be transparent for the application.

Perhaps even Windows may have similar information that can be returned from a directory listing. It's frankly been over a decade since I've done any Windows development so I wouldn't know.

@networkimprov
Copy link

The inode is accessible in most oses as aFileInfo.Sys().(*syscall.Stat_t).Ino. This fails on Windows, so I patched my stdlib; see
https://groups.google.com/d/topic/golang-dev/raE01Fa2Kmo/discussion

Maybe we need os.FileInfo.BetterSys() to provide the d_type if available and the fileId on Windows.

@ianlancetaylor
Copy link
Contributor

@israel-lugo Go runs on non-POSIX systems. The os package is intended to be platform independent.

If the current interfaces provided by golang.org/x/sys/unix are awkward, perhaps we can think about ways to improve them.

@ianlancetaylor
Copy link
Contributor

@networkimprov The issue here is whether the inode is available from reading only the directory, not opening the file and not calling Stat. Is that possible on Windows?

@networkimprov
Copy link

networkimprov commented Jul 22, 2020

A quick Winapi search didn't turn up a way to get fileId from a directory entry without a CreateFile() (aka open).

Perhaps we need (f *File) Readdirentries(n int) ([]DirEntry, error). DirEntry methods are Name(), Type(), Id().

On most unixen, the DirEntry object would be populated. On Windows, its methods would CreateFile, etc on first use, to avoid that for every item. That solves the performance problem for Linux, and the missing FileInfo.Sys().Ino on Windows.

@alexbrainman
Copy link
Member

The issue here is whether the inode is available from reading only the directory, not opening the file and not calling Stat. Is that possible on Windows?

I don't know what inode is, but, from what I can gather, the closest thing on Windows is BY_HANDLE_FILE_INFORMATION.dwVolumeSerialNumber (see https://docs.microsoft.com/en-us/windows/win32/api/fileapi/ns-fileapi-by_handle_file_information ). I suspect you can get dwVolumeSerialNumber by reading directory directly (I did not look for that API). But I don't see how it is helpful, because I don't know what is the problem that we are solving.

Alex

@networkimprov
Copy link

Alex, it's the nFileIndexHigh/Low fields (aka fileId) in that structure.

@alexbrainman
Copy link
Member

Alex, it's the nFileIndexHigh/Low fields (aka fileId) in that structure.

You are correct, it is nFileIndexHigh / nFileIndexLow. I was wrong. I spoke too quickly.

Alex

@israel-lugo
Copy link
Author

israel-lugo commented Jul 23, 2020

Thank you for your help, @networkimprov and @alexbrainman. I noticed Alex asked what is an inode and what we're trying to achieve. Let me try to bridge the gap a little, with a small introduction to Unix filesystems.

Inodes

An inode, in Unix, is the metadata structure that contains the low level details of a file (or directory, which is just a particular type of file). The inode holds the file's owner, permissions, timestamp and the list of disk sectors that contain the file's data. The inode is, for all intents and purposes, the file itself.

Inodes are identified within a filesystem by a unique number. This is the "inode number", often referred to as just "the inode", which is a slight misnomer. Some filesystems (such as the very common ext4) actually have a table of inodes, i.e. a linear array of inode structures. The inode number would be an index into that table.

Directory entries

Crucially, an inode (i.e. the file) does not have a name. It is identified solely by its inode number. The mapping of filenames to inodes is kept in directories.

A directory is a file of a special type. Its data is a list of directory entries, which map individual filenames (the files and directories inside that directory) to their respective inodes. So, if directory Foo contains file Bar.txt, which has inode number 823, then Foo's data will contain a directory entry like this:

filename inode
Bar.txt 823

Since the directory is itself a file, it also has its own inode, and its own inode number. So Foo's parent will contain a directory entry, with filename="Foo" and inode=12345. And so on, until we reach the root of the filesystem, which exists at a well known inode number (inode 2 for most Linux filesystems I know).

Hard links

Another name for this mapping of filename to inode is a "hard link". I.e. a link from the filename to the inode number.

One important thing to note is that the mapping of filename to inode can be many to one: multiple entries in multiple directories can point to the same inode (the same file). So, as we have Foo/Bar.txt pointing to inode 823, some other Splot directory can also have a directory entry like this:

filename inode
another-name 823

So the same file (inode 823) exists in two places of the filesystem tree: "Foo/Bar.txt" and "Splot/another-name". Both names point to the same inode, so it's the same exact file. Same owner, same timestamps, same physical disk blocks of data. Opening the file through one name or the other yields the same file object, with the same data.

In this scenario, we would say the file has two hard links. Or, colloquially, that e.g. "Splot/another-name" is a hard link to "Foo/Bar.txt".

Intended goal

Go's current directory API allows us to get either all the filenames in a directory (os.(*File).Readdirnames), or full os.FileInfo objects for every file (os.(*File).Readdir). The os.FileInfo objects contain a wealth of metadata on the file, such as the file's type, permissions, timestamps.

The problem is, os.FileInfo has "too much" information. In order to get all that, it's necessary to read the actual inode for each file. So os.(*File).Readdir has to, for each file inside the directory, read that file's actual inode. That's O(N) I/O operations, with N being the number of files in the directory.

Another problem, as @networkimprov pointed out, is that os.FileInfo doesn't contain the inode number at all, or rather, a unique identifier for the file, which in Windows would apparently correspond to the nFileIndexHigh / nFileIndexLow.

What I would like to see is an intermediate API that provides the filenames, their inode/fileId numbers and (when possible) their file types.

In Unix, just listing the directory will always give you at least the the filename and the inode number. So we can get all the filenames and their inode numbers just by reading the one directory. That's already very useful for a file crawler, since it allows you e.g. to detect hard links (two files with the same inode number). If you're writing e.g. a backup software, you probably won't need to backup the same file twice.

Many modern filesystems also store more information in the directory entry: the file's type (directory, regular file, socket, symlink, etc). So in these systems, when reading a directory's entries, we can get, for each child, its filename, inode number and file type. That is even better, because it allows the file crawler to make better decisions about which files to access. Maybe you only care about the directories, or you want to treat them first (to build the directory tree). Having the file type information for all children, just by reading the parent directory, is a great time saver.

I like @networkimprov's proposal. os.(*File).Readdirentries would return an object which is populated efficiently in most Unix systems, and can be populated with a slower fallback in other systems that don't support retrieving this information in one pass.

The question here is: can we make this fast in Windows too? In Windows, what kind of information can one retrieve about a directory's children, when listing that directory? I.e. without having to open or stat each file individually. If it's possible to get this working efficiently in Windows, that would be great. If not, however, I still think it would be worth having the API, even if it does have to fall back to opening each file individually. It won't be much of a speed improvement in Windows over os.(*File).Readdir, but it can certainly make a very large difference for POSIX (Unix-like) systems.

@networkimprov
Copy link

networkimprov commented Jul 23, 2020

Below is the struct you get while walking a directory in Windows. It should be fast. So we could also do for Linux what I proposed for Windows, provide DirEntry methods that lstat() on first use to retrieve metadata not found in Linux dirents.

https://docs.microsoft.com/en-us/windows/win32/api/minwinbase/ns-minwinbase-win32_find_dataa

A small fact I neglected to mention is that the Go team tries to avoid adding new APIs :-/ However this concept could deprecate os.Readdir()...

The issue title should probably be "proposal: os: Readdirentries to read directory..."
EDIT: And you'd need to spell out the specifics; feel free to cut/paste what I wrote, and amend it as nec.

@alexbrainman
Copy link
Member

@israel-lugo

Let me try to bridge the gap a little, with a small introduction to Unix filesystems.

Thank you for spending time to explain it all. I don't know of anything similar to inode number on Windows. Probably, something like that https://devblogs.microsoft.com/oldnewthing/20110228-00/?p=11363

Go's current directory API allows us to get either all the filenames in a directory (os.(*File).Readdirnames), or full os.FileInfo objects for every file (os.(*File).Readdir). The os.FileInfo objects contain a wealth of metadata on the file, such as the file's type, permissions, timestamps.

The problem is, os.FileInfo has "too much" information. In order to get all that, it's necessary to read the actual inode for each file.

Windows version of os.(*File).Readdirnames just calls os.(*File).Readdir, gets all filenames, and throws away the rest of os.FileInfo data. See

go/src/os/dir_windows.go

Lines 62 to 69 in 8696ae8

func (file *File) readdirnames(n int) (names []string, err error) {
fis, err := file.Readdir(n)
names = make([]string, len(fis))
for i, fi := range fis {
names[i] = fi.Name()
}
return names, err
}

os.(*File).Readdir is implemented by calling FindFirstFile and FindNextFile Windows API (like @networkimprov already explained). These functions return WIN32_FIND_DATAW structure https://docs.microsoft.com/en-us/windows/win32/api/minwinbase/ns-minwinbase-win32_find_dataw that provide everything we need. Perhaps there are other Windows APIs that provide less file information, but I am not sure, they will be faster to use.

So I don't see how any new os API can improve speed on Windows.

Alex

@qingyunha
Copy link
Contributor

By the way, python have both os.scandir and os.listdir.

@networkimprov
Copy link

From Python docs:

Using scandir() instead of listdir() can significantly increase the performance of code that also needs file type or file attribute information, because os.DirEntry objects expose this information if the operating system provides it when scanning a directory. All os.DirEntry methods may perform a system call, but is_dir() and is_file() usually only require a system call for symbolic links; os.DirEntry.stat() always requires a system call on Unix but only requires one for symbolic links on Windows.

@networkimprov
Copy link

@israel-lugo nothing will come of this unless it's turned into a proposal with a specific API.

@israel-lugo israel-lugo changed the title os: new File method to read directory entries and expose file type, inode proposal: os: new Readdirentries method to read directory entries and efficiently expose file metadata Aug 4, 2020
@israel-lugo
Copy link
Author

@israel-lugo nothing will come of this unless it's turned into a proposal with a specific API.

Sorry, didn't have bandwidth until now. I've edited the original post, please let me know what you think.

@ianlancetaylor
Copy link
Contributor

The os package already defines Mode constants (https://golang.org/pkg/os/#FileMode). Seems like we could use those rather than introducing new DT_xxx constants. I guess we would need to define ModeUnknown somehow.

Do we really need Inode? When would that be used?

I don't see a reason to add a lightweight parameter. If we don't know, return ModeUnknown.

Given that I'm not sure DirEntry needs methods at all. Seems like it could just be a struct with Name and Mode fields. And possible an Inode field if that seems really useful.

@ianlancetaylor ianlancetaylor modified the milestones: Backlog, Proposal Aug 5, 2020
@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) Aug 5, 2020
@ianlancetaylor
Copy link
Contributor

CC @bradfitz

@networkimprov
Copy link

networkimprov commented Aug 5, 2020

The API should be cross-platform. On Windows, dirents provide different metadata than on Linux. So DirEntry's methods should encompass most of the Windows & Linux metadata. See link below. Inode() should be called Id(), as inode is a unix term.

I don't think we want Lstat(); a cached FileInfo should be internal. It's only used when a method is called that dirents don't store on that platform.

@ianlancetaylor the variability of dirents across platforms is why DirEntry needs methods. The Id/inode is required when duplicating a tree that contains multiple hardlinks to the same file (and there is currently no Go API to get the Id on Windows).

https://docs.microsoft.com/en-us/windows/win32/api/minwinbase/ns-minwinbase-win32_find_dataa

cc @bcmills

@alexbrainman
Copy link
Member

DirEntry looks like simplified version of os.FileInfo. Why do we need 2 versions of the same thing? Users need to understand this difference - DirEntry is faster to get then os.FileInfo. And that is not even true on Windows.

Sorry, but I don't see the benefits.

Alex

@israel-lugo
Copy link
Author

israel-lugo commented Aug 5, 2020

The API should be cross-platform. On Windows, dirents provide different metadata than on Linux. So DirEntry's methods should encompass most of the Windows & Linux metadata. See link below. Inode() should be called Id(), as inode is a unix term.

@networkimprov This is why the Lstat method is there. With the metadata we get from Windows, we can pretty much fill out a FileInfo for free when scanning a directory. With Linux we need one Lstat to get anything outside of Name, Id and Type.

Perhaps instead of Lstat the method should just be called FileInfo. I used the Lstat name to make it clear that this is not following symlinks, but the point is really to provide a FileInfo (for free on Windows, with a syscall on Linux).

@israel-lugo
Copy link
Author

DirEntry looks like simplified version of os.FileInfo. Why do we need 2 versions of the same thing? Users need to understand this difference - DirEntry is faster to get then os.FileInfo. And that is not even true on Windows.

@alexbrainman DirEntry exposes a FileInfo, plus the file Id (which is not available with any API on Windows) plus the file type, which is not retrievable on Linux without making one system call for every file.

Do not underestimate the "DirEntry is faster to get" point. That is 1 system call versus 500 to list a basic directory. It's the difference between taking 10s to crawl a typical user home vs taking 2 minutes with heavy I/O.

@israel-lugo
Copy link
Author

israel-lugo commented Aug 5, 2020

The os package already defines Mode constants (https://golang.org/pkg/os/#FileMode). Seems like we could use those rather than introducing new DT_xxx constants. I guess we would need to define ModeUnknown somehow.

@ianlancetaylor That's a good point, thank you. I've udpated to return a FileMode, and proposed a new ModeUnknown.

Do we really need Inode? When would that be used?

(I've renamed Inode to Id, per suggestion from @networkimprov.)

This is useful e.g. when crawling a directory tree and you care about hard links (e.g. because you're backing up files, or doing some sort of batch processing). Also, we don't have any portable API to get this today (you have to assert FileInfo.Sys() and even then that doesn't work on Windows).

I don't see a reason to add a lightweight parameter. If we don't know, return ModeUnknown.

That is certainly a reasonable possibility, and I am not opposed to it. The user can just Stat/Lstat explicitly in case of ModeUnknown.

In the current state, the proposal tries to make DirEntry a bit friendlier, like the Python os.DirEntry. I.e. what it doesn't know immediately, it will obtain. This makes sense in my mind, because we have different cases across Linux and Windows where different syscalls need to be made anyway: Windows needs a syscall to get the inode number, but it basically gives you the other fields in a FileInfo for free. Whereas Linux is the other way around.

The lightweight parameter in this case lets you control the one case where the performance penalty is not predictable for the user: on Windows, Type is free, and on Linux it's free most of the time (depends on the filesystem). You may not want to pay the extra syscall to get it, i.e. if you're more likely than not to open the file, you may decide that you prefer opening first then doing an fstat to have the bonus of avoiding races.

@rsc
Copy link
Contributor

rsc commented Aug 5, 2020

I'd like to put this on hold until the FS proposal is worked out. We might be able to make this available with no new API.

@networkimprov
Copy link

@rsc, could you provide a link to the FS proposal?

@ianlancetaylor
Copy link
Contributor

@networkimprov
Copy link

networkimprov commented Aug 5, 2020

I've commented on https://www.reddit.com/r/golang/comments/hv976o/qa_iofs_draft_design/g0hwjij/

@rsc, Reddit has buried the comment so you may need this link to find it.

@benhoyt
Copy link
Contributor

benhoyt commented Aug 6, 2020

As author of PEP-471 and os.scandir (included in Python 3.5+), I'll say this has definitely been useful in the Python world for speeding up recursively walking directory trees, just because of how many stat() calls it avoids. And most people don't even need to use it directly -- they just got a faster os.walk() (I guess the Go equivalent is filepath.Walk, but the signature of WalkFunc returning a complete FileInfo kind of rules that out -- if only the FileInfo methods returned errors...).

In my tests of os.scandir, it sped up directory walking 2-3x on Linux, and 8-9x on Windows -- and it's higher for network-connected file systems. So a very significant performance improvement. (Note that Type() will generally be free on Linux due to d_type, and Lstat() is free on Windows due to the info in the FindFirst/Next results.)

This is cross-platform in Python, so I don't see why it can't be in Go. If it can be made available "with no new API" via the FS proposal, all the better! Though I'm not sure what Russ has in mind there.

I think that the Python DirEntry API ended up a bit too complex. I don't think the follow_symlinks param is needed (should probably just not follow symlinks), and the per-DirEntry caching of values might be too much. So I like the simpler API @israel-lugo is proposing, though I'm not sure whether lightweight or the caching are really necessary.

@networkimprov
Copy link

networkimprov commented Aug 6, 2020

Thanks for the input, Ben. I think this proposal may be moot. Our task now is to help define an API for the ReadDirFile & ReadDirFS interfaces, which currently give FileInfo, meaning all implementations of them could have performance problems.

@networkimprov
Copy link

A new proposal on this topic is #41188, and there's some alternatives in that thread.

@networkimprov
Copy link

I filed #41265. It offers a new ReadDir() API for io/fs.

@rsc
Copy link
Contributor

rsc commented Sep 18, 2020

Please see #41467.

@rsc rsc moved this from Incoming to Active in Proposals (old) Sep 18, 2020
@rsc
Copy link
Contributor

rsc commented Sep 23, 2020

If #41467 is accepted, I think we can close this one.

@rsc rsc moved this from Active to Hold in Proposals (old) Sep 23, 2020
@networkimprov
Copy link

FYI #41467 doesn't address one of the requirements given above:

// ... if the file type is not immediately known, ModeUnknown will be
// returned. This may be useful e.g. if the caller will be opening the file anyway and would prefer
// to do a Stat of the open file to avoid filename races

See also #40352 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Proposal Proposal-Hold
Projects
Status: Hold
Development

No branches or pull requests

9 participants