Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os: Readdir swallows partial results if it fails to Lstat any file in the listing #27416

Open
scudette opened this issue Aug 31, 2018 · 5 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@scudette
Copy link

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go version go1.10.1 linux/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOHOSTOS="linux"

What did you do?

Test program here
https://play.golang.org/p/Xgvoi1GQGUv

It is hard to reproduce but when using fuse sometimes the mount point becomes disconnected. In this case when you try to do a lstat() of it you get "transport endpoint is not connected" (Google for that to see that it is a common thing). Anyway the issue is really that when a directory contains a disconnected mount point like this, then Golang's readdir variants (os.Open + Readdir and ioutils.ReadDir()) break out of the loop as soon as they can not Lstat one of the files in the directory and this either returns no results in the case of ioutils or randomly less results in the case of os.Open + Readdir.

This is very surprising and leads to weird program failures because an external event (unclean mount point) suddenly causes the entire directory listing to fail in Go programs. The directory is fine for e.g. ls - you can see how /bin/ls shows it:

$ ls -l /
ls: cannot access '/mnt': Transport endpoint is not connected
total 2235132
drwxr-xr-x   2 root root      12288 Aug 24 07:35 bin
drwxr-xr-x   3 root root       4096 Aug 30 06:38 boot
...
drwxr-xr-x   2 root root       4096 May  3 15:48 lib64
drwx------   2 root root      16384 Sep 26  2017 lost+found
drwxr-xr-x   5 root root       4096 Feb  9  2018 media
d?????????   ? ?    ?             ?            ? mnt
drwxr-xr-x   4 root root       4096 Jun 28 12:41 opt
dr-xr-xr-x 362 root root          0 Jun 27 23:02 proc
...
-rw-------   1 root root 2147483648 Dec 17  2017 swapfile
dr-xr-xr-x  13 root root          0 Sep  1 01:28 sys
drwxrwxrwt  89 root root     102400 Sep  1 01:51 tmp
drwxr-xr-x  14 root root       4096 Dec 29  2017 usr

The issue is in os/dir_unix.go exiting out of the loop in case of an lstat error.

A more robust implementation is possible by copying out the code from dir_unix.go and modifying it (https://play.golang.org/p/UFvrro7cqqu) so maybe this is a valid workaround but IMHO almost every user of ioutil.ReadDir() expects to get some results back - it is unexpected to just have the entire directory listing fail because someone has put a fuse mount inside it.

Semantically to me at least, when I call ReadDir on a directory (say "/"), then any error that I get back should relate to the directory itself. I was surprised that I was unable to ReadDir("/") when a subdir of "/" was actually un-stat'able. The error does not seem to relate to the thing I was calling the function on.

What did you expect to see?

I expected partial results with all the results that could be stat'ed. Maybe add an empty FileInfo for the bad mount point or omit it.

What did you see instead?

ReadDir() returns an error and no results - weird since I can totally do an "ls -l /" and this works fine.

@FiloSottile FiloSottile changed the title os.Readdir swallows partial results if it fails to Lstat any file in the listing. os: Readdir swallows partial results if it fails to Lstat any file in the listing Aug 31, 2018
@FiloSottile FiloSottile added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Aug 31, 2018
@FiloSottile FiloSottile added this to the Unplanned milestone Aug 31, 2018
@scudette
Copy link
Author

Just spent 2+ hours debugging a problem around this bug. It should not happen!

In our project we needed to reimplement ReadDir() ourselves to work around this bug
https://github.com/Velocidex/velociraptor/blob/master/utils/file_linux.go#L28

But there was an odd place where ioutils.ReadDir() was used instead of our fixed version.

We should not need to reinvent parts of the standard library - please consider fixing this for good. It is extremely frustrating to spend time chasing bugs in standard libraries - resorting to very careful analysis of strace output to notice that getdents64 returns 82 entries but we only have 52 newfstatat() calls!

This is a high severity bug it should get more attention.

@ianlancetaylor ianlancetaylor modified the milestones: Unplanned, Backlog Mar 28, 2020
@ianlancetaylor
Copy link
Contributor

I don't know what the right thing to do is here. What should happen if there is more than one error calling lstat?

@scudette
Copy link
Author

scudette commented Dec 3, 2021

Just hit the same bug again with macos. Turns out there is a standard file on MacOS that can not normally be stat'ed meaning the entire directory can not be listed:

[DEBUG] 2021-12-03T17:06:14+10:00 Globber.ExpandWithContext: lstat /var/db/DifferentialPrivacy: operation not permitted while processing /var/db

I suspect it is therefore impossible to list /var/db/ on macos.

scudette added a commit to Velocidex/velociraptor that referenced this issue Dec 3, 2021
Workaround upstream go library bug needs to be applied to darwin too
and probably freebsd as well.
Ref: golang/go#27416
scudette added a commit to Velocidex/velociraptor that referenced this issue Dec 3, 2021
Workaround upstream go library bug needs to be applied to darwin too
and probably freebsd as well.
Ref: golang/go#27416
@xiaq
Copy link

xiaq commented Apr 10, 2023

I don't know what the right thing to do is here. What should happen if there is more than one error calling lstat?

Now that errors.Join is a thing, maybe it can be used to handle multiple errors?

@ianlancetaylor
Copy link
Contributor

I don't think it's going to make sense for a function like os.File.Readdir to return a list of errors.

That said, as of Go 1.16 we now have os.File.ReadDir (the only difference is that a d changed to a D) which on Linux with most file systems won't call lstat at all. So people encountering this problem today should switch to that.

Looking at the code that does call lstat, if lstat returns ENOENT we just ignore the error and don't return the file. Perhaps we should do the same for ENOTCONN.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

4 participants