Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os: reading the contents of a directory can result in "lost" files #24015

Closed
jechols opened this issue Feb 21, 2018 · 31 comments
Closed

os: reading the contents of a directory can result in "lost" files #24015

jechols opened this issue Feb 21, 2018 · 31 comments
Labels
FrozenDueToAge help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker
Milestone

Comments

@jechols
Copy link

jechols commented Feb 21, 2018

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

1.9 and 1.10

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

Linux (CentOS RHEL 7), amd64

What did you do?

https://play.golang.org/p/g_0Xgifb9nD

var infos, err = ioutil.ReadDir(i.Location)

i.Location is a CIFS-mounted directory using "vers=3.0" in /etc/fstab's list of options

What did you expect to see?

All files listed in "infos"

What did you see instead?

Most files were listed, but one was missing.


This appears to be related to the version of CIFS in use and/or block size. I dug into the guts of the Go code and changed https://github.com/golang/go/blob/master/src/os/dir_unix.go#L51 to allocate 10x the buffer size, and the operation was successful. I added some debugging output to the loop which populates d.buf (https://github.com/golang/go/blob/master/src/os/dir_unix.go#L62-L75) and noticed that the file prior to the missing one was right at the end of one buffer-fill operation, and the file directly after the missing one was at the beginning of the next buffer-fill operation, leading me to believe that somewhere in the buffer setup, something is being cut off and lost somehow. I'm not sure why this is only the case in this precise setup (CIFS 3.0).

I wasn't able to debug further, but it is important to note that you can work around this issue by forcing CIFS version 1.0 via "vers=1.0" when mounting.

@ALTree ALTree changed the title Reading the contents of a directory can result in "lost" files os: reading the contents of a directory can result in "lost" files Feb 21, 2018
@ALTree ALTree added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 21, 2018
@ALTree ALTree added this to the Go1.11 milestone Feb 21, 2018
@ianlancetaylor
Copy link
Contributor

Is this repeatable? Are any files being added or removed to the directory while your program is running?

@jechols
Copy link
Author

jechols commented Feb 21, 2018

It is completely reproducible... on our particular server. I just tested it on a Fedora 27 system and it doesn't happen. We're going to try it out on Ubuntu as well. There may be other knobs to fiddle with as well in terms of mount options. If we discover anything useful I'll comment here.

@jechols
Copy link
Author

jechols commented Feb 21, 2018

@jechols
Copy link
Author

jechols commented Feb 21, 2018

Sorry, didn't answer your other question: no, the directory is completely unchanged and the mount point is effectively inaccessible to everybody but me, so I'm fairly certain this isn't an issue of files changing.

@rmmh
Copy link
Contributor

rmmh commented Feb 21, 2018

Additional context: glibc readdir() defaults to 32KB buffers and uses getdents64(). There are some old kernel bugs (fixed before 3.10, which CentOS 7 should be on) that might contribute to this.

@jechols
Copy link
Author

jechols commented Feb 21, 2018

Interesting - our RHEL 7 system reports 3.10.0-693.17.1.el7.x86_64. The two systems I've tested which didn't have a problem were on a 4.x kernel.

Yes, I misspoke initially - it's RHEL 7, not CentOS 7. I'll update that in the description.

@jechols
Copy link
Author

jechols commented Feb 22, 2018

FWIW: Using readdir in PHP or C, and os.listdir in Python all seem to work, but in C and PHP, listings are returned in a one-item-per-call way, so the lower-level system calls may not be equivalent. And I have no idea what the low-level Python code does (probably just uses the C function, which I suspect PHP also does).

But I don't really know how to tell if this is truly a Go bug other than trying other languages, as this is far lower-level than I usually try to debug....

@jechols
Copy link
Author

jechols commented Feb 22, 2018

Forgot to add that we tested on a separate RHEL 7 system and had the same results, so it isn't just something broken with our environment.

@heschi
Copy link
Contributor

heschi commented Feb 22, 2018

Try strace -v -f -e trace=file,getdents,getdents64 $exe. It should be relatively comprehensible if you've done any C programming. You're looking for an open(..., ...O_DIRECTORY...) = FD, followed by a (series of) getdents calls on that FD. With -v, strace will show the directory entries that getdents returned. That should help clarify whether it's a Go bug or a kernel bug.

@jechols
Copy link
Author

jechols commented Feb 23, 2018

It's tough to say the precise problem, as I just don't understand enough of what is or isn't correct. The C version is calling getdents, whereas Go calls getdents64. The C version uses a 32k block where Go is using a 4k block. Since C uses a 32k block, all the responses show up in a single block, whereas Go of course has to call getdents64 one extra time.

One potentially interesting difference is that Go calls openat differently from C:

  • Go: [pid 6742] openat(AT_FDCWD, "/mnt/news/data/workflow/2007260204-2017110301-544", O_RDONLY|O_CLOEXEC) = 3
  • C: openat(AT_FDCWD, "/mnt/news/data/workflow/2007260204-2017110301-544", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3

Another oddity which (hopefully) points to the problem is that Go's output suggests a skipped record in the getdents64 call:

[pid  6742] getdents64(3, [
  {d_ino=2684633051, d_off=1, d_reclen=24, d_type=DT_DIR, d_name="."},
  {d_ino=3226265192, d_off=2, d_reclen=24, d_type=DT_DIR, d_name=".."},
  {d_ino=2684633053, d_off=3, d_reclen=32, d_type=DT_REG, d_name="0001.pdf"},
  {d_ino=2684633056, d_off=4, d_reclen=32, d_type=DT_REG, d_name="0002.pdf"},
...
  {d_ino=2684694871, d_off=125, d_reclen=32, d_type=DT_REG, d_name="0035.jp2"},
  {d_ino=2684694872, d_off=126, d_reclen=32, d_type=DT_REG, d_name="0036.jp2"},
  {d_ino=2684694892, d_off=127, d_reclen=32, d_type=DT_REG, d_name="0037.jp2"},
  {d_ino=2684694896, d_off=129, d_reclen=32, d_type=DT_REG, d_name="0038.jp2"}
], 4096) = 4080
[pid  6742] getdents64(3, [
  {d_ino=2684694898, d_off=130, d_reclen=32, d_type=DT_REG, d_name="0040.jp2"},
  {d_ino=2684694899, d_off=131, d_reclen=32, d_type=DT_REG, d_name="0041.jp2"},
  {d_ino=2684694900, d_off=132, d_reclen=32, d_type=DT_REG, d_name="0042.jp2"},
  {d_ino=2684694901, d_off=133, d_reclen=32, d_type=DT_REG, d_name="0043.jp2"},
  {d_ino=2684694902, d_off=134, d_reclen=32, d_type=DT_REG, d_name="0044.jp2"}
], 4096) = 160

I reformatted these for display, but clearly d_off=128 is missing, just as "0039.jp2" is missing from the final output list.

When running on Fedora 27, the getdents64 call doesn't skip a record, resulting in that final call being 192 bytes instead of 160.

Another interesting thing I noticed is that when I tested a different directory, one with far more files, there's a pattern. On the first call, the bit at the end (I'm assuming this is the number of bytes put into the buffer) is 4080, and the next-to-last d_off value is skipped. On the second call, the buffer receives 1632 bytes, and nothing is skipped. This pattern repeats until there is too little data to fill the buffer.

When the C program runs against the same huge directory, every getdents call has 5760 bytes (until the final calls, obviously).

I really don't know where to go from here. I still have no idea if this is a kernel or Go bug.

@jechols
Copy link
Author

jechols commented Feb 23, 2018

Another very unfortunate note: using CIFS v1.0 doesn't fix this problem in all cases, it seems. I just noticed on the really huge directory, Go is still skipping files. I have no idea why the CIFS version fixes one particular case but not the other.

@heschi
Copy link
Contributor

heschi commented Feb 23, 2018

Great analysis. A giant gist would have been okay, but I appreciate not having to read through it myself.

I don't know much about this, and it seems like maybe @rmmh has some background here. But to me this is unambiguously a kernel bug. getdents works like read: the kernel maintains state for the FD, and it's responsible for giving you things that fit in the buffer you provided, then continuing from that point when you call again. Assuming nothing strange happened (a seek or some such) in between the two getdents64 calls, the kernel should have given the next file in the next call.

Based on the second part of your comment, I suspect the bug is in the interaction between the size of the replies from the CIFS server and the size of the buffer passed to getdents(). It sounds like the CIFS server gives a reply of some size, and then the kernel chunks that into getdents() results, and misses the dentry at each chunk boundary.

@rmmh
Copy link
Contributor

rmmh commented Feb 23, 2018

Try rebuilding after setting go/src/os/dir_unix.go's blockSize to 32KB and see if that fixes things.

It's probably a kernel bug (since it occurs on 3.10 and not 4.x). I don't know the policy on workarounds for old kernel bugs, but you could easily make the argument that a larger buffer is more efficient for listing directories (~128 files per syscall is small).

I would suggest writing a benchmark that does filepath.Walk, but filepath.Walk is probably too slow to notice a difference, since it calls lstat on each file (#16399).

@jechols
Copy link
Author

jechols commented Feb 23, 2018

@heschik I think you're right - I thought strace was reporting what was sent to getdents64; obviously it's reporting what's put into the buffer. Massive oversight on my part, given what @rmmh already said about strace :-/

I'll do some more hacking at the go source to see about a 32k blockSize. And I can probably do a benchmark that just lists a bunch of hard-coded directories - we have hundreds (maybe thousands) of these things that the Go application is reading, so I have no shortage of test material.

@jechols
Copy link
Author

jechols commented Feb 23, 2018

Benchmark results:

  • On a CIFS share, I'm not seeing any consistent difference between a 32k buffer and a 4k buffer
  • On a local filesystem, traversing hundreds of directories in /usr/lib, the 32k buffer is a tiny bit faster than the 4k buffer when the filesystem cache is emptied (~6s vs. ~6.3s)
  • On the same filesystem and directory list, when the filesystem cache is not emptied, the 32k buffer is significantly slower (~300ms vs ~200ms)
  • On a directory with 17,000 files in it, the 32k buffer is significantly faster when the filesystem cache is emptied (~50ms vs ~200ms), but there is no noticeable difference when the cache is not empty

These tests are far from scientific, but across dozens of repeated runs the general trends were the same.

Benchmarking aside, the 32k buffer does seem to give me the right readings in all cases I've tested on the CIFS mount. I'm not sure if this is an acceptable thing to change in core, given it's likely a kernel bug, but if nothing else I'm going to have to hack up our local Go source in order to prevent file loss.

@chlunde
Copy link
Contributor

chlunde commented Mar 1, 2018

@jechols I can reproduce this on RHEL 7.4. Did you report it at https://bugzilla.redhat.com or as a ticket? I suspect 32k hides the issue, but if is possible to get the kernel client to receive even larger listings from the server in one operation, it might be that the issue is still there.

I skimmed the kernel code, and I find this line suspicious, since filldir can return EINVAL:

                        rc = cifs_filldir(current_entry, file, filldir,
                                          direntry, tmp_buf, max_len);
                        if (rc == -EOVERFLOW) {
                                rc = 0;
                                break;
                        }
                        file->f_pos++;

from https://elixir.bootlin.com/linux/v3.10.108/source/fs/cifs/readdir.c#L869
or from https://access.redhat.com/labs/psb/versions/kernel-3.10.0-693.17.1.el7/fs/cifs/readdir.c#line879

Upstream, it seems this was changed in 3.11 when readdir was replaced with a new internal API called iterate. It is not documented as a bug fix for this specific issue:
torvalds/linux@be4ccdc#diff-d7c8b6c0804935effb0a22c44ff1be24R835

There is a new struct file_operations method:
int (*iterate) (struct file *, struct dir_context *);
Its job is to iterate through the contents of a directory. This method is meant to serve as a replacement for the readdir() method that eliminates persistent race conditions associated with updating the current read position. All internal users have been converted, and the readdir() method has been removed.

from https://lwn.net/Articles/557314/

If I change the code in RHEL7.4 to

                        if (rc) {
                                rc = 0;
                                break;
                        }

there are no lost files.

@jechols
Copy link
Author

jechols commented Mar 1, 2018

@chlunde that is excellent information. Our plan was to submit a bug to Red Hat, but we didn't have the time (or expertise) to do a very quick analysis. Your information should be really helpful! Thanks!

@ncw
Copy link
Contributor

ncw commented Apr 5, 2018

I've had an rclone user report identical symptoms to this on the forum.

The symptoms are a missing file in directory listings.

I note that on my system at least, none of the normal tools ls, find, python use getdents64 they use getdents which makes me wonder whether using getdents64 is travelling in uncharted territory for go.

I also mashed together a little C program which calls getdents64 in the same way as the go runtime which you may find useful.

@chlunde
Copy link
Contributor

chlunde commented Apr 10, 2018

@jechols Did you create an issue on bugzilla.redhat.com?
@ncw Which kernel version was that?

@jechols
Copy link
Author

jechols commented Apr 11, 2018

@chlunde I'm waiting for my supervisor to get the ball rolling, which may take a while to get from him to the campus central IT people and from them to RedHat. I don't really understand the red tape, but it sounds like something I need to let them handle, not go and do it myself....

@ianlancetaylor
Copy link
Contributor

It sounds like the general consensus is that this is a kernel bug. It doesn't sound like there is anything we do to fix it in Go, so closing. Please comment if you disagree.

@jechols
Copy link
Author

jechols commented Jun 28, 2018

@ianlancetaylor I agree this isn't Go's fault, but given the relative difficulty of understanding and fixing the problem for those of us using CentOS / RHEL, and the prevalence of RHEL 7, I feel like a solution in Go may be worthwhile. I'm still uncertain if / when this will be fixed at the OS level given that 90% of apps use the C function.

Remember that this bug only appears in Go. PHP, Python, Ruby, and others use the C function which has a large enough buffer that the bug is not triggered.

If this isn't addressed in the Go source, I fear Go users will come to believe it's a Go defect. And those who discover it isn't may still avoid Go out of fear of encountering similar problems since Go tends to avoid tried-and-true low-level libraries. Run into a problem like this in production, and it will leave a very bad taste in your mouth.

Don't get me wrong, I think this serves to show the importance of having multiple approaches instead of all these languages using the same core library, but others may not see it this way.

@jechols
Copy link
Author

jechols commented Jun 28, 2018

@ianlancetaylor, I didn't address your belief that this can't be fixed in Go. It can by upping the buffer size as I did in a test commit: https://github.com/uoregon-libraries/go/commit/1b4c5990626653f9f02754b27e9df61f6d46961e

@chlunde
Copy link
Contributor

chlunde commented Jun 28, 2018

@jechols I reported this to Red Hat, and a bz has been created: https://bugzilla.redhat.com/show_bug.cgi?id=1577640 The proper way to increase priority for this kernel bug is to create a support case and ask for higher priority for this bugzilla issue, and maybe ask for status directly on bugzilla too. If you can't get a support account, even though they do not cost your organization anything, at least go to bugzilla.

@ianlancetaylor
Copy link
Contributor

As far as I can see, increasing the buffer size just hides the bug, by making it only occur when there are more files in the directory, or when the files have longer names. That's not a fix. It's sort of an anti-fix. If your tools are not reliable, it's better to know that they are not reliable.

If there is an approach that actually works around the problem in all cases, I'm certainly open to it.

@jechols
Copy link
Author

jechols commented Jun 29, 2018

@chlunde, thanks! I'll point that out to my supervisor in an effort to get the university pushing it. Acadamia moves slowly, so I'm still waiting to hear back :-/

@ianlancetaylor, I don't disagree, I just don't like that people will believe this is a Go issue. But you are right, it would just hide, not fix, the problem.

It's worth noting that when the buffer size is increased, the bug cannot be reproduced on my end, because CIFS is returning only 5k responses. So there is (currently) no way for even an 8k buffer to hit this bug. From one of my early posts:

When the C program runs against the same huge directory, every getdents call has 5760 bytes (until the final calls, obviously).

I suspect this is why the bug has never been noticed to users of the C library.

@ianlancetaylor
Copy link
Contributor

Thanks, that does make it sounds like increasing the buffer size to more than 5760 will paper over the problem reliably for Go users. I'll do that.

@gopherbot
Copy link

Change https://golang.org/cl/121756 mentions this issue: os: increase directory reading block size on Unix systems

gopherbot pushed a commit that referenced this issue Jul 6, 2018
Reportedly CIFS on RHEL 7 can fail to report files if directories are
read in 4K increments. While this seems to be a CIFS or RHEL bug,
reportedly CIFS does not return more than 5760 bytes in a block, so
reading in 8K increments should hide the problem from users with
minimal cost.

Fixes #24015

Change-Id: Iaf9f00ffe338d379c819ed9edcd4cc9834e3b0f7
Reviewed-on: https://go-review.googlesource.com/121756
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@yuan1238y
Copy link

Hello! I also have this problem.
Go Version: 1.12.1
Count of the directory (use ls | wc -l): 206
Count of the directory (use file.Readdir(-1)): 205
I don't think that simply increasing the size of the buffer will solve this problem completely.

@ncw
Copy link
Contributor

ncw commented Apr 10, 2019

Hello! I also have this problem.
Go Version: 1.12.1
Count of the directory (use ls | wc -l): 206
Count of the directory (use file.Readdir(-1)): 205
I don't think that simply increasing the size of the buffer will solve this problem completely.

Which file system are you running this on? And what kernel? This is likely a linux kernel bug rather than a go bug IMHO.

@ianlancetaylor
Copy link
Contributor

@yuan1238y This issue is closed and we believe that it is fixed. Also the problem described here was clearly due to a small buffer interacting with a kernel bug. If you are still having trouble with Go version 12, please open a new issue with all the details required to reproduce the problem. Thanks.

@golang golang locked and limited conversation to collaborators Apr 9, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker
Projects
None yet
Development

No branches or pull requests

10 participants