Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os: small-count Readdirnames followed by seek-to-zero can lead to duplicate names #37161

Closed
chris3torek opened this issue Feb 10, 2020 · 5 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@chris3torek
Copy link

What version of Go are you using (go version)?

$ go version
go version go1.11.2 linux/amd64
go version go1.13.5 freebsd/amd64

Does this issue reproduce with the latest release?

I think so, since the code has not changed. (It's kind of a minor bug too...)

What operating system and processor architecture are you using (go env)?

(I'll just include the freebsd one here - as long as the calls go through the file_unix.go code the OS is not particularly relevant)

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/torek/.cache/go-build"
GOENV="/home/torek/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="freebsd"
GONOPROXY=""
GONOSUMDB=""
GOOS="freebsd"
GOPATH="/home/torek/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/freebsd_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build370595268=/tmp/go-build -gno-record-gcc-switches"

What did you do?

This has to be run where there is a file system with some direntries (and probably it's specific to Unix-like systems with getdirentries calls). It does reproduce even on the Go Playground though.

package main

import (
	"fmt"
	"log"
	"os"
)

func main() {
	df, err := os.Open(".")
	if err != nil {
		log.Fatal(err)
	}
	names, err := df.Readdirnames(1)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("first Readdirnames(1):", names)
	if _, err = df.Seek(0, 0); err != nil {
		log.Fatal(err)
	}
	names, err = df.Readdirnames(0)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("second Readdirnames:", names)
}

What did you expect to see?

Output in which file names don't repeat, in the second output line.

What did you see instead?

Output in which file names do repeat. Here's what happens in the Playground:

first Readdirnames(1): [dev]
second Readdirnames: [tmp etc usr dev tmp etc usr]

The root cause

The actual cause of the bug is pretty straightforward. If you use Readdirnames to the end of the directory, the block holding getdirentries data is fully consumed, but if you call it with a small count (such as the 1 in the example above), it's not. So in os/dir_unix.go, the dirinfo data has d.bufp < d.nbuf.

The Seek function has a test in it:

    r, e := f.seek(offset, whence)
    if e == nil && f.dirinfo != nil && r != 0 {
            e = syscall.EISDIR
    }

which makes sure that the only allowable seek on a directory is to zero, but this never clears out the dirinfo. So the buffered entries remain, and we get them on the next call to Readdirnames(-1)—on the Playground this is tmp etc and usr. Then the buffer is drained so we call the underlying getdirentries system call again, filling the buffer with dev, tmp, etc, and usr, and we retrieve them, resulting in the doubled-up entries.

One proposed fix would be to move f.dirinfo checking from os/file.go's Seek to the next level down. If seeking on a directory, check the offset-and-whence there (perhaps seeking directories to nonzero locations and/or to the end should be allowed on some systems; this could also allow relative seek by 0 bytes to return an offset, if desirable) and zap any buffered data there.

@ianlancetaylor ianlancetaylor changed the title Small-count Readdirnames followed by seek-to-zero can lead to duplicate names os: small-count Readdirnames followed by seek-to-zero can lead to duplicate names Feb 11, 2020
@ianlancetaylor ianlancetaylor added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 11, 2020
@ianlancetaylor ianlancetaylor added this to the Go1.15 milestone Feb 11, 2020
@odeke-em
Copy link
Member

Thank you for reporting this issue @chris3torek and welcome to the Go project!

So this looks quite similar to bug #35767 that involved cached results on Darwin and that @randall77 fixed with CL https://go-review.googlesource.com/c/go/+/209961 for Go1.14.
Your bug reproduces on my Mac for Go1.13 and below but on Go1.14rc* I don't get repeated entries.

Thus:
a) Could you please try Go1.14rc1 https://golang.org/dl/#unstable so

go get golang.org/dl/go1.14rc1 && go1.14rc1 run main.go

b) Kindly putting this on @randall77's radar as perhaps we might need to implement the previous polyfill for

func (f *File) seekInvalidate()

on non-GOOS=darwin and non-GOOS=js systems.

Thank you.

@randall77
Copy link
Contributor

Yes, looks like we need a similar fix to CL 209961 for non-darwin unix.
CL coming.
It's been broken since at least 1.11, so this will wait until 1.15.

@randall77 randall77 added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Feb 12, 2020
@randall77 randall77 self-assigned this Feb 12, 2020
@gopherbot
Copy link

Change https://golang.org/cl/219143 mentions this issue: os: seek should invalidate any cached directory reads

@chris3torek
Copy link
Author

I won't have a chance to try any of this for a few days at least, but the current fix looks fine and I'm certain it will work.

@gopherbot
Copy link

Change https://golang.org/cl/221778 mentions this issue: os: plan9 seek() should invalidate cached directory info

gopherbot pushed a commit that referenced this issue Mar 2, 2020
Update #37161

Change-Id: Iee828bbcc8436af29ca6dd9ed897cb5265a57cf8
Reviewed-on: https://go-review.googlesource.com/c/go/+/221778
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
@golang golang locked and limited conversation to collaborators Mar 2, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

5 participants