Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

filepath: filepath.Walk readDirName will cause large memory leak #36197

Closed
wcc526 opened this issue Dec 18, 2019 · 10 comments
Closed

filepath: filepath.Walk readDirName will cause large memory leak #36197

wcc526 opened this issue Dec 18, 2019 · 10 comments
Labels
FrozenDueToAge WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Milestone

Comments

@wcc526
Copy link

wcc526 commented Dec 18, 2019

https://github.com/golang/go/blob/master/src/path/filepath/path.go#L363

  // walk recursively descends path, calling walkFn.
  func walk(path string, info os.FileInfo, walkFn WalkFunc) error {
	if !info.IsDir() {
		return walkFn(path, info, nil)
	}
	names, err := readDirNames(path)
	err1 := walkFn(path, info, err)
	// If err != nil, walk can't walk into this directory.
	// err1 != nil means walkFn want walk to skip this directory or stop walking.
	// Therefore, if one of err and err1 isn't nil, walk will return.
	if err != nil || err1 != nil {
		// The caller's behavior is controlled by the return value, which is decided
		// by walkFn. walkFn may ignore err and return nil.
		// If walkFn returns SkipDir, it will be handled by the caller.
		// So walk should return whatever walkFn returns.
		return err1
	}

names, err := readDirNames(path)
the readDirNames will lead large memory if directory has large number of files

which conflict skipDir purpose.

@wcc526
Copy link
Author

wcc526 commented Dec 19, 2019

package main

import (
	"context"
	"path/filepath"
	"time"

	"os"
	"strings"

	log "github.com/sirupsen/logrus"

	"sort"
)

var lstat = os.Lstat // for testing
func walk(path string, info os.FileInfo, walkFn filepath.WalkFunc) error {
	if !info.IsDir() {
		return walkFn(path, info, nil)
	}
	err1 := walkFn(path, info, nil)
	if err1 != nil {
		return err1
	}
	names, err := readDirNames(path)
	log.Info("names",names)
	if err != nil {
		return err
	}
	for _, name := range names {
		filename := filepath.Join(path, name)
		fileInfo, err := lstat(filename)
		if err != nil {
			if err := walkFn(filename, fileInfo, err); err != nil && err != filepath.SkipDir {
				return err
			}
		} else {
			err = walk(filename, fileInfo, walkFn)
			if err != nil {
				if !fileInfo.IsDir() || err != filepath.SkipDir {
					return err
				}
			}
		}
	}
	return nil
}
func CustomWalk(root string, walkFn filepath.WalkFunc) error {
	info, err := os.Lstat(root)
	if err != nil {
		err = walkFn(root, nil, err)
	} else {
		err = walk(root, info, walkFn)
	}
	if err == filepath.SkipDir {
		return nil
	}
	return err
}
func readDirNames(dirname string) ([]string, error) {
	f, err := os.Open(dirname)
	if err != nil {
		return nil, err
	}
	names, err := f.Readdirnames(-1)
	f.Close()
	if err != nil {
		return nil, err
	}
	sort.Strings(names)
	return names, nil
}

func walkLarge(rootpath string) {
	ctx, cancel := context.WithTimeout(context.Background(), 7200*time.Second)
	defer cancel()
	err := CustomWalk(rootpath, func(path string, info os.FileInfo, err error) error {
		time.Sleep(1 * time.Second)
		log.Info(path)
		if err != nil {
			return nil
		}
		select {
		case <-ctx.Done():
			log.Warn("Timeout", rootpath)
			return ctx.Err()
		default:
		}

		rel, err := filepath.Rel(rootpath, path)
		if err != nil {
			return err
		}
		if uint(len(strings.Split(rel, "/"))) > 4 {
			return filepath.SkipDir
		}

		if info.IsDir() {
			if info.Name() == ".git" {
				return filepath.SkipDir
			}
			f, err := os.Open(path)
			defer f.Close()
			if err != nil {
				return filepath.SkipDir
			}
			files, err := f.Readdir(1000)
			if err != nil {
				return filepath.SkipDir
			}
			if len(files) > 999 {
				log.Warn("Skip dir ", path)
				return filepath.SkipDir
			}

		} else {
			if !info.Mode().IsRegular() {
				return nil
			}
		}
		return nil
	})
	if err != nil {
		log.Errorf("walk error [%v]\n", err)
	}
}

func main() {
	walkLarge("/")
}

@wcc526
Copy link
Author

wcc526 commented Dec 19, 2019

        err1 := walkFn(path, info, nil)
	if err1 != nil {
		return err1
	}
	names, err := readDirNames(path)
	log.Info("names",names)
	if err != nil {
		return err
	}

walkFn should before readDirNames

@dmitshur
Copy link
Contributor

dmitshur commented Dec 20, 2019

Can you please include information about what version of Go you tested with (go version), whether it's an issue with the latest Go 1.13.5 version, and your environment (go env)?

When you say "large memory leak", what kind of numbers were you seeing?

That will help with reproducing this and being able to make progress. Thanks.

@dmitshur dmitshur added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Dec 20, 2019
@dmitshur dmitshur added this to the Backlog milestone Dec 20, 2019
@wcc526
Copy link
Author

wcc526 commented Dec 23, 2019

go version is 1.13.5

[root@el6 ~]# go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/usr/local/:/builds/lucci/lucci-agent/:/builds/lucci/"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build288278525=/tmp/go-build"

I have pull request #36198 Please review and merge it!

When the directory has 8151134 files, it will cause 820MB meomory leak.

the root cause is "names, err := readDirNames(path)" which read the large files of directory.

Just swap the orders, User can SkipDir for this large directorys

thank you for your attention to this matter @dmitshur

@agnivade
Copy link
Contributor

There is already a PR for this: https://go-review.googlesource.com/c/go/+/211802.

@wcc526 - I was not aware of this issue. Could you please link this issue in your PR ? In the PR message, just add a new line "Fixes #3697"

@gopherbot
Copy link

Change https://golang.org/cl/211802 mentions this issue: path/filepath: fix readDirNames to not unconditionally read a directory

@anmathew
Copy link

does not seem these changes help much. walking a folder with 21k files still hogs about 82M RES size - after the walk.

@wcc526
Copy link
Author

wcc526 commented Dec 26, 2019

does not seem these changes help much. walking a folder with 21k files still hogs about 82M RES size - after the walk.

                         files, err := f.Readdir(1000)
			if err != nil {
				return filepath.SkipDir
			}
			if len(files) > 999 {
				log.Warn("Skip dir ", path)
				return filepath.SkipDir
			}

You should add filepath.SkipDir to skip this large files.

these changes help to user to right control the SkipDir.

Before change even if user skip large files, walk still readDirNames(path) whick hogs large RES size

@dajohi
Copy link

dajohi commented Apr 9, 2020

Is this just waiting on a regress test?

@seankhliao
Copy link
Member

The API guarantees made by filepath.Walk mean this is unlikely to change

The files are walked in lexical order, which makes the output deterministic but requires Walk to read an entire directory into memory before proceeding to walk that directory.

The CL has been abandoned, with the recommendation that if you need a more performant API with a different tradeoff, you could copy and modify the Walk/WalkDir code to suit your needs.

@golang golang locked and limited conversation to collaborators Jun 11, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants