Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/go: mod tidy takes excessive wall & CPU time statting irrelevant files #38791

Closed
pwaller opened this issue May 1, 2020 · 17 comments
Closed
Labels
FrozenDueToAge help wanted modules NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Tools This label describes issues relating to any tools in the x/tools repository. ToolSpeed
Milestone

Comments

@pwaller
Copy link
Contributor

pwaller commented May 1, 2020

Version: go1.14.2, Repro on latest release: Yes. OS/Arch: Ubuntu 18.04 and amd64.

What did you do?

Ran go mod tidy on a directory where go mod tidy had already been run.

(In the top level of a go module which also happens to be a deeply nested directory tree with millions of XML files)

What did you expect to see?

An instant result.

What did you see instead?

24s wall & 13s CPU.

Discussion

Running strace reveals that go mod tidy is doing a stat-like system call to every file under my directory tree, which includes very many (millions) of irrelevant xml files in a deeply nested directory tree within my go module. My expectation is that Go would ignore these files. Maybe it has to read the directory entry, to discover go files or so, but it shouldn't do more work than that.

This bug currently renders vscode unusable because it frequently runs go mod tidy, and the net result is that I end up with hundreds of go mod tidy processes all contending to see who amongst them can stat files the fastest.

My initial workaround was to run pkill -cf 'go mod tidy' in a loop, but @jayconrod informs me that go tooling will ignore directories beginning with _ or ., so I will probably use that workaround for now.

/cc @bcmills @matloob @jayconrod

@jayconrod jayconrod added modules NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance Tools This label describes issues relating to any tools in the x/tools repository. labels May 1, 2020
@bcmills bcmills added this to the Backlog milestone May 1, 2020
@jayconrod jayconrod added this to the Backlog milestone May 1, 2020
@jayconrod
Copy link
Contributor

tidy walks the file system tree in the main module to find packages. Depending on the operating system, it may not need to stat every file during that walk, since we mostly just need to know whether something's a file or directory. There's probably opportunity for optimization here.

@pwaller
Copy link
Contributor Author

pwaller commented May 1, 2020

Taking a stack dump with ctrl-\ shows the offending call to filepath.Walk, which obviously populates os.FileInfo using stat. I guess this would require some other walking logic.

filepath.Walk(root, func(path string, fi os.FileInfo, err error) error {

walkPkgs(root, modPrefix, prune)

modload.LoadALL()

@pwaller
Copy link
Contributor Author

pwaller commented May 1, 2020

Following a conversation with @bcmills, he suggests https://godoc.org/golang.org/x/tools/internal/fastwalk which avoids all the stat calls.

I'm unlikely to be able to submit a patch for this, as much as I want to.

I did try out fastwalk quickly (protecting 'have' with a mutex), and found there is yet another thing which does a readdir:

infos, err := ioutil.ReadDir(dir)

if _, _, err := scanDir(path, tags); err != imports.ErrNoGo {

@pwaller
Copy link
Contributor Author

pwaller commented May 1, 2020

OK, I think I succeeded at testing fastwalk and using fastwalk instead of readdir. It reduces the runtime to 1s wall + 1.5 CPU + 0.9s sys. GNU find by comparison takes 0.8s+0.3s+0.5s. So the CPU cost isn't fantastic but it's still better than it was.

I still care about the battery usage of so many go mod tidy instances in vscode, and given how many processes there are lying around to kill it seems like there is an appreciable amount of CPU time being used over time, even with my fix /cc @stamblerre on that point.

@mvdan
Copy link
Member

mvdan commented May 1, 2020

I definitely think that commands like go mod tidy should be faster, but I also wonder if gopls could do better here given that it's a long-running daemon. Watching an entire module for file changes is tricky, but surely there is a way we can avoid walking the entire module over and over.

@bcmills
Copy link
Contributor

bcmills commented May 1, 2020

go mod tidy should only be necessary if it is possible that the module has become untidy.

In general, we try to make all go commands tidiness-preserving, so if the module was already tidy to begin with, it should only have become untidy if either:

  1. some package has changed, in which case go list -deps -test on that package should suffice (rather than a full tree walk), or

  2. some dependency has changed (due to running go build or go test on a package outside the main module, an explicit user edit to the go.mod or go.sum file, or an aforementioned change in some package), all of which we can detect by watching for changes in go.mod and go.sum alone.

@bcmills
Copy link
Contributor

bcmills commented May 1, 2020

(So, you may want to open a separate issue against gopls.)

@heschi
Copy link
Contributor

heschi commented May 1, 2020

If I remove the last dep on a module from my project, how would gopls use go list -deps -test to discover that the module is now unused?

@bcmills
Copy link
Contributor

bcmills commented May 4, 2020

If I remove the last dep on a module from my project, how would gopls use go list -deps -test to discover that the module is now unused?

If you have a cached invocation of go list -deps -test for that package, then you could notice that that dependency is removed by a later edit, and (perhaps) also notice that that dependency is not present in the output of go list -deps -test for any other package in the module.

Or, you could use the change in package dependencies to trigger a go mod tidy run, but more conservatively than for the whole module on ~every edit.

@stamblerre
Copy link
Contributor

Or, you could use the change in package dependencies to trigger a go mod tidy run, but more conservatively than for the whole module on ~every edit.

We run it on saved changes to the go.mod file and on changes to imports that are saved on disk. We can probably optimize it further, but I think the gopls issue is probably a cancellation bug.

@pwaller
Copy link
Contributor Author

pwaller commented May 9, 2020

(That is now #38816)

@iwdgo
Copy link
Contributor

iwdgo commented Dec 5, 2020

The use of fsys package introduced by CL 261537 dramatically sped up go mod tidy when actual work is involved.
Below is a comparison for tests scripts run with go test -run=TestScript/.*tidy.* -v cmd/go using go1.15.6 and on tip (go version devel +be9379f8a8 Sat Dec 5 12:36:42 2020 +0000 windows/amd64).

PASS: TestScript 3.25s 0.02s -99%
--- PASS: TestScript/mod_indirect_tidy 5.31s 0.59s -89%
--- PASS: TestScript/mod_tidy_cycle 0.29s 0.66s +128%
--- PASS: TestScript/mod_tidy_error 2.49s 3.22s +29%
--- PASS: TestScript/mod_tidy 1.22s 0.35s -71%
--- PASS: TestScript/mod_tidy_old - 14.65s N/A
--- PASS: TestScript/mod_init_tidy - 0.68s N/A
--- PASS: TestScript/mod_tidy_quote 24.49s 12.52s -49%
--- PASS: TestScript/mod_tidy_sum 26.17s 16.40s -37%
--- PASS: TestScript/mod_tidy_replace 27.36s 15.94s -42%

It seems that the concern raised in this issue has been answered.

@jayconrod
Copy link
Contributor

@iwdgo I don't think that's expected. The fsys package is a wrapper around native file system calls. It can change which files we read (based on the -overlay flag), but it shouldn't affect performance significantly.

@matloob Have you seen speed differences in these tests? I wonder if that's actually caused by that CL or something else?

@bcmills
Copy link
Contributor

bcmills commented Dec 7, 2020

@jayconrod, the switch from Walk to WalkDir in CL 267719 (#42027) could have sped things up.

@iwdgo, how confident are you that the improvement was observed as of CL 261537 as opposed to CL 267719?

@iwdgo
Copy link
Contributor

iwdgo commented Dec 13, 2020

@bcmills I thought similarly when posting. Bisect-ing up to the root of go1.16 CL 248038 did not show anything relevant.

After installing go1.15.5 and go1.15.6 as sdk's, the performance change is obvious:

go versions go1.15.5 go1.15.6 %
--- PASS: TestScript/mod_tidy_sum 13.03s 24.19s +85%
--- PASS: TestScript/mod_tidy_replace 17.46s 27.20s +55%

go1.15.6 does not contain commits related to cmd/go as I understand. The performance change reported above might not be specific to go mod tidy.

@iwdgo
Copy link
Contributor

iwdgo commented Sep 16, 2021

Many changes later, the main performance problem seems solved on tip.

>gotip version
go version devel go1.18-ee91bb8319 Tue Sep 14 02:53:17 2021 +0000 windows/amd64
go versions go1.15.5 go1.15.6 go1.17.1 gotip
--- PASS: TestScript/mod_tidy_sum 13.03s 24.19s 26.30s 15.29s
--- PASS: TestScript/mod_tidy_replace 17.46s 27.20s 27.47s 14.44s

@pwaller
Copy link
Contributor Author

pwaller commented Feb 10, 2022

Sounds like this is fixed. Thanks!

@pwaller pwaller closed this as completed Feb 10, 2022
@golang golang locked and limited conversation to collaborators Feb 10, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge help wanted modules NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Tools This label describes issues relating to any tools in the x/tools repository. ToolSpeed
Projects
None yet
Development

No branches or pull requests

8 participants