
cmd/gofmt: parallel processing causing "too many open files" errors #51164

Closed
dim13 opened this issue Feb 12, 2022 · 22 comments

Labels: FrozenDueToAge; NeedsFix (The path to resolution is known, but the work has not been done.)
Milestone: Go1.18

Comments

dim13 commented Feb 12, 2022

What version of Go are you using (go version)?

$ go version
go version go1.18beta2 darwin/arm64

What did you do?

Execute gofmt -w . on a decent-sized project (1660 Go files in my case).

What did you expect to see?

No errors.

What did you see instead?

Long list of errors:

…
open aaa/bbb/ccc.go: too many open files
open iii/jjj/kkk.go: too many open files
open uuu/vvv/www.go: too many open files
fcntl xxx/yyy/zzz/test: too many open files

Maybe related to

https://tip.golang.org/doc/go1.18#gofmt

gofmt now reads and formats input files concurrently [..]

$ ulimit -a
-t: cpu time (seconds)              unlimited
-f: file size (blocks)              unlimited
-d: data seg size (kbytes)          unlimited
-s: stack size (kbytes)             8176
-c: core file size (blocks)         0
-v: address space (kbytes)          unlimited
-l: locked-in-memory size (kbytes)  unlimited
-u: processes                       2666
-n: file descriptors                256
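
(For illustration, a minimal sketch of the failure mode; hypothetical code, not gofmt's implementation. If every file is opened in its own goroutine with no cap on how many are in flight, a run over ~1660 files can easily exceed a 256-descriptor limit.)

// Hypothetical sketch: unbounded per-file goroutines exhaust descriptors
// on a low ulimit -n, producing errors like the ones reported above.
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"sync"
)

func main() {
	var wg sync.WaitGroup
	filepath.WalkDir(".", func(path string, d fs.DirEntry, err error) error {
		if err != nil || d.IsDir() || filepath.Ext(path) != ".go" {
			return err
		}
		wg.Add(1)
		go func() {
			defer wg.Done()
			f, err := os.Open(path) // with ~1660 files in flight, this starts failing
			if err != nil {
				fmt.Fprintln(os.Stderr, err) // "open ...: too many open files"
				return
			}
			defer f.Close()
			// ... read and format the file ...
		}()
		return nil
	})
	wg.Wait()
}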
@ALTree ALTree changed the title affected/package: gofmt cmd/gofmt: parallel processing causing "too many open files" errors Feb 12, 2022
@ALTree ALTree added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 12, 2022
@ALTree ALTree added this to the Go1.18 milestone Feb 12, 2022
ALTree (Member) commented Feb 12, 2022

I'm putting this in the 1.18 milestone because it's a regression introduced in this devel cycle.

cc @mvdan @bcmills

mengzhuo (Contributor):

Maybe related to https://go-review.googlesource.com/c/go/+/317975/

mengzhuo (Contributor):

CL 317975 uses the file size as a weight to control parallelism. If the Go files are small enough, many of them end up open at once, which trips the ulimit.
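
For illustration, a rough sketch of size-weighted parallelism (hypothetical names and values, not the code from CL 317975) showing why a byte budget alone does not bound the number of open descriptors:

// Size-weighted semaphore sketch: the budget is counted in bytes, so many
// small files can be open at the same time and the file-descriptor limit,
// not memory, becomes the bottleneck.
package sizeweighted

import (
	"context"
	"os"

	"golang.org/x/sync/semaphore"
)

const byteBudget = 2 << 20 // illustrative 2 MiB budget, not the CL's real value

var sem = semaphore.NewWeighted(byteBudget)

func processFile(ctx context.Context, path string, size int64) error {
	if size > byteBudget {
		size = byteBudget // never ask for more than the whole budget
	}
	// A 1 KiB file consumes only 1/2048 of the budget, so roughly 2000 such
	// files could be open concurrently, far above ulimit -n 256.
	if err := sem.Acquire(ctx, size); err != nil {
		return err
	}
	defer sem.Release(size)

	f, err := os.Open(path)
	if err != nil {
		return err // this is where "too many open files" surfaces
	}
	defer f.Close()
	// ... read and format ...
	return nil
}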

gopherbot:

Change https://go.dev/cl/385515 mentions this issue: cmd/gofmt: limit open files to 100 simultaneously

mvdan (Member) commented Feb 14, 2022

@dim13 out of curiosity, how did you end up with a file descriptor limit of just 256? The Linux kernel has had a default of 1024 for a long time, and Systemd also defaults to a soft limit of 1024. In both instances, the hard limit is significantly higher, so a user shouldn't need root to increase the soft limit a bit.

I agree that the current gofmt logic in master may open thousands of files at once in some edge cases, so we should probably tweak that. I'm not sure that we should specifically aim to support open file limits as low as 256, though. Do we test the rest of Go with such limits, such as go build or go test? My hunch is that gofmt would not be the only command to run into issues :)

ericlagergren (Contributor):

@mvdan macOS defaults to 256.

$ ulimit -a
core file size              (blocks, -c) 0
data seg size               (kbytes, -d) unlimited
file size                   (blocks, -f) unlimited
max locked memory           (kbytes, -l) unlimited
max memory size             (kbytes, -m) unlimited
open files                          (-n) 256
pipe size                (512 bytes, -p) 1
stack size                  (kbytes, -s) 8176
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) 1333
virtual memory              (kbytes, -v) unlimited

mvdan (Member) commented Feb 14, 2022

Ah, thank you @ericlagergren, I hadn't noticed the original user was on Mac. So then that answers my question; we have CI machines on Mac, so that must already be tested.

dim13 (Author) commented Feb 14, 2022

@mvdan 256 is the default value on macOS; on OpenBSD, for example, the default is 512.

mvdan (Member) commented Feb 14, 2022

Turns out that the biggest cause here is that gofmt in master now leaks open files; I hadn't noticed that as my ulimit -n is high enough and I only tested with gofmt -l $GOROOT/src, which doesn't have enough files to run into the limit. But the parallelism will also need tweaking, as even with the leak fixed up, I still ran into issues with ulimit -n 256 in the same directory.

I'm now reviewing @mengzhuo's CL.

ericlagergren (Contributor) commented Feb 14, 2022

@mvdan a couple of things (I'm on mobile, so I can't comment on the CL):

  1. You can determine the max number of open files dynamically based on rlimit.
  2. You could wait and retry when you see whatever errno value is returned for “too many open files”.

Both of those avoid hardcoding 100 or 240 open files.
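
As a sketch of both ideas (Unix-only, hypothetical helper names, not code from any CL):

// Hypothetical sketch: derive the concurrency cap from the process's own
// RLIMIT_NOFILE instead of hardcoding it, and retry opens that fail with
// EMFILE rather than surfacing them as errors.
package fdlimit

import (
	"errors"
	"os"
	"syscall"
	"time"
)

// maxOpenFiles returns a cap based on the soft descriptor limit, keeping
// some headroom for stdio and any other descriptors already in use.
func maxOpenFiles() int64 {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return 100 // conservative fallback
	}
	if rl.Cur > 128 {
		return int64(rl.Cur) - 64
	}
	return int64(rl.Cur) / 2
}

// openWithRetry waits briefly and retries when the process is out of
// descriptors, instead of reporting "too many open files".
func openWithRetry(path string) (*os.File, error) {
	for {
		f, err := os.Open(path)
		if !errors.Is(err, syscall.EMFILE) {
			return f, err
		}
		time.Sleep(10 * time.Millisecond) // give other workers time to close files
	}
}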

mvdan (Member) commented Feb 14, 2022

Both of those are indeed a possibility, but we are very late into the freeze and 1.18 is already significantly delayed, so I think we need to go with a simpler approach that's less likely to cause further delays.

gopherbot:

Change https://go.dev/cl/385656 mentions this issue: cmd/gofmt: limit to 100 concurrent file descriptors for file reads

@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Feb 14, 2022
@bcmills bcmills self-assigned this Feb 14, 2022
bcmills (Contributor) commented Feb 14, 2022

I've fixed this for 1.18 with a hard-coded limit of 200. If someone wants to follow up with a change to use syscall.Getrlimit, I'd be happy to review that for 1.19.
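
For readers following along, the general shape of such a cap is a counting semaphore around file opens. This is illustrative only; the 200 simply mirrors the number mentioned above, and it is not the CL that landed:

// Illustrative sketch of a hard cap on concurrently open files using a
// counting semaphore; not the code that landed, just the general shape.
package fdcap

import "os"

// fdSem has one slot per file we allow to be open at any given moment.
var fdSem = make(chan struct{}, 200)

// readFileCapped blocks until a slot is free, then reads the file while
// holding the slot, so at most 200 files are ever open at once.
func readFileCapped(path string) ([]byte, error) {
	fdSem <- struct{}{}        // acquire
	defer func() { <-fdSem }() // release once ReadFile has closed the file
	return os.ReadFile(path)
}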

mvdan (Member) commented Feb 15, 2022

I grabbed some benchmark numbers as I was curious if the 200 hard limit would cause a performance regression. As a reminder, this is 1.17 versus what we had in master before Bryan's fix:

name         old time/op         new time/op         delta
GofmtSrcCmd          4.13s ± 6%          0.53s ± 5%   -87.08%  (p=0.000 n=12+11)

name         old user-time/op    new user-time/op    delta
GofmtSrcCmd          4.88s ± 7%          6.54s ±10%   +33.87%  (p=0.000 n=12+12)

name         old sys-time/op     new sys-time/op     delta
GofmtSrcCmd          289ms ±25%          243ms ±32%   -15.76%  (p=0.007 n=12+12)

name         old peak-RSS-bytes  new peak-RSS-bytes  delta
GofmtSrcCmd         37.8MB ± 5%        280.3MB ±19%  +641.11%  (p=0.000 n=12+12)

And this is the effect that Bryan's fix had on master:

name         old time/op         new time/op         delta
GofmtSrcCmd          534ms ± 5%          537ms ± 9%    ~     (p=0.487 n=11+12)

name         old user-time/op    new user-time/op    delta
GofmtSrcCmd          6.54s ±10%          6.67s ± 9%    ~     (p=0.060 n=12+12)

name         old sys-time/op     new sys-time/op     delta
GofmtSrcCmd          243ms ±32%          246ms ±25%    ~     (p=0.932 n=12+12)

name         old peak-RSS-bytes  new peak-RSS-bytes  delta
GofmtSrcCmd          280MB ±19%          308MB ± 9%  +9.78%  (p=0.005 n=12+12)

This is on a Ryzen 5850u with 8 physical cores, 32GiB of RAM, and a soft limit of 32k file descriptors, all on Linux 5.16.9. If a hard limit of 200 open files doesn't cause a slow-down on this machine, I imagine it will be plenty for practically all other machines. I'd even go as far as saying that using getrlimit shouldn't be worth the added complexity, unless we can prove that it actually helps in some cases - e.g. on machines with many more CPU cores.

mvdan (Member) commented Feb 15, 2022

I forgot to mention - those numbers were obtained via benchcmd -n 12 GofmtSrcCmd gofmt -l /usr/lib/go/src/cmd, where /usr/lib/go was always Go 1.17.x, but the gofmt binary was changed to each one of the three versions. I used GOROOT/src/cmd as it's decently sized and has no testdata Go files that get in the way.

ericlagergren (Contributor):

bcmills (Contributor) commented Feb 15, 2022

If a hard limit of 200 open files doesn't cause a slow-down on this machine, I imagine it will be plenty for practically all other machines.

I could see it causing a slowdown when formatting a large repo hosted on a very slow disk (such as an NFS mount).

That said, if the disk is slow then everything will be slow, so it's not clear to me why we would need to optimize gofmt in particular for that case.

mvdan (Member) commented Feb 15, 2022

@ericlagergren yep!

dim13 (Author) commented Feb 15, 2022

I see another problem: 200 may be fine for now, but you never know how many descriptors are already in use. If the benchmark doesn't fall into an abyss, I would prefer a more conservative value of maybe 128.

mvdan (Member) commented Feb 15, 2022

The limit is per-process. I'd be surprised if initializing a main Go program managed to open 56 file descriptors and keep them open; I can imagine a few, like stdin/stdout/stderr, but certainly nothing close to the 56 needed to get from 200 to 256.

dim13 (Author) commented Feb 15, 2022

I was under the wrong assumption that it was per session. But you're right, it's per process.

mvdan (Member) commented Mar 14, 2022

Thinking out loud: I wonder if #46279 would allow gofmt to assume it can use a higher limit of open files. Does anyone know what darwin's default hard limit for the number of open files is, given that the default soft limit seems to be 256?
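
The operation in question would be raising the soft limit to the hard limit, roughly like the following Unix-only, hypothetical sketch (on macOS the hard limit may be reported as unlimited, in which case the value likely needs clamping, e.g. to OPEN_MAX, before Setrlimit accepts it):

// Hypothetical sketch of raising RLIMIT_NOFILE's soft limit to the hard
// limit, which is what #46279 discusses doing automatically.
package rlimitup

import "syscall"

func raiseNoFileLimit() error {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return err
	}
	rl.Cur = rl.Max
	return syscall.Setrlimit(syscall.RLIMIT_NOFILE, &rl)
}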

@rsc rsc unassigned bcmills Jun 22, 2022
@golang golang locked and limited conversation to collaborators Jun 22, 2023