Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/gofmt: format files in parallel, similar to 'go fmt' #43566

Closed
mvdan opened this issue Jan 7, 2021 · 11 comments
Closed

cmd/gofmt: format files in parallel, similar to 'go fmt' #43566

mvdan opened this issue Jan 7, 2021 · 11 comments
Labels
FrozenDueToAge GoCommand cmd/go NeedsFix The path to resolution is known, but the work has not been done. ToolSpeed
Milestone

Comments

@mvdan
Copy link
Member

mvdan commented Jan 7, 2021

There are two distinct formatting tools in a Go release; go fmt, which works on packages, and gofmt, which works on files. When go fmt ./... gets called, it ends up spawning one gofmt process for each file to format in each package. The concurrency is limited to the number of available CPUs, so as to format multiple files at once and go faster.

This makes sense, because gofmt is generally bound by CPU usage, not by I/O or some other factor. In the future, gofmt's CPU usage could be reduced a little, but I think it will always be bound by CPU and not the disk's read/write speed.

Unfortunately, this setup leaves gofmt being a sequential tool, to the point that it's measurably slow on large codebases - even though my CPU sits mostly idle. Using gofmt directly is also useful for two reasons:

  • It has more options, like -s, -d, and -r
  • It's more versatile; it can work on files which don't form part of any package, or format many Go modules at once

I think we should teach gofmt to use all available CPUs, using a similar mechanism to what go fmt does today. On a regular modern machine, this should make common operations like gofmt -l -w . many times faster.

go fmt could also benefit from this change. For example, we could have it call gofmt on all of the files in a package at once, reducing the number of times it has to fork and exec the other tool. An alternative could be to call gofmt in batches of files, such as 100 at a time, to perform better on tiny packages or with huge files.

I'm happy to work on this for Go 1.17, producing benchmarks for formatting all of $GOROOT/src using each tool. I did some similar work for go mod verify last year, which you can see here: https://go-review.googlesource.com/c/go/+/229817

cc @griesemer for cmd/gofmt, @bcmills @jayconrod @matloob for cmd/go

@bcmills bcmills added help wanted NeedsFix The path to resolution is known, but the work has not been done. labels Jan 7, 2021
@bcmills bcmills added this to the Backlog milestone Jan 7, 2021
@mvdan mvdan self-assigned this Jan 7, 2021
@mvdan
Copy link
Member Author

mvdan commented Jan 7, 2021

One potential downside of this is the loss of determinism. Right now, gofmt -l -w . will generally always produce the same input, as long as the input files are the same. When adding goroutines, the output of changed files or errors will become non-deterministic.

I don't imagine this will be a problem in general, though. Plus, go fmt ./... is already non-deterministic in the order that it formats files.

If someone truly wants determinism, they could easily sort the output by the filename prefix, or write a short script/program to walk the filesystem and run gofmt sequentially on each file.

Edit: Yet another option would be to regain determinism by sorting the results of formatting each file, as opposed to printing directly to stdout. That doesn't seem worth the effort to me at least, though.

@bcmills
Copy link
Contributor

bcmills commented Jan 7, 2021

Why would the output order need to be nondeterministic? We can have the goroutines work in parallel even if they block on actually emitting the output sequentially. We would lose a little bit of parallelism that way, but if emitting the output ends up on the critical path then the extra parallelism probably isn't that important anyway.

@mvdan
Copy link
Member Author

mvdan commented Jan 7, 2021

That's the kind of thing I meant by that last edit to "regain determinism by sorting the results". We could do it, but I'm not convinced it's worth the effort given that go fmt does not have it.

@bcmills
Copy link
Contributor

bcmills commented Jan 7, 2021

Arguably the nondeterminism of go fmt is a bug that should be fixed. 😉

@mvdan
Copy link
Member Author

mvdan commented Jan 7, 2021

Hm, if we go down that rabbit hole, then I imagine a handful of other tools like go mod verify should be fixed to have deterministic output as well. Though in most of those cases, it should only matter if many errors are printed, or with debug flags like -x. Do we really care about that kind of output being deterministic? I imagine not.

gofmt -l is a bit of a special case, though, because a machine capturing that output is a pretty common occurrence.

@bcmills
Copy link
Contributor

bcmills commented Jan 7, 2021

I really care about the output of things like go mod verify being deterministic. I want to be able to write tests for that output, and it's much easier to write tests for things with deterministic output.

(Also, I like being able to run a command that produces a bunch of errors and work through those errors sequentially, and that's much harder to do if the output is nondeterministic and the thing I'm trying to fix might be hopping all around in it.)

@mvdan
Copy link
Member Author

mvdan commented Jan 7, 2021

Of course, tests and humans :) You're absolutely right. I'll do it this way for gofmt and go fmt, then. We can file issues for other tools as needed.

@griesemer
Copy link
Contributor

I'm ok with this idea. A faster gofmt for the common use case is a good thing.

With respect to deterministic output: I believe the most common case is gofmt -w. I'm not sure we need to ensure that the order in which the files are written (if changed) remains deterministic; though I think that could be also be achieved by waiting with writing the files until the end if need be (at some memory cost).

Having deterministic output for operations that produce (terminal) output (-l, -d) is probably a good idea.

For other tools (e.g. the gotype command go/types/gotype.go) I have found it useful to have a mode that enforces non-concurrent execution as a means to explicitly exclude concurrency as the culprit in case of bugs. This doesn't have to be exposed, but for debugging it would be nice to have a (source-code) constant that just needs to be flipped to disable concurrency.

@gopherbot
Copy link

Change https://golang.org/cl/284139 mentions this issue: cmd/gofmt: format files in parallel batches

@gopherbot
Copy link

Change https://golang.org/cl/317975 mentions this issue: cmd/gofmt: format files in parallel

@bcmills bcmills modified the milestones: Backlog, Go1.18 Sep 7, 2021
@bcmills bcmills self-assigned this Sep 7, 2021
gopherbot pushed a commit that referenced this issue Sep 24, 2021
gofmt is pretty heavily CPU-bound, since parsing and formatting 1MiB
of Go code takes much longer than reading that amount of bytes from
disk. However, parsing and manipulating a large Go source file is very
difficult to parallelize, so we continue to process each file in its
own goroutine.

A Go module may contain a large number of Go source files, so we need
to bound the amount of work in flight. However, because the
distribution of sizes for Go source files varies widely — from tiny
doc.go files containing a single package comment all the way up to
massive API wrappers generated by automated tools — the amount of
time, work, and memory overhead needed to process each file also
varies. To account for this variability, we limit the in-flight work
by bytes of input rather than by number of files. That allows us to
make progress on many small files while we wait for work on a handful
of large files to complete.

The gofmt tool has a well-defined output format on stdout, which was
previously deterministic. We keep it deterministic by printing the
results of each file in order, using a lazily-synchronized io.Writer
(loosly inspired by Haskell's IO monad). After a file has been
formatted in memory, we keep it in memory (again, limited by the
corresponding number of input bytes) until the output for all previous
files has been flushed. This adds a bit of latency compared to
emitting the output in nondeterministic order, but a little extra
latency seems worth the cost to preserve output stability.

This change is based on Daniel Martí's work in CL 284139, but using a
weighted semaphore and ephemeral goroutines instead of a worker pool
and batches. Benchmark results are similar, and I find the concurrency
in this approach a bit easier to reason about.

In the batching-based approach, the batch size allows us to "look
ahead" to find large files and start processing them early. To keep
the CPUs saturated and prevent stragglers, we would need to tune the
batch size to be about the same as the largest input files. If the
batch size is set too high, a large batch of small files could turn
into a straggler, but if the batch size is set too low, the largest
files in the data set won't be started early enough and we'll end up
with a large-file straggler.

One possible alternative would be to sort by file size instead of
batching: identify all of the files to be processed, sort from largest
to smallest, and then process the largest files first so that the
"tail" of processing covers the smallest files. However, that approach
would still fail to saturate available CPU when disk latency is high,
would require buffering an arbitrary amount of metadata in order to
sort by size, and (perhaps most importantly!) would not allow the
`gofmt` binary to preserve the same (deterministic) output order that
it has today.

In contrast, with a semaphore we can produce the same deterministic
output as ever using only one tuning parameter: the memory footprint,
expressed as a rough lower bound on the amount of RAM available per
thread. While we're below the memory limit, we can run arbitrarily
many disk operations arbitrarily far ahead, and process the results of
those operations whenever they become avaliable. Then it's up to the
kernel (not us) to schedule the disk operations for throughput and
latency, and it's up to the runtime (not us) to schedule the
goroutines so that they complete quickly.

In practice, even a modest assumption of a few megabytes per thread
seems to provide a nice speedup, and it should scale reasonably even
to machines with vastly different ratios of CPU to disk. (In practice,
I expect that most 'gofmt' invocations will work with files on at most
one physical disk, so the CPU:disk ratio should vary more-or-less
directly with the thread count, whereas the CPU:memory ratio is
more-or-less independent of thread count.)

name \ time/op         baseline.txt  284139.txt    simplified.txt
GofmtGorootCmd           11.9s ± 2%     2.7s ± 3%       2.8s ± 5%

name \ user-time/op    baseline.txt  284139.txt    simplified.txt
GofmtGorootCmd           13.5s ± 2%    14.4s ± 1%      14.7s ± 1%

name \ sys-time/op     baseline.txt  284139.txt    simplified.txt
GofmtGorootCmd           465ms ± 8%    229ms ±28%      232ms ±31%

name \ peak-RSS-bytes  baseline.txt  284139.txt    simplified.txt
GofmtGorootCmd          77.7MB ± 4%  162.2MB ±10%    192.9MB ±15%

For #43566

Change-Id: I4ba251eb4d188a3bd1901039086be57f0b341910
Reviewed-on: https://go-review.googlesource.com/c/go/+/317975
Trust: Bryan C. Mills <bcmills@google.com>
Trust: Daniel Martí <mvdan@mvdan.cc>
Run-TryBot: Bryan C. Mills <bcmills@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
@gopherbot
Copy link

Change https://golang.org/cl/353309 mentions this issue: cmd/go: remove double parallelism from "go fmt"

@rsc rsc unassigned mvdan and bcmills Jun 23, 2022
@golang golang locked and limited conversation to collaborators Jun 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge GoCommand cmd/go NeedsFix The path to resolution is known, but the work has not been done. ToolSpeed
Projects
None yet
Development

No branches or pull requests

4 participants