cmd/gofmt: format files in parallel, similar to 'go fmt' #43566
One potential downside of this is the loss of determinism. I don't imagine this will be a problem in general, though. Plus, if someone truly wants determinism, they could easily sort the output by the filename prefix, or write a short script/program to walk the filesystem and run the formatter file by file.

Edit: Yet another option would be to regain determinism by sorting the results of formatting each file, as opposed to printing directly to stdout. That doesn't seem worth the effort to me, at least.
Why would the output order need to be nondeterministic? We can have the goroutines work in parallel even if they block on actually emitting the output sequentially. We would lose a little bit of parallelism that way, but if emitting the output ends up on the critical path, then the extra parallelism probably isn't that important anyway.
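The scheme suggested above (format in parallel, emit in order) can be sketched in a few lines of Go. This is a hypothetical stand-in, not gofmt's actual code: `formatAll` and the `"formatted "` prefix are placeholders for the real per-file work.

```go
package main

import "fmt"

// formatAll runs the (stand-in) formatting work for each file in its own
// goroutine, but collects the results in input order, so the emitted output
// stays deterministic even though the work itself runs in parallel.
func formatAll(files []string) []string {
	results := make([]chan string, len(files))
	for i, f := range files {
		results[i] = make(chan string, 1)
		go func(i int, f string) {
			// Stand-in for the real per-file parse/format work.
			results[i] <- "formatted " + f
		}(i, f)
	}
	out := make([]string, len(files))
	for i, ch := range results {
		// Block on each file's result in order: we lose a little
		// parallelism at the tail, but the output order is fixed.
		out[i] = <-ch
	}
	return out
}

func main() {
	for _, line := range formatAll([]string{"a.go", "b.go", "c.go"}) {
		fmt.Println(line)
	}
}
```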
That's the kind of thing I meant by that last edit, "regain determinism by sorting the results". We could do it, but I'm not convinced it's worth the effort.
Arguably the nondeterminism of
Hm, if we go down that rabbit hole, then I imagine it applies to a handful of other tools as well.
I really do care about the output of commands like these being deterministic. (Also, I like being able to run a command that produces a bunch of errors and work through those errors sequentially, and that's much harder to do if the output is nondeterministic and the thing I'm trying to fix might be hopping all around in it.)
Of course, tests and humans :) You're absolutely right. I'll do it this way for `gofmt` and `go fmt`, then. We can file issues for other tools as needed.
I'm ok with this idea. A faster `gofmt` for the common use case is a good thing.

With respect to deterministic output: having deterministic output for operations that produce (terminal) output seems clearly worthwhile. For other tools, we can decide separately as needed.
Change https://golang.org/cl/284139 mentions this issue.
Change https://golang.org/cl/317975 mentions this issue.
gofmt is pretty heavily CPU-bound, since parsing and formatting 1MiB of Go code takes much longer than reading that many bytes from disk. However, parsing and manipulating a single large Go source file is very difficult to parallelize, so we continue to process each file in its own goroutine.

A Go module may contain a large number of Go source files, so we need to bound the amount of work in flight. However, because the distribution of sizes for Go source files varies widely (from tiny doc.go files containing a single package comment all the way up to massive API wrappers generated by automated tools), the amount of time, work, and memory overhead needed to process each file also varies. To account for this variability, we limit the in-flight work by bytes of input rather than by number of files. That allows us to make progress on many small files while we wait for work on a handful of large files to complete.

The gofmt tool has a well-defined output format on stdout, which was previously deterministic. We keep it deterministic by printing the results for each file in order, using a lazily-synchronized io.Writer (loosely inspired by Haskell's IO monad). After a file has been formatted in memory, we keep it in memory (again, limited by the corresponding number of input bytes) until the output for all previous files has been flushed. This adds a bit of latency compared to emitting the output in nondeterministic order, but a little extra latency seems a worthwhile cost for preserving output stability.

This change is based on Daniel Martí's work in CL 284139, but uses a weighted semaphore and ephemeral goroutines instead of a worker pool and batches. Benchmark results are similar, and I find the concurrency in this approach a bit easier to reason about.

In the batching-based approach, the batch size allows us to "look ahead" to find large files and start processing them early. To keep the CPUs saturated and prevent stragglers, we would need to tune the batch size to be about the same as the largest input files. If the batch size is set too high, a large batch of small files could turn into a straggler, but if it is set too low, the largest files in the data set won't be started early enough and we'll end up with a large-file straggler.

One possible alternative would be to sort by file size instead of batching: identify all of the files to be processed, sort them from largest to smallest, and then process the largest files first so that the "tail" of processing covers the smallest files. However, that approach would still fail to saturate the available CPUs when disk latency is high, would require buffering an arbitrary amount of metadata in order to sort by size, and (perhaps most importantly!) would not allow the `gofmt` binary to preserve the same (deterministic) output order that it has today.

In contrast, with a semaphore we can produce the same deterministic output as ever using only one tuning parameter: the memory footprint, expressed as a rough lower bound on the amount of RAM available per thread. While we're below the memory limit, we can run arbitrarily many disk operations arbitrarily far ahead, and process the results of those operations whenever they become available. Then it's up to the kernel (not us) to schedule the disk operations for throughput and latency, and it's up to the runtime (not us) to schedule the goroutines so that they complete quickly.

In practice, even a modest assumption of a few megabytes per thread seems to provide a nice speedup, and it should scale reasonably even to machines with vastly different ratios of CPU to disk. (I expect that most `gofmt` invocations will work with files on at most one physical disk, so the CPU:disk ratio should vary more or less directly with the thread count, whereas the CPU:memory ratio is more or less independent of thread count.)
```
name \ time/op           baseline.txt  284139.txt    simplified.txt
GofmtGorootCmd           11.9s ± 2%    2.7s ± 3%     2.8s ± 5%

name \ user-time/op      baseline.txt  284139.txt    simplified.txt
GofmtGorootCmd           13.5s ± 2%    14.4s ± 1%    14.7s ± 1%

name \ sys-time/op       baseline.txt  284139.txt    simplified.txt
GofmtGorootCmd           465ms ± 8%    229ms ±28%    232ms ±31%

name \ peak-RSS-bytes    baseline.txt  284139.txt    simplified.txt
GofmtGorootCmd           77.7MB ± 4%   162.2MB ±10%  192.9MB ±15%
```

For #43566

Change-Id: I4ba251eb4d188a3bd1901039086be57f0b341910
Reviewed-on: https://go-review.googlesource.com/c/go/+/317975
Trust: Bryan C. Mills <bcmills@google.com>
Trust: Daniel Martí <mvdan@mvdan.cc>
Run-TryBot: Bryan C. Mills <bcmills@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
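The byte-weighted semaphore described in the commit message above can be sketched with the standard library alone. This is a simplified stand-in for illustration (the actual CL uses golang.org/x/sync/semaphore); the type, the byte budget, and the file sizes here are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// weightedSem bounds in-flight work by total bytes of input rather than by
// file count, so many small files can be processed while a single large file
// is still in flight.
type weightedSem struct {
	mu    sync.Mutex
	cond  *sync.Cond
	avail int64
}

func newWeightedSem(limit int64) *weightedSem {
	s := &weightedSem{avail: limit}
	s.cond = sync.NewCond(&s.mu)
	return s
}

// Acquire blocks until n units of the byte budget are free.
func (s *weightedSem) Acquire(n int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for s.avail < n {
		s.cond.Wait()
	}
	s.avail -= n
}

// Release returns n units to the budget and wakes any waiters.
func (s *weightedSem) Release(n int64) {
	s.mu.Lock()
	s.avail += n
	s.mu.Unlock()
	s.cond.Broadcast()
}

func main() {
	// Hypothetical file sizes; each is below the limit, so none can deadlock.
	files := []struct {
		name string
		size int64
	}{{"doc.go", 100}, {"api_gen.go", 4 << 20}, {"main.go", 2000}}

	sem := newWeightedSem(8 << 20) // rough "RAM per thread" byte budget
	var wg sync.WaitGroup
	for _, f := range files {
		sem.Acquire(f.size) // admission is limited by bytes, not file count
		wg.Add(1)
		go func(name string, size int64) {
			defer wg.Done()
			defer sem.Release(size)
			fmt.Println("formatting", name) // stand-in for parse+format+emit
		}(f.name, f.size)
	}
	wg.Wait()
}
```

One detail a real implementation must handle: a file larger than the limit itself, which would otherwise block forever; capping each acquisition at the limit is one way out.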
Change https://golang.org/cl/353309 mentions this issue.
There are two distinct formatting tools in a Go release: `go fmt`, which works on packages, and `gofmt`, which works on files. When `go fmt ./...` gets called, it ends up spawning one `gofmt` process for each file to format in each package. The concurrency is limited to the number of available CPUs, so as to format multiple files at once and go faster.

This makes sense, because `gofmt` is generally bound by CPU usage, not by I/O or some other factor. In the future, `gofmt`'s CPU usage could be reduced a little, but I think it will always be bound by CPU and not the disk's read/write speed.

Unfortunately, this setup leaves `gofmt` as a sequential tool, to the point that it's measurably slow on large codebases, even though my CPU sits mostly idle. And using `gofmt` directly is still useful, for example for its extra flags such as `-s`, `-d`, and `-r`.

I think we should teach `gofmt` to use all available CPUs, using a similar mechanism to what `go fmt` does today. On a regular modern machine, this should make common operations like `gofmt -l -w .` many times faster.

`go fmt` could also benefit from this change. For example, we could have it call `gofmt` on all of the files in a package at once, reducing the number of times it has to fork and exec the other tool. An alternative could be to call `gofmt` in batches of files, such as 100 at a time, to perform better on tiny packages or with huge files.

I'm happy to work on this for Go 1.17, producing benchmarks for formatting all of `$GOROOT/src` using each tool. I did some similar work for `go mod verify` last year, which you can see here: https://go-review.googlesource.com/c/go/+/229817

cc @griesemer for cmd/gofmt, @bcmills @jayconrod @matloob for cmd/go
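The batching idea mentioned above (having `go fmt` invoke `gofmt` on groups of files rather than one process per file) boils down to a simple split. A minimal sketch, with a hypothetical helper name:

```go
package main

import "fmt"

// batches splits a file list into groups of at most n, so that 'go fmt'
// could run one gofmt invocation per group (e.g. 100 files at a time)
// instead of one process per file.
func batches(files []string, n int) [][]string {
	var out [][]string
	for len(files) > n {
		out = append(out, files[:n])
		files = files[n:]
	}
	if len(files) > 0 {
		out = append(out, files)
	}
	return out
}

func main() {
	files := []string{"a.go", "b.go", "c.go", "d.go", "e.go"}
	for _, b := range batches(files, 2) {
		// Each group would become a single gofmt invocation.
		fmt.Println(b)
	}
}
```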