Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

go/format: add benchmarks #26528

Open
kevinburke opened this issue Jul 22, 2018 · 8 comments
Open

go/format: add benchmarks #26528

kevinburke opened this issue Jul 22, 2018 · 8 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@kevinburke
Copy link
Contributor

I maintain a (fork of a) tool called go-bindata that compiles static files into a binary. Often the compiled file is quite large - tens of megabytes are not uncommon. I call format.Source on this generated file to avoid gofmt thrash. However, I noticed recently that calling format.Source takes 60% of the runtime of the go-bindata program.

Currently there are no benchmarks for format.Source. It might be good to add some to see if there are quick wins to get that function to return more quickly.

A starting point may be the benchmark here: https://github.com/kevinburke/go-bindata/blob/master/benchmark_test.go. Committing an extremely large file to source might not be the best way to do this, though.

kevinburke added a commit to kevinburke/go-bindata that referenced this issue Jul 22, 2018
Previously calling format.Source would take up most of the runtime
(see golang/go#26528 for more info). But most of the file is already
formatted correctly, we mostly need to use go/format to whitespace
align the maps per the Go guidelines. Calling go/format on just this
part of the file yields a significant performance improvement by
avoiding calling it on the entire body of the file.

Use a bytes.Buffer as a file argument instead of an io.Writer which
allows using WriteByte and WriteString in places. Avoid calling
fmt.Fprintf with no arguments.

    name       old time/op    new time/op    delta
    Bindata-4     558ms ± 1%     216ms ± 6%   -61.34%  (p=0.008 n=5+5)

    name       old speed      new speed      delta
    Bindata-4  12.3MB/s ± 1%  31.8MB/s ± 6%  +159.03%  (p=0.008 n=5+5)

    name       old alloc/op   new alloc/op   delta
    Bindata-4     353MB ± 0%     162MB ± 0%   -54.21%  (p=0.008 n=5+5)

    name       old allocs/op  new allocs/op  delta
    Bindata-4     46.1k ± 0%     15.7k ± 0%   -66.00%  (p=0.008 n=5+5)
@josharian
Copy link
Contributor

I suspect go/format performance hotspots vary a lot based on input, with a long tail of pathological problematic inputs. Instead of a generic “add benchmarks” issue, which is hard to fix, let’s instead file performance issues for specific inputs, like go-bindata-style output.

We don’t want to add a large file to the repo, but I suspect you could create a suitable input (for reproduction purposes) on the fly, in memory.

Also, have you tried with 1.11 betas yet? I made some improvements to text/tabwriter this cycle that might help.

@josharian josharian added Performance WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. labels Jul 22, 2018
@josharian josharian added this to the Go1.12 milestone Jul 22, 2018
@kevinburke
Copy link
Contributor Author

Also, have you tried with 1.11 betas yet? I made some improvements to text/tabwriter this cycle that might help.

I have, thanks! I'm not sure about the benchmarking output. When we use format.Source on the whole file, it looks slower.

1.9.7: https://travis-ci.org/kevinburke/go-bindata/jobs/406748540
tip: https://travis-ci.org/kevinburke/go-bindata/jobs/406748542

I just merged a change that only calls format.Source on a small part of the file. With that change it looks like there's an improvement between 1.9.7 and tip.

https://travis-ci.org/kevinburke/go-bindata/jobs/406757117
https://travis-ci.org/kevinburke/go-bindata/jobs/406757130

@kevinburke
Copy link
Contributor Author

kevinburke commented Jul 22, 2018

So what are my options here for a benchmark? I am guessing these are not possible:

  • importing github.com/kevinburke/go-bindata from the benchmark
  • checking in large file(s) into the testdata
  • adding a benchmark in the x repos

What about downloading a large file from the public internet, and skipping the benchmark if the download fails? This one for example would be suitable. https://github.com/kevinburke/go-bindata/blob/master/testdata/assets/bindata.go

Otherwise, I suppose I could check in a medium size photo (or use an existing one in the git repository), and manually reconstruct the process of building the bindata file.

@josharian
Copy link
Contributor

Skimming the beginning of that file, I'm guessing you could get similar performance behavior with just something like:

package p

var x1 = []byte("some very very very long string created with bytes.Repeat")

var x2 = []byte("another long string")

var x3 = []byte("as many of these as you need to recreate your performance characteristics")

That should be easy to create in memory with a plain old for loop. In any case, I'd start there and use pprof profiles to convince yourself that the cpu/memory behavior is similar. If it isn't, tweak the simple generated file until it is. If somehow (which I would find very surprising) the contents of the strings involved matter, just generate random bytes using math/rand.

@gopherbot
Copy link

Change https://golang.org/cl/125437 mentions this issue: go/format: add benchmark

@bronze1man
Copy link
Contributor

@kevinburke
You have another choose that you can skip the step format.Source to your generated go files. Almost no one will read your generated files,I think it is ok to leave it unformat, you just need it do not change with the same input data (like sort the import path part).
You can also generate a formated generated go files. It is not so hard then it looks like.
Both way can save you a lot of generate time.

I used to use format.Source to format all my generated files, then I find that simple skip the format step can save me more than 50% of the generate time.
So I just skip the format step.

@gopherbot
Copy link

Change https://golang.org/cl/127928 mentions this issue: go/format: benchmark performance on large autogen file

@gopherbot
Copy link

Change https://golang.org/cl/153642 mentions this issue: go/format: add simple benchmark framework and basic benchmark

gopherbot pushed a commit that referenced this issue Dec 12, 2018
For now, this CL adds as a first benchmark the formatting of
a 10,000 element array literal. It is easy to add additional
test cases as we see fit.

	name                   time/op
	Format/array1-10000-4    26.7ms ± 7%

	name                   speed
	Format/array1-10000-4  2.43MB/s ± 6%

	name                   alloc/op
	Format/array1-10000-4    5.52MB ± 0%

	name                   allocs/op
	Format/array1-10000-4      119k ± 0%

Updates #26528.

Change-Id: Ic8ec8f70160d122b877740412d4d4406f5f4b345
Reviewed-on: https://go-review.googlesource.com/c/153642
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@griesemer griesemer modified the milestones: Go1.12, Go1.13 Dec 12, 2018
@agnivade agnivade removed the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Jan 7, 2019
@andybons andybons modified the milestones: Go1.13, Go1.14 Jul 8, 2019
@rsc rsc modified the milestones: Go1.14, Backlog Oct 9, 2019
@dmitshur dmitshur added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Mar 2, 2020
kssion pushed a commit to kssion/go-bindata that referenced this issue Oct 17, 2022
Previously calling format.Source would take up most of the runtime
(see golang/go#26528 for more info). But most of the file is already
formatted correctly, we mostly need to use go/format to whitespace
align the maps per the Go guidelines. Calling go/format on just this
part of the file yields a significant performance improvement by
avoiding calling it on the entire body of the file.

Use a bytes.Buffer as a file argument instead of an io.Writer which
allows using WriteByte and WriteString in places. Avoid calling
fmt.Fprintf with no arguments.

    name       old time/op    new time/op    delta
    Bindata-4     558ms ± 1%     216ms ± 6%   -61.34%  (p=0.008 n=5+5)

    name       old speed      new speed      delta
    Bindata-4  12.3MB/s ± 1%  31.8MB/s ± 6%  +159.03%  (p=0.008 n=5+5)

    name       old alloc/op   new alloc/op   delta
    Bindata-4     353MB ± 0%     162MB ± 0%   -54.21%  (p=0.008 n=5+5)

    name       old allocs/op  new allocs/op  delta
    Bindata-4     46.1k ± 0%     15.7k ± 0%   -66.00%  (p=0.008 n=5+5)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
None yet
Development

No branches or pull requests

9 participants