proposal: testing: automatically scale benchmark metrics in results output #33328

jpap · 2019-07-28T08:57:58Z

When performing a benchmark, we can currently use SetBytes(int64) to set the number of input or output bytes per run, and get a measurement like X MB/s in the output.

It would be nice to be able to set an arbitrary unit instead of "bytes", and to provide multiple scales instead of just "mega" (1e6).

Example 1: pixels per second (px/s): b.SetUnits(1920*1080, "px/s") when benchmarking a full-HD image would print to the output one of the following, depending on the scale of the result:

"X px/s"
"X kpx/s"
"X Mpx/s"
"X Gpx/s"

Example 2: frames per second (fps): b.SetUnits(1, "fps") when benchmarking one frame operation.

The existing implementation of SetBytes(b int64) could just call SetUnits(b, "B/s") for backwards compatibility.

agnivade · 2019-07-28T15:43:10Z

I believe this https://tip.golang.org/pkg/testing/#B.ReportMetric is what you want.

toothrot · 2019-07-29T21:24:05Z

@jpap Thanks for writing this up. Does @agnivade's suggestion work for you?

jpap · 2019-07-29T21:35:13Z

Thanks @agnivade, it's great that arbitrary units of work are supported in Go 1.13+.

@toothrot, the second part to this issue, auto-scaling the results {_, k, M, G} instead of scaling by 1e6 (M) uniformly as is done now, still remains and doesn't appear to be already implemented (having just checked tip).

toothrot · 2019-07-29T21:51:45Z

Great, glad to hear the arbitrary units work (#26037) will help! @jpap Let me know if I captured your request correctly in the title.

@aclements Thoughts on this?

jpap · 2019-07-29T21:58:12Z

@toothrot, it would be great if the unit-scaling applied to all reported metrics in the benchmark data format, not just bytes/sec (MB/s).

rsc · 2019-09-25T17:58:24Z

/cc @aclements for thoughts

aclements · 2019-09-25T19:10:07Z

@jpap, just to clarify, custom metrics reported using ReportMetric aren't autoscaled by 1e6 (I'm not sure if that's what you were saying or not). Only SetBytes scales that way.

It's an interesting question whether we could scale other things this way automatically. Our position has been to keep the benchmark output format very machine-readable (while still being reasonably user-friendly), and to depend on downstream tools to present a much more processed view to users. For example, benchstat does scale the units automatically. You almost always need to do some statistical analysis of the benchmark results anyway, so looking at the raw benchmark output can be misleading at best.

I'm also concerned with how this would interact with custom units. For example, the sync/pool benchmarks emit custom tail latency metrics with units like "p95-ns/STW". Automatically putting a metric prefix on that unit would be the wrong thing to do.

Finally, this would be a break from the benchmark format and would thus require changes to many tools that consume that format.

Given all of this, I do not think the testing package should be autoscaling these units. Instead, we should leverage the existing standard format to build tools to better process and present benchmark results.
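To make the trade-off concrete, here is a minimal sketch of the kind of prefix scaling a downstream tool could apply to raw benchmark values. It is not how benchstat does it; scaleSI is a hypothetical helper that assumes non-negative values and only the decimal prefixes mentioned in this issue (none, k, M, G).

```go
package main

import "fmt"

// scaleSI formats v with a decimal metric prefix (none, k, M or G)
// placed directly in front of unit. Hypothetical helper for illustration.
func scaleSI(v float64, unit string) string {
	prefixes := []string{"", "k", "M", "G"}
	i := 0
	for v >= 1000 && i < len(prefixes)-1 {
		v /= 1000
		i++
	}
	return fmt.Sprintf("%.2f %s%s", v, prefixes[i], unit)
}

func main() {
	fmt.Println(scaleSI(950, "px/s"))     // 950.00 px/s
	fmt.Println(scaleSI(2073600, "px/s")) // 2.07 Mpx/s
	fmt.Println(scaleSI(6.2e10, "px/s"))  // 62.00 Gpx/s

	// Blind prefixing of a compound custom unit gives a misleading label.
	fmt.Println(scaleSI(1234, "p95-ns/STW")) // 1.23 kp95-ns/STW
}
```

The last line shows the custom-unit concern: the helper cannot know that the "ns" inside "p95-ns/STW" is the part that should carry the prefix, so it produces a unit that no consumer would recognise.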

rsc · 2019-10-02T17:58:01Z

Based on the discussion above and past discussions about how hard it is to do arbitrary units correctly in general, this seems like a likely decline.

For a truly amazing programming language that fully embraces the idea of getting units right, see https://frinklang.org/.

Leaving open for a week for final comments.

rsc · 2019-10-09T17:13:51Z

No final comments, so declined.

toothrot added the NeedsInvestigation and WaitingForInfo labels Jul 29, 2019

toothrot added this to the Unplanned milestone Jul 29, 2019

toothrot changed the title ~~testing: proposal: allow benchmark to use arbitrary units of work per run~~ testing: proposal: automatically scale MB/s metric in benchmark results Jul 29, 2019

toothrot added the Proposal label and removed the NeedsInvestigation and WaitingForInfo labels Jul 29, 2019

toothrot changed the title ~~testing: proposal: automatically scale MB/s metric in benchmark results~~ testing: proposal: automatically scale benchmark metrics in results output Jul 30, 2019

andybons mentioned this issue Sep 25, 2019: proposal: review meeting minutes #33502 (Open)

rsc added the Proposal-FinalCommentPeriod label Oct 2, 2019

rsc closed this as completed Oct 9, 2019

andybons changed the title ~~testing: proposal: automatically scale benchmark metrics in results output~~ proposal: testing: automatically scale benchmark metrics in results output Oct 9, 2019

golang locked and limited conversation to collaborators Oct 8, 2020

gopherbot added the FrozenDueToAge label Oct 8, 2020