Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/perf/benchstat: Allow for hardcoded old/new name in header to be configured #48380

Open
AbhiPrasad opened this issue Sep 14, 2021 · 1 comment
Labels
FeatureRequest NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@AbhiPrasad
Copy link

AbhiPrasad commented Sep 14, 2021

Feature Request

What version of Go are you using (go version)?

$ go version
go version go1.17 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/abhijeetprasad/Library/Caches/go-build"
GOENV="/Users/abhijeetprasad/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/abhijeetprasad/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/abhijeetprasad/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.17/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.17/libexec/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.17"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/abhijeetprasad/workspace/sentry-sdk-benchmark/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/_9/43498gx91ml4f25m1kt2r3jh0000gn/T/go-build1057351747=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

I'm using the benchstat library to aggregate some statistics, and then displaying the results in text format. In the library, the values of old and new are hardcoded to show up when you run benchstat on two configs: https://github.com/golang/perf/blob/master/benchstat/text.go#L208.

name            old ms      new ms       delta
LatenciesMean    9.00 ± 0%   10.75 ±12%  +19.44%  (p=0.029 n=4+4)
Latencies50th    9.00 ± 0%   10.00 ± 0%  +11.11%  (p=0.029 n=4+4)
Latencies90th    10.2 ± 7%    12.2 ± 6%  +19.51%  (p=0.029 n=4+4)

What did you expect to see?

Is it possible for these values to be configured so that we display the name instead of old and new? Like the behaviour for running benchstat on 3+ configs as per: https://github.com/golang/perf/blob/master/benchstat/text.go#L210-L212.

Something like:

name  \ ms      baseline   instrumented  delta
LatenciesMean    9.00 ± 0%   10.75 ±12%  +19.44%  (p=0.029 n=4+4)
Latencies50th    9.00 ± 0%   10.00 ± 0%  +11.11%  (p=0.029 n=4+4)
Latencies90th    10.2 ± 7%    12.2 ± 6%  +19.51%  (p=0.029 n=4+4)

would be nice.

I wouldn't mind helping patch this if someone points me in the right direction. Thanks!

@gopherbot gopherbot added this to the Unreleased milestone Sep 14, 2021
@seankhliao seankhliao added FeatureRequest NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Sep 15, 2021
@gopherbot
Copy link

Change https://golang.org/cl/309969 mentions this issue: cmd/benchstat: new version of benchstat

zchee pushed a commit to zchee/golang-perf that referenced this issue Nov 28, 2021
This is a complete rewrite of benchstat. Basic usage remains the same,
as does the core idea of showing statistical benchmark summaries and
A/B comparisons in a table, but there are several major improvements.

The statistics is now more robust. Previously, benchstat used
IQR-based outlier rejection, showed the mean of the reduced sample,
its range, and did a non-parametric difference-of-distribution test on
reduced samples. Any form of outlier rejection must start with
distributional assumptions, in this case assuming normality, which is
generally not sound for benchmark data. Hence, now benchstat does not
do any outlier rejection. As a result, it must use robust summary
statistics as well, so benchstat now uses the median and confidence
interval of the median as summary statistics. Benchstat continues to
use the same Mann-Whitney U-test for the delta, but now does it on the
full samples since the U-test is already non-parametric, hence
increasing the power of this test.

As part of these statistical improvements, benchstat now detects and
warns about several common mistakes, such as having too few samples
for meaningful statistical results, or having incomparable geomeans.

The output format is more consistent. Previously, benchstat
transformed units like "ns/op" into a metric like "time/op", which it
used as a column header; and a numerator like "sec", which it used to
label each measurement. This was easy enough for the standard units
used by the testing framework, but was basically impossible to
generalize to custom units. Now, benchstat does unit scaling, but
otherwise leaves units alone. The full (scaled) unit is used as a
column header and each measurement is simply a scaled value shown with
an SI prefix. This also means that the text and CSV formats can be
much more similar while still allowing the CSV format to be usefully
machine-readable.

Benchstat will also now do A/B comparisons even if there are more than
two inputs. It shows a comparison to the base in the second and all
subsequent columns. This approach is consistent for any number of
inputs.

Benchstat now supports the full Go benchmark format, including
sophisticated control over exactly how it structures the results into
rows, columns, and tables. This makes it easy to do meaningful
comparisons across benchmark data that isn't simply structured into
two input files, and gives significantly more control over how results
are sorted. The default behavior is still to turn each input file into
a column and each benchmark into a row.

Fixes golang/go#19565 by showing all results, even if the benchmark
sets don't match across columns, and warning when geomean sets are
incompatible.

Fixes golang/go#19634 by no longer doing outlier rejection and clearly
reporting when there are not enough samples to do a meaningful
difference test.

Updates golang/go#23471 by providing more through command
documentation. I'm not sure it quite fixes this issue, but it's much
better than it was.

Fixes golang/go#30368 because benchstat now supports filter
expressions, which can also filter down units.

Fixes golang/go#33169 because benchstat now always shows file
configuration labels.

Updates golang/go#43744 by integrating unit metadata to control
statistical assumptions into the main tool that implements those
assumptions.

Fixes golang/go#48380 by introducing a way to override labels from the
command line rather than always using file names.

Change-Id: Ie2c5a12024e84b4918e483df2223eb1f10413a4f
skyzyx pushed a commit to northwood-labs/go-x-perf that referenced this issue Apr 12, 2024
This is a complete rewrite of benchstat. Basic usage remains the same,
as does the core idea of showing statistical benchmark summaries and
A/B comparisons in a table, but there are several major improvements.

The statistics is now more robust. Previously, benchstat used
IQR-based outlier rejection, showed the mean of the reduced sample,
its range, and did a non-parametric difference-of-distribution test on
reduced samples. Any form of outlier rejection must start with
distributional assumptions, in this case assuming normality, which is
generally not sound for benchmark data. Hence, now benchstat does not
do any outlier rejection. As a result, it must use robust summary
statistics as well, so benchstat now uses the median and confidence
interval of the median as summary statistics. Benchstat continues to
use the same Mann-Whitney U-test for the delta, but now does it on the
full samples since the U-test is already non-parametric, hence
increasing the power of this test.

As part of these statistical improvements, benchstat now detects and
warns about several common mistakes, such as having too few samples
for meaningful statistical results, or having incomparable geomeans.

The output format is more consistent. Previously, benchstat
transformed units like "ns/op" into a metric like "time/op", which it
used as a column header; and a numerator like "sec", which it used to
label each measurement. This was easy enough for the standard units
used by the testing framework, but was basically impossible to
generalize to custom units. Now, benchstat does unit scaling, but
otherwise leaves units alone. The full (scaled) unit is used as a
column header and each measurement is simply a scaled value shown with
an SI prefix. This also means that the text and CSV formats can be
much more similar while still allowing the CSV format to be usefully
machine-readable.

Benchstat will also now do A/B comparisons even if there are more than
two inputs. It shows a comparison to the base in the second and all
subsequent columns. This approach is consistent for any number of
inputs.

Benchstat now supports the full Go benchmark format, including
sophisticated control over exactly how it structures the results into
rows, columns, and tables. This makes it easy to do meaningful
comparisons across benchmark data that isn't simply structured into
two input files, and gives significantly more control over how results
are sorted. The default behavior is still to turn each input file into
a column and each benchmark into a row.

Fixes golang/go#19565 by showing all results, even if the benchmark
sets don't match across columns, and warning when geomean sets are
incompatible.

Fixes golang/go#19634 by no longer doing outlier rejection and clearly
reporting when there are not enough samples to do a meaningful
difference test.

Updates golang/go#23471 by providing more through command
documentation. I'm not sure it quite fixes this issue, but it's much
better than it was.

Fixes golang/go#30368 because benchstat now supports filter
expressions, which can also filter down units.

Fixes golang/go#33169 because benchstat now always shows file
configuration labels.

Updates golang/go#43744 by integrating unit metadata to control
statistical assumptions into the main tool that implements those
assumptions.

Fixes golang/go#48380 by introducing a way to override labels from the
command line rather than always using file names.

Change-Id: Ie2c5a12024e84b4918e483df2223eb1f10413a4f
Reviewed-on: https://go-review.googlesource.com/c/perf/+/309969
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
FeatureRequest NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

3 participants