proposal: x/perf/cmd/benchrun: add `-run` and `-count` option #63233

Jorropo · 2023-09-26T13:37:28Z

Collecting high quality benchmarks using benchstat is a bit annoying. First you have to run them many times, ideally you want to interleave old and new binaries to make noise less relevant.

Add -run bool option, if set then the old, new, ... file paths are not outputs of go test but go test -c, benchstat will invoke the files using a Thue-Morse sequence and with options -test.run=$^ and -test.bench= with benchstat's -filter option value.
-count specify how many invocation of each should happen, each invocation will run with -test.count=1 (or unset).

Expected usage:

> go test -c -o /tmp/new
> git checkout HEAD~1
> go test -c -o /tmp/old
> benchstat -run -count=10 /tmp/{old,new}

The text was updated successfully, but these errors were encountered:

AlexanderYastrebov · 2023-09-27T00:15:09Z

And to get the binaries one would need to checkout specific revisions and compile. I crafted the following benchrev.sh script:

#!/bin/bash
set -euo pipefail

if [ "$#" -lt 6 ]; then
    echo "Usage: $0 <revision A> <revision B> [<go test -bench=...>]"
    echo "Runs benchmark for each revision and prints benchstat results."
    echo "Example:"
    echo "  $0 master HEAD ../bin/go test -run=NONE -bench=BenchmarkLarge -count=10 ./internal/zstd"
    exit 2
fi

if [ $(git status --porcelain | wc -l) -ne "0" ]; then
    echo "Please commit your changes or stash them before you switch branches."
    exit 3
fi

readonly REVA="$1"; shift
readonly REVB="$1"; shift

readonly HEAD=$(git rev-parse --abbrev-ref HEAD)
readonly COMMITA=$(git rev-parse "$REVA")
readonly COMMITB=$(git rev-parse "$REVB")
readonly OUT=$(mktemp -d benchrev.XXXXXXXXXX)

function cleanup {
    rm -rf "$OUT"
    git checkout -q "$HEAD"
}
trap cleanup EXIT

git checkout -q "$COMMITA"
"$@" | tee "$OUT/$COMMITA"

git checkout -q "$COMMITB"
"$@" | tee "$OUT/$COMMITB"

benchstat "$REVA=$OUT/$COMMITA" "$REVB=$OUT/$COMMITB"

ianlancetaylor · 2023-09-27T00:45:10Z

CC @aclements

aclements · 2023-09-27T14:31:01Z

Interesting. I haven't encountered the Thue-Morse sequence before. I have a hunch as to why it could be valuable for interleaving, but could you please explain?

I'm not sure benchstat is the right place for this because this is a separable problem from running the analysis itself. But I agree that some tool for running high quality benchmark sequences would be valuable. I've written a few messy prototypes in the past.

Jorropo · 2023-09-27T14:41:12Z

The Thue-Morse sequence isn't that important here, you could interleave all binaries in order and it would have a very similar effect.
Using Thue-Morse would help break up noise patterns if by malchance some periodic load noise on the machine has a period similar to the benchmarks being run (because they wouldn't be interleaved in order the noise would affect boths benchmark in similar fashions).

Arguably I could add a command in my shell which does:

for i in (seq 10); /tmp/old -test.bench=. | tee /dev/stderr >> /tmp/old.results; /tmp/new -test.bench=. | tee /dev/stderr >> /tmp/new.results; end; benchstat /tmp/{old,new}

but I feel benchstat -run -count=10 /tmp/{old,new} is a quick win that will also help more peoples.

Jorropo · 2023-09-27T14:44:28Z

The main comparison to do is with:

> go test -bench=. -count=10 | tee /tmp/new
> git checkout HEAD~1
> go test -bench=. -count=10 | tee /tmp/old
> benchstat /tmp/{old,new}

which has tangible realistical drawbacks.

aclements · 2023-11-27T19:00:16Z

I agree it would be valuable to have a tool to help with running benchmarks for comparison. It shouldn't be added to benchstat, though, because this is pretty different from the functionality of benchstat. It should be a new tool.

There are many dimensions on which it may make sense to do an A/B comparison, too. Not just between two binaries. Two binaries built from different commits is probably the most common use case, but even that can have unintended results if the benchmark depends on any other checked-in files. Seven years ago (!) I wrote such a tool that was specifically for comparing between git commits, but even that got quite fiddly as I tried to reduce the time to repeatedly switch git checkouts. It may be that with things like build caching, more regular benchmarks, and the enhanced benchmark results format, that we're in a better place to do this now.

Jorropo added the Proposal label Sep 26, 2023

gopherbot added this to the Proposal milestone Sep 26, 2023

Jorropo changed the title ~~proposal: x/perf/cmd/benchstat: add -run option~~ proposal: x/perf/cmd/benchstat: add -run and -count option Sep 26, 2023

aclements changed the title ~~proposal: x/perf/cmd/benchstat: add -run and -count option~~ proposal: x/perf/cmd/benchrun: add -run and -count option Nov 27, 2023

AlexanderYastrebov mentioned this issue Feb 9, 2024

Optimise sanitize with string builder zalando/skipper#2918

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

proposal: x/perf/cmd/benchrun: add `-run` and `-count` option #63233

proposal: x/perf/cmd/benchrun: add `-run` and `-count` option #63233

Jorropo commented Sep 26, 2023 •

edited

AlexanderYastrebov commented Sep 27, 2023 •

edited

ianlancetaylor commented Sep 27, 2023

aclements commented Sep 27, 2023

Jorropo commented Sep 27, 2023

Jorropo commented Sep 27, 2023

aclements commented Nov 27, 2023

proposal: x/perf/cmd/benchrun: add -run and -count option #63233

proposal: x/perf/cmd/benchrun: add -run and -count option #63233

Comments

Jorropo commented Sep 26, 2023 • edited

AlexanderYastrebov commented Sep 27, 2023 • edited

ianlancetaylor commented Sep 27, 2023

aclements commented Sep 27, 2023

Jorropo commented Sep 27, 2023

Jorropo commented Sep 27, 2023

aclements commented Nov 27, 2023

proposal: x/perf/cmd/benchrun: add `-run` and `-count` option #63233

proposal: x/perf/cmd/benchrun: add `-run` and `-count` option #63233

Jorropo commented Sep 26, 2023 •

edited

AlexanderYastrebov commented Sep 27, 2023 •

edited