proposal: testing: support naming seed corpus values provided with (*testing.F).Add #50456

katiehockman · 2022-01-05T20:55:53Z

Seed corpus entries can be added to a fuzz test by calling (*testing.F).Add. There is no way to name these seed corpus entries, so they default to "seed#{num}". When run with go test -v, it will look something like this:

--- PASS: FuzzFoo (0.00s)
    --- PASS: FuzzFoo/seed#0 (0.00s)
    --- PASS: FuzzFoo/seed#1 (0.00s)
    --- PASS: FuzzFoo/seed#2 (0.00s)

The proposal is to allow a way to name these entries, much like you would with an execution of t.Run. This would help with error messages and debugging. We could do this a few different ways:

1. Amend f.Add to accept an optional name string parameter as its first or last argument. Since f.Add takes a []interface{} today, this is functionality that could be added later that wouldn't break backwards compatibility.

e.g. existing fuzz tests that look like this:

func FuzzFoo(f *testing.F) {
	f.Add("a3g1f3", 10)
	f.Add("----", 100)
	f.Add("", 0)
	f.Fuzz(func(*testing.T, string, int) {})
}

would change to

func FuzzFoo(f *testing.F) {
	f.Add("valid", "a3g1f3", 10)
	f.Add("invalid", "----", 100)
	f.Add("empty", "", 0)
	f.Fuzz(func(*testing.T, string, int) {})
}

2. We could add another method on *testing.F. For example:
(*testing.F).AddNamed(string, []interface{})

Originally proposed by @dnwe

/cc @golang/fuzzing

thepudds · 2022-01-05T20:59:46Z

CC @dnwe

mvdan · 2022-01-06T09:26:39Z

> Since f.Add takes a []interface{} today, this is functionality that could be added later that wouldn't break backwards compatibility.

While this will work just fine for the call sites, it would make for a pretty tricky signature: args ...any, but if it begins with an extra string, that's actually the name.

It would be a lot saner if we did name string, args ...any. Is it too late for that? beta1 already shipped, but rc1 hasn't yet, and I don't imagine a lot of people have written native fuzzers yet. We could also leverage go fix to automatically fix fuzz funcs, or supply a gofmt -r expression that people can run.
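To make the ambiguity concrete, here is a minimal sketch of how a variadic Add could decide whether a leading string is a name. The helper splitLeadingName and its nparams argument are hypothetical and not part of package testing; the sketch assumes the number of parameters of the fuzz target is already known when the seed is interpreted.

```go
package main

import "fmt"

// splitLeadingName is a hypothetical helper, not part of package testing.
// It guesses whether args[0] is a seed name by comparing the number of
// arguments with the number of parameters the fuzz target accepts.
func splitLeadingName(nparams int, args ...any) (name string, values []any, err error) {
	switch len(args) {
	case nparams:
		// Exactly enough values: nothing is treated as a name.
		return "", args, nil
	case nparams + 1:
		// One extra argument: it can only be a name if it is a string.
		s, ok := args[0].(string)
		if !ok {
			return "", nil, fmt.Errorf("extra leading argument has type %T, want string", args[0])
		}
		return s, args[1:], nil
	default:
		return "", nil, fmt.Errorf("got %d arguments, want %d or %d", len(args), nparams, nparams+1)
	}
}

func main() {
	// The fuzz target takes (string, int), so two values are expected.
	name, values, _ := splitLeadingName(2, "valid", "a3g1f3", 10)
	fmt.Printf("name=%q values=%v\n", name, values)

	name, values, _ = splitLeadingName(2, "a3g1f3", 10)
	fmt.Printf("name=%q values=%v\n", name, values)
}
```

The same leading string is a name in one call and a seed value in the other, and only the fuzz target's parameter list tells them apart. A name string, args ...any signature would avoid that guesswork.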

mvdan · 2022-01-06T10:30:07Z

> 2. (*testing.F).AddNamed(string, []interface{})

For completeness: I realise this is also an option, but leaving Add around as an extra is equally unfortunate, I'd say :) Having just Add(name string, args ...any) feels like the best result overall, especially as it mimics other methods like Run.

rsc · 2022-01-12T19:01:36Z

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

rsc · 2022-01-26T18:32:51Z

It's getting very late to change the API, so it's looking like AddNamed is the right approach?
Or maybe we don't need this feature at all?

Another possibility is to define a new string type like testing.FuzzName, and then you could write

f.Add(testing.FuzzName("myname"), "data")

but that's a bit verbose.

mvdan · 2022-01-26T22:37:49Z

I reckon we're not late to change the API; this user report was given three weeks ago, a couple of weeks after the beta came out. That's very much within the timeframe that you would expect users to give the beta an honest try and report issues. At the time of writing there are 38 open issues with the release-blocker label, so I also reckon the RC is at least a couple of weeks away, and shouldn't be further delayed by a minor API change.

I do think we need this feature; if we say "one shouldn't need to name seed corpus entries", presumably the same could have applied to sub-test and sub-benchmark names :) And I still believe that altering Add while we still can is the best and most consistent approach long-term.

mvdan · 2022-01-26T22:46:43Z

Also, if it comes between AddNamed and FuzzName, my vote would go for FuzzName; that way we avoid two "add" entrypoints, and it's clear which argument is the name versus the variable number of fuzz values.

Finally, just a thought: I seem to recall that the proposal review committee only meets once a week, and the RC isn't very far, so we should probably expedite a resolution here before more weeks go by and we really are too late.

mvdan · 2022-01-29T08:55:37Z

> I also reckon the RC is at least a couple of weeks away

It seems like we'll get beta2 next week, which seems to support my reasoning above - the RC is probably four weeks away or so. And, if we can manage to get this small change into master early next week, users can also test it as part of beta2 :)

rsc · 2022-01-31T16:28:38Z

I am not sure about the need to name specific seed values. This reminds me of the pets vs cattle discussion for managing computers.

Every test function we write is typically an important, unique, carefully tended thing (like a pet).
I am not convinced fuzzing seed corpus values are like that at all: do we really want the overhead of having to name them?
Seeds are usually small: why not just give the subtest for it a sequence number and then print the actual value when it fails?

We are talking about adding complexity, and I am not convinced we have established that the complexity is needed.
And it is very late in the cycle to add complexity that we're not sure we need.
(Package constraints just missed the cut! Is this really more important than that? And are we so sure?)

mvdan · 2022-01-31T18:00:18Z

Perhaps you're right that not every seed corpus entry will need a name, but I think that already applies to sub-tests and sub-benchmarks - one can leave the name empty and they get a number instead. That seems like the best of both worlds to me.

> Seeds are usually small: why not just give the subtest for it a sequence number and then print the actual value when it fails?

That assumption worries me a little; most of the fuzzers I've written take bytes or strings as input, and some of those inputs do get reasonably long fairly quickly when I want the seed corpus to also cover some particularly interesting and complex edge cases.

That said, if we think that unnamed seed corpus entries will be more common, then perhaps the FuzzName solution is a good middle ground. I can't say I have enough data to point one way or another.

I'm not sure if the comparison with package constraints applies; we can always add that package in 1.19, whereas we can't change the signature of testing.F.Add once it gets released :)

katiehockman · 2022-01-31T18:26:06Z

> whereas we can't change the signature of testing.F.Add once it gets released :)

I don't think that adding the testing.FuzzName type would be a non-backwards compatible change to the signature of testing.F.Add if we were to add this in Go 1.19. Either way, the fuzz name should be optional. Given that, we could make both of the following testing.F.Add calls valid, whereas in Go 1.18 only the first would be:

f.Add("data")
f.Add(testing.FuzzName("myname"), "data")

testing.F.Add takes args ...any today, so we can certainly add an optional additional value later that will be processed as the name, once we've had more time to collect evidence, and when we don't have to rush.
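As a rough sketch of why a dedicated name type stays backwards compatible, the following shows how an args ...any function could pick out an optional leading name by its type. FuzzName here is a local stand-in for the proposed testing.FuzzName, which does not exist in package testing, and splitFuzzName is a hypothetical helper.

```go
package main

import "fmt"

// FuzzName is a stand-in for the proposed testing.FuzzName type,
// which does not exist in package testing.
type FuzzName string

// splitFuzzName returns the name carried by a leading FuzzName argument,
// if there is one, and the remaining arguments as the seed values.
func splitFuzzName(args ...any) (name string, values []any) {
	if len(args) > 0 {
		if n, ok := args[0].(FuzzName); ok {
			return string(n), args[1:]
		}
	}
	// No FuzzName in front: every argument is a seed value.
	return "", args
}

func main() {
	name, values := splitFuzzName("data")
	fmt.Printf("name=%q values=%v\n", name, values)

	name, values = splitFuzzName(FuzzName("myname"), "data")
	fmt.Printf("name=%q values=%v\n", name, values)
}
```

Because a plain string never has the dynamic type FuzzName, existing calls such as f.Add("data") would keep their current meaning under this approach.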

dnwe · 2022-01-31T18:46:46Z

I originally raised this merely as an API discrepancy rather than necessarily a strong desire for the capability.

My 2p would be that testing.F.AddNamed jars too much and doesn't fit with the rest of testing, so I'd personally discount that immediately.

I similarly think that the functionality we gain isn't worth the overhead and unintuitiveness of a special testing.FuzzName type.

If it's too late to make testing.F.Add match the arg style of testing.T.Run, then personally I'd discount the feature and just add a note in the docs that if you want to name seed values, you should write directly to testdata/fuzz/Fuzz_Name/Seed_Name yourself.
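For illustration, here is a minimal sketch of writing a named seed file by hand in the "go test fuzz v1" corpus format. The helpers encodeSeed and writeNamedSeed are hypothetical, only string, int and []byte values are handled, and the example writes into a temporary directory rather than a real package.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// encodeSeed renders values in the "go test fuzz v1" corpus file format:
// a header line followed by one Go-syntax value per line.
// Only string, int and []byte are handled in this sketch.
func encodeSeed(values ...any) (string, error) {
	var b strings.Builder
	b.WriteString("go test fuzz v1\n")
	for _, v := range values {
		switch v := v.(type) {
		case string:
			fmt.Fprintf(&b, "string(%q)\n", v)
		case int:
			fmt.Fprintf(&b, "int(%d)\n", v)
		case []byte:
			fmt.Fprintf(&b, "[]byte(%q)\n", v)
		default:
			return "", fmt.Errorf("unsupported seed type %T", v)
		}
	}
	return b.String(), nil
}

// writeNamedSeed writes the seed to <root>/testdata/fuzz/<fuzzTest>/<name>.
func writeNamedSeed(root, fuzzTest, name string, values ...any) error {
	data, err := encodeSeed(values...)
	if err != nil {
		return err
	}
	dir := filepath.Join(root, "testdata", "fuzz", fuzzTest)
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return err
	}
	return os.WriteFile(filepath.Join(dir, name), []byte(data), 0o644)
}

func main() {
	// Use a temporary directory so the sketch does not touch a real package.
	root, err := os.MkdirTemp("", "seeds")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)

	if err := writeNamedSeed(root, "FuzzFoo", "valid", "a3g1f3", 10); err != nil {
		panic(err)
	}
	data, err := os.ReadFile(filepath.Join(root, "testdata", "fuzz", "FuzzFoo", "valid"))
	if err != nil {
		panic(err)
	}
	fmt.Print(string(data))
}
```

With root set to the package directory, a seed written this way should be picked up for FuzzFoo under the file name valid, without any change to f.Add.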

dnwe · 2022-01-31T18:54:31Z

(I actually prefer storing my seed values under the testdata hierarchy anyway)

AlekSi · 2022-01-31T18:57:01Z

One option could be to combine ideas from #46780 and #47413 and add support for testing.F.Run:

func FuzzFoo(f *testing.F) {
    for _, tc := range []struct {
        name string
        s string
        expected int
    }{
        {name: "random", s: "a3g1f3", expected: 10},
        {name: "dashes", s: "----", expected: 100},
        {name: "empty", s: "", expected: 0},
    } {
        tc := tc
        f.Run(tc.name, func(f *testing.F) {
            // test tc
            if res := foo(tc.s); res != tc.expected {
                f.Errorf("foo(%s) = %d, want %d", tc.s, res, tc.expected)
            }

            f.Add(tc.s) // named tc.name
        })
    }

    f.Fuzz(...)
}

> (I actually prefer storing my seed values under the testdata hierarchy anyway)

Interestingly, I prefer my seed values to be a part of the table-driven test.

rsc · 2022-01-31T19:02:16Z

@mvdan I would have thought that large seed inputs would be better handled as files under testdata/fuzz than placed in programs? Or are these mechanically constructed seeds?

dnwe · 2022-01-31T19:02:31Z

> Interestingly, I prefer my seed values to be a part of the table-driven test.

What do you do after fuzzing finds an interesting input and dumps it in testdata for you to re-run against? Do you manually move it to your seed table or do you just discard it after you’ve fixed the crasher?

AlekSi · 2022-01-31T19:09:07Z

> Do you manually move it to your seed table

That, with an expected result or error. For example: https://github.com/FerretDB/FerretDB/blob/main/internal/bson/double_test.go

(It would be nice to have some tooling for that or a function to read seed corpus files, but that's off-topic)

rsc · 2022-02-02T18:23:40Z

It sounds like this is either a likely decline or an 'on hold'. Given that we are not making changes for Go 1.18, perhaps it should be declined for now, and then a new proposal with a different API can be proposed if we need it?

mvdan · 2022-02-02T18:39:19Z

> I would have thought that large seed inputs would be better handled as files under testdata/fuzz than placed in programs? Or are these mechanically constructed seeds?

I admit I hadn't realised it was supported to add seed corpus entries directly as files while using custom names. Taking another look at https://go.dev/doc/fuzz/, I do see it's mentioned. If one seed input is too large to be named after its own value, then I think it's a reasonable tradeoff to instead include it as a named file.

I do have one case where I'm mechanically constructing reasonably sized seeds, but I'm not sure that it would be a huge problem either. I could always write a go generate program to construct the seeds and write them into testdata.

> It sounds like this is either a likely decline or an 'on hold'.

I've grown less convinced that we need this feature right now as the conversation has progressed; declining for now seems reasonable.

rsc · 2022-02-09T19:17:14Z

Based on the discussion above, this proposal seems like a likely decline.
— rsc for the proposal review group

rsc · 2022-02-16T19:01:49Z

No change in consensus, so declined.
— rsc for the proposal review group

katiehockman added Proposal FeatureRequest fuzz labels Jan 5, 2022

katiehockman added this to the Proposal milestone Jan 5, 2022

rsc added the Proposal-FinalCommentPeriod label Feb 9, 2022

rsc removed the Proposal-FinalCommentPeriod label Feb 16, 2022

rsc closed this as completed Feb 16, 2022

rsc moved this to Declined in Proposals Aug 10, 2022

rsc added this to Proposals Aug 10, 2022

julieqiu added this to Go Security Sep 8, 2022

golang locked and limited conversation to collaborators Feb 16, 2023

gopherbot added the FrozenDueToAge label Feb 16, 2023

rsc removed this from Proposals Feb 22, 2023
