Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/tools: random program generator for testing #7985

Closed
dvyukov opened this issue May 13, 2014 · 20 comments
Closed

x/tools: random program generator for testing #7985

dvyukov opened this issue May 13, 2014 · 20 comments

Comments

@dvyukov
Copy link
Member

dvyukov commented May 13, 2014

There is CSmith (http://embed.cs.utah.edu/csmith/) that generates random correct C
programs, and it's proved to be extremely useful in testing C compilers.

We could do something similar for Go.
The tool can use existing ast/types/ssa packages for program generation and
serialization. The tool can start very small, e.g generate only simple assignment
statement. And then gradually grow.

The tool could be used for testing of a range of programs and packages besides gc/gccgo
compilation:
- gofmt idempotentness (gofmt several times and compare results)
- ast/types/ssa robustness and idempotentness (generate, serialize, parse, serialize,
parse, compare).
- govet robustness
- compilation in race mode which had lots of bugs in past
- compare gc, gccgo, ssa/interp output

I foresee piles of bugs there.
@davecheney
Copy link
Contributor

Comment 1:

I would love to see this, do you have any idea where to start? I've looked at the csmith
code before but the logic they use to generate code is very confusing -- i'm tempted to
feed random noise to go/ast and compile anything that can be written to disk validly.

@dvyukov
Copy link
Member Author

dvyukov commented May 13, 2014

Comment 2:

I would try recursively construct a random correct program as:
- choose random number of statements
- for each statement choose random statement type
- for each expression in top level statement choose random type
- do the same recursively for subexpressions
This way it's possible to generate random functions, which can be used as expressions in
other functions.
There are complications like that for a slice indexing operations you need to have a
slice in scope. So probably I would need to track what variables are currently in scope.
Similarly it's possible to generate top-level declarations like constants and global
vars.

@griesemer
Copy link
Contributor

Comment 3:

Feeding random noise is extremely unlikely to parse w/o errors, and even less likely to
compile. That said, feeding random noise to a compiler is an extremely good way to test
a compiler against crashes.
If it's just random noise (text) the parser is likely to bail quickly and thus the test
won't be very good. Here's a few ideas for this first step:
- create random sequences of valid Go tokens
- make sure the parser passes at least the very first check (package clause present)
- use existing programs (std lib) and modify randomly (parse, and then make a few random
AST changes)
These all would test robustness of parsers/compilers in the presence of syntactically
erroneous input.
dvyukov's suggestion for generation of random programs would be the next step. Those
programs should always parse, but may not compile due to semantic requirements.
The hard part is to track scope contents and types; that information must be used when
generating random programs that actually compile.

@dvyukov
Copy link
Member Author

dvyukov commented May 14, 2014

Comment 4:

> That said, feeding random noise to a compiler is an extremely good way to test a
compiler against crashes.
Security people say that they take good inputs and then start fuzzing it, in particular
just do random bit flipping. This has much higher chances to trigger interesting code
paths in software.

@dvyukov
Copy link
Member Author

dvyukov commented May 14, 2014

Comment 5:

Robert, do you have suggestions on what packages one needs to use to generate programs
that compile?

@griesemer
Copy link
Contributor

Comment 6:

Re #5: I'm not sure there are any packages one would need. One could use go/ast and thus
randomly generate an ast rather than source text, but the difference in effort seems
small. As mentioned before, the "hard" part is tracking of scopes and types; expression
generation will have to choose random identifiers from those scopes and combine them
using random (but valid) operations depending on the types. One could use go/types to
get a scope and type representation, but I'm not sure the (minor) gain is worth the
dependency - the operations on those data structures are likely quite different from
what go/types provides.

@dvyukov
Copy link
Member Author

dvyukov commented May 14, 2014

Comment 7:

> One could use go/ast and thus randomly generate an ast rather than source text, but
the difference in effort seems small.
I see. On a second thought, just dumping program text using recursive functions like
randExpr(w) and randStmt(w) indeed looks simpler.

@dvyukov
Copy link
Member Author

dvyukov commented May 14, 2014

Comment 8:

Here what I have in mind:
http://play.golang.org/p/5d-8mPjr_h
Currently it can generate programs like:
http://play.golang.org/p/0NhOj4PUJc
I would not say that it's too difficult, but it's... tiresome.
Do you think it's the right approach? Do you see any reasons why it can't scale to
handle all other types of statements and expressions?

@griesemer
Copy link
Contributor

Comment 9:

Looks not unreasonable. I'd agree, with "hard" I meant "tedious". I don't see why it
shouldn't scale, it's just a matter of actually doing the work.

@dvyukov
Copy link
Member Author

dvyukov commented May 16, 2014

Comment 10:

I've commit the prototype to http://gosmith.googlecode.com
It already has found 2 new bugs:
https://golang.org/issue/8011
https://golang.org/issue/8012
and this known bug:
https://golang.org/issue/7998

@dvyukov
Copy link
Member Author

dvyukov commented May 16, 2014

Comment 11:

Here we go:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61204

@dvyukov
Copy link
Member Author

dvyukov commented May 16, 2014

Comment 12:

Another one in gccgo:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61205

@davecheney
Copy link
Contributor

Comment 13:

> I've commit the prototype to http://gosmith.googlecode.com
This is outstanding Dmitry.

Owner changed to @dvyukov.

Status changed to Accepted.

@dvyukov
Copy link
Member Author

dvyukov commented May 17, 2014

Comment 14:

Another one:
https://golang.org/issue/8017

@dvyukov
Copy link
Member Author

dvyukov commented May 18, 2014

Comment 15:

gofmt was clean because... well, I forgot to actually call the function that verifies
gofmt:
https://golang.org/issue/8021

@dvyukov
Copy link
Member Author

dvyukov commented May 19, 2014

Comment 16:

I've started generating several packages with several source files, it instantly kills
gccgo:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61226

@dvyukov
Copy link
Member Author

dvyukov commented May 20, 2014

Comment 17:

Two more in gccgo, I've created a list for all GoSmith bugs:
https://gcc.gnu.org/bugzilla/buglist.cgi?quicksearch=GoSmith&list_id=90399

@adonovan
Copy link
Member

Comment 18:

This is a great start.  Depending on which piece of the stack you're trying to stress, I
can imagine the output you'd want would look very different.  For example, to test the
scanner you'd want all kinds of odd tokens: nested comments, string literals with odd
escape sequences, various numeric literals, even just line noise.  To test the type
checker you'd want well-formed ASTs that are mostly well-typed, but with random types,
assignments, identifiers, constants, etc.  To test code generation, you'd want only
well-typed programs that exercise control flow, operator semantics, etc.  They might all
have a lot in common, but more intelligence or structure is needed in the sentence
generator for later passes.  Looking over the regression tests for each tool might give
you some hints for where to focus your efforts.

@dvyukov
Copy link
Member Author

dvyukov commented May 26, 2014

Comment 19:

re #18
Agree, I think the current code mostly targets compiler middle-end.
Generators for different parts of the stack needs to be different programs. A single
program that tries to cover all pieces will sacrifice expressiveness and/or will have
excessive complexity.

@rsc rsc added this to the Unplanned milestone Apr 10, 2015
@rsc rsc changed the title misc: random program generator for testing x/tools: random program generator for testing Apr 14, 2015
@rsc rsc modified the milestones: Unreleased, Unplanned Apr 14, 2015
@rsc rsc removed the repo-tools label Apr 14, 2015
@ALTree
Copy link
Member

ALTree commented Jan 30, 2017

Implemented: dvyukov/go-fuzz, so closing this.

@ALTree ALTree closed this as completed Jan 30, 2017
@golang golang locked and limited conversation to collaborators Jan 30, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

7 participants