Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

internal/fuzz: revisit use of shared memory-mapped files for marshaled inputs #48163

Open
jayconrod opened this issue Sep 2, 2021 · 6 comments
Labels
fuzz Issues related to native fuzzing support NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@jayconrod
Copy link
Contributor

Currently, the fuzzing coordinator process creates a 100MB temporary file for each worker process. Both the coordinator and worker read and write the file via a shared memory map. The file is currently used for 1) a few "header" fields such as iteration count and PRNG state, and 2) passing marshaled inputs for fuzzing in both directions.

We still need to use shared memory for the header fields. If a worker process terminates unexpectedly, the coordinator needs to be able to reconstruct the input that caused the crash using the initial input (sent from the coordinator), the call count, and the PRNG state.

However, it's not strictly necessary to write marshaled inputs to shared memory. It would be simpler to pass these through the pipes we use for RPCs. If we only supported unmarshaled byte slices, there would be a performance advantage to writing and mutating those directly in shared memory without incurring the cost of marshaling and pipe I/O. Since we need to marshal inputs anyway, it's not clear the extra complexity is worthwhile.

We should investigate the performance difference and pass inputs through pipes if it's not too bad. This would simplify our implementation, and would let us use much smaller shared memory files.

Additionally, for inputs that can be read from files, such as those in testdata or the cache, we can pass file names over pipes instead of reading, and sending that data over pipes. The coordinator doesn't need to hold that data in virtual memory at all.

@jayconrod jayconrod added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker fuzz Issues related to native fuzzing support labels Sep 2, 2021
@jayconrod jayconrod added this to the Go1.18 milestone Sep 2, 2021
@rsc rsc changed the title [dev.fuzz] internal/fuzz: revisit use of shared memory-mapped files for marshaled inputs internal/fuzz: revisit use of shared memory-mapped files for marshaled inputs Sep 21, 2021
@toothrot
Copy link
Contributor

Checking in on this issue as it's labeled a release blocker for Go 1.18. Is there any update?

@katiehockman
Copy link
Contributor

@jayconrod do you still think this is a release blocker? Reading it over, it feels more like an optimization that we might not necessarily need right now, but there may be context I'm not seeing here.

@jayconrod
Copy link
Contributor Author

This probably doesn't need to be a release blocker.

#48731 is related though, that's mostly the one I'm worried about.

@jayconrod
Copy link
Contributor Author

Unassigning, since I'm leaving.

I did benchmark whether there would be a significant slowdown to passing inputs through RPC vs pipes: there wasn't a measurable difference for inputs of ~100 bytes. The marshaling / unmarshaling cost is far higher, so I don't think communication overhead is a reason to implement this or not.

Reading entries from corpus files in workers is a good idea regardless. Then the coordinator doesn't need to read them at all.

#48731 is my main concern. We need to be able to reconstruct the entry that caused a problem, whether we're fuzzing or minimizing. Writing the entry to shared memory before every call to the fuzz target is too expensive because of the marshaling overhead, so we need to be able to rebuilt it from the input entry and some known sequence of transformations. For minimization, I think we should store the sequence of transformations in shared memory. For normal fuzzing, we already store initial RNG state and a count, and that works well enough. In any case, we do still need shared memory, but we could use a lot less of it since we don't need to store inputs there.

@jayconrod jayconrod removed their assignment Oct 14, 2021
@gopherbot
Copy link

Change https://golang.org/cl/356229 mentions this issue: internal/fuzz: pass fuzz inputs over pipe instead of shared memory

@gopherbot
Copy link

Change https://go.dev/cl/393660 mentions this issue: internal/fuzz: cleanup usage of shared memory

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fuzz Issues related to native fuzzing support NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
Status: No status
Development

No branches or pull requests

4 participants