proposal: runtime/debug: add standard library dependency information to debug.BuildInfo #60767

Open
chrisnovakovic opened this issue Jun 13, 2023 · 5 comments

@chrisnovakovic

The information provided by the debug.BuildInfo struct, particularly in the Deps field, is useful for vulnerability detection. Many open-source and commercial vulnerability scanners are now capable of parsing dep entries in the BuildInfo data embedded in binaries to identify vulnerable third-party library dependencies that contributed to the build process.

Some commercial vulnerability scanners have gone a step further and are using the value of the GoVersion field in the same struct to identify vulnerable standard library packages in the version of the Go toolchain that built the binary - for example, they report that binaries with a GoVersion between go1.19 and go1.19.7 or go1.20 and go1.20.2 are affected by CVE-2023-24536. In general, this approach overestimates the number of vulnerabilities a binary is affected by, because a vulnerable standard library package may not be imported in a binary's transitive dependency tree.
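
The version-only check described above can be sketched roughly as follows. This is an assumption about how such a scanner might work, not the logic of any particular product; the function names are hypothetical, the ranges are the ones quoted above for CVE-2023-24536, and pre-release or development version strings are simply treated as unparseable.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseGoVersion splits a release string such as "go1.19.7" into numbers.
// A missing patch level ("go1.19") is treated as 0. Simplified sketch:
// strings like "go1.20rc1" or "devel ..." are rejected.
func parseGoVersion(v string) (major, minor, patch int, ok bool) {
	if !strings.HasPrefix(v, "go") {
		return 0, 0, 0, false
	}
	parts := strings.Split(strings.TrimPrefix(v, "go"), ".")
	if len(parts) < 2 || len(parts) > 3 {
		return 0, 0, 0, false
	}
	var nums [3]int
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return 0, 0, 0, false
		}
		nums[i] = n
	}
	return nums[0], nums[1], nums[2], true
}

// flaggedByVersionOnly reports whether goVersion falls in go1.19..go1.19.7
// or go1.20..go1.20.2, regardless of which packages the binary imports.
func flaggedByVersionOnly(goVersion string) bool {
	major, minor, patch, ok := parseGoVersion(goVersion)
	if !ok || major != 1 {
		return false
	}
	switch minor {
	case 19:
		return patch <= 7
	case 20:
		return patch <= 2
	}
	return false
}

func main() {
	for _, v := range []string{"go1.19.7", "go1.19.8", "go1.20", "go1.20.3"} {
		fmt.Println(v, flaggedByVersionOnly(v))
	}
}
```

The check never consults the import graph, which is exactly why it flags binaries that do not link the affected package.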

To improve the accuracy of such scanners, it may be beneficial to add a new field to the BuildInfo struct containing a list of all standard library packages that, directly or indirectly, contributed to the build process. It could be as simple as a string slice listing the package names, given that the standard library version could be derived from the Go toolchain version that built the binary, which is already available in the GoVersion field.

Proposal

Add the following field to the debug.BuildInfo struct:

// BuildInfo represents the build information read from a Go binary.
type BuildInfo struct {
	// ... existing fields (GoVersion, Path, Main, Deps, Settings) unchanged ...

	// StdlibPkgs describes all the standard library packages, both direct and
	// indirect, that contributed to the build of this binary.
	StdlibPkgs []string
}

When embedded into the binary, the data structure becomes a list of lexicographically ordered lines, each prefixed with the keyword stdlibpkg and a tab, per the current BuildInfo serialisation format, e.g.:

stdlibpkg	fmt
stdlibpkg	io
stdlibpkg	io/fs
stdlibpkg	io/ioutil
[...]
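
A rough sketch of how the proposed lines could be produced and read back is shown below. The `stdlibpkg` keyword comes from this proposal; the function names `formatStdlibPkgs` and `parseStdlibPkgs` are hypothetical and do not exist in `runtime/debug`.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// formatStdlibPkgs renders pkgs as sorted "stdlibpkg\t<path>" lines.
func formatStdlibPkgs(pkgs []string) string {
	sorted := append([]string(nil), pkgs...) // copy so the caller's slice is untouched
	sort.Strings(sorted)
	var b strings.Builder
	for _, p := range sorted {
		b.WriteString("stdlibpkg\t" + p + "\n")
	}
	return b.String()
}

// parseStdlibPkgs extracts package paths from stdlibpkg lines and
// ignores every other line (e.g. existing dep or build lines).
func parseStdlibPkgs(data string) []string {
	var pkgs []string
	for _, line := range strings.Split(data, "\n") {
		if strings.HasPrefix(line, "stdlibpkg\t") {
			pkgs = append(pkgs, strings.TrimPrefix(line, "stdlibpkg\t"))
		}
	}
	return pkgs
}

func main() {
	text := formatStdlibPkgs([]string{"io/ioutil", "fmt", "io/fs", "io"})
	fmt.Print(text)
	fmt.Println(parseStdlibPkgs(text))
}
```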

Open questions

  • Should vendorised packages identified by the cmd/go/internal/load API as belonging to the stdlib be filtered? This covers Packages where Standard is true but ImportPath contains /vendor/ or is prefixed with vendor/. My feeling is that they should - presumably information about these packages is already (or should instead be) exposed in the Deps field if it is required.
  • Should internal stdlib packages be filtered? My feeling is that they shouldn't - there are vulndb reports for vulnerabilities that only affect specific internal packages (e.g. GO-2022-0318), and it seems appropriate to make that information available to scanners to improve detection accuracy.
  • Should symbol names be taken into consideration now? vulndb reports include information about which symbols in a package are affected by a given vulnerability. I could see a future case for including in StdlibPkgs a list of used symbols for each stdlib package in order to improve detection accuracy further, in which case a data structure other than []string would be appropriate, although that'd be a lot of extra information to embed.
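
The filtering suggested in the first two questions could look roughly like the sketch below: vendored packages are dropped, internal packages are kept. This reflects only the preferences stated above, not a settled design, and `isVendored` and `filterStdlibPkgs` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// isVendored reports whether importPath refers to a vendored copy,
// i.e. it starts with "vendor/" or contains "/vendor/".
func isVendored(importPath string) bool {
	return strings.HasPrefix(importPath, "vendor/") ||
		strings.Contains(importPath, "/vendor/")
}

// filterStdlibPkgs drops vendored packages but keeps internal ones,
// since vulnerability reports can name internal packages directly.
func filterStdlibPkgs(pkgs []string) []string {
	var kept []string
	for _, p := range pkgs {
		if !isVendored(p) {
			kept = append(kept, p)
		}
	}
	return kept
}

func main() {
	pkgs := []string{
		"fmt",
		"internal/poll",
		"net/http/internal",
		"vendor/golang.org/x/net/http2/hpack",
		"cmd/vendor/golang.org/x/mod/semver",
	}
	for _, p := range filterStdlibPkgs(pkgs) {
		fmt.Println(p)
	}
}
```
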
@gopherbot gopherbot added this to the Proposal milestone Jun 13, 2023
@ianlancetaylor
Contributor

CC @bcmills @matloob

@apparentlymart

apparentlymart commented Jun 14, 2023

While I do quite like this idea, it does introduce an inconsistency: standard library packages would be captured on a per-package basis, while external dependencies are captured only on a whole-module basis.

It's been my experience (as a maintainer of a Go program distributed to users as built executables) that these vulnerability scanners are also prone to overestimate the impact of third-party dependencies, because the tracked information only tracks entire modules even though the official vulnerability database is capable of per-package tracking. So far I've received more reports of false positives where the affected package is not even included in my program than I have seen useful reports that indicate genuine problems[1].

With that in mind, would it make sense to generalize this proposal to capture information about all packages that are linked into the program, whether standard library or otherwise? That would then in theory allow these vulnerability scanners to operate at the same granularity as the vulnerability database does, and thus generate fewer false positives.

On the other hand, I expect that storing all of the package names directly as strings would bloat the metadata quite a bit, especially if stored in addition to the existing module-related information. Some sort of compression might be warranted to exploit the fact that a typical program will have multiple packages belonging to the same module, which thus share a common prefix. Perhaps the storage format could extend the existing module metadata in a backward-compatible way by pointing to the entries in the table of modules and then storing only the suffix to concatenate with a module name to produce the full package name. Of course, that could very well be a premature optimization.
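
To make the prefix-sharing idea concrete, here is one possible sketch: each package is stored as an index into the module table plus the remaining suffix. The `pkgRef` type and both functions are invented for illustration and say nothing about how BuildInfo would actually encode this.

```go
package main

import (
	"fmt"
	"strings"
)

// pkgRef is a hypothetical compact record: an index into the module table
// plus the part of the import path that follows the module path.
// Module is -1 when no module in the table owns the package.
type pkgRef struct {
	Module int
	Suffix string
}

// compress maps each package to the longest module path that prefixes it.
func compress(modules, pkgs []string) []pkgRef {
	refs := make([]pkgRef, 0, len(pkgs))
	for _, pkg := range pkgs {
		ref := pkgRef{Module: -1, Suffix: pkg}
		best := -1 // length of the longest matching module path so far
		for i, mod := range modules {
			if len(mod) <= best {
				continue
			}
			if pkg == mod {
				ref, best = pkgRef{Module: i}, len(mod)
			} else if strings.HasPrefix(pkg, mod+"/") {
				ref, best = pkgRef{Module: i, Suffix: pkg[len(mod)+1:]}, len(mod)
			}
		}
		refs = append(refs, ref)
	}
	return refs
}

// expand reverses compress, rebuilding the full import paths.
func expand(modules []string, refs []pkgRef) []string {
	pkgs := make([]string, 0, len(refs))
	for _, r := range refs {
		switch {
		case r.Module < 0:
			pkgs = append(pkgs, r.Suffix)
		case r.Suffix == "":
			pkgs = append(pkgs, modules[r.Module])
		default:
			pkgs = append(pkgs, modules[r.Module]+"/"+r.Suffix)
		}
	}
	return pkgs
}

func main() {
	modules := []string{"golang.org/x/net", "example.com/tool"}
	pkgs := []string{"golang.org/x/net/http2", "golang.org/x/net/http2/hpack", "example.com/tool", "fmt"}
	refs := compress(modules, pkgs)
	fmt.Printf("%+v\n", refs)
	fmt.Println(expand(modules, refs))
}
```

Whether the saved bytes justify the extra indirection would need measuring on real binaries.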


[1] This is admittedly only really true for modules that contain an assortment of thematically-related-but-separate functionality. The golang.org/x/... modules seem to have been a common theme so far because most of them contain a wide variety of different packages where our software uses only a small fraction of that surface area. Maybe those "x" packages are just structured more like the standard library than third-party packages typically are, and so I'm over-generalizing a problem that is specific to those.

@bcmills
Contributor

bcmills commented Jun 14, 2023

If you have the version info for all of the modules used in the binary, and you know which Go release was used to build the binary, you can use go list to determine which standard-library packages it used.

Moreover, just knowing the packages isn't enough for precise vulnerability reporting anyway: really you should analyze the individual symbols, not just the packages.

@peterebden

> While I do quite like this idea, it does introduce an inconsistency: standard library packages would be captured on a per-package basis, while external dependencies are captured only on a whole-module basis.

That's true, but the stdlib is sufficiently different (and well-known) that it may deserve this treatment. Most (arguably nearly all) modules are fairly focused on providing a specific piece of functionality; the stdlib is wide-ranging and covers many different things, with essentially every binary only needing a subset of them.
It also (for the foreseeable future, anyway) doesn't make major releases, so it can't drop functionality that has become undesirable or difficult to maintain, whereas modules can do this if they have, say, an interface that later turns out to be a problem (e.g. a function that doesn't return an error but later turns out to need one).

@seankhliao
Member

I also don't think this is precise enough to be worth doing: the inclusion of a package doesn't necessarily mean something vulnerable from it was used.
The approach taken by govulncheck seems to be much better.
