Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: cmd/go: use the go version declared in the go.mod file to determine module boundaries and checksums #30369

Closed
bcmills opened this issue Feb 23, 2019 · 17 comments

Comments

@bcmills
Copy link
Contributor

bcmills commented Feb 23, 2019

Summary

  • Use the go version declared in the go.mod file to determine the boundaries of the module's source code.
  • Only store and verify checksums for the source code of modules that were extracted with known-good boundaries.
  • Continue to compute, store, and verify a checksum for every go.mod loaded during a build, regardless of its version.

Background

In the fix for #27093, we changed the module loader to drop symlinks in repositories when converting them to modules.

That changed the contents of some modules, and therefore their hashes, and rendered the contents of some existing go.sum files invalid (#29278). In retrospect, that was a mistake: we should never give users a reason to delete or otherwise mistrust their go.sum files, because that undermines the very purpose of go.sum files: a checksum mismatch should be treated as a potential security threat, not just a bug in the go tool.

At some point, we will probably find another bug in module extraction, or decide to make a change in how we compute module boundaries (such as ignoring go.mod files in testdata directories for #27852, or pruning out vendor directories for #30240). If and when we do, we should be careful not to break existing go.sum files.

This proposal attempts to build on #28221 to provide a safe means to make such changes.

Detail

Just as the go version determines the semantics in effect for the compiler, it should also determine the semantics of the module loader. A given release of the go tool may understand how to load arbitrarily many versions, and patch releases for older versions may even support newer versions.

If the go version used to extract the module does not support the go version declared by that module, fetch the module according to the closest supported version instead. If we have an existing checksum for the module and it does not match, fail with an “unsupported go version” warning. If we do not have an existing checksum, mark the module as provisional and do not record the new checksum (per #28835).

  • If adopted, we should also backport this behavior to the next patch releases of Go 1.11 and 1.12: we want to avoid ever storing another bad checksum.
  • Alternately, we could change the checksum prefix from h1: to h2: — even though the checksum algorithm itself doesn't need to change — to indicate the reliability of that checksum. (An h1 checksum might indicate a correctly-computed sum for an incorrectly-extracted module.)

However, do continue to record and verify checksums for all go.mod files regardless of the go version in use.

  • This detail is important, because it allows us to trust the go version declared in that go.mod file: otherwise, an attacker could inject a module using a known-unsupported go version in order to disable source verification.

In the .Info files served by module proxies, include both the version of the go tool used to extract the module, and the go language version actually selected by that tool. For example, if the cmd/go binary from go 1.15.2 only supports the semantics of go 1.13 and above, and is used to extract a module that declares go 1.12, the .info file would indicate:

GoTool: "1.15.2",
GoVersion: "1.13",

This allows module proxies to serve up-to-date checksums even for older or newer clients: if the proxy indicates that the module was extracted using an appropriate go version, then the client can still verify that the zip file matches the recorded checksum, and can still add the checksum to its go.mod file — even though it cannot reproduce that zip file by re-extracting that module from the origin.

Edits:

(CC @rsc @jayconrod @FiloSottile @hyangah @heschik @katiehockman)

@gopherbot gopherbot added this to the Proposal milestone Feb 23, 2019
@bcmills bcmills added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Feb 23, 2019
@bcmills bcmills modified the milestones: Proposal, Go1.13 Feb 23, 2019
@kardianos
Copy link
Contributor

@bcmills I think your alternative proposal is spot on:

  • Alternately, we could change the checksum prefix from h1: to h2: — even though the checksum algorithm itself doesn't need to change — to indicate the reliability of that checksum.

I think the key insight is a hash is only valid if you both define the hash algorithm and what the input is defined to me. In simple situations, such as a hash of a file or chunk, this is easy, you define the hash to be the sequence of bytes of the file.

But in Go, this hash is used over many file over many folders. The hash version should reflect the input, not just the algorithm.

I could easily see people changing an embedded go version in tools or manually for one reason or another and mostly of the time nothing would break. But then in a corner case (hash change), doing so would break go.sum. Don't make this a hidden dependency, tie directly to the hash version.

@bcmills
Copy link
Contributor Author

bcmills commented Feb 25, 2019

@kardianos

The hash version should reflect the input, not just the algorithm.

Note that the “input” in this case may be a .zip file provided by a proxy.

If a user with an older go toolchain obtains a zipfile from a proxy, and the proxy indicates that the file was extracted using the version from its module's go.sum file, then the older go command ought to be able to verify that the contents match the checksum, even though the older toolchain cannot produce that zipfile itself.

If we change the name of the hash function, then that property will not hold: the older go toolchain would not know whether it has the correct hash function, so it could not flag a checksum error if the zipfile doesn't match. That would allow a compromised proxy to serve arbitrary contents to older clients.

But the problem is even worse than that. Since the extraction algorithm changes the contents of the zipfile itself, a zipfile extracted using a newer Go version wouldn't necessarily match the checksum computed by an older version, and the files needed to recompute the correct checksum wouldn't necessarily even be present in the file (the change to the algorithm may have pruned them out). So we would still need some mechanism for older clients to determine whether to record their own checksum: changing the name of the hash function is still only a partial solution.

@bcmills
Copy link
Contributor Author

bcmills commented Feb 25, 2019

On the other hand, it might be nice for older-version clients to be able to record the checksums for probably-incorrectly-extracted versions as well. (If we know that “the copy of module m when extracted with Go 1.12” has checksum X, then it should continue to have checksum X when extracted with Go 1.12, even if the copy extracted with Go 1.13 has a different checksum.)

However, I would argue that that should be part of the path information stored in the go.sum file, not the hash algorithm: any client that understands the algorithm, regardless of version, ought to be able to compute and verify the checksum for a given zipfile.

So perhaps we could add some sort of path suffix (like we do today for the go.sum file) to allow for multiple different extraction algorithms. I'm just not sure that extension would be worth the extra complexity, given that at least the public module proxies should always have the latest release available to use.

@kardianos
Copy link
Contributor

Regardless of how you handle the issue of versioning hash algorithms, it does seem like any update to the hash aglo or verification process should also trigger a new point release on older supported versions of Go, and older versions of Go without the point release should be smart enough to be able to reliably detect that something with the hash algo or process has been changed, that it can't understand it, and report it to the user and suggest an update. For security reasons, it seems like it would probably need to "fail" oh unknown hash algo, but at least the user messaging wouldn't be "don't trust go.sum" but "update your Go install".

So I support another benefit of incrementing the hash algo number as opposed to the go version, is that older clients would have a clear signal "oh, I don't understand this, upgrade me". Rather then, oh, that's a newer version of Go, maybe something changed?

@bcmills
Copy link
Contributor Author

bcmills commented Feb 25, 2019

@kardianos Part of the point of this proposal is that older clients won't need to upgrade, as long as they're getting their modules from an up-to-date module proxy. If the hash function hasn't changed and the zipfile format hasn't changed, why should the client consuming that zipfile and hashing it with that function need to change?

But perhaps we could adjust the behavior a bit. Perhaps we should only fail to record checksums for newer versions, but still verify them: we could emit the “you need to upgrade” warning instead of “checksum mismatch” only if the checksum fails and the required version is newer (as in #28221).

Then, the cases would be:

  • If the proxy returns a zipfile extracted using the required go version, verify that the zipfile matches the checksum and record the checksum, even if the client does not support that go version.
  • If the client supports the required go version but the proxy does not, fail with an explicit upgrade warning (or fall back to another proxy or fetch from the origin if so configured).
  • If neither the client nor the proxy supports the required go version, but the go.sum file has a checksum entry for the given module, fetch the module (from the proxy, if so configured) using the closest supported semantics.
    • If the checksum matches, use the module: any changes to the extraction algorithm must not affect that module.
    • If the checksum does not match, do not use the module, and fail with an upgrade warning.

The interesting case is:

  • If neither the client nor the proxy supports the required go version, and the client does not have a checksum entry for the module, what should we do?
    • Fail open: download the module (from the proxy or origin) using the closest supported semantics, but do not record the checksum, and replace the downloaded copy of the module as soon as the go command is upgraded (or when go mod verify is run).
    • Fail closed: emit an upgrade warning and refuse to download the module.
    • We should not do this, but for completeness, the third option is to “fail corrupted”: download and use the module using the closest supported semantics, and record the resulting checksum in the go.sum file.

@bcmills bcmills changed the title proposal: cmd/go: do not add or check 'go.sum' entries for modules that indicate a newer 'go' version than the tool used to extract them proposal: cmd/go: use the go version declared in the go.mod file to determine module boundaries and checksums Feb 25, 2019
@bcmills
Copy link
Contributor Author

bcmills commented Feb 25, 2019

Edited the proposal per the above comment, with the “fail open” behavior. (I'm open to arguments that we should choose “fail closed” instead, but given the assumption that we don't have a checksum at all I'm skeptical of the benefit — especially given that that should be a very rare condition.)

@bcmills
Copy link
Contributor Author

bcmills commented Feb 25, 2019

For security reasons, it seems like it would probably need to "fail" oh unknown hash algo,

@kardianos, note that that is not the behavior today. We emit a warning if the go.sum algorithm differs from the current one, but do treat that condition as an error.

if len(goSum.m[mod]) > 0 {
fmt.Fprintf(os.Stderr, "warning: verifying %s@%s: unknown hashes in go.sum: %v; adding %v", mod.Path, mod.Version, strings.Join(goSum.m[mod], ", "), h)
}

@jayconrod
Copy link
Contributor

Here's how I'm thinking about this (correct me if I have anything wrong):

  • We have two functions, Z and H.
    • Z creates a zip file from a repository, including some subset of files.
    • H creates a hash of a zip file (the output of Z).
  • H is currently versioned. Go clients need to hold multiple versions of this algorithm so they can verify modules produced by older and newer clients. We currently print a warning and fail open if we can't verify a hash because we don't have a new enough version of H.
  • Z is not explicitly versioned, but there are multiple versions of Z, before and after cmd/go: symbolic links not dropped from repo #27093.
  • Clients verify modules by comparing the output of H to a known value in go.sum. The version of H is selected by a prefix in the go.sum entry. The version of Z used to produce the zip file fed to H is currently not specified.

The goal of this proposal is to make it so that new versions of Z can be introduced without breaking older clients. This works by tying Z to the go version inside each go.mod file.

@jayconrod
Copy link
Contributor

I guess what I'm not clear on is what the current stated purpose of the go declaration is outside of this proposal. There's minimal documentation, and the commit message that introduces it is vague.

In particular, it doesn't seem like there's any mechanism for a client at an older version (say go1.12) to exclude modules that declare a newer version (say go1.20). That means that an old client may need to verify zip files produced by a newer proxy. That seems okay under this proposal. However, if an old client is not using a proxy at all, they won't be able to verify go.sum entries for modules with newer Go versions.

Another concern: what if you want to change Z between minor versions? What if it changes multiple times during development of a new Go version? Different sums will be produced before and after each change.

@bcmills
Copy link
Contributor Author

bcmills commented Feb 25, 2019

I guess what I'm not clear on is what the current stated purpose of the go declaration is outside of this proposal.

See https://github.com/golang/proposal/blob/master/design/28221-go2-transitions.md#language-removals.

@bcmills
Copy link
Contributor Author

bcmills commented Feb 25, 2019

if an old client is not using a proxy at all, they won't be able to verify go.sum entries for modules with newer Go versions.

They will, but only for the subset of modules whose extraction isn't affected by the change. So that doesn't give us free reign in how we change Z: if we make a change that we believe will affect a significant fraction of new modules, and we also believe that a significant fraction of users will not be using proxies, then at the very least we will still need to give a lot of lead time (and probably backport the new logic).

@bcmills
Copy link
Contributor Author

bcmills commented Feb 25, 2019

what if you want to change Z between minor versions?

Under this proposal, that is not permissible.

What if it changes multiple times during development of a new Go version?

As long as most modules during development are still on the old version, that's fine.

That suggests that we should be careful to bump the default go directive only near the end of the cycle, not at the beginning: that way, folks running nightly builds during development won't end up creating modules that are later rendered invalid.

@rsc
Copy link
Contributor

rsc commented Feb 26, 2019

I don't think we should add this complexity. Instead we should make the module extraction as simple as possible. Yes, there was a bug involving symlinks, and we were able to fix it because modules were very young. Now the algorithm is the algorithm. Let's leave it there. Any bugs that remain are now features.

@marwan-at-work
Copy link
Contributor

Now the algorithm is the algorithm. Let's leave it there. Any bugs that remain are now features.

Will this algorithm ever be exposed or formally documented? Or will all proxies forever must use go mod download to ensure consistency across all clients?

@rsc
Copy link
Contributor

rsc commented Mar 7, 2019

@marwan-at-work, proxies should forever use go mod download. The reason it exists is precisely so all proxies and other downloaders agree on the bits and don't have to reimplement all the different version control mechanisms themselves, or even have stale libraries linked in.

@bcmills
Copy link
Contributor Author

bcmills commented Aug 25, 2021

At the moment the problems (and bugs) we have encountered relating to module boundaries has not been severe enough to warrant the extreme churn this proposal would cause.

I am withdrawing it until (and unless) such a severe problem is found.

@rsc
Copy link
Contributor

rsc commented Dec 1, 2021

This proposal has been declined as retracted.
— rsc for the proposal review group

@golang golang locked and limited conversation to collaborators Dec 1, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
No open projects
Development

No branches or pull requests

7 participants