Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/go: mod verify fails if directory entries are in zip file #53448

Open
darkfeline opened this issue Jun 19, 2022 · 16 comments
Open

cmd/go: mod verify fails if directory entries are in zip file #53448

darkfeline opened this issue Jun 19, 2022 · 16 comments
Labels
early-in-cycle A change that should be done early in the 3 month dev cycle. GoCommand cmd/go modules NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@darkfeline
Copy link
Contributor

What version of Go are you using (go version)?

$ go version
go version go1.18.3 linux/amd64

Does this issue reproduce with the latest release?

I think so

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org"
GOROOT="/usr/lib/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.18.3"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build456399546=/tmp/go-build -gno-record-gcc-switches"

What did you do?

go mod verify fails immediately on freshly downloaded modules. This can be reproduced with a hermetic Docker setup:

Dockerfile:

FROM docker.io/library/golang:alpine as builder
WORKDIR /usr/src/app
COPY go.mod go.sum ./
RUN go mod download && go mod verify
COPY . .
RUN go build -v .

go.mod:

module example
go 1.18
require go.felesatra.moe/cloudflare v0.3.0

main.go:

package main

import (
	"go.felesatra.moe/cloudflare"
)

func main() {
	_ := cloudflare.Client{}
}

I suspect there's something weird about the module zip files I build for go.felesatra.moe/cloudflare, but I'm not sure where to start looking.

Also, it seems strange how go mod verify could fail. If I understand correctly, it's checking that the downloaded zip and the unpacked dir haven't been modified, and given the above hermetic Docker reproduction I don't see how that could be the case.

What did you expect to see?

No error

What did you see instead?

go mod verify says the downloaded dir has been modified

@seankhliao
Copy link
Member

how were the zip files made?
you can try comparing the hash results using https://pkg.go.dev/golang.org/x/mod/sumdb/dirhash HashZip and HashDir

@ZekeLu
Copy link
Contributor

ZekeLu commented Jun 19, 2022

@seankhliao Thanks for pointing the direction. The zip file (https://proxy.golang.org/go.felesatra.moe/cloudflare/@v/v0.3.0.zip) contains an unexpected folder entry (go.felesatra.moe/cloudflare@v0.3.0). If I remove this entry, the hashes will match.

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2022-06-19 12:55:24 D....            0            0  go.felesatra.moe/cloudflare@v0.3.0
2022-06-19 12:55:24 .....        11358         3949  go.felesatra.moe/cloudflare@v0.3.0/LICENSE
2022-06-19 12:55:24 .....          567          308  go.felesatra.moe/cloudflare@v0.3.0/README.md
2022-06-19 12:55:24 .....         3541         1411  go.felesatra.moe/cloudflare@v0.3.0/client.go
2022-06-19 12:55:24 .....         1309          679  go.felesatra.moe/cloudflare@v0.3.0/client_test.go
2022-06-19 12:55:24 .....           44           44  go.felesatra.moe/cloudflare@v0.3.0/go.mod
2022-06-19 12:55:24 .....         3185         1016  go.felesatra.moe/cloudflare@v0.3.0/tools.go
2022-06-19 12:55:24 .....          843          515  go.felesatra.moe/cloudflare@v0.3.0/tools_test.go
------------------- ----- ------------ ------------  ------------------------
2022-06-19 12:55:24              20847         7922  7 files, 1 folders

@seankhliao seankhliao changed the title go mod verify fails immediately after fresh go mod download cmd/go: mod verify fails if directory entries are in zip file Jun 19, 2022
@seankhliao seankhliao added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jun 19, 2022
@seankhliao
Copy link
Member

not sure if HashZip should skip over directory entries or if they should continue to be invalid (to be strict on input)

cc @bcmills @matloob

@bcmills
Copy link
Contributor

bcmills commented Jun 21, 2022

not sure if HashZip should skip over directory entries or if they should continue to be invalid (to be strict on input)

It seems to me that the bug here is that HashZip presents directories to the hash function whereas HashDir explicitly ignores them.

Unfortunately, there is no way in general to make HashDir match the original checksum computed by HashZip for these zipfiles: in order to figure out the correct hash, we would need to know whether the zipfile contained the spurious directory entry or not, and by then it's too late.

Ideally I think we should change golang.org/x/mod/zip.Unzip and CheckZip to explicitly reject zip files that contain directories — that failure mode seems much clearer than a checksum mismatch, and now that we know that even non-empty directories lead to checksum mismatches I'm more comfortable making that an error. (We discussed this behavior a bit in a review thread, but didn't realize at the time that it would lead to checksum mismatches.)

@bcmills
Copy link
Contributor

bcmills commented Jun 21, 2022

(We could, independently, also skip directories explicitly in dirhash.HashZip, but I'm not sure that that would really do much good — if the zip file has already been downloaded, it would make the file seem like it had been corrupted when it really hasn't been.)

@bcmills bcmills added NeedsFix The path to resolution is known, but the work has not been done. modules labels Jun 21, 2022
@bcmills bcmills added this to the Backlog milestone Jun 21, 2022
@gopherbot gopherbot removed the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jun 21, 2022
@darkfeline
Copy link
Contributor Author

I don't know if there's a formal reference for zip file format, but my experience is that the dir entry is commonly present. The zip file in question I constructed using git archive --format=zip --prefix=blah@v0.1.0/, so I imagine other folks will run into this once they try building module zip files. For reference, doing 7z a module@v0.1.0.zip module@v0.1.0 also includes the dir entry.

@bcmills
Copy link
Contributor

bcmills commented Jun 27, 2022

@darkfeline, the zip format is documented in https://go.dev/ref/mod#zip-files, and is not the same as what is produced by git archive.

That reference does currently state that “Empty directories (entries with paths ending with a slash) may be included in module zip files but are not extracted. The go command does not include empty directories in zip files it creates.” However, in light of the checksum mismatches caused by those entries, it seems clear to me that it needs to be revised.

@bcmills bcmills modified the milestones: Backlog, Go1.20 Jun 27, 2022
@bcmills bcmills self-assigned this Jun 27, 2022
@darkfeline
Copy link
Contributor Author

I'm talking about the zip format generally, e.g., whether it is considered standard practice or correct to include dir entries. Because that is what both git archive and 7z do (and what I generally see), I assume that is de facto how proper zip archives should be.

Hypothetically then, Go recommending zip archives not containing such dir entries would mean Go is recommending improper/incorrect zip archives to be used. Hence my open question on a formal reference for zip file format.

I might be preaching to the choir, but my opinion is that Go should probably explicitly tolerate zip archives with such dir entries assuming that de facto everyone expects zip archives to include them, even if Go recommends not including them. This is in response to your above comment:

Ideally I think we should change golang.org/x/mod/zip.Unzip and CheckZip to explicitly reject zip files that contain directories

@bcmills
Copy link
Contributor

bcmills commented Jun 29, 2022

That's fair. I think we'd still need to adjust the checksumming, though, and that will still invalidate all existing modules with zipfiles containing directory entries. 🤷‍♂️

@bcmills bcmills added the early-in-cycle A change that should be done early in the 3 month dev cycle. label Jul 13, 2022
@gopherbot
Copy link

This issue is currently labeled as early-in-cycle for Go 1.20.
That time is now, so a friendly reminder to look at it again.

@rsc
Copy link
Contributor

rsc commented Jan 31, 2023

I looked into this a bit with @bcmills. A few notes.

  1. As far as scope of the problem, I am scanning the cached modules in the proxy and up to Jan 1 2021 the only affected modules are in the go.felesatra.moe domain. Of course, we don't want to break any modules at all, but it's reassuring that the problem is not more widespread.
  2. The checksum database lists multiple go.sum lines, and all the access code treats the result as a list of lines. The go command has a checksum it is looking for, and if it's listed on one of the lines, all is well. The code was written this way specifically to accommodate hash algorithm updates (from h1: to h2:) but it will work for updating the canonical h1: hash as well.
  3. The overall plan would be:
    • Update the struct listed in 'go help mod download' to add two new fields OldSums []string and OldGoSums []string. The go command would then compute the "fixed" h1 hash as Sum and GoModSum and the original, if it differs, into OldSums and OldGoSums. (The go.mod sums are unaffected due to this bug, so OldGoSums will always be empty for now. OldSums will only be non-empty for affected modules, which right now looks like only go.felesatra.moe modules.)
    • When checking against a checksum database result, the go command changes to accept either the canonical sum or any of the "old" sums.
    • When checking against an existing go.sum line, the go command changes to accept either the canonical sum or any of the "old" sums.
    • When writing a new line to go.sum (because there is no existing line for a given module), the go command writes only the canonical sum.
    • Update the checksum database server to include OldSums and OldGoSums as well as Sum and GoModSum in its checksum entries. This must happen before or at the same time that the checksum database starts using the new go command. Otherwise older go commands will be served entries with only the new checksums (for newly indexed modules) and will fail to validate them.
    • Any external verifiers would need to be updated to check that the database lists a subset of the sums listed in 'go mod download -json'.

This problem seems like a fairly low priority, but that would be the plan.

@rsc
Copy link
Contributor

rsc commented Jan 31, 2023

@darkfeline are you using your own code to generate the modules, or are you using some broadly available tool? If the latter we'd want to notify the author. Thanks.

@darkfeline
Copy link
Contributor Author

I am using git -C code-dir archive --format=zip --prefix=package@v0.1.0/ v0.1.0 (code)

@gopherbot
Copy link

This issue is currently labeled as early-in-cycle for Go 1.21.
That time is now, so a friendly reminder to look at it again.

@gopherbot
Copy link

gopherbot commented Jul 18, 2023

This issue is currently labeled as early-in-cycle for Go 1.22.
That time is now, so a friendly reminder to look at it again.

@heschi heschi modified the milestones: Go1.21, Go1.22 Jul 18, 2023
@bcmills
Copy link
Contributor

bcmills commented Sep 25, 2023

I think this needs to wait for at least Go 1.23, so that Go toolchains that don't enforce strict toolchain bounds are no longer supported when the hashes change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
early-in-cycle A change that should be done early in the 3 month dev cycle. GoCommand cmd/go modules NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

7 participants