Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/go: checksum mismatch for module containing Git LFS files #41708

Open
costela opened this issue Sep 30, 2020 · 7 comments
Open

cmd/go: checksum mismatch for module containing Git LFS files #41708

costela opened this issue Sep 30, 2020 · 7 comments
Labels
modules NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@costela
Copy link
Contributor

costela commented Sep 30, 2020

What version of Go are you using (go version)?

$ go version
go version go1.15.2 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/costela/.cache/go-build"
GOENV="/home/costela/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/costela/go/pkg/mod"
GOOS="linux"
GOPATH="/home/costela/go"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/costela/sdk/go1.15.2"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/costela/sdk/go1.15.2/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build342748982=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I think I might have found data corruption on proxy.golang.org, and it's unfortunately unclear how it relates to the module's content.

First, the "correct" case: the zip file stored in proxy.golang.org matches the checksum on sum.golang.org, as expected.

$ go1.15.2 clean -modcache && go1.15.2 mod download -json github.com/costela/bio-rd@v0.0.99-1598984019
{
	"Path": "github.com/costela/bio-rd",
	"Version": "v0.0.99-1598984019",
	"Info": "/home/costela/go/pkg/mod/cache/download/github.com/costela/bio-rd/@v/v0.0.99-1598984019.info",
	"GoMod": "/home/costela/go/pkg/mod/cache/download/github.com/costela/bio-rd/@v/v0.0.99-1598984019.mod",
	"Zip": "/home/costela/go/pkg/mod/cache/download/github.com/costela/bio-rd/@v/v0.0.99-1598984019.zip",
	"Dir": "/home/costela/go/pkg/mod/github.com/costela/bio-rd@v0.0.99-1598984019",
	"Sum": "h1:/wCyhMn+DmkHkW3mGjGSso+QfpGsZMojhthWjFfptvw=",
	"GoModSum": "h1:Dm2pV+USySIWrQ13pjU0+KxXwiKPGdiigDv2fM+RcDs="
}

NOTE: this repo was forked and tagged explicitly to debug this issue. The tag is new and hasn't been "moved".

Now we hit the problem with GOPROXY=direct:

$ go1.15.2 clean -modcache && GOPROXY=direct go1.15.2 mod download -json github.com/costela/bio-rd@v0.0.99-1598984019
{
	"Path": "github.com/costela/bio-rd",
	"Version": "v0.0.99-1598984019",
	"Error": "github.com/costela/bio-rd@v0.0.99-1598984019: verifying module: checksum mismatch\n\tdownloaded: h1:kHzRJEumdz5H8oG53A0iBfTKELjaNSYvv+r9KIJk7p4=\n\tsum.golang.org: h1:/wCyhMn+DmkHkW3mGjGSso+QfpGsZMojhthWjFfptvw=\n\nSECURITY ERROR\nThis download does NOT match the one reported by the checksum server.\nThe bits may have been replaced on the origin server, or an attacker may\nhave intercepted the download attempt.\n\nFor more information, see 'go help module-auth'.\n",
	"Info": "/home/costela/go/pkg/mod/cache/download/github.com/costela/bio-rd/@v/v0.0.99-1598984019.info",
	"GoMod": "/home/costela/go/pkg/mod/cache/download/github.com/costela/bio-rd/@v/v0.0.99-1598984019.mod",
	"GoModSum": "h1:Dm2pV+USySIWrQ13pjU0+KxXwiKPGdiigDv2fM+RcDs="
}

If we compare the zip file received from proxy.golang.org to the one generated locally as a tempfile when using GOPROXY=direct, we notice a discrepancy:
$ diff -urN <(zipinfo goproxy.zip) <(zipinfo direct.zip)

--- /proc/self/fd/11	2020-09-30 10:44:29.400724671 +0200
+++ /proc/self/fd/13	2020-09-30 10:44:29.396724658 +0200
@@ -1,5 +1,5 @@
-Archive:  goproxy.zip
-Zip file size: 3020773 bytes, number of entries: 403
+Archive:  direct.zip
+Zip file size: 7157287 bytes, number of entries: 403
 -rw----     2.0 fat      107 bl defN 80-000-00 00:00 github.com/costela/bio-rd@v0.0.99-1598984019/.circleci/build-examples
 -rw----     2.0 fat      226 bl defN 80-000-00 00:00 github.com/costela/bio-rd@v0.0.99-1598984019/.circleci/check-gofmt
 -rw----     2.0 fat      405 bl defN 80-000-00 00:00 github.com/costela/bio-rd@v0.0.99-1598984019/.circleci/config.yml
@@ -12,7 +12,7 @@
 -rw----     2.0 fat    11357 bl defN 80-000-00 00:00 github.com/costela/bio-rd@v0.0.99-1598984019/LICENSE
 -rw----     2.0 fat      860 bl defN 80-000-00 00:00 github.com/costela/bio-rd@v0.0.99-1598984019/README.md
 -rw----     2.0 fat      473 bl defN 80-000-00 00:00 github.com/costela/bio-rd@v0.0.99-1598984019/RFCs.md
--rw----     2.0 fat      133 bl defN 80-000-00 00:00 github.com/costela/bio-rd@v0.0.99-1598984019/benchmarks/bgp/decode_real_full_feed/AS8881.raw
+-rw----     2.0 fat 12282584 bl defN 80-000-00 00:00 github.com/costela/bio-rd@v0.0.99-1598984019/benchmarks/bgp/decode_real_full_feed/AS8881.raw
 -rw----     2.0 fat     2444 bl defN 80-000-00 00:00 github.com/costela/bio-rd@v0.0.99-1598984019/benchmarks/bgp/decode_real_full_feed/main.go
 -rw----     2.0 fat     4421 bl defN 80-000-00 00:00 github.com/costela/bio-rd@v0.0.99-1598984019/benchmarks/bgp/learning/main.go
 -rw----     2.0 fat      967 bl defN 80-000-00 00:00 github.com/costela/bio-rd@v0.0.99-1598984019/benchmarks/ipcache/main.go
@@ -403,4 +403,4 @@
 -rw----     2.0 fat      836 bl defN 80-000-00 00:00 github.com/costela/bio-rd@v0.0.99-1598984019/util/net/net_addr_test.go
 -rw----     2.0 fat     4095 bl defN 80-000-00 00:00 github.com/costela/bio-rd@v0.0.99-1598984019/util/servicewrapper/grpc.go
 -rw----     2.0 fat     1091 bl defN 80-000-00 00:00 github.com/costela/bio-rd@v0.0.99-1598984019/util/time/ticker.go
-403 files, 6622187 bytes uncompressed, 2916799 bytes compressed:  56.0%
+403 files, 18904638 bytes uncompressed, 7053313 bytes compressed:  62.7%

It looks like proxy.golang.org silently truncated one binary file in the repository, so the checksum in sum.golang.org is for this "truncated" version, which doesn't correspond to the actual code in the original repository.

What did you expect to see?

The checksum seen when downloading with or without GOPROXY=direct should be the same.

What did you see instead?

Checksum mismatch.

@seankhliao
Copy link
Member

I see the file is stored with Git LFS? I don't think the proxy has that setup

@bcmills
Copy link
Contributor

bcmills commented Sep 30, 2020

For Git LFS, see also #38941 and #39720.

@costela
Copy link
Contributor Author

costela commented Sep 30, 2020

@seankhliao oh man, that's embarassing. Totally overlooked LFS. Then this is just a duplicate of #38941.

Then the question becomes: shouldn't the locally generated zip also ignore all filters, to ensure the content seen by the proxy is the same as seen by go when not using the proxy?

The git archive call already takes some steps to normalize the output, but maybe the existence of git filters was overlooked? It feels a bit weird to have the sort of "environment changes" (whether git-lfs is installed locally or not) have such a disruptive (and opaque) effect.

@bcmills bcmills changed the title proxy.golang.org: possible data corruption cmd/go: checksum mismatch for module containing Git LFS files Sep 30, 2020
@bcmills bcmills added modules NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Sep 30, 2020
@bcmills
Copy link
Contributor

bcmills commented Sep 30, 2020

(Since #38941 timed out, let's continue the discussion here.)

@bcmills
Copy link
Contributor

bcmills commented Sep 30, 2020

Picking up from #38941 (comment):

The go command uses git archive under the hood.

According to git-lfs/git-lfs#1322 (comment),

If you have Git LFS enabled (i.e., the filter rules are properly set up via git lfs install), a recent version of git archive will include the LFS files in it, even in a bare repository.

So it's not obvious to me why GIT_LFS_SKIP_SMUDGE=1 would be the right resolution here: why would git-lfs users expect Go modules to include (or omit) LFS files, and under what conditions? Does this behavior vary with git and git-lfs versions, and is upgrading to a more recent git and/or git-lfs binary a viable workaround?

(I haven't used Git LFS, so the answers to those questions are not obvious to me.)

@costela
Copy link
Contributor Author

costela commented Sep 30, 2020

I'm unfortunately not familiar with git internals, but I suspect the issue is not exclusive to LFS. Any comparable "git extension" requiring external binaries and opportunistic integration via .gitattributes ("opportunistic" in the sense that git will still work without them) will presumably fail in a similarly opaque way when running GOPROXY=direct go get in a system that happens to have them installed. (git-crypt is another example; though probably less relevant).

It seems unrealistic to expect proxy.golang.org to support all of these extensions. OTOH, there doesn't seem to be a clean way to disable all clean/smudge filters during local checksum generation (i.e.: git archive). The following patch works for me™, but feels like a huge hack:

diff --git a/src/cmd/go/internal/modfetch/codehost/git.go b/src/cmd/go/internal/modfetch/codehost/git.go
index 31921324a7..099837554a 100644
--- a/src/cmd/go/internal/modfetch/codehost/git.go
+++ b/src/cmd/go/internal/modfetch/codehost/git.go
@@ -820,7 +820,7 @@ func (r *gitRepo) ReadZip(rev, subdir string, maxSize int64) (zip io.ReadCloser,
        // text file line endings. Setting -c core.autocrlf=input means only
        // translate files on the way into the repo, not on the way out (archive).
        // The -c core.eol=lf should be unnecessary but set it anyway.
-       archive, err := Run(r.dir, "git", "-c", "core.autocrlf=input", "-c", "core.eol=lf", "archive", "--format=zip", "--prefix=prefix/", info.Name, args)
+       archive, err := Run(r.dir, "git", "-c", "core.autocrlf=input", "-c", "core.eol=lf", "archive", "--worktree-attributes", "--format=zip", "--prefix=prefix/", info.Name, args)
        if err != nil {
                if bytes.Contains(err.(*RunError).Stderr, []byte("did not match any files")) {
                        return nil, os.ErrNotExist

The documentation for the --worktree-attributes flag mentions only usage in a working-tree and we're running archive in a bare repository, so there isn't any guarantee its behavior won't change.

@OneOfOne
Copy link
Contributor

OneOfOne commented Oct 8, 2020

Got hit by this as well, the patch from @costela worked since it was kinda urgent.

@seankhliao seankhliao added this to the Backlog milestone Aug 27, 2022
danielfdickinson added a commit to danielfdickinson/image-handling-mod-demo that referenced this issue Jul 17, 2023
Workaround for golang/go#41708 :
(use git lfs install before use Hugo (Go) modules)

Signed-off-by: Daniel F. Dickinson <dfdpublic@wildtechgarden.ca>
danielfdickinson added a commit to wildtechgarden/www-wildtechgarden that referenced this issue Aug 18, 2023
Work around for golang/go#41708

Signed-off-by: Daniel F. Dickinson <dfdpublic@wildtechgarden.ca>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
modules NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

4 participants