Skip to content

x/mod/sumdb/dirhash: incorrect Unix command example #48498

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jjlin opened this issue Sep 20, 2021 · 9 comments
Closed

x/mod/sumdb/dirhash: incorrect Unix command example #48498

jjlin opened this issue Sep 20, 2021 · 9 comments
Labels
Documentation Issues describing a change to documentation. FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@jjlin
Copy link

jjlin commented Sep 20, 2021

At https://cs.opensource.google/go/x/mod/+/refs/tags/v0.5.0:sumdb/dirhash/hash.go;l=36, the Unix command equivalent given for Hash1() is

find . -type f | sort | sha256sum

which doesn't really capture what Hash1() actually does. It would be less misleading to give an equivalent like

find . -type f | sort | xargs sha256sum | sha256sum
@gopherbot gopherbot added this to the Unreleased milestone Sep 20, 2021
@dr2chase dr2chase added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Documentation Issues describing a change to documentation. labels Sep 20, 2021
@dr2chase
Copy link
Contributor

@jayconrod

@jayconrod jayconrod added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Sep 20, 2021
@MarkLodato
Copy link

MarkLodato commented Sep 14, 2022

Friendly ping. Would you accept a PR to fix?

I believe the following is the actual command needed to get the correct output:

PREFIX="..."
find . -type f | LC_ALL=C sort | xargs sha256sum | sed "s#  \\.#  $PREFIX#" | sha256sum | cut -f1 -d' ' | xxd -r -p | {printf 'h1:'; base64}

This fails if the current directory has no files, or if $PREFIX contains a #, and it's a bit off since the $PREFIX substitution actually happens in the caller of Hash1(), but I think it is close enough for documentation purposes.

@seankhliao
Copy link
Member

cc @bcmills

@bcmills
Copy link
Contributor

bcmills commented Sep 21, 2022

Can we just get rid of the Unix command from the comment instead? The marginal increase in clarity does not seem worth the effort of fine-tuning and reviewing a fix to the example command.

@jjlin
Copy link
Author

jjlin commented Sep 21, 2022

I think that having an example command is useful as a concise overview of what the function does, though saying "prepared as if by the Unix command" seems to imply that the command is supposed to serve as a (close to full) equivalent (which isn't true of course).

Perhaps something like this would be a bit better:

Hash1 is "h1:" followed by the base64-encoding of the binary representation of the SHA-256 hash given (approximately) by the Unix command:

find . -type f | sort | xargs sha256sum | sha256sum

While I appreciate the effort, the command given by @MarkLodato seems too complex to serve as a summary. I suspect most Go programmers would come to understand the details of the function more quickly by just reading the code rather than unraveling that command.

@MarkLodato
Copy link

I don't think the code in question is straightforward enough to substitute for proper documentation. I tried to read it and failed. That's why I came here. In particular, I was looking for:

  • What is the exact transformation of filenames using prefix? The code is complex enough that it takes many minutes to figure this out, and even then, I wasn't confident in my answer.
  • What is the sort order of sort.Strings? Is it the ASCII sort order (LC_ALL=C)?

The unit tests don't help because they are themselves complicated code. It would help to have hard-coded test vectors here, which could serve as a form of documentation.

Most of the complexity in my command is due to the substitution of prefix and converting the final sha256 to base64. What about the following compromise?

Hash1 is "h1:" followed by the base64-encoding of the binary representation of the SHA-256 hash given by the Unix command:

find "$DIR" -type f | LC_ALL=C sort | xargs sha256sum | sha256sum

except that $DIR/ is replaced with $PREFIX/ before the final sha256sum command.

@MarkLodato
Copy link

MarkLodato commented Sep 22, 2022

Also, to take a step back, I'm not actually looking for documentation for this particular function, but rather for how go module checksumming works overall. Right now that information only exists on this one function. But it's awkward because the prefix transformation has already happened and gets magically undone in the open functor, which is difficult to follow and document. That was the source of my confusion: this lists a find command that actually happens outside this function, but doesn't list the prefix substitution which also happens outside this function.

A better solution might be moving this description to the top of the file to describe the format overall. Then you can describe the two interfaces to the checksumming: directory (subject to prefix replacement, and only finding normal files) and zip file (no prefix replacement).

@MarkLodato
Copy link

MarkLodato commented Sep 23, 2022

FWIW, here's another option that is exactly correct but perhaps more readable:

hex_to_base64() { cut -f1 -d' ' | xxd -r -p | base64 ; }
ln -s "$DIR" "$MODULENAME"

echo -n 'h1:' && find "$MODULENAME" -type f | LC_ALL=C sort | xargs -r sha256sum | sha256sum | hex_to_base64

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/464295 mentions this issue: sumdb/dirhash: correct documentation of hash

@golang golang locked and limited conversation to collaborators Feb 2, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Documentation Issues describing a change to documentation. FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

7 participants