Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/go: module-mode import path check is more restrictive than GOPATH mode #29101

Closed
bcmills opened this issue Dec 4, 2018 · 11 comments
Closed
Labels
early-in-cycle A change that should be done early in the 3 month dev cycle. FrozenDueToAge GoCommand cmd/go modules NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@bcmills
Copy link
Contributor

bcmills commented Dec 4, 2018

As of #18660, go get in GOPATH mode allows any Unicode letter, but in module mode allows only ASCII letters in order to ensure that the path can be represented in the filesystem-agnostic “safe” encoding.

We should resolve that difference, ideally by defining a filesystem-safe encoding for all paths currently accepted in GOPATH mode.

(CC @rsc @dmitshur)

@bcmills bcmills added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. GoCommand cmd/go modules labels Dec 4, 2018
@bcmills bcmills added this to the Go1.13 milestone Dec 4, 2018
@dmitshur
Copy link
Contributor

dmitshur commented Dec 7, 2018

As of #18660, go get in GOPATH mode allows any Unicode letter

To be more precise, #18660 was about allowing Unicode letters within the "subdir" part of github.com import paths like "github.com/user/repo/subdir" (but not within "user" or "repo" parts).

To my knowledge, Unicode letters were allowed in vanity import paths since Go 1 or close to thereof. I tried Go 1.2.2 just now (the oldest release with binaries at https://golang.org/dl/), and it doesn't reject Unicode letters in a vanity import path:

$ ./bin/go version
go version go1.2.2 linux/amd64
$ ./bin/go get dmitri.shuralyov.com/temp/go-get-issue-unicode/испытание
$ echo $?
0

@FiloSottile
Copy link
Contributor

There is a relevant passage in the cmd/go/internal/module docs by @rsc that mentions that the eventual plan is to allow Unicode letters.

// Although paths are disallowed from using Unicode (see pathOK above),
// the eventual plan is to allow Unicode letters as well, to assume that
// file systems and URLs are Unicode-safe (storing UTF-8), and apply
// the !-for-uppercase convention. Note however that not all runes that
// are different but case-fold equivalent are an upper/lower pair.
// For example, U+004B ('K'), U+006B ('k'), and U+212A ('K' for Kelvin)
// are considered to case-fold to each other. When we do add Unicode
// letters, we must not assume that upper/lower are the only case-equivalent pairs.
// Perhaps the Kelvin symbol would be disallowed entirely, for example.
// Or perhaps it would encode as "!!k", or perhaps as "(212A)".

I assume this is now targeting Go 1.14? Should it be a release-blocker?

@gopherbot
Copy link

Change https://golang.org/cl/212198 mentions this issue: module: allow unicode Letters in CheckImportPath

@matloob
Copy link
Contributor

matloob commented Jan 24, 2020

Following up here as requested in @rsc's comment on golang.org/cl/212198.

I'm not sure what we did historically, but it looks like now, we do two checks when go-getting in GOPATH mode: first, we call cmd/go/internal/get.CheckImportPath, which is a copy of the definition of CheckImportPath in golang.org/x/mod/module, except that it additionally allows unicode letters as components of import paths (through an edit in its copy of pathOK).

As I understand from reading the code, the get.CheckImportPath call is the only check we do for "vanity" import paths. After that check, when we look up which VCS system is being used, for VCS-backed paths, the paths need to match against regexps is vcsPaths. Only the GitHub import path regexp allows non-ascii Unicode letters (not in the repo name, but it paths within a repo). And as I understand it we don't do any stricter checks for non-VCS-backed import paths. So in effect, non-ascii Unicode letters are only allowed in non-VCS-backed import paths and in (some) components of Github import paths.

@rsc
Copy link
Contributor

rsc commented Jun 4, 2020

There are a lot of issues to solve once we start allowing Unicode.
The special case for GitHub path elements after the module path is unfortunate and should probably be removed. Then we can reexamine the much larger problem as a whole. Let's not privilege GitHub users any longer.

Let's make import path elements checked with pathOK starting at the beginning of the Go 1.16 cycle. If a very important existing package comes up we can reconsider.

@gopherbot
Copy link

Change https://golang.org/cl/250942 mentions this issue: cmd/go/internal/get: make pathOK identical to that in golang.org/x/mod

@matloob
Copy link
Contributor

matloob commented Aug 28, 2020

Some more context: I'm looking into doing this as the Go 1.16 tree has opened, but it's not clear which path to take:

GOPATH mode lets through unicode letters in github import paths, so using pathOK to check import paths will break users in GOPATH mode using these repos.

In module mode, the proxy disables non-ASCII unicode letters, but if GOPROXY=direct, the go command will use the more lenient checking that allows non-ASCII unicode letters:

# Without proxy
$ GO111MODULE=on GOPROXY=direct go get github.com/matloob/unicode/𝕞𝕒𝕥𝕝𝕠𝕠𝕓
go: found github.com/matloob/unicode/𝕞𝕒𝕥𝕝𝕠𝕠𝕓 in github.com/matloob/unicode v0.0.0-20200828144222-b203c46e6021

# With proxy
$ GO111MODULE=on go get github.com/matloob/unicode/𝕞𝕒𝕥𝕝𝕠𝕠𝕓 
go get github.com/matloob/unicode/𝕞𝕒𝕥𝕝𝕠𝕠𝕓: malformed module path "github.com/matloob/unicode/𝕞𝕒𝕥𝕝𝕠𝕠𝕓": invalid char '𝕞'

So the question is do we just want to disable these paths in module mode, or for both module and GOPATH mode. Disabling them for module mode sounds like a reasonable idea because these modules can't be used with the proxy, but there might be GOPATH users we'd be breaking.

I'm going to try to see if I can find a GitHub index I can search through to see if any repos with go files in directories with non-ASCII unicode letters actually exist.

@gopherbot
Copy link

Change https://golang.org/cl/251878 mentions this issue: src/cmd/go/internal/get: disallow non-ASCII unicode letters from import paths

@gopherbot
Copy link

Change https://golang.org/cl/251877 mentions this issue: cmd: update vendored golang.org/x/mod

gopherbot pushed a commit that referenced this issue Sep 1, 2020
This pulls in golang.org/cl/250920 which rejects Windows shortnames as
path components in module.CheckImportPath (as is already done in
cmd/go/internal/get's copy of CheckImportPath). This will allow us to replace
the copy of CheckImportPath with the original.

This also pulls in golang.org/cl/250919 which rejects + in CheckPath and
CheckImportPath, and golang.org/cl/235597, which adds methods to the zip
package for gorelease, but shouldn't affect cmd.

This change also updates the cmd/go test case TestScript/mod_bad_filenames
to reflect that golang.org/x/mod/zip error messages now include filenames
for bad file names that can't be included in zip archives.

Updates #29101

Change-Id: I7f654325dc33b19bc9c6f77a56546747add5a47f
Reviewed-on: https://go-review.googlesource.com/c/go/+/251877
Run-TryBot: Michael Matloob <matloob@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Jay Conrod <jayconrod@google.com>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
gopherbot pushed a commit that referenced this issue Sep 1, 2020
…aths

The copy of CheckImportPath in path.go and the regular expression for github
repos in vcsPaths together allow import paths with unicode letters with import
paths. These all come from github repos with non-ASCII unicode letters
with paths in directories. This mainly shows up in GOPATH mode, but could
also show up in Module mode when getting a module in GOPROXY=direct mode.

We expect there to not be any significant affected users of this change--
an investingation of github repos that would produce import paths that
would comply with the copy CheckImportPaths that's being removed, but not
modload.CheckImportPaths only surfaced a handful of cases, all of which
seemed to be small test or demonstation repos. But this CL is being
submitted early in the cycle so that it can be backed out if need be.

Updates #29101

Change-Id: I719df4af5b318e1330e90d8a0bffe5bb8d816f4f
Reviewed-on: https://go-review.googlesource.com/c/go/+/251878
Run-TryBot: Michael Matloob <matloob@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Jay Conrod <jayconrod@google.com>
@gopherbot
Copy link

Change https://golang.org/cl/252297 mentions this issue: cmd/go: remove TestScript/get_unicode

gopherbot pushed a commit that referenced this issue Sep 1, 2020
That test tested that import paths with non-ASCII unicode paths
were allowed by the Go command. Remove this test case because
golang.org/cl/251878 removes that support.

Also rewrite a test case in TestRepoRootForImportPath in the test
for cmd/go/internal/get to reflect that unicode directory names are now
disallowed.

Updates #29101

Change-Id: I669e220facd04fc82ccd05dd08e8f1ff4d48b1fd
Reviewed-on: https://go-review.googlesource.com/c/go/+/252297
Run-TryBot: Michael Matloob <matloob@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
@matloob
Copy link
Contributor

matloob commented Sep 14, 2020

On tip, the module mode check is now the same as the GOPATH mode check.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
early-in-cycle A change that should be done early in the 3 month dev cycle. FrozenDueToAge GoCommand cmd/go modules NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

6 participants