Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spec: clarify legality of non-ASCII graphic characters in import paths #44970

Open
adonovan opened this issue Mar 12, 2021 · 9 comments
Open
Assignees
Labels
Documentation NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@adonovan
Copy link
Member

adonovan commented Mar 12, 2021

About import paths, the spec (https://golang.org/ref/spec#Import_declarations) says:

Implementation restriction: A compiler may restrict ImportPaths to non-empty strings using only characters belonging to Unicode's L, M, N, P, and S general categories (the Graphic characters without spaces) and may also exclude the characters !"#$%&'()*,:;<=>?[]^`{|} and the Unicode replacement character U+FFFD.

but the Go 1.16 release notes (https://golang.org/doc/go1.16#go-command) say

In module mode, the go command now disallows import paths that include non-ASCII characters or path elements with a leading dot character (.). Module paths with these characters were already disallowed (see Module paths and versions), so this change affects only paths within module subdirectories.

The latter wording appears to be a stronger restriction than allowed by the spec. Does the spec need to be changed?

@AlexRouSg
Copy link
Contributor

AlexRouSg commented Mar 12, 2021

The spec deals with the language. Modules isn't part of the language, it's part of the toolchain.

@dmitshur
Copy link
Contributor

CC @jayconrod, @bcmills.

@dmitshur dmitshur added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Mar 12, 2021
@jayconrod
Copy link
Contributor

Hey Alan, hope all is well. We miss you over here.

I think it would be reasonable to note within the spec that clarify the build tool (cmd/go, Bazel, Blaze) is responsible for interpreting import paths and may impose additional restrictions. The restrictions for module paths are documented in Module paths and versions.

Incidentally, in 1.16.2, we rolled back some of the changes in 1.16 because they were affecting more users than we anticipated. Import paths are now validated separately from module paths, and import paths may begin with a leading dot and may contain + characters.

cc @bcmills @matloob

@jayconrod jayconrod added Documentation NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Mar 17, 2021
@jayconrod jayconrod added this to the Go1.17 milestone Mar 17, 2021
@rsc
Copy link
Contributor

rsc commented Apr 6, 2021

The spec is wrong. At the least, an implementation needs to be able to insist on Unicode normalization that would exclude some of those code points. But further restrictions should be allowed as well, along with name-specific restrictions (we don't allow a package named "com1" for example).

@mengzhuo
Copy link
Contributor

@rsc It's a pity that Go will do normalization since package ʕʘϖʘʔ is fine with spec and go tool now.

https://pkg.go.dev/github.com/mengzhuo/moe

import (
        "fmt"
        ʕʘϖʘʔ "github.com/mengzhuo/moe"
)

func main() {
        fmt.Println(ʕʘϖʘʔ.Ꮷಠ_ಠ("ʕʘϖʘʔ powered"))
}

@mknyszek mknyszek modified the milestones: Go1.17, Go1.18 Aug 18, 2021
@ianlancetaylor
Copy link
Contributor

This is in the Go1.18 milestone. Is it likely to happen for 1.18? Thanks.

@rsc You assigned this to yourself back in April.

@griesemer see #44970 (comment)

@griesemer griesemer self-assigned this Nov 18, 2021
@ianlancetaylor
Copy link
Contributor

@griesemer This is in the 1.18 milestone; time to move to 1.19? Thanks.

@griesemer
Copy link
Contributor

Not urgent. Moving to 1.19.

@griesemer griesemer modified the milestones: Go1.18, Go1.19 Jan 28, 2022
@griesemer griesemer modified the milestones: Go1.19, Backlog May 12, 2022
@mdempsky
Copy link
Member

mdempsky commented Jul 19, 2022

As a case study for this, #53954 was an issue relating to macOS's handling of Unicode file names.

The test involved the character Ä, which can be represented either as the composed character c3 84 or the decomposed character 41 cc 88. In Git, we had file paths and source files using the composed character (c3 84).

However, on macOS, the file names are rewritten to use the decomposed characters (41 cc 88). This caused problems with the test, because test/run.go used the file name to construct cmd/compile's -p flag; so it invoked cmd/compile with -p=test/\x41\xcc\x88foo, and wrote the result as ./\x41\xcc\x88foo.a.

Then when compiling the second package, cmd/compile saw import "./\xc3\x84foo" and tried opening ./xc3\x84foo.a. The macOS file system layer silently translated this into opening ./\x41\xcc\x88foo.a instead, but that tripped a recently added assertion that ensures the -p flag used to compile a package (i.e., "test/\x41\xcc\x88foo") matches the import path string used to import it (i.e., "test/\xc3\x84foo").

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Documentation NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

10 participants