cmd/go: revisit allowed set of characters in module, import, and file paths #45549

Open
jayconrod opened this issue Apr 13, 2021 · 11 comments
Labels
modules NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.

@jayconrod
Contributor

Currently, import paths have the following lexical restrictions (see module.CheckImportPath):

  • Must consist of valid path elements, separated by slashes. Must not begin or end with a slash.
  • A valid path element is a non-empty string that consists of ASCII letters, ASCII digits, and the punctuation characters - . _ ~. Must not end with a dot or contain two dots in a row.
  • A path element prefix up to the first dot must not be a reserved name on Windows, regardless of case (CON, com1, ...). An element must not have a suffix of a tilde followed by ASCII digits (like a Windows short name).
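
The import path rules above can be sketched as a small standalone checker. This is a simplified illustration written from the description in this issue, not the actual code of module.CheckImportPath; the names checkImportPath, checkElem, and reservedWindows are hypothetical, and the reserved-name list is deliberately partial.

```go
package main

import (
	"fmt"
	"strings"
)

// reservedWindows is a partial, illustrative list of reserved Windows device names.
var reservedWindows = []string{"CON", "PRN", "AUX", "NUL", "COM1", "LPT1"}

// isImportPathChar reports whether r is an ASCII letter, an ASCII digit,
// or one of the punctuation characters - . _ ~
func isImportPathChar(r rune) bool {
	switch {
	case 'a' <= r && r <= 'z', 'A' <= r && r <= 'Z', '0' <= r && r <= '9':
		return true
	}
	return r == '-' || r == '.' || r == '_' || r == '~'
}

// isDigits reports whether s is non-empty and consists only of ASCII digits.
func isDigits(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		if r < '0' || r > '9' {
			return false
		}
	}
	return true
}

// checkElem applies the per-element rules listed above.
func checkElem(elem string) error {
	if elem == "" {
		return fmt.Errorf("empty path element")
	}
	for _, r := range elem {
		if !isImportPathChar(r) {
			return fmt.Errorf("invalid char %q", r)
		}
	}
	if strings.HasSuffix(elem, ".") {
		return fmt.Errorf("trailing dot in %q", elem)
	}
	if strings.Contains(elem, "..") {
		return fmt.Errorf("double dot in %q", elem)
	}
	// The prefix up to the first dot must not be a reserved Windows name.
	short := elem
	if i := strings.Index(elem, "."); i >= 0 {
		short = elem[:i]
	}
	for _, bad := range reservedWindows {
		if strings.EqualFold(short, bad) {
			return fmt.Errorf("%q is reserved on Windows", short)
		}
	}
	// Reject a suffix of a tilde followed by digits (Windows short-name style).
	if i := strings.LastIndex(elem, "~"); i >= 0 && isDigits(elem[i+1:]) {
		return fmt.Errorf("%q looks like a Windows short name", elem)
	}
	return nil
}

// checkImportPath splits on slashes; a leading or trailing slash shows up
// as an empty element and is rejected by checkElem.
func checkImportPath(path string) error {
	for _, elem := range strings.Split(path, "/") {
		if err := checkElem(elem); err != nil {
			return fmt.Errorf("malformed import path %q: %v", path, err)
		}
	}
	return nil
}

func main() {
	for _, p := range []string{"example.com/foo/bar", "example.com/con.txt", "example.com/题目"} {
		fmt.Println(p, "->", checkImportPath(p))
	}
}
```

Under these rules a path element such as 题目 is rejected by the character check alone, which is the restriction this issue proposes to revisit.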

Module paths have the same restrictions as import paths, with additional constraints (see module.CheckPath):

  • The first path element (by convention, a domain name) must consist only of lower-case ASCII letters, ASCII digits, dots, and dashes. It must contain at least one dot and must not start with a dash.
  • If the path ends with /vN where N consists of ASCII digits and dots, N must not begin with 0, must not be 1, and must not contain any dots (there's a separate special case for gopkg.in/... module paths).
  • No path element may begin with a dot.
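
The additional module path constraints can be sketched the same way. This is a hedged illustration based only on the three bullets above, not the real module.CheckPath: it skips the shared import path rules, it does not implement the gopkg.in special case, and the function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// checkFirstElem checks the first path element (by convention a domain name).
func checkFirstElem(first string) error {
	if !strings.Contains(first, ".") {
		return fmt.Errorf("missing dot in first path element %q", first)
	}
	if strings.HasPrefix(first, "-") {
		return fmt.Errorf("leading dash in first path element %q", first)
	}
	for _, r := range first {
		ok := 'a' <= r && r <= 'z' || '0' <= r && r <= '9' || r == '.' || r == '-'
		if !ok {
			return fmt.Errorf("invalid char %q in first path element", r)
		}
	}
	return nil
}

// checkMajorSuffix checks a trailing /vN where N consists of digits and dots.
// Paths without such a suffix are accepted unchanged.
func checkMajorSuffix(path string) error {
	i := strings.LastIndex(path, "/v")
	if i < 0 {
		return nil
	}
	n := path[i+2:]
	if n == "" {
		return nil
	}
	for _, r := range n {
		if (r < '0' || r > '9') && r != '.' {
			return nil // the last element is not of the form vN
		}
	}
	if n[0] == '0' || n == "1" || strings.Contains(n, ".") {
		return fmt.Errorf("invalid major version suffix %q", "v"+n)
	}
	return nil
}

// checkModulePathExtras applies only the module-specific rules.
func checkModulePathExtras(path string) error {
	elems := strings.Split(path, "/")
	if err := checkFirstElem(elems[0]); err != nil {
		return err
	}
	for _, e := range elems {
		if strings.HasPrefix(e, ".") {
			return fmt.Errorf("leading dot in path element %q", e)
		}
	}
	return checkMajorSuffix(path)
}

func main() {
	for _, p := range []string{"example.com/mod/v2", "example.com/mod/v1", "Example.com/mod"} {
		fmt.Println(p, "->", checkModulePathExtras(p))
	}
}
```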

File paths have the same restrictions as import paths, but the set of allowed characters is larger (see module.CheckFilePath):

  • Path elements may consist of Unicode letters, ASCII digits, ASCII spaces, and ASCII punctuation characters ! # $ % & ( ) + , - . = @ [ ] ^ _ { } ~. The remaining ASCII punctuation characters " * < > ? ` ' | / \ : are excluded.
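
For comparison, here is a minimal sketch of the wider file path character set, using only the allowed list given in the bullet above. It checks characters only and leaves out the other rules shared with import paths; fileElemCharOK and checkFilePathChars are hypothetical names, not functions from the module package.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// allowedPunct is the ASCII punctuation listed as allowed in file path elements.
const allowedPunct = "!#$%&()+,-.=@[]^_{}~"

// fileElemCharOK reports whether r may appear in a file path element.
func fileElemCharOK(r rune) bool {
	if r < utf8.RuneSelf { // ASCII range
		switch {
		case 'a' <= r && r <= 'z', 'A' <= r && r <= 'Z', '0' <= r && r <= '9', r == ' ':
			return true
		}
		return strings.ContainsRune(allowedPunct, r)
	}
	// Outside ASCII, only Unicode letters are allowed.
	return unicode.IsLetter(r)
}

// checkFilePathChars checks every character of every slash-separated element.
func checkFilePathChars(path string) error {
	for _, elem := range strings.Split(path, "/") {
		for _, r := range elem {
			if !fileElemCharOK(r) {
				return fmt.Errorf("malformed file path %q: invalid char %q", path, r)
			}
		}
	}
	return nil
}

func main() {
	for _, p := range []string{"docs/题目.txt", "docker/node:alpine", "test/😻.txt"} {
		fmt.Println(p, "->", checkFilePathChars(p))
	}
}
```

Note the asymmetry this exposes: a Unicode letter such as 题 passes the file path check but not the import path check, while an emoji is a symbol rather than a letter and fails both.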

These restrictions are generally in place for good reasons (see Unicode restrictions):

  • Module paths are frequently written and encoded into URLs, and we don't want to allow strings that interfere with that (for example, non-ASCII domain names).
  • Module contents are extracted into directories on a variety of systems. We don't want to allow strings that aren't valid file names or might collide with a different string (on case-insensitive or Unicode normalizing systems). We don't want to allow strings that are reserved, might be interpreted by the shell, might be interpreted as a flag (starting with -), or might be interpreted as a repository (.git).

That being said, these restrictions are more English-centric than necessary (#45507). They're also more restrictive than GOPATH (#29101).

We should come up with a wider set of characters that may be allowed without causing compatibility problems, particularly for import and file paths.

cc @bcmills @matloob

@duolabmeng6

Please support Chinese characters

@ddbxyrj

ddbxyrj commented Jan 20, 2022

For cultural diversity, maybe we should take more Unicode types into consideration.

@golang golang deleted a comment from yangyile1990 Mar 16, 2022
MawKKe added a commit to MawKKe/audiobook-split-ffmpeg-go that referenced this issue Mar 31, 2022
The file in question is not a Go file, but a file for testing. The
filename has quotes in it, causing error during install:

$ go install github.com/MawKKe/audiobook-split-ffmpeg-go/cmd/audiobook-split-ffmpeg@latest
go: github.com/MawKKe/audiobook-split-ffmpeg-go/cmd/audiobook-split-ffmpeg@latest:
create zip: test/beep with spaces and some' quotes" in name.m4a:
malformed file path "test/beep with spaces and some' quotes\" in
name.m4a": invalid char '\''

Perhaps these are related?
- golang/go#50396
- golang/go#45549

Idk, life is too short for dealing with shitty tooling...
@FiloSottile
Contributor

Related: the handling of punycode domains. #20210

@FiloSottile
Contributor

Also related, the conclusion that it's up to review tooling to keep homoglyph or LTR/RTL attacks at bay. https://research.swtch.com/trojan

@sxin0

sxin0 commented Dec 29, 2022

Please support Chinese characters

@FiloSottile
Contributor

Also related, #44970 discusses spec interactions.

@yzzd

yzzd commented Mar 29, 2023

Please support Chinese characters

go1.15.15 (this version works normally; later versions report errors)

@ShaharSep

Proposal: skip checking resource file names
For example, the package "github.com/google/wuffs" contains a file named 😻.txt.
The file is not part of the module, but a resource used for tests.
Its path is within Unicode standards.
I would like to think the rules can be more flexible here ;)

@yangyile1990

When I used Go 1.15 without go.mod, I could name my Go package "ACM题目小马过河".

But after I started using go.mod with go1.20 or go1.21, it says this is not supported.

For me, "ACM题目小马过河" is easier to understand than "ACM topic Pony Crossing the River".

So I think it's important to support native languages.

If you think this could cause problems, you could add a flag such as "support_native_language". When I enable it, my package may never become popular, but it is only for fun.

@yzzd

yzzd commented Sep 9, 2023 via email

@SgtCoDFish

SgtCoDFish commented Mar 11, 2024

Since #66243 was closed as a dupe of this issue, it's worth pointing out here that this issue seems to break the Go Sum DB. As an example, https://sum.golang.org/lookup/github.com/!doppler!h!q/cli@v0.5.9 currently has the following output:

not found: create zip: docker/node:alpine: malformed file path "docker/node:alpine": invalid char ':'
docker/python:alpine: malformed file path "docker/python:alpine": invalid char ':'
docker/ruby:alpine: malformed file path "docker/ruby:alpine": invalid char ':'

This seems to be because the repo contains files with colons in their names.

(It seems like maybe a separate bug that the Go sum DB prints errors like that as output)
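
As background for the lookup URL above: the `!doppler!h!q` element is the case-escaped form that the module proxy and sum database use, in which each upper-case ASCII letter is replaced by `!` followed by its lower-case form so that paths stay distinct on case-insensitive file systems. A minimal sketch of that escaping follows; escapeCase is a hypothetical helper (the real implementation is module.EscapePath, which also validates the path), and the input github.com/DopplerHQ/cli is inferred from the escaped URL rather than stated in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeCase replaces each upper-case ASCII letter with '!' plus its
// lower-case form. Unlike module.EscapePath, it performs no validation.
func escapeCase(s string) string {
	var b strings.Builder
	for _, r := range s {
		if 'A' <= r && r <= 'Z' {
			b.WriteByte('!')
			b.WriteRune(r + 'a' - 'A')
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	// Assumed original module path, inferred from the escaped form.
	fmt.Println(escapeCase("github.com/DopplerHQ/cli"))
}
```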
