Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/go: unzip github.com/bep/golibsass@v0.1.0 fails with invalid char '́' (U+0301) #37139

Closed
bep opened this issue Feb 8, 2020 · 10 comments
Closed
Labels
FrozenDueToAge GoCommand cmd/go modules NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Unfortunate
Milestone

Comments

@bep
Copy link
Contributor

bep commented Feb 8, 2020

Embedding https://github.com/sass/libsass as a Git subtree in a Go Module fails:

-> unzip /Users/bep/go/pkg/mod/cache/download/github.com/bep/golibsass/@v/v0.1.0.zip: malformed file path "libsass_src/test/e2e/unicode-pwd/Sáss-UŢF8/input.scss": invalid char '́'
go get github.com/bep/golibsass: unzip /Users/bep/go/pkg/mod/cache/download/github.com/bep/golibsass/@v/v0.1.0.zip: malformed file path "libsass_src/test/e2e/unicode-pwd/Sáss-UŢF8/input.scss": invalid char '́'

See sass/libsass#3055 -- where some maintainer claims that the path in question is valid Unicode.

go version go1.13.7 darwin/amd64
go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/bep/Library/Caches/go-build"
GOENV="/Users/bep/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/bep/go"
GOPRIVATE=""
GOPROXY="direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/bep/dev/go/bep/golibsass/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/wy/g1rt5g3s5bd9flpq8g1741740000gn/T/go-build126617397=/tmp/go-build -gno-record-gcc-switches -fno-common"
@dmitshur dmitshur added GoCommand cmd/go modules NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Feb 8, 2020
@dmitshur
Copy link
Contributor

dmitshur commented Feb 8, 2020

/cc @matloob @jayconrod @bcmills

@dmitshur dmitshur changed the title mod: Unzip: invalid char '́' cmd/go: unzip github.com/bep/golibsass@v0.1.0 fails with invalid char '́' Feb 8, 2020
@dmitshur
Copy link
Contributor

dmitshur commented Feb 8, 2020

Reproduce steps:

$ go version
go version go1.14rc1 darwin/amd64
$ export GO111MODULE=on                          
$ export GOPATH=$(mktemp -d)
$ export GOPROXY=direct
$ export GOSUMDB=off 
$ go mod download github.com/bep/golibsass@v0.1.0
create zip: malformed file path "libsass_src/test/e2e/unicode-pwd/Sáss-UŢF8/input.scss": invalid char '́'
$ 

@dmitshur dmitshur added this to the Backlog milestone Feb 8, 2020
@dmitshur
Copy link
Contributor

dmitshur commented Feb 8, 2020

The error is coming from module.CheckFilePath:

$ goexec -quiet 'fmt.Println(module.CheckFilePath("libsass_src/test/e2e/unicode-pwd/Sáss-UŢF8/input.scss"))'
malformed file path "libsass_src/test/e2e/unicode-pwd/Sáss-UŢF8/input.scss": invalid char '́'

Its documentation says:

CheckFilePath checks that a slash-separated file path is valid.
The definition of a valid file path is the same as the definition
of a valid import path except that the set of allowed characters is larger:
all Unicode letters, ASCII digits, the ASCII space character (U+0020),
and the ASCII punctuation characters
“!#$%&()+,-.=@[]^_{}~”.
(The excluded punctuation characters, " * < > ? ` ' | / \ and :,
have special meanings in certain shells or operating systems.)

CheckFilePath may be less restrictive in the future, but see the
top-level package documentation for additional information about
subtleties of Unicode.

The Unicode Restrictions section is likely relevant.

@dmitshur

This comment has been minimized.

@dmitshur

This comment has been minimized.

@dmitshur
Copy link
Contributor

dmitshur commented Feb 8, 2020

I made mistakes in the two comments above, so disregard them.

CheckFilePath is documented and implemented to accept all Unicode letters. The problem is that one of the runes in the string "Sáss-UŢF8" is determined not to be a Unicode letter. See https://play.golang.org/p/7l4Je9wHcNY, which prints:

checking string "Sáss-UŢF8" rune by rune

checking U+0053 ('S') ... character ok
checking U+0061 ('a') ... character ok
checking U+0301 ('́') ... invalid char '́' because rune U+0301 ('́') is not a Unicode letter
checking U+0073 ('s') ... character ok
checking U+0073 ('s') ... character ok
checking U+002D ('-') ... character ok
checking U+0055 ('U') ... character ok
checking U+0054 ('T') ... character ok
checking U+0327 ('̧') ... invalid char '̧' because rune U+0327 ('̧') is not a Unicode letter
checking U+0046 ('F') ... character ok
checking U+0038 ('8') ... character ok

I suspect this has to do with Unicode normalization and Unicode combining characters.

Leaving this for people familiar with cmd/go to decide what to do next here.

@dmitshur dmitshur changed the title cmd/go: unzip github.com/bep/golibsass@v0.1.0 fails with invalid char '́' cmd/go: unzip github.com/bep/golibsass@v0.1.0 fails with invalid char '́' (U+0301) Feb 8, 2020
@jayconrod
Copy link
Contributor

cc @matloob @bcmills

Ideally, this seems like a string we should accept. Though we'll have to be careful about file system Unicode normalization (I think HFS+ does this).

There are a couple other open issues about path validation (#29101, #28389, #36774, #31376, probably more). We should come up with a plan to fix all of these before making changes.

@bcmills
Copy link
Contributor

bcmills commented Feb 10, 2020

Sáss-UŢF8 indeed is “valid Unicode”, but the go command intentionally rejects some paths that are valid unicode so that (for example) module proxies can extract and serve the same file contents (without filename collisions) regardless of the underlying filesystem. HFS+ in particular normalizes to NFD, whereas many other environments tend to use NFC.

(The Sáss-UŢF8 variant should be accepted, since it uses only letters: https://play.golang.org/p/3v35DpZC3v0)

@bcmills
Copy link
Contributor

bcmills commented Feb 10, 2020

Ideally, this seems like a string we should accept.

Maybe? But we would need to tighten the duplicate-path validation, I think. At the moment we reject case-equivalent paths in module contents, but we would need to also check for normalization-equivalent paths.

@bcmills
Copy link
Contributor

bcmills commented Feb 10, 2020

Oh, neat! The libsass source tree intentionally includes what appears to be a pair of normalization-equivalent subdirectories:
https://github.com/sass/libsass/tree/28bbd0e7c2eda384912242213b767bf310bea1bb/test/e2e/unicode-pwd

So, yeah: that part of the source tree can never feasibly be included in a Go module.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge GoCommand cmd/go modules NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Unfortunate
Projects
None yet
Development

No branches or pull requests

5 participants