Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: NFD-normalized unicode identifiers result in malformed error messages #33271

Closed
niaow opened this issue Jul 25, 2019 · 7 comments
Closed
Labels
FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Milestone

Comments

@niaow
Copy link

niaow commented Jul 25, 2019

What version of Go are you using (go version)?

$ go version
go version go1.12.7 linux/amd64

Does this issue reproduce with the latest release?

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/jadenw/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/jadenw/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/lib/go"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build217674304=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I put in an NFD-normalized unicode identifier.

Smallest reproducible playground: https://play.golang.org/p/wf6_glAAIRP

Internally, the ä is encoded as two unicode code points rather than one. If it is NFC normalized (one code point), it works. The spec is fairly vague as to how it should handle NFD unicode.

What did you expect to see?

Either:

  1. it compiles
  2. a properly formatted error message saying why it does not work

What did you see instead?

prog.go:3:7: illegal character U+0308 '̈' (and 1 more errors)

On some text renderers, the combining mark actually merges into the quote at the end of the error message.

Solutions

First, the spec should probably be clarified as to how it processes multi-codepoint letters.
Additionally, combining marks should probably not be written into the error message in raw form.

@robpike
Copy link
Contributor

robpike commented Jul 25, 2019

I think the spec is pretty clear.

identifier = letter { letter | unicode_digit } .

and

letter        = unicode_letter | "_" .

and

unicode_letter = /* a Unicode code point classified as "Letter" */ .

What is vague?

There are outstanding issues to broaden the definition of identifiers, but as it stands the spec quite clearly does not include combining characters.

@niaow
Copy link
Author

niaow commented Jul 25, 2019

Ok. Yeah I misread that originally, missing the definition of unicode_letter. Then really the error message should just be fixed so it doesn't combine, and that would be it.

@niaow niaow changed the title cmd/compile: NFD-normalized unicode identifiers cmd/compile: NFD-normalized unicode identifiers result in malformed error messages Jul 25, 2019
@zerkms
Copy link

zerkms commented Jul 25, 2019

Then really the error message should just be fixed so it doesn't combine, and that would be it.

I'm not following why just the error message should be fixed: why that code is illegal?

The spec probably should switch from using the unicode letter term to unicode character

@ALTree
Copy link
Member

ALTree commented Jul 25, 2019

The spec probably should switch from using the unicode letter term to unicode character

@zerkms For combining, that's #20706, which is being considered for Go2.

@niaow
Copy link
Author

niaow commented Jul 26, 2019

For reference, this issue was not a feature request for that. When I originally did this what I considered a bug was the error message not properly protecting the use of the combining mark - such that it composes with the quotes on compatible text renders, as seen at the end of this error (rendered in atom):
image
As can be seen in the screenshot, a combining mark combined with the second single-quote in the error message.

The reason I had referenced the spec was I was checking what the appropriate definition and missed the unicode_letter definition.

As of now, I am not very well acquainted with go's parser, and have not yet pinned down the creation point of this error. Later (probably after gophercon) I can try to read through and make a fix.

In order to see how other systems handled this, I looked at the Wikipedia page on combining characters. Wikipedia prepends a dotted circle (U+25CC) to the combining mark in order to handle the combination. This also indicates visually that it is meant to be a combining mark, as the dotted circle can be used to represent the absence of a base character. Would combining the mark onto the dotted circle be an appropriate way to render the error? The error from the beginning of the thread would render as:

prog.go:3:7: illegal character U+0308 '◌̈' (and 1 more errors)

@ALTree ALTree added this to the Unplanned milestone Jul 26, 2019
@ALTree ALTree added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Jul 26, 2019
@ALTree
Copy link
Member

ALTree commented Jul 26, 2019

@jadr2ddude Thanks for clarifying. What you are asking makes sense, but I'm not sure "combining characters in identifiers" is a common enough mistake to warrant special handling in the displaying of error messages.

Moreover, it seems that applying the combining character to a is not enough to ensure the error message is correctly displayed everywhere; for example your message above, on my machine, is rendered as:

combining

Also cc @griesemer

@niaow
Copy link
Author

niaow commented Jul 27, 2019

Eh, this isnt too critical and the problem might not be worth the fix.

@niaow niaow closed this as completed Jul 27, 2019
@golang golang locked and limited conversation to collaborators Jul 26, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Projects
None yet
Development

No branches or pull requests

5 participants