go/token: unpack() doesn't handle unicode properly #34322
Comments
There is no clear definition of a column for multilingual text, with variable-pitch fonts, or in the presence of control or other invisible characters. The number presented in the error message is a byte offset, not a column, and is poorly documented. The documentation (
The error message you are seeing is from cmd/compile, not go/parser, although go/parser gives a similar message. Additionally, it's unclear from your description what a correct column count should be. Behavior here is inconsistent across languages and compilers. One could argue that code points are not meaningful enough to identify a column offset, and that extended grapheme clusters would be better. It's unclear which is better, but at the least a byte offset is unambiguous to tools, especially in the case of a UTF-8 encoded source file. Let me know if this makes sense to you. I will leave you with an error message comparison from some sibling languages. Compare the column/caret positions given by Clang 9, GCC 7, Python 2, Python 3, and Swift 4:
After digging deeper into Unicode, I agree that there is no canonical way to assign a width to a string, and the issue is thus unsolvable. Given that many tool integrations do not get this right, though, there seem to be a lot of bugs to report on their side.
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
go env Output
What did you do?
I used the following test file:
https://play.golang.org/p/g5qvYsFx9lr
What did you expect to see?
The compiler should have produced:
What did you see instead?
Compiling it produces:
Note the different column where the error is claimed to occur. Also note that the test file does not even have 51 columns.
What is the root cause of this?
The column number is calculated in https://golang.org/src/go/token/position.go, function unpack(), line 303. This code implicitly assumes that every column is exactly one byte, which is not the case with Unicode.
Examples: