Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/gofmt: incorrect comment alignment with double-width runes #7481

Closed
gopherbot opened this issue Mar 6, 2014 · 13 comments
Closed

cmd/gofmt: incorrect comment alignment with double-width runes #7481

gopherbot opened this issue Mar 6, 2014 · 13 comments

Comments

@gopherbot
Copy link

by travis.cardwell:

The gofmt command does not take into account the width of runes when calculating the
alignment for comments.  I ran into the issue in a program that has rune constants, some
of which are single-width and some of which are double-width, but the issue is more
apparent in the following (contrived) example:

What does 'go version' print?

$ go version
go version go1.2.1 linux/amd64

What steps will reproduce the problem?

$ cat fmttest.go
package fmttest

const (
    msgEn = "This is a test."    // English
    msgJa = "これはテストです。" // Japanese
)

$ gofmt -d fmttest.go
diff fmttest.go gofmt/fmttest.go
--- /tmp/gofmt819271649 2014-03-06 14:13:28.000000000 +0900
+++ /tmp/gofmt002870220 2014-03-06 14:13:28.000000000 +0900
@@ -1,6 +1,6 @@
 package fmttest
 
 const (
-   msgEn = "This is a test."    // English
-   msgJa = "これはテストです。" // Japanese
+   msgEn = "This is a test." // English
+   msgJa = "これはテストです。"       // Japanese
 )

The issue is not clear in this web interface [with my font settings] because spaces do
not have the same constant width as non-spaces, but it is clear if you copy the above
example into a file and run the gofmt command in a terminal.

What happened?

The alignment for comments is calculated based on the maximum width of code in the
block.  Currently, all runes are assumed to have a width of 1.  With tabwidth=8, the
width of the msgEn code is 33 columns, and the width of the msgJa code is incorrectly
calculated as 27 columns.  Comments are therefore calculated to start on column 35.  The
msgEn comment is offset from the code by one space, and the msgJa comment is offset from
the code by 7 spaces.  The result is not aligned properly.  The original alignment was
expected to be correct.

What should have happened instead?

Instead, the width of runes should be taken into account when calculating alignment. 
With tabwidth=8, the width of the msgJa code should be calculated as 36 columns, as all
9 runes in the string are double-width.  Comments should then be calculated to start on
column 38, resulting in the original alignment.

Please provide any additional information below.

Note that the length of a rune (in bytes) is unrelated to the width of a rune (in
columns).  When testing, I highly recommend including troublesome cases.  For example,
the following rune is commonly used in Japanese yet causes issues in some software:

http://decodeunicode.org/u+203B
@minux
Copy link
Member

minux commented Mar 6, 2014

Comment 1:

I don't think we can solve this problem without introducing megabytes of data
to go/printer, not to mention the problem of runes with ambiguous widths.

@gopherbot
Copy link
Author

Comment 2 by travis.cardwell:

Vim does a great job with calculating widths, though it is not perfect in dealing with
ambiguous characters.  I just retrieved the source from [1] and found the relevant code
in "src/mbyte.c" /^utf_char2cells/.  Intervals are used to identify double-width and
ambiguous characters, so not too much data is required.
[1] http://www.vim.org/sources.php

@griesemer
Copy link
Contributor

Comment 3:

The problem is not so much in gofmt (or go/printer) but in text/tabwriter, which is the
package dealing with alignment. As has been pointed out in #1, doing this correctly
assumes detailed knowledge of font sizes which was an explicit non-goal.
There may be heuristics that work reasonably well (#2), but it won't solve the problem
for people using proportional fonts for programming (which would be the right thing to
do for readability).
But even if this information were available, formatting it for one person using one
editor and font does not help another person using a different editor and font.
gofmt used to have a mode where all alignment was indicated with tabs (as one would do
on a mechanical type writer, with the tabs properly set for each section), and the idea
was that perhaps at some point an editor (or editor "plug-in") would come along that
would do the job that text/tabwriter is doing now. Having such an editor would make code
look properly aligned no matter what font or character one is using.
Albeit, we're not yet in the 21st century proper when it comes to source code editing.
Closing as unfortunate.

Status changed to Unfortunate.

@clausecker
Copy link

Actually, the size of characters in a generally mono-spaced environment is independent of font and well-defined. There are half-width characters that occupy one column and full-width characters that occupy two columns. See this report for details. Issue #8273 mentions some strategies to implement. The corresponding tables aren't that large either as full-width characters appear in long, uninterrupted blocks.

@minux
Copy link
Member

minux commented Jan 4, 2015

what about ambiguous characters?

@clausecker
Copy link

@minux Ambiguous characters have a size that depends on whether they appear in a full-width or half-width “context.” It isn't really clear to me what is meant with “context” here, but in all environments I tested, ambiguous characters always come out as occupying only one column. I suspect that some legacy encodings allowed to select whether these ambiguous characters should occupy one column or two. For the sake of supporting full-width characters, it seems sensible to make ambiguous characters behave as occupying only one column.

@clausecker
Copy link

Please let me notice, that go fmt also does not correctly account for combining accents. A quick fix would be to count characters of the Unicode class Mn and Me as 0 columns.

@minux
Copy link
Member

minux commented Jan 4, 2015

At least vim has an option to set the default width of ambiguous
characters [0], so I don't think every software treat them as single
width. Obviously, this depends on the font you're using, and not
entirely on the context, so the problem is not that easy.

[0] http://vimdoc.sourceforge.net/htmldoc/options.html#'ambiwidth'

@clausecker
Copy link

@minux I believe this setting is rarely used / provided for compatibility with old systems only. Perhaps someone with more experience with CJK terminal applications could chime in?

@minux
Copy link
Member

minux commented Jan 4, 2015

No. Nearly every terminal has a similar setting for that.
(Terminal.app has a "Unicode East Asian Ambiguous
characters as wide" option, GNU Screen has a cjkwidth
option, xterm has a cjk_width option, gnome-terminal
(vte) has VTE_CJK_WIDTH, etc.)

For example, this em-dash \u2014 character, is one of
the problematic characters, depending on your font, it
could be single or double width.

I think the option is quite popular with Japanese users
or any user who uses CJK font as the default terminal
monospace font.

@clausecker
Copy link

@minux Hm... This sounds like a situation. Any way, the current situation of treating every character as being one column wide is not acceptable. This does things wrong for either setting.

@mattn
Copy link
Member

mattn commented Jan 5, 2015

go-runewidth detect locale automatically. And switching width of the characters for ambiguous width.

https://github.com/mattn/go-runewidth/blob/master/runewidth_posix.go#L13

@griesemer
Copy link
Contributor

This is essentially a duplicate of #8273 .

@golang golang locked and limited conversation to collaborators Jun 25, 2016
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants