Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: character position vs line comments #22662

Closed
rsc opened this issue Nov 10, 2017 · 12 comments
Closed

cmd/compile: character position vs line comments #22662

rsc opened this issue Nov 10, 2017 · 12 comments
Labels
early-in-cycle A change that should be done early in the 3 month dev cycle. FrozenDueToAge
Milestone

Comments

@rsc
Copy link
Contributor

rsc commented Nov 10, 2017

If the compiler reads //line xxx:10, it understands the next line in the source file to be attributed to file xxx line 10, but it also disables reporting of character position information, because often generated code does not preserve character positions when converting from the original file to the generated code. But sometimes a generator can preserve that information, and it would be nice for our tools to be able to present it.

I suggest that we allow /*line xxx:10:20*/, where the xxx may be omitted if the file name has not changed since the last comment, to declare that the very next byte after that comment is :10:20, and character position should be tracked and reported until the next line comment. Similarly, //line xxx:10:20 would say that the next line begins at character position 20. It's important to have a /* */ comment form in order to be able to introduce a line annotation without introducing an implicit semicolon.

With this and some pending CLs I have for cmd/cover and cmd/cgo, the errors reported by the compiler would be exactly as precise when using coverage as when not using coverage, and the same for cgo.

Also, because the character position would be precise, using cgo-preprocessed sources in source code refactoring tools would no longer prevent those tools from modifying the original sources.

This is orthogonal to #16623 (passing unmodified cgo sources to the compiler and perhaps also go/types), since it would still be very helpful for cover and other converters (we can't put them all in the compiler and go/types).

@rsc rsc added this to the Go1.11 milestone Nov 10, 2017
@rsc
Copy link
Contributor Author

rsc commented Nov 10, 2017

@griesemer
Copy link
Contributor

One thing to consider is the impact of gofmt. For instance, it could make sense to define /*line file:line:col */ as the position of the first Go token following it (rather than the byte immediately following).

Since the /*line ... */ annotations are (likely) produced by a tool, and that tool is using positions of existing tokens, it might make those positions more stable in the presence of minor gofmt formatting changes (extra white space on the same line).

@griesemer griesemer self-assigned this Dec 20, 2017
@gopherbot
Copy link

Change https://golang.org/cl/86037 mentions this issue: cmd/compile/internal/syntax: permit /*line file:line:col*/ directives

@griesemer griesemer added the early-in-cycle A change that should be done early in the 3 month dev cycle. label Jan 4, 2018
@griesemer
Copy link
Contributor

@rsc At least as specified, there is a fundamental problem on Windows because filenames may contain a drive letter followed by a :. For instance with

//line c:/foo/bar.go:987

(from test/fixedbugs/issue18149.go) the pragma processor assumes that we have a filename c, followed by an invalid line number /foo/bar.go and a column number 987 and parsing will fail.

For line directives with an invalid line number specification one could simply assume that the line number section is the filename; with such a work-around the above example would work. But file names may be numbers, and then the work-around falls flat:

//line c:1234:567

Is this a line directive with filename c, line 1234, and column 567; or is it file c:1234 with line 567 and no column specification?

Without optional column specification we don't have this problem: Currently, a line directive must have a :, and the lexer looks for the last : to determine the break between file and line specification. With optional columns the notation becomes ambiguous on any file system that permits : in filenames.

There's a several ways out of the dilemma: 1) introduce some form of escape mechanism for any : in a filename; 2) use a character other than : as separator and which is not permitted in filenames; or 3) use a different line directive syntax altogether (see below). There may be other options.

Here is an alternative format that doesn't have the problem. Borrowing from C's line directive, we could use the forms:

//line <line>
//line <line> <file>
//line <line> <column> <file>

where <line>, <file> and <column> are blank-separated line number, file name, and column number, respectively. If only the line (and/or column) number changes, the file can be left away many times (often after the first //line directive). If we don't want to change the line or column number, but change the file, we could designate the invalid line/column values 0 as "skip" values. And finally, because the line-only form would have to be a valid number that cannot contain a :, we could distinguish these forms from the current line directive, allowing for them to co-exist.

But at least on OS X, file names can have blanks in them, so some suitable escape mechanism will still be needed.

@ianlancetaylor
Copy link
Contributor

In C's #line directive the filename is be a string constant. That handles spaces and unusual characters in a predictable way. We could do the same, but make the string constant optional.

//line "c:/foo/bar.go":987

If the quotes are there, we apply the usual string constant escape handling.

@griesemer
Copy link
Contributor

@ianlancetaylor That's not backward-compatible with what we have now: Existing directive texts with two :'s where the first one belongs to the filename would still be misinterpreted. I don't think we can require people to rewrite those.

I think any solution requires some kind of format change.

@griesemer
Copy link
Contributor

One could require the filename to always be a string constant (or absent) if a column is specified. Otherwise it's the existing format (with the filename optionally being a string constant).

gopherbot pushed a commit that referenced this issue Feb 12, 2018
R=go1.11

This implements parsing of /*line file:line*/ and /*line file:line:col*/
directives and also extends the optional column format to regular //line
directives, per #22662.

For a line directive to be recognized, its comment text must start with
the prefix "line " which is followed by one of the following:

:line
:line:col
filename:line
filename:line:col

with at least one : present. The line and col values must be unsigned
decimal integers; everything before is considered part of the filename.

Valid line directives are:

//line :123
//line :123:8
//line foo.go:123
//line C:foo.go:123	(filename is "C:foo.go")
//line C:foo.go:123:8	(filename is "C:foo.go")
/*line ::123*/		(filename is ":")

No matter the comment format, at the moment all directives act as if
they were in //line comments, and column information is ignored.
To be addressed in subsequent CLs.

For #22662.

Change-Id: I1a2dc54bacc94bc6cdedc5229ee13278971f314e
Reviewed-on: https://go-review.googlesource.com/86037
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
@gopherbot
Copy link

Change https://golang.org/cl/94515 mentions this issue: cmd/compile/internal/syntax: remove dependency on cmd/internal/src

@gopherbot
Copy link

Change https://golang.org/cl/96476 mentions this issue: cmd/compile/internal/syntax: implement //line :line:col handling

@gopherbot
Copy link

Change https://golang.org/cl/96535 mentions this issue: cmd/compile: track line directives w/ column information

gopherbot pushed a commit that referenced this issue Feb 26, 2018
For dependency reasons, the data structure implementing source
positions in the compiler is in cmd/internal/src. It contains
highly compiler specific details (e.g. inlining index).

This change introduces a parallel but simpler position
representation, defined in the syntax package, which removes
that package's dependency on cmd/internal/src, and also removes
the need to deal with certain filename-specific operations
(defined by the needs of the compiler) in the syntax package.
As a result, the syntax package becomes again a compiler-
independent, stand-alone package that at some point might
replace (or augment) the existing top-level go/* syntax-related
packages.

Additionally, line directives that update column numbers
are now correctly tracked through the syntax package, with
additional tests added. (The respective changes also need to
be made in cmd/internal/src; i.e., the compiler accepts but
still ignores column numbers in line directives.)

This change comes at the cost of a new position translation
step, but that step is cheap because it only needs to do real
work if the position base changed (i.e., if there is a new file,
or new line directive).

There is no noticeable impact on overall compiler performance
measured with `compilebench -count 5 -alloc`:

name       old time/op       new time/op       delta
Template         220ms ± 8%        228ms ±18%    ~     (p=0.548 n=5+5)
Unicode          119ms ±11%        113ms ± 5%    ~     (p=0.056 n=5+5)
GoTypes          684ms ± 6%        677ms ± 3%    ~     (p=0.841 n=5+5)
Compiler         3.19s ± 7%        3.01s ± 1%    ~     (p=0.095 n=5+5)
SSA              7.92s ± 8%        7.79s ± 1%    ~     (p=0.690 n=5+5)
Flate            141ms ± 7%        139ms ± 4%    ~     (p=0.548 n=5+5)
GoParser         173ms ±12%        171ms ± 4%    ~     (p=1.000 n=5+5)
Reflect          417ms ± 5%        411ms ± 3%    ~     (p=0.548 n=5+5)
Tar              205ms ± 5%        198ms ± 2%    ~     (p=0.690 n=5+5)
XML              232ms ± 4%        229ms ± 4%    ~     (p=0.690 n=5+5)
StdCmd           28.7s ± 5%        28.2s ± 2%    ~     (p=0.421 n=5+5)

name       old user-time/op  new user-time/op  delta
Template         269ms ± 4%        265ms ± 5%    ~     (p=0.421 n=5+5)
Unicode          153ms ± 7%        149ms ± 3%    ~     (p=0.841 n=5+5)
GoTypes          850ms ± 7%        862ms ± 4%    ~     (p=0.841 n=5+5)
Compiler         4.01s ± 5%        3.86s ± 0%    ~     (p=0.190 n=5+4)
SSA              10.9s ± 4%        10.8s ± 2%    ~     (p=0.548 n=5+5)
Flate            166ms ± 7%        167ms ± 6%    ~     (p=1.000 n=5+5)
GoParser         204ms ± 8%        206ms ± 7%    ~     (p=0.841 n=5+5)
Reflect          514ms ± 5%        508ms ± 4%    ~     (p=0.548 n=5+5)
Tar              245ms ± 6%        244ms ± 3%    ~     (p=0.690 n=5+5)
XML              280ms ± 4%        278ms ± 4%    ~     (p=0.841 n=5+5)

name       old alloc/op      new alloc/op      delta
Template        37.9MB ± 0%       37.9MB ± 0%    ~     (p=0.841 n=5+5)
Unicode         28.8MB ± 0%       28.8MB ± 0%    ~     (p=0.841 n=5+5)
GoTypes          113MB ± 0%        113MB ± 0%    ~     (p=0.151 n=5+5)
Compiler         468MB ± 0%        468MB ± 0%  -0.01%  (p=0.032 n=5+5)
SSA             1.50GB ± 0%       1.50GB ± 0%    ~     (p=0.548 n=5+5)
Flate           24.4MB ± 0%       24.4MB ± 0%    ~     (p=1.000 n=5+5)
GoParser        30.7MB ± 0%       30.7MB ± 0%    ~     (p=1.000 n=5+5)
Reflect         76.5MB ± 0%       76.5MB ± 0%    ~     (p=0.548 n=5+5)
Tar             38.9MB ± 0%       38.9MB ± 0%    ~     (p=0.222 n=5+5)
XML             41.6MB ± 0%       41.6MB ± 0%    ~     (p=0.548 n=5+5)

name       old allocs/op     new allocs/op     delta
Template          382k ± 0%         382k ± 0%  +0.01%  (p=0.008 n=5+5)
Unicode           343k ± 0%         343k ± 0%    ~     (p=0.841 n=5+5)
GoTypes          1.19M ± 0%        1.19M ± 0%  +0.01%  (p=0.008 n=5+5)
Compiler         4.53M ± 0%        4.53M ± 0%  +0.03%  (p=0.008 n=5+5)
SSA              12.4M ± 0%        12.4M ± 0%  +0.00%  (p=0.008 n=5+5)
Flate             235k ± 0%         235k ± 0%    ~     (p=0.079 n=5+5)
GoParser          318k ± 0%         318k ± 0%    ~     (p=0.730 n=5+5)
Reflect           978k ± 0%         978k ± 0%    ~     (p=1.000 n=5+5)
Tar               393k ± 0%         393k ± 0%    ~     (p=0.056 n=5+5)
XML               405k ± 0%         405k ± 0%    ~     (p=0.548 n=5+5)

name       old text-bytes    new text-bytes    delta
HelloSize        672kB ± 0%        672kB ± 0%    ~     (all equal)
CmdGoSize       7.12MB ± 0%       7.12MB ± 0%    ~     (all equal)

name       old data-bytes    new data-bytes    delta
HelloSize        133kB ± 0%        133kB ± 0%    ~     (all equal)
CmdGoSize        390kB ± 0%        390kB ± 0%    ~     (all equal)

name       old exe-bytes     new exe-bytes     delta
HelloSize       1.07MB ± 0%       1.07MB ± 0%    ~     (all equal)
CmdGoSize       11.2MB ± 0%       11.2MB ± 0%    ~     (all equal)

Passes toolstash compare.

For #22662.

Change-Id: I19edb53dd9675af57f7122cb7dba2a6d8bdcc3da
Reviewed-on: https://go-review.googlesource.com/94515
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
gopherbot pushed a commit that referenced this issue Feb 26, 2018
For line directives which have a line and a column number,
an omitted filename means that the filename has not changed
(per the issue below).

For line directives w/o a column number, an omitted filename
means the empty filename (to preserve the existing behavior).

For #22662.

Change-Id: I32cd9037550485da5445a34bb104706eccce1df1
Reviewed-on: https://go-review.googlesource.com/96476
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
gopherbot pushed a commit that referenced this issue Feb 26, 2018
Extend cmd/internal/src.PosBase to track column information,
and adjust the meaning of the PosBase position to mean the
position at which the PosBase's relative (line, col) position
starts (rather than indicating the position of the //line
directive). Because this semantic change is made in the
compiler's noder, it doesn't affect the logic of src.PosBase,
only its test setup (where PosBases are constructed with
corrected incomming positions). In short, src.PosBase now
matches syntax.PosBase with respect to the semantics of
src.PosBase.pos.

For #22662.

Change-Id: I5b1451cb88fff3f149920c2eec08b6167955ce27
Reviewed-on: https://go-review.googlesource.com/96535
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
@griesemer griesemer changed the title cmd/compile, go/parser: character position vs line comments cmd/compile: character position vs line comments Feb 27, 2018
@griesemer
Copy link
Contributor

Created #24143 to track the non-compiler aspects of this issue.

@gopherbot
Copy link

Change https://golang.org/cl/97375 mentions this issue: cmd/compile, cmd/compile/internal/syntax: print relative column info

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
early-in-cycle A change that should be done early in the 3 month dev cycle. FrozenDueToAge
Projects
None yet
Development

No branches or pull requests

4 participants