cmd/compile: principled line numbering #16943

mdempsky · 2016-08-31T22:56:17Z

Currently, cmd/compile assigns line numbers somewhat ad-hocly based on whatever the global lineno value happens to be set at when it calls Nod. With the new parser, @griesemer and I want to do a better job at this. I think this will be especially useful if/once we start tracking/reporting column information; e.g., in error messages for #10324, or in export data for something like golang.org/cl/22936.

I want to propose two simple rules:

Objects are associated with the identifier used to declare them; e.g., in var x int = 3, we associate the newly declared int variable with the position of the x identifier.
Non-terminal grammar rules are associated with a terminal token uniquely associated with that rule, generally the first. For example, a binary expression x + y is associated with the + token, not x or y. This ensures that compound expressions like x + y + z are unambiguous (at least when column position is available).

Contrived example:

var
a,
b int =
1,
f() +
2

Implications:

The variables a and b are associated with the line numbers their respective identifiers.
Instructions to zero-initialize a and b would be associated with the var keyword (though in practice I expect they'd be optimized away).
The add instruction for adding 2 and 3 would be associated with the + token.
The call to f() is associated with the ( token.
Assigning the values to a and b would be associated with the = token.

As a near-term plan I'm trying to match the old behavior as much as possible to ease transition, and I'm now to the point where go build -a -toolexec='toolstash -cmp' std cmd passes with enabling the new parser. I can continue matching more obscure line number behaviors, but unless people think line numbering compatibility is really important, I don't think it's worthwhile. I'd rather focus on making sure performance is acceptable so we can switch over and continue cleaning up the compiler.

/cc @griesemer @ianlancetaylor @alandonovan

The text was updated successfully, but these errors were encountered:

ianlancetaylor · 2016-08-31T23:03:26Z

For what it's worth, your two rules are exactly what I do in gccgo (which does have column information).

josharian · 2016-09-01T15:27:58Z

SGTM. Line numbers have never been sacrosanct; preserving the quirks of existing behavior is not a priority. Matching gccgo would be a nice bonus.

I'm curious how well the SSA handling of line numbers preserves your rules.

One small data point I'd like to see is binary size before/after. The pclntab is the single largest component of binaries, and its size is tied directly to line-numbering decisions in the compiler. I don't expect much change, but it'd be good to keep an eye on it.

mdempsky · 2016-09-01T16:53:54Z

For what it's worth, your two rules are exactly what I do in gccgo (which does have column information).

Cool.

I'm curious how well the SSA handling of line numbers preserves your rules.

I am too. At least optimizations are inherently lossy about source information like line numbers, and users seem to understand/appreciate that.

Also, since the current line numbering is often so ad hoc, I don't think there's any particular rush to implement/guarantee these rules. My goal with this issue is more to agree on a long-term ideal to strive towards.

One small data point I'd like to see is binary size before/after. The pclntab is the single largest component of binaries, and its size is tied directly to line-numbering decisions in the compiler. I don't expect much change, but it'd be good to keep an eye on it.

Ack, I don't expect much change either, but I'll keep an eye out.

dr2chase · 2016-09-02T14:54:40Z

Something to consider in optimizer-land might be to notice which instruction "commits" a line, and then arrange that all the preceding instructions match it, up to the previous commit. This fails for lines with more than one store, but such is life -- plenty of lines become visible with a single store. Single-stepping in a debugger will be slightly less confusing.

randall77 · 2016-09-02T15:24:31Z

I was also thinking about ways to do a more concise job of recording line
number information. SSA gives line numbers to all instructions and they
get recorded faithfully in runtime data structures. In reality, many of
those line numbers don't really correspond to a source operation, like
spills and restores. We could choose line numbers for those within some
acceptable range (anywhere between def and use, say). That flexibility
lets us choose the line numbers so that the runtime data structure encodes
more compactly.
Maybe a simple first step is to not specify line numbers for some SSA ops
that don't have an obvious correspondence to source ops. Then do a pass at
the end that assigns those ops a line numbers equal to the line number of
the nearest line-number-specified instruction.

On Fri, Sep 2, 2016 at 7:55 AM, dr2chase notifications@github.com wrote:

Something to consider in optimizer-land might be to notice which
instruction "commits" a line, and then arrange that all the preceding
instructions match it, up to the previous commit. This fails for lines with
more than one store, but such is life -- plenty of lines become visible
with a single store. Single-stepping in a debugger will be slightly less
confusing.

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#16943 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AGkgIMPbjo8vBaF473p4FPBeNpaacPIKks5qmDjbgaJpZM4JyJl9
.

alandonovan · 2016-09-02T16:20:49Z

On 2 September 2016 at 11:25, Keith Randall notifications@github.com
wrote:

Maybe a simple first step is to not specify line numbers for some SSA ops
that don't have an obvious correspondence to source ops. Then do a pass at
the end that assigns those ops a line numbers equal to the line number of
the nearest line-number-specified instruction.

I think that's a good approach. There is no perfect answer for many
instructions, even for the high-level representation in
golang.org/x/tools/go/ssa, and the compiler's low-level SSA with
optimizations will have even more operations that aren't exactly associated
with a given piece of syntax. Control flow branches in particular are hard
to correlate with syntax. Using the correct answer when you have it and
the "nearest" answer when you don't will result in smaller line number
tables if they use any kind of run-length encoding.

gopherbot · 2017-02-15T00:35:15Z

CL https://golang.org/cl/37017 mentions this issue.

quentinmit added this to the Go1.8 milestone Sep 6, 2016

quentinmit added the NeedsFix The path to resolution is known, but the work has not been done. label Oct 11, 2016

rsc modified the milestones: Go1.9Early, Go1.8 Oct 21, 2016

josharian mentioned this issue Dec 7, 2016

cmd/compile: bad line number for "missing type in composite literal" #18231

Closed

gopherbot closed this as completed in 5267ac2 Feb 15, 2017

golang locked and limited conversation to collaborators Feb 15, 2018

gopherbot added the FrozenDueToAge label Feb 15, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cmd/compile: principled line numbering #16943

cmd/compile: principled line numbering #16943

mdempsky commented Aug 31, 2016 •

edited

ianlancetaylor commented Aug 31, 2016

josharian commented Sep 1, 2016

mdempsky commented Sep 1, 2016

dr2chase commented Sep 2, 2016

randall77 commented Sep 2, 2016

alandonovan commented Sep 2, 2016

gopherbot commented Feb 15, 2017

cmd/compile: principled line numbering #16943

cmd/compile: principled line numbering #16943

Comments

mdempsky commented Aug 31, 2016 • edited

ianlancetaylor commented Aug 31, 2016

josharian commented Sep 1, 2016

mdempsky commented Sep 1, 2016

dr2chase commented Sep 2, 2016

randall77 commented Sep 2, 2016

alandonovan commented Sep 2, 2016

gopherbot commented Feb 15, 2017

mdempsky commented Aug 31, 2016 •

edited