Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: principled line numbering #16943

Closed
mdempsky opened this issue Aug 31, 2016 · 7 comments
Closed

cmd/compile: principled line numbering #16943

mdempsky opened this issue Aug 31, 2016 · 7 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@mdempsky
Copy link
Member

mdempsky commented Aug 31, 2016

Currently, cmd/compile assigns line numbers somewhat ad-hocly based on whatever the global lineno value happens to be set at when it calls Nod. With the new parser, @griesemer and I want to do a better job at this. I think this will be especially useful if/once we start tracking/reporting column information; e.g., in error messages for #10324, or in export data for something like golang.org/cl/22936.

I want to propose two simple rules:

  1. Objects are associated with the identifier used to declare them; e.g., in var x int = 3, we associate the newly declared int variable with the position of the x identifier.
  2. Non-terminal grammar rules are associated with a terminal token uniquely associated with that rule, generally the first. For example, a binary expression x + y is associated with the + token, not x or y. This ensures that compound expressions like x + y + z are unambiguous (at least when column position is available).

Contrived example:

var
a,
b int =
1,
f() +
2

Implications:

  • The variables a and b are associated with the line numbers their respective identifiers.
  • Instructions to zero-initialize a and b would be associated with the var keyword (though in practice I expect they'd be optimized away).
  • The add instruction for adding 2 and 3 would be associated with the + token.
  • The call to f() is associated with the ( token.
  • Assigning the values to a and b would be associated with the = token.

As a near-term plan I'm trying to match the old behavior as much as possible to ease transition, and I'm now to the point where go build -a -toolexec='toolstash -cmp' std cmd passes with enabling the new parser. I can continue matching more obscure line number behaviors, but unless people think line numbering compatibility is really important, I don't think it's worthwhile. I'd rather focus on making sure performance is acceptable so we can switch over and continue cleaning up the compiler.

/cc @griesemer @ianlancetaylor @alandonovan

@ianlancetaylor
Copy link
Contributor

For what it's worth, your two rules are exactly what I do in gccgo (which does have column information).

@josharian
Copy link
Contributor

SGTM. Line numbers have never been sacrosanct; preserving the quirks of existing behavior is not a priority. Matching gccgo would be a nice bonus.

I'm curious how well the SSA handling of line numbers preserves your rules.

One small data point I'd like to see is binary size before/after. The pclntab is the single largest component of binaries, and its size is tied directly to line-numbering decisions in the compiler. I don't expect much change, but it'd be good to keep an eye on it.

@mdempsky
Copy link
Member Author

mdempsky commented Sep 1, 2016

For what it's worth, your two rules are exactly what I do in gccgo (which does have column information).

Cool.

I'm curious how well the SSA handling of line numbers preserves your rules.

I am too. At least optimizations are inherently lossy about source information like line numbers, and users seem to understand/appreciate that.

Also, since the current line numbering is often so ad hoc, I don't think there's any particular rush to implement/guarantee these rules. My goal with this issue is more to agree on a long-term ideal to strive towards.

One small data point I'd like to see is binary size before/after. The pclntab is the single largest component of binaries, and its size is tied directly to line-numbering decisions in the compiler. I don't expect much change, but it'd be good to keep an eye on it.

Ack, I don't expect much change either, but I'll keep an eye out.

@dr2chase
Copy link
Contributor

dr2chase commented Sep 2, 2016

Something to consider in optimizer-land might be to notice which instruction "commits" a line, and then arrange that all the preceding instructions match it, up to the previous commit. This fails for lines with more than one store, but such is life -- plenty of lines become visible with a single store. Single-stepping in a debugger will be slightly less confusing.

@randall77
Copy link
Contributor

I was also thinking about ways to do a more concise job of recording line
number information. SSA gives line numbers to all instructions and they
get recorded faithfully in runtime data structures. In reality, many of
those line numbers don't really correspond to a source operation, like
spills and restores. We could choose line numbers for those within some
acceptable range (anywhere between def and use, say). That flexibility
lets us choose the line numbers so that the runtime data structure encodes
more compactly.
Maybe a simple first step is to not specify line numbers for some SSA ops
that don't have an obvious correspondence to source ops. Then do a pass at
the end that assigns those ops a line numbers equal to the line number of
the nearest line-number-specified instruction.

On Fri, Sep 2, 2016 at 7:55 AM, dr2chase notifications@github.com wrote:

Something to consider in optimizer-land might be to notice which
instruction "commits" a line, and then arrange that all the preceding
instructions match it, up to the previous commit. This fails for lines with
more than one store, but such is life -- plenty of lines become visible
with a single store. Single-stepping in a debugger will be slightly less
confusing.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#16943 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AGkgIMPbjo8vBaF473p4FPBeNpaacPIKks5qmDjbgaJpZM4JyJl9
.

@alandonovan
Copy link
Contributor

On 2 September 2016 at 11:25, Keith Randall notifications@github.com
wrote:

Maybe a simple first step is to not specify line numbers for some SSA ops
that don't have an obvious correspondence to source ops. Then do a pass at
the end that assigns those ops a line numbers equal to the line number of
the nearest line-number-specified instruction.

I think that's a good approach. There is no perfect answer for many
instructions, even for the high-level representation in
golang.org/x/tools/go/ssa, and the compiler's low-level SSA with
optimizations will have even more operations that aren't exactly associated
with a given piece of syntax. Control flow branches in particular are hard
to correlate with syntax. Using the correct answer when you have it and
the "nearest" answer when you don't will result in smaller line number
tables if they use any kind of run-length encoding.

@quentinmit quentinmit added this to the Go1.8 milestone Sep 6, 2016
@quentinmit quentinmit added the NeedsFix The path to resolution is known, but the work has not been done. label Oct 11, 2016
@rsc rsc modified the milestones: Go1.9Early, Go1.8 Oct 21, 2016
@gopherbot
Copy link

CL https://golang.org/cl/37017 mentions this issue.

@golang golang locked and limited conversation to collaborators Feb 15, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

9 participants