Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

go/format: normalize number prefixes and exponents, like cmd/gofmt does #37476

Closed
dmitshur opened this issue Feb 26, 2020 · 12 comments
Closed

go/format: normalize number prefixes and exponents, like cmd/gofmt does #37476

dmitshur opened this issue Feb 26, 2020 · 12 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@dmitshur
Copy link
Contributor

dmitshur commented Feb 26, 2020

CL 160184 implemented a change to cmd/gofmt for Go 1.13 to normalize number prefixes and exponents.

The same change should be applied to the go/format package (or one of its dependencies, depending on where it's most appropriate to implement).

Also see #37453.

/cc @griesemer @cespare @heschik

@dmitshur dmitshur added the NeedsFix The path to resolution is known, but the work has not been done. label Feb 26, 2020
@dmitshur dmitshur added this to the Go1.15 milestone Feb 26, 2020
@griesemer
Copy link
Contributor

I wonder if this should just be done in the go/printer; possibly with a flag to control it (of course, the flag won't be exposed to gofmt).

@dmitshur
Copy link
Contributor Author

I suspect go/printer may be a good place for it. Whoever is working on this issue should investigate that possibility and compare it with others.

@dmitshur
Copy link
Contributor Author

dmitshur commented Feb 26, 2020

A flag to control this behavior is possible, but I don't think it would be helpful, and so I prefer it's not added unless there is a good reason. We can never remove the flag and it'll adding some complexity.

If someone wants to "disable" this behavior, or produce Go code that is formatted in a way that is compatible with an older version of Go, a better way of accomplishing that is by using the cmd/gofmt binary or the go/format package from an older version of Go.

@dmitshur dmitshur added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. and removed NeedsFix The path to resolution is known, but the work has not been done. labels Feb 27, 2020
@dmitshur
Copy link
Contributor Author

I've investigated this briefly, and want to share my early findings. I looked into how it'd look if the responsibility of formatting numbers was moved from cmd/go into the go/printer package.

The go/printer package has a small amount of precedent for modifying the AST it is about to print. It happens when printing *ast.ImportSpec nodes:

func (p *printer) spec(spec ast.Spec, n int, doIndent bool) {
	switch s := spec.(type) {
	case *ast.ImportSpec:
		// ...
		p.expr(sanitizeImportPath(s.Path))
		// ...
}

func sanitizeImportPath(lit *ast.BasicLit) *ast.BasicLit { ... }

I realize that printing import paths (especially ones that need to be sanitized) happens much less frequently that numbers may need to be formatted, so it's very possible the same approach would not work for this. But I wanted to try it as a first step, and see if it would at least produce correct results or not.

It was easy to perform a simple code transformation and move normalizeNumbers from cmd/gofmt into go/printer, and start using it in the printer.expr1 method as follows:

 func (p *printer) expr1(expr ast.Expr, prec1, depth int) {
 	p.print(expr.Pos())
 
 	switch x := expr.(type) {
 	case *ast.BasicLit:
-		p.print(x)
+		switch x.Kind {
+		case token.INT, token.FLOAT, token.IMAG:
+			p.print(normalizeNumber(x))
+		default:
+			p.print(x)
+		}
 }
// normalizeNumber rewrites base prefixes and exponents to
// use lower-case letters, and removes leading 0's from
// integer imaginary literals. It leaves hexadecimal digits
// alone.
func normalizeNumber(lit *ast.BasicLit) *ast.BasicLit {
	if len(lit.Value) < 2 {
		return lit // only one digit (common case) - nothing to do
	}
	// len(lit.Value) >= 2

	// ... handle all the cases, return lit if there's no change, or a new &ast.BasicLit if there is
}

What I found so far:

  1. The aforementioned code transformation seems to produce correct results. All cmd/gofmt and go/... tests are passing (after adjusting one test case in go/printer to accommodate the new formatting behavior). I have not spotted any correctness problems yet.

  2. I am concerned that one of the tasks that normalizeNumbers does: removing leading 0's from integer imaginary literals. This operation modifies the length of the basic literal value by shortening it. I worry this may cause problems because the positions of all AST nodes after the modified one may become incorrect. I don't know if this is safe or will cause problems, and whether this may invalidate the approach of performing this formatting task as part of printing the AST in go/printer. Needs more investigation.

  3. If the above concern is resolved, the next step will be to run some benchmarks and see how much of a performance difference this implementation would incur (including whether allocating more *ast.BasicLit values is a significant problem).

@griesemer
Copy link
Contributor

@dmitshur Thanks for the analysis. I am not concerned about 2). The whole point of go/printer is to change the layout of space which will invariably change all positions of tokens after that. The printer can handle that. Also, note that with respect to this issue, whether the rewrite happens in gofmt or in go/printer doesn't change the outcome, does it? We must have this situation already now.

What I am concerned is that go/printer's task is to print, not to update or change the AST. Which is why I suggested a flag. That flag would basically mean: you're allowed to make approved changes. If the flag is not set, the tree's contents are not modified in any way.

@dmitshur
Copy link
Contributor Author

dmitshur commented Feb 28, 2020

I am not concerned about 2). The whole point of go/printer is to change the layout of space which will invariably change all positions of tokens after that. The printer can handle that. Also, note that with respect to this issue, whether the rewrite happens in gofmt or in go/printer doesn't change the outcome, does it? We must have this situation already now.

Makes sense, thanks. The current implementation in cmd/gofmt modifies the AST (by walking over it with ast.Inspect) before printing it with go/printer, so you're right that nothing would change.

What I am concerned is that go/printer's task is to print, not to update or change the AST. Which is why I suggested a flag. That flag would basically mean: you're allowed to make approved changes. If the flag is not set, the tree's contents are not modified in any way.

I'm open to the possibility of adding a flag, if it is needed. I just have some questions that I'd like to clarify first.

Note that the implementation I described above does not update or modify the AST that a user passes into Fprint. This is because new *ast.BasicLit nodes are created when a number needs to be printed with different formatting:

func normalizeNumber(lit *ast.BasicLit) *ast.BasicLit {
	// ...

	// don't modify the original lit, return a copy with a modified Value
	return &ast.BasicLit{ValuePos: lit.ValuePos, Kind: lit.Kind, Value: x}
}

I agree one of primary tasks of go/printer is to "pretty-print" Go code, optionally configured by a non-default printer.Config. To quote the documentation of Fprint (both the function and method):

Fprint "pretty-prints" an AST node to output [...]

Given that go/printer already produces output that is modified by the "pretty-printing" process (whitespace is modified to be well-aligned, semicolons are removed, imports are sorted, etc.), can you help me understand why you're suggesting a flag to control this aspect of the "pretty-printing" process specifically? Is it only because it's a recent addition, or is there another reason?

Edit: Import sorting is not done as part of the "pretty-printing" process in go/printer, it's done externally.

@griesemer
Copy link
Contributor

  1. The fact that the incoming AST is not modified anymore is a clear improvement; we should definitively do that.

  2. Pretty-printing is "just" about formatting space. It's true that semicolons are omitted, but that seems to be on a different level than actually changing literals. (But then we also adjust comments in cases, something we probably should not do, either).

The only reason for the flag would be that some go/printer or go/format clients might expect the old behavior. Basically, I think it makes sense to have two modes: one where the AST is untouched but for space (and semicolons), and one where it might do some minor cleanups. There's already a Mode bit available in printer.Config - if not set, nothing changes and existing code will not notice. But gofmt and go/format will set it. Seems pretty clean while reducing the chance for surprises in clients.

(Semicolons are not even recorded in the AST, so it's harder to argue that we changed something. The printer in the compiler's syntax package - while far from the power of go/printer - has a mode where one can choose to have everything on one line (if possible), in which case semicolons are inserted. That's useful for printing AST snippets in error messages. Though admittedly, I'm not relying on it much at the moment.)

@dmitshur
Copy link
Contributor Author

Thanks for elaborating, that's very helpful. I'll think more about this. I have another question for now.

Suppose we add a new Mode for turning on this behavior now, and in a future Go version there is another gofmt change that formats string literals in some way (that doesn't change behavior of course). Do you anticipate the same Mode bit would also control that future gofmt behavior, or would it require a second Mode bit to be added?

Put differently, you said:

That flag would basically mean: you're allowed to make approved changes. If the flag is not set, the tree's contents are not modified in any way.

Is "approved changes" meant to include only the current non-space/semicolon modifications, or is it expected to include future ones too?

@griesemer
Copy link
Contributor

I would use the same flag. Basically, the flag means that go/printer can do the changes that are "gofmt-approved" if you will, going forward. Without setting the flag, the printer just prints w/o any changes.

@dmitshur dmitshur self-assigned this Feb 28, 2020
@gopherbot
Copy link

Change https://golang.org/cl/231461 mentions this issue: cmd/gofmt, go/format, go/printer: move number normalization to printer

@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels May 1, 2020
xujianhai666 pushed a commit to xujianhai666/go-1 that referenced this issue May 21, 2020
Normalization of number prefixes and exponents was added in CL 160184
directly in cmd/gofmt. The same behavior change needs to be applied in
the go/format package. This is done by moving the normalization code
into go/printer, behind a new StdFormat mode, which is then re-used
by both cmd/gofmt and go/format.

Note that formatting of Go source code changes over time, so the exact
byte output produced by go/printer may change between versions of Go
when using StdFormat mode. What is guaranteed is that the new formatting
is equivalent Go code.

Clients looking to format Go code with standard formatting consistent
with cmd/gofmt and go/format would need to start using this flag, but
a better alternative is to use the go/format package instead.

Benchstat numbers on go test go/printer -bench=BenchmarkPrint:

	name     old time/op    new time/op    delta
	Print-8    4.56ms ± 1%    4.57ms ± 0%   ~     (p=0.700 n=3+3)

	name     old alloc/op   new alloc/op   delta
	Print-8     467kB ± 0%     467kB ± 0%   ~     (p=1.000 n=3+3)

	name     old allocs/op  new allocs/op  delta
	Print-8     17.2k ± 0%     17.2k ± 0%   ~     (all equal)

That benchmark data doesn't contain any numbers that need to be
normalized. More work needs to be performed when formatting Go code
with numbers, but it is unavoidable to produce standard formatting.

Fixes golang#37476.
For golang#37453.

Change-Id: If50bde4035c3ee6e6ff0ece5691f6d3566ffe8d5
Reviewed-on: https://go-review.googlesource.com/c/go/+/231461
Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
@gopherbot
Copy link

Change https://golang.org/cl/237740 mentions this issue: doc/go1.15: document go/printer.StdFormat

gopherbot pushed a commit that referenced this issue Jun 15, 2020
For #37419
For #37453
For #37476

Change-Id: Ia032ec844773af421bc4217d5dd6e60996d8e91f
Reviewed-on: https://go-review.googlesource.com/c/go/+/237740
Reviewed-by: Robert Griesemer <gri@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
@gopherbot
Copy link

Change https://golang.org/cl/240683 mentions this issue: go/printer: document the StdFormat mode better

gopherbot pushed a commit that referenced this issue Jul 17, 2020
The StdFormat flag was added as part of CL 231461, where the primary aim
was to fix the bug #37476. It's expected that the existing printer modes
only adjust spacing but do not change any of the code text itself. A new
printing flag served as a way for cmd/gofmt and go/format to delegate
a part of formatting work to the printer—where it's more more convenient
and efficient to perform—while maintaining current low-level printing
behavior of go/printer unmodified.

We already have cmd/gofmt and the go/format API that implement standard
formatting of Go source code, so there isn't a need to expose StdFormat
flag to the world, as it can only cause confusion.

Consider that to format source in canonical gofmt style completely it
may require tasks A, B, C to be done. In one version of Go, the printer
may do both A and B, while cmd/gofmt and go/format will do the remaining
task C. In another version, the printer may take on doing just A, while
cmd/gofmt and go/format will perform B and C. This makes it hard to add
a gofmt-like mode to the printer without compromising on above fluidity.

This change prefers to shift back some complexity to the implementation
of the standard library, allowing us to avoid creating the new exported
printing flag just for the internal needs of gofmt and go/format today.

We may still want to re-think the API and consider if something better
should be added, but unfortunately there isn't time for Go 1.15. We are
not adding new APIs now, so we can defer this decision until Go 1.16 or
later, when there is more time.

For #37476.
For #37453.
For #39489.
For #37419.

Change-Id: I0bb07156dca852b043487099dcf05c5350b29e20
Reviewed-on: https://go-review.googlesource.com/c/go/+/240683
Reviewed-by: Robert Griesemer <gri@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
@golang golang locked and limited conversation to collaborators Jul 1, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

3 participants