Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: go/doc: ignore leading and trailing whitespace in the directive arguments #68092

Open
jimmyfrasche opened this issue Jun 20, 2024 · 7 comments
Labels
Milestone

Comments

@jimmyfrasche
Copy link
Member

Proposal Details

#51082 and #37974 standardized Go directive comments as line comments whose content matches ^(line |extern |export |[a-z0-9]+:[a-z0-9])(.*)$ where the first group is the directive and the second group is the argument to the directive.

This makes leading and trailing whitespace part of directive comments.

Trailing whitespace is almost always removed and should never be semantic so it is already de facto not part of the argument.

One space is required for the old (line/export/extern) directives and additional leading whitespace is presumably ignored [have not tested this, will update if incorrect].

Someone writing a general //x:y directive in their code will likely add whitespace between the directive and argument even if not syntactically required so anyone parsing directives will likely ignore at least one leading space regardless.

Someone reading the directive //x:yZ would assume that x:yZ is the directive name rather than it is the directive x:y with argument Z so having whitespace here would clarify.

I propose changing the argument syntax from .*$ to [ \t]*([^ \t]*)[ \t]*$ where the leading and trailing whitespace is ignored.

This would allow two things

  1. The iterator proposed in go/ast: add ParseDirective #68021 could trim the arguments automatically so no one using the official API for parsing directives will need to worry about this
  2. (not proposed here but enabled by this change) gofmt could insert a clarifying space when necessary, normalize leading whitespace, or even line up arguments to multiple directives
@gopherbot gopherbot added this to the Proposal milestone Jun 20, 2024
@gabyhelp
Copy link

Similar Issues

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

@ianlancetaylor
Copy link
Member

The current gc compiler requires at least one space character after the directive name. However, for directives that take arguments, the compiler consistently treats a sequence of whitespace characters (per unicode.IsSpace) as a single whitespace character. So trailing whitespace is currently ignored and leading whitespace is ignored other than requiring a single space character.

For what it's worth the current version of gccgo is stricter: it ignores sequences of space and tab characters, but does not permit arbitrary Unicode whitespace characters.

So if we made this change, one compatibility concern would be people who wrote a directive followed by a tab character. Such directives would have their expected meaning in a new version of Go, and would be ignored by an old version of Go. This could potentially lead to confusion if someone wrote code for a newer version and then compiled it with an older version. I don't think that is worth worrying about but I want to call it out.

The other compatibility concern if we adopt the proposal precisely as written is that currently Unicode whitespace characters like U+00A0 are treated as space, but this proposal suggests just space and tab. So I do think that the regexp we use, if we change this, should replace |\t with [[:space:]]|\pZ.

@jimmyfrasche
Copy link
Member Author

I'm certainly fine with unicode.IsSpace. I don't know how often unicode spaces end up in comments but there's no reason for invisible differences to have distinct effects.

@seankhliao seankhliao changed the title proposal: directive comments: ignore leading and trailing whitespace in the argument proposal: go/doc: ignore leading and trailing whitespace in the directive arguments Jul 11, 2024
@jimmyfrasche
Copy link
Member Author

So the API is now looking like:

package ast // "go/ast"

func ParseDirective(line string) (Directive, bool)

type Directive struct {
    Tool string // may be "" if Name is "line", "extern", "export"
    Name string
    Args string // may contain whitespace
}

and the question is how to further specify the syntax so that we know when to split Name and Args.

It seems like #68092 should be folded into that discussion. (Can we go further and normalize all medial runs of whitespace to a single " " and have Args []string?)

Can someone collect non-Go directives from the module proxy so we can see how people are defining/using directives in the real world?

@adonovan
Copy link
Member

Someone writing a general //x:y directive in their code will likely add whitespace between the directive and argument even if not syntactically required so anyone parsing directives will likely ignore at least one leading space regardless.

It is important that leading spaces are not permitted, as otherwise routine paragraph-fill operations on typical comment text could easily cause a comment to trigger a spurious match.

Someone reading the directive //x:yZ would assume that x:yZ is the directive name rather than it is the directive x:y with argument Z so having whitespace here would clarify.

I agree, this is a little confusing. It would be nice if the regular expression's subgroups corresponded to the three components. On that topic, it says:

A directive comment is a line matching the regular expression //(line |extern |export |[a-z0-9]+:[a-z0-9]).

It should be clarified that it means a left-anchored match (so directive comments cannot be indented) but not a right-anchored one. Adding a caret would address this: ^//...

There are several parts to this proposal:

  • permit arbitrary leading Unicode whitespace. I think we cannot do that for the reason above.
  • permit arbitrary Unicode whitespace between name and args. I am in favor.
  • omit arbitrary trailing Unicode whitespace from Diagnostic.Args. I am in favor.

Changes would be required to cmd/compile, gccgo, and ast.ParseDirective. I think the risk that @ianlancetaylor flags is real but not worth worrying about.

@jimmyfrasche
Copy link
Member Author

Sorry for any confusion: I only ever meant leading the argument not the directive itself.

line directives are special in that they can be placed anywhere using /**/ which complicates using a regex here.

@jimmyfrasche
Copy link
Member Author

A related question is the argument-leading whitespace required before an argument? The current regex only defines a prefix for custom directives. Consider //a:a( ). Is the argument ( ) or )?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Incoming
Development

No branches or pull requests

5 participants