Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: Go 2: Shorter struct literals when key and variable name are the same #57949

Closed
jimmyfrasche opened this issue Jan 21, 2023 · 17 comments
Labels
Milestone

Comments

@jimmyfrasche
Copy link
Member

jimmyfrasche commented Jan 21, 2023

It's fairly common to name variables the same as the fields they will be assigned to in a struct literal. This change allows that to be lighter on the page.

Allow someStruct{:key} to be shorthand for someStruct{key: key}. That is, :F sets the field named F to the value of the variable identically named F. As this is a simple shorthand for a keyed composite literal, it can be freely mixed with the longform keyed syntax.

Here is an example from some real code I wrote.

Before:

c := &Context{
  log:            log.Default(),
  args:           os.Args[1:],
  stdout:         os.Stdout,
  version:        version,
  defaultVersion: defaultVersion,
  root:           root,
}

After:

c := &Context{
  log:    log.Default(),
  args:   os.Args[1:],
  stdout: os.Stdout,
  :version,
  :defaultVersion,
  :root,
}

The change is not dramatic but it is still more succinct.

This is available in a number of languages (by a number of names, noted parenthetically):

  • Javascript (shorthand property names)
  • Ocaml (field punning)
  • Haskell (NamedFieldPuns)
  • Rust (field init shorthand)

In all of those languages the syntax is {key} instead of {:key} as they do not allow unkeyed literals so there is no ambiguity with the shorthand and an unkeyed literal. The prefix : removes the ambiguity and makes the intent explicit, win-win.

This change is backwards compatible. The only sticky point is how to encode it in the go/ast KeyValueExpr, but I imagine there are a number of ways such as duplicating the Ident node for Key and Value and noting that it's shorthand when Colon Precedes the Pos of Key. Most tooling could safely ignore such a change and any that would find it of interest could be updated easily.

Possible future extensions:

Allow :qualified.Name to be shorthand for Name: qualified.Name, which would, among other things, make it easy to copy from one struct to another.

Allow :&x and :*y to be shorthand for x: &x and y: *y respectively.

Allow :key in maps with a ~string key type to be shorthand for "key": key.

template

Author background

Would you consider yourself a novice, intermediate, or experienced Go programmer?

Experienced.

What other languages do you have experience with?

Many.

Related proposals

Has this idea, or one like it, been proposed before? If so, how does this proposal differ?

I mentioned something like this in an earlier thread #34174 (comment) This cleans up the idea and pulls it out on its own.

Does this affect error handling? no
Is this about generics? no

Proposal

What is the proposed change? see main text

Who does this proposal help, and why? Anyone reading or writing struct literals that would otherwise stutter due to field and variable name duplication.

Please describe as precisely as possible the change to the language. see main text

What would change in the language spec? the grammar for composite literals

Please also describe the change informally, as in a class teaching Go. see main text

Is this change backward compatible? yes, see main text

Orthogonality: how does this change interact or overlap with existing features? no

Is the goal of this change a performance improvement? no

Costs

Would this change make Go easier or harder to learn, and why? Little change. One more thing to learn but it's a small thing and easy to explain.

What is the cost of this proposal? (Every language change has a cost). Minimal. The implementation changes and spec changes would all be minimal.

How many tools (such as vet, gopls, gofmt, goimports, etc.) would be affected? Depending on how the changes are worked into go/ast and go/printer very few tools would need to be updated as the change is purely syntactic.

What is the compile time cost? immeasurably small

What is the run time cost? none

Can you describe a possible implementation? see note on backwards compatibility in main text

Do you have a prototype? (This is not required.) no

@jimmyfrasche jimmyfrasche added LanguageChange v2 A language change or incompatible library change Proposal labels Jan 21, 2023
@gopherbot gopherbot added this to the Proposal milestone Jan 21, 2023
@ghost

This comment was marked as spam.

@ianlancetaylor
Copy link
Contributor

Thanks for the proposal.

This is, of course, just syntactic sugar. In Go we normally reserve syntactic sugar for cases where it would be very widely used. This doesn't seem like such a case. Also the emoji voting is not in favor. Therefore, this is a likely decline. Leaving open for four weeks for final comments.

@jimmyfrasche
Copy link
Member Author

I've collected some data.

The repo https://github.com/jimmyfrasche/issue57949 includes the collection tool

The gist https://gist.github.com/jimmyfrasche/217805adc3c17d997e0dd0c493b5b5a1#file-notes-and-summary-md contains totals and the raw data for std and 10 large projects.

For easy reference here are the totals

keyed struct literals: 42404
total KV pairs: 101954
non-candidate KV pairs: 75571
ident:
	total: 18504
	no match: 8359
	exact: 6957
	partial: 3188
qual.ident: 
	total: 7123
	no match: 4889
	exact: 2234
	partial: N/A
*ident:
	total: 59
	no match: 46
	exact: 7
	partial: 6
*qual.ident:
	total: 25
	no match: 15
	exact: 10
	partial: N/A
&ident:
	total: 505
	no match: 360
	exact: 13
	partial: 132
&qual.ident:
	total: 166
	no match: 124
	exact: 42
	partial: N/A

It's explained in the links but partial matches are a != b && string.EqualFold(a, b) to catch cases like Title: title.

@adonovan
Copy link
Member

This change would also make code harder to navigate. When you see T{x: x}, you have the choice of asking your IDE the question "where is field x defined/referenced?" or "where is var/const x defined/referenced?" by selecting the different identifiers. If there was only one, it would require more work to get around the code.

@jimmyfrasche
Copy link
Member Author

@alandonovan I've never really looked at the language server protocol so I'm not sure what headaches it would cause there but I don't see why the context menu couldn't provide both options in this case.

@adonovan
Copy link
Member

adonovan commented Feb 22, 2023

I'm not sure what headaches it would cause there but I don't see why the context menu couldn't provide both options in this case.

FWIW, that's not how LSP works. The protocol has no concept of a "context menu". It has a fixed set of operations such as "report the definition of the selection" or "report references to the selection", and these are exposed by the editor however it chooses, e.g. as menu items or commands. But the server doesn't get to choose the UI. So for T{:x} the server would be forced to compute the references (or definitions) of both meanings of x and report them all as one big union.

@bcmills
Copy link
Contributor

bcmills commented Feb 22, 2023

If there was only one, it would require more work to get around the code.

Jumping to the definition of the struct field for the cursor is conceptually very similar to jumping to the definition of the function parameter containing the cursor. Does LSP support that today? If so, could the same mechanism be use for struct fields?

@adonovan
Copy link
Member

adonovan commented Feb 22, 2023

Jumping to the definition of the struct field for the cursor is conceptually very similar to jumping to the definition of the function parameter containing the cursor. Does LSP support that today?

Yes, they are identical: jump to definition of selected identifier. That's my point: LSP has only one flavor of "jump to definition" operation, but the introduction of the proposed change would create the need for two flavors of query.

The closest existing situation is that of embedded fields: struct{T} is both a reference to type T and a definition of a field T. The two meanings of the pun on T are opposite so it's obvious from the query what do to: if you ask for the definition of T, the server reports the type definition; if you ask for the references, it reports references to the field (but not the type). However in the proposed change the puns are both references so the protocol offers no obvious way to tease the queries apart.

(Also: the go/types API has no way to indicate that a single identifier is a reference to two different objects, so many tools would be broken by such a change.)

@jimmyfrasche
Copy link
Member Author

Using a VS Code project where I had something similar to:

type v struct { 
  x string // L0
}
 
func f(x string) *v { // L1
  return &v{
    x: x, // L2
  }
}

On L2 I can press F12 (Go to Definition) on the first x to go to L0 or the second x to L1.

I'd personally be fine if F12 just went to L1 for :x. I can still use go to type definition on &v. That's not that hard.

A jump to field definition that worked on either x would probably be handy even modulo this change. I'm not sure it should be a blocker.

Does anyone have experience with any Haskell/Ocaml/Rust LSP servers? What do they do in this case?

@bcmills
Copy link
Contributor

bcmills commented Feb 22, 2023

LSP has only one flavor of "jump to definition" operation, but the introduction of the proposed change would create the need for two flavors of query.

I guess what I'm saying is: given that we already have function parameters, it seems to me that we already have the need for two flavors of query. This proposal might draw out that problem more frequently, but it's an existing problem with the LSP protocol either way.

@jimmyfrasche
Copy link
Member Author

@adonovan

(Also: the go/types API has no way to indicate that a single identifier is a reference to two different objects, so many tools would be broken by such a change.)

In the first post I state that the go/ast should encode :x the same as x: x with some hint to formatters that it should be output as :x to avoid breaking tools

@jimmyfrasche
Copy link
Member Author

The only real downside mentioned so far is go to definition. I don't really see that as a showstopper and given how many big languages already support this (many more support it but they are all fairly niche) I would hope LSP grows the necessary primitives to handle this case.

Let me do a brief and simplistic analysis of the data I gathered. I'll ignore everything except the currently proposed simple identifier case. In the 11 modules (counting std as a module) analyzed, there were 1986 packages and 1339 (67%) contained keyed struct literals.

There were 101954 KV pairs total spread across those literals. Of those, 18504 (18%) had a simple identifier for the value and 6957 of them had identical key and value. That's about 38% of KV pairs where the value is a simple identifier or about 7% of all KV pairs.

That's not bad. That is in the language as it is today, though.

If this feature were to be adopted, I conjecture that there would be a stronger incentive to name more of the identifiers identically. There were 3188 matches that would be identical if not for case. Some of those may have good reason but let's say that 95% would be written with identical case given this proposal. Similarly let's say of the 8539 identifiers that didn't match at all, 50% would have been written identically. That gives us approximately round(0.5×8359 + 0.95×3188 + 6957) = 14165 KV pairs that would have used the :shorthand, or about 75% of KV pairs where the value is a simple identifier or 14% of all KV pairs.

I can't say if that counts as "widely used", as @ianlancetaylor put it, but it's still pretty good bang for the buck.

If that counts, let's look at its effect on writability, readability, and editability.

Writability is an easy win. You can type a name once instead of twice. Even without this you are generally choosing a name for your identifier that is close to the name of the field (you don't see a lot of Red: Green) so unless the field name is just huge and messy might as well just name it identically and write less in total.

Readability isn't as obvious but I think it's also a win. You can usually tell at a glance if a name is Twice: Twice but sometimes you may have to double check which doesn't apply if it's only written :Once. More importantly it's just lighter on the page and less repetitious.

Editability is a bit harder in the case of renaming the identifier or the field. The field name is unlikely to change but the problem is symmetric and you may need to change the identifier. In either case gorename could easily handle this case and expand :X to X: Y as necessary (if it wouldn't just do it automatically given the suggested encoding in go/ast). I imagine it would also be easy to include editor macros to go from X: X to :X and vice versa. For the most part I don't think this would come up too often for new code and would probably mostly be an issue for someone wanting to rename an identifier to take advantage of the new sugar. Given that it's easily automated I am unconcerned.

Just subjectively I'd add that having used languages with this feature more I find myself getting annoyed when having to repeat myself in Go. It feels wrong and there's an easy fix.

@adonovan
Copy link
Member

Writability is an easy win. You can type a name once instead of twice. Even without this you are generally choosing a name for your identifier that is close to the name of the field (you don't see a lot of Red: Green) so unless the field name is just huge and messy might as well just name it identically and write less in total.

Wearing my devil's advocate hat, I wonder whether encouraging people to use the same name for the var and the field is always desirable: perhaps today they choose a more specific and informative name for the var.

In any case I don't think tools such as gopls should be the primary consideration in the language design. Thanks for the analysis.

@jimmyfrasche
Copy link
Member Author

I don't think it's always desirable but I do think that it quite often is.

I used the example of Red: Green as something that basically never happens because you're generally doing one of

  • Red: Red
  • Red: red
  • Red: c

Either a name that is close to if not exactly the field name or a nondescript name.

There can definitely be good reasons for going with the last one (annoyingly long field name, c gets used in more than one place, etc.).

The :x shorthand would strongly encourage you to use an identical name instead of a basically-close-enough name and weakly encourage using an identical name instead of a nondescript name.

I didn't try to count abbreviations or "close enough" names but as I was glancing through random files the names that weren't exact or partial tended to either be an abbreviation of some kind or something nondescript.

When writing in languages that support this feature I default to using the identical name but diverge if I have a good reason not to. It saves a little bit of thought when it doesn't matter and when it does not using the feature adds an extra clue in the text that something unusual may be happening.

@jimmyfrasche
Copy link
Member Author

I patched in a quick hack to log the 8359 no match identifiers (not == or equals fold). I filtered out the uninteresting case of the value being nil, true, or false, reducing it to 6649 entries. I ran those through sort and uniq, reducing it to 3760 entries. I saved this as nomatch.set then ran shuf -n100 nomatch.set | sort >nomatch.samp. I manually partitioned them into 3 piles

  1. shorter: value is abbreviation of key: 19
  2. longer: key is abbreviation of value: 32
  3. no relation: names don't obviously coincide: 49 (I could probably have shunted more of these into one pile or the other if I were more lenient about "obviously")

If this is representative, my intuition about no matches was a bit optimistic. About 50% are abbreviations but I wasn't expecting so many K to be the abbreviation of the V.

The new data files are in this gist: https://gist.github.com/jimmyfrasche/4b80780ea40b79f9c51a158f78e263bf

@jimmyfrasche
Copy link
Member Author

Based on the empirical sample, I'm lowering my estimate for nomatch from 50% to 15% and
round(0.15×8359 + 0.95×3188 + 6957) = 11239 lowers it from 14% to 11%.

However, I think I should have filtered out the predeclared identifiers. They are, for our purposes here, more akin to constants than identifiers and should have been counted in the noncandidate pool. Making that adjustment gives round(0.15×6649 + 0.95×3188 + 6957) = 10983 but that still keeps it at 11% (first 11% is rounded down and this one is rounded up).

@ianlancetaylor
Copy link
Contributor

Thanks for the additional discussion and data. However, it doesn't tip the balance for adding new syntactic sugar. Closing.

@ianlancetaylor ianlancetaylor closed this as not planned Won't fix, can't repro, duplicate, stale Mar 1, 2023
@golang golang locked and limited conversation to collaborators Feb 29, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

5 participants