Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: go/token: add Export and Unexport #32625

Closed
mvdan opened this issue Jun 14, 2019 · 6 comments
Closed

Proposal: go/token: add Export and Unexport #32625

mvdan opened this issue Jun 14, 2019 · 6 comments

Comments

@mvdan
Copy link
Member

mvdan commented Jun 14, 2019

Summary

I propose adding two functions to the package's API:

// Export returns name as an exported identifier.
func Export(name string) string {
    // TODO: possibly avoid the alloc+copy if the first rune is already upper
    ch, w := utf8.DecodeRuneInString(name)
    return string(unicode.ToUpper(ch)) + name[w:]
}

// Unexport returns name as an unexported identifier.
func Unexport(name string) string {
    // TODO: possibly avoid the alloc+copy if the first rune is already lower
    ch, w := utf8.DecodeRuneInString(name)
    return string(unicode.ToLower(ch)) + name[w:]
}

Current situation

With #30064 merged, we already have some operations to deal with identifiers and names in go/token, such as IsIdentifier and IsExported.

Another common need is to export or unexport a certain name. I've personally implemented either of those functions a handful of times over the years, and I can easily find implementations in popuar projects like this one.

However, I've also seen plenty of bad implementations. For example:

exported = strings.ToUpper(s[:1]) + s[1:] // What if the first rune isn't ASCII?
exported = strings.Title(s) // Almost! unicode.ToTitle isn't the same as unicode.ToUpper

A very quick google search yielded an example of Title in a google-owned repository. And the last time I've seen the first mistake is here today, which made me think filing this proposal was worthwhile.

Implementation

The correct implementation of each func is just two lines, but it's easy to forget about unicode, or to fall for the "false friend" that is strings.Title. Moreover, I think the API names would make code expressive and self-explanatory.

One detail to keep in mind is that some implementations, like google/wire's I linked first, handle prefixes like acronyms separately. For example, I believe they would unexport URLFoo as urlFoo.

However, I don't think this should be within the scope of token.Export; there isn't a way to impement this easily, covering all cases. A user can always implement this on top of token.Export by replacing certain well-known prefixes with their unexported counterparts.

If this proposal is accepted, I'm happy to upload a CL for Go 1.14.

@gopherbot gopherbot added this to the Proposal milestone Jun 14, 2019
@mvdan
Copy link
Member Author

mvdan commented Jun 15, 2019

I forgot to mention - I'm of course open to suggestions when it comes to the package for this. I thought of go/token because that's where IsExported was added, but perhaps another is more appropriate.

@rsc
Copy link
Contributor

rsc commented Jun 25, 2019

The implementation above does not work for unexported identifiers that have no upper-case equivalents, like CJK languages. We've encouraged a "soft standard" of using X for those if needed, but then Export(中国) = X中国 but Unexport(X中国) = x中国, so they are not inverses.
(The same problem would occur if we changed the rules to make those identifiers exported by default.)

I'm also bothered by your example Unexport("URLFoo") => "urlFoo" (hard to figure out how to know to do this) or "uRLFoo".

This doesn't seem like it is a well-formed operation. When you have a name you either generate an exported or unexported identifier to begin with.

What is the use case for this operation? Is it well-enough defined to merit being in the standard library?

In contrast, #30064 was very well-defined inspection of these, not modification.

/cc @zombiezen for wire's use case

@zombiezen
Copy link
Contributor

The test cases for the export/unexport function may be helpful for understanding.

Wire's generated code requires a bunch of temporary variables of distinct types. In early versions of the tool, the temporaries were very unhelpful (e.g. v0, arg42, etc.) The linked algorithm is used to create friendlier names for these temporaries based on the type of the variable, although in some cases it gives up and falls back to the undescriptive name. The symbols returned from the export and unexport functions are never intended to be used by anything other than Wire. They are observable in the generated code, but are intended as a human inspection aid, and may still be weirdly capitalized for initialisms.

@mvdan
Copy link
Member Author

mvdan commented Jun 26, 2019

so they are not inverses

Ah, I hadn't thought about that. It certainly makes the problem even trickier.

In a way, it will would make the API a bit more confusing. On the other hand, it can be used as an argument that even more (all?) of the implementations out there are wrong.

I'm also bothered by your example Unexport("URLFoo") => "urlFoo" (hard to figure out how to know to do this) or "uRLFoo".

I only brought up this point to clarify that the API wouldn't try to do anything fancy here. So it would simply convert URLFoo into uRLFoo.

What is the use case for this operation? Is it well-enough defined to merit being in the standard library?

Code generation, similar to Wire's use case. For example, in the chromedp link I gave above, we code generate a list of exported devices via their names. The original names aren't exported, so we must do that operation.

I agree with you that the two operations here aren't well understood generally. Perhaps that means the APIs shouldn't exist. That still worries me, though, because of all the misconceptions and wrong implementations out in the wild.

@rsc
Copy link
Contributor

rsc commented Aug 14, 2019

@mvdan, it's not that the operations aren't well understood. It's that they are fundamentally not well-defined. They depend on knowledge of the underlying human language, which is quite complex.

I looked more carefully at the three examples given earlier.

Wire. I don't think wire supports the argument that a general library is needed. The unexport function is unused. The export function has exactly one use:

name := typeVariableName(t, "", func(name string) string { return "_wire" + export(name) + "Value" }, g.nameInFileScope)

The name on the right-hand side is the name of the function being called. So for names foo, Bar, you get _wireFooValue and _wireBarValue. It's kind of cute but I think ultimately not worth the complexity involved in defining export. It seems like it would have been completely fine to just use a form that did not change the names at all, like _wire_foo and _wire_Bar. The latter would actually be easier to match back up with the functions being called, since no case changes would be required. If anything, I think export should not have been defined or used, and therefore that having token.Export would have been an attractive nuisance in this case more than a help.

Chromedp. Another motivating example was a use in chromedp, which says:

	name := cleanRE.ReplaceAllString(d.Name, "")
	name = strings.ToUpper(name[0:1]) + name[1:]

The claim was that the second line is wrong. But the first line strips out any non-ASCII-alphanumeric characters, so the second line is correct as written. The domain-specific knowledge here (in particular, that the input is not general text but is a specific JSON structure](https://raw.githubusercontent.com/GoogleChrome/puppeteer/master/lib/DeviceDescriptors.js)) makes this code fine. A general solution token.Export would not have this knowledge.

Go-endpoints. The use of strings.Title may be wrong, but since the input is multiple words, using token.Export would not be any better, unless token.Export handled conversions like foo_bar -> FooBar, which would be surprising. In this case, the code should be using the same conversion that the protocol buffer support for Go does, which is more involved. Maybe that should be exported; maybe it should be copied. But it wouldn't be token.Export.

What I see from these three examples is that conversion between naming styles is sometimes unnecessary and often requires other adjustments throughout the name.

It looks like token.Export is difficult to define in general, and on top of that would rarely be sufficient on its own. And we have zero examples of needing token.Unexport.

@mvdan
Copy link
Member Author

mvdan commented Aug 14, 2019

Fair enough, I agree that this isn't clearly a good idea. I've become less sure about this idea over the past few months.

@mvdan mvdan closed this as completed Aug 14, 2019
@golang golang locked and limited conversation to collaborators Aug 13, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants