proposal: regexp: Add functions to the Regexp type to ease accessing named capture groups #24208

Closed
marstr opened this issue Mar 1, 2018 · 4 comments

Comments

@marstr

marstr commented Mar 1, 2018

Today, when a *regexp.Regexp has named capture groups, one cannot directly find submatches from the subexpression name. Rather, there is a level of indirection, where one first finds the index of a named subexpression. In practical terms, this often leads either to hard-coded references to a group's index or to boilerplate code that finds the index of a named group.

I've quickly written up four additional methods that work as accessors for named capture groups:

marstr@7f0dde1

func (re *Regexp) FindNamedSubmatch(b []byte) map[string][]byte {}
func (re *Regexp) FindNamedStringSubmatch(s string) map[string]string {}
func (re *Regexp) FindAllNamedSubmatch(b []byte, n int) []map[string][]byte {}
func (re *Regexp) FindAllNamedStringSubmatch(s string, n int) []map[string]string {}

These four methods return each named subexpression mapped to the appropriate submatch. They associate the empty string with the whole expression's match. Any unnamed capture groups are excluded. (See the Example tests I added in the commit for a quick demonstration of the behavior.)
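The behavior described above can be approximated today with a free function built on the existing API. This is a minimal sketch, not the code from the linked commit: `findNamedStringSubmatch` and `nameRe` are hypothetical names chosen for illustration, and the sketch assumes that a no-match result should be a nil map.

```go
package main

import (
	"fmt"
	"regexp"
)

// nameRe is an illustrative pattern (an assumption, not from the proposal)
// with two named groups and one unnamed group.
var nameRe = regexp.MustCompile(`(?P<first>[a-zA-Z]+) (?P<last>[a-zA-Z]+)( Jr)?`)

// findNamedStringSubmatch is a hypothetical free-function stand-in for the
// proposed FindNamedStringSubmatch method. It returns nil when s does not match.
func findNamedStringSubmatch(re *regexp.Regexp, s string) map[string]string {
	match := re.FindStringSubmatch(s)
	if match == nil {
		return nil
	}
	named := make(map[string]string)
	for i, name := range re.SubexpNames() {
		// SubexpNames()[0] is always "", so the whole match is stored under "".
		// Unnamed groups also report "" and are skipped for i > 0.
		// If two groups share a name, the later one overwrites the earlier one.
		if i == 0 || name != "" {
			named[name] = match[i]
		}
	}
	return named
}

func main() {
	m := findNamedStringSubmatch(nameRe, "Ada Lovelace")
	fmt.Println(m["first"], m["last"])
	fmt.Println(m[""])
}
```

The byte-slice and FindAll variants would follow the same loop over `SubexpNames()`, applied to the results of `FindSubmatch`, `FindAllSubmatch`, and `FindAllStringSubmatch`.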

One behavior left undefined is what to do when a regexp has multiple capture groups that share a name, as in this example:
https://play.golang.org/p/xeaMHKX1nya
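To make the ambiguity concrete, here is a small sketch with a made-up pattern (it is not the playground example linked above). `groupsByName` is a hypothetical helper that keeps every submatch for a repeated name instead of choosing one, which is only one of several possible resolutions.

```go
package main

import (
	"fmt"
	"regexp"
)

// dupRe is an illustrative pattern in which two groups share the name "part".
var dupRe = regexp.MustCompile(`(?P<part>a+)(?P<part>b+)`)

// groupsByName is a hypothetical helper that collects all submatches for each
// group name, in left-to-right group order. Unnamed groups and the whole
// match are skipped.
func groupsByName(re *regexp.Regexp, s string) map[string][]string {
	match := re.FindStringSubmatch(s)
	if match == nil {
		return nil
	}
	out := make(map[string][]string)
	for i, name := range re.SubexpNames() {
		if i == 0 || name == "" {
			continue
		}
		out[name] = append(out[name], match[i])
	}
	return out
}

func main() {
	// The name "part" appears twice in SubexpNames, once per group.
	fmt.Println(len(dupRe.SubexpNames()))
	fmt.Println(groupsByName(dupRe, "aabbb")["part"])
}
```

A `map[string]string` result has to pick a single winner for "part", which is why the return type in the proposal cannot represent this case without an extra rule.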

If folks like this proposal, note that the linked commit does not yet have thorough enough testing for submission.

@gopherbot gopherbot added this to the Proposal milestone Mar 1, 2018
@rsc
Contributor

rsc commented Mar 5, 2018

At the least, you missed FindAllNamedIndexSubmatch, FindAllNamedStringSubmatchIndex, ...

There are already too many methods on Regexp. The bar here is high for adding more, especially since as you note the answer is ambiguous. There is already:

 func (re *Regexp) SubexpNames() []string

It would be easy for you to build a map[string]int from that []string, and then use it in indexing the regular []string, [][]byte, or []int returned by the existing methods. That's probably the right approach if you find yourself doing this a lot.
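The suggestion above might look like the following sketch. `subexpIndexes`, `dateRe`, and `dateIdx` are hypothetical names, and keeping the leftmost index for a repeated name is an assumption made here, not something the thread specifies.

```go
package main

import (
	"fmt"
	"regexp"
)

// dateRe is an illustrative pattern with three named groups.
var dateRe = regexp.MustCompile(`(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})`)

// subexpIndexes builds a name-to-index map from SubexpNames. Unnamed groups
// are skipped. For a repeated name the leftmost index is kept (an assumption).
func subexpIndexes(re *regexp.Regexp) map[string]int {
	idx := make(map[string]int)
	for i, name := range re.SubexpNames() {
		if name == "" {
			continue
		}
		if _, seen := idx[name]; !seen {
			idx[name] = i
		}
	}
	return idx
}

// dateIdx is computed once and reused for every match.
var dateIdx = subexpIndexes(dateRe)

func main() {
	const input = "released 2018-03-05"

	// Index into the []string returned by FindStringSubmatch.
	m := dateRe.FindStringSubmatch(input)
	if m == nil {
		fmt.Println("no match")
		return
	}
	fmt.Println(m[dateIdx["year"]], m[dateIdx["month"]], m[dateIdx["day"]])

	// The same map works with the []int from FindStringSubmatchIndex:
	// group i occupies loc[2*i] and loc[2*i+1].
	loc := dateRe.FindStringSubmatchIndex(input)
	y := dateIdx["year"]
	fmt.Println(input[loc[2*y]:loc[2*y+1]])
}
```

Go 1.15 and later also provide `(*Regexp).SubexpIndex(name string) int`, which returns the index of the leftmost group with that name, or -1 if there is none, so the map is only needed when many names are looked up repeatedly.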

@rsc rsc closed this as completed Mar 5, 2018
@marstr
Author

marstr commented Mar 5, 2018

Fair enough! Thanks for the consideration.

@robpike
Contributor

robpike commented Mar 7, 2018

By the way, the package https://github.com/ghemawat/re might solve your problems even more nicely than what you're proposing.

@marstr
Author

marstr commented Mar 7, 2018

Thanks for the tip, @robpike. The package linked certainly makes for clean, concise code. It doesn't quite do what I want, though: a big part of the motivation behind my proposal is to loosen the coupling between a regexp definition and the code that consumes its matches.

For instance, if I have a regexp: (?P<first>[a-zA-Z]+) (?P<last>[a-zA-Z]+) and modify it to (?P<first>[a-zA-Z]+) (?P<middle>[a-zA-Z]+) (?P<last>[a-zA-Z]+), any code that reads matches must be modified to work around middle. If I write my code in a way that doesn't use indices directly, the addition of the middle group won't impact existing code.
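A short sketch of that scenario, using the two patterns above with the existing API. `lastName`, `twoPart`, and `threePart` are hypothetical names introduced for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// twoPart and threePart are the two patterns from the comment above.
	twoPart   = regexp.MustCompile(`(?P<first>[a-zA-Z]+) (?P<last>[a-zA-Z]+)`)
	threePart = regexp.MustCompile(`(?P<first>[a-zA-Z]+) (?P<middle>[a-zA-Z]+) (?P<last>[a-zA-Z]+)`)
)

// lastName is a hypothetical consumer that looks up the "last" group by name,
// so it does not depend on where that group sits in the pattern.
func lastName(re *regexp.Regexp, s string) (string, bool) {
	m := re.FindStringSubmatch(s)
	if m == nil {
		return "", false
	}
	for i, name := range re.SubexpNames() {
		if name == "last" {
			return m[i], true
		}
	}
	return "", false
}

func main() {
	a, _ := lastName(twoPart, "Grace Hopper")
	b, _ := lastName(threePart, "Grace Brewster Hopper")
	fmt.Println(a, b) // both lookups yield "Hopper"

	// An index-based consumer written for twoPart reads group 2, which in
	// threePart is now the middle name.
	fmt.Println(threePart.FindStringSubmatch("Grace Brewster Hopper")[2])
}
```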

That said, I totally buy the argument that the stdlib's regexp package would be cluttered by the methods I proposed. Based on this thread, it sounds like my best options would be to contribute to @ghemawat's package or to write one of my own. :)

@golang golang locked and limited conversation to collaborators Mar 7, 2019