Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: regexp: add (*Regexp).MustSubexpIndex #47593

Closed
shabbyrobe opened this issue Aug 8, 2021 · 4 comments
Closed

proposal: regexp: add (*Regexp).MustSubexpIndex #47593

shabbyrobe opened this issue Aug 8, 2021 · 4 comments

Comments

@shabbyrobe
Copy link

shabbyrobe commented Aug 8, 2021

Regexp.SubexpIndex was added in a recent proposal (#32420).

This is a very helpful addition but doesn't address the use case where regexps are made in a top-level var block using MustCompile. The issue I have with the current implementation is that SubexpIndex can't be used alongside MustCompile without introducing the risk of deferred runtime slice bounds panics, as it returns -1 on error instead of panicking on startup.

To be used safely in a global context, you either need to bring a helper function along every time (thus negating the value of the original proposal, which was to save carrying around a helper function and encourage better practices via the stdlib), or capture the indexes in an init() and error check each one, which doesn't scan well and degrades the readability benefits of colocating the pattern compilation with the subexp index extraction.

When used naively, without checks or helpers, pattern compilation and subexp index extraction can go out of sync without being noticed, turning into a runtime panic possibly a long time after program startup:

var (
	pattern      = regexp.MustCompile(`(?P<foo>bork)`)
	patternIndex = pattern.SubexpIndex("bar") // Doesn't fail at startup
)
func check(v string) {
	match := pattern.FindStringSubmatch(os.Args[1])
	if match != nil {
		fmt.Println(match[patternIndex])
	}
}
check("foo") // ok
check("bork") // panic!

I concede this example may look contrived when viewed in its distilled form, but this situation is very real and has occurred to me in more complex patterns when groups have moved around, or typos were introduced. Regexps are easy to mangle and can be tricky to format in Go when they capture more than a couple of small groups.

The helper function version I am currently using:

func mustSubexpIndex(p *regexp.Regexp, name string) int {
	idx := p.SubexpIndex(name)
	if idx < 0 {
		panic(fmt.Errorf("regexp %q does not have named capture %q", p, name))
	}
	return idx
}
var (
	pattern      = regexp.MustCompile(`(?P<foo>bork)`)
	patternIndex = mustSubexpIndex(pattern, "bar") // Fails at startup
)

This works, of course, and it's what I'm doing currently, but does not save us any real effort over the examples that justified the original proposal as we still need to carry a function around with us wherever we go.

The init version:

var (
	pattern      = regexp.MustCompile(`(?P<foo>bork)`)
	patternIndex = pattern.SubexpIndex("bar")
)
func init() {
	if patternIndex < 0 {
		panic("pattern missing named capture")
	}
}

This also works, but requires us to remember to keep it all perfectly in sync and things start to get further and further away if the pattern and capturing gets more complex.

So my proposal would be to add the following API to the regexp package:

// MustSubexpIndex is like SubexpIndex but panics if SubexpIndex would
// return -1. It simplifies safe initialization of global variables holding
// subexpression indexes.
func (*Regexp) MustSubexpIndex(name string) int

I think I read somewhere a while back that there's a preference to avoid more of the Must idiom (but I couldn't find the specific reference), but in this case I think it's warranted as SubexpIndex can't be changed to panic without compromising backward compatibility. It also pairs well with MustCompile and serves part of the same use case.

@gopherbot gopherbot added this to the Proposal milestone Aug 8, 2021
@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) Aug 11, 2021
@rsc
Copy link
Contributor

rsc commented Aug 11, 2021

In general we try not to have must of every possible API, only the ones that are very likely to be used with initializers.
I understand that your use case is with an initializer, but I am not sure that is common enough to warrant even more methods on Regexp.
The same argument that justifies MustSubexpIndex would justify a Must form of essentially every regexp method.

@rsc rsc moved this from Incoming to Active in Proposals (old) Aug 11, 2021
@rsc
Copy link
Contributor

rsc commented Aug 11, 2021

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc moved this from Active to Likely Decline in Proposals (old) Aug 18, 2021
@rsc
Copy link
Contributor

rsc commented Aug 18, 2021

Based on the discussion above, this proposal seems like a likely decline.
— rsc for the proposal review group

@rsc rsc moved this from Likely Decline to Declined in Proposals (old) Aug 25, 2021
@rsc
Copy link
Contributor

rsc commented Aug 25, 2021

No change in consensus, so declined.
— rsc for the proposal review group

@rsc rsc closed this as completed Aug 25, 2021
@golang golang locked and limited conversation to collaborators Aug 25, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
No open projects
Development

No branches or pull requests

3 participants