-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
regexp: LiteralPrefix behaving incorrectly #30425
Comments
Maybe just adding document, because prefix is applied for unanchored matches only Line 88 in 20930c7
|
Yes, this is a documentation issue. The code is behaving as intended. |
Do note that if the expression is a bit more anchored, then LiteralPrefix() applies again:
The behaviour has changed through versions. The relevant versions I found:
|
@antong go 1.6 to 1.12 seems to be correct. Only |
@Gnouc ,Well, I found only one place in the standard library where https://play.golang.org/p/j27u5DdE1oS :
|
@antong: I mean |
I must be missing something obvious, but why would "^foo$" deserve to return a prefix but not "^foo" ? Any match to either of these needs to have the prefix "foo" (discounting multiline flag etc.). regexp/syntax also has a variant
Now I must admit I've never used So I would like to understand why "^foo$" returns a literal prefix even though it is anchored. I believe I do understand how it happens: There is an optimization that takes a separate path for regexps that can be compiled for "one-pass" execution, and the prefix is calculated differently for these. But to me it sounds like an internal implementation detail and not an external feature of a regular expression that should have relevance on the output. For reference, here are
|
cc: @rsc for insights on the behavior change of LiteralPrefix. |
Some more testcases here: https://play.golang.org/p/e0IJhTBidQe |
I spent some time looking at this, and I believe there are two separate issues we should be discussing here:
This issue is about (2), so I’ll focus on this here. Once we resolve the issue we should discuss (1) as well, and I can open a separate issue about that (it may be too late to change behavior, but at least the docs could be changed). The change in 1.3 seems to be a regression that wasn’t caught for a long time. If we don’t see a prefix for In 1.3, the one-pass algorithm was added for regexps, and a new path appeared for the prefix computation (which
The first three go through The last one triggers one-pass (because There is a deviation in the implementations of |
I just noticed that in #21463 (comment), @rsc says:
So it sounds like an answer for (1). |
This is tricky, because part of the idea behind the complete bool in LiteralPrefix was that if you have
then if complete is true then you should be able to substitute strings.Index(prefix) for a regular expression search entirely. But if anchors are involved then that's not safe. In the behaviors that @antong helpfully worked out, Go 1.2 is doing the right thing. Go 1.3 through Go 1.5 are suspicious but at least not wrong. Go 1.6 and later are wrong or at least very misleading when they return complete=true for any anchored match. I think the right thing to do here, in terms of the safest result, is to go back to the Go 1.2 behavior and return empty string, false if re begins with any empty-width assertions (\b, ^, and so on). There is no safe way to use LiteralPrefix if it returns an actual prefix there. Any optimization involving search for that prefix out-of-band could make the later search do the wrong with with the assertions. This is different from what I said in 2017. Maybe I didn't think through it completely then, or maybe I haven't thought through it completely today. :-) |
Change https://golang.org/cl/377294 mentions this issue: |
What version of Go are you using (
go version
)?Does this issue reproduce with the latest release?
Yeah, go1.12 has just been released.
What did you do?
See also on Play or even more
What did you expect to see?
What did you see instead?
The text was updated successfully, but these errors were encountered: