Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regexp: compiling incorrectly #59007

Open
apmckinlay opened this issue Mar 13, 2023 · 11 comments · May be fixed by #66165
Open

regexp: compiling incorrectly #59007

apmckinlay opened this issue Mar 13, 2023 · 11 comments · May be fixed by #66165
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@apmckinlay
Copy link

apmckinlay commented Mar 13, 2023

What version of Go are you using (go version)?

$ go version
go version go1.20.2 darwin/amd64

Does this issue reproduce with the latest release?

yes
and with 1.19 on the Go Playground

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/andrew/Library/Caches/go-build"
GOENV="/Users/andrew/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/andrew/Go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/andrew/Go:/Users/andrew/Dropbox/suneido_tests"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.20.2"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/andrew/Dropbox/gsuneido/go.mod"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/dc/0y3gfqnn56g6pd8d3_b5wz300000gn/T/go-build1311023722=/tmp/go-build -gno-record-gcc-switches -fno-common"
GOROOT/bin/go version: go version go1.20.2 darwin/amd64
GOROOT/bin/go tool compile -V: compile version go1.20.2
uname -v: Darwin Kernel Version 22.3.0: Mon Jan 30 20:42:11 PST 2023; root:xnu-8792.81.3~2/RELEASE_X86_64
ProductName:		macOS
ProductVersion:		13.2.1
BuildVersion:		22D68
lldb --version: lldb-1400.0.38.17
Apple Swift version 5.7.2 (swiftlang-5.7.2.135.5 clang-1400.0.29.51)

What did you do?

pat := regexp.MustCompile(`0A|0[aA]`)
fmt.Println(pat.MatchString("0a"))

(https://go.dev/play/p/jCLPp8AXnJB)

What did you expect to see?

true

What did you see instead?

false

I tested on regex101 to confirm that this should match.
(Strangely, it shows golang matching)
It appears to compile incorrectly. Simplify doesn't seem to matter.

syntax.Parse("0A|0[aA]", 0)
fmt.Println(syntax.Compile(re))
  0	fail
  1*	rune1 "0" -> 2
  2	rune1 "A" -> 3
  3	nop -> 4
  4	match

syntax.Parse("0a|0[aA]", 0)
fmt.Println(syntax.Compile(re))
  0	fail
  1*	rune1 "0" -> 2
  2	rune "AAaa" -> 3
  3	match
@apmckinlay apmckinlay changed the title regex: compiling incorrectly regexp: compiling incorrectly Mar 13, 2023
@cherrymui cherrymui added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Mar 13, 2023
@cherrymui cherrymui added this to the Go1.21 milestone Mar 13, 2023
@cherrymui
Copy link
Member

cc @rsc

@ianlancetaylor
Copy link
Contributor

The regexp 0A|0[aA] means 0(A|0)[aA]. You seem to be looking for (0A)|(0[aA]).

@ianlancetaylor ianlancetaylor closed this as not planned Won't fix, can't repro, duplicate, stale Mar 14, 2023
@randall77
Copy link
Contributor

From https://github.com/google/re2/wiki/Syntax , it says

The operator precedence, from weakest to strongest binding, is first alternation, then concatenation, and finally the repetition operators... Some examples: ab|cd is equivalent to (ab)|(cd)

That makes me think the OP's regexp is (0A)|(0[aA]), not 0(A|0)[aA].

@randall77 randall77 reopened this Mar 14, 2023
@crisman
Copy link
Contributor

crisman commented Mar 14, 2023

The regexp 0A|0[aA] means 0(A|0)[aA]. You seem to be looking for (0A)|(0[aA]).

No, re2 flavor regex don't bind that way, see https://github.com/google/re2/wiki/Syntax

Some examples: ab|cd is equivalent to (ab)|(cd);

https://go.dev/play/p/EQ6nkEEHJCT

For re `aB|a[cd]` parsed as `a[Bc-d]`
For re `0A|0[aA]` parsed as `0A(?:)`
For re `0a|0[aA]` parsed as `0[Aa]`

Notice the second one parsed as the string 0A followed by an empty non-capturing group, which is just wrong. It should be similar to the third item where the alternation collapsed and the second character is casefolded a (either [aA] or (?i:a)).

An prefix followed by a capital letter with a character class with capital and lower case seems broken, larger prefixes are still broken (see the play link above)

@crisman
Copy link
Contributor

crisman commented Mar 14, 2023

I think that syntax.Regexp Equal() is broken between literals and caseFolded literals.

justA, _ := syntax.Parse(`A`, syntax.Perl)
foldA, _ := syntax.Parse(`(?i:A)`, syntax.Perl)
fmt.Println(justA.Equal(foldA)) // should not be true

Given the odd pattern as above (0A|0[aA]).

In src/regexp/syntax/parse.go once factor() pulls out the 0 prefix it then tries to factor() between A and (?i:A) (what [aA] is parsed as) and is able to find a prefix of A as Equal() says true. Once the code goes wrong it then does some sillyness with alternation between two empty non-capturing groups, which makes no sense as one would expect down a wrong branch.

In src/regexp/syntax/regexp.go Equal() under the case OpLiteral does not have any logic for the FoldCase flag, which I think is the error.

At first glance the logic in re2/regexp.cc TopEqual() for Literal does care about FoldCase which lets me be more sure this is what needs fixing.

@apmckinlay
Copy link
Author

The regexp 0A|0[aA] means 0(A|0)[aA]. You seem to be looking for (0A)|(0[aA]).

Even if that was the case, there would still be a problem because 0A|0[aA] is compiling and working very differently from 0a|0[aA] (as I showed)

@ianlancetaylor
Copy link
Contributor

Apologies for being confused.

@gopherbot gopherbot modified the milestones: Go1.21, Go1.22 Aug 8, 2023
@dolmen
Copy link
Contributor

dolmen commented Dec 12, 2023

$ perl -E 'say ("0a" =~ /0A|0[Aa]/)'
1
$ go version
go version go1.21.5 darwin/arm64
$ go run github.com/dolmen-go/goeval@latest -i=regexp -i=fmt 'fmt.Println(regexp.MustCompile("0A|0[Aa]").MatchString("0a"))' 
false

@dolmen
Copy link
Contributor

dolmen commented Dec 12, 2023

@apmckinlay You must use the regexp/syntax.Perl flag with regexp/syntax.Parse to match regexp.Compile behaviour.

Here is your example fixed: https://go.dev/play/p/OrLoZxhMwMN

But the issue is still there.

@junyer
Copy link
Contributor

junyer commented Dec 12, 2023

RE2 has a similar bug, FWIW, but it's caused by this assumption. I haven't looked into what the Go regexp package is doing, but I suspect that @crisman is correct or, at the very least, on the right track.

@gopherbot gopherbot modified the milestones: Go1.22, Go1.23 Feb 6, 2024
itchyny added a commit to itchyny/go that referenced this issue Mar 7, 2024
Fixing `Equal` method in `regexp/syntax` package fixes compilation of
some alternate patterns like `0A|0[aA]`.

Fixes golang#59007
itchyny added a commit to itchyny/go that referenced this issue Mar 7, 2024
Fixing Equal method in regexp/syntax package fixes compilation of
some alternate patterns like "0A|0[aA]".

Fixes golang#59007
@gopherbot
Copy link

Change https://go.dev/cl/569735 mentions this issue: regexp: fix compiling alternate patterns of different fold case literals

itchyny added a commit to itchyny/go that referenced this issue Apr 2, 2024
Fixing Equal method in regexp/syntax package fixes compilation of
some alternate patterns like "0A|0[aA]".

Fixes golang#59007
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants