Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regexp/syntax: quadratic time for compilation in some cases #39542

Closed
andybalholm opened this issue Jun 12, 2020 · 7 comments
Closed

regexp/syntax: quadratic time for compilation in some cases #39542

andybalholm opened this issue Jun 12, 2020 · 7 comments
Labels
FrozenDueToAge help wanted NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@andybalholm
Copy link
Contributor

As discussed on https://groups.google.com/forum/#!topic/golang-nuts/8M-qVbRKIWA

Playground link to illustrate: https://play.golang.org/p/82UBmyfyqV-

Regular expressions with large numbers of empty alternations are taking a long time to compile. The time required is roughly quadratic in the length of the regex.

The quadratic algorithm is building the patchlist by repeatedly appending to it. Since it is a linked list with no tail pointer, appending to it is O(n). The append method is found in regexp/syntax/compile.go, at line 42.

I don't know that this is exactly a bug, but it is surprising that although executing a regex is linear-time, compiling one isn't. It probably doesn't affect any useful regex, but it might be a DOS vector.

@krader1961
Copy link

krader1961 commented Jun 12, 2020

Note that it isn't just empty alternations such as "x||y". This affects any alternation where the length of the alternatives is not equal to one. So it also affects "a|xy|b" and "a|b*|c" but not "a|b|c". For any regex written by a human, where the number of alternate expressions is small, the cost to compile this pattern isn't a problem. But, as the O.P. points out, it could be used as a DOS vector by stringing together a huge number of alternatives using the "|" operator.

P.S., I haven't looked at the source code but it appears that "a|b" is treated as "[ab]" based on the runtime cost of compiling such a regex when that pattern is extremely long. However, "aa|bb" does not receive that special treatment as it is also quite costly to compile when that pattern is repeated thousands of times.

@ianlancetaylor ianlancetaylor changed the title regexp: quadratic time for compilation in some cases regexp/syntax: quadratic time for compilation in some cases Jun 12, 2020
@ianlancetaylor ianlancetaylor added help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Jun 12, 2020
@ianlancetaylor ianlancetaylor added this to the Backlog milestone Jun 12, 2020
@ianlancetaylor
Copy link
Contributor

CC @rsc in case this has already been addressed in RE2.

@andybalholm
Copy link
Contributor Author

@krader1961 If it happens with longer alternations too, though we don't need to worry about it for regexes written by humans, it could be an issue with regexes written by computers. If I remember right, I've seen JS profanity filters that had their banned word list as a regex, all joined by pipe characters. That style of regex could be a problem if it has thousands of words.

@junyer
Copy link
Contributor

junyer commented Jun 14, 2020

Unsurprisingly, "append" is essentially identical in RE2.

@junyer
Copy link
Contributor

junyer commented Jun 15, 2020

Fixed in RE2 as per google/re2@e9d5179. Who wants to make the corresponding change for Go?

@ianlancetaylor ianlancetaylor added the NeedsFix The path to resolution is known, but the work has not been done. label Jun 15, 2020
@gopherbot gopherbot removed the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jun 15, 2020
@andybalholm
Copy link
Contributor Author

@junyer I'll give it a try.

@gopherbot
Copy link

Change https://golang.org/cl/238079 mentions this issue: regexp/syntax: append patchLists in constant time

@golang golang locked and limited conversation to collaborators Jun 18, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge help wanted NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

5 participants