-
Notifications
You must be signed in to change notification settings - Fork 18k
regexp: improve performance #19629
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
The benchmark game is, as the name suggest, a game. A game that is won by tuning a specific program until it performs good on the specific input used in the benchmark and/or, in this case, using highly-optimized regexp libraries. The page you linked, for example, shows that C++ is slower than PHP, Hack, Node.js, and Typescript. Is it, though? This issue is, IMHO, a little too broad to be actionable. "Make the regexp engine faster" well, yes, everyone likes fast regexp engines. We have a few (a little more specific) issues about that. |
@ALTree > The benchmark game is, as the name suggest, a game.
No. Shows a C++ program slower than PHP, Hack, Node.js and Typescript programs.
Yes. regex-redux is a new task, based on regexdna. Try re-writing some of the old Go regex-dna programs to perform the new substitutions. |
@igouy , |
As you say, in a language like Python, the regexp engine is implemented in C. So you are comparing a somewhat tuned Go implementation with a highly tuned C implementation. You also need to consider the characteristics of the engine. Go has chosen to follow the re2 path (not surprising, since Russ Cox is a major author of both Go and re2). re2 has much better performance characteristics than some other regexp engines, in that it never has an exponential slowdown, but that comes at a cost for other regexps (https://swtch.com/~rsc/regexp/). Even then, the C++ re2 implementation has many optimizations that are not in the Go implementation. See, for example, #11646. So: can regexp be sped up? Yes. But it will take work. I'm not sure whether to bother leaving this issue open or not. |
I'm going to close this in favor of concrete optimization bugs like #11646. |
go 1.8, linux, amd64
Recently, I use regular expression a lot in golang. From the benchmarksgame, I see the regexp of go is very slow, about 5x-10x than many other languages.
After I made some modifications by removing unnecessary channels and allocations in the source code, the performance is improved about 1/4, but is still slower than even many dynamic languages.
from pprof:
(pprof) top 10
68.33s of 68.49s total (99.77%)
Dropped 31 nodes (cum <= 0.34s)
Showing top 10 nodes out of 17 (cum >= 63.59s)
flat flat% sum% cum cum%
32.46s 47.39% 47.39% 37.17s 54.27% regexp.(*machine).add
18.58s 27.13% 74.52% 27.05s 39.49% regexp.(*machine).step
7.06s 10.31% 84.83% 68.28s 99.69% regexp.(*machine).match
4.77s 6.96% 91.79% 4.77s 6.96% runtime.memmove
1.86s 2.72% 94.51% 1.86s 2.72% regexp/syntax.(*Inst).MatchRunePos
1.72s 2.51% 97.02% 1.72s 2.51% regexp.(*inputBytes).step
1.48s 2.16% 99.18% 1.48s 2.16% regexp/syntax.EmptyOpContext
0.39s 0.57% 99.75% 2.25s 3.29% regexp/syntax.(*Inst).MatchRune
0.01s 0.015% 99.77% 4.84s 7.07% regexp.(*Regexp).replaceAll
0 0% 99.77% 63.59s 92.85% main.countMatches
It seems that there are many type assertion in regexp, am I right? And is there any solution for this?
The text was updated successfully, but these errors were encountered: