New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
strings: Count is O(n*m) #4600
Labels
Milestone
Comments
Here is an example using a fnv-like hash. A suggestion would be to do the following in Contains/Count(s1, s2): if len(s2) > 16, count the number N of occurences of s2[:4] in s1. If N*len(s2) > K*len(s1) for some constant K (the hash function cost, here K=3), then full comparisons are probably more costly than hashing and we switch algorithms. s1 = strings.Repeat("ACTG", 32<<10), s2 = strings.Repeat("ACTG", 4<<10) SetBytes(len(s1)+len(s2)) BenchmarkContainsTorture 5 403871636 ns/op 0.37 MB/s BenchmarkCountTorture 5 403779882 ns/op 0.37 MB/s BenchmarkContainsHashTorture 5000 454527 ns/op 324.43 MB/s BenchmarkCountHashTorture 5000 439006 ns/op 335.90 MB/s s1 = strings.Repeat("ACTG", 32<<10) s2 = strings.Repeat("ACTG", 2<<10)+"GATC"+strings.Repeat("ACTG", 2<<10) SetBytes(len(s1)+len(s2)) BenchmarkContainsTorture 100000 28224 ns/op 5224.35 MB/s BenchmarkCountTorture 10000 227133 ns/op 649.20 MB/s BenchmarkContainsHashTorture 20000 87624 ns/op 1682.81 MB/s BenchmarkCountHashTorture 5000 684507 ns/op 215.42 MB/s Attachments:
|
It turns out that even on the original example from the G+ post the hash method is faster: BenchmarkContainsTorture 500000 4971 ns/op 223.28 MB/s BenchmarkCountTorture 500000 6585 ns/op 168.55 MB/s BenchmarkContainsHashTorture 500000 3062 ns/op 362.43 MB/s BenchmarkCountHashTorture 500000 3793 ns/op 292.62 MB/s s1 = "Lang lang lang. langaha Langan langarai langate langauge langbanite Langbehn langbeinite langca Langdon Lange langeel langel Langelo Langeloth Langer Langford Langham Langhian Langhorne langi langiel Langill Langille langite langka lang-kail Langland langlauf langlaufer langlaufers langlaufs langle Langley langley langleys Langlois Langmuir Lango Langobard Langobardic langobardic langoon langooty langosta langourous langourously langouste langrage langrages langrel langrels Langrenus Langreo Langres langret langridge langsat Langsdon Langsdorffia langset langsettle Langshan langshan langshans Langside langspiel langspil Langston Langsville langsyne langsynes langteraloo Langton Langtry language languaged languageless languages languaging langue langued Languedoc languedoc Languedocian Languedoc-Roussillon languent langues languescent languet languets languette languid languidly languidness languidnesses languish languished languisher languishers languishes languishing languishingly languishment languor languorment languorous languorously languorousness languors langur langurs Langworthy" s2 = "languid" |
This issue was closed by revision 23093f8. Status changed to Fixed. |
This issue was closed.
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
The text was updated successfully, but these errors were encountered: