Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

index/suffixarray: optimize QSufSort by removing allocated scratch space #15480

Closed
samuelhuang opened this issue Apr 28, 2016 · 3 comments
Closed

Comments

@samuelhuang
Copy link

samuelhuang commented Apr 28, 2016

Go's implementation of QSufSort allocates scratch space for an intermediate update step. We can get rid of this scratch space.

In /master/src/index/suffixarray/qsufsort.go, after x.sa[] == sa[pi:pk] has been sorted, with x.inv[x.sa[i] + x.h] as key, updateGroups() updates x.sa[] and x.inv[]. updateGroup() needs to scan x.inv[x.sa[i] + x.h] for "groups", but also writes to x.inv[]. To avoid pollution, current code uses 2 loops, using x.buf[] to store intermediate group results. x.buf[] is a scratch space that needs allocation.

I propose to optimize updateGroups() by doing the update in 1 pass without x.buf[]. Let

old_group := x.inv[sa[0]] == offset + len(x.sa) - 1

The key idea is to realize that x.inv[x.sa[:]] will always be assigned from old_group to some value in [offset, old_group]. Moreover, nothing in x.inv[] outside x:sa[:] uses these values. Therefore we can write to x.inv[] as we read x.inv[x:sa[i] + x.h], providing that every time we read x.inv[x:sa[i] + x.h] we undo the change by interpreting [offset, old_group] as old_group

A prototype was implemented in C++ for Chrome's Courgette:
https://codereview.chromium.org/1914303002/diff/1/courgette/third_party/qsufsort.h
Eventually a different solution was used -- but the first solution is useful for optimizing Go's QSufSort implementation.

Edit 2016/05/18: "The key idea is... " x.inv[x.sa[:] + x.h] => x.inv[x.sa[:]]

@bradfitz bradfitz changed the title Optimize QSufSort by removing allocated scratch space. index/suffixarray: optimize QSufSort by removing allocated scratch space Apr 28, 2016
@bradfitz bradfitz added this to the Unplanned milestone Apr 28, 2016
@bradfitz
Copy link
Contributor

/cc @griesemer

@griesemer griesemer self-assigned this May 6, 2016
@gopherbot
Copy link

Change https://golang.org/cl/174100 mentions this issue: index/suffixarray: index 3-10X faster in half the memory

@dgryski
Copy link
Contributor

dgryski commented Apr 26, 2019

/cc @dsnet

@golang golang locked and limited conversation to collaborators May 12, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants