New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
proposal: index/suffixarray: expose qsufsort #29080
Comments
Maybe this is better suited for an external package rather than in standard library ? Please see https://golang.org/doc/faq#x_in_std |
Possibly, but qsufsort.go is already in the library, and forking it just to export one symbol doesn't feel right somehow. Patch here: https://go-review.googlesource.com/c/go/+/151982 |
Suffix arrays are a relatively simple data structure with complex algorithms to construct them (see SACA). Qsufsort is a relatively simply algorithm, but is not state of the art. Exposing QSufSort is an implementation detail of |
I don't have an objection exposing it; the signature and semantics are very well-defined and clean: Given a byte array, return an array of indices of the sorted substrings. It seems to me independent of algorithm for this specific function. But I'd use a different name when exporting. Perhaps just |
I'm still not a fan of exposing this, but if we were I'd suggest a signature like: func Sort(b []byte, sa []int) rather than the current internal signature. This gives us flexibility to reuse memory provided by the user. Also, state-of-the-art SACAs are several times faster than the current |
@dsnet +1 on the signature suggestion. Regarding the Go authors being on the hook: We currently are to some extend as this is already in the std library. It doesn't mean we have to provide the best implementation (it's already not the case). What we should decide is whether this package should be in the std library in the first place, or whether it should be deprecated in favor of external high-performance implementations. Hence deference to the proposal committee for now. |
Per discussion with @golang/proposal-review, our plan is to move this outside of the standard library eventually, so we’re not going to make any changes. In the meantime, we encourage you to fork the code into its own repo and experiment as you see fit. |
What version of Go are you using (
go version
)?Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (
go env
)?go env
OutputWhat did you do?
This is not an error, rather a missing feature in the src/index/suffixarray library.
What did you expect to see?
The library is written assuming that the qsufsort embedded within it will only ever be used as a suffix array for lookup purposes. Since the direct output of qsufsort is extremely useful, for example in performing a BWT, I'd expect qsufsort() in qsufsort.go to be exported, e.g., as QSufSort().
What did you see instead?
qsufsort() is not exported, making it difficult to get at the suffix array data structure directly. Serializing and then deserializing it via the (supported) mechanism would be unacceptably slow for a large suffix array, so is a nonstarter as an alternative to modifying qsufsort.go.
Note: I will submit a patch shortly.
The text was updated successfully, but these errors were encountered: