
x/text: add grapheme cluster iteration #14820

Open
cgilling opened this issue Mar 14, 2016 · 4 comments

@cgilling

Hi, I'm in the middle of implementing support for iterating over grapheme clusters in a project I'm working on, and it seems like something that would be a good fit for golang.org/x/text. I wanted to reach out to see how much interest there would be in this, and whether I should work on something that would fit into this project. I was thinking the interface could look something like this (the naming is just a stand-in for now; I'm not a big fan of the name Decode):

package grapheme

// Decode reads the first grapheme cluster out of s and returns it. To get the length of the
// grapheme cluster in bytes, take len() of the return value.
func Decode(s string) string
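A minimal sketch of how the proposed Decode could drive iteration: take the first cluster, then advance by its byte length. It assumes a deliberately simplified definition of a cluster (one rune plus any following combining marks in categories Mn, Me, and Mc). Decode and Clusters here are hypothetical illustrations, not the UAX #29 algorithm, which also handles Hangul syllables, emoji sequences, regional indicator pairs, CR LF, and more.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// Decode is a hypothetical, simplified stand-in for the proposed
// grapheme.Decode. It returns one rune plus any following combining marks
// (categories Mn, Me, Mc). This is an approximation, NOT full UAX #29.
func Decode(s string) string {
	if s == "" {
		return ""
	}
	// Always consume the first rune (invalid UTF-8 counts as one byte).
	_, n := utf8.DecodeRuneInString(s)
	for n < len(s) {
		r, size := utf8.DecodeRuneInString(s[n:])
		if !unicode.In(r, unicode.Mn, unicode.Me, unicode.Mc) {
			break // next rune starts a new cluster
		}
		n += size
	}
	return s[:n]
}

// Clusters shows the iteration pattern the proposal implies: repeatedly
// take the first cluster and advance by its byte length.
func Clusters(s string) []string {
	var out []string
	for len(s) > 0 {
		g := Decode(s)
		out = append(out, g)
		s = s[len(g):]
	}
	return out
}

func main() {
	s := "e\u0301te\u0301" // "été" written with combining acute accents
	for _, g := range Clusters(s) {
		fmt.Printf("%q ", g)
	}
	fmt.Println()
}
```

Returning a string (rather than a length) keeps the loop simple, since len(g) gives the byte offset to advance by.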

I didn't want to go through the whole proposal process until I had an idea of whether there might be interest in this. I hope this is the right forum; if not, I'd appreciate being pointed to the right place.

Thanks

@bradfitz bradfitz changed the title adding grapheme cluster iteration to golang.org/x/text x/text: add grapheme cluster iteration Apr 9, 2016
@bradfitz bradfitz added this to the Unreleased milestone Apr 9, 2016
@mpvl
Contributor

mpvl commented Apr 10, 2016

I have a segment package planned, that would provide an API for defining any kind of segmentation. The advantage of a single API for grapheme, word, line, sentence, etc. breaking and segmentation is that it promotes reuse of sometimes complicated code.

It may be a while before this is done. In the meantime, however, you can already approximate grapheme cluster iteration using norm.Iter from "golang.org/x/text/unicode/norm". Normalization segments are not exactly the same as grapheme clusters, but they are close enough for many applications.

@SamWhited
Member

See also #17256

@rivo

rivo commented Mar 13, 2019

Because the normalization package didn't do the trick in many cases, I went ahead and implemented grapheme cluster segmentation in the following package:

https://github.com/rivo/uniseg

It passes the grapheme cluster break test cases, so I'm fairly confident that it works as expected. But since it's a new project, I'd appreciate any bug reports.

I might add Word Boundaries and Sentence Boundaries, too, at some point. But for now, it's not my main focus.

I don't know if there's any interest in moving this to x/text at some point. I'm open to that, but I'd like to know what effort and responsibilities would come with it. Get in touch if you want to push this forward.

@SamWhited
Member

@mpvl I've needed an implementation of this for a project recently and have been considering writing up a design document for it. However, it sounds like you already have a more general-purpose API in mind. Would you have time to write that up and post it somewhere? If you aren't planning an implementation in the immediate future, I may end up writing one anyway, and I'd much rather write something that stands a chance of eventually being upstreamed. Thanks!
