proposal: runes: create new package analogous to bytes, for rune slices #34313

srinathh · 2019-09-16T01:45:29Z

Working with and manipulating non-English data requires us to use rune slices. If we want to do operations like comparing two rune slices, replacing, indexing, etc., we have to convert to string, do those operations, and convert back, or write custom functions.

I would therefore like to propose creating a package runes, mirroring the package bytes, with functionality to work directly with rune slices rather than byte slices, to support international language use cases.
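For illustration, the round trip through string described above might look like the following sketch. The helper names `indexRunes` and `replaceRunes` are hypothetical; nothing like them exists in the standard library, and the sketch assumes the input holds valid code points.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// indexRunes returns the index, counted in runes, of the first occurrence
// of sep in s, or -1 if sep is absent. It converts to string, searches,
// and maps the resulting byte offset back to a rune offset.
// (Hypothetical helper, not a standard library function.)
func indexRunes(s, sep []rune) int {
	str := string(s)
	i := strings.Index(str, string(sep)) // byte offset into str
	if i < 0 {
		return -1
	}
	return utf8.RuneCountInString(str[:i])
}

// replaceRunes replaces every occurrence of old with repl by converting
// to string and back, which allocates on each conversion.
// (Hypothetical helper, not a standard library function.)
func replaceRunes(s, old, repl []rune) []rune {
	return []rune(strings.ReplaceAll(string(s), string(old), string(repl)))
}

func main() {
	text := []rune("こんにちは世界")
	fmt.Println(indexRunes(text, []rune("世界")))
	fmt.Println(string(replaceRunes(text, []rune("世界"), []rune("Go"))))
}
```

Each call converts the whole slice at least once, which is the overhead the proposal is objecting to.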


bserdar · 2019-09-16T01:47:58Z

This would not be necessary once (if) generics are implemented.

lootch · 2019-09-16T03:59:07Z

On 9/16/19, Burak Serdar wrote:
> This would not be necessary once (if) generics are implemented.

You could also claim (I do) that if something like this came along, there would be less justification for "generics". Frankly, generics require a much shallower boundary between intrinsic and user-defined objects or, perhaps more usefully, but much more difficult to do right, a much richer "type" mechanism with open-ended attributes. Go with generics then becomes either a beautiful academic artifact or a Frankenstein monster of a language. Guess which is more likely to happen first. Incidentally, even knowing that the Go Team's efforts put the integrity of the language very high on the list of objectives, it is still quite revealing that there is no "Go with Generics" in the wild, whether to be disparaged or to be revered. Lucio.

bserdar · 2019-09-16T04:30:40Z

On Sun, Sep 15, 2019 at 10:00 PM lootch wrote:
> On 9/16/19, Burak Serdar wrote:
>> This would not be necessary once (if) generics are implemented.
>
> You could also claim (I do) that if something like this came along, there would be less justification for "generics". Frankly, generics require a much shallower boundary between intrinsic and user-defined objects or, perhaps more usefully, but much more difficult to do right, a much richer "type" mechanism with open-ended attributes. Go with generics then becomes either a beautiful academic artifact or a Frankenstein monster of a language. Guess which is more likely to happen first.

I disagree. I think the latest generics proposal has a chance to be useful without becoming a monster. The idea that in order to implement generics you have to define the semantics of the generic types precisely is what created C++/Java generics. Defining generics in terms of existing types has a better chance of being used correctly because it demands less from the author and from the reader.

> Incidentally, even knowing that the Go Team's efforts put the integrity of the language very high on the list of objectives, it is still quite revealing that there is no "Go with Generics" in the wild, whether to be disparaged or to be revered.

I think the reason for this is the experience with C++/Java generics, and despite all the efforts, many counter-proposals ended up offering similar solutions.


robpike · 2019-09-16T12:27:57Z

I have trouble with your opening sentence: "Working with and manipulating non-English data requires us to use rune slices." That is presented as a fact but is an opinion, one I just don't think is true.

I speak only English but I have spent a lot of time working with text that is not ASCII and, although it can be attractive to work with rune slices, they are not really a good solution. In fact, I think they are a trap: they don't answer most of the questions that persist with multilingual text because, despite what many want to believe, a rune is not a character. (See blog.golang.org/strings for an explanation of this.)

I would therefore prefer not to add such a package as it would promote bad practice.

srinathh · 2019-09-16T18:35:43Z

@robpike I hear you, but now I'm really puzzled. My takeaway from your blog post (which I have revisited many times over the years, including just before making this proposal today) is that runes are a better way than bytes to deal with non-English characters, smileys, and whatnot. Ranging over a string gives runes.

Now, I do recall from reading the article linked in your blog post that some Unicode code points are modifiers, and that some characters can be made from multiple combinations of Unicode code points, which can mess things up. But what's a better way to deal with mutable collections of Unicode code points than the slice of runes that Go makes available?

robpike · 2019-09-17T01:26:19Z

Runes are code points, from which characters are made. Bytes are also things from which characters are made. Why use both?

Sometimes we need the code points themselves, but providing a package that handles slices of them will encourage the poor practice of converting back and forth between rune slices and bytes slices/strings rather than the more efficient method of just iterating the bytes appropriately.
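As a rough sketch of what iterating the bytes appropriately can look like, the following walks a UTF-8 byte slice one code point at a time with `utf8.DecodeRune`, without building a `[]rune`. The function name `runeOffsets` is made up for this illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeOffsets returns the byte offset at which each code point in b starts.
// It decodes in place and never allocates a []rune.
// (Hypothetical helper for illustration only.)
func runeOffsets(b []byte) []int {
	var offsets []int
	for i := 0; i < len(b); {
		// DecodeRune returns the code point and its width in bytes.
		// Invalid input yields utf8.RuneError with width 1, so the
		// loop always advances.
		_, size := utf8.DecodeRune(b[i:])
		offsets = append(offsets, i)
		i += size
	}
	return offsets
}

func main() {
	b := []byte("a\u00e9\u4e16b") // 1-, 2-, 3- and 1-byte encodings
	fmt.Println(runeOffsets(b))
}
```

Note that `utf8.DecodeRune` reports malformed input through the `utf8.RuneError` value rather than through an `error` return.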

srinathh · 2019-09-17T02:33:07Z

May I share an example use case? Suppose we're building a simple text editor. When people enter text, they enter Unicode code points to make characters. If we use rune slices, we can simply insert the required rune at the right position.

If we are using byte slices, for each insertion or deletion, we would have to iterate the slice through a function to parse Unicode, find the right position to insert or delete & make the change. Since this iteration can throw an error, we'd have to check for error. If we are using strings, we'd have to reallocate for every single insertion or deletion & then again run iterations.

Essentially if we want to work with mutable sets of unicode characters, then neither the bytes solution nor the strings solution seems efficient
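To make the comparison concrete, here is a minimal sketch of inserting one code point at a given code-point position in both representations. The names `insertInRunes` and `insertInBytes` are hypothetical, and both assume `pos` is within range.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// insertInRunes inserts r at rune index pos, shifting the tail right.
// It may reuse and modify the backing array of s.
// (Hypothetical helper; assumes 0 <= pos <= len(s).)
func insertInRunes(s []rune, pos int, r rune) []rune {
	s = append(s, 0)
	copy(s[pos+1:], s[pos:])
	s[pos] = r
	return s
}

// insertInBytes inserts r before the pos-th code point of the UTF-8
// text in b. It first scans to find the byte offset of that code point.
// (Hypothetical helper; returns a newly allocated slice.)
func insertInBytes(b []byte, pos int, r rune) []byte {
	off := 0
	for n := 0; n < pos && off < len(b); n++ {
		_, size := utf8.DecodeRune(b[off:])
		off += size
	}
	var enc [utf8.UTFMax]byte
	w := utf8.EncodeRune(enc[:], r)
	out := make([]byte, 0, len(b)+w)
	out = append(out, b[:off]...)
	out = append(out, enc[:w]...)
	return append(out, b[off:]...)
}

func main() {
	fmt.Println(string(insertInRunes([]rune("h\u00e9llo"), 2, 'X')))
	fmt.Println(string(insertInBytes([]byte("h\u00e9llo"), 2, 'X')))
}
```

Both versions shift the tail of the buffer, so each insertion is linear in the text length; the byte version adds a scan to locate the offset, while the rune version uses four bytes per code point.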

ghost · 2019-11-03T19:19:22Z

off topic, but I thought to mention Perl6 here

https://www.evanmiller.org/a-review-of-perl-6.html

cf: Strings and Regexes

caveat, see footnote 2

a contributor to Perl6

https://perlgeek.de/

also wrote this module

https://metacpan.org/pod/Perl6::Str

ghost · 2019-11-03T19:33:26Z

the idea of using rope data structures in an editor intrigued me at one point

but I've never taken the time to look into it

robpike · 2019-11-03T20:34:41Z

> Essentially if we want to work with mutable sets of unicode characters, then neither the bytes solution nor the strings solution seems efficient

And the runes solution is misleading and leads to incorrect thinking. Text is hard, and rune slices solve almost none of what makes text hard.

ghost · 2019-11-04T16:32:51Z

on a side note

A Philosophy of Software Design

by J. Ousterhout

The book includes commentary on a student project of writing a text editor.

rsc · 2019-11-06T19:00:35Z

Using runes in a text editor seems like a good idea at first, but it fails badly once you get to Unicode compose sequences, like e + composing acute vs é. The former is two runes while the latter is one. And for some sequences there's not even a single-rune sequence. In general Unicode text processing requires considering largish sequences of input, not just a single byte and not just a single rune either. There's little benefit to []rune as the representation, and there are real drawbacks to having two representations. So Go has standardized on []byte/string and UTF-8.
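The e + combining acute case can be checked with a short program. This is only an illustrative sketch; `runesEqual` is a hypothetical stand-in for the kind of comparison a runes package might offer.

```go
package main

import "fmt"

// runesEqual reports whether a and b hold the same code points in the
// same order. (Hypothetical helper, not a standard library function.)
func runesEqual(a, b []rune) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	decomposed := []rune("e\u0301") // 'e' followed by U+0301 COMBINING ACUTE ACCENT
	precomposed := []rune("\u00e9") // U+00E9 LATIN SMALL LETTER E WITH ACUTE

	fmt.Println(len(decomposed), len(precomposed))    // 2 1
	fmt.Println(runesEqual(decomposed, precomposed)) // false
}
```

Both values render as the same character, yet they differ in length and compare unequal rune by rune. Treating them as equal requires Unicode normalization, which in Go lives outside the standard library in golang.org/x/text/unicode/norm.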

If you find that []rune works really well for your editor somehow (maybe you ignore all the multirune characters), that's fine. A "runes" library forked from "bytes" could easily be maintained as a go get-able package outside the standard library.

Note that generics are not going to help here, because the encoding stored in the underlying data is different between []byte and []rune.

This is a likely decline. Leaving open for a week for final comments.

ghost · 2019-11-07T16:02:52Z

Hopefully my comment won't be interpreted as cultural bias.

I'm opposed to this for linguistic reasons.

Rune is used in Plan 9, and also appears in Golang.

The suggested use diverges excessively from the original North Germanic languages' use of the word.

D. Mendeleev used एक (eka) and द्वि (dvi) for certain postulated elements.

экаалюминій, экаборъ, экасилицій
двимарганец

rsc · 2019-11-13T18:18:01Z

There have been no comments objecting to declining this issue. Declined.

gopherbot added this to the Proposal milestone Sep 16, 2019

gopherbot added the Proposal label Sep 16, 2019

rsc added the Proposal-FinalCommentPeriod label Nov 6, 2019

rsc changed the title ~~proposal: create a package runes with functionality similar to bytes to work with rune slices~~ proposal: runes: create new package analogous to bytes, for rune slices Nov 6, 2019

andybons mentioned this issue Nov 6, 2019

proposal: review meeting minutes #33502

Open

rsc closed this as completed Nov 13, 2019

golang locked and limited conversation to collaborators Nov 12, 2020

gopherbot added the FrozenDueToAge label Nov 12, 2020
