bytes: docs do not always state that UTF-8 encoding is used #21950

ghost · 2017-09-20T17:00:48Z

Not all functions in the bytes package that operate on strings and runes specify that UTF-8 encoding is used. From an initial glance, it appears the following functions are missing that information from their documentation:

ContainsRune
Count
Fields
Runes
Title
ToLower
ToLowerSpecial
ToTitle
ToTitleSpecial
ToUpper
ToUpperSpecial
TrimSpace

Does it make sense to include that detail in each of the above functions?
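To illustrate why the encoding matters for functions in this list, here is a minimal sketch using Count and Runes. The helper name runeStats is made up for this example; the point is only that both functions read the slice as UTF-8 code points, not as raw bytes.

```go
package main

import (
	"bytes"
	"fmt"
)

// runeStats (hypothetical helper) reports the byte length of b, the number of
// runes bytes.Runes decodes from it, and bytes.Count with an empty separator.
func runeStats(b []byte) (byteLen, runeLen, countEmpty int) {
	return len(b), len(bytes.Runes(b)), bytes.Count(b, nil)
}

func main() {
	b := []byte("日本語") // 3 code points, 9 bytes when encoded as UTF-8

	byteLen, runeLen, countEmpty := runeStats(b)

	// With an empty separator, Count returns 1 + the number of
	// UTF-8-encoded code points, so it depends on the encoding too.
	fmt.Println(byteLen, runeLen, countEmpty) // 9 3 4
}
```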

ianlancetaylor · 2017-09-20T17:24:01Z

I do see that many of the function doc comments do explicitly say "UTF-8-encoded". It doesn't really seem necessary to me, but I guess it might be reasonable to say that in a few more places.

robpike · 2017-09-20T21:20:47Z

The package documentation for the strings package says, "Package strings implements simple functions to manipulate UTF-8 encoded strings." Adding that information to every function in the package is unnecessary.

On the other hand, the bytes package makes no such sweeping claim, so for the functions where that is an issue, such as ContainsRune, it would indeed be reasonable to add the information, although I believe context does make it unnecessary. Go is a UTF-8 language.
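A small sketch of the ContainsRune case, assuming the same character arrives in two different encodings. The helper hasEAcute and the sample data are invented for illustration; they are not taken from the package docs.

```go
package main

import (
	"bytes"
	"fmt"
)

// hasEAcute (hypothetical helper) reports whether b contains U+00E9
// when b is interpreted as UTF-8, which is what bytes.ContainsRune does.
func hasEAcute(b []byte) bool {
	return bytes.ContainsRune(b, 'é')
}

func main() {
	utf8Bytes := []byte("caf\u00e9")           // é encoded as 0xC3 0xA9
	latin1Bytes := []byte{'c', 'a', 'f', 0xE9} // é as the single Latin-1 byte 0xE9

	fmt.Println(hasEAcute(utf8Bytes))   // true
	fmt.Println(hasEAcute(latin1Bytes)) // false: 0xE9 alone is not valid UTF-8
}
```

A caller holding Latin-1 data would get a silent false here, which is the kind of surprise an explicit "UTF-8-encoded" note in the doc comment helps avoid.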

as · 2017-09-21T04:42:02Z

@robpike
I admittedly don't know how many users realize that invalid UTF-8 in the stream gets replaced with the replacement character, but I know I didn't when I first started learning the language. The result is more unexpected than usual in the bytes package.

In the example below, bytes.ToLower returns a byte slice with replacement characters when the input contains an invalid rune. The doc doesn't mention that functions like bytes.ToLower may do this, but it would be nice if it did.

https://play.golang.org/p/Qkv_hPXUwL

gopherbot · 2017-09-28T00:45:02Z

Change https://golang.org/cl/66750 mentions this issue: bytes: explicitly state if a function expects UTF-8-encoded data

gopherbot added the Documentation label Sep 20, 2017

gopherbot closed this as completed in f2af0c1 Oct 2, 2017

golang locked and limited conversation to collaborators Oct 2, 2018

gopherbot added the FrozenDueToAge label Oct 2, 2018
