bytes: docs do not always state that UTF-8 encoding is used #21950

Closed
ghost opened this issue Sep 20, 2017 · 4 comments

Comments

@ghost

ghost commented Sep 20, 2017

Not all functions in the bytes package that operate on strings and runes specify that UTF-8 encoding is used. At first glance, the following functions appear to be missing that information from their documentation:

  • ContainsRune
  • Count
  • Fields
  • Runes
  • Title
  • ToLower
  • ToLowerSpecial
  • ToTitle
  • ToTitleSpecial
  • ToUpper
  • ToUpperSpecial
  • TrimSpace

Does it make sense to include that detail in each of the above functions?

@ianlancetaylor
Contributor

I do see that many of the function doc comments do explicitly say "UTF-8-encoded". It doesn't really seem necessary to me, but I guess it might be reasonable to say that a few more places.

@robpike
Contributor

robpike commented Sep 20, 2017

The package documentation for the strings package says, "Package strings implements simple functions to manipulate UTF-8 encoded strings." Adding that information to every function in the package is unnecessary.

On the other hand, the bytes package makes no such sweeping claim, so for the functions where that is an issue, such as ContainsRune, it would indeed be reasonable to add the information although I believe context does make it unnecessary. Go is a UTF-8 language.

@as
Contributor

as commented Sep 21, 2017

@robpike
I admittedly don't know how many users realize that decoding UTF-8 replaces invalid byte sequences in the stream with the replacement character (U+FFFD), but I know I didn't when I first started learning the language. The result is more surprising than usual in the bytes package.

In the example below, bytes.ToLower returns a byte slice containing replacement characters when the input contains invalid UTF-8. The docs don't mention that functions like bytes.ToLower may do this, but it would be nice if they did.

https://play.golang.org/p/Qkv_hPXUwL

@gopherbot

Change https://golang.org/cl/66750 mentions this issue: bytes: explicitly state if a function expects UTF-8-encoded data

@golang golang locked and limited conversation to collaborators Oct 2, 2018

4 participants