Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

strings, bytes: Document behavior of Replace when old is empty #8143

Closed
cespare opened this issue Jun 2, 2014 · 12 comments
Closed

strings, bytes: Document behavior of Replace when old is empty #8143

cespare opened this issue Jun 2, 2014 · 12 comments

Comments

@cespare
Copy link
Contributor

cespare commented Jun 2, 2014

strings.Replace and bytes.Replace insert 'new' in between each rune when 'old' is empty:

strings.Replace("foo 世界", "", ".", -1) // .f.o.o.
.世.界.

Especially for package bytes, this behavior is a little surprising (one might expect new
to be inserted between each byte). It can't be changed, but it should at least be
documented.
@ianlancetaylor
Copy link
Contributor

Comment 1:

Labels changed: added repo-main, release-go1.4, documentation.

@adg
Copy link
Contributor

adg commented Aug 20, 2014

Comment 2:

Why does bytes do this?
The code path has been there since the Replace function was added four years ago, but
there's no discussion of this in the code review[1]. Was it a mistake in copying the
strings.Replace code to bytes.Replace?
1. https://golang.org/cl/1731048

@adg
Copy link
Contributor

adg commented Aug 20, 2014

Comment 3:

cc rsc, who wrote the code in the first place.

Status changed to Accepted.

@cznic
Copy link
Contributor

cznic commented Aug 20, 2014

Comment 4:

I think it can be changed.
""""
Bugs. If a compiler or library has a bug that violates the specification, a program that
depends on the buggy behavior may break if the bug is fixed. We reserve the right to fix
such bugs.
""""[0]
  [0]: https://golang.org/doc/go1compat#expectations

@adg
Copy link
Contributor

adg commented Aug 20, 2014

Comment 5:

I agree that it can be changed.
I doubt anyone is depending on this behaviour.
Or if they are, they might not even realise it's doing the wrong thing sometimes.

@rsc
Copy link
Contributor

rsc commented Aug 24, 2014

Comment 6:

Feel free to change the docs. The code is correct as written. These functions are
operating on UTF-8 and should generate correctly formed UTF-8.
strings and bytes are meant to do the same thing, just on different input types (string
and []byte). In particular, if one treats input as UTF-8, the other does too.

@cespare
Copy link
Contributor Author

cespare commented Aug 24, 2014

Comment 7:

I don't understand #6. You say that "these functions are operating on UTF-8" -- but no,
UTF-8 only enters into it in the special case of old being empty. Consider
utf8.Valid(bytes.Replace([]byte("世界"), []byte{184}, nil, -1)) // false
So the special case behavior for empty 'old' is quite surprising. This bug arose from a
case where the UTF-8 behavior of bytes.Replace was undesired and ultimately a custom
replace function was needed.
bytes.Split becomes UTF-8 aware only when provided with an empty separator, but that is
well-documented.

@rsc
Copy link
Contributor

rsc commented Aug 24, 2014

Comment 8:

In general these routines guarantee that if given valid UTF-8 arguments, they produce
valid UTF-8 output. 
UTF-8 is self-synchronizing. A valid UTF-8 sequence cannot appear in the middle of some
other unrelated sequence, EXCEPT for the empty string. So to make that guarantee, the
empty string is the only case that needs special handling. For example, if you are
trying to do search and replace on UTF-8 text correctly, then the empty string has to be
handled carefully: you only want to match the empty string between adjacent runes, not
the one in the middle of multibyte runes.
You can disagree with the decision to make these routines UTF-8-friendly, but they are.
Count, Replace, Split all take care to handle the empty string correctly.
I am sorry it surprised you, but it's working as intended. Like I said, it's fine to
change the docs.

@cespare
Copy link
Contributor Author

cespare commented Aug 25, 2014

Comment 9:

Thanks for the explanation.

@cespare
Copy link
Contributor Author

cespare commented Aug 25, 2014

Comment 10:

See https://golang.org/cl/135760043/.

@gopherbot
Copy link

Comment 11:

CL https://golang.org/cl/135760043 mentions this issue.

@robpike
Copy link
Contributor

robpike commented Aug 25, 2014

Comment 12:

This issue was closed by revision 2c121b6.

Status changed to Fixed.

@rsc rsc added this to the Go1.4 milestone Apr 14, 2015
@rsc rsc removed the release-go1.4 label Apr 14, 2015
@golang golang locked and limited conversation to collaborators Jun 25, 2016
wheatman pushed a commit to wheatman/go-akaros that referenced this issue Jun 25, 2018
wheatman pushed a commit to wheatman/go-akaros that referenced this issue Jul 9, 2018
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

7 participants