Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

strings: NewReplacer does not replace strings in order #25071

Closed
nicksnyder opened this issue Apr 25, 2018 · 13 comments
Closed

strings: NewReplacer does not replace strings in order #25071

nicksnyder opened this issue Apr 25, 2018 · 13 comments
Labels
Documentation FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@nicksnyder
Copy link

nicksnyder commented Apr 25, 2018

What version of Go are you using (go version)?

go1.10.1

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

amd64
darwin

What did you do?

https://play.golang.org/p/oB5SCKv4LUx

r := strings.NewReplacer("aaa", "xxx", "bbbaaa", "yyy")
fmt.Println(r.Replace("bbbaaa")) // yyy

What did you expect to see?

I expected Replace to return bbbxxx.

The documentation says:

Replacements are performed in order, without overlapping matches.

I expect the aaa->xxx replacement to happen first so the second replacement bbbaaa->yyy shouldn't happen.

What did you see instead?

yyy

@ghost
Copy link

ghost commented Apr 25, 2018

This is working as intended: replacements are being performed in order on the string linearly, without attempting to replace any text that was replaced.

The docs should be updated to make this more clear.

@nicksnyder
Copy link
Author

Makes sense. It didn't occur to me to read it the other way.

@ALTree ALTree added Documentation NeedsFix The path to resolution is known, but the work has not been done. labels Apr 25, 2018
@ALTree ALTree added this to the Go1.11 milestone Apr 25, 2018
@gopherbot
Copy link

Change https://golang.org/cl/109375 mentions this issue: strings: clarify Replacer's replacement order

@pam4
Copy link

pam4 commented Apr 25, 2018

meaning that substrings are replaced in the order they appear in the target string, and not that the old->new replacements are applied in the order they're passed to NewReplacer.

If more than one pattern matches at the same target string position (a pattern is prefix of another), the one that is applied does depend on the order they're passed to NewReplacer.
But I don't know if it can be conveyed without making the doc too complex.

@nicksnyder
Copy link
Author

I agree it is hard to be exhaustive without unnecessarily complicating the doc. The proposed change lgtm

@pam4
Copy link

pam4 commented Apr 26, 2018

@nicksnyder I'm not sure what I would have done myself, I think it's a pity though.

If I want to replace "var1" -> "var2" and "var" -> "var1" but the order of these two replacements is undocumented, I have no choice but roll up my own implementation to be sure that the order will always be predictable.

Unless the caller knows in advance that no pattern is prefix of another, I think he would generally prefer a predictable behavior (first pattern first? longest pattern first?).

Of course it is also true that if we expose too much details, the docs become less effective for the most common use cases, less clear, and easier to misinterpret.

@ALTree
Copy link
Member

ALTree commented Apr 26, 2018

@pam4

Unless the caller knows in advance that no pattern is prefix of another, I think he would generally prefer a predictable behavior

I'm not sure what about the current behaviour you find "unpredictable". The patterns are replaced in the order they appear in the string, from left to right. If two (old->new) couples has the same "old", the first couple (in the order they're passed as arguments) is used. I believe this constitutes "predictable" behaviour.

Are you suggesting to change something in the implementation? Or to change the documentation? How would you change it? Are you asking to explicitly document the Replacer's behaviour in the "old->new1", "old->new2" case? I think that's a reasonable request.

@nicksnyder
Copy link
Author

Are you asking to explicitly document the Replacer's behaviour in the "old->new1", "old->new2" case? I think that's a reasonable request.

Yes, I wondered this myself and had to experiment to find the answer. I found it hard to come up with a pithy doc for it though.

@pam4
Copy link

pam4 commented Apr 27, 2018

@ALTree thanks for the reply.

Are you suggesting to change something in the implementation?

No.

Are you asking to explicitly document the Replacer's behaviour in the "old->new1", "old->new2" case?

More generally, to explicitly document the behavior of A->B, C->D when C is a prefix of A (or vice versa).

strings.NewReplacer("X1", "X2", "X", "X1").Replace("X1") // "X2"
strings.NewReplacer("X", "X1", "X1", "X2").Replace("X1") // "X11"

If I had some good wording to suggest, I would have done so.

I'm not sure what about the current behaviour you find "unpredictable".

Bad choice of word, it probably is predictable, just not explicitly documented. In your own code, would you rely on it, in cases similar to the example above?

@ALTree
Copy link
Member

ALTree commented Apr 27, 2018

More generally, to explicitly document the behavior of A->B, C->D when C is a prefix of A (or vice versa).

This is already documented, it's the "without overlapping matches" part. "without overlapping matches" means that once we are done with the A->B replacement, we leave that part of the string and move on towards the right.

In your own code, would you rely on it, in cases similar to the example above?

Yes, because it's the documented behaviour.

@pam4
Copy link

pam4 commented Apr 27, 2018

@ALTree

This is already documented, it's the "without overlapping matches" part. "without overlapping matches" means that once we are done with the A->B replacement, we leave that part of the string and move on towards the right.

Incidentally I made an example with possible overlapping matches, but I wasn't referring to them.
I'm talking about which one of the two pair is used when they both match at the same starting position. The "overlapping" rule doesn't answer that.

Here is another example:

// first pair is used
strings.NewReplacer("ant", "bug", "antique", "old").Replace("antique") // "bugique"
// pairs swapped, again first pair is used
strings.NewReplacer("antique", "old", "ant", "bug").Replace("antique") // "old"

As you said, the first pair is used.
But such rule must cover, not only the case of two pair with the same "old" part, but also the case of an "old" part that is prefix of another "old" part (regardless of the "new" parts).
Otherwise, any of the two examples above could return either "bugique" or "old".

@ALTree
Copy link
Member

ALTree commented Apr 27, 2018

Ah, I see. Thanks for explaining. So it's not just a matter of clarifying what happens with couples having the same exact "old". We should probably also explain that the first couple is given higher priority when there are couples with overlapping "old"s. I'm not even sure this behaviour is actually mandated by the API, or if it's just an artefact of the current implementation.

@pam4
Copy link

pam4 commented Apr 27, 2018

Yes, that's what I meant.

I'm not even sure this behaviour is actually mandated by the API, or if it's just an artefact of the current implementation.

Well, obviously it was never documented, but looking at the code I presume it was intentional: the generic algorithm uses a lookup trie for prioritized key/value pairs, and keys are added with decreasing priority, here (len(oldnew)-i is the priority).
(Other algorithms are only used in case of single pair or single byte patterns).

And I can think of situations where you need to rely on this priority, that's why I said it's a pity.

@golang golang locked and limited conversation to collaborators Apr 27, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Documentation FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

4 participants