proposal: strings: SplitAny and CountAny #72847
Comments
You mention |
Can you add a comment with an example or two showing the exact behavior difference? Thanks. |
Yes, you are right. I meant
Here are some examples:
source := ":something,to:split-"
parts := strings.Split(source, ":")
// parts is [ "" "something,to" "split-" ]
separators := ":,-.;"
parts = strings.SplitAny(source, separators)
// parts is [ "" "something" "to" "split" "" ]
count := strings.CountAny(source, separators)
// count is 4 (for the 4 found characters ':', ',', ':' and '-')
separators = "o,t.;"
parts = strings.SplitAny(source, separators)
// parts is [ ":s" "me" "hing" "" "" ":spli" "-" ]
parts = strings.SplitAnyN(source, separators, 2)
// parts is [ ":s" "mething,to:split-" ]
I hope this helps to clarify the proposal. I will be glad to provide any more information that is deemed necessary. |
@xformerfhs I'm wary of adding more |
Hi, @jub0bs, thanks for your comment. I see that you have reported a security vulnerability that was caused by using I agree that using However, a What are the possible alternatives?
Consider my use case: the user specifies two encodings, one for the input file and one for the output file, as a flag like e.g.
...
if len(encodingsFlagValue) < minEncodingLen || len(encodingsFlagValue) > maxEncodingLen {
	return errors.New("invalid length of encodings")
}
// Ask for at most 3 parts so that a third encoding can be detected and rejected.
encodings := strings.SplitAnyN(encodingsFlagValue, ":,", 3)
if len(encodings) > 2 {
	return errors.New("invalid number of encodings")
}
var inputEncoding string
var outputEncoding string
inputEncoding = encodings[0]
if len(encodings) == 1 {
	outputEncoding = inputEncoding
} else {
	outputEncoding = encodings[1]
}
...
This is simple and straightforward. Now the same with an iterator:
...
var inputEncoding string
var outputEncoding string
var haveInputEncoding bool
for encoding := range strings.SplitAnySeq(encodingsFlagValue, ":,") {
	if !haveInputEncoding {
		inputEncoding = encoding
		haveInputEncoding = true
	} else {
		outputEncoding = encoding
		break
	}
}
if len(outputEncoding) == 0 {
	outputEncoding = inputEncoding
}
...
This is much less readable and understandable. So, I think Even |
The alternative is to normalize the separators into one separator and then call the split function.
|
While this yields the correct result, it has three disadvantages:
Using |
I have only had to write a Your three points are sound, but I am curious how often this function would be used and how many times people have had to create it. |
Proposal Details
The strings package contains the function Split, which splits a string wherever the separator string occurs. Only one separator string can be specified.
There are use cases where one wants to split on any of a collection of characters. Often FieldsFunc is recommended for this. However, FieldsFunc has a bug in that it skips leading and trailing separators. This behaviour cannot be fixed, just documented.
In order to make it possible to split a string on any of several characters, there should be functions analogous to IndexAny, namely SplitAny and CountAny.
SplitAny would have the signature func SplitAny(s, chars string) []string, while CountAny would be func CountAny(s, chars string) int. SplitAny splits the string on any character in chars, while CountAny returns how many times any of the characters in chars appears in the supplied string.
There could be another function SplitAnyN with the signature func SplitAnyN(s, chars string, n int) []string that limits the splitting to a maximum of n strings.
I attach file split_any.zip where SplitAny and CountAny have been implemented as an example.