strings: FieldFunc behaves differently from Split #72841

Closed
xformerfhs opened this issue Mar 13, 2025 · 5 comments
Labels
Documentation Issues describing a change to documentation. NeedsFix The path to resolution is known, but the work has not been done.

Comments

@xformerfhs

Go version

go version go1.24.1 windows/amd64

Output of go env in your module/workspace:

set AR=ar
set CC=gcc
set CGO_CFLAGS=-O2 -g
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-O2 -g
set CGO_ENABLED=0
set CGO_FFLAGS=-O2 -g
set CGO_LDFLAGS=-O2 -g
set CXX=g++
set GCCGO=gccgo
set GO111MODULE=
set GOAMD64=v3
set GOARCH=amd64
set GOAUTH=netrc
set GOBIN=
set GOCACHE=C:\Users\User\AppData\Local\go-build
set GOCACHEPROG=
set GODEBUG=
set GOENV=C:\Users\User\AppData\Roaming\go\env
set GOEXE=.exe
set GOEXPERIMENT=
set GOFIPS140=off
set GOFLAGS=
set GOGCCFLAGS=-m64 -fno-caret-diagnostics -Qunused-arguments -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=T:\UserTemp\User\go-build198970446=/tmp/go-build -gno-record-gcc-switches
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOINSECURE=
set GOMOD=D:\Users\User\go\src\transposer\go.mod
set GOMODCACHE=D:\Users\User\go\pkg\mod
set GONOPROXY=
set GONOSUMDB=
set GOOS=windows
set GOPATH=D:\Users\User\go
set GOPRIVATE=
set GOPROXY=https://proxy.golang.org,direct
set GOROOT=C:\Program Files\Go
set GOSUMDB=sum.golang.org
set GOTELEMETRY=on
set GOTELEMETRYDIR=C:\Users\User\AppData\Roaming\go\telemetry
set GOTMPDIR=T:\UserTemp\User
set GOTOOLCHAIN=auto
set GOTOOLDIR=C:\Program Files\Go\pkg\tool\windows_amd64
set GOVCS=
set GOVERSION=go1.24.1
set GOWORK=
set PKG_CONFIG=pkg-config

What did you do?

I wanted to split a string on several different characters instead of just one, so I replaced the call strings.Split(encodings, ":") with strings.FieldsFunc(encodings, separatorFunc), where separatorFunc is:

// separatorFunc reports whether r is one of the separator characters.
func separatorFunc(r rune) bool {
	return r == ':' || r == ','
}

What did you see happen?

With encodings := ":utf16bom" the calls had two different results:

Split: Slice with two elements [ "" "utf16bom" ]

FieldsFunc: Slice with one element [ "utf16bom" ]

What did you expect to see?

With encodings := ":utf16bom" the calls should have delivered identical results:

Split: Slice with two elements [ "" "utf16bom" ]

FieldsFunc: Slice with two elements [ "" "utf16bom" ]

@seankhliao seankhliao changed the title strings.FieldFunc behaves differently from strings.Split strings: FieldFunc behaves differently from Split Mar 13, 2025
@seankhliao
Member

If they were the same we wouldn't need 2 different functions.
I don't think the behaviour can be changed; maybe Fields can also document that it collapses leading and trailing split characters.

@seankhliao seankhliao added the Documentation Issues describing a change to documentation. label Mar 13, 2025
@xformerfhs
Author

xformerfhs commented Mar 13, 2025

Well, no, the two functions would still be different. They would definitely not collapse into one.

The point is that Split only splits on one specific separator string, while FieldsFunc allows multiple characters.

  1. I, and quite a few other people out there, need a function that splits a string on any of several different characters. In my example Split splits only on :, but I need a function that splits on either : or ,. In general there should be a list of split characters or strings. There is no such function in the strings package, and people recommend using FieldsFunc for this, since there is no SplitAny function analogous to the IndexAny function.

  2. Split splits the string ":utf16bom" into two strings: an empty one and one with the value "utf16bom". This is what one would expect from a split function: if there is a separator and there are no characters before it, the first element is an empty string. This is the correct way to do this. From my point of view it is a bug that FieldsFunc ignores a leading separator character. What is the rationale behind this? It is totally unexpected, and unexpected behaviour is never a good thing in programming. FieldsFunc should return an empty string as the first element if the scanned string begins with a rune for which the separator function returns true.

@adonovan
Member

I agree with @seankhliao that the documentation of Fields and FieldsFunc could be more explicit about leading and trailing spaces, and that neither function can be changed at this point.

It sounds like you wish there was a third function in the library. Feel free to propose one.

@xformerfhs
Author

I agree with both of you that the best solution is to add a few words about this unexpected behaviour (i.e. bug) to the documentation.

As for the missing functions: I already wrote two functions, SplitAny and CountAny, along the lines of IndexAny. They could easily be adapted into more general functions, but they are sufficient for me for now.

@seankhliao seankhliao added the NeedsFix The path to resolution is known, but the work has not been done. label Mar 14, 2025
@gopherbot
Contributor

Change https://go.dev/cl/658218 mentions this issue: bytes,strings: document Fields trimming of leading and trailing characters
