Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: strings: add Lines #51039

Closed
icholy opened this issue Feb 6, 2022 · 21 comments
Closed

proposal: strings: add Lines #51039

icholy opened this issue Feb 6, 2022 · 21 comments

Comments

@icholy
Copy link

icholy commented Feb 6, 2022

If I already have a string in memory, there's no simple and efficient way of splitting it into lines.

The naïve approach won't handle \r\n.

strings.Split(s, "\n")

Using a bufio.Scanner works but it creates copies instead of slicing the existing string.

var lines []string
sc := bufio.NewScanner(strings.NewReader(s))
for sc.Scan() {
    lines = append(lines, sc.Text())
}

I'd like a new strings.Lines function similar to strings.Fields for this use-case.

@icholy icholy added the Proposal label Feb 6, 2022
@gopherbot gopherbot added this to the Proposal milestone Feb 6, 2022
@icholy
Copy link
Author

icholy commented Feb 6, 2022

Here are some other approaches I've seen while searching:

strings.Split(strings.ReplaceAll(s, "\r\n", "\n"), "\n")
regexp.MustCompile("\r?\n").Split(s, -1)

@ALTree
Copy link
Member

ALTree commented Feb 6, 2022

Assuming a "line" is anything delimited by one of \n, \r, and \r\n, calling strings.FieldsFunc with func(c rune) bool { return c == '\n' || c == '\r' } seems to work.

@icholy
Copy link
Author

icholy commented Feb 6, 2022

@ALTree thanks, I'll be using that. However, it doesn't give the correct output for strings with blank lines like "\n\n\n"

@ALTree
Copy link
Member

ALTree commented Feb 6, 2022

Oh, right. It's actually just Fields. You want something that returns 4 strings for "text\n\n\ntext`, two of which are empty, which is different from what Fields does.

@ALTree
Copy link
Member

ALTree commented Feb 6, 2022

What about adding strings.SplitFunc instead? That would mirror strings.FieldsFunc and let you split on a custom function.

@icholy
Copy link
Author

icholy commented Feb 6, 2022

What would the signature of the callback look like? Something like func(rune) bool wouldn't be able to handle the \r\n case.

@ALTree
Copy link
Member

ALTree commented Feb 6, 2022

EDIT: right, that's probably why it doesn't exist. The API is less obvious. You'd need to check substrings instead of the single runes.

@icholy
Copy link
Author

icholy commented Feb 6, 2022

If something like SplitMulti was added, then you'd also need all the different variations. That gets messy.

func SplitMulti(s, seps []string) []string
func SplitMultiAfter(s, seps []string) []string
func SplitMultiAfterN(s, seps []string, n int) []string
func SplitMultiN(s, seps []string, n int) []string

@icholy icholy changed the title proposal: strings: add Lines func strings: add Lines Feb 15, 2022
@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) Feb 16, 2022
@ianlancetaylor ianlancetaylor changed the title strings: add Lines proposal: strings: add Lines Feb 16, 2022
@rsc
Copy link
Contributor

rsc commented Feb 16, 2022

Cut works great.

for text != "" {
   line, rest, ok := strings.Cut(text, "\n")
   text = rest
   line = strings.TrimSuffix(line, "\r") // if you care
   ... use line ...
}

@rsc
Copy link
Contributor

rsc commented Feb 16, 2022

For that matter, strings.SplitAfter(text, "\n") also works.
There's an extra empty string at the end of the slice if the text ends in a final \n (as most do).

@rsc
Copy link
Contributor

rsc commented Feb 16, 2022

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc moved this from Incoming to Active in Proposals (old) Feb 16, 2022
@icholy
Copy link
Author

icholy commented Feb 16, 2022

One potential problem with strings.Lines is that it might encourage people to write inefficient code.

data, _ := os.ReadFile("lines.txt)
for _, line := range strings.Lines(string(data)) {
  // use line
}

@jimmyfrasche
Copy link
Member

Having a reader and writers that transform line endings would be great. (Or, if https://pkg.go.dev/golang.org/x/text/transform were in std, they could just be transformers.)

@icholy
Copy link
Author

icholy commented Feb 16, 2022

@jimmyfrasche can be done with https://pkg.go.dev/github.com/icholy/replace

t := replace.String("\r\n", "\n")

@jimmyfrasche
Copy link
Member

Since you have that, how are you ending up with a string in memory that may have windows line endings?

@icholy
Copy link
Author

icholy commented Feb 16, 2022

@jimmyfrasche Say you're working with an AST which exposes comments as multiline strings.

@rsc
Copy link
Contributor

rsc commented Feb 23, 2022

It sounds like (1) the API is somewhat inefficient since it has to allocate the slice, (2) there are other ways to write the code that are more efficient and just as clean using existing standard library APIs, and (3) this could be in an external library easily instead if the exact operation of 'return a slice of all the lines' is required.

In that case, it seems like this is heading toward likely decline. Do I have that right?

@icholy
Copy link
Author

icholy commented Feb 23, 2022

(1) the API is somewhat inefficient since it has to allocate the slice,

Given the target use-cases, this would be negligible.

(2) there are other ways to write the code that are more efficient and just as clean using existing standard library APIs

I'm not sure about "just as clean". The proposed strings.Lines api is more ergonomic than other solutions.

(3) this could be in an external library easily instead if the exact operation of 'return a slice of all the lines' is required.

ofc

In that case, it seems like this is heading toward likely decline. Do I have that right?

Ya, I think so. The potential for abuse makes it not worth it IMO.

@rsc
Copy link
Contributor

rsc commented Mar 2, 2022

Given the target use-cases, this would be negligible.

The problem with putting things in the standard library is that it will get used for many uses cases beyond the target. Things in the standard library have to be particularly robust against unexpected uses.

@rsc
Copy link
Contributor

rsc commented Mar 2, 2022

Based on the discussion above, this proposal seems like a likely decline.
— rsc for the proposal review group

@rsc rsc moved this from Active to Likely Decline in Proposals (old) Mar 2, 2022
@rsc rsc moved this from Likely Decline to Declined in Proposals (old) Mar 9, 2022
@rsc
Copy link
Contributor

rsc commented Mar 9, 2022

No change in consensus, so declined.
— rsc for the proposal review group

@rsc rsc closed this as completed Mar 9, 2022
@golang golang locked and limited conversation to collaborators Mar 9, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
No open projects
Development

No branches or pull requests

5 participants