bufio: document SplitFunc corner cases #25472

yaxum62 · 2018-05-20T22:27:42Z

The current documentation of bufio.SplitFunc does not cover all cases and leaves a few details buried in the implementation, such as:

A Scanner behaves differently when a SplitFunc returns a nil token versus an empty byte slice (example).
A Scanner usually skips the advanced bytes when the returned token is nil (example).

Here I propose a few changes to the docs that state this behavior clearly, such as:

If a SplitFunc returns a non-nil error, returned advance and token are ignored.
If a SplitFunc returns a non-nil token, even if it is empty, the Scanner will always yield it, no matter what error the reader previously returned.
If a SplitFunc returns a nil token with non-zero advance, Scanner will skip those bytes.
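
As a rough illustration of the first proposed statement, the sketch below uses a split function that returns an error together with a non-zero advance and a non-nil token. The names errStop, failingSplit and scanWithError are hypothetical and exist only for this example. As far as I can tell, Scan delivers no token and Err reports the split function's error.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strings"
)

// errStop is a hypothetical sentinel error used only for this illustration.
var errStop = errors.New("stop")

// failingSplit reports an error while also returning an advance and a token.
func failingSplit(data []byte, atEOF bool) (int, []byte, error) {
	return 1, []byte("ignored"), errStop
}

// scanWithError returns how many tokens Scan delivered and the final error.
func scanWithError() (int, error) {
	s := bufio.NewScanner(strings.NewReader("abc"))
	s.Split(failingSplit)
	tokens := 0
	for s.Scan() {
		tokens++
	}
	return tokens, s.Err()
}

func main() {
	tokens, err := scanWithError()
	fmt.Println(tokens, err) // 0 stop
}
```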


bcmills · 2018-05-21T16:57:23Z

This issue seems to describe two separate issues: missing doc comments, and a requested change in behavior.

Please file those separately: documentation issues tend to be relatively easy (and uncontentious) to fix, whereas changes in behavior may require a more rigorous proposal, especially if they are not compatible with Go 1.

yaxum62 · 2018-05-22T15:57:08Z

I have rephrased this issue to cover documentation only; a separate issue will be created after this is done.

bradfitz · 2018-05-29T19:13:16Z

@robpike, you want to clarify Scanner docs here?

robpike · 2018-06-05T05:25:34Z

The docs seem to me to cover the situation well enough but of course one can always explain in more detail. Detailed response:

If there's an error, scanning is stopping. It makes no difference what the scanner does before it returns.
There is no error in this case, so I don't see why it matters what the token is. The scanner advances with that token. However, I do not understand your point about previous errors. An error on the read stops the scan.
This one seems again very obvious to me. There is a token (being nil is irrelevant) and an advance value, so it advances.

The docs seem clear to me, unless you're arguing that a nil token is somehow special, which it is not. That could be documented but I don't believe it's necessary. Token values are irrelevant to the scanner, which just passes them on.

yaxum62 · 2018-06-09T21:40:55Z

The difference in behavior between a nil slice and an empty slice is not that obvious, since the two are generally interchangeable in Go. The difference should be clearly documented here if it is supported.

Also, the feature "nil token skips bytes" does not always work as expected. In some cases it will terminate the scan if the reader has already hit a non-nil error (example). As @bcmills suggested, the docs need to be fixed before we can fix the implementation.

robpike · 2018-06-10T12:53:20Z

I'm sorry but I still don't understand what you're after. If the reader hits an error, the scan always terminates. The token makes no difference. If you believe otherwise, please explain further.

I also disagree with the assertion that nil and empty slice are interchangeable. They're not in general.

yaxum62 · 2018-06-11T16:27:33Z

If the reader returns an error and the split function returns a nil token, the scan function skips everything still in the buffer, with a certain buffer size. If you check the example I posted

…


pam4 · 2018-06-12T12:33:21Z

The docs seem clear to me, unless you're arguing that a nil token is somehow special, which it is not.

It is special, or more precisely, there is no such thing as a nil token: returning (n, nil, nil) (with n > 0) from the SplitFunc doesn't produce any token (either the SplitFunc is called again with more data or Scan returns false).

Conversely returning (n, []byte{}, nil) does produce a zero-length token.

This distinction is undocumented; the docs only make a distinction about the (0, nil, nil) case.
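
A minimal sketch of the distinction, assuming a strings.Reader as input. The helper names emptyToken, nilToken and countTokens are made up for this example. Both split functions consume one byte per call; only the one returning a non-nil slice produces tokens.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// emptyToken consumes one byte per call and returns a non-nil, zero-length token.
func emptyToken(data []byte, atEOF bool) (int, []byte, error) {
	if len(data) == 0 {
		return 0, nil, nil
	}
	return 1, []byte{}, nil
}

// nilToken consumes one byte per call but returns no token at all.
func nilToken(data []byte, atEOF bool) (int, []byte, error) {
	if len(data) == 0 {
		return 0, nil, nil
	}
	return 1, nil, nil
}

// countTokens reports how many times Scan returned true for the input "abc".
func countTokens(split bufio.SplitFunc) int {
	s := bufio.NewScanner(strings.NewReader("abc"))
	s.Split(split)
	n := 0
	for s.Scan() {
		n++
	}
	return n
}

func main() {
	fmt.Println(countTokens(emptyToken)) // 3
	fmt.Println(countTokens(nilToken))   // 0
}
```

With nilToken the scan ends after the second split call, which is made with atEOF set to true, so the last byte is never handed to the split function.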

If the reader returns error and split function returns nil token, scan function skip everything still in the buffer

I agree this is also undocumented / confusing.
In other words: returning (n, nil, nil) (with n > 0) from the SplitFunc while atEOF is true always terminates the scan immediately (even if n < len(data)).

All we know from the docs is that an error from the SplitFunc terminates the scan, but if the SplitFunc returns no error, the exact condition that terminates the scan is not documented at all.

ianlancetaylor · 2018-06-13T19:05:57Z

The docs describe what happens when a SplitFunc returns a nil token. A []byte{} is not a nil token. As far as I can see the docs are accurate. I don't see any reason to explicitly say anything about a non-nil slice with length zero. It's non-nil, and everything else follows.

I think more clarification about returning n, nil, nil, where n > 0, might be reasonable. I think it's there in the docs, which just say that 0, nil, nil is a special case, but it takes some thought to see that n, nil, nil is not a special case.

gopherbot · 2018-06-13T19:45:24Z

Change https://golang.org/cl/118695 mentions this issue: bufio: clarify SplitFunc docs for nil token

pam4 · 2018-06-13T19:47:02Z

The docs describe what happens when a SplitFunc returns a nil token.

As you said yourself, it doesn't.
It just talks about the (0, nil, nil) case.
If (n, nil, nil) also has a special effect, the docs should just say so; otherwise they are incomplete. There is no basis to infer it.

As far as I can see the docs are accurate.

Accurate but incomplete.

I don't see any reason to explicitly say anything about a non-nil slice with length zero.

I agree. That would be the normal case. I was just pointing out that the (n, nil, nil) case, being different, must be documented.

This only covers the first point in my previous message. There is also the second point: the condition that terminates the scan is also not documented.

ianlancetaylor · 2018-06-13T19:57:35Z

@pam4 Do you agree with the change I sent in https://golang.org/cl/118695 ?

I feel like the condition that terminates the scan is documented at (*Scanner).Scan and ErrFinalToken.
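
For reference, here is a small sketch of the ErrFinalToken path mentioned here, under the assumption that the split function returns a non-nil token alongside it. The names finalSplit and scanFinal are hypothetical. The token is delivered, the scan then stops, and Err reports nil.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// finalSplit hands back everything it was given as one token and asks the
// Scanner to stop by returning bufio.ErrFinalToken.
func finalSplit(data []byte, atEOF bool) (int, []byte, error) {
	return len(data), data, bufio.ErrFinalToken
}

// scanFinal returns the delivered tokens and the value of Err afterwards.
func scanFinal(input string) ([]string, error) {
	s := bufio.NewScanner(strings.NewReader(input))
	s.Split(finalSplit)
	var tokens []string
	for s.Scan() {
		tokens = append(tokens, s.Text())
	}
	return tokens, s.Err()
}

func main() {
	tokens, err := scanFinal("a b c")
	fmt.Printf("%q %v\n", tokens, err) // ["a b c"] <nil>
}
```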

pam4 · 2018-06-13T20:28:37Z

Do you agree with the change I sent in https://golang.org/cl/118695 ?

Definitely better.

I feel like the condition that terminates the scan is documented at (*Scanner).Scan and ErrFinalToken.

Have you read my first message? Because I don't know how else to say it...
Of course any error from the SplitFunc terminates the scan, and that is a documented condition, but it is unclear what happens if the SplitFunc returns no error.

The only relevant part of the docs is:

It returns false when the scan stops, either by reaching the end of the input or an error.

What does it mean to reach the end of the input?

For example, suppose Scan calls the SplitFunc with 100 bytes of data and atEOF set to true.
If such call returns (3, nil, nil), the scan terminates immediately.
If such call returns (3, []byte{'x'}, nil), the scan continues.
How can I infer this behavior from the docs?
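
The scenario can be reproduced with a sketch like the following. It assumes a hypothetical eofReader that returns its 100 bytes together with io.EOF, so that the very first split call already has atEOF set to true; run is also a made-up helper. The split function advances by 3 bytes per call (fewer on the last chunk) and returns either a nil token or the token "x".

```go
package main

import (
	"bufio"
	"fmt"
	"io"
)

// eofReader is a hypothetical reader that returns its data together with
// io.EOF, so the first split call already sees atEOF == true.
type eofReader struct{ data []byte }

func (r *eofReader) Read(p []byte) (int, error) {
	n := copy(p, r.data)
	r.data = r.data[n:]
	if len(r.data) == 0 {
		return n, io.EOF
	}
	return n, nil
}

// run scans 100 bytes with a split function that advances up to 3 bytes per
// call. It reports the number of tokens delivered and of split calls made.
func run(nilToken bool) (tokens, calls int) {
	split := func(data []byte, atEOF bool) (int, []byte, error) {
		calls++
		if len(data) == 0 {
			return 0, nil, nil
		}
		adv := 3
		if adv > len(data) {
			adv = len(data)
		}
		if nilToken {
			return adv, nil, nil
		}
		return adv, []byte{'x'}, nil
	}
	s := bufio.NewScanner(&eofReader{data: make([]byte, 100)})
	s.Split(split)
	for s.Scan() {
		tokens++
	}
	return tokens, calls
}

func main() {
	fmt.Println(run(true))  // 0 1
	fmt.Println(run(false)) // 34 35
}
```

In the nil-token case the split function is called once and the remaining 97 bytes are dropped. In the other case the whole input is consumed, and one last call with empty data ends the scan.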

~~And if such call returns (100, []byte{'x'}, nil), is the SplitFunc never called again, or it is called one last time with empty data?~~

ianlancetaylor · 2018-06-13T20:54:45Z

Thanks, I added a comment about what happens if the SplitFunc returns a nil token when called with atEOF set to true.

pam4 · 2018-06-13T21:45:43Z

Thanks.

If the token is not nil, the Scanner returns it to the user.

I think this sentence is very effective to clarify my first point.

If the token is nil, the Scanner reads more data and continues scanning;
if there is no more data--if atEOF was true--the Scanner returns.

It is correct, but how about something like:

If the token is nil, the Scanner reads more data and continues scanning,
unless atEOF was true, in which case the scan stops (Scan returns false).

Fixes golang#25472 Change-Id: Idb72ed06a3dc43c49ab984a80f8885352b036465 Reviewed-on: https://go-review.googlesource.com/118695 Reviewed-by: Rob Pike <r@golang.org>

pam4 · 2018-06-15T18:02:46Z

(*Scanner).Scan:

It returns false when the scan stops, either by reaching the end of the input or an error.

I just discovered that I had a misconception about it, which was probably caused by the above sentence.

There are exactly 3 ways for the scan to stop. None of them could be really described as "by reaching the end of the input":

the SplitFunc returns error
the SplitFunc returns nil token while atEOF is true
returning more than maxConsecutiveEmptyReads tokens without advancing the input (panic)

I think it is easy to interpret the quoted sentence to mean that you can also stop the scan by just advancing the input to the end. Probably the OP was also confused by this, considering his examples.

EDIT: Another observation:

Scan panics if the split function returns too many empty tokens without advancing the input.

Actually, it doesn't matter if they are empty (as in zero-length) or not.
Scan panics if the split function returns too many non-nil tokens of any kind without advancing the input.

Even though such a distinction wouldn't make a difference for most users, I don't see why the docs should be so approximate.
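
A sketch of that panic, assuming an empty input so that the reader reports EOF right away; from my reading of the implementation, the counter only increments once the reader has returned an error. The names stuckSplit and scanUntilPanic are hypothetical. The token here is non-empty ("x"), yet Scan still panics after a number of tokens.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// stuckSplit never advances and always returns a non-empty token.
func stuckSplit(data []byte, atEOF bool) (int, []byte, error) {
	return 0, []byte("x"), nil
}

// scanUntilPanic scans an empty input with stuckSplit and reports how many
// tokens were delivered and whether Scan panicked.
func scanUntilPanic() (tokens int, panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	s := bufio.NewScanner(strings.NewReader(""))
	s.Split(stuckSplit)
	for s.Scan() {
		tokens++
	}
	return tokens, false
}

func main() {
	tokens, panicked := scanUntilPanic()
	fmt.Println(tokens > 0, panicked) // true true
}
```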

gopherbot added this to the Proposal milestone May 20, 2018

gopherbot added the Proposal label May 20, 2018

yaxum62 changed the title ~~proposal: bufio: better support for customized SplitFunc~~ proposal: bufio: SplitFunc document left uncovered corner cases. May 22, 2018

gopherbot added the Documentation label May 22, 2018

bcmills changed the title ~~proposal: bufio: SplitFunc document left uncovered corner cases.~~ bufio: document SplitFunc corner cases May 22, 2018

bcmills removed the Proposal label May 22, 2018

bcmills modified the milestones: Proposal, Go1.11 May 22, 2018

bradfitz assigned robpike May 29, 2018

bradfitz added the NeedsFix The path to resolution is known, but the work has not been done. label May 29, 2018

gopherbot closed this as completed in 1e721cf Jun 13, 2018

golang locked and limited conversation to collaborators Jun 19, 2019

gopherbot added the FrozenDueToAge label Jun 19, 2019

rsc unassigned robpike Jun 23, 2022
