Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bufio: document SplitFunc corner cases #25472

Closed
yaxum62 opened this issue May 20, 2018 · 16 comments
Closed

bufio: document SplitFunc corner cases #25472

yaxum62 opened this issue May 20, 2018 · 16 comments
Labels
Documentation FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@yaxum62
Copy link

yaxum62 commented May 20, 2018

The current document of bufio.SplitFunc does not cover all cases and left few details buried into implementations. Such as:

  1. A Scanner behave differently when a SplitFunc returns token with nil or an empty byte slice. example
  2. A Scanner usually skip advanced bytes when returned token is nil. example

Here I propose few changes to doc, clearly state those behavior. Such as:

  1. If a SplitFunc returns a non-nil error, returned advance and token are ignored.
  2. If a SplitFunc returns a non-nil token, even if it is empty, Scanner will always yield, no matter what error a reader previously returns.
  3. If a SplitFunc returns a nil token with non-zero advance, Scanner will skip those bytes.
@gopherbot gopherbot added this to the Proposal milestone May 20, 2018
@bcmills
Copy link
Contributor

bcmills commented May 21, 2018

This issue seems two describe two separate issues: missing doc comments, and a requested change in behavior.

Please file those separately: documentation issues tend to be relatively easy (and uncontentious) to fix, whereas changes in behavior may require a more rigorous proposal, especially if they are not compatible with Go 1.

@yaxum62 yaxum62 changed the title proposal: bufio: better support for customized SplitFunc proposal: bufio: SplitFunc document left uncovered corner cases. May 22, 2018
@yaxum62
Copy link
Author

yaxum62 commented May 22, 2018

I have rephrase this issue to documentation, and a separate issue will be created after this is done.

@bcmills bcmills changed the title proposal: bufio: SplitFunc document left uncovered corner cases. bufio: document SplitFunc corner cases May 22, 2018
@bcmills bcmills removed the Proposal label May 22, 2018
@bcmills bcmills modified the milestones: Proposal, Go1.11 May 22, 2018
@bradfitz
Copy link
Contributor

@robpike, you want to clarify Scanner docs here?

@bradfitz bradfitz added the NeedsFix The path to resolution is known, but the work has not been done. label May 29, 2018
@robpike
Copy link
Contributor

robpike commented Jun 5, 2018

The docs seem to me to cover the situation well enough but of course one can always explain in more detail. Detailed response:

  1. If there's an error, scanning is stopping. It makes no difference what the scanner does before it returns.
  2. There is no error in this case, so I don't see why it matters what the token is. The scanner advances with that token. However, I do not understand your point about previous errors. An error on the read stops the scan.
  3. This one seems again very obvious to me. There is a token (being nil is irrelevant) and an advance value, so it advances.

The docs seem clear to me, unless you're arguing that a nil token is somehow special, which it is not. That could be documented but I don't believe it's necessary. Token values are irrelevant to the scanner, which just passes them on.

@yaxum62
Copy link
Author

yaxum62 commented Jun 9, 2018

The different behaviors between nil and empty slice is not that obvious since those two are generally interchangeable in golang. The difference should be clearly documented here if it is supported.

And also the feature "nil token skips bytes" are not always working as expected. In some case it will terminate the scan if the reader already hits a non-nil error. example. As @bcmills suggested the doc need to be fixed before we can fix the implementation.

@robpike
Copy link
Contributor

robpike commented Jun 10, 2018

I'm sorry but I still don't understand what you're after. If the reader hits an error, the scan always terminates. The token makes no difference. If you believe otherwise, please explain further.

I also disagree with the assertion that nil and empty slice are interchangeable. They're not in general.

@yaxum62
Copy link
Author

yaxum62 commented Jun 11, 2018 via email

@pam4
Copy link

pam4 commented Jun 12, 2018

The docs seem clear to me, unless you're arguing that a nil token is somehow special, which it is not.

It is special, or more precisely, there is no such thing as a nil token: returning (n, nil, nil) (with n > 0) from the SplitFunc doesn't produce any token (either the SplitFunc is called again with more data or Scan returns false).

Conversely returning (n, []byte{}, nil) does produce a zero-length token.

This distinction is undocumented; the docs only make a distinction about the (0, nil, nil) case.

If the reader returns error and split function returns nil token, scan function skip everything still in the buffer

I agree this is also undocumented / confusing.
In other words: returning (n, nil, nil) (with n > 0) from the SplitFunc while atEOF is true always terminates the scan immediately (even if n < len(data)).

All we know from the docs is that an error from the SplitFunc terminates the scan, but if the SplitFunc returns no error, the exact condition that terminates the scan is not documented at all.

@ianlancetaylor
Copy link
Contributor

The docs describe what happens when a SplitFunc returns a nil token. A []byte{} is not a nil token. As far as I can see the docs are accurate. I don't see any reason to explicitly say anything about a non-nil slice with length zero. It's non-nil, and everything else follows.

I think more clarification about returning n, nil, nil, where n > 0, might be reasonable. I think it's there in the docs, which just say that 0, nil, nil is a special case, but it takes spme thought to see that n, nil, nil is not a special case.

@gopherbot
Copy link

Change https://golang.org/cl/118695 mentions this issue: bufio: clarify SplitFunc docs for nil token

@pam4
Copy link

pam4 commented Jun 13, 2018

The docs describe what happens when a SplitFunc returns a nil token.

As you said yourself, it doesn't.
It just talks about the (0, nil, nil) case.
If (n, nil, nil) also have a special effect, the docs should just say so, otherwise it is incomplete. There is no basis to infer it.

As far as I can see the docs are accurate.

Accurate but incomplete.

I don't see any reason to explicitly say anything about a non-nil slice with length zero.

I agree. That would be the normal case. I was just pointing out that the (n, nil, nil) case, being different, must be documented.

This only cover the first point in my previous message. There is also the second point: the condition that terminates the scan is also not documented.

@ianlancetaylor
Copy link
Contributor

@pam4 Do you agree with the change I sent in https://golang.org/cl/118695 ?

I feel like the condition that terminates the scan is documented at (*Scanner).Scan and ErrFinalToken.

@pam4
Copy link

pam4 commented Jun 13, 2018

Do you agree with the change I sent in https://golang.org/cl/118695 ?

Definitely better.

I feel like the condition that terminates the scan is documented at (*Scanner).Scan and ErrFinalToken.

Have you read my first message? Because I don't know how else to say it...
Of course any error from the SplitFunc terminates the scan, and that is a documented condition, but it is unclear what happens if the SplitFunc returns no error.

The only relevant part of the docs is:

It returns false when the scan stops, either by reaching the end of the input or an error.

What does it mean to reach the end of the input?

For example, suppose Scan calls the SplitFunc with 100 bytes of data and atEOF set to true.
If such call returns (3, nil, nil), the scan terminates immediately.
If such call returns (3, []byte{'x'}, nil), the scan continues.
How can I infer this behavior from the docs?

And if such call returns (100, []byte{'x'}, nil), is the SplitFunc never called again, or it is called one last time with empty data?

@ianlancetaylor
Copy link
Contributor

Thanks, I added a comment about what happens if the SplitFunc returns a nil token when called with atEOF set to true.

@pam4
Copy link

pam4 commented Jun 13, 2018

Thanks.

If the token is not nil, the Scanner returns it to the user.

I think this sentence is very effective to clarify my first point.

If the token is nil, the Scanner reads more data and continues scanning;
if there is no more data--if atEOF was true--the Scanner returns.

It is correct, but how about something like:

If the token is nil, the Scanner reads more data and continues scanning,
unless atEOF was true, in which case the scan stops (Scan returns false).

dna2github pushed a commit to dna2fork/go that referenced this issue Jun 14, 2018
Fixes golang#25472

Change-Id: Idb72ed06a3dc43c49ab984a80f8885352b036465
Reviewed-on: https://go-review.googlesource.com/118695
Reviewed-by: Rob Pike <r@golang.org>
@pam4
Copy link

pam4 commented Jun 15, 2018

(*Scanner).Scan:

It returns false when the scan stops, either by reaching the end of the input or an error.

I just discovered that I had a misconception about it, which was probably caused by the above sentence.

There are exactly 3 ways for the scan to stop. None of them could be really described as "by reaching the end of the input":

  1. the SplitFunc returns error
  2. the SplitFunc returns nil token while atEOF is true
  3. returning more than maxConsecutiveEmptyReads tokens without advancing the input (panic)

I think it is easy to interpret the quoted sentence to mean that you can also stop the scan by just advancing the input to the end. Probably the OP was also confused by this, considering his examples.

EDIT: Another observation:

Scan panics if the split function returns too many empty tokens without advancing the input.

Actually, it doesn't matter if they are empty (as in zero-length) or not.
Scan panics if the split function returns too many non-nil tokens of any kind without advancing the input.

Even though such distinction wouldn't make a difference for most users, I don't see why the docs should be so approximative.

@golang golang locked and limited conversation to collaborators Jun 19, 2019
@rsc rsc unassigned robpike Jun 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Documentation FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

7 participants