bufio: Second Scan() call populates Scanner token field on tokens that exceed 64kb #9568

Closed
SamThompson opened this issue Jan 12, 2015 · 2 comments


@SamThompson

I have confirmed this for
go version devel +fcff3ba Mon Jan 12 02:09:50 2015 +0000 linux/amd64
go version go1.4 linux/amd64

I know that there is a 64kb limit on the Scanner buffer. My issue concerns successive calls to Scan() on a scanner that has encountered a token that is too long. When a Scanner encounters a token that exceeds 64kb, a call to Scan() returns false and its token field is empty. However, if Scan() is called a second time on the same Scanner, that call populates the token field with up to 64kb of data and returns true. On a third, fourth, ..., Nth call to Scan(), the token field is empty again and Scan() returns false.

Here is an example:

...

file, _ := os.Open("line.txt") // file has a single line that exceeds the 64kb limit
scanner := bufio.NewScanner(file)

var ret bool
ret = scanner.Scan() // ret is false, scanner.Text() is an empty string, scanner.Err() says the token is too long
ret = scanner.Scan() // ret is true, scanner.Text() holds the first 64kb of the line, scanner.Err() still says the token is too long
ret = scanner.Scan() // ret is false, scanner.Text() is empty again, scanner.Err() still says the token is too long
...

I would argue that successive calls to Scan() in these situations should give a consistent result, or that the documentation should at least state that a second call to Scan() yields the first 64kb of the token. I would also argue for making the 64kb token limit in bufio.Scanner clear in the documentation, as it may save some headaches.

I know this issue is low priority, so I would be happy to take it on.

@robpike
Contributor

robpike commented Jan 15, 2015

The documentation says that Scan "returns false when the scan stops". Expecting to scan meaningfully after the scan has stopped seems wishful thinking at best. As the doctor says, don't do that.

I would rather not document the buffer size since it may change and I don't want people writing code that depends on the specific value. This is a convenience API, after all.

@robpike robpike closed this as completed Jan 15, 2015
@SamThompson
Author

I see. Makes sense, I just thought it was an odd behavior.

@golang golang locked and limited conversation to collaborators Jun 25, 2016