Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bufio: Scanner and empty final token #8672

Closed
gwenn opened this issue Sep 6, 2014 · 5 comments
Closed

bufio: Scanner and empty final token #8672

gwenn opened this issue Sep 6, 2014 · 5 comments
Milestone

Comments

@gwenn
Copy link

gwenn commented Sep 6, 2014

What does 'go version' print?
go version devel +87b93aeb1822 Fri Sep 05 15:01:09 2014 -0700 darwin/amd64

What steps reproduce the problem?
Here is a minimalist tab-delimited file reader with an empty final token
http://play.golang.org/p/jl1x-SRQ7S

What happened?

The ScanField is never called at EOF with an empty data slice so the last token cannot
be parsed:
'a' 'b' 'c'
'd' 'e' 

What should have happened instead?

The ScanField should have been called  at EOF with an empty data slice:
'a' 'b' 'c'
'd' 'e' ''

By looking at the code of bufio.Scanner.Scan :
http://golang.org/src/pkg/bufio/scan.go?s=4436:4465#L114
            if s.end > s.start {
                advance, token, err := s.split(s.buf[s.start:s.end], s.err != nil)
the split function cannot be called with an empty data slice.

But all Split implementations are doing the following test:
http://golang.org/src/pkg/bufio/scan.go?s=6903:6981#L214
http://golang.org/src/pkg/bufio/scan.go?s=6903:6981#L229
http://golang.org/src/pkg/bufio/scan.go?s=6903:6981#L275
http://golang.org/src/pkg/bufio/scan.go?s=6903:6981#L329
        if atEOF && len(data) == 0 {
            return 0, nil, nil
        }

It seems that the test 'len(data) == 0' is always false.

Due to the fact that the Split function is never called with an empty data slice,
the last empty token cannot be parsed correctly.

A possible (but ugly) workaround is to keep the tab unconsumed.

Another workaround is to use the encoding.csv package but it doesn't support
tab-delimited file with double-quote:
See https://golang.org/issue/3150

I've also tried to change the line:
http://golang.org/src/pkg/bufio/scan.go?s=4436:4465#L114
            if s.end > s.start {
to
            if s.end >= s.start {
But it breaks all Split implementations.

Thanks and regards.
@robpike
Copy link
Contributor

robpike commented Sep 6, 2014

Comment 1:

Labels changed: added release-go1.4, repo-main.

Owner changed to @robpike.

Status changed to Accepted.

@gwenn
Copy link
Author

gwenn commented Sep 8, 2014

Comment 2:

I tried this fix with no regression:
diff -r 87b93aeb1822 src/pkg/bufio/scan.go
--- a/src/pkg/bufio/scan.go Fri Sep 05 15:01:09 2014 -0700
+++ b/src/pkg/bufio/scan.go Mon Sep 08 19:15:24 2014 +0200
@@ -112,7 +112,7 @@
    // Loop until we have a token.
    for {
        // See if we can get a token with what we already have.
-       if s.end > s.start {
+       if s.end > s.start || s.err != nil {
            advance, token, err := s.split(s.buf[s.start:s.end], s.err != nil)
            if err != nil {
                s.setErr(err)
The SplitFunc's contract "The function is never called with an empty data slice unless
atEOF is true" is respected.
But the probability of regression with external implementations seems high.
A complete patch (with a unit test) is attached (scan.patch).
Regards.

Attachments:

  1. scan.patch (2137 bytes)

@robpike
Copy link
Contributor

robpike commented Sep 25, 2014

Comment 3:

The test was incorrect, but easy to fix. I reworked it a little and sent a CL.

@gopherbot
Copy link

Comment 4:

CL https://golang.org/cl/145390043 mentions this issue.

@robpike
Copy link
Contributor

robpike commented Sep 25, 2014

Comment 5:

This issue was closed by revision 74c0de8.

Status changed to Fixed.

@rsc rsc added this to the Go1.4 milestone Apr 14, 2015
@rsc rsc removed the release-go1.4 label Apr 14, 2015
@golang golang locked and limited conversation to collaborators Jun 25, 2016
wheatman pushed a commit to wheatman/go-akaros that referenced this issue Jun 25, 2018
Fixes golang#8672.

LGTM=rsc
R=golang-codereviews, rsc
CC=golang-codereviews
https://golang.org/cl/145390043
wheatman pushed a commit to wheatman/go-akaros that referenced this issue Jun 26, 2018
Fixes golang#8672.

LGTM=rsc
R=golang-codereviews, rsc
CC=golang-codereviews
https://golang.org/cl/145390043
wheatman pushed a commit to wheatman/go-akaros that referenced this issue Jul 9, 2018
Fixes golang#8672.

LGTM=rsc
R=golang-codereviews, rsc
CC=golang-codereviews
https://golang.org/cl/145390043
wheatman pushed a commit to wheatman/go-akaros that referenced this issue Jul 30, 2018
Fixes golang#8672.

LGTM=rsc
R=golang-codereviews, rsc
CC=golang-codereviews
https://golang.org/cl/145390043
@rsc rsc unassigned robpike Jun 23, 2022
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants