Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bufio: Scanner considers a buffer of length = len(token) as not enough #43183

Closed
k0nst4nt1n opened this issue Dec 14, 2020 · 11 comments
Closed
Labels
Documentation help wanted NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@k0nst4nt1n
Copy link

What version of Go are you using (go version)?

$ go version
go version go1.15 darwin/amd64

Does this issue reproduce with the latest release?

Haven't tried

What operating system and processor architecture are you using (go env)?

go env Output
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/k0nst4nt1n/Library/Caches/go-build"
GOENV="/Users/k0nst4nt1n/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/k0nst4nt1n/_gopath/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/k0nst4nt1n/_gopath"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.15/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.15/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/3q/jr3whwkn13ld3j1s1xms0v4w0000gp/T/go-build070878361=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

Golang bufio.Scanner considers a buffer of length = len(token) as not enough, please see example below:
(full code is here https://play.golang.org/p/3e95HG3j4H-)

func main() {
	d1 := "12" // data size == 2
	r1 := strings.NewReader(d1)
	
	s1 := bufio.NewScanner(r1)
	s1.Split(bufio.ScanLines)
	s1.Buffer(make([]byte, len(d1)), len(d1)) // buf size == 2
	
	_ = s1.Scan()
	fmt.Printf("Token=%s, Err=%v\n", s1.Text(), s1.Err())
}

What did you expect to see?

No error (output: Token=12, Err=<nil>)

What did you see instead?

Error (output: Token=, Err=bufio.Scanner: token too long)

@ianlancetaylor ianlancetaylor changed the title Golang bufio.Scanner considers a buffer of length = len(token) as not enough bufio: Scanner considers a buffer of length = len(token) as not enough Dec 14, 2020
@ianlancetaylor ianlancetaylor added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Dec 14, 2020
@ianlancetaylor ianlancetaylor added this to the Backlog milestone Dec 14, 2020
@ianlancetaylor
Copy link
Contributor

I don't see an easy fix here. The code would have to do a look ahead when it reaches the end of the buffer, which would be complicated, and would help approximately nobody.

This should probably just be mentioned in the documentation.

@PawelPru
Copy link

PawelPru commented Dec 14, 2020

You are using
s1.Split(bufio.ScanLines)
//scanlines add additioonl byte for endline char "\n"

change to
s1.Split(bufio.ScanBytes)

or add
s1.Buffer(make([]byte, len(d1)), len(d1)+1) // buf size == 2

@k0nst4nt1n
Copy link
Author

bufio.ScanLine

@k0nst4nt1n
Copy link
Author

I know how to workaround with +1 and bufio.ScanBytes is not what I want. I want to read the whole line until the EOL or EOF.

@PawelPru
Copy link

PawelPru commented Dec 14, 2020

replace string to string with EOL to simulate reading file
d1 := "12\n"

@k0nst4nt1n
Copy link
Author

replace string with EOL to simulate reading file
d1 := "12\n"

The issue is not about how to workaround or simulate something. I can make it work. The issue emphasizes how EOL and EOF are treated differently in terms of how big your buffer should be for the scanner. This behavior is not immediately obvious (and of course it doesn't matter what we read, a file can also not have EOL).

@PawelPru
Copy link

PawelPru commented Dec 14, 2020

scanline without EOF and EOL is not working , take a look here
https://golang.org/src/bufio/scan.go?s=11967:12045#L340

I think you would like to add additional "if" to scanline function to read string without EOF and EOL

@k0nst4nt1n
Copy link
Author

scanline without EOF and EOL is not working , take a look here
https://golang.org/src/bufio/scan.go?s=11967:12045#L340

I think you would like to add additional "if" to scanline function to read string without EOF and EOL

I'm not sure why you say "without EOF and EOL" there is EOF in my case. In this case EOF is just a contract supported by the Reader interface. No more data to read it will return EOF.

@PawelPru
Copy link

I investigated this issue and you are right something is bad.

@Windsooon
Copy link

The code at scan.go indicate that if if len(s.buf) >= s.maxTokenSize, it will raise ErrTooLong error. A quick fix would be to update the document like this:

Buffer sets the initial buffer to use when scanning and the maximum size of buffer that may be allocated during scanning. The maximum token size can't be larger or equal to the larger of max and cap(buf). If max <= cap(buf), Scan will use this buffer only and do no allocation.

qiulaidongfeng added a commit to qiulaidongfeng/go that referenced this issue Aug 12, 2023
Fixes golang#43183

Change-Id: I50d99ef8ed513bba47166a25ea5c7c80cd8bd799
@gopherbot
Copy link

Change https://go.dev/cl/518860 mentions this issue: bufio: clarify the maximum token size

qiulaidongfeng added a commit to qiulaidongfeng/go that referenced this issue Aug 13, 2023
Fixes golang#43183

Change-Id: I50d99ef8ed513bba47166a25ea5c7c80cd8bd799
qiulaidongfeng added a commit to qiulaidongfeng/go that referenced this issue Aug 13, 2023
Fixes golang#43183

Change-Id: I50d99ef8ed513bba47166a25ea5c7c80cd8bd799
@dmitshur dmitshur modified the milestones: Backlog, Go1.22 Aug 16, 2023
@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Aug 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Documentation help wanted NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants