Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bufio: can improve bufio.Scanner and bufio.Reader for bytes.Reader and strings.Reader #11655

Closed
mei-rune opened this issue Jul 10, 2015 · 12 comments

Comments

@mei-rune
Copy link

bufio.Scanner and bufio.Reader will copy all bytes from io.Reader to internal buffer, copy is unnecessary while io.Reader is bytes.Reader or strings.Reader.

now code is

        // NewScanner returns a new Scanner to read from r.
        // The split function defaults to ScanLines.
        func NewScanner(r io.Reader) *Scanner {
            return &Scanner{
                r:            r,
                split:        ScanLines,
                maxTokenSize: MaxScanTokenSize,
                buf:          make([]byte, 4096), // Plausible starting size; needn't be large.
            }
        }

improve code is

        type eofReader struct {}
        func (r *eofReader) Read(p []byte) (n int64, e error) {
                return 0, io.EOF
        }
        var  eof_reader = &eofReader {}
        // NewScanner returns a new Scanner to read from r.
        // The split function defaults to ScanLines.
        func NewScanner(r io.Reader) *Scanner {
            if br, ok := r.(bytes.Reader); ok {
                     return &Scanner{
                        r:            eof_reader,
                        split:        ScanLines,
                        maxTokenSize: MaxScanTokenSize,
                        buf:          br.Bytes(),    // bytes.Reader should add Bytes() method for this.
                    }
            }
            return &Scanner{
                r:            r,
                split:        ScanLines,
                maxTokenSize: MaxScanTokenSize,
                buf:          make([]byte, 4096), // Plausible starting size; needn't be large.
            }
        }
@ianlancetaylor ianlancetaylor added this to the Unplanned milestone Jul 10, 2015
@ianlancetaylor
Copy link
Contributor

I have no special objection to this idea, but is using a scanner on a bytes.Reader or strings.Reader a common case?

In any case, this is for after the 1.5 release.

@bradfitz
Copy link
Contributor

Please don't paste code changes in here. Using Gerrit both enforces our CLA and provides us good tools for seeing diffs and leaving comments on code. I can look at the code once it's there.

In the meantime, could you describe the problem? Is bufio.Scanner a performance problem on bytes/strings.Reader? How much faster can it get? Do you have benchmarking numbers?

@mei-rune
Copy link
Author

code is http://play.golang.org/p/WkOt-yMpjj

PASS
BenchmarkScan1 10000000 208 ns/op
BenchmarkScan2 300000 5586 ns/op
ok scan 4.075s

@mei-rune
Copy link
Author

@ianlancetaylor

I read bytes from a net.PacketConn, and parse data.

    n, addr, err := packetconn.ReadFrom(buf)
    if err == nil {
        scanner := bufio.NewScanner(bytes.NewReader(buf[:n]))
                ...
    }

@stemar94
Copy link

stemar94 commented Mar 2, 2017

I think this can be closed.
Use something like:

dummy := bytes.NewReader([0]byte{}[:])
scanner := bufio.NewScanner(dummy)
scanner.Buffer(buf[:n], 0)

@mei-rune
Copy link
Author

mei-rune commented Mar 2, 2017

@stemar94 No, zero copy and zero malloc in my code. one copy and one malloc in your code.

@stemar94
Copy link

stemar94 commented Mar 2, 2017

@runner-mei Where should this copy be?

@mei-rune
Copy link
Author

mei-rune commented Mar 3, 2017

your code is error.

func (s *Scanner) Buffer(buf []byte, max int) {
	if s.scanCalled {
		panic("Buffer called after Scan")
	}
	s.buf = buf[0:cap(buf)]
	s.maxTokenSize = max
}

@stemar94
Copy link

stemar94 commented Mar 3, 2017

Again I don't see an error here. Could you please be a bit more elaborate with such statements.

@ianlancetaylor
Copy link
Contributor

It does seem to me that you can do this entirely using the Buffer method. Something like (untested):

type inplaceReader int
func (p *inplaceReader) Read([]byte) (int, error) {
    if *p == 0 {
        return 0, io.EOF
    }
    ret := *p
    *p = 0
    return ret, io.EOF
}
...
n, addr, err := packetconn.ReadFrom(buf)
if err == nil {
    r := inplaceReader(len(buf))
    scanner := bufio.NewScanner(&r)
    scanner.Buffer(buf, len(buf))
    ...
}

@mei-rune
Copy link
Author

mei-rune commented Mar 22, 2017

@ianlancetaylor It seems to be working, but no documents, and looks like a hack

@ianlancetaylor
Copy link
Contributor

bufio.Scanner is basically a convenience tool for parsing a stream of data. When you have all the data in memory anyhow, there are usually simpler techniques. For example, if you want to use ScanLines on a []byte, you can instead use bytes.Split(s, []byte("\n")) and iterate over the resulting slice.

So I now think that there is insufficient reason to complicate the implementation of bufio.Scanner to support something that can be done in other ways. I'm going to close this.

@golang golang locked and limited conversation to collaborators Mar 22, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants