
encoding/csv: add a way to reuse a []string for Reader.Read #19721

Closed
nussjustin opened this issue Mar 26, 2017 · 11 comments

@nussjustin (Contributor)

(*csv.Reader).Read allocates a new []string on each call. In many cases the slice could be reused instead of allocating a new one for each record.

I propose to add a new method ReadInto to csv.Reader that takes a []string and tries to read the next CSV line into the given slice and only allocates a new slice if necessary.

The method would look like this:

// ReadInto reads one record (a slice of fields) from r into s.
// If the record has more than len(s) fields, a new slice will be allocated
// instead and s will not be modified.
// If the record has an unexpected number of fields,
// ReadInto returns the record along with the error ErrFieldCount.
// Except for that case, ReadInto always returns either a non-nil
// record or a non-nil error, but not both.
// If there is no data left to be read, ReadInto returns nil, io.EOF.
func (r *Reader) ReadInto(s []string) (record []string, err error) {
	// ...
}

Also, Read would be updated to simply call ReadInto(nil) internally.
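For illustration, a caller could then reuse one slice across an entire file like this (a sketch against the proposed ReadInto, which does not exist in the current package):

package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

func main() {
	r := csv.NewReader(strings.NewReader("a,b,c\nd,e,f\n"))

	var record []string
	for {
		var err error
		// Feed the previous record back in; ReadInto reuses its backing
		// array when the next record fits and allocates only when it doesn't.
		record, err = r.ReadInto(record)
		if err == io.EOF {
			break
		}
		if err != nil {
			panic(err)
		}
		fmt.Println(record)
	}
}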

I have a patch ready that yields the following numbers on my notebook (where the "old" column is using Read and the "new" column ReadInto):

name                          old time/op    new time/op    delta
Read-8                          2.75µs ± 2%    1.88µs ± 1%  -31.52%  (p=0.000 n=14+15)
ReadWithFieldsPerRecord-8       2.75µs ± 0%    1.89µs ± 1%  -31.43%  (p=0.000 n=13+13)
ReadWithoutFieldsPerRecord-8    2.77µs ± 1%    1.88µs ± 1%  -32.06%  (p=0.000 n=15+15)
ReadLargeFields-8               55.4µs ± 1%    54.2µs ± 0%   -2.07%  (p=0.000 n=15+14)

name                          old alloc/op   new alloc/op   delta
Read-8                            664B ± 0%       24B ± 0%  -96.39%  (p=0.000 n=15+15)
ReadWithFieldsPerRecord-8         664B ± 0%       24B ± 0%  -96.39%  (p=0.000 n=15+15)
ReadWithoutFieldsPerRecord-8      664B ± 0%       24B ± 0%  -96.39%  (p=0.000 n=15+15)
ReadLargeFields-8               3.94kB ± 0%    2.98kB ± 0%  -24.39%  (p=0.000 n=15+15)

name                          old allocs/op  new allocs/op  delta
Read-8                            18.0 ± 0%       8.0 ± 0%  -55.56%  (p=0.000 n=15+15)
ReadWithFieldsPerRecord-8         18.0 ± 0%       8.0 ± 0%  -55.56%  (p=0.000 n=15+15)
ReadWithoutFieldsPerRecord-8      18.0 ± 0%       8.0 ± 0%  -55.56%  (p=0.000 n=15+15)
ReadLargeFields-8                 24.0 ± 0%      12.0 ± 0%  -50.00%  (p=0.000 n=15+15)

Using ReadInto with the code in #16791 (see my comment) I get:

name   old time/op    new time/op    delta
CSV-8     1.12s ± 1%     1.03s ± 2%   -8.44%  (p=0.000 n=15+14)

name   old alloc/op   new alloc/op   delta
CSV-8     144MB ± 0%      48MB ± 0%  -66.66%  (p=0.000 n=13+14)

name   old allocs/op  new allocs/op  delta
CSV-8     2.00M ± 0%     1.00M ± 0%  -50.00%  (p=0.000 n=13+14)

Benchmarks run with: go version devel +ecc6a81617 Sat Mar 25 00:35:35 2017 +0000 linux/amd64

@nussjustin (Contributor, Author)

/cc @bradfitz @ALTree

@bradfitz (Contributor)

Or copy database/sql? https://golang.org/pkg/database/sql/#Row.Scan
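For reference, a Scan-style shape for csv might look roughly like the following hypothetical sketch (scanRecord is illustrative, not an existing API; a real method inside the package would parse directly into the destination rather than going through Read):

package csvscan

import "encoding/csv"

// scanRecord sketches the Scan-style shape: the caller owns the
// destination slice, and each call overwrites it in place, reusing
// its backing array when capacity allows.
func scanRecord(r *csv.Reader, record *[]string) error {
	fields, err := r.Read() // stand-in; a real method would avoid this allocation
	if err != nil {
		return err
	}
	*record = append((*record)[:0], fields...)
	return nil
}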

@bradfitz (Contributor)

/cc @kardianos

@kardianos (Contributor)

The difference between encoding/csv and database/sql is that sql will always return a rectangular result, while csv may return jagged results. Another difference is that sql does type conversions, while csv only handles strings.
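For example, with the existing API a reader configured with FieldsPerRecord = -1 already returns records of varying length:

package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

func main() {
	r := csv.NewReader(strings.NewReader("a,b,c\nd,e\n"))
	r.FieldsPerRecord = -1 // disable the per-record field count check

	records, err := r.ReadAll()
	if err != nil {
		panic(err)
	}
	fmt.Println(records) // [[a b c] [d e]], a jagged result
}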

I think an API like this makes sense. In this API proposal, can s be written to even if a new slice is allocated (that is, is the record written into s field by field, or buffered internally)?

The other option I could see is adding a csv Reader option that reuses the same buffer for each call to Read (ReadAll would be unaffected).

@nussjustin (Contributor, Author) commented Mar 26, 2017

The current implementation of encoding/csv knows exactly how many fields a record has before allocating the slice, so there would be no need to modify s when it is too small.

Internally this would be implemented like this:

if len(s) >= fieldsCount {
	// The next record fits: reslice s and fill it in place.
	record = s[:fieldsCount]
} else {
	// The next record doesn't fit: allocate fresh, leaving s untouched.
	record = make([]string, fieldsCount)
}

The comment for ReadInto currently states:

// If the record has more than len(s) fields, a new slice will be allocated
// instead and s will not be modified.

Although probably nice for callers, this guarantee is a fairly big restriction on future changes.

I also thought about adding a new field to the Reader, but I decided against it based on personal preference, even though the current proposal adds another new method. As for ReadAll, adding a new field would require making sure that ReadAll doesn't use it, either by copying the record after each internal call to Read or by preventing the reuse entirely (which would avoid the extra copy).

@bradfitz (Contributor) commented Mar 26, 2017

Or we could add a bool field to csv.Reader, like Reader.ReuseRecord, that causes Reader.Read to always use the same slice.

Edit: I can't read. @nussjustin addressed this above. But I still think that if we're going to add API for a performance case, it should be less high-profile than a new method.

@kardianos (Contributor)

> I also thought about adding a new field to the Reader, but I decided against it based on personal preference, even though the current proposal adds another new method. As for ReadAll, adding a new field would require making sure that ReadAll doesn't use it, by copying the record after each internal call to Read.

Modifying ReadAll to copy the slice returned from Read if Reader.ReuseRecord was set would be simple enough. However, I'm not arguing for any particular API.
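A minimal sketch of that change inside the package (assuming the proposed ReuseRecord field existed on Reader):

// ReadAll copies each record before retaining it when ReuseRecord is
// set, since the next internal Read would otherwise overwrite it.
func (r *Reader) ReadAll() (records [][]string, err error) {
	for {
		record, err := r.Read()
		if err == io.EOF {
			return records, nil
		}
		if err != nil {
			return nil, err
		}
		if r.ReuseRecord {
			// Detach from the shared backing array before keeping it.
			record = append([]string(nil), record...)
		}
		records = append(records, record)
	}
}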

nussjustin changed the title from "encoding/csv: add method to read into an existing []string" to "encoding/csv: add a way to reuse a []string for Reader.Read" on Mar 26, 2017
@nussjustin (Contributor, Author)

I agree with @bradfitz that a method is likely a bit too much for this simple optimization. I'm currently thinking about something along the lines of:

	// If ReuseRecord is true, calls to Read will reuse a single record slice.
	// When a record doesn't fit into the currently held slice, a new slice will
	// be allocated and used for successive calls to Read instead.
	// If set to false, each call to Read returns a fresh slice.
	// ReadAll is not affected by this setting.
	ReuseRecord bool

I'm not completely sure about the second sentence, but I think it can help avoid some potential confusion. This definition also makes clear that ReadAll is not affected.
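From calling code, the flag would be used roughly like this (a sketch against the proposed field; the returned slice must be copied if it is retained across calls):

package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

func main() {
	r := csv.NewReader(strings.NewReader("a,b,c\nd,e,f\n"))
	r.ReuseRecord = true // proposed field: Read may return the same slice each call

	for {
		record, err := r.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			panic(err)
		}
		// record is only valid until the next call to Read; copy to retain.
		fmt.Println(record)
	}
}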

I also tested a version with ReuseRecord locally and found no performance difference between ReadInto and ReuseRecord (as expected).

@kardianos (Contributor)

@nussjustin @bradfitz SGTM

@gopherbot

CL https://golang.org/cl/41730 mentions this issue.

@nussjustin (Contributor, Author)

Sorry for the delay. I finally got some time (and internet!) to mail my change.

bradfitz added this to the Go1.9 milestone on Apr 26, 2017
golang locked and limited conversation to collaborators on Apr 26, 2018