New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
encoding/csv: add a way to reuse a []string for Reader.Read #19721
Comments
Or copy database/sql? https://golang.org/pkg/database/sql/#Row.Scan |
/cc @kardianos |
The difference between I think an API like this makes sense. In this API proposal, can The other option I could see is adding a csv Reader option that reuses the same buffer for each call to Read (ReadAll would be unaffected). |
The current implementation of encoding/csv knows exactly how many fields a record has when allocating the slice, so there would be no need to modify Internally this would be implemented like this: if len(s) >= fieldsCount {
record = s[:fieldsCount]
} else {
record = make([]ſtring, fieldsCount)
} The comment for // If the record has more than len(s) fields, a new slice will be allocated
// instead and s will not be modified. Although probably nice, this is a somewhat big restriction for future changes. I also thought about adding a new field to the Reader, but I decided against it based on personal preference, even if the current proposal adds another new method. Considering ReadAll, adding a new field would need to make sure that ReadAll doesn't use it by making a copy after each call to Read or even preventing the reuse (this would avoid an extra copy). |
Or we could add a bool field to Edit: I can't read. @nussjustin addressed this above. But I still think if we're going to add API for a performance case, it should be less high profile than a new method. |
Modifying ReadAll to copy the slice returned from Read if |
I agree with @bradfitz that a method is likely a bit too much for this simple optimization. I'm currently thinking about something along the lines of: // If ReuseRecord is true, calls to Read will reuse a single record slice.
// When a record doesn't fit into the currently held slice, a new slice will
// be allocated and used for succcessive calls to Read instead.
// If set to false, each call to Read returns a fresh slice.
// ReadAll is not affected by this setting.
ReuseRecord bool I'm not completely sure about the second sentence, but I think it can help avoid some potential confusion. This definition also makes clear that ReadAll is not affected. I also tested a version with ReuseRecord locally and found no difference between ReadInto and ReuseRecord (as expected). |
@nussjustin @bradfitz SGTM |
CL https://golang.org/cl/41730 mentions this issue. |
Sorry for the delay. I finally got some time (and internet!) to mail my change. |
(*csv.Reader).Read allocates a new []string on each call. In many cases the slice could be reused instead of allocating a new one for each record.
I propose to add a new method
ReadInto
to csv.Reader that takes a []string and tries to read the next CSV line into the given slice and only allocates a new slice if necessary.The method would look like this:
Also Read would be updated to just call
ReadInto(nil)
internally.I have a patch ready that yields the following numbers on my notebook (where the "old" column is using Read and the "new" column ReadInto):
Using ReadInto with the code in #16791 (see my comment) I get:
Benchmarks run with: go version devel +ecc6a81617 Sat Mar 25 00:35:35 2017 +0000 linux/amd64
The text was updated successfully, but these errors were encountered: