encoding/csv: incorrect line:column for error #22352

dsnet · 2017-10-20T07:43:17Z

Consider the following snippet:

r := csv.NewReader(strings.NewReader(`
"one","two","three","
x
x
x
x
123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789123456789"x`))
_, err := r.Read()
fmt.Println(err)

On tip, this prints line 2, column 189: extraneous " in field. Notice that the column number is absolutely bogus? There is no column 189 on line 2. This is a result of #19019 where reporting the start of the problematic line is more useful than far away when we fail to find the matching quote.

The solution is to either simply report 0 as the column to signify the start of the line, or to report the column of where the bad string actually started.

\cc @nussjustin

The text was updated successfully, but these errors were encountered:

dsnet · 2017-10-20T07:54:12Z

I'm planning on just deprecating the ParseError.Column field. It's meaning is sometimes ambiguous and it is a hit to performance to even compute it. At least one other error ErrFieldCount has no use for the column.

gopherbot · 2017-10-20T08:23:17Z

Change https://golang.org/cl/72050 mentions this issue: encoding/csv: deprecate ParseError.Column

ianlancetaylor · 2017-10-20T18:12:47Z

I'm kicking this over to NeedsDecision. Anybody want to weigh in on whether deprecating Column is the right move? Looks like the field dates back to the initial implementation of the package by @pborman in https://golang.org/cl/4629085.

pborman · 2017-10-20T18:20:24Z

I think it would be unfortunate to deprecate this field. Being told a " is wrong in a line with 23 "'s might be difficult to find.

nussjustin · 2017-10-20T18:27:34Z

As commented on the CL:

Maybe it would make sense to replace (deprecate in favor of) ParseError.Column with a new field that contains the current record number (basically the current len(r.fieldIndexes)+1)? Unlike the column, the field number should always be meaningful, even for ErrFieldCount (where it would be the actual len of the record).

pborman · 2017-10-20T18:42:07Z

Knowing the field number is nice, but it still requires quite a bit of parsing for the human reading the error. For CSV files with few fields and simple values this is not an issue, but for CSV files with a large number of fields or longish fields this probably doesn't help a lot.

dsnet · 2017-10-20T18:51:35Z

When debugging there are several pieces of information that is helpful:

The line that the record started on
The line:column that the problem field started on
The line:column where the problem was actually detected

Note that all 3 pieces of information are distinct. In go1.9 and before, it used approach 3. However, with badly quoted strings, this is untenable since what the parser considers the problem is sometimes very far from what a human would consider the problem (which is the start of the record or field). As a result of #19019, the error was changed to be 1, in which case the column number is already senseless.

A better error message would combine information in 1 and 3.

Imagine the following:

record on line 2: wrong number of fields
parse error on line 5, column 32: extraneous " in field if starting line and error line are equal
record on line 2; parse error at line 5, column 32: extraneous " in field

Thoughts on this? While information in 2 is useful, it is also not performance friendly to keep that information around.

nussjustin · 2017-10-20T18:52:11Z

I agree that ParseError.Column is nice and somewhat I would expect from an error when reading a CSV file, but I also agree with @dsnet that the error he got makes no sense and it's not the only case where Column doesn't make sense (ErrFieldCount comes to my mind, at least it't not really what most people would expect from my experience)

Even if we keep ParseError.Column I'd like to see a field with the current field index for the simpler cases where you don't have that many columns and/or long fields, ParseError.Column isn't that helpful or you are just trying to find the error in a tool like Excel (which many people do) where you can more easily jump to a specific column (although this wouldn't work for all errors).

nussjustin · 2017-10-20T18:59:33Z

Didn't see @dsnet's comment pop up before submitting mine, so here I go again:

I like that idea. CL 52830 isn't even released yet so maybe we could just change it back to report the error line and add two new fields StartLine and StartColumn. This way existing uses of ParseError Line and Column wouldn't change at all from Go 1.9 to Go 1.10 (unless I'm missing something) and we would still get the information and nice error message.

gopherbot · 2017-10-20T19:32:49Z

Change https://golang.org/cl/72150 mentions this issue: encoding/csv: simplify and optimize Reader

The Reader implementation is slow because it operates on a rune-by-rune basis via bufio.Reader.ReadRune. We speed this up by operating on entire lines that we read from bufio.Reader.ReadSlice. In order to ensure that we read the full line, we augment ReadSlice in our Reader.readLine method to automatically expand the slice if bufio.ErrBufferFull is every hit. This change happens to fix #19410 because it no longer relies on rune-by-rune parsing and only searches for the relevant delimiter rune. In order to keep column accounting simple and consistent, this change reverts parts of CL 52830. This CL is an alternative to CL 36270 and builds on some of the ideas from that change by Diogo Pinela. name old time/op new time/op delta Read-8 3.12µs ± 1% 2.54µs ± 2% -18.76% (p=0.000 n=10+9) ReadWithFieldsPerRecord-8 3.12µs ± 1% 2.53µs ± 1% -18.91% (p=0.000 n=9+9) ReadWithoutFieldsPerRecord-8 3.13µs ± 0% 2.57µs ± 3% -18.07% (p=0.000 n=10+10) ReadLargeFields-8 52.3µs ± 1% 5.3µs ± 2% -89.93% (p=0.000 n=10+9) ReadReuseRecord-8 2.05µs ± 1% 1.40µs ± 1% -31.48% (p=0.000 n=10+9) ReadReuseRecordWithFieldsPerRecord-8 2.05µs ± 1% 1.41µs ± 0% -31.03% (p=0.000 n=10+9) ReadReuseRecordWithoutFieldsPerRecord-8 2.06µs ± 1% 1.40µs ± 1% -31.70% (p=0.000 n=9+10) ReadReuseRecordLargeFields-8 50.9µs ± 0% 4.1µs ± 3% -92.01% (p=0.000 n=10+10) name old alloc/op new alloc/op Read-8 664B ± 0% 664B ± 0% ReadWithFieldsPerRecord-8 664B ± 0% 664B ± 0% ReadWithoutFieldsPerRecord-8 664B ± 0% 664B ± 0% ReadLargeFields-8 3.94kB ± 0% 3.94kB ± 0% ReadReuseRecord-8 24.0B ± 0% 24.0B ± 0% ReadReuseRecordWithFieldsPerRecord-8 24.0B ± 0% 24.0B ± 0% ReadReuseRecordWithoutFieldsPerRecord-8 24.0B ± 0% 24.0B ± 0% ReadReuseRecordLargeFields-8 2.98kB ± 0% 2.98kB ± 0% name old allocs/op new allocs/op Read-8 18.0 ± 0% 18.0 ± 0% ReadWithFieldsPerRecord-8 18.0 ± 0% 18.0 ± 0% ReadWithoutFieldsPerRecord-8 18.0 ± 0% 18.0 ± 0% ReadLargeFields-8 24.0 ± 0% 24.0 ± 0% ReadReuseRecord-8 8.00 ± 0% 8.00 ± 0% ReadReuseRecordWithFieldsPerRecord-8 8.00 ± 0% 8.00 ± 0% ReadReuseRecordWithoutFieldsPerRecord-8 8.00 ± 0% 8.00 ± 0% ReadReuseRecordLargeFields-8 12.0 ± 0% 12.0 ± 0% Updates #22352 Updates #19019 Fixes #16791 Fixes #19410 Change-Id: I31c27cfcc56880e6abac262f36c947179b550bbf Reviewed-on: https://go-review.googlesource.com/72150 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org>

gopherbot · 2017-10-21T00:08:47Z

Change https://golang.org/cl/72310 mentions this issue: encoding/csv: add ParseError.RecordLine

dsnet self-assigned this Oct 20, 2017

dsnet added the NeedsFix The path to resolution is known, but the work has not been done. label Oct 20, 2017

dsnet added this to the Go1.10 milestone Oct 20, 2017

ianlancetaylor added NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. and removed NeedsFix The path to resolution is known, but the work has not been done. labels Oct 20, 2017

gopherbot closed this as completed in 29ea82d Oct 21, 2017

dsnet removed the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Oct 21, 2017

golang locked and limited conversation to collaborators Oct 21, 2018

gopherbot added the FrozenDueToAge label Oct 21, 2018

rsc unassigned dsnet Jun 23, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

encoding/csv: incorrect line:column for error #22352

encoding/csv: incorrect line:column for error #22352

dsnet commented Oct 20, 2017

dsnet commented Oct 20, 2017

gopherbot commented Oct 20, 2017

ianlancetaylor commented Oct 20, 2017

pborman commented Oct 20, 2017

nussjustin commented Oct 20, 2017 •

edited

pborman commented Oct 20, 2017

dsnet commented Oct 20, 2017 •

edited

nussjustin commented Oct 20, 2017

nussjustin commented Oct 20, 2017

gopherbot commented Oct 20, 2017

gopherbot commented Oct 21, 2017

encoding/csv: incorrect line:column for error #22352

encoding/csv: incorrect line:column for error #22352

Comments

dsnet commented Oct 20, 2017

dsnet commented Oct 20, 2017

gopherbot commented Oct 20, 2017

ianlancetaylor commented Oct 20, 2017

pborman commented Oct 20, 2017

nussjustin commented Oct 20, 2017 • edited

pborman commented Oct 20, 2017

dsnet commented Oct 20, 2017 • edited

nussjustin commented Oct 20, 2017

nussjustin commented Oct 20, 2017

gopherbot commented Oct 20, 2017

gopherbot commented Oct 21, 2017

nussjustin commented Oct 20, 2017 •

edited

dsnet commented Oct 20, 2017 •

edited