Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding/csv: add the true number of lines to ErrFieldCount #6770

Closed
btracey opened this issue Nov 15, 2013 · 13 comments
Closed

encoding/csv: add the true number of lines to ErrFieldCount #6770

btracey opened this issue Nov 15, 2013 · 13 comments

Comments

@btracey
Copy link
Contributor

btracey commented Nov 15, 2013

In Reader.Read(), if the true record length is not equal to the FieldsPerRecord field,
an ErrFieldCount error is returned. It would be nice if this error also included the
expected and found number of lines. So, for example, instead of the error string being
"wrong number of fields in line", it would be instead "wrong number of
fields in line. 15 read, 10 expected". The specific problem I encountered was that
my csv file had an extra delimiter at the end of the line, so one more record was read
than I expected. If the error message had also listed that there was one more record
than expected it would have been much easier to debug.
@robpike
Copy link
Contributor

robpike commented Nov 16, 2013

Comment 1:

Perhaps. I would like to resist using error strings as API, though. They exist to tell
you what's wrong, but you're asking them to tell you what's right. I think that's a bad
precedent.

Labels changed: added priority-someday, removed priority-triage.

Status changed to Thinking.

@btracey
Copy link
Contributor Author

btracey commented Nov 18, 2013

Comment 2:

Why is a bad precedent? I can see that we don't want to encourage blind changing of
numbers to pass error checks. However, in this particular case, I don't think this error
message change would tell you what is right, it would just be more specific about what
is wrong. I understand the criticism with the word "expected", but the intent of the
issue is to help the user understand what went wrong.

@btracey
Copy link
Contributor Author

btracey commented Nov 18, 2013

Comment 3:

As an addendum, I don't see the difference between this case (returning the number of
fields set and the number of fields read) and, for example, returning the filename that
couldn't be opened in os.Open. os.Open could just return "no such file or directory",
but the filename is added so it's easier to tell if a file is unexpectedly not there, or
if the filename passed to os.Open was incorrect.

@rsc
Copy link
Contributor

rsc commented Nov 27, 2013

Comment 4:

Labels changed: added go1.3maybe.

@rsc
Copy link
Contributor

rsc commented Dec 4, 2013

Comment 5:

Labels changed: added release-none, removed go1.3maybe.

@rsc
Copy link
Contributor

rsc commented Dec 4, 2013

Comment 6:

Labels changed: added repo-main.

@nussjustin
Copy link
Contributor

I think the easiest fix here is to add a new field "Field" to ParseError, which would contain the number of the field where the error occured (e.g. a,b"", c would report Field == 2), which for the ErrFieldCount case would contain the number of fields.

/cc @bradfitz

@bradfitz
Copy link
Contributor

So Field would be 1-based and ParseError.Field == 0 would mean no information?

How does that solve the original problem when r.FieldsPerRecord is 10 but 15 rows were read. You would report Field == 11? or 15?

@nussjustin
Copy link
Contributor

Both 11 and 15 could make sense, although I think 11 makes more sense especially since in the ErrFieldCount case Read also returns the record along with the error, so the user can get the field count using len(record) even now.

15 would make sense as the code always reads the whole record before checking the field count. Although this could be seen as an implementation detail it is exposed as the returned slice.

I'm personally fine with both.

Field could also be 0-based. This basically only matters for the ErrFieldCount case as it's the only case where the user has an error and the records read. It would make accessing the fields more simpler (no -1 required), but I think that's all it does.

@bradfitz
Copy link
Contributor

The problem with 0-based (even if it makes more sense) is then you either have to always use it, or document on on the new ParseError.Field docs when it's defined and when it's not. Saying "when it's non-zero" is much easier than enumerating a list of reasons.

@nussjustin
Copy link
Contributor

I agree with you that the 1-based variant is easier.

I'm still unsure about the value of Field for ErrFieldCount, but am leaning towards FieldsPerRecord+1. What's your opinion on this?

@bradfitz
Copy link
Contributor

especially since in the ErrFieldCount case Read also returns the record along with the error, so the user can get the field count using len(record) even now.

If the user can already get 15 and they set 10, I think this whole bug is kinda useless. Who cares if we report 11, 12, 13, 14, or 15? The user set 10 and we returned len() of 15 and ErrFieldCount. Can't they draw their own conclusions?

@nussjustin
Copy link
Contributor

Yes. It would probably make more sense for the non-ErrFieldCount cases where the field could be more important and the user has just the column (rune index) of the error, although I can't currently think of a real use case where just the column isn't enough. I only thought about the possibility of using len(record) when writing my second comment.

So yeah, this whole bug is really kinda useless in retrospective.

@golang golang locked and limited conversation to collaborators Apr 28, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants