On one hand this seems like well-defined behavior -- the option causes the parser to do what it's supposed to: eat white space. On the other, one would assume that a field's delimiter is not a part of the field -- it's a separator that demarcates two fields.
Right now the docs say:
If TrimLeadingSpace is true, leading white space in a field is ignored.
Since I assume changing the behavior of the CSV parser isn't backwards compatible, I think it'd be beneficial to add to the documentation with something like:
If TrimLeadingSpace is true, leading white space in a field is ignored. If the delimiter is white space then TrimLeadingSpace will trim the delimiter.
(Perhaps it'll save some of us who like to program while tired some grief in the future.)
ericlagergren changed the title from "encoding/csv: TrimLeadingSpace doesn't play nice with whitespace delimiters" to "encoding/csv: TrimLeadingSpace doesn't play nice with white space delimiters" on Feb 22, 2016
A cursory reading of section 2 of RFC 4180 seems to indicate that delimiters are not part of a field:
"Within the header and each record, there may be one or more fields, separated by commas... The last field in the record must not be followed by a comma."
The ABNF grammar in section 2.7 also seems to indicate that delimiters are not a part of the field.
If delimiters are not a part of the field then (IMO) the correct parsing of a CSV row would be to scan until a delimiter is found, eating any white space iff the setting is toggled and the white space is not a delimiter.
(And yes, I know using tabs inside a CSV is icky but running sed 's#\t#,#g' in > out on the huge CSV files I have to deal with is impractical and could lose data; other programs are slow/impractical as well.)
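To make the parsing rule described above concrete, here is a minimal sketch of it. It is not how encoding/csv is implemented, and it deliberately ignores quoted fields; the helper name splitUnquoted and the sample line are made up for illustration. The delimiter check runs before the white space check, so a white space delimiter always ends a field instead of being trimmed.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// splitUnquoted is a hypothetical helper (not part of encoding/csv).
// It splits a single unquoted line on comma. When trim is true it skips
// leading white space in each field, but never the delimiter itself.
func splitUnquoted(line string, comma rune, trim bool) []string {
	var fields []string
	var cur strings.Builder
	atStart := true // true until the current field has a non-skipped rune
	for _, r := range line {
		switch {
		case r == comma:
			// The delimiter always terminates the field, even if it is white space.
			fields = append(fields, cur.String())
			cur.Reset()
			atStart = true
		case trim && atStart && unicode.IsSpace(r):
			// Leading white space that is not the delimiter: skip it.
		default:
			cur.WriteRune(r)
			atStart = false
		}
	}
	return append(fields, cur.String())
}

func main() {
	// Tab-delimited line with an empty middle field and a leading space
	// before the last field.
	fmt.Printf("%q\n", splitUnquoted("a\t\t c", '\t', true))
}
```

With this ordering the empty middle field survives and only the space before the last field is dropped.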
Setting the TrimLeadingSpace field causes the parser to eat all the white space, including field delimiters. Example: https://play.golang.org/p/zw76XV6YIb
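For anyone who can't open the playground link, here is a small self-contained sketch of the behavior described above. It is not the code from the link; the helper name readTSV and the sample input are assumptions made for illustration. It reads the same tab-delimited line twice, once with TrimLeadingSpace off and once with it on.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// readTSV is a hypothetical helper: it parses tab-delimited input with
// encoding/csv, optionally enabling TrimLeadingSpace.
func readTSV(input string, trim bool) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(input))
	r.Comma = '\t'
	r.TrimLeadingSpace = trim
	return r.ReadAll()
}

func main() {
	// One record with three fields; the middle field is empty.
	const input = "a\t\tc\n"

	for _, trim := range []bool{false, true} {
		records, err := readTSV(input, trim)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("TrimLeadingSpace=%v: %q (%d fields)\n",
			trim, records[0], len(records[0]))
	}
}
```

With the option off I would expect three fields. With it on, the tab that starts the empty middle field is treated as leading white space and swallowed, so the record comes back with two fields.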