encoding/csv: TrimLeadingSpace doesn't play nice with white space delimiters #14464

ericlagergren · 2016-02-22T08:56:12Z

Setting the TrimLeadingSpace field causes the parser to eat all leading white space, including field delimiters when the delimiter is itself white space (a tab, for example).

Example: https://play.golang.org/p/zw76XV6YIb
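For readers who cannot open the playground link, here is a minimal sketch of the kind of input that triggers the problem. It is not the code from the link; the helper name `readFirstRecord` and the sample input are assumptions made for illustration. It parses a tab-delimited line with an empty middle field, once with TrimLeadingSpace off and once with it on.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// readFirstRecord is a hypothetical helper for this illustration.
// It parses the first record of tab-delimited input, with
// TrimLeadingSpace set according to trim.
func readFirstRecord(input string, trim bool) ([]string, error) {
	r := csv.NewReader(strings.NewReader(input))
	r.Comma = '\t' // a white space delimiter
	r.TrimLeadingSpace = trim
	return r.Read()
}

func main() {
	// Three tab-separated fields; the middle one is empty.
	const input = "a\t\tc\n"

	for _, trim := range []bool{false, true} {
		rec, err := readFirstRecord(input, trim)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("TrimLeadingSpace=%v: %d fields %q\n", trim, len(rec), rec)
	}
}
```

With the behavior described in this issue, the run without trimming should report three fields, while the run with trimming should report only two, because the second tab is consumed as leading white space of the next field.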

On one hand this seems like well-defined behavior -- the option causes the parser to do what it's supposed to: eat white space. On the other, one would assume that a field's delimiter is not a part of the field -- it's a separator that demarcates two fields.

Right now the docs say:

If TrimLeadingSpace is true, leading white space in a field is ignored.

Since I assume changing the behavior of the CSV parser isn't backwards compatible, I think it'd be beneficial to add to the documentation with something like:

If TrimLeadingSpace is true, leading white space in a field is ignored. If the delimiter is white space then TrimLeadingSpace will trim the delimiter.

(Perhaps it'll save some of us who like to program while tired some grief in the future.)


ericlagergren · 2016-02-22T09:07:24Z

A cursory reading of section 2 of RFC 4180 seems to indicate that delimiters are not part of a field:

"Within the header and each record, there may be one or more
fields, separated by commas... The last field in the
record must not be followed by a comma."

The ABNF grammar at the end of section 2 also seems to indicate that delimiters are not a part of the field.

If delimiters are not a part of the field then (IMO) the correct parsing of a CSV row would be to scan until a delimiter is found, eating any white space iff the setting is toggled and the white space is not a delimiter.

(And yes, I know using tabs inside a CSV is icky but running sed 's#\t#,#g' in > out on the huge CSV files I have to deal with is impractical and could lose data; other programs are slow/impractical as well.)
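The rule proposed above (trim white space unless it is the delimiter) can be approximated without changing the parser: leave TrimLeadingSpace off and strip leading white space from each field after parsing. The following is a rough sketch under that assumption; `readTrimmed` is a hypothetical helper, not part of encoding/csv.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
	"unicode"
)

// readTrimmed is a hypothetical helper for this illustration.
// It parses tab-delimited records with TrimLeadingSpace left off, so
// every tab is treated as a delimiter, then strips leading white space
// from each parsed field.
func readTrimmed(input string) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(input))
	r.Comma = '\t'
	records, err := r.ReadAll()
	if err != nil {
		return nil, err
	}
	for _, rec := range records {
		for i, f := range rec {
			rec[i] = strings.TrimLeftFunc(f, unicode.IsSpace)
		}
	}
	return records, nil
}

func main() {
	// Second field has leading spaces; third field is empty.
	records, err := readTrimmed("a\t  b\t\tc\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, rec := range records {
		fmt.Printf("%d fields %q\n", len(rec), rec)
	}
}
```

This is not a drop-in replacement for TrimLeadingSpace. It also trims white space inside quoted fields, and a field with spaces before its opening quote is not read as a quoted field unless LazyQuotes is set, so it only fits data where those cases do not matter.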

ericlagergren · 2016-02-22T19:17:55Z

A fix for this issue (other than changing the documentation) would be to add an r1 != r.Comma check to the loop at line 257 of reader.go:

--- old_reader.go   2016-02-22 11:19:50.834691111 -0800
+++ reader.go   2016-02-22 11:14:54.468633562 -0800
@@ -254,7 +254,8 @@
    r.field.Reset()

    r1, err := r.readRune()
-   for err == nil && r.TrimLeadingSpace && r1 != '\n' && unicode.IsSpace(r1) {
+   for err == nil && r.TrimLeadingSpace &&
+       r1 != '\n' && unicode.IsSpace(r1) && r1 != r.Comma {
        r1, err = r.readRune()
    }

gopherbot · 2016-02-24T03:00:33Z

CL https://golang.org/cl/19861 mentions this issue.

ericlagergren changed the title ~~encoding/csv: TrimLeadingSpace doesn't play nice with whitespace delimiters~~ encoding/csv: TrimLeadingSpace doesn't play nice with white space delimiters Feb 22, 2016

ianlancetaylor added the Documentation label Feb 22, 2016

ianlancetaylor added this to the Go1.7 milestone Feb 22, 2016

gopherbot closed this as completed in 4feb47b Feb 24, 2016

golang locked and limited conversation to collaborators Feb 28, 2017

gopherbot added the FrozenDueToAge label Feb 28, 2017
