Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encoding/csv: TrimLeadingSpace doesn't play nice with white space delimiters #14464

Closed
ericlagergren opened this issue Feb 22, 2016 · 3 comments

Comments

@ericlagergren
Copy link
Contributor

Setting the TrimLeadingSpace field causes the parser to eat all the white space, including field delimiters.

Example: https://play.golang.org/p/zw76XV6YIb

On one hand this seems like well-defined behavior -- the option causes the parser to do what it's supposed to: eat white space. On the other, one would assume that a field's delimiter is not a part of the field -- it's a separator that demarcates two fields.

Right now the docs say:

If TrimLeadingSpace is true, leading white space in a field is ignored.

Since I assume changing the behavior of the CSV parser isn't backwards compatible, I think it'd be beneficial to add to the documentation with something like:

If TrimLeadingSpace is true, leading white space in a field is ignored. If the delimiter is white space then TrimLeadingSpace will trim the delimiter.

(Perhaps it'll save some of us who like to program while tired some grief in the future.)

@ericlagergren ericlagergren changed the title encoding/csv: TrimLeadingSpace doesn't play nice with whitespace delimiters encoding/csv: TrimLeadingSpace doesn't play nice with white space delimiters Feb 22, 2016
@ericlagergren
Copy link
Contributor Author

A cursory reading of section 2 of RFC 4180 seems to indicate that delimiters are not part of a field:

Within the header and each record, there may be one or more
fields, separated by commas... The last field in the
record must not be followed by a comma."

Also, the ABNF grammar in section 2.7 also seems to indicate that delimiters are not a part of the field.

If delimiters are not a part of the field then (IMO) the correct parsing of a CSV row would be to scan until a delimiter is found, eating any white space iff the setting is toggled and the white space is not a delimiter.

(And yes, I know using tabs inside a CSV is icky but running sed 's#\t#,#g' in > out on the huge CSV files I have to deal with is impractical and could lose data; other programs are slow/impractical as well.)

@ericlagergren
Copy link
Contributor Author

A fix for this issue (other than changing the documentation) would be to add this to line 257:

--- reader.go   2016-02-22 11:14:54.468633562 -0800
+++ old_reader.go   2016-02-22 11:19:50.834691111 -0800
@@ -254,8 +254,7 @@
    r.field.Reset()

    r1, err := r.readRune()
-   for err == nil && r.TrimLeadingSpace &&
-       r1 != '\n' && unicode.IsSpace(r1) && r1 != r.Comma {
+   for err == nil && r.TrimLeadingSpace && r1 != '\n' && unicode.IsSpace(r1) {
        r1, err = r.readRune()
    }

@ianlancetaylor ianlancetaylor added this to the Go1.7 milestone Feb 22, 2016
@gopherbot
Copy link

CL https://golang.org/cl/19861 mentions this issue.

@golang golang locked and limited conversation to collaborators Feb 28, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants