encoding/csv: leading " in non-quoted field leads to error #24422
Comments
Change https://golang.org/cl/101075 mentions this issue:
Issue #19019 is more about not having enough detail in an error, whereas I'm saying we should not have an error at all. I have updated the example: https://play.golang.org/p/YFyJ7uoGcXW. There are two fields in every row, so the "wrong number of fields" error is incorrect.
This is working as intended. A leading quote in a field indicates that the field is quoted. In that situation, the parser must look for a terminating quote. In your example, there is no terminating quote before the next delimiter. If you want the CSV parser to treat fields as unescaped fields separated by some delimiter, then you are better off doing this yourself, as in this example:
in := `item-name item-description listing-id seller-sku price quantity open-date image-url item-is-marketplace product-id-type zshop-shipping-fee item-note item-condition zshop-category1 zshop-browse-path zshop-storefront-feature asin1 asin2 asin3 will-ship-internationally expedited-shipping zshop-boldface product-id bid-for-featured-placement add-delete pending-quantity fulfillment-channel merchant-shipping-group
"Medaka Box" Stray Bushiroad Storage Box Collection Vol.50 [ Japan Import ] 0124O7XD4ZR crdcase-bushi-50-medaka-box 14.88 2 24/01/2014 10:08:14 GMT y 1 Official Item Packed and Shipped Carefully and Quickly 11 B008DQZ0DA B008DQZ0DA 0 DEFAULT Migrated Template\
`
for _, s := range strings.Split(in, "\n") {
	if i := strings.Index(s, "#"); i >= 0 {
		s = s[:i]
	}
	if s == "" {
		continue
	}
	records := strings.Split(s, "\t")
	fmt.Printf("%d %q\n", len(records), records)
}
Under https://tools.ietf.org/html/rfc4180#section-2.
Under that definition, a field which starts with a double quote but does not end with a double quote is not a valid CSV field. So under that strict definition, yes, I would agree that it works as intended (for comma-separated values). However, the encoding/csv package already departs from the strictest interpretation of the spec via the LazyQuotes option. LazyQuotes = true allows double quotes in a field without the entire field being quoted, directly counter to this sentence:
Clearly the intention of the LazyQuotes option is to support files from vendors that do not produce valid quoted fields. If the package allows the flexibility to use LazyQuotes and the flexibility to switch the delimiter to '\t', I would argue that developers would and should expect encoding/csv to be THE de facto standard package for handling CSV files, and for handling other delimited files by changing the r.Comma rune. CSV parsing libraries in other languages, such as Ruby, can handle '\t' as a delimiter and do not run into this issue.
The error message generated here https://play.golang.org/p/YFyJ7uoGcXW makes no sense (clearly the number of fields is uniform between lines), is unintuitive, and wastes developer time. It should be handled.
The CSV spec is silent on using other delimiters. But the package not only allows alternate delimiters, it also allows breaking out of strictly enclosed quoted fields, so we should handle this case correctly. The need for quoted fields at all is partly a consequence of using commas as the delimiter in the first place. If the package advertises that it is flexible enough to break the spec as a pragmatic consideration for parsing quotes in non-quoted fields, and flexible enough to switch to a non-comma delimiter, then I think this issue lies in its domain.
The paragraph you cite says:
This says nothing about what is a quoted field or not. The ABNF grammar is as such:
Note that a field may either be
Let's look at an example. The problematic field in question in your data is:
In order to be valid it would have to be:
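As a rough illustration of the escaping rule, the sketch below lets csv.Writer produce the quoted form of a field that contains double quotes, then reads it back. The field text is taken from the issue report and shortened; the helper names encodeTSV and decodeTSV are hypothetical.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// encodeTSV writes one record as a tab-delimited line.
// Hypothetical helper, not part of the package.
func encodeTSV(record []string) (string, error) {
	var sb strings.Builder
	w := csv.NewWriter(&sb)
	w.Comma = '\t'
	if err := w.Write(record); err != nil {
		return "", err
	}
	w.Flush()
	return sb.String(), w.Error()
}

// decodeTSV reads one tab-delimited record in strict mode.
// Hypothetical helper, not part of the package.
func decodeTSV(line string) ([]string, error) {
	r := csv.NewReader(strings.NewReader(line))
	r.Comma = '\t'
	return r.Read()
}

func main() {
	line, err := encodeTSV([]string{`"Medaka Box" Stray`, "2"})
	if err != nil {
		panic(err)
	}
	// The writer wraps the field in quotes and doubles each inner quote.
	fmt.Printf("encoded: %s", line)

	rec, err := decodeTSV(line)
	fmt.Printf("decoded: %q, err=%v\n", rec, err)
}
```

The round trip works in strict mode because the whole field is enclosed in quotes and every quote inside it is doubled.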
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (go version)?
1.9.2
Does this issue reproduce with the latest release?
yes
What operating system and processor architecture are you using (go env)?
darwin amd64
What did you do?
If possible, provide a recipe for reproducing the error.
A complete runnable program is good.
A link on play.golang.org is best.
https://play.golang.org/p/nrG_caEeYL7
A quote " that appears at the beginning of the field but not at the end of the field, as in "Medaka Box" Stray Bushiroad Storage Box Collection Vol.50, leads to an erroneous error:
record on line 2: wrong number of fields
even though this file has a uniform number of tab ('\t') delimited fields for every row. This file parses correctly in Excel, OpenOffice, Numbers, etc. The data comes from a legitimate source: a tab-separated file generated from an Amazon report.
If I turn lazy quotes off, then I get a different error:
parse error on line 2, column 11: extraneous or missing " in quoted-field
What did you expect to see?
I expect not to see an error. A partially quoted field in a non-quoted field is a legit use case that other systems/software generate and can consume.
What did you see instead?
record on line 2: wrong number of fields