encoding/csv: LazyQuotes option breaks if the fields are quoted and the lazy quotes appear at the end of the field #6258

Closed
gopherbot opened this issue Aug 26, 2013 · 12 comments

Comments

@gopherbot

by alex@zylman.com:

What steps will reproduce the problem?
Try to read a CSV file where the fields are quoted and one of them ends in something
that would need lazy quotes, e.g.
"Field1","Field2 "LazyQuotes"","Field3","Field4"

What is the expected output?
["Field1", "Field2 \"LazyQuotes\"", "Field3", "Field4"]
Basically, an array of length 4, one for each field, with the lazy quotes in place.

What do you see instead?
["Field1", "Field2 \"LazyQuotes\",\"Field3", "Field4"]
An array of length 3, where it smashed Field2 and Field3 together.
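
The steps above can be sketched as a short standalone program. This is not the program from the attachment; the helper name parseLazy is made up for illustration, and the input is assumed to be the single line shown in the reproduction steps.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// parseLazy (hypothetical helper) reads one record from input
// with LazyQuotes enabled on the Reader.
func parseLazy(input string) ([]string, error) {
	r := csv.NewReader(strings.NewReader(input))
	r.LazyQuotes = true
	return r.Read()
}

func main() {
	// The quoted second field ends in a bare quoted word.
	input := `"Field1","Field2 "LazyQuotes"","Field3","Field4"` + "\n"

	record, err := parseLazy(input)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Print the field count and each field in quoted form so that
	// merged fields are easy to spot.
	fmt.Printf("%d fields: %q\n", len(record), record)
}
```

Printing the field count next to the %q form of the record makes it easy to compare against the expected length of 4.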

Which compiler are you using (5g, 6g, 8g, gccgo)?
Whatever the default is from https://code.google.com/p/go/downloads/list

Which operating system are you using?
Mac OS X 10.8

Which version are you using?  (run 'go version')
go version go1.1 darwin/amd64

Please provide any additional information below.
I've attached a .tar.gz that contains four .csv files and a Go program that reads them
and prints them.
Two of the files have quoted fields, with lazy quotes in the middle of a field
(inp_quoted_end.csv) and at the end of a field (inp_quoted.csv), and two are the same
thing but with unquoted fields (inp_unquoted_end.csv, inp_unquoted.csv, respectively).
Only the quoted fields with lazy quotes at the end break (inp_quoted.csv).

Attachments:

  1. broken_lazy_quotes.tar.gz (487754 bytes)
@gopherbot
Author

Comment 1 by alex@zylman.com:

Also reproduced on go version go1.1.2 darwin/amd64

@robpike
Contributor

robpike commented Aug 27, 2013

Comment 2:

What CSV lacks in specification it makes up in unreliability.

Labels changed: added priority-later, go1.2maybe, removed priority-triage.

Status changed to Accepted.

@robpike
Contributor

robpike commented Aug 27, 2013

Comment 3:

Labels changed: added feature.

@gopherbot
Author

Comment 4 by alex@zylman.com:

I understand that you guys probably get a lot of weird edge cases like this that are
low priority, but if this is a feature request and not a bug, then there's a bug in the
documentation since it says this works. :/

@gopherbot
Author

Comment 5 by alex@zylman.com:

So I think there might be errors any time that LazyQuotes are at field boundaries. Other
failing cases I found today:
"Field1","Field2",""LazyQuotes" Field3","Field4","Field5"
and
Field1,Field2,"LazyQuotes" Field3,Field4,Field5
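
A minimal sketch for checking these two inputs, assuming each is a single line of input; the helper name readLazy is hypothetical. It only prints what the Reader returns, since the result may differ between Go versions.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// readLazy (hypothetical helper) parses one record from line
// with LazyQuotes enabled.
func readLazy(line string) ([]string, error) {
	r := csv.NewReader(strings.NewReader(line + "\n"))
	r.LazyQuotes = true
	return r.Read()
}

func main() {
	inputs := []string{
		`"Field1","Field2",""LazyQuotes" Field3","Field4","Field5"`,
		`Field1,Field2,"LazyQuotes" Field3,Field4,Field5`,
	}
	for _, in := range inputs {
		rec, err := readLazy(in)
		if err != nil {
			fmt.Printf("input %s\n  error: %v\n", in, err)
			continue
		}
		// Both inputs are meant to carry five fields.
		fmt.Printf("input %s\n  %d fields: %q\n", in, len(rec), rec)
	}
}
```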

@robpike
Contributor

robpike commented Aug 30, 2013

Comment 6:

Not for 1.2.

Labels changed: removed go1.2maybe.

@rsc
Contributor

rsc commented Nov 27, 2013

Comment 7:

Labels changed: added go1.3maybe.

@rsc
Contributor

rsc commented Nov 27, 2013

Comment 8:

Labels changed: removed feature.

@rsc
Contributor

rsc commented Dec 4, 2013

Comment 9:

Labels changed: added release-none, removed go1.3maybe.

@rsc
Contributor

rsc commented Dec 4, 2013

Comment 10:

Labels changed: added repo-main.

@rsc added this to the Unplanned milestone Apr 10, 2015
@nussjustin
Contributor

This could be fixed for the simple case by peeking ahead for the next rune after a lazy quote, but I don't think this is really possible to fix in a simple and consistent way for all cases. I don't think this is really worth fixing.

The issue author originally wrote that

if this is a feature request and not a bug, then there's a bug in the
documentation since it says this works. :/

But I can't find anything about this case in the documentation. There is a case with a double-quoted word in a non-lazy quoted field, but nothing about a lazy quoted case.
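
For comparison, the documented case can be sketched like this: the same record as in the original report, but with the inner quotes doubled and LazyQuotes left at its default of false. The helper name readStrict is made up for illustration, and the escaped input line is an assumption about what a standards-following producer would write.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// readStrict (hypothetical helper) parses one record with the
// Reader's default settings, so LazyQuotes is false.
func readStrict(line string) ([]string, error) {
	return csv.NewReader(strings.NewReader(line)).Read()
}

func main() {
	// Inside a quoted field, "" stands for one literal quote.
	input := `"Field1","Field2 ""LazyQuotes""","Field3","Field4"`

	rec, err := readStrict(input)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%d fields: %q\n", len(rec), rec)
}
```

With the quotes doubled there is no ambiguity at the field boundary, so the Reader does not need LazyQuotes to split the record into four fields.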

@dsnet
Member

dsnet commented Oct 20, 2017

I agree that this can't really be fixed. There is an infinite number of edge cases where LazyQuotes will fail. The current rules it follows are simple. Anything that starts to require peeking at the incoming stream is probably heading in the wrong direction. Secondly, subtle changes to LazyQuotes have massive effects on how files are parsed and would break compatibility. So probably not worth "fixing".

@dsnet closed this as completed Oct 20, 2017
@golang locked and limited conversation to collaborators Oct 20, 2018