Description
encoding/csv: Fix handling of quoted substrings at start of field in a CSV file.
The CSV reader could not properly parse a field that began with a quoted substring even when LazyQuotes was turned on. The field needed to be completely quoted.
For example, when parsing the string |"Quoted Substring" other chars|, the reader treated the second quote as a bare quote and then kept consuming characters (including newlines) until it found a closing quote immediately before a Comma. On a large file this could result in memory exhaustion.
The change here fixes this behaviour if (and only if) LazyQuotes is true. If the character following the second quote character in a field is not a Comma or newline, the rest of the field is processed as a normal unquoted field until a Comma, newline or EOF is reached (quotes are treated as BareQuotes and inserted in the field). If the last character in the field is a quote character, it is dropped (to match the quote at the beginning). If not, the leading quote is inserted as part of the field value.
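The rule above can be sketched as a standalone function. This is only an illustration of the described behaviour, not the code in this patch: `parseLazyField` is a hypothetical name, it works on a single string rather than a reader, and it assumes doubled quotes (`""`) are not treated as escapes.

```go
package main

import (
	"fmt"
	"strings"
)

// parseLazyField is a hypothetical helper for illustration; it is not part
// of encoding/csv. s must begin with a quote. It returns the field value,
// the unparsed remainder (starting at the delimiter, or empty at EOF), and
// ok=false when s has no second quote, a case the rule does not cover.
// Assumption: doubled quotes ("") are not treated as escapes here.
func parseLazyField(s string, comma byte) (field, rest string, ok bool) {
	if len(s) == 0 || s[0] != '"' {
		return "", s, false
	}
	i := strings.IndexByte(s[1:], '"')
	if i < 0 {
		return "", s, false
	}
	closeIdx := i + 1 // index in s of the second quote
	after := s[closeIdx+1:]

	// Second quote followed by Comma, newline or EOF: ordinary quoted field.
	if after == "" || after[0] == comma || after[0] == '\n' {
		return s[1:closeIdx], after, true
	}

	// Otherwise treat the remainder as an unquoted field that ends at the
	// next Comma, newline or EOF; any quotes in it are kept as bare quotes.
	end := strings.IndexAny(after, string([]byte{comma, '\n'}))
	if end < 0 {
		end = len(after)
	}
	raw := s[:closeIdx+1+end] // whole field text, leading quote included
	rest = after[end:]

	if strings.HasSuffix(raw, `"`) {
		// Trailing quote matches the leading one: drop both.
		return raw[1 : len(raw)-1], rest, true
	}
	// No trailing quote: the leading quote stays in the field value.
	return raw, rest, true
}

func main() {
	inputs := []string{
		`"def"ghi,jkl`,
		`"def",jkl`,
		`"Quoted Substring" other chars"`,
	}
	for _, in := range inputs {
		field, rest, ok := parseLazyField(in, ',')
		fmt.Printf("[%s] [%s] %v\n", field, rest, ok)
	}
	// Prints:
	// ["def"ghi] [,jkl] true
	// [def] [,jkl] true
	// [Quoted Substring" other chars] [] true
}
```

Returning the remainder instead of a byte count keeps the sketch short; a real reader would track its position in the input stream instead.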
Change in behaviour:
Input: `abc,"def"ghi,jkl\nMNO,PQR",STU`
Old Output (one record): `abc`, `def"ghi,jkl\nMNO,PQR`, `STU`
New Output (two records):
`abc`, `"def"ghi`, `jkl`
`MNO`, `PQR"`, `STU`
Note how the previous behaviour consumes everything, including the newline, up to the quote that ends the second field on line 2. The new (fixed) handling terminates the field at the comma after 'i' and returns two records instead of one.
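To see which behaviour a given Go toolchain exhibits, the input above can be fed to a `csv.Reader` with LazyQuotes set. A minimal sketch follows; `readLazy` is a hypothetical helper name. Per the description, a build without this change should report one record and a build with it should report two, so no fixed output is stated.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// readLazy is a hypothetical helper: it parses input with LazyQuotes on.
func readLazy(input string) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(input))
	r.LazyQuotes = true
	r.FieldsPerRecord = -1 // do not enforce a fixed field count
	return r.ReadAll()
}

func main() {
	records, err := readLazy("abc,\"def\"ghi,jkl\nMNO,PQR\",STU")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// The record count distinguishes the old behaviour from the new one.
	fmt.Println("records:", len(records))
	for _, rec := range records {
		fmt.Printf("%q\n", rec)
	}
}
```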
Two new tests are added in the test package to describe this behaviour (all existing tests pass unchanged).
Fixes issue 6352.
Fixes issue 3150.
Patch Set 1
Patch Set 2 : diff -r 77a4d225cc7e https://code.google.com/p/go
Patch Set 3 : diff -r 77a4d225cc7e https://code.google.com/p/go
Total comments: 2
Patch Set 4 : diff -r d71a954bca35 https://code.google.com/p/go
Patch Set 5 : diff -r d71a954bca35 https://code.google.com/p/go
Messages
Total messages: 10