Rietveld Code Review Tool
Help | Bug tracker | Discussion group | Source code | Sign in
(3387)

Issue 164760043: code review 164760043: encoding/csv: for Postgres, unquote empty strings, quote \. (Closed)

Can't Edit
Can't Publish+Mail
Start Review
Created:
9 years, 6 months ago by rsc
Modified:
9 years, 6 months ago
Reviewers:
bradfitz
CC:
bradfitz, golang-codereviews
Visibility:
Public.

Description

encoding/csv: for Postgres, unquote empty strings, quote \. In theory both of these lines encode the same three fields: a,,c a,"",c However, Postgres defines that when importing CSV, the unquoted version is treated as NULL (missing), while the quoted version is treated as a string value (empty string). If the middle field is supposed to be an integer value, the first line can be imported (NULL is okay), but the second line cannot (empty string is not). Postgres's import command (COPY FROM) has an option to force the unquoted empty to be interpreted as a string but it does not have an option to force the quoted empty to be interpreted as a NULL. From http://www.postgresql.org/docs/9.0/static/sql-copy.html: The CSV format has no standard way to distinguish a NULL value from an empty string. PostgreSQL's COPY handles this by quoting. A NULL is output as the NULL parameter string and is not quoted, while a non-NULL value matching the NULL parameter string is quoted. For example, with the default settings, a NULL is written as an unquoted empty string, while an empty string data value is written with double quotes (""). Reading values follows similar rules. You can use FORCE_NOT_NULL to prevent NULL input comparisons for specific columns. Therefore printing the unquoted empty is more flexible for imports into Postgres than printing the quoted empty. In addition to making the output more useful with Postgres, not quoting empty strings makes the output smaller and easier to read. It also matches the behavior of Microsoft Excel and Google Drive. Since we are here and making concessions for Postgres, handle this case too (again quoting the Postgres docs): Because backslash is not a special character in the CSV format, \., the end-of-data marker, could also appear as a data value. To avoid any misinterpretation, a \. data value appearing as a lone entry on a line is automatically quoted on output, and on input, if quoted, is not interpreted as the end-of-data marker. If you are loading a file created by another application that has a single unquoted column and might have a value of \., you might need to quote that value in the input file. Fixes issue 7586.

Patch Set 1 #

Patch Set 2 : diff -r 28de6f41b1c77c5cb9046056fb29b96b9583a3a5 https://code.google.com/p/go/ #

Patch Set 3 : diff -r 28de6f41b1c77c5cb9046056fb29b96b9583a3a5 https://code.google.com/p/go/ #

Patch Set 4 : diff -r 5db49b99612c356e35d4977d95ff4f464c2110bb https://code.google.com/p/go/ #

Unified diffs Side-by-side diffs Delta from patch set Stats (+25 lines, -2 lines) Patch
M src/encoding/csv/writer.go View 1 2 1 chunk +14 lines, -2 lines 0 comments Download
M src/encoding/csv/writer_test.go View 1 2 1 chunk +11 lines, -0 lines 0 comments Download

Messages

Total messages: 3
rsc
Hello bradfitz (cc: golang-codereviews@googlegroups.com), I'd like you to review this change to https://code.google.com/p/go/
9 years, 6 months ago (2014-10-24 02:03:12 UTC) #1
bradfitz
LGTM Nice. On Oct 23, 2014 7:03 PM, <rsc@golang.org> wrote: > Reviewers: bradfitz, > > ...
9 years, 6 months ago (2014-10-24 03:14:42 UTC) #2
rsc
9 years, 6 months ago (2014-10-24 03:44:52 UTC) #3
*** Submitted as https://code.google.com/p/go/source/detail?r=80ddbcaec704 ***

encoding/csv: for Postgres, unquote empty strings, quote \.

In theory both of these lines encode the same three fields:

        a,,c
        a,"",c

However, Postgres defines that when importing CSV, the unquoted
version is treated as NULL (missing), while the quoted version is
treated as a string value (empty string). If the middle field is supposed to
be an integer value, the first line can be imported (NULL is okay), but
the second line cannot (empty string is not).

Postgres's import command (COPY FROM) has an option to force
the unquoted empty to be interpreted as a string but it does not
have an option to force the quoted empty to be interpreted as a NULL.

From http://www.postgresql.org/docs/9.0/static/sql-copy.html:

        The CSV format has no standard way to distinguish a NULL
        value from an empty string. PostgreSQL's COPY handles this
        by quoting. A NULL is output as the NULL parameter string
        and is not quoted, while a non-NULL value matching the NULL
        parameter string is quoted. For example, with the default
        settings, a NULL is written as an unquoted empty string,
        while an empty string data value is written with double
        quotes (""). Reading values follows similar rules. You can
        use FORCE_NOT_NULL to prevent NULL input comparisons for
        specific columns.

Therefore printing the unquoted empty is more flexible for
imports into Postgres than printing the quoted empty.

In addition to making the output more useful with Postgres, not
quoting empty strings makes the output smaller and easier to read.
It also matches the behavior of Microsoft Excel and Google Drive.

Since we are here and making concessions for Postgres, handle this
case too (again quoting the Postgres docs):

        Because backslash is not a special character in the CSV
        format, \., the end-of-data marker, could also appear as a
        data value. To avoid any misinterpretation, a \. data value
        appearing as a lone entry on a line is automatically quoted
        on output, and on input, if quoted, is not interpreted as
        the end-of-data marker. If you are loading a file created by
        another application that has a single unquoted column and
        might have a value of \., you might need to quote that value
        in the input file.

Fixes issue 7586.

LGTM=bradfitz
R=bradfitz
CC=golang-codereviews
https://codereview.appspot.com/164760043
Sign in to reply to this message.

Powered by Google App Engine
RSS Feeds Recent Issues | This issue
This is Rietveld f62528b