encoding/csv: writer.UseCRLF will change \n to \r\n in data field #36445

Open
bkkgbkjb opened this issue Jan 8, 2020 · 6 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

@bkkgbkjb

bkkgbkjb commented Jan 8, 2020

What version of Go are you using (go version)?

$ go version
go version go1.13.5 linux/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/secret/.cache/go-build"
GOENV="/home/secret/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/secret/Dropbox/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/lib/go-1.13"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go-1.13/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build843557951=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I am trying to write

"col1","col2"
"asd\njk", "2g9"

into a CSV file with writer.UseCRLF enabled,

but the newline inside the field asd\njk is changed, so the field is written as asd\r\njk

playground

What did you expect to see?

A \n inside a data field should not be changed by writer.UseCRLF:

"col1,col2\r\n\"asd\njk\",2g9\r\n"

What did you see instead?

"col1,col2\r\n\"asd\r\njk\",2g9\r\n"
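Since the playground link is not reproduced here, the following is a minimal sketch of a reproduction, not the original playground program. The helper name `encode` is an assumption made for illustration; everything else is the standard `encoding/csv` API. It prints the raw output with `%q` so that `\r` and `\n` are visible and can be compared against the two strings above.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
)

// encode (hypothetical helper) writes records with encoding/csv and
// returns the raw output as a string.
func encode(records [][]string, useCRLF bool) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	w.UseCRLF = useCRLF
	// WriteAll writes every record and then flushes the writer.
	if err := w.WriteAll(records); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	records := [][]string{
		{"col1", "col2"},
		{"asd\njk", "2g9"}, // the first field contains a bare \n
	}
	out, err := encode(records, true)
	if err != nil {
		panic(err)
	}
	// %q escapes control characters, so \r and \n show up literally.
	fmt.Printf("%q\n", out)
}
```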

@bkkgbkjb
Author

bkkgbkjb commented Jan 8, 2020

After a further comparison with the Python 3.x csv library,
I found the following behavior:

Python:
new_line: \r\n
\r -> quote
\n -> quote
\r\n -> quote


new_line: \n
\n -> quote
\r -> no_quote
\r\n -> quote


Go:

new_line: \r\n
\n -> changed to \r\n, then quote                         (1)
\r -> removed \r, then quote remaining                    (2)
\r\n -> quote


new_line: \n
\n -> quote
\r -> quote
\r\n -> quote

Though there seems to be no good standard for the CSV format, I still think touching the actual data is a bad idea.

My suggestion is simply to change (1) and (2) so that they only quote.
Then every occurrence of \r?\n? would be quoted, which never does harm.
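To make the suggestion concrete, here is a rough sketch of an encoder that ends each record with `\r\n` but only quotes fields containing `\r` or `\n`, copying those bytes unchanged. `needsQuotes` and `encodeRecordPreserving` are hypothetical names, not part of `encoding/csv`, and the quoting rule is simplified (comma delimiter only, no special handling of leading whitespace).

```go
package main

import (
	"fmt"
	"strings"
)

// needsQuotes reports whether a field contains the delimiter, a quote,
// or any \r or \n. Simplified compared to encoding/csv.
func needsQuotes(field string) bool {
	return strings.ContainsAny(field, ",\"\r\n")
}

// encodeRecordPreserving is a hypothetical helper, not a standard
// library function. It terminates the record with \r\n and leaves
// \r and \n inside fields exactly as they were given.
func encodeRecordPreserving(record []string) string {
	var b strings.Builder
	for i, field := range record {
		if i > 0 {
			b.WriteByte(',')
		}
		if !needsQuotes(field) {
			b.WriteString(field)
			continue
		}
		b.WriteByte('"')
		// Only quotes are escaped; line-break bytes are copied verbatim.
		b.WriteString(strings.ReplaceAll(field, `"`, `""`))
		b.WriteByte('"')
	}
	b.WriteString("\r\n")
	return b.String()
}

func main() {
	for _, field := range []string{"asd\njk", "asd\rjk", "asd\r\njk"} {
		fmt.Printf("%q\n", encodeRecordPreserving([]string{field, "2g9"}))
	}
}
```

With this sketch the record from the report comes out as "\"asd\njk\",2g9\r\n", which is the expected output stated in the first comment.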

toothrot added this to the Backlog milestone Jan 8, 2020
toothrot added the NeedsInvestigation label (Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.) Jan 8, 2020
@toothrot
Contributor

toothrot commented Jan 8, 2020

/cc @dsnet @bradfitz

The issue reported seems like surprising behavior to me. I wouldn't expect data to be changed either.

@dsnet
Member

dsnet commented Jan 11, 2020

The godoc currently documents the behavior:

The Reader converts all \r\n sequences in its input to plain \n

Given that this is specified behavior, we can't change it. At best, we can add a Reader option to preserve newlines without mangling.
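For reference, the documented Reader behavior quoted above can be observed with a short program. This is only an illustration of the read side; `readFields` is a hypothetical helper name, and the input string is made up for the example.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// readFields (hypothetical helper) parses CSV input and returns all records.
func readFields(input string) ([][]string, error) {
	return csv.NewReader(strings.NewReader(input)).ReadAll()
}

func main() {
	// The quoted first field contains a \r\n sequence.
	records, err := readFields("\"asd\r\njk\",2g9\r\n")
	if err != nil {
		panic(err)
	}
	// The Reader converts \r\n to \n, including inside quoted fields.
	fmt.Printf("%q\n", records[0][0]) // prints "asd\njk"
}
```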

@bkkgbkjb
Author

Well, but I think we're talking about csv.Writer.UseCRLF here.

The only documentation for it is:

If UseCRLF is true, the Writer ends each output line with \r\n instead of \n.

I suggest we add a StrictMode bool field to

type Writer struct {
    ...
}

so that, when it is enabled, the Writer does not change anything in our data.

@bkkgbkjb
Author

So the problem here is that, with csv.Writer.UseCRLF enabled,

csv.Writer also changes the data inside quoted fields:
it removes every \r
it changes \n to \r\n

as shown in this excerpt from the Writer:

			// Encode the special character.
			if len(field) > 0 {
				var err error
				switch field[0] {
				case '"':
					_, err = w.w.WriteString(`""`)
				case '\r':
					if !w.UseCRLF {
						err = w.w.WriteByte('\r')
					}
				case '\n':
					if w.UseCRLF {
						_, err = w.w.WriteString("\r\n")
					} else {
						err = w.w.WriteByte('\n')
					}
				}
				field = field[1:]
				if err != nil {
					return err
				}
			}

src
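The effect of that switch can be checked for each kind of line break with a small program. This is a sketch for experimentation: `encodeField` is a hypothetical helper name, and the sample fields are made up. It prints the input and the raw output with `%q` for both settings of UseCRLF, which should correspond to the Go half of the table in the earlier comment.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
)

// encodeField (hypothetical helper) writes a single-field record and
// returns the raw output as a string.
func encodeField(field string, useCRLF bool) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	w.UseCRLF = useCRLF
	if err := w.Write([]string{field}); err != nil {
		return "", err
	}
	w.Flush()
	// Error reports any error from a previous Write or Flush.
	return buf.String(), w.Error()
}

func main() {
	for _, useCRLF := range []bool{false, true} {
		for _, field := range []string{"a\nb", "a\rb", "a\r\nb"} {
			out, err := encodeField(field, useCRLF)
			if err != nil {
				panic(err)
			}
			fmt.Printf("UseCRLF=%v %q -> %q\n", useCRLF, field, out)
		}
	}
}
```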

@lrita

lrita commented Apr 2, 2020

MS Excel interprets the \r in fields as . And we have to set UseCRLF=true for MS Excel. What a pity.

4 participants