proposal: encoding/csv: add NewWriterSize #51746

Closed
tomtwinkle opened this issue Mar 17, 2022 · 4 comments

tomtwinkle commented Mar 17, 2022

What version of Go are you using (go version)?

$ go version
go version go1.17.8 darwin/amd64

Does this issue reproduce with the latest release?

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN="/Users/tomtwinkle/go/bin"
GOCACHE="/Users/tomtwinkle/Library/Caches/go-build"
GOENV="/Users/tomtwinkle/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/tomtwinkle/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/tomtwinkle/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/Users/tomtwinkle/go/go1.17.8"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/Users/tomtwinkle/go/go1.17.8/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.17.8"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/tomtwinkle/workspaces/go-verification/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/dy/wq8tln5j6_db0645jxfqj15m0000gp/T/go-build3247205606=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

I am trying to generate CSV output using encoding/csv under the following conditions:

  • The output contains multibyte characters
  • The output exceeds 4096 bytes
  • Fields contain the comma delimiter
  • Line endings are CRLF
  • A custom Writer is used

package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"io"
	"log"
	"strings"
)

func main() {
	buf := bytes.Buffer{}
	w := csv.NewWriter(&runeWriter{w: &buf})

	in := strings.Repeat("あ", 300) + ","
	line := make([]string, 10)
	for i := 0; i < 10; i++ {
		line[i] = in
	}
	w.UseCRLF = true
	if err := w.Write(line); err != nil {
		log.Fatal(err)
	}
	w.Flush()
	fmt.Println("===================Result====================")
	fmt.Println(buf.String())
}

type runeWriter struct {
	w io.Writer
}

func (rw *runeWriter) Write(in []byte) (int, error) {
	fmt.Println("===================Write====================")
	fmt.Println(string(in))
	l := 0
	for _, r := range string(in) {
		b := []byte(string(r))
		l += len(b)
		if _, err := rw.w.Write(b); err != nil {
			if _, err := rw.w.Write([]byte{'?'}); err != nil {
				return 0, err
			}
		}
	}
	return l, nil
}

What did you expect to see?

Expect all input characters to be output correctly.

What did you see instead?

Under the conditions above, once the output exceeds 4096 bytes, the byte slice passed to the custom Writer is cut off in the middle of a multibyte character, and a � (replacement character) is output.

The reason for using a custom writer is to replace characters that cannot be converted during the UTF-8 -> SJIS conversion with another string.

The example above constructs the writer as follows:

w := csv.NewWriter(&runeWriter{w: &buf})

In practice, it is used as follows:

w := csv.NewWriter(&runeWriter{w: transform.NewWriter(&buf, japanese.ShiftJIS.NewEncoder())})

I understand that the output is simply passed to Write() in 4096-byte chunks, regardless of character boundaries.
Because of how the UTF-8 to SJIS conversion works, if a 4096-byte chunk contains bytes that cannot be converted, an error occurs at that point.

tomtwinkle commented Mar 17, 2022

I came up with this idea:
first output the CSV in UTF-8, then use NewWriterSize() to specify the buffer size and convert the result to SJIS.

I think it would be best to have csv.NewWriterSize().

@seankhliao seankhliao changed the title encoding/csv: Output strings are garbled. proposal: encoding/csv: add NewWriterSize Mar 17, 2022
@gopherbot gopherbot added this to the Proposal milestone Mar 17, 2022
@seankhliao
Member

I think your custom writer should keep state so that it properly handles characters spread over multiple writes.
Or you could have implemented it as a transform.Transformer.

@ianlancetaylor
Contributor

encoding/csv assumes that the io.Writer passed to csv.NewWriter honors the io.Writer interface documented at https://pkg.go.dev/io#Writer. Your runeWriter doesn't do that. We aren't going to change encoding/csv to support non-compliant Writer implementations, so closing.
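
The io.Writer documentation requires Write to return the number of bytes written from p, with 0 <= n <= len(p). A small sketch of one way the counting in the reported runeWriter can depart from that: reportedCount is a hypothetical function that mirrors only the counting logic, not the full writer.

```go
package main

import "fmt"

// reportedCount mirrors the byte counting in the issue's runeWriter.Write:
// it sums the encoded length of each rune obtained by ranging over the
// input as a string. Invalid bytes decode to U+FFFD, which is 3 bytes long.
func reportedCount(in []byte) int {
	n := 0
	for _, r := range string(in) {
		n += len(string(r))
	}
	return n
}

func main() {
	whole := []byte("あ") // 3 bytes, valid UTF-8
	partial := whole[:2]  // first 2 bytes only, as left at a chunk boundary

	fmt.Println(len(whole), reportedCount(whole))     // 3 3
	fmt.Println(len(partial), reportedCount(partial)) // 2 6
}
```

For the cut-off input the count is 6 although only 2 bytes were passed in, which falls outside the range the interface allows.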

@tomtwinkle
Author

Thanks for the advice.
I have implemented it as a transformer:
https://github.com/tomtwinkle/garbledreplacer

I am sharing it here for people with the same problem.

@golang golang locked and limited conversation to collaborators Apr 17, 2023