
encoding/csv: golang csv also so slow and consumed too much memory #16786

Closed
liketic opened this issue Aug 18, 2016 · 5 comments

Comments

@liketic

liketic commented Aug 18, 2016

Please answer these questions before submitting your issue. Thanks!

  1. What version of Go are you using (go version)?
    go version go1.7 linux/amd64
  2. What operating system and processor architecture are you using (go env)?
    GOHOSTARCH="amd64"
    GOHOSTOS="linux"
    GOOS="linux"
  3. What did you do?
    If possible, provide a recipe for reproducing the error.
    A complete runnable program is good.
    A link on play.golang.org is best.

I read several CSV files, each about 50 MB, and both the run time and the memory usage are very high.

  4. What did you expect to see?

Yesterday I compared regexp performance with Java: #16758
I hope Go can reach Java's speed.

  5. What did you see instead?

Go is much slower at reading CSV, and I suspect writing is much slower too. Is there any way to optimize this? I read CSV like this:

```go
// readCsv parses an in-memory CSV buffer and returns every record at once.
func (cd *CsvDecoder) readCsv(source []byte) (records [][]string, err error) {
    r := csv.NewReader(bytes.NewReader(source))
    return r.ReadAll()
}
```

Are there options I can set before reading? It seems I can't specify anything like a buffer size. Why not?

@ALTree
Member

ALTree commented Aug 18, 2016

Can you provide two small, self-contained Go and Java programs that can be used to assess the problem?

How much slower than Java is "so slow"? How much memory is "huge memory"? Are you slurping whole files into memory?
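A minimal Go harness of the kind @ALTree asks for might look like the sketch below. It is an illustration under stated assumptions, not code from this thread: it generates a synthetic CSV in memory (the row and column counts are arbitrary), parses it with ReadAll, and reports wall time and bytes allocated using runtime.MemStats. The helpers makeCSV and measureReadAll are hypothetical names.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"runtime"
	"strings"
	"time"
)

// makeCSV builds a synthetic CSV document with the given shape.
// The field contents are arbitrary placeholders.
func makeCSV(rows, cols int) []byte {
	var buf bytes.Buffer
	for r := 0; r < rows; r++ {
		fields := make([]string, cols)
		for c := range fields {
			fields[c] = fmt.Sprintf("r%dc%d", r, c)
		}
		buf.WriteString(strings.Join(fields, ","))
		buf.WriteByte('\n')
	}
	return buf.Bytes()
}

// measureReadAll parses data with csv.Reader.ReadAll and returns the
// number of records, the elapsed time, and the bytes allocated meanwhile.
func measureReadAll(data []byte) (int, time.Duration, uint64, error) {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	start := time.Now()
	records, err := csv.NewReader(bytes.NewReader(data)).ReadAll()
	elapsed := time.Since(start)
	runtime.ReadMemStats(&after)
	return len(records), elapsed, after.TotalAlloc - before.TotalAlloc, err
}

func main() {
	data := makeCSV(100000, 10)
	n, elapsed, allocated, err := measureReadAll(data)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d records, %d bytes input, %v, %d bytes allocated\n",
		n, len(data), elapsed, allocated)
}
```

Numbers from a harness like this, paired with an equivalent Java program over the same input, are the kind of concrete data the thread asks for.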

Also, if you have questions about how to optimize your Go programs, you should ask elsewhere; the project does not use the issue tracker for questions (see: Questions).

@nussjustin
Contributor

nussjustin commented Aug 18, 2016

As @ALTree said, we need more information. Could you also tell us more about your CSV files, such as the number of rows and columns?

There are currently no options that you could specify for better performance.

The internal reader used by encoding/csv is a bufio.Reader with the default buffer size (4096 bytes). You could try wrapping your reader in a bufio.Reader created with bufio.NewReaderSize and a bigger buffer, but I don't expect this to yield any big performance gains, since the reader only reads from an in-memory byte slice.

Maybe you can try to read directly from the files, instead of reading them into memory first? This could reduce the memory usage (quite) a bit.

I have a CL open that optimizes encoding/csv a bit by avoiding some allocations (basically reducing allocations from 1 × columns × rows to 1 × rows in your case). In a simple synthetic test reading ~15 million rows of CSV, I got a 17% win with my CL.
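The allocation saving described above can be illustrated with a deliberately simplified splitter that ignores quoting. This is not the code from the CL; it is a sketch of the general idea only. Converting each field separately costs one string allocation per field, while converting the whole line once and slicing the fields out of that string costs one per row, because Go substrings share the bytes of the string they come from. splitPerField and splitPerRow are hypothetical names.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// splitPerField copies every field into its own string: roughly one
// string allocation per field (columns × rows overall), plus the slices.
func splitPerField(line []byte) []string {
	parts := bytes.Split(line, []byte{','})
	fields := make([]string, len(parts))
	for i, p := range parts {
		fields[i] = string(p) // one string allocation per field
	}
	return fields
}

// splitPerRow copies the line once and slices the fields out of that
// single string: one string allocation per row (1 × rows overall).
// Quoted fields are NOT handled; this only illustrates the allocation idea.
func splitPerRow(line []byte) []string {
	s := string(line) // the only string-data allocation for this row
	return strings.Split(s, ",")
}

func main() {
	line := []byte("alpha,beta,gamma")
	fmt.Println(splitPerField(line), splitPerRow(line))
}
```

The trade-off is that every field of a row keeps the whole row's string alive for as long as any one field is referenced.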

If you have multiple files you could also try to parse them with multiple goroutines in parallel.

@bradfitz
Contributor

You can also avoid ReadAll if you care about memory. You should use Go's streaming APIs when your data is large.

Let's move this discussion to golang-nuts@ if there's nothing concrete to do here.

Reports on the bug tracker should be more concrete than "Go is slow" and "the speed and memory usage are so huge". Please provide sample code, numbers, etc.

@ALTree
Member

ALTree commented Aug 18, 2016

To be fair, the standard library CSV reader is notoriously slow out of the box, so we could use an issue tracking the problem. Obviously it'll need some data.

@bradfitz
Contributor

@ALTree, if you'd like to open one, please go ahead.

@golang golang locked and limited conversation to collaborators Aug 18, 2017