encoding/csv: memory consumption is huge(1.2GB) when parsing a big(450MB) csv file #8059

Closed
gopherbot opened this issue May 21, 2014 · 14 comments

Comments

@gopherbot

by claudiu.garba:

When parsing a huge CSV file (450MB), memory usage climbs to 1.2GB, with a spike
to 1.6GB. The program takes ~1,30 minutes to finish.
OS: Mac OS X
Go version: 1.2, 64-bit
Code here: http://play.golang.org/p/jrVSqCcMpQ

The CSV file is 450MB and has ~1 million rows. The code just prints the current
row to the terminal.
The CSV has some errors in it, such as missing commas or stray spaces.
When I run the program, it stalls for 10 seconds, then memory increases from 600MB
to 1.6GB and then settles at 1.2GB.

Attachments:

  1. small.txt (2993 bytes)
@ianlancetaylor
Contributor

Comment 1:

Labels changed: added repo-main, release-go1.4.

@rsc
Contributor

rsc commented Sep 15, 2014

Comment 2:

Status changed to Accepted.

@josharian
Contributor

Comment 3:

Thanks for the report, claudiu.garba.
I tried to reproduce this on my OS X machine but could not. I duplicated small.txt until
it was 1560200 lines long and ran your sample program over it. The terminal scrollback
consumed a lot of memory, but the program itself did not. Is it possible that your
memory numbers include the terminal/stdout buffer?
I also put in calls to runtime.ReadMemStats during the main loop. On my machine, I see
MemStats.Alloc fairly stable around 60k and MemStats.Sys completely stable at 2885880
bytes.
If it's not the terminal buffer consuming memory, could you report some memstats numbers
here? Also, would you check whether there's a particular section of your large csv file
that's required to reproduce this?

Status changed to WaitingForReply.

@gopherbot
Author

Comment 4 by claudiu.garba:

Hi there,
I will try it again, check that it is not the terminal buffer, and add memstats and
also the CSV file :)
regards,
claudiu

@rsc
Contributor

rsc commented Sep 30, 2014

Comment 5:

If you have a syntax error involving a " and LazyQuotes is enabled then the reader might
try to read a very large chunk of the file as a single field. That would explain the
memory usage.

@rsc
Contributor

rsc commented Sep 30, 2014

Comment 6:

I don't expect any structural changes to happen for this release.

Labels changed: added release-none, removed release-go1.4.

@rsc rsc added this to the Unplanned milestone Apr 10, 2015
@carlfn

carlfn commented Jul 4, 2016

Experiencing something similar. I have a binary file that I read into a byte array. The binary file is 2GB. When it's loaded into memory, the process's memory usage is over 6GB. After about 5 minutes, the memory usage drops. I wrote a simple app to reproduce this and can do so every time.

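package main

import (
    "flag"
    "fmt"
    "io/ioutil"
    "os"
    "os/signal"
)

var fileToRead = flag.String("filetoread", "", "file to load into mem")

func main() {
    flag.Parse()

    // Read the whole file into memory inside a closure so the file
    // handle is closed as soon as the read finishes.
    arr, err := func() ([]byte, error) {
        file, err := os.Open(*fileToRead)
        if err != nil {
            panic(err)
        }
        defer file.Close()

        return ioutil.ReadAll(file)
    }()
    if err != nil {
        panic(err)
    }

    fmt.Printf("Read %d bytes\n", len(arr))

    // Block until Ctrl-C so memory usage can be observed. signal.Notify
    // does not block when sending, so the channel must be buffered.
    exitChan := make(chan os.Signal, 1)
    signal.Notify(exitChan, os.Interrupt)
    <-exitChan
}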

Added runtime.GC() after the printf statement with no effect. If I call debug.FreeOSMemory(), the memory is released after a few seconds. Version: go version go1.6.2 darwin/amd64

@bradfitz
Contributor

bradfitz commented Jul 5, 2016

@carlfn, your case is not similar. You're using ioutil.ReadAll. Don't slurp the whole thing into memory.

@carlfn

carlfn commented Jul 5, 2016

Okay, I'll create a new issue.

@nussjustin
Contributor

I tested the code from the issue with a version of small.txt (see issue) repeated to 1 million lines, but cannot reproduce the problem. Memory usage/allocation reported by the Go runtime and max RSS reported by the OS (using time -v) stay below 10MB. Only my terminal shows a noticeable memory increase while writing. This is in line with what @josharian reported back in 2014.

My numbers are from both Go 1.8.1 and tip, on a notebook running Linux on an amd64 Intel processor.

Since the problem cannot be reproduced and there is no specific TODO here, I suggest this issue should be closed.

@josharian
Contributor

Did you by chance check whether Russ's observation about LazyQuotes holds?

@nussjustin
Contributor

Russ's observation seems to still hold. If you wrap a field in quotes and put something between the closing quote and the comma, the Reader will keep buffering more and more data until it finds EOF or a quote followed by a comma.

I agree that this is a problem, but the small.txt in the issue doesn't even contain any quotes, so we can't be sure if this is what the author meant.

Regardless of whether the LazyQuotes behaviour is a problem that should be fixed (e.g. by giving an optional limit on field size) or not (I think it is), I think that this should be handled in its own issue and that this issue should be closed.

@josharian
Contributor

Ok, I'll close this now. Will you go ahead and file a new issue, please?

@nussjustin
Contributor

Filed #20169 for this

@golang golang locked and limited conversation to collaborators Apr 28, 2018