html: html.Parse() leaks memory #5938

Closed
gopherbot opened this issue Jul 23, 2013 · 9 comments

Comments

@gopherbot

by jake.austwick:

What steps will reproduce the problem?

Running the following gist reproduces the problem. I tried to make a smaller test case,
but it doesn't seem to leak when just parsing local files. It seems that not all of the
memory used by html.Parse() is being garbage collected.

Memory only leaks when fetching live URLs like those in the file below.

Sample Program:
https://gist.github.com/JakeAustwick/1cbdb5e9e3e778b4ff42

urls.txt (needed to run):
https://gist.github.com/JakeAustwick/82c9d4ce300639a4d275/raw/368c41ce6ba95f03cbc25a188dd3c07646a068b0/gistfile1.txt

What is the expected output?

Memory usage should not keep increasing until system memory is exhausted.

What do you see instead?

Memory usage slowly increases until it is all gone. Increase WORKER_COUNT to make it
leak faster.

Which compiler are you using (5g, 6g, 8g, gccgo)?

6g

Which operating system are you using?

Ubuntu 12.10

Which version are you using?  (run 'go version'):

go version go1.1 linux/amd64

@davecheney
Contributor

Comment 1:

Moving to accepted status, don't know what priority to set on it.

Status changed to Accepted.

@robpike
Contributor

robpike commented Aug 21, 2013

Comment 2:

Labels changed: added priority-later, garbage, removed priority-triage.

@gopherbot
Author

Comment 3 by jake.austwick:

I know this got marked as Priority-Later, but it really is an issue for me as I write a
lot of crawlers / web bots. If there is any chance that somebody could at least
confirm they are also seeing the issue on their machine, then maybe somebody might be
willing to look into this more deeply?
I suspect it's certain page sources that are causing the issue, which is why I couldn't
replicate it by just parsing the same URL over and over, and unfortunately couldn't come up
with a smaller test case for everyone.

@adg
Contributor

adg commented Aug 27, 2013

Comment 4:

I'm willing to take a look if you can provide a self-contained example. I am not going
to run a program that hits all those web sites. (The irony of me being a Google employee
is not lost on me either. ;-)

@davecheney
Contributor

Comment 5:

Status changed to WaitingForReply.

@rsc
Contributor

rsc commented Nov 27, 2013

Comment 6:

Labels changed: added go1.3maybe.

@rsc
Contributor

rsc commented Dec 4, 2013

Comment 7:

Labels changed: added release-none, removed go1.3maybe.

@rsc
Contributor

rsc commented Dec 4, 2013

Comment 8:

Labels changed: added repo-net.

@davecheney
Contributor

Comment 9:

Status changed to TimedOut.

@mikioh mikioh added repo-net and removed repo-net labels Dec 23, 2014
@mikioh mikioh changed the title go.net/html: html.Parse() leaks memory html: html.Parse() leaks memory Jan 4, 2015
@golang golang locked and limited conversation to collaborators Jun 24, 2016
This issue was closed.