Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: compiling with GOGC=off shows runtime.mallocgc uses 11.8% of run time #14850

Closed
davecheney opened this issue Mar 17, 2016 · 7 comments

Comments

@davecheney
Copy link
Contributor

  1. What version of Go are you using (go version)?

go version devel +56e0ecc Thu Mar 17 04:20:02 2016 +0000 linux/amd64

  1. What operating system and processor architecture are you using (go env)?

linux/amd64

  1. What did you do?
(pprof) lucky(~/go/src/cmd/compile/internal/gc) % env GOGC=off go build -gcflags=-cpuprofile=/tmp/c.p .
lucky(~/go/src/cmd/compile/internal/gc) % go tool pprof $(go tool -n compile) /tmp/c.p
  1. What did you expect to see?

mallocgc should take less of the overall run time

  1. What did you see instead?
Entering interactive mode (type "help" for commands)
(pprof) top
1640ms of 5080ms total (32.28%)
Dropped 289 nodes (cum <= 25.40ms)
Showing top 10 nodes out of 243 (cum >= 350ms)
      flat  flat%   sum%        cum   cum%
     240ms  4.72%  4.72%      240ms  4.72%  cmd/compile/internal/ssa.copyelimValue
     240ms  4.72%  9.45%      240ms  4.72%  runtime.duffzero
     210ms  4.13% 13.58%      220ms  4.33%  runtime.(*mcentral).grow
     190ms  3.74% 17.32%      400ms  7.87%  cmd/compile/internal/gc.escwalkBody
     170ms  3.35% 20.67%      600ms 11.81%  runtime.mallocgc
     140ms  2.76% 23.43%      170ms  3.35%  cmd/compile/internal/ssa.(*regAllocState).computeLive
     130ms  2.56% 25.98%      130ms  2.56%  runtime.heapBitsSetType
     110ms  2.17% 28.15%      640ms 12.60%  cmd/compile/internal/ssa.(*regAllocState).regalloc
     110ms  2.17% 30.31%      220ms  4.33%  cmd/compile/internal/ssa.cmpVal
     100ms  1.97% 32.28%      350ms  6.89%  runtime.mapassign1

compile

@randall77
Copy link
Contributor

Regalloc definitely needs a once-over for allocation reduction. A lot of
the allocation looks like live variable analysis, which maybe we could do
more densely.
copyelimValue surprises me. It implies there are large copy chains and we
would do well to short circuit them as we find them (so short circuiting
them all takes O(n), not potentially O(n^2)).

On Thu, Mar 17, 2016 at 4:22 PM, Dave Cheney notifications@github.com
wrote:

  1. What version of Go are you using (go version)?

go version devel +56e0ecc Thu Mar 17 04:20:02 2016 +0000 linux/amd64

  1. What operating system and processor architecture are you using (go
    env)?

linux/amd64

  1. What did you do?

(pprof) lucky(/go/src/cmd/compile/internal/gc) % env GOGC=off go build -gcflags=-cpuprofile=/tmp/c.p .
lucky(
/go/src/cmd/compile/internal/gc) % go tool pprof $(go tool -n compile) /tmp/c.p

  1. What did you expect to see?

mallocgc should take less of the overall run time

  1. What did you see instead?

Entering interactive mode (type "help" for commands)
(pprof) top
1640ms of 5080ms total (32.28%)
Dropped 289 nodes (cum <= 25.40ms)
Showing top 10 nodes out of 243 (cum >= 350ms)
flat flat% sum% cum cum%
240ms 4.72% 4.72% 240ms 4.72% cmd/compile/internal/ssa.copyelimValue
240ms 4.72% 9.45% 240ms 4.72% runtime.duffzero
210ms 4.13% 13.58% 220ms 4.33% runtime.(

mcentral).grow 190ms 3.74% 17.32% 400ms 7.87%
cmd/compile/internal/gc.escwalkBody 170ms 3.35% 20.67% 600ms 11.81%
runtime.mallocgc 140ms 2.76% 23.43% 170ms 3.35% cmd/compile/internal/ssa.(

regAllocState).computeLive
130ms 2.56% 25.98% 130ms 2.56% runtime.heapBitsSetType
110ms 2.17% 28.15% 640ms 12.60%
cmd/compile/internal/ssa.(*regAllocState).regalloc
110ms 2.17% 30.31% 220ms 4.33% cmd/compile/internal/ssa.cmpVal
100ms 1.97% 32.28% 350ms 6.89% runtime.mapassign1

[image: compile]
https://cloud.githubusercontent.com/assets/7171/13864113/25e0edb2-ecf3-11e5-9884-233014199813.png


You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub
#14850

@minux
Copy link
Member

minux commented Mar 18, 2016 via email

@davecheney
Copy link
Contributor Author

Thank you for your reply. My question is why does mallocgc take 12% of run
time, and can it be made mor efficient.

Reducing the number of allocations is also useful and everyone is focused
on that. This issue is focusing on reducing the unit allocation cost, which
seems high to me.

On Fri, 18 Mar 2016, 11:01 Minux Ma, notifications@github.com wrote:

mallocgc is the allocation function, so if it shows up
in cpu profile, it simply means the programs spent
a lot of time allocating.

I don't think disable GC will reduce time spent on it.


You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub
#14850 (comment)

@randall77
Copy link
Contributor

Mcentral.grow is dominated by faulting in pages, and that's 1/3rd of the
mallocgc time.

On Thu, Mar 17, 2016 at 5:12 PM, Dave Cheney notifications@github.com
wrote:

Thank you for your reply. My question is why does mallocgc take 12% of run
time, and can it be made mor efficient.

Reducing the number of allocations is also useful and everyone is focused
on that. This issue is focusing on reducing the unit allocation cost, which
seems high to me.

On Fri, 18 Mar 2016, 11:01 Minux Ma, notifications@github.com wrote:

mallocgc is the allocation function, so if it shows up
in cpu profile, it simply means the programs spent
a lot of time allocating.

I don't think disable GC will reduce time spent on it.


You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub
#14850 (comment)


You are receiving this because you commented.
Reply to this email directly or view it on GitHub
#14850 (comment)

@davecheney
Copy link
Contributor Author

Thanks Keith, that explains what I am seeing. This process grows to around
a Gb so there will be many page faults.

On Fri, Mar 18, 2016 at 11:42 AM, Keith Randall notifications@github.com
wrote:

Mcentral.grow is dominated by faulting in pages, and that's 1/3rd of the
mallocgc time.

On Thu, Mar 17, 2016 at 5:12 PM, Dave Cheney notifications@github.com
wrote:

Thank you for your reply. My question is why does mallocgc take 12% of
run
time, and can it be made mor efficient.

Reducing the number of allocations is also useful and everyone is focused
on that. This issue is focusing on reducing the unit allocation cost,
which
seems high to me.

On Fri, 18 Mar 2016, 11:01 Minux Ma, notifications@github.com wrote:

mallocgc is the allocation function, so if it shows up
in cpu profile, it simply means the programs spent
a lot of time allocating.

I don't think disable GC will reduce time spent on it.


You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub
#14850 (comment)


You are receiving this because you commented.
Reply to this email directly or view it on GitHub
#14850 (comment)


You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub
#14850 (comment)

@bradfitz
Copy link
Contributor

Anything to do here or can we close this?

@davecheney
Copy link
Contributor Author

I think this can be closed. It's the same old story, use less memory, pay
fewer overheads.

On Mon, Mar 21, 2016 at 7:58 PM, Brad Fitzpatrick notifications@github.com
wrote:

Anything to do here or can we close this?


You are receiving this because you authored the thread.
Reply to this email directly or view it on GitHub
#14850 (comment)

@golang golang locked and limited conversation to collaborators Mar 21, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants