Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime/pprof: heap profile is malformed #6592

Closed
josharian opened this issue Oct 15, 2013 · 15 comments
Closed

runtime/pprof: heap profile is malformed #6592

josharian opened this issue Oct 15, 2013 · 15 comments
Milestone

Comments

@josharian
Copy link
Contributor

A recent memory profiling run created a binary/memprofile pair that generates:

Illegal division by zero at .../pkg/tool/darwin_amd64/pprof line 3765, <PROFILE>
line 1556.

The memprofile file is at https://gist.github.com/josharian/6998976. I'd rather not
share the binary at this point, if possible.


Which version are you using?  (run 'go version')

go version devel +6e3768395dd2 Mon Oct 14 10:53:55 2013 -0700 darwin/amd64
@rsc
Copy link
Contributor

rsc commented Oct 16, 2013

Comment 1:

This is the line that caused the problem:
12: 0 [48: 768] @ 0x3b420 0x3c8dc 0x1eb2bc 0x1ed3a8 0x1c7f60 0x1c6e64 0xcdf48 0xd1668
0x302e4
Here are some other incorrect lines:
113: -864 [393: 4720] @ 0x3b420 0x3c9a8 0x3ca50 0x35478 0x35734 0x35620 0x123220
0x123f94 0x1281b8 0x12bfd4 0x131b10 0x126aac 0x77ae0 0x77b78 0x78f7c 0x67668 0x302e4
12: -960 [49: 1568] @ 0x3b420 0x3c9a8 0x3ca50 0x35478 0x35328 0x222494 0x22b874 0x21f5ac
0x23393c 0x236c20 0x237b7c 0x237b7c 0x23831c 0x238170 0x1ed2ec 0x1c7f60 0x1c6e64 0xcdf48
0xd1668 0x302e4
60: -1048 [192: 1536] @ 0x3b420 0x3c8dc 0x59dc4 0x670a4 0x675c4 0x302e4
6: -1152 [15: 1920] @ 0x3b420 0x3c8dc 0xce1a4 0xd1668 0x302e4
5: -1184 [13: 1872] @ 0x3b420 0x3fd44 0x402fc 0x164ec0 0x163ce8 0x161490 0x163564
0x1614cc 0x160c60 0x1607d0 0x6bc84 0x70c34 0x302e4
9: -1216 [72: 2304] @ 0x3b420 0x3c9a8 0x3ca50 0x35478 0x35734 0x35620 0x21b5a4 0x21a1ec
0x1da2f4 0x1da7f0 0x1db458 0x1c9404 0x1c6e64 0xcdf48 0xd1668 0x302e4
48: -1280 [193: 1544] @ 0x3b420 0x3c8ac 0x20bd0 0x20dbc 0x77c74 0x78f7c 0x59fc4 0x670a4
0x675c4 0x302e4
77: -1424 [245: 1960] @ 0x3b420 0x3c8dc 0x177c48 0x1769fc 0x2379a0 0x235308 0x2372a8
0x23831c 0x238170 0x1ec474 0x1ed3a8 0x1c7f60 0x1c6e64 0xcdf48 0xd1668 0x302e4
1: -1472 [14: 4032] @ 0x3b420 0x3c9a8 0x3ca50 0x35478 0x35328 0x222494 0x229c88 0x21f838
0x231f84 0x230358 0x1d7378 0x1dffc8 0x1c8e68 0x1c6e64 0xcdf48 0xd1668 0x302e4
0: -1536 [4: 2048] @ 0x3b420 0x3c9a8 0x3ca50 0x35478 0x35328 0x82af4 0x82cac 0x6b2a8
0x707fc 0x302e4
108: -1624 [353: 11296] @ 0x3b420 0x3fd44 0x402fc 0x126adc 0x77ae0 0x77b78 0x78f7c
0x65c00 0x302e4
The accounting is clearly bogus.

Labels changed: added priority-later, go1.2, removed priority-triage.

Status changed to Accepted.

@rsc
Copy link
Contributor

rsc commented Oct 16, 2013

Comment 2:

I don't need the full binary. However, it would help me to know the addresses of the
various runtime functions. Can you please post the output of:
go tool nm -n yourbinary | grep ' T runtime\.'
Thanks.
Russ

@josharian
Copy link
Contributor Author

@rsc
Copy link
Contributor

rsc commented Oct 16, 2013

Comment 4:

Thanks. 
The first four fields are "objects in use", "bytes in use", "total allocated objects",
"total allocated bytes". Internally, we track total allocated and total freed. The "in
use" numbers are allocated - freed. The negative in use byte numbers indicate that the
freed byte count is bigger than the allocated byte count. It appears that only the byte
counts are incorrect. The object counts seem fine.
All the lines containing 0x3c9a8 are calls to runtime.new, which necessarily allocate
fixed-size chunks, so all the allocs and frees for each of those should be the same
size, so the ratio of #block / # byte should be constant (1 / size).
That makes it easy to find inconsistent lines:
tr -d '[]:' <heap.txt | grep 0x3c8dc | awk '$1*$4 != $2*$3'
Here are the 10 entries with the largest block sizes:
$ tr -d '[]:' <heap.txt | grep 0x3c8dc | awk '$1*$4 != $2*$3 {print $4/$3, $0}' |
sort -nr | sed 10q
2048 96 252408 370 757760 @ 0x3b420 0x3c8dc 0x149f54 0x67ebc 0x6bdc8 0x70c34 0x302e4
2048 39 109136 192 393216 @ 0x3b420 0x3c8dc 0x149f54 0x59a84 0x670a4 0x675c4 0x302e4
2048 38 94992 192 393216 @ 0x3b420 0x3c8dc 0x149f54 0x59c00 0x670a4 0x675c4 0x302e4
1152 5 6752 18 20736 @ 0x3b420 0x3c8dc 0x21aa48 0x1bf928 0x1c9500 0x1c6e64 0xcdf48
0xd1668 0x302e4
1152 4 -1920 19 21888 @ 0x3b420 0x3c8dc 0x21aa48 0x1bf928 0x1c995c 0x1c6e64 0xcdf48
0xd1668 0x302e4
224 55 12552 193 43232 @ 0x3b420 0x3c8dc 0x153244 0x14e7ac 0x65fe4 0x671e4 0x675c4
0x302e4
224 52 13544 193 43232 @ 0x3b420 0x3c8dc 0x153244 0x14b5dc 0x14b514 0x662fc 0x675c4
0x302e4
144 71 9136 202 29088 @ 0x3b420 0x3c8dc 0x138068 0x1262a8 0x126454 0x126a74 0x57efc
0x54738 0x58adc 0x12b214 0x12b708 0x131b10 0x126aac 0x77bb8 0x78f7c 0x57858 0x302e4
144 67 9808 196 28224 @ 0x3b420 0x3c8dc 0x138068 0x1262a8 0x126454 0x126a74 0x77ae0
0x77b78 0x78f7c 0x67668 0x302e4
144 63 8352 182 26208 @ 0x3b420 0x3c8dc 0x138068 0x1262a8 0x126454 0x126c28 0x15aee0
0x15a75c 0x15d6d8 0x15d868 0x15f780 0x12b214 0x12b708 0x131b10 0x126aac 0x77bb8 0x78f7c
0x66174 0x671e4 0x675c4 0x302e4
Note especially the three entries with 2048-byte blocks. The in-use byte count is 252408
(504 mod 2048). So it's not that some blocks are being counted as freed twice. The size
is wrong for some other reason.
Perhaps the size being passed to MProf_Free is wrong. However, it looks correct, and it
is correct in many cases, and if it were incorrect, that would mean that sweep would be
walking the span using the wrong block size and would cause more serious problems than a
bad profile.
Perhaps there is a bad atomic operation and the int64 storing the free byte count is
getting corrupted. It is strange that the allocated byte count is fine, though.
Perhaps there is memory corruption, but it's strange that it would target this one field
in this one struct, repeatedly, but nothing else.
Perhaps there is a code generation error, but again it's strange that it works most of
the time.
Perhaps if a signal arrives at the wrong time the register being used by this one
computation is smashed, but not the computation of the others. But again, why this one
field?
The size passed to MProf_Free being incorrect really does seem to be the most likely
option, but again if that were true, more would be broken.
I'm stumped.
I'd very much like to understand this before we cut Go 1.2. It won't hold up the rc2
release candidate, though.
I assume this is arm. What kind of hardware is this? Is it a multiprocessor? What is the
GOARM setting you are using? Any other relevant details?

Labels changed: added go1.2maybe, removed go1.2.

@josharian
Copy link
Contributor Author

Comment 5:

> I assume this is arm. What kind of hardware is this? Is it a multiprocessor?
$ cat /proc/cpuinfo
Processor       : ARM926EJ-S rev 5 (v5l)
BogoMIPS        : 226.09
Features        : swp half thumb fastmult edsp java 
CPU implementer : 0x41
CPU architecture: 5TEJ
CPU variant     : 0x0
CPU part        : 0x926
CPU revision    : 5
Hardware        : Freescale MX28EVK board
Revision        : 0000
Serial          : 0000000000000000
> What is the GOARM setting you are using? Any other relevant details?
GOARCH=arm GOARM=5 GOOS=linux
Cross compiled on OS X with go version devel +351b6fe0ae36 Tue Sep 24 15:54:48 2013
-0400 darwin/amd64
(Yes, that go toolchain includes the fix for
https://golang.org/issue/6440.)
$  uname -a
Linux ppbeacon-48:02:2A:39:B6:5E 2.6.35.3-670-g914558e #1 PREEMPT Mon Oct 7 23:19:04 PDT
2013 armv5tejl GNU/Linux
> if that were true, more would be broken.
And this combination of code base, go toolchain, os, and hardware has a fair number of
miles on it.
> I'd very much like to understand this before we cut Go 1.2.
Thanks for the write-up above. I'll spend some time digging at this here, and share
anything potentially interesting that I find.

@josharian
Copy link
Contributor Author

Comment 6:

Ok, assuming that I correctly understand how to detect malformed pprof heap profiles
(following your notes above), I have a smallish test case that fairly reliably
reproduces this: http://play.golang.org/p/kklslqpWKT. Forgive the ugly, haphazard code.
It's the result of a very frustrating, slow search through a poorly bounded space. Most
of this code seems to be necessary; attempts to simplify further and still reproduce
have failed to date.
This needs to run for about a minute to reproduce on my hardware. Happily, I've also
been able to reproduce this on my Raspberry Pi, using the same binary, although it took
a 10m run.
Attached to this commit are a gzipped binary, a malformed sample 1 minute profile run
with that binary, and that sample profile run symbolicated, for your convenience. (Note
that the symbolification might be dodgy; I whipped it up ad hoc just for this, and it is
not well tested.)
Now that I can reproduce reliably, the question is: What next?
In the absence of other guidance, I'll start digging into this myself, starting by
instrumenting the runtime to look at MProf_Free sizes. If you have any specific
experiments you'd like me to run, though, I'm happy to do so. Note that I'll be offline
all of tomorrow (Thu) and will wrap up today in an hour or so.

Attachments:

  1. small6592_p.gz (767518 bytes)
  2. sm_p_1m.prof (3824 bytes)
  3. sm_p_1m.prof.symbolicated (6127 bytes)

@rsc
Copy link
Contributor

rsc commented Oct 17, 2013

Comment 7:

Looks like a dup of 6566.

@josharian
Copy link
Contributor Author

Comment 8:

Agreed. And https://golang.org/cl/14695044/ does appear to fix it.

@rsc
Copy link
Contributor

rsc commented Oct 17, 2013

Comment 9:

It doesn't make sense to me that the numbers we saw would be caused by
6566. issue #6566 would cause entire profiling events to be lost but should
keep the recent_frees and recent_free_bytes in sync. Since buckets are
never reused, a bucket allocated for 2048-byte objects should always have a
count that is a multiple of 2048. Even if 64-bit operations were
interrupted, this property should still hold (and the recent_xxx numbers
are uint32s anyway; the int64 doesn't come into play until they are
retrieved as a heap profile).
Something is still wrong, either in the code or in our understanding of the
bug.

@rsc
Copy link
Contributor

rsc commented Oct 17, 2013

Comment 10:

@josharian, do you have GOMAXPROCS set when the program fails? Does it
still fail with GOMAXPROCS=1?

@rsc
Copy link
Contributor

rsc commented Oct 17, 2013

Comment 11:

Dmitriy, can you see how issue #6566 would have caused the profile seen above too?
See my notes in comment #4 and comment #9.
Thanks.

@josharian
Copy link
Contributor Author

Comment 12:

GOMAXPROCS not set. Stuck in meetings all day; will gather some GOMAXPROCS data when I
can.
In the meantime, a quick note before I run off. (Please bear with me, I'm new to this
code, and am reading it in meetings. :P)
First, it's not clear to me why runtime·new will allocate a fixed size over the
lifetime of a program, given only that the stack is the same; it could be called with
different Types, no?
That said, last night I instrumented runtime·MProf_Free and runtime·MProf_Malloc to
log all calls and all calls that got short-circuited (nomemprof caught my eye as well).
Organizing these log lines by pointer / size / event and then sort | uniq -c gives a
look at what is going on on a per-memory-address basis. Here's activity for one sample
pointer:
count | ptr | event | size
   2 0x10551000 1024 free
   2 0x10551000 1024 malloc
   1 0x10551000 1024 skipmalloc
   3 0x10551000 1152 free
   3 0x10551000 1152 malloc
   3 0x10551000 1152 skipfree
   4 0x10551000 1536 free
   4 0x10551000 1536 malloc
   3 0x10551000 1536 skipfree
   1 0x10551000 1536 skipmalloc
  28 0x10551000 2048 free
  28 0x10551000 2048 malloc
   3 0x10551000 2048 skipfree
   4 0x10551000 2048 skipmalloc
   7 0x10551000 2560 free
   7 0x10551000 2560 malloc
  14 0x10551000 4096 free
  14 0x10551000 4096 malloc
   1 0x10551000 4096 skipfree
   3 0x10551000 512 free
   3 0x10551000 512 malloc
Note the mismatched skipfree/skipmalloc counts. (Skip is a misnomer here; should be
"short-circuit".)
Note also that the pointer is used in setaddrbucket but not in stkbucket, so these will
all impact the same recent_alloc_bytes, IIUC. The mismatched short-circuits would then
explain both the incorrect totals and the non-multiple of 2048, I think.
Back in a few hours.

@rsc
Copy link
Contributor

rsc commented Oct 17, 2013

Comment 13:

MergedInto: 6566
runtime.new is only called by code generated by 5g as the compiled form of
the Go expression new(T). At a given call site T is fixed - there are no
type variables in Go.
I see how the bug can be caused by nomemprof confusion now. All the skips
you saw are buggy skips, since the allocation of the buckets themselves
(which is what we were trying to skip) is no longer done with malloc. If
you have a meant-to-be-profiled N-byte pointer p that misses its
MProf_Malloc, the MProf_Free won't find it, no harm done. But if you have a
profiled N-byte pointer p that hits its MProf_Malloc but misses its
MProf_Free, the hash table entry created by setaddrbucket in MProf_Malloc
will not be removed from the hash table. When the pointer is later
reallocated and also profiled (perhaps as a different size), the next
MProf_Malloc for that address will create a new bucket, earlier in the hash
chain, so a matched MProf_Free will find the new bucket, and the math will
still be okay. But if the p is later used for a profiled M-byte allocation
that misses its MProf_Malloc, the MProf_Free will find the stale bucket
entry left by the earlier missed MProf_Free. At that point (perhaps much
later), it will subtract the new size from the old bucket, causing the
strange behavior we saw. It will also subtract one from the bucket count,
but that's actually a belated but correct update: the old p really did get
freed.
That is:
(1) p allocated and profiled with size N1 and stack S1
(2) MProf_Malloc calls setaddrbucket(p, b1 for stack S1)
(3) MProf_Free spuriously missed
(4) repeat any number of times:
    either:
        p reallocated and not profiled
    or:
        p reallocated and profiled with size Nx and stack Sx
        MProf_Malloc calls setaddrbucket(p, bx for stack Sx)
        MProf_Free calls getaddrbucket, updates counts in bx correctly
(5) p allocated and profiled with size N2 and stack S2
(6) MProf_Malloc spuriously missed, should have set b2 for stack S2
(7) MProf_Free calls getaddrbucket, finds b1 for stack S1
    b1.recent_frees++
    b1.recent_free_bytes += N2;
The b1.recent_frees++ is late but correct. It should have happened in step
(3).
The b1.recent_free_bytes += N2 is late and incorrect. The one that should
have happened in step (3) should have used N1, not N2.
This is why the recent_frees numbers always look fine but the
recent_free_bytes numbers are incorrect.

Status changed to Duplicate.

@rsc
Copy link
Contributor

rsc commented Oct 17, 2013

Comment 14:

Merged into issue #6566.

@josharian
Copy link
Contributor Author

Comment 15:

Makes sense. Thank you for the detailed write-up; it was a pleasure to read.

@rsc rsc added this to the Go1.2 milestone Apr 14, 2015
@rsc rsc removed the go1.2maybe label Apr 14, 2015
@golang golang locked and limited conversation to collaborators Jun 25, 2016
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants