runtime: Unable to calculate memory usage of an object accurately #34561

Closed · manishrjain opened this issue Sep 26, 2019 · 12 comments
Labels: FrozenDueToAge, NeedsInvestigation
Milestone: Unplanned

manishrjain commented Sep 26, 2019

What version of Go are you using (go version)?

$ go version
go version go1.12.7 linux/amd64

Does this issue reproduce with the latest release?

Haven't checked with go1.13.1 yet.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env

GOARCH="amd64"
GOBIN=""
GOCACHE="/home/mrjn/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/mrjn/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build258461617=/tmp/go-build -gno-record-gcc-switches"

What did you do?

We have tried various mechanisms to calculate the heap memory usage of this data structure:

https://github.com/dgraph-io/dgraph/blob/4203f983929671f45056beb92c282496ee7f79f6/posting/list.go#L71-L78

The closest we have gotten is via this one:

https://github.com/creachadair/misctools/blob/4c938d6077e47ed42f6e7d6c8794b54e236dd23d/sizeof/size.go#L31

See PR: dgraph-io/dgraph#4072 for more details.

What did you expect to see?

An accurate count of how many in-use bytes are consumed on the heap by each such posting.List object.

What did you see instead?

We haven't been able to accurately gauge the memory usage of the struct. It is not as bad as we initially thought (https://twitter.com/manishrjain/status/1174743112444346369), sorry about crying wolf, but it is still not accurate: we're off by around 40% on a test data set.

// From cache log.
I0926 21:16:40.110860       1 lists.go:177] hit: 180578 miss: 32476354 keys-added: 4680373 keys-updated: 8 keys-evicted: 2227554
cost-added: 1736671951 cost-evicted: 736672031
sets-dropped: 10185369 sets-rejected: 3644105 gets-dropped: 8286464 gets-kept: 24236160 gets-total: 32656932 hit-ratio: 0.01

// From heap profile (in use memory).
// See posting.ReadPostingList using 1.38G.
// Cache shows usage as 1G (cost-added minus cost-evicted).
// Off by almost 40%.

  570.55MB 14.77% 14.77%  1381.37MB 35.76%  github.com/dgraph-io/dgraph/posting.ReadPostingList

What more could we be doing to accurately capture the in-use memory cost of a struct on the heap?

rsc commented Sep 26, 2019

One possibility is that there are lingering pointers elsewhere in the system that are keeping the evictions from being GC'ed. Another possibility is that there hasn't been a full GC since the evictions and not all memory has been reclaimed.

Do you know if the numbers are more accurate when data is only added, before any evictions?

I checked out the HEAD of that PR but I was unable to figure out what to build and what to run to get that output. What is the sequence of commands to run to reproduce the output you showed?

manishrjain commented Sep 27, 2019

So, we force-run GC every 5s or so if a GC hasn't already run. When we force-run it, we also print the memory stats.

In this case, I stopped loading data after cost-added reached 505MB (cost-added = 529481634, cost-evicted = 0), well before Ristretto would start evictions.

I0927 00:38:25.604304       1 lists.go:177] hit: 9424 miss: 1733121 keys-added: 1688314 keys-updated: 0 keys-evicted: 0 cost-added: 529481634 cost-evicted: 0 sets-dropped: 0 sets-rejected: 0 gets-dropped: 0 gets-kept: 1726976 gets-total: 1742545 hit-ratio: 0.01
I0927 00:38:25.918887       1 main.go:50] GC: 40. InUse: 3.6 GB. Idle: 1.8 GB

$ go tool pprof dgraph localhost:8180/debug/pprof/heap

(pprof) top
Showing nodes accounting for 2636.05MB, 97.24% of 2710.83MB total
Dropped 61 nodes (cum <= 13.55MB)
Showing top 10 nodes out of 63
      flat  flat%   sum%        cum   cum%
     512MB 18.89% 18.89%      512MB 18.89%  github.com/dgraph-io/ristretto.newCmRow
  457.55MB 16.88% 35.77%   457.55MB 16.88%  bytes.makeSlice
  ---> 401.03MB 14.79% 50.56%   428.54MB 15.81%  github.com/dgraph-io/dgraph/posting.ReadPostingList
  389.04MB 14.35% 64.91%   389.04MB 14.35%  github.com/dgraph-io/dgraph/posting.(*List).updateMutationLayer
  269.04MB  9.92% 74.84%   269.04MB  9.92%  github.com/dgraph-io/dgraph/posting.glob..func1
     256MB  9.44% 84.28%      256MB  9.44%  github.com/dgraph-io/ristretto/z.(*Bloom).Size
  166.41MB  6.14% 90.42%   166.41MB  6.14%  github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger/skl.newArena
   89.95MB  3.32% 93.74%    89.95MB  3.32%  github.com/dgraph-io/ristretto.(*lockedMap).Set
      51MB  1.88% 95.62%       51MB  1.88%  github.com/dgraph-io/dgraph/x.generateKey
   44.02MB  1.62% 97.24%    44.02MB  1.62%  github.com/dgraph-io/ristretto.(*sampledLFU).add

The effect is reversed this time. ReadPostingList is only showing 428MB, i.e. under what Ristretto is seeing (505MB), which is off by 15%. Assuming the Go profiler has a 3% error rate, this is still off by ~10%.

Daniel is going to post the instructions on how to run Dgraph and do a data load, so the caching would trigger.

danielmai commented Sep 27, 2019

(Instructions work on Linux only)

To build PR dgraph-io/dgraph#4072:

  1. go get -u -v -d github.com/dgraph-io/dgraph/dgraph

  2. cd $GOPATH/src/github.com/dgraph-io/dgraph; git checkout mrjn/pl-cache

  3. Use the compose package to build a Docker Compose config to run a Dgraph cluster. The following command installs a Dgraph binary to $GOPATH/bin/dgraph, creates a docker-compose.yml config for a Dgraph cluster, and runs the cluster.

    cd ./compose
    ./run.sh --num_alphas 1 --num_zeros 1 --port_offset=0
  4. Download these data files to load into the Dgraph cluster:

  5. Run live loader to load data in the cluster. This can be run from the host:

    dgraph live --files 21million.rdf.gz --schema 21million.schema

Relevant logs would be from Alpha: docker logs alpha1 -f. And pprof info is available at localhost:8080/debug/pprof.

agnivade added the NeedsInvestigation label Sep 27, 2019
agnivade added this to the Unplanned milestone Sep 27, 2019

rsc commented Sep 27, 2019

For simplicity, I did this without Docker. I ran in three different windows:

./dgraph zero
./dgraph alpha --lru_mb 2048 --zero localhost:5080
./dgraph live --files 21million.rdf.gz --schema 21million.schema

First, I ^C'ed the third command once dgraph alpha was reporting about 1 GB of data. It had just started evicting, but only about 1 MB by its own accounting.

The profile looks like this. Note that ReadPostingList is charged with 1271 MB, compared to the 1e9/1024/1024 = 953 MB that dgraph had calculated.

[Screenshot: pprof heap profile graph]

There are three functions doing most of the allocations: ReadPostingList, a closure inside ReadPostingList (ReadPostingList.func1), and a proto Unmarshal called from that closure. Here are the profiled source listings.

ReadPostingList:

[Screenshot: pprof source listing for ReadPostingList]

There are about 230 MB of List structs, 230 MB of pb.PostingList structs, and 120 MB of allocated empty maps.

Here's pb.PostingList:

type PostingList struct {
	Pack                 *UidPack   `protobuf:"bytes,1,opt,name=pack,proto3" json:"pack,omitempty"`
	Postings             []*Posting `protobuf:"bytes,2,rep,name=postings,proto3" json:"postings,omitempty"`
	CommitTs             uint64     `protobuf:"varint,3,opt,name=commit_ts,json=commitTs,proto3" json:"commit_ts,omitempty"`
	Splits               []uint64   `protobuf:"varint,4,rep,packed,name=splits,proto3" json:"splits,omitempty"`
	XXX_NoUnkeyedLiteral struct{}   `json:"-"`
	XXX_unrecognized     []byte     `json:"-"`
	XXX_sizecache        int32      `json:"-"`
}

I count 1+3+1+3+0+3+1 = 12 words = 96 bytes, which is exactly an allocation size in Go, so no wastage there.
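
As a cross-check (a sketch, not the real generated type), unsafe.Sizeof on a stand-in struct with the same field layout confirms the 96 bytes:

package main

import (
	"fmt"
	"unsafe"
)

// Stand-in mirroring pb.PostingList's field layout: pointer, slice, uint64,
// slice, empty struct, byte slice, int32.
type postingList struct {
	pack             *struct{}
	postings         []*struct{}
	commitTs         uint64
	splits           []uint64
	noUnkeyedLiteral struct{}
	unrecognized     []byte
	sizecache        int32
}

func main() {
	// Prints 96 on linux/amd64: 12 eight-byte words, including the padding
	// after the trailing int32.
	fmt.Println(unsafe.Sizeof(postingList{}))
}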

Here's List:

type List struct {
	x.SafeMutex
	key         []byte
	plist       *pb.PostingList
	mutationMap map[uint64]*pb.PostingList
	minTs       uint64 // commit timestamp of immutable layer, reject reads before this ts.
	maxTs       uint64 // max commit timestamp seen for this list.
}

type SafeMutex struct {
	m sync.RWMutex
	writer  int32
	readers int32
}

type RWMutex struct {
	w           Mutex  // held if there are pending writers
	writerSem   uint32 // semaphore for writers to wait for completing readers
	readerSem   uint32 // semaphore for readers to wait for completing writers
	readerCount int32  // number of pending readers
	readerWait  int32  // number of departing readers
}

type Mutex struct {
	state int32
	sema  uint32
}

I count 11 words, which will round up to 12 in the allocator, so it makes sense that this code, which allocates the same number of Lists as pb.PostingLists, would see the same allocation counts for those two. 230*1024*1024/96 = 2.51e6 of each.

Given that the map is 123 MB, it sounds like the empty map takes up 6 words. Sure enough, it does:

type hmap struct {
	count     int // # live cells == size of map.  Must be first (used by len() builtin)
	flags     uint8
	B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
	noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
	hash0     uint32 // hash seed
	buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
	oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
	nevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)
	extra *mapextra // optional fields
}
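
A quick empirical check of that 48-byte header (a rough sketch; the exact figure depends on Go version and size classes) is to allocate many empty maps and average the HeapAlloc delta:

package main

import (
	"fmt"
	"runtime"
)

func main() {
	const n = 1 << 20
	maps := make([]map[uint64]*int, n)

	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)
	for i := range maps {
		maps[i] = map[uint64]*int{} // header only; no buckets until the first insert
	}
	runtime.ReadMemStats(&after)

	// Prints roughly 48 on linux/amd64: six 8-byte words, which is itself a size class.
	fmt.Println((after.HeapAlloc - before.HeapAlloc) / n)
	runtime.KeepAlive(maps)
}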

Now there's 665 MB in the call to the closure via item.Value, so let's look at that next:

ReadPostingList's closure:

[Screenshot: pprof source listing for ReadPostingList's closure]

Another 126 MB of pb.PostingList structs, compared to the 232 MB allocated in the outer function, suggests that on average there are only about half as many item.Value closure calls as there are ReadPostingList calls. More precisely, 126.5*1024*1024/96 = 1.38e6, compared to 2.51e6 of the outer List allocations. That is, at least about half the Lists returned by ReadPostingList have an empty map. Maybe 1/2 have an empty map and 1/2 have one pb.PostingList in the map. Maybe 3/4 are empty and 1/4 have two pb.PostingLists in the map. Maybe something else. But at least half are entirely empty (item.Value is never called). So by object count (not necessarily object size), the cache is at least half full with empty posting lists.

The other allocation is a map bucket, which for a map[uint64]*pb.PostingList is 18 words = 144 bytes, which is another size class, so no wasted allocated space. 165.73*1024*1024/144 = 1.21e6 buckets (out of 2.51e6 maps). So at least 1.30e6 (more than half) of the maps are empty. And the 1.21e6 buckets hold only 1.38e6 entries, so at least 1.04e6 of the buckets hold only one entry.

And then there is 373.56MB in 1.38e6 calls to pl.Unmarshal, or about 284 bytes per call. Most are in this line:

[Screenshot: pprof source listing for pl.Unmarshal]

Here all the allocations are on the line m.Postings = append(m.Postings, &Posting{}), so I've expanded the assembly to distinguish the allocation &Posting{} from the append. A Posting is 20 words, 160 bytes (another size class, luckily):

type Posting struct {
	Uid         uint64              `protobuf:"fixed64,1,opt,name=uid,proto3" json:"uid,omitempty"`
	Value       []byte              `protobuf:"bytes,2,opt,name=value,proto3" json:"value,omitempty"`
	ValType     Posting_ValType     `protobuf:"varint,3,opt,name=val_type,json=valType,proto3,enum=pb.Posting_ValType" json:"val_type,omitempty"`
	PostingType Posting_PostingType `protobuf:"varint,4,opt,name=posting_type,json=postingType,proto3,enum=pb.Posting_PostingType" json:"posting_type,omitempty"`
	LangTag     []byte              `protobuf:"bytes,5,opt,name=lang_tag,json=langTag,proto3" json:"lang_tag,omitempty"`
	Label       string              `protobuf:"bytes,6,opt,name=label,proto3" json:"label,omitempty"`
	Facets      []*api.Facet        `protobuf:"bytes,9,rep,name=facets,proto3" json:"facets,omitempty"`
	Op                   uint32   `protobuf:"varint,12,opt,name=op,proto3" json:"op,omitempty"`
	StartTs              uint64   `protobuf:"varint,13,opt,name=start_ts,json=startTs,proto3" json:"start_ts,omitempty"`
	CommitTs             uint64   `protobuf:"varint,14,opt,name=commit_ts,json=commitTs,proto3" json:"commit_ts,omitempty"`
	XXX_NoUnkeyedLiteral struct{} `json:"-"`
	XXX_unrecognized     []byte   `json:"-"`
	XXX_sizecache        int32    `json:"-"`
}

type Posting_ValType int32
type Posting_PostingType int32

The 346.05MB/1.38e6 calls/160 bytes = 1.64 Postings allocated per call, on average, or 2.27e6 total. The 20.51MB of append data for 2.27e6 pointers ends up being 9.24 bytes per pointer, which is quite good. Probably most of the lists have a single entry, and we do handle that case optimally in an append loop.

Overall, this profile shows that there are:

230 MB = 2.51e6 List
230 MB = 2.51e6 pb.PostingList
123 MB = 2.51e6 maps
126 MB = 1.38e6 pb.PostingList in map entries
346 MB = 2.27e6 pb.Posting in the pb.PostingList in the map entries
165 MB = 1.21e6 map buckets
20.5 MB = 2.69e6 slice array slots for pointers holding *pb.Postings

That adds up to 1240.5 MB, which is close enough to the 1270 MB pprof number that we can stop. (Another 7 MB is in string data in the pb.Postings, for what that's worth.)

The old sizing code did this:

func (l *List) cacheCost() int64 {	
	var cost int64	
	// Size of a pointer in bytes. This is using unsafe.Sizeof	
	ptrSize := int64(unsafe.Sizeof(int(0)))	

	// Cost of storing the key.
	cost += int64(len(l.key))	

	// Cost of storing the plist (pointer + plist size)	
	cost += ptrSize	
	cost += int64(l.plist.Size())	

	// Cost of mutation map.	
	for _, pl := range l.mutationMap {	
		// Each entry costs 8 bytes for the key, ptrSize for the pointer, and	
		// pl.Size() for the cost of the posting list itself.	
		cost += 8 + ptrSize + int64(pl.Size())	
	}	
	return cost	
}	

This is not accounting for the size of the List struct itself, nor the size of the map header, nor the unused map bucket entries. And then it uses l.plist.Size() and pl.Size(), clearly intended as the total, recursive in-memory size of the protobuf structures involved. But those functions calculate the wire representation size, which is far more compact than the in-memory representation. That's why it was so far off.
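
A corrected version along these lines might look roughly like the sketch below. This is not the dgraph implementation: the constants assume linux/amd64, and postingListCost is a hypothetical helper that would walk a pb.PostingList's in-memory representation (struct, slices, and Postings) instead of calling the wire-size Size method.

// Sketch only: account for in-memory sizes, not protobuf wire sizes.
func (l *List) cacheCost() int64 {
	const (
		mapHeaderSize = 48  // runtime.hmap: 6 words on amd64
		bucketSize    = 144 // 16 bytes overhead + 8*(8-byte key) + 8*(8-byte pointer value)
	)

	cost := int64(unsafe.Sizeof(*l)) // the List struct itself
	cost += int64(cap(l.key))        // backing array of the key
	cost += mapHeaderSize            // the map header exists even when the map is empty

	if n := len(l.mutationMap); n > 0 {
		// Roughly len/6.5 buckets, rounded up to a power of two, minimum one.
		nb := int64(1)
		for float64(n) > 6.5*float64(nb) {
			nb *= 2
		}
		cost += nb * bucketSize
	}

	cost += postingListCost(l.plist) // immutable layer, sized in memory (hypothetical helper)
	for _, pl := range l.mutationMap {
		cost += postingListCost(pl) // mutation layer, sized in memory
	}
	return cost
}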

The new package dgraph uses in the PR has its own problems. It does:

func DeepSize(v interface{}) int64 {
	return int64(valueSize(reflect.ValueOf(v), make(map[uintptr]bool)))
}

func valueSize(v reflect.Value, seen map[uintptr]bool) uintptr {
	base := v.Type().Size()
	switch v.Kind() {
	case reflect.Ptr:
		p := v.Pointer()
		if !seen[p] && !v.IsNil() {
			seen[p] = true
			return base + valueSize(v.Elem(), seen)
		}

	case reflect.Slice:
		n := v.Len()
		for i := 0; i < n; i++ {
			base += valueSize(v.Index(i), seen)
		}

		// Account for the parts of the array not covered by this slice.  Since
		// we can't get the values directly, assume they're zeroes. That may be
		// incorrect, in which case we may underestimate.
		if cap := v.Cap(); cap > n {
			base += v.Type().Size() * uintptr(cap-n)
		}

	case reflect.Map:
		// A map m has len(m) / 6.5 buckets, rounded up to a power of two, and
		// a minimum of one bucket. Each bucket is 16 bytes + 8*(keysize + valsize).
		//
		// We can't tell which keys are in which bucket by reflection, however,
		// so here we count the 16-byte header for each bucket, and then just add
		// in the computed key and value sizes.
		nb := uintptr(math.Pow(2, math.Ceil(math.Log(float64(v.Len())/6.5)/math.Log(2))))
		if nb == 0 {
			nb = 1
		}
		base = 16 * nb
		for _, key := range v.MapKeys() {
			base += valueSize(key, seen)
			base += valueSize(v.MapIndex(key), seen)
		}

		// We have nb buckets of 8 slots each, and v.Len() slots are filled.
		// The remaining slots we will assume contain zero key/value pairs.
		zk := v.Type().Key().Size()  // a zero key
		zv := v.Type().Elem().Size() // a zero value
		base += (8*nb - uintptr(v.Len())) * (zk + zv)

	case reflect.Struct:
		// Chase pointer and slice fields and add the size of their members.
		for i := 0; i < v.NumField(); i++ {
			f := v.Field(i)
			switch f.Kind() {
			case reflect.Ptr:
				p := f.Pointer()
				if !seen[p] && !f.IsNil() {
					seen[p] = true
					base += valueSize(f.Elem(), seen)
				}
			case reflect.Slice:
				base += valueSize(f, seen)
			}
		}

	case reflect.String:
		return base + uintptr(v.Len())

	}
	return base
}

There are a few problems with this code:

  • The map header is never counted. That's 48 bytes * 2.51e6 = 115 MB undercounted.
  • Zero-entry maps have no bucket but are charged for one. That's 144 * 1.30e6 = 179 MB overcounted.
  • A struct with a slice field counts the slice header in the overall v.Type().Size(), and then the recursive valueSize call also counts it. There is 1 slice per List, 3 per pb.PostingList, and 4 per pb.Posting, or 1*2.51e6 + 3*(2.51e6+1.38e6) + 4*2.27e6 = 23.3e6 slice headers = 532 MB overcounted.

Actually, scratch that. The struct field loop ignores maps entirely, so the entire 288 MB of map data is never accounted for, plus anything reachable through the maps. Starting over, we have:

  • 288 MB of map data structures missing
  • 126 MB of pb.PostingList in map entries missing
  • 346 MB of pb.Posting in the pb.PostingList in map entries missing
  • 20.5 MB of slice-backing arrays missing
  • The slice header overcount affects the Lists and the top-level pb.PostingList, but not the ones in the map nor the pb.Postings. That's 1*2.51e6+3*2.51e6 = 10.04e6 slice headers = 230 MB overcounted.
  • Struct-valued struct fields and elements of arrays are ignored entirely; luckily there are none in this case.

That's a net 550.5 MB missing from the actual counts, which is a wider gap than observed (1270 - 953 = 317 MB). I'm sure the remaining 232 MB can be found somewhere, but I'll stop here.
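
As a concrete illustration, the map case could be fixed along these lines (only a sketch of the idea, with amd64 bucket overhead hard-coded; the struct-field loop would also need a reflect.Map case so that maps reachable from struct fields are walked at all):

	case reflect.Map:
		base += 48 // the hmap header, which was never counted
		if n := v.Len(); n > 0 {
			// len/6.5 buckets rounded up to a power of two. A zero-entry map
			// has no buckets at all, so charge for them only when non-empty.
			nb := uintptr(1)
			for float64(n) > 6.5*float64(nb) {
				nb *= 2
			}
			base += 16 * nb // per-bucket overhead: tophash array + overflow pointer
			for _, key := range v.MapKeys() {
				base += valueSize(key, seen)
				base += valueSize(v.MapIndex(key), seen)
			}
			// Unused slots in the buckets still occupy key/value space.
			zk := v.Type().Key().Size()
			zv := v.Type().Elem().Size()
			base += (8*nb - uintptr(n)) * (zk + zv)
		}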

Overall I would suggest that using this complex nest of protobufs and maps as the representation of values in a large-memory cache is incredibly wasteful. A representation storing only the values you care about, in a simple format, would let you store easily 2X, possibly even 10X the number of values in the same amount of memory. And the simpler format would be easier to calculate the size of. (And it could be done without reflection and the corresponding overheads.)

The simpler representation should probably also have a special-cased, compact way to represent "no list". Fully 300 MB out of the 1270 MB, or 23% of the cache space, is being taken up by Lists with empty maps.

Finally, I would suggest that if being able to calculate the size of cached values is important, then there should be a test of that calculation specifically. A simple test with a few manually constructed values would have turned up at least some of the bugs in both the sizing implementations quite quickly, instead of having to run an entire dgraph cluster and workload.
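
Such a test does not even need the real dgraph types; comparing the calculator against bytes actually retained on the heap catches most of the bugs above. A rough sketch (DeepSize stands for whichever size function is under test, and the tolerance is arbitrary):

package sizeof

import (
	"runtime"
	"testing"
)

type entry struct {
	key []byte
	m   map[uint64]*entry
}

// heapBytesPerObject measures how many heap bytes one object built by fn
// actually retains, by allocating many of them and averaging the delta.
func heapBytesPerObject(fn func() *entry) uint64 {
	const n = 1 << 16
	keep := make([]*entry, n)

	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)
	for i := range keep {
		keep[i] = fn()
	}
	runtime.GC()
	runtime.ReadMemStats(&after)
	runtime.KeepAlive(keep)
	return (after.HeapAlloc - before.HeapAlloc) / n
}

func TestDeepSizeEmptyMap(t *testing.T) {
	build := func() *entry { return &entry{m: map[uint64]*entry{}} }
	got := DeepSize(build())
	want := int64(heapBytesPerObject(build))
	// Allow some slack for size-class rounding.
	if got < want-16 || got > want+16 {
		t.Fatalf("DeepSize = %d bytes, heap measurement ≈ %d bytes", got, want)
	}
}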

Overall, "unable to calculate memory usage of an object accurately" is because of major bugs in the two different calculation codes, not Go itself. The reflect-based approach could be made to work if the bugs were all fixed. It would just be slow. And pprof itself appears to be working fine.

martinmr commented

Thanks for the detailed response.

I agree with the conclusion that it would be much easier to convert the List into a different format. We were trying to avoid that because doing so for values with very large posting lists (i.e., very large indices) might be more expensive than using the reflect package, but we'll look into it.

One question I have after reading the post: how is the size of a map bucket calculated?

randall77 commented Sep 27, 2019

A map bucket is 16 + 8*sizeof(key) + 8*sizeof(value) bytes.
(There are a few exceptions to that.)
See runtime/map.go, type bmap, as well as runtime/type.go, type maptype.
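
For the map[uint64]*pb.PostingList maps in this case, on amd64 that works out to 16 + 8*8 + 8*8 = 144 bytes per bucket, matching the 144-byte size class seen in the profile above.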

manishrjain commented

Thanks a lot for the detailed analysis, Russ! This is extremely helpful. This is not actually how we are planning to use the cache; rather, it is a quick way for us to show how inaccurate our memory calculation is.

We will try to improve the calculation of the memory usage of the List structure given your guidelines. At the same time, we will also see if we can simplify the data structure itself, though that would come at the cost of added complexity in the underlying algorithm.

More importantly, the sort of calculations involved lead me to think that it would be very useful if Go provided a function that recursively calculates the memory usage of any given Go struct, so that users do not need such a deep understanding of how Go allocates memory internally.

As you can see from how wrong both our own and third-party attempts at memory calculation were, it is hard to gain this level of internal understanding of Go memory allocation. And our engineers have tried.

rsc commented Sep 30, 2019

it would be very useful if Go provided a function which would recursively calculate the memory usage of any given Go struct

This would be an arbitrarily expensive function - in the limit it is a full garbage collection - and it's not even a particularly well-formed concept, since there might be overlap with other possible starting points, so you can't in general add the results of multiple calls. Such a function might be interesting for debugging but would be a serious mistake for production use. Having it in the standard library would encourage the latter.

I'm sorry, but we can't put every function someone writes incorrectly into the standard library, especially functions that might look cheap but are actually quite expensive.

If this functionality is particularly important to you, I would suggest putting in the time to write a good implementation and then either contributing it back to the current third-party location or publishing your own open source version that others can use. This code does not need to be in the standard library.

so the users do not have to have such a deep understanding of how Go allocates memory internally.

The analysis above requires understanding how to use pprof but not really anything deep about Go memory layout. The bulk of the memory is in plain structs containing pointers to structs, which is not any different from C or Java or nearly any other language. The only really Go-specific thing is the layout of maps, which I don't know off the top of my head any more than other Go programmers - I looked it up while doing the analysis (see runtime/map.go).

As you can see from how wrong both our own attempts and third party attempts at memory calculation were, it is hard to gain this level of internal understanding of Go memory allocation. And our engineers have tried.

I'm very sorry, but I find it exceedingly hard to credit the engineering effort that went into either of the two attempts at memory calculation.

The first attempt evidently did not get as far as reading the documentation for the method being called, which clearly states, "Size returns the encoded size of a protocol buffer message."

The second attempt grabbed a third-party library off the internet, apparently without inspection, and hoped it would do a better job. Your engineers overlooked that the library had no tests whatsoever on Aug 5, the day it was adopted, and even today has only one trivial test. See my blog post Our Software Dependency Problem for more about why and how to evaluate third-party dependencies before adopting them.

Worst of all, neither attempt included any kind of test for the functionality in question!

Writing a test is, hands down, the number one most effective way to arrive at code that works.

If your engineers did not think it worthwhile to write a simple, standalone test for this functionality, even after realizing that the code being used didn't work, that's a serious problem with your engineering practices; addressing it would pay significant dividends.

There's nothing left to do on the Go side, so I'm going to close this issue.

rsc closed this as completed Sep 30, 2019

seebs commented Sep 30, 2019

I know this is a closed issue, but I want to point out a fundamental issue: Not only is this hard, it is impossible to do what we really want from a size computation.

  1. We want it to be complete -- it should be recursive and include the sizes of other objects.
  2. We want it to correctly account for cases where multiple objects have a pointer to a common sub-object.
  3. We want the sum of the sizes of two things to be the same as the total size of those two things.

Each of these criteria is obviously desirable.

Now consider a struct with a byte slice in it, and two such structs with the same byte slice. The size of each struct must include the storage for the slice. The size of a thing containing those two structs should only include the slice once.

You cannot win. The traits essential for a size calculation to do what we generally want it to do are mutually exclusive because pointers can point to the same thing. To get a usable size calculation, you have to decide what tradeoffs you're looking for, and that will be too application-specific to make sense to try to build into stdlib.

(Counterargument: the runtime can tell you how much space a slice really takes when it's a reference into storage that extends past its bounds, such as bigarray[1000:], and user code can't. But I don't think that's enough of the problem to justify the massive effort of resolving the definitional problems in several different and mutually exclusive ways.)
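
To make the conflict concrete, consider this sketch: two structs sharing one backing array.

package main

import "fmt"

type wrapper struct{ data []byte }

func main() {
	// A "complete" size (criterion 1) must charge each of a and b the full
	// 1 MB backing array, but then size(a)+size(b) is about 2 MB while the
	// heap only retains about 1 MB, so additivity (criterion 3) is lost.
	shared := make([]byte, 1<<20)
	a := wrapper{data: shared}
	b := wrapper{data: shared}
	fmt.Println(len(a.data), len(b.data)) // both see the same 1 MB
}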

randall77 commented

Calculating sizes like this is a use case for viewcore. You could trigger a core dump of your Go process and then inspect it using viewcore. All kinds of post-process analyses are possible, including total retained size from each object (using a dominator calculation).

manishrjain commented

I'm sorry, but we can't put every function someone writes incorrectly into the standard library, especially functions that might look cheap but are actually quite expensive.

If your engineers did not think it worthwhile to write a simple, standalone test for this functionality, even after realizing that the code being used didn't work, that's a serious problem with your engineering practices; addressing it would pay significant dividends.

With all due respect, I wish you would present your arguments with a bit more empathy for the receiver. We already know we are off in our calculation of the memory usage. A test does not give us any more help there than what pprof did already.

We do not have the engineering resources of Google to look deeply into every problem that we face. Sometimes we look to outside experts for guidance. But for all that the Dgraph team has given the Go ecosystem (two databases and a cache, in case you missed them), I really think we could use a bit more respect from the Go team.

In fact, we just needed a few guiding pointers, which is why I raised this on Twitter. I was talked into creating this issue, only to then turn around and be degraded. I'll remind myself not to raise issues here in the future, or tag the Go team on Twitter.

Talking about significant dividends: I run multiple open source communities myself, and we do our best to guide our users, even if their approach was wrong, or they didn't read the docs properly, or their engineers are subpar (all three things happen all the time).

So, let me present your arguments for you, the way I would treat a user of Dgraph, Badger, or Ristretto:


This would be an arbitrarily expensive function - in the limit it is a full garbage collection - and it's not even a particularly well-formed concept, since there might be overlap with other possible starting points, so you can't in general add the results of multiple calls. Such a function might be useful for debugging but would be too slow for production use. Having it in the standard library would encourage the latter.

Calculating memory usage seems general enough that it might be useful to others as well. I would suggest putting in the time to write a good implementation and then either contributing it back to the current third-party location or publishing your own open source version that others can use. While this might not make it into the standard library, I or the Go team can help review the code, so that it is correct for Go 1.13's memory layout.

While the analysis above seems hard, it only requires understanding how to use pprof, not really anything deep about Go memory layout. The bulk of the memory is in plain structs containing pointers to structs, which is not any different from C or Java or nearly any other language. The only really Go-specific thing is the layout of maps, which I don't know off the top of my head any more than other Go programmers - I looked it up while doing the analysis (see runtime/map.go).

While we don't expect typical Go users to read the source code of runtime/map.go, given your use case we would recommend looking at map.go, slice.go and these other files <xyz.go> while doing your implementation.

I understand you felt that your team spent a lot of work trying to do memory calculations. The first attempt had issues like using the encoded size, which doesn't work because it differs from the in-memory representation of the structs. And the second attempt used a third-party library, which lacked tests and was also wrong. I'd recommend writing unit tests, because they could tell you exactly how much memory is being used (quicker than pprof would), which would shorten your iteration time.

Clearly, we could do a better job at documenting how Go structs consume memory. It might even make a nice blog post, or a talk in the next Go conference. But meanwhile, look at the resources I mentioned above and if you open source your memory calculator, my team would be happy to help. Or, if they are not available, perhaps xy person (from the community) should also be able to help.

Closing this issue for now.

-- A Go team who cares

rsc commented Oct 1, 2019

With all due respect, I wish you would present your arguments with a bit more empathy for the receiver.

I apologize for the way my reply came across. I absolutely want you and other Go users to succeed at using Go, and I want you to have the tools to do that on your own.

We already know we are off in our calculation of the memory usage. A test does not give us any more help there than what pprof did already.

I couldn't disagree more.

A simple, standalone test absolutely gives you more help. It lets you narrow down the problem to a simpler case that is easier to reproduce, easier to share with others, and—most critically of all—easier to debug on your own. If I had noticed a problem in a large production run, writing a simple, standalone test to check the behavior that seems to be broken is literally the first thing I would do.

This is the most important point I was trying to make in my last reply. The response to "this big program is not working right" should almost always be "write (or re-examine) a test of the smaller piece that seems to be wrong".
