Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: runtime/pprof: add new WithLabels* function that requires fewer allocations #33701

Open
martisch opened this issue Aug 17, 2019 · 20 comments

Comments

@martisch
Copy link
Contributor

martisch commented Aug 17, 2019

runtime/pprof.Labels is used in conjunction with runtime/pprof.WithLabels to set pprof labels in a context for performance profiling.

func Labels(args ...string) LabelSet {

Adding information for fine grained on demand profiling of already running binaries should idealy be very efficient so it can always stay enabled with minimal overhead. The current API could be made more efficient by requiring fewer heap allocations. Pprof information sourced from contexts added by census frameworks is used in large go deployments on every RPC request and thereby small performance gains add up to a larger resource saving across many servers.

The current runtime/pprof API requires census frameworks such as OpenCensus to first convert their internal representation of key and value tag pairs (in a slice or map) to a slice of strings for input to runtime/pprof.Labels.

https://github.com/census-instrumentation/opencensus-go/blob/df6e2001952312404b06f5f6f03fcb4aec1648e5/tag/profile_19.go#L24

This requires at least one heap allocation for a variable amount of labels. Then internaly the Labels functions constructs a LabelSet data structure which requires another allocation (the case where this uses more than one allocation will be improved with cl/181517 ). All in all this makes two heap allocations per context creation with pprof labels which can potentially be avoided.

I propose to extend runtime/pprof to have an API that takes e.g. a mapping/iteration interface such that census frameworks can implement that interface on their internal tag representations (e.g. maps and slices with custom types) and runtime/pprof can then source the labels to be set in a new runtime/pprof.WithLabels* function without first requiring conversion between multiple internal and external data structures.

cl/188499 is a quick prototype as an example how this could look like. Different other ways of making an interface that can be used are possible to reduce allocations. Note that the LabelSet struct cant be changed to an interface itself (which seems the cleaner approach) due being not API backwards compatible.

/cc @aclements @randall77 @matloob

@martisch martisch added this to the Go1.14 milestone Aug 17, 2019
@gopherbot
Copy link

Change https://golang.org/cl/188499 mentions this issue: runtime/pprof: add interface to reduce number of allocation when setting labels

@bcmills bcmills changed the title proposal: add new WithLabels* function to runtime/pprof that requires fewer allocations proposal: runtime/pprof: add new WithLabels* function that requires fewer allocations Aug 19, 2019
@andybons andybons modified the milestones: Go1.14, Proposal Sep 9, 2019
@rsc
Copy link
Contributor

rsc commented Sep 12, 2019

Just to summarize here to avoid needing to load external links, right now the API is:

type LabelSet struct {
	// Has unexported fields.
}

func Labels(args ...string) LabelSet

func WithLabels(ctx context.Context, labels LabelSet) context.Context

This requires making a slice to pass to Labels, and that slice escapes (and is variable sized) so it must be heap allocated.

The proposed interface in the CL is to add WithLabelsFromMapper(context.Context, LabelMapper), where LabelMapper is:

// LabelMapper is a set of Len() number of labels that can apply a function to all of the contained labels.
type LabelMapper interface {
	Len() int
	Map(f func(key, value string))
}

So you have to make a LabelMapper and then the context package calls its Len and Map methods to retrieve the labels. But if you already have the key-value pairs in your own data structure, you avoid allocating the converted slice.

They still get copied into the context in some form, though, so you've cut the allocations by at most 50%, not 100%.

Is there a simpler or cleaner API? Is Len really necessary?

/cc @hyangah @pjweinbgo @matloob

@martisch
Copy link
Contributor Author

martisch commented Sep 13, 2019

They still get copied into the context in some form, though, so you've cut the allocations by at most 50%, not 100%.

I do not think the proposal anywhere claims to lower down the usage to no allocations. I understand that saying there is opportunity to potentially save two allocations when this does only one of them in the worst case can lead to infer that this was misleading. That was not my intention. Note that there are usually at least 3 allocations involved:

  1. from internal census representation (e.g. a map or slice of structs) to a []string to pass to pprof.Labels
  2. another in pprof.Labels to construct a new slice for the LabelsSet
  3. in WithLabels to create a map to hold the labels

The usual case I have seen is therefore should saving at least 2 of 3 allocations since most uses do not use []string as their native representation of key value pairs that can be passed as is to pprof.Labels.

Is there a simpler or cleaner API? Is Len really necessary?

The Len allows to pre-size the internally created map to usually hold all items from parent and child context with the initial allocation as another optimisation. Profiling shows that some map growth is made in this function which seems to account for ~30% of the time in WithLabels.

for _, label := range labels.list {
	childLabels[label.key] = label.value
}

@martisch
Copy link
Contributor Author

ping to experts. Would be nice if this could be resolved before go1.14 enters the freeze period.
/cc @hyangah @pjweinbgo @matloob

@hyangah
Copy link
Contributor

hyangah commented Sep 19, 2019

I think @matloob knows the history of this API design and the plan to optimize LabelsSet and WithLabels better.

My impression was - it was originally designed to support census and it was uncommon to see a large number of new tags (labels) added at once. (usually one or two additional tags added per Do call except when a server is creating the new context based on the tags from wire). Maybe the trend is changed now?

The runtime/pprof.Do is what the package recommends for users to use, so this needs a variance of Do that works with LabelMapper.

@rsc
Copy link
Contributor

rsc commented Sep 25, 2019

Ping @pjweinbgo @matloob for any thoughts. Needing to make a Do does make this more complex.

@matloob
Copy link
Contributor

matloob commented Sep 27, 2019

I don't think that we need to have a variant of Do. WithLabels is meant to be a lower level interface than Do, and the WithLabelsFromMapper looks like just a more efficient replacement that an alternative to Do should use. I think it would be valid for OpenCensus to have its own variant of Do that calls WithLabelsFromMapper instead. As long as the OpenCensus variant of Do sets and unsets labels the same way Do does, code that uses pprof's Do and code that uses OpenCensus's Do should be compatible.

The way I thought about Do originally was that we'd make Do available for users who weren't using census or something similar, and other libraries would provide their own Dos that set the labels on the context and called WithLabels at the same time. It looks like this change provides another more efficient alternative to WtihLabels, so it seems like it fits in fine.

@rsc
Copy link
Contributor

rsc commented Oct 2, 2019

This sense of "map" - meaning apply a function to a data structure - is not one we've used much in Go to date (strings.Map is the exception). I'm also bothered by needing both Len and Map. Would it really be so bad if there was only Map? Then the argument could be a plain function instead of an interface. Right now the CL uses it as a hint to allocate a map, in which case it really doesn't matter, but if a different implementation used it to allocate a slice and then blindly filled in increasing indexes, that would be a problem.

I'm also a little confused about the avoidance of LabelSet. Should we instead be looking at an alternate LabelSet constructor, like LabelSetFromFunc (or a better name)? Then the result could be passed to both Do and WithLabels.

@rsc
Copy link
Contributor

rsc commented Oct 9, 2019

Any comments about trying to use an alternate LabelSet constructor instead of all new API?

@martisch
Copy link
Contributor Author

martisch commented Oct 9, 2019

I have not gotten around to profiling or testing that yet. Putting better naming aside a possible approach is:

type LabelSet struct {
  mapper LabelMapper
}

func LabelsFromMapper(lm LabelMapper) LabelSet {
  return LabelSet{mapper: lm}
}

func Labels(args ...string) LabelSet {
  return LabelSet{mapper: listMapper(args)}
}

type listMapper []string

func (l listMapper) Len() int { return len(l) }
func (l listMapper) Map(f func(key, value string)) {
  for i := 0; i+1 < len(l); i += 2 {
   f(l[i], l[i+1])
  }
}

to keep LabelSet struct backwards compatible. The inner workings of WithLabels can then be replaced with that of WithLabelsFromMapper from https://golang.org/cl/188499 using the mapper field of LabelSet.

@rsc
Copy link
Contributor

rsc commented Oct 30, 2019

ping @martisch - any profiling or testing of this alternate approach?

@martisch
Copy link
Contributor Author

martisch commented Nov 6, 2019

The approach of adding a new helper LabelsFromAdder (name to be decided), using WithLabels with a new internal implementation and changing LabelSet struct seems equally performant.

In addition I removed the mapper closure in the new approach as that caused an allocation which would regress the List case in the number of allocations but has the downside of exposing the internal map[string]string structure. As that is not ideal I would need to first investigate if we can stack allocate and keep that implementation detail hidden with a closure again if that is important.

code:

type LabelSet struct {
	adder LabelAdder
}

type LabelAdder interface {
	AddLabels(map[string]string) // should have a better name
}

type listLabels []string

func (l listLabels) AddLabels(m map[string]string) {
	for i := 0; i+1 < len(l); i += 2 {
		m[l[i]] = l[i+1]
	}
}

func WithLabels(ctx context.Context, labelset LabelSet) context.Context {
	parentLabels := labelValue(ctx)
	childLabels := make(labelMap)
	for k, v := range parentLabels {
		childLabels[k] = v
	}
	labelset.adder.AddLabels(childLabels)
	return context.WithValue(ctx, labelContextKey{}, &childLabels)
}

func LabelsFromAdder(labels LabelAdder) LabelSet {
	return LabelSet{adder: labels}
}

func Labels(args ...string) LabelSet {
	return LabelSet{adder: listLabels(args)}
}

name                                           time/op
ContextLabels/WithLabels/Labels/List           297ns ± 2%
ContextLabels/WithLabels/Labels/Map            366ns ± 3%
ContextLabels/WithLabels/LabelsFromAdder/List  208ns ± 2%
ContextLabels/WithLabels/LabelsFromAdder/Map   264ns ± 2%

name                                           alloc/op
ContextLabels/WithLabels/Labels/List            520B ± 0%
ContextLabels/WithLabels/Labels/Map             520B ± 0%
ContextLabels/WithLabels/LabelsFromAdder/List   392B ± 0%
ContextLabels/WithLabels/LabelsFromAdder/Map    392B ± 0%

name                                           allocs/op
ContextLabels/WithLabels/Labels/List            6.00 ± 0%
ContextLabels/WithLabels/Labels/Map             6.00 ± 0%
ContextLabels/WithLabels/LabelsFromAdder/List   4.00 ± 0%
ContextLabels/WithLabels/LabelsFromAdder/Map    4.00 ± 0%

Updated cl/188499 with new code.

@rsc
Copy link
Contributor

rsc commented Nov 6, 2019

@martisch, thanks for confirming that we can use an alternate LabelSet constructor instead of having to change other parts of the API.

It would still be good to find a way to avoid exposing the map[string]string. In the long run I expect that internal detail might change too. Unless it is OK for the LabelSet constructor to be passed a dummy map and copy those values out.

@martisch
Copy link
Contributor Author

martisch commented Nov 7, 2019

It would still be good to find a way to avoid exposing the map[string]string.
I agree.

Since we are in freeze for go1.14 I have 6months to spend some time figuring out an interface (and potential generic compiler optimization if needed) that does not cause an additional allocation and just assumes a string type for key and value but no other implementation details.

@rsc
Copy link
Contributor

rsc commented Nov 20, 2019

Marking this on hold until @martisch's update. @martisch, please remove the hold label once you have an update. No hurry. Thanks.

@rsc rsc added this to Active in Proposals (old) Nov 27, 2019
@rsc rsc moved this from Active to Hold in Proposals (old) Nov 27, 2019
@jmacd
Copy link

jmacd commented Apr 23, 2020

The OpenTelemetry Go SDK has taken a position on this topic:
open-telemetry/opentelemetry-go#651

The Metrics API needs a LabelSet type that is both inexpensive, as stated in this issue, but also supports an Equal method, to support combining metric events by distinct label set. We landed on an interface{} containing a variable-size array of key-value structs.

It would be great if the runtime/pprof labels were somehow able to re-use structures computed in this way for OpenTelemetry. OpenTelemetry's trace.WithSpan() function would then be able to automatically enter profiler labels from its span attributes, and cheaply, because it would simply enter a structure it has computed for other reasons.

@jmacd
Copy link

jmacd commented Apr 23, 2020

In the above library, @martisch, we take the position that a stable sort, followed by de-duplication, is cheaper than using a map in this situation.

@jmacd
Copy link

jmacd commented Apr 24, 2020

Referring to @rsc's API proposal above (#33701 (comment)), I believe that it's slightly better to support an interface like in the OTel label package above, which is:

type LabelSet interface {
    Len() int
    Get(int) (KeyValue, bool)
}

This my preference because often to pass a func() argument implies an allocation, whereas the simpler Get() API does not.

@martisch
Copy link
Contributor Author

martisch commented Apr 24, 2020

Thanks for the suggestions.

I initially chose the Map approach since some census frameworks use maps for census labels. Since maps have no native operations to get the i th element and no stable iteration order the Get approach for maps needs them to extract all keys+values and buffer them. Maybe that easy enough to do with a wrapper (but it will cause allocation(s)). If that is as good as the map approach this might be workable and as good in performance if we teach the compiler to potentially optimize extracting all keys and values in a loop.

Ideally (unless there are very good reasons) we need to keep LabelSet a struct to not break backwards compatibility. We can add a new field that is an interface.

P.S. I dont see a proposal of Len and Get in the comments (but tried it in an initial prototype).

@jmacd
Copy link

jmacd commented Apr 24, 2020

I was referring to Len() and Get() in the OTel API linked in the comment above (#33701 (comment)).

If the type has to be a struct, I'd make it a struct { LabelInterface } so that we can support more than one input type that implements LabelInterface.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Hold
Development

No branches or pull requests

7 participants