Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: spec: Go 2: allow manual control over imported package initialization #48174

Closed
Tatskaari opened this issue Sep 3, 2021 · 16 comments
Closed
Labels
Milestone

Comments

@Tatskaari
Copy link

Tatskaari commented Sep 3, 2021

The proposal

As a go program grows, it can hit a critical point where the number of packages causes package initialisation to become increasingly expensive. As most of these packages tend to be third party, authors have little control over this. At some point, package initialisation can become prohibitive for use cases that are sensitive to startup performance. In short, I propose that we introduce a mechanism to defer initialisation of imported packages until they are needed. Borrowing from the plugin design, perhaps something like this could work:

package main

import (
    _ "github.com/some/module/core" //go:deferred
    _ "github.com/some/module/util" //go:deferred
)

func main() {
    if os.Args[1] != "util" {
        // Now that we know we're using this package, initialise it
        core, err := runtime.InitDeferredImport("github.com/some/module/core")
        ...
    }

    // `InitDeferredImport()` returns something akin to a `*plugin.Plugin`, representing the available symbols in that package.
    util, err := runtime.InitDeferredImport("github.com/some/module/util")
    if err != nil {...}
    
    // `Lookup()` returns a `*plugin.Symbol`, or similar
    run, err := util.Lookup("Run")
    if err != nil {...}
    
    // Which we can then cast to a `fun()` and call
    run.(func())() 
    os.Exit(0)
}

In this example, the initialisation of the core and util packages is deferred until we actually intend to use those packages. The packages must be imported with _ to avoid using an uninitialised package.**

Libraries like gocloud.dev that import various SDKs might find this feature especially useful. For example, importing gocloud.dev/blob has some quite expensive initialisation:

...
init go.opencensus.io/trace/tracestate @4.0 ms, 0.31 ms clock, 462688 bytes, 2407 allocs
...

If we could defer this until we call some code that actually needs tracing, we can improve startup performance a lot.

** This might not be ideal. We might want to refer to symbols e.g. types in the package even if it's not initialised yet, otherwise we won't be able to cast the symbols returned from Lookup(). Client side interfaces could certainly help here though.

Feasibility

The //go:deferred comment here tells the runtime not to initialise that package on importing it. Because we still import the package, this shouldn't cause any problems for the build.Import() algorithm, or the compiler/linker.

The runtime.InitDeferredImport(string) function can then be used to initialise that package, and anything it imports. It seems we already have the ability to defer package initialisation for plugins. This function will likely follow the same code path as plugin.Open(), minus the dlopen() stuff, and return something akin to a *plugin.Plugin that can be used to lookup the package symbols.

It feels like there's a lot of prior art indicating this is possible, but perhaps somebody from the core team has a better idea?

Further motivation

At a certain point of scale, initialisation can seriously hinder startup performance. There are some efforts to incrementally improve this by making changes to the SDK and other libraries, however these only mitigate the problem. At some point, without a way to defer imports, library authors and golang developers have limited ability to manage the initialisation cost as the package graph balloons in size.

My particular use-case is motivated by setting up linux namespaces to improve isolation for sub-processes we spawn in our build system. We spawn processes rapidly, and in parallel so startup performance is very important. Ideally, we'd be able to just clone() to a new thread, set up our network and mounts, and then exec the sub-process. I understand uncontrolled forking like this isn't safe within the golang runtime, so we've had to think of a different approach.

We've taken a page out of docker's book, and are trying to re-exec the current binary to do some setup in the new namespace, before finally fork/exec'ing once more to create the sub-process. As it stands, package initialisation is prohibitively expensive for this approach to work. We allocate about 2.1mb of data on startup, and spends 10s of ms doing this. Because this is all happening in parallel, we get a lot of resource contention trying to do all these allocations, which has a big enough impact that this approach isn't feasible.

The actual code that sets up the namespace has no dependencies so needs to initialise only a very small set of packages. Because there's no mechanism to defer this initialisation, we actually end up initialising 100s of packages.

Additionally, the most expensive initialisation comes from the remote execution API , which is only used when building remotely. Ideally we'd only initialise this when configured to do so. Only around 100kb of this allocation comes from our code. The rest is from third party libraries to which we have little control.

@Tatskaari Tatskaari changed the title Defer init() for imports to avoid explosive package initialisation Proposal: Defer init() for imports to avoid explosive package initialisation Sep 3, 2021
@gopherbot gopherbot added this to the Proposal milestone Sep 3, 2021
@seankhliao seankhliao changed the title Proposal: Defer init() for imports to avoid explosive package initialisation proposal: spec: Go 2: allow manual control over imported package initialization Sep 3, 2021
@seankhliao seankhliao added v2 A language change or incompatible library change LanguageChange labels Sep 3, 2021
@jfesler
Copy link

jfesler commented Sep 3, 2021

Would this cascade to all dependencies modules imported as //go:deferred ?
Would this be limited to invocation via package main?

As a library author, I'm not sure I want to field support from people having bugs that turn out related to their own fault, not allowing my init to run.

As an application author, I appreciate this problem completely. To the point of finding the worst offenders (in my use cases) and getting those libraries to switch from init(), to instead having functions that depend on initialization call that initialization. But this is a game of whack-a-mole, and does depend on the participation of those dependencies to change, so I can appreciate the proposal. I just don't know whether to 👍 or 👎 .

@peterebden
Copy link

As a library author, I'm not sure I want to field support from people having bugs that turn out related to their own fault, not allowing my init to run.

Were this proposal to be accepted, a key feature would need to be that the guarantees around init() remain unchanged from a library's perspective, so this deferred import would have to arrange for it to run before any of the functions in the package can be called.

Would this cascade to all dependencies modules imported as //go:deferred ?

I think it would need to in order to be useful.

@Tatskaari
Copy link
Author

Tatskaari commented Sep 3, 2021

Would this cascade to all dependencies modules imported as //go:deferred ?

In short, yes. Currently, when initialising a package, we have some internal init function that does 3 things, in this order:

  1. initialise any imported package
  2. initialise the top level package declarations
  3. call the init() function if present

I'm just suggesting that we omit any //go:deferred packages from step 1. Of course they might still be initialised if they're imported transitively, without //go:deferred.

Would this be limited to invocation via package main?

I see no reason why it should be.

As a library author, I'm not sure I want to field support from people having bugs that turn out related to their own fault, not allowing my init to run.

As I've described it here, the packages must be imported using _ so you cannot access the package until you call runtime.InitDeferredImport(). This might be annoying if you want to cast the plugin.Symbol to it's real type as you can't access the type declaration so perhaps this is something to consider further.

I just don't know whether to 👍 or 👎.

Definitely 👍

@deanveloper
Copy link

deanveloper commented Sep 4, 2021

While I do like the idea, I noticed the spec in the title and LangaugeChange tag. However I don't actually see any language changes, we're just adding a comment with special properties.

Another possibility if we did want this as a language change (ie if this is going to be commonly used, which it doesn't seem like it will be) is that we could use the defer keyword. Although defer doesn't really mean the same thing in this context.

@seankhliao
Copy link
Member

I tagged it as a language change as package initialization order is part of the spec and I didn't see enough room for implementations to defer running init.

I don't think the guarantees are strong enough, init() can have side effects such as writing a file or registering with a common package and this will break those

@fzipp
Copy link
Contributor

fzipp commented Sep 4, 2021

This seems like treating the symptom rather than the cause, i.e. libraries abusing init(), and it will give authors of these low quality libraries an excuse to continue doing so.

@beoran
Copy link

beoran commented Sep 4, 2021

@fzipp Indeed. If a library has to do expensive initialization, then in most cases it is perfectly possible to provide an Init for the package which the user then has to call themselves before using said library.

The use of init is only strictly necessary for the automatically generated init() for initialization of variables, and for ensuring that the initialization runs on the primary thread of the program for GUI or game libraries. All other libs should provide an Init or better, NewXyzzy for the Xyzzy they provide.

@peterebden
Copy link

This seems like treating the symptom rather than the cause, i.e. libraries abusing init(), and it will give authors of these low quality libraries an excuse to continue doing so.

I don't think this is as easy as tagging them as "low quality" libraries. Some of the worse offenders are high-profile things like prometheus, gRPC and generated protobuf code which I generally think are good libraries, they obviously just haven't optimised for this (to be fair, I imagine the majority of their use cases don't treat them as optional and aren't so worried about time-to-main).

I'm also a bit dubious about the ability to get them all to change on this basis (and not regress again later). A nice property of this proposal is that it's up to the consumer of the library whether they want to pay the price upfront or later.

All other libs should provide an Init or better, NewXyzzy for the Xyzzy they provide.

Sure, that's a breaking change for any of these cases that don't have or require those functions at present though.
(Obviously there are also non-breaking ways of deferring the init cost but those might not always be feasible).

@Tatskaari
Copy link
Author

Tatskaari commented Sep 4, 2021

Indeed, the biggest contributor in my case is initialisation of maps in generated protobuf packages, not low quality libraries with expensive init() functions. I don't think it's sensible to suggest that library authors should defer this sort of thing, but cumulatively it can add up.

@ianlancetaylor
Copy link
Contributor

This seems to me like fixing the problem in the wrong place. If the problem is slow initialization of maps in generated protobuf packages, then let's fix that. If the problem is packages that have slow init functions, then let's fix those packages.

Providing a mechanism for deferred initialization of some packages, and requiring the program to explicitly initialize them at runtime, seems to me like a recipe for subtle bugs. If the code logic is slightly wrong, then the package will be invoked with no initialization. This could be bad in any number of ways, but there is no simple way to avoid it.

@Tatskaari
Copy link
Author

Tatskaari commented Sep 8, 2021

This seems to me like fixing the problem in the wrong place. If the problem is slow initialization of maps in generated protobuf packages, then let's fix that. If the problem is packages that have slow init functions, then let's fix those packages.

This is clearly the better option but seems infeasible to me. Perhaps I'm aiming for unrealistic startup performance, but I don't think we can optimise package initialisation to the point where it's no longer a blocker for use cases like the one I've described. At some point of scale, there are simply too many packages.

As it stands, my import graph takes around 15ms +-5ms to initialise and allocates 2.2mb of data. When I spin these up to try and sandbox my build tasks, the terminal gets very choppy presumably as each process contends for resources.

If the code logic is slightly wrong, then the package will be invoked with no initialization. This could be bad in any number of ways, but there is no simple way to avoid it.

I was hoping that problem had already been solved for the plugin API. That seems to load in packages from .so files, and initialise them as needed. We could skip all the dlopen() call that loads packages from disk, as they should already be in program memory. We "just" need to follow the same code path that the plugins code follows to make sure the packages are initialised when we call runtime.InitialiseDeferredImport(). If we agree this feature would be useful if it can be done right, I'm happy to try and get a proof of concept out.

@ianlancetaylor
Copy link
Contributor

See also #38450.

@ianlancetaylor
Copy link
Contributor

I don't quite see the analogy to plugins. When you open a plugin, all the packages in the plugin are initialized at that point. There is no deferred initialization.

If you want to carry that idea over to this one, then I think we would have to some sort of static plugin, in which the plugin opens an import that is already in the program. That would take the place of the normal explicit import. That is, you wouldn't write import "github.com/some/module/core". You would write localplugin.Open("github.com/some/module/core") and then call the functions via that mechanism. There would be some way to tell the go tool that you want that package treated as a local plugin.

I'm not sure that is a great idea but it would at least be safe, albeit hard to use. This proposal is not safe.

You should take a look at #38450 and see whether that will solve your problem.

@Tatskaari
Copy link
Author

Tatskaari commented Sep 9, 2021

Thanks Ian. Appreciate you hearing me out.

I don't quite see the analogy to plugins.

Thanks probably my ignorance. The plugin API would allow me to "import" a package based on some condition. This is what i meant by deferred: packages are initialised after package main has been initialised. I thought my idea for differed imports would be similar in implementation to loading a local plugin, as you've described.

If you want to carry that idea over to this one, then I think we would have to some sort of static plugin, in which the plugin opens an import that is already in the program. That would take the place of the normal explicit import. That is, you wouldn't write import "github.com/some/module/core". You would write localplugin.Open("github.com/some/module/core") and then call the functions via that mechanism. There would be some way to tell the go tool that you want that package treated as a local plugin.

Yeah, okay. I think what you've described there is similar what I had in mind with this proposal. I assume there's some reason we can't just treat a normal package as a "local plugin", and avoid additional flags or build modes to go tool link/compile? I'll have a dig through this code to better understand how packages are loaded, and hopefully understand the implications of this proposal a bit better.

There would be some way to tell the go tool that you want that package treated as a local plugin.

Perhaps we could look at the imports in the program, and see if a package is imported with //go:deferred (or //go:localplugin)? I guess that might be considered an abuse of the import directive but would provide essentially the ergonomics I've described here.

You should take a look at #38450 and see whether that will solve your problem.

If I understand correctly, no. This only works statically. My condition is based on args passed to the program so can't be optimised out by the compiler.

@ianlancetaylor
Copy link
Contributor

Based on the discussion above, this is a likely decline. Leaving open for four weeks for final comments.

@ianlancetaylor
Copy link
Contributor

No further comments.

@golang golang locked and limited conversation to collaborators Nov 3, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

9 participants