Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: spec: lazy init imports to possibly import without side effects #38450

Closed
bradfitz opened this issue Apr 14, 2020 · 17 comments
Closed

Comments

@bradfitz
Copy link
Contributor

bradfitz commented Apr 14, 2020

I've been doing a bunch of work on binary size reduction and a common problem that comes up is that the linker's dead code elimination can remove all the references to a package, but the one remaining reference is to the package's inittask symbol, which is then heavy & brings it a bunch more.

That is:

package main
import "crypto/x509"
func main() {
    if false {
        x509.NewCertPool()
    }
}

Is equivalent today to:

package main
import _ "crypto/x509"
func main() {}

... which slurps in a ton of packages and their init load and extra 1 MiB binary size increase, from 1.3 MiB to 2.3 MiB:

dev:big $ go build --ldflags=--dumpdep x.go 2>&1 | grep inittask
runtime.main -> runtime..inittask
runtime.main -> main..inittask
main..inittask -> crypto/x509..inittask
crypto/x509..inittask -> encoding/pem..inittask
crypto/x509..inittask -> errors..inittask
crypto/x509..inittask -> crypto/aes..inittask
crypto/x509..inittask -> crypto/cipher..inittask
crypto/x509..inittask -> crypto/des..inittask
crypto/x509..inittask -> crypto/md5..inittask
crypto/x509..inittask -> encoding/hex..inittask
crypto/x509..inittask -> io..inittask
crypto/x509..inittask -> strings..inittask
crypto/x509..inittask -> sync..inittask
crypto/x509..inittask -> crypto/rsa..inittask
crypto/x509..inittask -> encoding/asn1..inittask
crypto/x509..inittask -> math/big..inittask
crypto/x509..inittask -> crypto/ecdsa..inittask
crypto/x509..inittask -> crypto/ed25519..inittask
crypto/x509..inittask -> crypto/x509/pkix..inittask
crypto/x509..inittask -> fmt..inittask
crypto/x509..inittask -> io/ioutil..inittask
crypto/x509..inittask -> os..inittask
crypto/x509..inittask -> crypto/elliptic..inittask
crypto/x509..inittask -> bytes..inittask
crypto/x509..inittask -> net..inittask
crypto/x509..inittask -> net/url..inittask
crypto/x509..inittask -> reflect..inittask
crypto/x509..inittask -> time..inittask
crypto/x509..inittask -> crypto..inittask
crypto/x509..inittask -> crypto/dsa..inittask
crypto/x509..inittask -> crypto/sha1..inittask
crypto/x509..inittask -> crypto/sha256..inittask
crypto/x509..inittask -> crypto/sha512..inittask
crypto/x509..inittask -> strconv..inittask
crypto/x509..inittask -> vendor/golang.org/x/crypto/cryptobyte..inittask
crypto/x509..inittask -> crypto/x509.init
encoding/pem..inittask -> encoding/base64..inittask
encoding/pem..inittask -> sort..inittask
encoding/pem..inittask -> encoding/base64..inittask
encoding/pem..inittask -> sort..inittask
encoding/pem..inittask -> encoding/base64..inittask
encoding/pem..inittask -> sort..inittask
errors..inittask -> internal/reflectlite..inittask
errors..inittask -> errors.init
runtime..inittask -> internal/bytealg..inittask
runtime..inittask -> runtime.init
runtime..inittask -> runtime.init.0
runtime..inittask -> runtime.init.3
runtime..inittask -> runtime.init.4
runtime..inittask -> runtime.init.5
runtime..inittask -> runtime.init.6
crypto/aes..inittask -> encoding/binary..inittask
crypto/aes..inittask -> crypto/aes.init
crypto/cipher..inittask -> crypto/cipher.init
crypto/md5..inittask -> hash..inittask
crypto/md5..inittask -> crypto/md5.init.0
encoding/hex..inittask -> encoding/hex.init
io..inittask -> io.init
strings..inittask -> unicode..inittask
sync..inittask -> sync.init
sync..inittask -> sync.init.0
sync..inittask -> sync.init.1
crypto/rsa..inittask -> crypto/internal/randutil..inittask
crypto/rsa..inittask -> crypto/rand..inittask
crypto/rsa..inittask -> math..inittask
crypto/rsa..inittask -> crypto/rsa.init
encoding/asn1..inittask -> encoding/asn1.init
math/big..inittask -> math/rand..inittask
math/big..inittask -> math/big.init
crypto/ecdsa..inittask -> crypto/ecdsa.init
crypto/ed25519..inittask -> crypto/ed25519/internal/edwards25519..inittask
crypto/x509/pkix..inittask -> crypto/x509/pkix.init
fmt..inittask -> internal/fmtsort..inittask
fmt..inittask -> fmt.init
io/ioutil..inittask -> path/filepath..inittask
io/ioutil..inittask -> io/ioutil.init
os..inittask -> syscall..inittask
os..inittask -> internal/oserror..inittask
os..inittask -> internal/poll..inittask
os..inittask -> internal/syscall/execenv..inittask
os..inittask -> internal/syscall/unix..inittask
os..inittask -> os.init
os..inittask -> os.init.0
bytes..inittask -> bytes.init
net..inittask -> context..inittask
net..inittask -> vendor/golang.org/x/net/dns/dnsmessage..inittask
net..inittask -> internal/singleflight..inittask
net..inittask -> net.init
net..inittask -> net.init.0
reflect..inittask -> reflect.init
time..inittask -> time.init
crypto..inittask -> crypto.init
crypto/dsa..inittask -> crypto/dsa.init
crypto/sha1..inittask -> crypto/sha1.init
crypto/sha1..inittask -> crypto/sha1.init.0
crypto/sha256..inittask -> crypto/sha256.init
crypto/sha256..inittask -> crypto/sha256.init.0
crypto/sha512..inittask -> crypto/sha512.init
crypto/sha512..inittask -> crypto/sha512.init.0
strconv..inittask -> strconv.init
vendor/golang.org/x/crypto/cryptobyte..inittask -> vendor/golang.org/x/crypto/cryptobyte.init
encoding/base64..inittask -> encoding/base64.init
internal/bytealg..inittask -> internal/bytealg.init.0
encoding/binary..inittask -> encoding/binary.init
unicode..inittask -> unicode.init
crypto/rand..inittask -> bufio..inittask
crypto/rand..inittask -> crypto/rand.init
crypto/rand..inittask -> crypto/rand.init.0
crypto/rand..inittask -> crypto/rand.init.1
crypto/rand..inittask -> crypto/rand.init.2
math..inittask -> math.init
math/rand..inittask -> math/rand.init
path/filepath..inittask -> path/filepath.init
syscall..inittask -> syscall.init
internal/oserror..inittask -> internal/oserror.init
internal/poll..inittask -> internal/poll.init
context..inittask -> context.init
context..inittask -> context.init.0
vendor/golang.org/x/net/dns/dnsmessage..inittask -> vendor/golang.org/x/net/dns/dnsmessage.init
bufio..inittask -> bufio.init

The only way to get around that is with build tags and more conditional compilation, which is gross.

What I'd like instead is a way to declare to the toolchain that for a given imported package that I'm fine with that package's init-time side effects (like normal) if I need them, but I'm also cool with omitting them if the linker decides that's fine.

That is, I want something like this this strawman syntax:

package main

import (
      "crypto/x509" // go:lazyinit
      "fmt"
)

func main() {
    if false {
        NewCertPool()
    }
    fmt.Println("hi")
}

... so the x509 init (nor its deps) is never run if the toolchain's DCE doesn't want to.

The go:lazyinit is saying that I'm not depending on any init-time work happening there for the import on that line. (Or perhaps it should be on the line before to be consistent with other //go: comments, or it shouldn't use comments)

(Arguably this should be the default behavior and imports with side effects would need the declaration, but we can't for backwards compatibility anyway, so not worth considering.)

/cc @ianlancetaylor @griesemer

@gopherbot gopherbot added this to the Proposal milestone Apr 14, 2020
@bradfitz bradfitz changed the title proposal: language: lazy init imports to possibly import with side effects proposal: language: lazy init imports to possibly import without side effects Apr 14, 2020
@ianlancetaylor
Copy link
Contributor

Would it be correct to say that you mean something like: if there are no references to any exported names of this package, then there is no need to run any initializers?

I certainly understand that the current situation is frustrating, but it seems possible to implement this entirely in the compiler without changing the language. Basically we need to separate the initialization of each global variable into a separate init function. We put those init functions in the deadcode graph along with everything else, with a reference from the variable to the init function. The package init function then calls only the undiscarded init functions.

@bradfitz
Copy link
Contributor Author

Would it be correct to say that you mean something like: if there are no references to any exported names of this package, then there is no need to run any initializers?

Yes.

but it seems possible to implement this entirely in the compiler without changing the language

In many cases, yes, but not all. For one example that's only partially contrived:

https://play.golang.org/p/2GLltT92h1t

You can't do that automatically without changing language semantics.

Or you can't get rid of imports to net/http/pprof (even if the caller is only using its exported symbols) because of its side effects of registering itself with http.DefaultServeMux, even if you don't use that.

But, yes, if the various bugs like #19533 #14840 were fixed, it's unlikely I'd be filing this.

I just think such toolchain magic continues to be years away and I wonder if a more explicit mechanism has a better chance of being implemented.

@bcmills
Copy link
Contributor

bcmills commented Apr 15, 2020

How do you end up with those kinds of if false statements in production code in general?

(Are these due to debugging constants, or paths rendered unreachable due to build tags?)

@rsc
Copy link
Contributor

rsc commented Apr 15, 2020

It seems like it would make sense to start with first handling the cases that don't require changes to language semantics, and then evaluate what's left.

@bradfitz
Copy link
Contributor Author

How do you end up with those kinds of if false statements in production code in general?

They arise in a dozen different ways. That was just an example for brevity. Sometimes build tags, sometimes GOOS comparisons, sometimes just statically unreachable code.

@rasky
Copy link
Member

rasky commented Apr 15, 2020

I have a very initial implementation for #19533 here:
rasky@50db447

This splits generated init functions into separate symbols (using a relocation to bind them to the symbols they initialize), and the linker deadcode pass is then able to remove them in "some cases". In fact, it's not sufficient for the linker to see that the symbol is not used: it also needs to know that the init function itself is "side-effect free". For instance:

var x int = FunctionCall()

might not be side-effect free, depending on the body of FunctionCall. So, in addition to what it's done above, we also need a pass that does "side effect" detection in the compiler: we need to analyze all functions in call graph order (similar to escape detection) and check whether it's function is side-effect free. This flag must also be stored in export data, so that we're able to cross package boundaries.

By the way, this also require to agree on the exact definition of "side-effect free"; for instance, many things are visible through a debugger. os.Getenv might possibly be considered side-effect free for the purpose of removing a init function, but it's visible with strace/ptrace.

It would also be possible to have a compiler annotation for std functions which are commonly used in init functions, and we can't otherwise prove that are side effects free (even in the best possible world where we have a fully working side effect detection pass, they might still end up calling a syscall which needs to be somehow annotated anyway).

So yes, this is actually a complicated issue that requires many development hours. It's possibly a task bigger than I can afford in my spare contribution time, so I'm not sure I'll get around completing it.

@bradfitz
Copy link
Contributor Author

@rsc, here's a more concrete example distilled from above.

crypto/x509 imports crypto/md5, but only because the never-used-in-std x509.EncryptPEMBlock might use it.

If you don't use EncryptPEMBlock, the linker can GC everything about md5, except the crypto/x509..inittask -> crypto/md5..inittask edge, which then brings in all of md5 forever because:

https://golang.org/src/crypto/md5/md5.go#L21

func init() {
	crypto.RegisterHash(crypto.MD5, New)
}

What I'd like to do from crypto/x509 is do a lazy/weak/conditional import of crypto/md5, as we don't really care about that that crypto.RegisterHash side effect.

In fact, the only use of https://golang.org/pkg/crypto/#Hash.Available in x509 is:

switch hashType {
case crypto.Hash(0):
        if pubKeyAlgo != Ed25519 {
                return ErrUnsupportedAlgorithm
        }
case crypto.MD5:
        return InsecureAlgorithmError(algo)
default:
        if !hashType.Available() {
                return ErrUnsupportedAlgorithm
        }
        h := hashType.New()
        h.Write(signed)
        signed = h.Sum(nil)
}

... where it only uses the crypto.Hash registration mechanism after it's determined that it's not MD5.

@magical
Copy link
Contributor

magical commented Apr 22, 2020

What I'd like to do from crypto/x509 is do a lazy/weak/conditional import of crypto/md5, as we don't really care about that that crypto.RegisterHash side effect.

Are side effects from indirectly-imported packages covered by the Go 1 compatibility promise? The following code technically works today...

https://play.golang.org/p/lw5P2Vb5NXH

package main

import (
	"crypto"
	"fmt"

	_ "crypto/x509"
)

func main() {
	h := crypto.MD5.New()
	fmt.Println(h.Sum(nil))
}

@bradfitz
Copy link
Contributor Author

Are side effects from indirectly-imported packages covered by the Go 1 compatibility promise?

No clear rule either way. I'm sure we've even already broken it in the past without knowing since the fix is so easy for people. It's not something we automatically test for at least. We'd definitely avoid trying to break people if possible.

@bcmills
Copy link
Contributor

bcmills commented Jan 8, 2021

The other way to handle cases like this is to move all of the functionality to a new package that exports explicit, idempotent registration functions instead of implicitly executing them at init-time.

Then, replace the declarations in the original package with forwarding shims to the new package, with the init function in the old package calling the explicit hook in the new package.

Users who want the “lazy init” behavior can import the new (light) package instead of the old one to eliminate the init side-effect, and calls to the registration functions are then subject to DCE.

That approach does not require a language change.

@leighmcculloch
Copy link
Contributor

Instead of using a //go: comment, could we explore a new import keyboard that distinguishes lazy vs non-lazy behavior? This would allow grouping new imports. If this is the new default, new tools could default to using the new keyword for import groups without us having a bunch of comments littered through code.

@bradfitz's original example would become:

package main

importnoinit (
      "crypto/x509"
)

import (
      "fmt"
)

func main() {
    if false {
        NewCertPool()
    }
    fmt.Println("hi")
}

importnoinit seems like not a great name, a better name is probably possible, I just couldn't think of it right now.

@dosgo
Copy link

dosgo commented Mar 20, 2021

The crypto package is bigger than the runtime, but ssl has to use it. I run it on the openwrt mips device, and its ROM is only 16M...
image

@rsc rsc changed the title proposal: language: lazy init imports to possibly import without side effects proposal: spec: lazy init imports to possibly import without side effects Aug 10, 2022
@rsc
Copy link
Contributor

rsc commented Apr 19, 2023

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@flimzy
Copy link
Contributor

flimzy commented Apr 20, 2023

if there are no references to any exported names of this package, then there is no need to run any initializers?

Doesn't this description omit the (common?) pattern of, say, the database/sql driver registration?

Let's imagine:

package main

import (
    "database/sql"
    "some/random/db/driver"
}

func main() {
    if false {
        fmt.Println("using ", driver.Version)
    }
    db, err := sql.Open("driver", ....)
}

In this case there may be no references to exported variables in the driver package, other than the one omitted by the if false, but the import side effects (which may legitimately not expose any exported symobls) could be important.

Unless exported methods on unexported types are counted as exported symbols, of course.

@rsc
Copy link
Contributor

rsc commented May 3, 2023

Go 1.21 will be able to remove dead code from map initialization and the like.

From a semantic level, it seems clear we don't want to take on the complexity of a new kind of import right now. Better to fix the offending packages.

@rsc
Copy link
Contributor

rsc commented May 10, 2023

Based on the discussion above, this proposal seems like a likely decline.
— rsc for the proposal review group

@rsc
Copy link
Contributor

rsc commented May 17, 2023

No change in consensus, so declined.
— rsc for the proposal review group

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Declined
Development

No branches or pull requests

10 participants