spec: clarify section on initialization order #31292

griesemer · 2019-04-05T22:39:06Z

See #22326 for a discussion of a specific example and how the spec might be unclear. See specifically #22326 (comment) .

Issues with the section on initialization order:

The very first sentence may imply an algorithm that is different from the one explained by 2nd, more detailed paragraph.
The section talks about declaration order, but variables named _ are "never declared" (yet they are processed for initialization order like any other variable).
The section could use one or two more examples.

The text was updated successfully, but these errors were encountered:

mdempsky · 2019-04-15T20:28:07Z

While we wait for a spec CL to improve the wording / examples, I think it's worth at least ensuring we're in agreement about the intended understanding.

Are we agreed upon the total ordering* implied by the algorithm specified in the second paragraph? (* I think it's a total ordering at least.)
Are we agreed that var _ = ... and var uniqueUnusedName = ... should behave identically?

If agreed on both, based on fuzzing I've done, I think go/types and gccgo are both spec compliant, and I have a revision for CL 170062 to bring it into compliance as well.

mdempsky · 2019-04-15T20:28:23Z

/cc @ianlancetaylor

ianlancetaylor · 2019-05-08T20:05:33Z

I agree with those two points.

alandonovan · 2019-05-08T20:26:52Z

Me too.

mdempsky · 2019-05-08T20:32:17Z

Thanks.

One other thing I was wanted confirmation on: the spec talks about initializing one variable at a time. How is this supposed to interact with statements that initialize multiple variables with a single multi-value RHS expression?

For example, consider this program:

package main

import "fmt"

var w = getB()
var x = []int{a, getB()}

var a, b = func() (int, int) { return 271, 314 }()

// getB returns the value of b without depending on it according to
// package initialization dependency rules.
func getB() int { return I(T{}).M() }
type T struct{}
func (T) M() int { return b }
type I interface{ M() int }

func main() {
	fmt.Println(w)
	fmt.Println(x)
	fmt.Println(b)
}

I had planned to make this program print:

0
[271, 314]
314

That is, that a and b get initialized at the same time before going back to initialize x.

But the spec says that variables should be initialized one at a time, and x doesn't actually depend on b, so the spec behavior suggests that the initialization order should be w, a, x, b and that the second line of output should be [271, 0] instead of [271, 314].

I think go/types would print [271, 314]. However, gccgo does print [271, 0].

alandonovan · 2019-05-08T20:49:33Z

I agree the wording should be more precise in the case of assignments var x, y = f(), but the intent is pretty clear: you get the same outcome whether you treat it like a single assignment of a compound variable (var xy struct { x X; y Y } = f()) or a sequence of assignments with an explicit intermediary: var tmpX, tmpY = f(); var x = tmpX; var y = tmpY.

mdempsky · 2019-05-08T21:01:10Z

@alandonovan I'm not sure I follow, and I think compound variable vs explicit intermediaries actually require different output.

Do you mind specifying what the right output you think should be for the program I posted? That is [217, 0] or [217, 314]?

griesemer · 2019-05-08T21:27:25Z

go/types does the following analysis, matching what @mdempsky stated (specifically, getB has no dependencies!):

Computing initialization order for package main ("main")

Object dependency graph:
	main depends on
		w
		x
		b
	w depends on
		getB
	x depends on
		a
		getB
	a has no dependencies
	M depends on
		b
	b has no dependencies
	getB has no dependencies

Transposed object dependency graph (functions eliminated):
	w depends on 0 nodes
		main is dependent
	b depends on 0 nodes
		main is dependent
		M is dependent
	a depends on 0 nodes
		x is dependent
	x depends on 1 nodes
		main is dependent

Processing nodes:
	w (src pos 1) depends on 0 nodes now
	a (src pos 3) depends on 0 nodes now
	x (src pos 2) depends on 0 nodes now
	b (src pos 4) depends on 0 nodes now

Initialization order:
	w = getB()
	a, b = (func() (int, int) literal)()
	x = ([]int literal)

(running go test -run Check -files=x.go with x.go the respective package and with the debug flag enabled in go/types/initorder.go).

From the initialization order I conclude that the output would be:

0
[271 314]
314

which happens to match cmd/compile for this case.

But from the processing nodes list one might expect the output to print [271, 0] like gccgo.

For the case where we have an n:n initialization assignment we can reduce it to n 1:1 initialization assignments (and the usual rules apply).

For the n:1 case (as here), it seems sensible to say that initialization any of the variables on the LHS leads to the initialization of all variables on the LHS at the same time. I think this is what a user would expect to happen - we don't think of the RHS values being stored in temporaries and assigned to the LHS "as needed".

But it is definitively unclear, I believe, and I can't provide a spec-based rationale for one or the other at the moment. We probably need to state current behavior.

alandonovan · 2019-05-08T21:54:09Z

@mdempsky Sorry, you're quite right, desugaring into a sequence of assignments with an explicit intermediary does give the [217, 0] behavior observed with gccgo. (gri and I spent days scratching our heads over this problem before he came up with the breadth-first formulation and still I get confused.)

I agree with gri that it would be preferable that the output be [217, 314]: the assignments 'var x, y = f()' should not be observable as separate events.

gopherbot · 2019-05-08T22:06:33Z

Change https://golang.org/cl/175980 mentions this issue: spec: clarify language on package-level variable initialization

griesemer · 2019-05-08T22:16:49Z

FWIW, the go/types code contains the following comment:

// n:1 variable declarations such as: a, b = f()
// introduce a node for each lhs variable (here: a, b);
// but they all have the same initializer - emit only
// one, for the first variable seen

explicitly addressing the n:1 case.

mdempsky · 2019-05-08T22:22:02Z

@griesemer Thanks for go/types analysis.

Also, thanks for bringing up the n:n assignment case. I think this is uncontroversial (both understood and implemented by compilers), but I'll go ahead and state it for posterity: I believe var a, b = f(), g() should be identical to var a = f(); var b = g().

@alandonovan No problem. I've been sufficiently confused about initialization order myself too. :)

It seems like the three of us (@griesemer, @alandonovan, and myself) all tend to think that n:1 assignments should initialize all of the LHS at once, and not as observably separate events.

@ianlancetaylor Can you comment on whether gccgo intentionally prints [217, 0] for the above test case, or whether you think this is an unintended output? Would you agree with changing its output to [217, 314]?

ianlancetaylor · 2019-05-08T22:47:48Z

gccgo intentionally prints [217, 0], since that is my reading of the current spec ("repeatedly initializing the next package-level variable that is earliest in declaration order" doesn't say anything about "but if the variables are initialized from the same expression then do them all at once").

I'm OK with changing it.

mdempsky · 2019-05-09T18:39:13Z

Here's another fun example exploiting package dependency orders, this time demonstrating its interaction with static initialization:

package main

import "fmt"

var x = getAB()
var a = 42
var b = a

func main() {
	fmt.Println(x)
}

// getAB returns {a, b} without depending on them per
// package initialization dependency rules.
func getAB() interface{} { return I(T{}).M() }
type T struct{}
func (T) M() interface{} { return []int{a, b} }
type I interface{ M() interface{} }

The Go spec requires that x be initialized before a or b, so this should print [0 0]. This is also the initialization order that go/types produces.

However, gccgo statically initializes a, causing the program to print [42 0].

cmd/compile additionally statically initializes b, causing the program to print [42 42].

mdempsky · 2019-05-09T19:01:24Z

I think it would be really unfortunate if we have to abandon static initialization, and I think it would be pretty complex and error prone to try to keep track of when static initialization is safe or not.

One possible spec solution would be to tweak the dependency rules to allow compilers to detect dependencies on additional variables that evaluating the initialization expression accesses. E.g., evaluating getAB() accesses both a and b, so we could allow compilers to additionally recognize that x depends on a and/or b.

In practice compilers probably wouldn't actively search for these additional dependencies, but it would make it valid for gccgo and cmd/compile to continue statically initializing a and/or b like they do today. gccgo's output is explained "as if" gccgo recognized a dependency from x on a, and cmd/compile's output is explained "as if" cmd/compile recognized dependencies on both a and b.

griesemer · 2019-05-09T19:51:09Z

Interesting. @ianlancetaylor points out that the problem here is a dependency that the compiler cannot detect (per the stated rules). Of course, for this specific example, one could probably add complicating rules to detect the dependency, but in general we can always create an analogous example using functions of an imported package, at which point all bets are off. Instead, Ian suggests that we say something like this: "If a data dependency exists that is not detected according to the currently stated rules, initialization order is undefined".

Such a catch-all rule would allow different compilers to do whatever they are doing now while produce consistent results in "normal" situations.

mdempsky · 2019-05-09T20:01:17Z

Instead, Ian suggests that we say something like this: "If a data dependency exists that is not detected according to the currently stated rules, initialization order is undefined".

I think this is like what I suggested above, except I'm suggesting a more limited form of "undefined": just that the additional dependencies outside the stated rules may or may not be respected.

Making all of initialization order undefined seems unnecessary to me.

griesemer · 2019-05-09T20:32:18Z

Given your example above, the "additional" dependency outside the stated rules is that I.M depends on T.M (which depends on a and b, with this dependency being detected by the rules). So there are two cases: 1) The additional dependency is respected, or 2) not respected.

With this we get for 1) x = [42, 42], and for 2) x = [0, 0].

The choice gccgo is making is not included in this list. I assume we would consider that a bug?

mdempsky · 2019-05-09T20:37:04Z

@griesemer My suggestion is that "x depends on a" and "x depends on b" are two separate dependencies that compilers may or may not respect, independently of each other. The current gccgo behavior can be explained as gccgo "choosing" to recognize the dependency on a, but not recognizing the dependency on b.

ianlancetaylor · 2019-05-09T20:38:22Z

Personally I would rather keep compiler dependency detection fixed and as simple as we can manage, rather than having different compilers implement different algorithms.

griesemer · 2019-05-09T20:44:48Z

@mdempsky Understood. But we do have (stated by the existing rules) the following dependencies: x -> getAB -> I.M, T.M -> a, b, and b -> a. Your suggested rule permits a compiler to ignore the dynamic dependency I.M -> T.M. It doesn't allow the compiler to ignore the dependency T.M -> a, b.

griesemer · 2019-05-09T21:05:31Z

I should perhaps have said "references" rather than "dependencies" because dependency detection is based on references: a -> b means that there is chain of references leading from a to b.

mdempsky · 2019-05-09T21:13:07Z

@ianlancetaylor I want it to be simple too, but my concern is that the simple implementation choice is going to mean punting a lot of static initializations like var a = 42 into dynamic initialization, which will bloat executables and slow process startup. That's likely to be only a small negative impact, but it's likely to affect a lot of users and programs, so I worry at the large scale it's a measurable negative impact.

Conversely, making package initialization slightly more complex in these tricky cases is unlikely to be noticed by users, but gives compilers enough leeway to still perform aggressive static initialization.

@griesemer Yes, I think the references vs dependencies distinction is what I'm getting at. I'm suggesting that if evaluating x's static initializer dynamically accesses a and b, then the compiler is free to recognize a dependency from variable x to variables a and/or b. I'm not suggesting that these specially recognized dependencies have to be consistently explained by any underlying reference chain.

mdempsky · 2019-05-09T21:20:10Z

Actually, I withdraw my suggestion. I don't think gccgo and cmd/compile's static initialization can be saved that simply. In something like:

var x = /* side-effects F1, and hidden dependency on b */
var y = /* side-effects F2, and no dependencies */
var b = 42

if we allow compilers to recognize the dependency from x onto b to justify statically initializing b, then we'd also have to run the side effects F2 before F1. However, the current compilers schedule F1 first.

ianlancetaylor · 2019-05-09T21:28:24Z

My suggestion above is simply that if there is a hidden dependency of A upon B that is not detected by the algorithm in the spec, then the order of initialization of A and B is unspecified. I believe that will permit compilers to retain static initialization.

mdempsky · 2019-05-09T21:47:32Z

My suggestion above is simply that if there is a hidden dependency of A upon B that is not detected by the algorithm in the spec, then the order of initialization of A and B is unspecified.

Can you elaborate on what you mean by leaving it unspecified? In particular, how much do you argue should be unspecified?

E.g., suppose there's also X, Y, and Z, and the initialization order would be X, A, Y, B, Z under the current spec wording. To allow static initialization, we have to allow X, B, A, Y, Z. This is more than just A and B's initialization order being unspecified; it's also B and Y's being unspecified.

griesemer · 2019-05-09T22:08:07Z

I think we want to maintain the relative initialization order imposed by detected dependencies and simply leave the initialization order for other variables (between them) undefined. This may have consequences for other, unrelated variables and initialization expressions since the compiled initialization order orders these other variables because of source order.

I've sent out another attempt at the spec change to reflect that.

griesemer added the Documentation label Apr 5, 2019

griesemer added this to the Go1.13 milestone Apr 5, 2019

griesemer self-assigned this Apr 5, 2019

griesemer mentioned this issue Apr 5, 2019

cmd/compile: incorrect package initialization order for spec example #22326

Closed

gopherbot closed this as completed in 451cf3e May 13, 2019

golang locked and limited conversation to collaborators May 12, 2020

gopherbot added the FrozenDueToAge label May 12, 2020

rsc unassigned griesemer Jun 23, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

spec: clarify section on initialization order #31292

spec: clarify section on initialization order #31292

griesemer commented Apr 5, 2019 •

edited

mdempsky commented Apr 15, 2019

mdempsky commented Apr 15, 2019

ianlancetaylor commented May 8, 2019

alandonovan commented May 8, 2019

mdempsky commented May 8, 2019 •

edited

alandonovan commented May 8, 2019

mdempsky commented May 8, 2019 •

edited

griesemer commented May 8, 2019

alandonovan commented May 8, 2019

gopherbot commented May 8, 2019

griesemer commented May 8, 2019

mdempsky commented May 8, 2019 •

edited

ianlancetaylor commented May 8, 2019

mdempsky commented May 9, 2019

mdempsky commented May 9, 2019 •

edited

griesemer commented May 9, 2019

mdempsky commented May 9, 2019

griesemer commented May 9, 2019

mdempsky commented May 9, 2019

ianlancetaylor commented May 9, 2019

griesemer commented May 9, 2019

griesemer commented May 9, 2019 •

edited

mdempsky commented May 9, 2019

mdempsky commented May 9, 2019 •

edited

ianlancetaylor commented May 9, 2019

mdempsky commented May 9, 2019

griesemer commented May 9, 2019

spec: clarify section on initialization order #31292

spec: clarify section on initialization order #31292

Comments

griesemer commented Apr 5, 2019 • edited

mdempsky commented Apr 15, 2019

mdempsky commented Apr 15, 2019

ianlancetaylor commented May 8, 2019

alandonovan commented May 8, 2019

mdempsky commented May 8, 2019 • edited

alandonovan commented May 8, 2019

mdempsky commented May 8, 2019 • edited

griesemer commented May 8, 2019

alandonovan commented May 8, 2019

gopherbot commented May 8, 2019

griesemer commented May 8, 2019

mdempsky commented May 8, 2019 • edited

ianlancetaylor commented May 8, 2019

mdempsky commented May 9, 2019

mdempsky commented May 9, 2019 • edited

griesemer commented May 9, 2019

mdempsky commented May 9, 2019

griesemer commented May 9, 2019

mdempsky commented May 9, 2019

ianlancetaylor commented May 9, 2019

griesemer commented May 9, 2019

griesemer commented May 9, 2019 • edited

mdempsky commented May 9, 2019

mdempsky commented May 9, 2019 • edited

ianlancetaylor commented May 9, 2019

mdempsky commented May 9, 2019

griesemer commented May 9, 2019

griesemer commented Apr 5, 2019 •

edited

mdempsky commented May 8, 2019 •

edited

mdempsky commented May 8, 2019 •

edited

mdempsky commented May 8, 2019 •

edited

mdempsky commented May 9, 2019 •

edited

griesemer commented May 9, 2019 •

edited

mdempsky commented May 9, 2019 •

edited