Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spec: clarify section on initialization order #31292

Closed
griesemer opened this issue Apr 5, 2019 · 27 comments
Closed

spec: clarify section on initialization order #31292

griesemer opened this issue Apr 5, 2019 · 27 comments

Comments

@griesemer
Copy link
Contributor

griesemer commented Apr 5, 2019

See #22326 for a discussion of a specific example and how the spec might be unclear. See specifically #22326 (comment) .

Issues with the section on initialization order:

  • The very first sentence may imply an algorithm that is different from the one explained by 2nd, more detailed paragraph.
  • The section talks about declaration order, but variables named _ are "never declared" (yet they are processed for initialization order like any other variable).
  • The section could use one or two more examples.
@mdempsky
Copy link
Member

While we wait for a spec CL to improve the wording / examples, I think it's worth at least ensuring we're in agreement about the intended understanding.

  1. Are we agreed upon the total ordering* implied by the algorithm specified in the second paragraph? (* I think it's a total ordering at least.)

  2. Are we agreed that var _ = ... and var uniqueUnusedName = ... should behave identically?

If agreed on both, based on fuzzing I've done, I think go/types and gccgo are both spec compliant, and I have a revision for CL 170062 to bring it into compliance as well.

@mdempsky
Copy link
Member

/cc @ianlancetaylor

@ianlancetaylor
Copy link
Contributor

I agree with those two points.

@alandonovan
Copy link
Contributor

Me too.

@mdempsky
Copy link
Member

mdempsky commented May 8, 2019

Thanks.

One other thing I was wanted confirmation on: the spec talks about initializing one variable at a time. How is this supposed to interact with statements that initialize multiple variables with a single multi-value RHS expression?

For example, consider this program:

package main

import "fmt"

var w = getB()
var x = []int{a, getB()}

var a, b = func() (int, int) { return 271, 314 }()

// getB returns the value of b without depending on it according to
// package initialization dependency rules.
func getB() int { return I(T{}).M() }
type T struct{}
func (T) M() int { return b }
type I interface{ M() int }

func main() {
	fmt.Println(w)
	fmt.Println(x)
	fmt.Println(b)
}

I had planned to make this program print:

0
[271, 314]
314

That is, that a and b get initialized at the same time before going back to initialize x.

But the spec says that variables should be initialized one at a time, and x doesn't actually depend on b, so the spec behavior suggests that the initialization order should be w, a, x, b and that the second line of output should be [271, 0] instead of [271, 314].

I think go/types would print [271, 314]. However, gccgo does print [271, 0].

@alandonovan
Copy link
Contributor

I agree the wording should be more precise in the case of assignments var x, y = f(), but the intent is pretty clear: you get the same outcome whether you treat it like a single assignment of a compound variable (var xy struct { x X; y Y } = f()) or a sequence of assignments with an explicit intermediary: var tmpX, tmpY = f(); var x = tmpX; var y = tmpY.

@mdempsky
Copy link
Member

mdempsky commented May 8, 2019

@alandonovan I'm not sure I follow, and I think compound variable vs explicit intermediaries actually require different output.

Do you mind specifying what the right output you think should be for the program I posted? That is [217, 0] or [217, 314]?

@griesemer
Copy link
Contributor Author

go/types does the following analysis, matching what @mdempsky stated (specifically, getB has no dependencies!):

Computing initialization order for package main ("main")

Object dependency graph:
	main depends on
		w
		x
		b
	w depends on
		getB
	x depends on
		a
		getB
	a has no dependencies
	M depends on
		b
	b has no dependencies
	getB has no dependencies

Transposed object dependency graph (functions eliminated):
	w depends on 0 nodes
		main is dependent
	b depends on 0 nodes
		main is dependent
		M is dependent
	a depends on 0 nodes
		x is dependent
	x depends on 1 nodes
		main is dependent

Processing nodes:
	w (src pos 1) depends on 0 nodes now
	a (src pos 3) depends on 0 nodes now
	x (src pos 2) depends on 0 nodes now
	b (src pos 4) depends on 0 nodes now

Initialization order:
	w = getB()
	a, b = (func() (int, int) literal)()
	x = ([]int literal)

(running go test -run Check -files=x.go with x.go the respective package and with the debug flag enabled in go/types/initorder.go).

From the initialization order I conclude that the output would be:

0
[271 314]
314

which happens to match cmd/compile for this case.

But from the processing nodes list one might expect the output to print [271, 0] like gccgo.

For the case where we have an n:n initialization assignment we can reduce it to n 1:1 initialization assignments (and the usual rules apply).

For the n:1 case (as here), it seems sensible to say that initialization any of the variables on the LHS leads to the initialization of all variables on the LHS at the same time. I think this is what a user would expect to happen - we don't think of the RHS values being stored in temporaries and assigned to the LHS "as needed".

But it is definitively unclear, I believe, and I can't provide a spec-based rationale for one or the other at the moment. We probably need to state current behavior.

@alandonovan
Copy link
Contributor

@mdempsky Sorry, you're quite right, desugaring into a sequence of assignments with an explicit intermediary does give the [217, 0] behavior observed with gccgo. (gri and I spent days scratching our heads over this problem before he came up with the breadth-first formulation and still I get confused.)

I agree with gri that it would be preferable that the output be [217, 314]: the assignments 'var x, y = f()' should not be observable as separate events.

@gopherbot
Copy link

Change https://golang.org/cl/175980 mentions this issue: spec: clarify language on package-level variable initialization

@griesemer
Copy link
Contributor Author

FWIW, the go/types code contains the following comment:

// n:1 variable declarations such as: a, b = f()
// introduce a node for each lhs variable (here: a, b);
// but they all have the same initializer - emit only
// one, for the first variable seen

explicitly addressing the n:1 case.

@mdempsky
Copy link
Member

mdempsky commented May 8, 2019

@griesemer Thanks for go/types analysis.

Also, thanks for bringing up the n:n assignment case. I think this is uncontroversial (both understood and implemented by compilers), but I'll go ahead and state it for posterity: I believe var a, b = f(), g() should be identical to var a = f(); var b = g().

@alandonovan No problem. I've been sufficiently confused about initialization order myself too. :)

It seems like the three of us (@griesemer, @alandonovan, and myself) all tend to think that n:1 assignments should initialize all of the LHS at once, and not as observably separate events.

@ianlancetaylor Can you comment on whether gccgo intentionally prints [217, 0] for the above test case, or whether you think this is an unintended output? Would you agree with changing its output to [217, 314]?

@ianlancetaylor
Copy link
Contributor

gccgo intentionally prints [217, 0], since that is my reading of the current spec ("repeatedly initializing the next package-level variable that is earliest in declaration order" doesn't say anything about "but if the variables are initialized from the same expression then do them all at once").

I'm OK with changing it.

@mdempsky
Copy link
Member

mdempsky commented May 9, 2019

Here's another fun example exploiting package dependency orders, this time demonstrating its interaction with static initialization:

package main

import "fmt"

var x = getAB()
var a = 42
var b = a

func main() {
	fmt.Println(x)
}

// getAB returns {a, b} without depending on them per
// package initialization dependency rules.
func getAB() interface{} { return I(T{}).M() }
type T struct{}
func (T) M() interface{} { return []int{a, b} }
type I interface{ M() interface{} }

The Go spec requires that x be initialized before a or b, so this should print [0 0]. This is also the initialization order that go/types produces.

However, gccgo statically initializes a, causing the program to print [42 0].

cmd/compile additionally statically initializes b, causing the program to print [42 42].

@mdempsky
Copy link
Member

mdempsky commented May 9, 2019

I think it would be really unfortunate if we have to abandon static initialization, and I think it would be pretty complex and error prone to try to keep track of when static initialization is safe or not.

One possible spec solution would be to tweak the dependency rules to allow compilers to detect dependencies on additional variables that evaluating the initialization expression accesses. E.g., evaluating getAB() accesses both a and b, so we could allow compilers to additionally recognize that x depends on a and/or b.

In practice compilers probably wouldn't actively search for these additional dependencies, but it would make it valid for gccgo and cmd/compile to continue statically initializing a and/or b like they do today. gccgo's output is explained "as if" gccgo recognized a dependency from x on a, and cmd/compile's output is explained "as if" cmd/compile recognized dependencies on both a and b.

@griesemer
Copy link
Contributor Author

Interesting. @ianlancetaylor points out that the problem here is a dependency that the compiler cannot detect (per the stated rules). Of course, for this specific example, one could probably add complicating rules to detect the dependency, but in general we can always create an analogous example using functions of an imported package, at which point all bets are off. Instead, Ian suggests that we say something like this: "If a data dependency exists that is not detected according to the currently stated rules, initialization order is undefined".

Such a catch-all rule would allow different compilers to do whatever they are doing now while produce consistent results in "normal" situations.

@mdempsky
Copy link
Member

mdempsky commented May 9, 2019

Instead, Ian suggests that we say something like this: "If a data dependency exists that is not detected according to the currently stated rules, initialization order is undefined".

I think this is like what I suggested above, except I'm suggesting a more limited form of "undefined": just that the additional dependencies outside the stated rules may or may not be respected.

Making all of initialization order undefined seems unnecessary to me.

@griesemer
Copy link
Contributor Author

Given your example above, the "additional" dependency outside the stated rules is that I.M depends on T.M (which depends on a and b, with this dependency being detected by the rules). So there are two cases: 1) The additional dependency is respected, or 2) not respected.

With this we get for 1) x = [42, 42], and for 2) x = [0, 0].

The choice gccgo is making is not included in this list. I assume we would consider that a bug?

@mdempsky
Copy link
Member

mdempsky commented May 9, 2019

@griesemer My suggestion is that "x depends on a" and "x depends on b" are two separate dependencies that compilers may or may not respect, independently of each other. The current gccgo behavior can be explained as gccgo "choosing" to recognize the dependency on a, but not recognizing the dependency on b.

@ianlancetaylor
Copy link
Contributor

Personally I would rather keep compiler dependency detection fixed and as simple as we can manage, rather than having different compilers implement different algorithms.

@griesemer
Copy link
Contributor Author

@mdempsky Understood. But we do have (stated by the existing rules) the following dependencies: x -> getAB -> I.M, T.M -> a, b, and b -> a. Your suggested rule permits a compiler to ignore the dynamic dependency I.M -> T.M. It doesn't allow the compiler to ignore the dependency T.M -> a, b.

@griesemer
Copy link
Contributor Author

griesemer commented May 9, 2019

I should perhaps have said "references" rather than "dependencies" because dependency detection is based on references: a -> b means that there is chain of references leading from a to b.

@mdempsky
Copy link
Member

mdempsky commented May 9, 2019

@ianlancetaylor I want it to be simple too, but my concern is that the simple implementation choice is going to mean punting a lot of static initializations like var a = 42 into dynamic initialization, which will bloat executables and slow process startup. That's likely to be only a small negative impact, but it's likely to affect a lot of users and programs, so I worry at the large scale it's a measurable negative impact.

Conversely, making package initialization slightly more complex in these tricky cases is unlikely to be noticed by users, but gives compilers enough leeway to still perform aggressive static initialization.

@griesemer Yes, I think the references vs dependencies distinction is what I'm getting at. I'm suggesting that if evaluating x's static initializer dynamically accesses a and b, then the compiler is free to recognize a dependency from variable x to variables a and/or b. I'm not suggesting that these specially recognized dependencies have to be consistently explained by any underlying reference chain.

@mdempsky
Copy link
Member

mdempsky commented May 9, 2019

Actually, I withdraw my suggestion. I don't think gccgo and cmd/compile's static initialization can be saved that simply. In something like:

var x = /* side-effects F1, and hidden dependency on b */
var y = /* side-effects F2, and no dependencies */
var b = 42

if we allow compilers to recognize the dependency from x onto b to justify statically initializing b, then we'd also have to run the side effects F2 before F1. However, the current compilers schedule F1 first.

@ianlancetaylor
Copy link
Contributor

My suggestion above is simply that if there is a hidden dependency of A upon B that is not detected by the algorithm in the spec, then the order of initialization of A and B is unspecified. I believe that will permit compilers to retain static initialization.

@mdempsky
Copy link
Member

mdempsky commented May 9, 2019

My suggestion above is simply that if there is a hidden dependency of A upon B that is not detected by the algorithm in the spec, then the order of initialization of A and B is unspecified.

Can you elaborate on what you mean by leaving it unspecified? In particular, how much do you argue should be unspecified?

E.g., suppose there's also X, Y, and Z, and the initialization order would be X, A, Y, B, Z under the current spec wording. To allow static initialization, we have to allow X, B, A, Y, Z. This is more than just A and B's initialization order being unspecified; it's also B and Y's being unspecified.

@griesemer
Copy link
Contributor Author

I think we want to maintain the relative initialization order imposed by detected dependencies and simply leave the initialization order for other variables (between them) undefined. This may have consequences for other, unrelated variables and initialization expressions since the compiled initialization order orders these other variables because of source order.

I've sent out another attempt at the spec change to reflect that.

@golang golang locked and limited conversation to collaborators May 12, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants