Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/dist: s390x builder failing #32536

Closed
billotosyr opened this issue Jun 10, 2019 · 10 comments
Closed

cmd/dist: s390x builder failing #32536

billotosyr opened this issue Jun 10, 2019 · 10 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. release-blocker
Milestone

Comments

@billotosyr
Copy link

What version of Go are you using (go version)?

master tip

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

Linux s390x

What did you do?

What did you expect to see?

successful build

What did you see instead?

fail

@randall77
Copy link
Contributor

I'm working on it. At least, the stale dependency errors, maybe caused by stack-allocated defers.
It would be helpful for someone to look at the resource temporarily unavailable error.

@randall77
Copy link
Contributor

What I've learned so far.
It isn't gc, problem still occurs with GOGC=off
It isn't stack copying; making the initial stack very large doesn't change anything.
The problem is in the anonymous go'd function in cmd/compile/internal/gc/noder.go:parseFiles. Turning off all stack-allocated defers everywhere except this function still reproduces the problem.
Changing cmd/compile/internal/gc/noder.go:50 from defer f.Close() to defer func() { f.Close() } fixes it. Changing it to defer func() error { return f.Close() } also reproduces the problem.
I don't see any obvious problems with the generated assembly.
The only real difference between those first two defer statements is that the first requires 16 extra bytes for the unused error return. We seem to be reserving space for that correctly.

Puzzling. I'm going to continue working on this, will revert or patch out if I can't figure it out by EOD tomorrow.

@agnivade agnivade added the NeedsFix The path to resolution is known, but the work has not been done. label Jun 11, 2019
@agnivade agnivade added this to the Go1.13 milestone Jun 11, 2019
@mundaym
Copy link
Member

mundaym commented Jun 11, 2019

Interestingly the tests pass on my Ubuntu 18.04 VM. Last time I saw something OS-dependent like this it was because we had got an ABI transition slightly wrong (e.g. not saving a callee-save floating point register that happens to be needed on one OS but not the other). I suggest we hold off reverting, I think this may well be a latent bug. I'll keep investigating.

@mundaym
Copy link
Member

mundaym commented Jun 11, 2019

The problem is in the anonymous go'd function in cmd/compile/internal/gc/noder.go:parseFiles. Turning off all stack-allocated defers everywhere except this function still reproduces the problem.
Changing cmd/compile/internal/gc/noder.go:50 from defer f.Close() to defer func() { f.Close() } fixes it. Changing it to defer func() error { return f.Close() } also reproduces the problem.

I haven't been able to recreate this. I still see the problem even with all of the defers in parseFiles removed.

@randall77
Copy link
Contributor

I haven't been able to recreate this. I still see the problem even with all of the defers in parseFiles removed.

There may very well be other places where this bug occurs.
I turned off stack allocation of defers for all functions except this one.
Use this patch:

+++ b/src/cmd/compile/internal/gc/escape.go
@@ -6,7 +6,10 @@ package gc
 
 import (
        "cmd/compile/internal/types"
+       "crypto/sha1"
        "fmt"
+       "os"
+       "strings"
 )
 
 // Escape analysis.
@@ -882,7 +885,15 @@ func (e *Escape) augmentParamHole(k EscHole, where *Node) EscHole {
        // non-transient location to avoid arguments from being
        // transiently allocated.
        if where.Op == ODEFER && e.loopDepth == 1 {
-               where.Esc = EscNever // force stack allocation of defer record (see ssa.go)
+               // function name
+               hstr := ""
+               for _, b := range sha1.Sum([]byte(e.curfn.funcname())) {
+                       hstr += fmt.Sprintf("%08b", b)
+               }
+               if strings.HasSuffix(hstr, os.Getenv("GODEFERHASH")) {
+                       fmt.Printf("DEFER %s.%s\n", localpkg.Name, e.curfn.funcname())
+                       where.Esc = EscNever // force stack allocation of defer record (see ssa.go)
+               }
                // TODO(mdempsky): Eliminate redundant EscLocation allocs.
                return e.teeHole(k, e.newLoc(nil, false).asHole())
        }

And then run with GODEFERHASH=111000000.

Replacing that defer with defer func(a int) { return }(0) makes make.bash pass. Replacing it with defer func(a, b int) { return }(0,0) makes make.bash fail.
Will look more today.

@randall77
Copy link
Contributor

Some experiments I did suggest that the error occurs when a function has defers with different argument counts. Adding dummy args to make them all the same argsize seems to fix the issue.

Mike Munday suggested that maybe it's the bootstrap Go. We use a devel version on the builders circa somewhere between 1.5 and 1.6. Replacing the bootstrap with 1.12.5 fixes the issue.
Sounds plausible, but I'm not sure how a bug in the bootstrap leads to the issue in question.

Speaking of which, the issue occurs because cmd/dist runs checkNotStale (in cmd/dist/build.go), which does a go list on the toolchain binaries. It is expecting nothing (everything up to date) but instead it reports the binaries need to be recompiled. To me this suggests nondeterministic compiler output.

@randall77
Copy link
Contributor

I'm coming around to the bad bootstrap compiler theory.

If I add another round of bootstrap to cmd/dist/build.go:cmdbootstrap, everything now passes.

Mike, can you update the bootstrap on the (all of the?) s390x builders?

@mundaym
Copy link
Member

mundaym commented Jun 12, 2019

Mike, can you update the bootstrap on the (all of the?) s390x builders?

Done. It is now using Go 1.12.6.

I have a VM which is using Go 1.10.1 as a bootstrap and is also failing the same way suggesting that the underlying issue was 'fixed' relatively recently. I'll see if I can narrow it down to a specific commit.

@mundaym
Copy link
Member

mundaym commented Jun 12, 2019

Looks like the bootstrap starts working after CL 121495 which fixed the s390x assembly for strings.Compare and bytes.Compare. Which makes sense. Not sure what the interaction with your CL was Keith. Thanks for investigating it.

@billotosyr
Copy link
Author

Thank you all! It appears to be working now.

@golang golang locked and limited conversation to collaborators Jun 11, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. release-blocker
Projects
None yet
Development

No branches or pull requests

5 participants