Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: floating point optimization bug #13008

Closed
calmh opened this issue Oct 21, 2015 · 11 comments
Closed

cmd/compile: floating point optimization bug #13008

calmh opened this issue Oct 21, 2015 · 11 comments
Milestone

Comments

@calmh
Copy link
Contributor

calmh commented Oct 21, 2015

I'm hitting something really weird on go version devel +3bc0601 Tue Oct 20 03:16:09 2015 +0000 darwin/amd64. I don't have a small reproducer, so I'll just show you what I get and see what we can figure out from that. I have this method, that essentially just calculates a percentage:

// Completion returns the completion status, in percent, for the given device
// and folder.
func (m *Model) Completion(device protocol.DeviceID, folder string) float64 {
    m.fmut.RLock()
    rf, ok := m.folderFiles[folder]
    m.fmut.RUnlock()
    if !ok {
        return 0 // Folder doesn't exist, so we hardly have any of it
    }

    _, _, tot := rf.GlobalSize()
    if tot == 0 {
        return 100 // Folder is empty, so we have all of it
    }

    var need int64
    rf.WithNeedTruncated(device, func(f db.FileIntf) bool {
        need += f.Size()
        return true
    })

    needRatio := float64(need) / float64(tot)
    l.Debugln("needRatio:", needRatio)
    completionPct := 100 * (1 - needRatio)
    l.Debugln("completionPct:", completionPct)
    l.Debugf("%v Completion(%s, %q): %f (%d / %d = %f)", m, device, folder, completionPct, need, tot, needRatio)

    return completionPct
}

The relevant part for this report is the last part, needRatio := ... and so on. When run as it looks above, I get the following log output (which is correct and as expected):

[JMFJC] 2015/10/21 08:59:24.944322 logger.go:296: DEBUG: needRatio: 0
[JMFJC] 2015/10/21 08:59:24.944353 logger.go:296: DEBUG: completionPct: 100
[JMFJC] 2015/10/21 08:59:24.944470 logger.go:304: DEBUG: model@0xc820748600 Completion(373HSRP-QLPNLIE-JYKZVQF-P4PKZ63-R2ZE6K3-YD442U2-JHBGBQG-WWXAHAU, "s23"): 100.000000 (0 / 12296478 = 0.000000)

If I comment out two of the debug prints:

    needRatio := float64(need) / float64(tot)
    // l.Debugln("needRatio:", needRatio)
    completionPct := 100 * (1 - needRatio)
    // l.Debugln("completionPct:", completionPct)
    l.Debugf("%v Completion(%s, %q): %f (%d / %d = %f)", m, device, folder, completionPct, need, tot, needRatio)

    return completionPct
}

Then I get the following output:

[JMFJC] 2015/10/21 09:00:31.222311 logger.go:304: DEBUG: model@0xc820787400 Completion(373HSRP-QLPNLIE-JYKZVQF-P4PKZ63-R2ZE6K3-YD442U2-JHBGBQG-WWXAHAU, "s23"): 301295835217774195877081356294600623999623875498227347264658436707288007434290836880071246344289875456553039631746301745699977654284466935487418350473778212994407067832680206835633320817493452436173546848256.000000 (0 / 12296478 = 0.000000)

Note that completionPct has taken on a bizarre value - uninitialized? Printing it separately on the line before fixes this...

@minux
Copy link
Member

minux commented Oct 21, 2015

For compiler bugs, we will need a reproducer,
and preferably a small one.

Note that debugging prints could also affect
the scheduler, so please also make sure that
the program is not racy and don't depend on
some specific goroutine scheduling decisions.

One way to see if it's related to the optimizer
is that you can compile the program without
optimization:
go build -gcflags "-N -l" program should do
the trick.

@davecheney
Copy link
Contributor

Can you please build with -gcflags="-N", if that fixes the problem, then it may be an inlining bug.

On 21 Oct 2015, at 18:02, Jakob Borg notifications@github.com wrote:

I'm hitting something really weird on go version devel +3bc0601 Tue Oct 20 03:16:09 2015 +0000 darwin/amd64. I don't have a small reproducer, so I'll just show you what I get and see what we can figure out from that. I have this method, that essentially just calculates a percentage:

// Completion returns the completion status, in percent, for the given device
// and folder.
func (m *Model) Completion(device protocol.DeviceID, folder string) float64 {
m.fmut.RLock()
rf, ok := m.folderFiles[folder]
m.fmut.RUnlock()
if !ok {
return 0 // Folder doesn't exist, so we hardly have any of it
}

_, _, tot := rf.GlobalSize()
if tot == 0 {
    return 100 // Folder is empty, so we have all of it
}

var need int64
rf.WithNeedTruncated(device, func(f db.FileIntf) bool {
    need += f.Size()
    return true
})

needRatio := float64(need) / float64(tot)
l.Debugln("needRatio:", needRatio)
completionPct := 100 * (1 - needRatio)
l.Debugln("completionPct:", completionPct)
l.Debugf("%v Completion(%s, %q): %f (%d / %d = %f)", m, device, folder, completionPct, need, tot, needRatio)

return completionPct

}
The relevant part for this report is the last part, needRatio := ... and so on. When run as it looks above, I get the following log output (which is correct and as expected):

[JMFJC] 2015/10/21 08:59:24.944322 logger.go:296: DEBUG: needRatio: 0
[JMFJC] 2015/10/21 08:59:24.944353 logger.go:296: DEBUG: completionPct: 100
[JMFJC] 2015/10/21 08:59:24.944470 logger.go:304: DEBUG: model@0xc820748600 Completion(373HSRP-QLPNLIE-JYKZVQF-P4PKZ63-R2ZE6K3-YD442U2-JHBGBQG-WWXAHAU, "s23"): 100.000000 (0 / 12296478 = 0.000000)
If I comment out two of the debug prints:

needRatio := float64(need) / float64(tot)
// l.Debugln("needRatio:", needRatio)
completionPct := 100 * (1 - needRatio)
// l.Debugln("completionPct:", completionPct)
l.Debugf("%v Completion(%s, %q): %f (%d / %d = %f)", m, device, folder, completionPct, need, tot, needRatio)

return completionPct

}
Then I get the following output:

[JMFJC] 2015/10/21 09:00:31.222311 logger.go:304: DEBUG: model@0xc820787400 Completion(373HSRP-QLPNLIE-JYKZVQF-P4PKZ63-R2ZE6K3-YD442U2-JHBGBQG-WWXAHAU, "s23"): 301295835217774195877081356294600623999623875498227347264658436707288007434290836880071246344289875456553039631746301745699977654284466935487418350473778212994407067832680206835633320817493452436173546848256.000000 (0 / 12296478 = 0.000000)
Note that completionPct has taken on a bizarre value - uninitialized? Printing it separately on the line before fixes this...


Reply to this email directly or view it on GitHub.

@calmh
Copy link
Contributor Author

calmh commented Oct 21, 2015

Building with -gcflags="-N" makes the problem go away, yes. The program is clean according to -race.

But it's also 30k LOC plus dependencies, so it's a bit crap as a reproducer. Obviously, doing just the code above in a go run doesn't reproduce it.

Go 1.5.1 also doesn't exhibit this issue.

@ianlancetaylor ianlancetaylor changed the title Optimizer bug? cmd/compile: Optimizer bug? Oct 21, 2015
@ianlancetaylor ianlancetaylor added this to the Go1.6 milestone Oct 21, 2015
@rsc
Copy link
Contributor

rsc commented Nov 4, 2015

Please try with -l (ell) but not -N.

It's possible this is a dup of #12411 but I expected that one had something to do with the deep nesting. There's no nesting here.

I tried starting with your example code and defining enough scaffolding to make it compile, and then I commented out the same prints and looked for a mishandling of completionPct, but I don't see it. It would help a bit if you could try building both the buggy and the non-buggy versions of the code with -S (that is, -gcflags -S), redirecting to separate files. The output is a sequence of assembly listings, one per function, each starting with an unindented line giving the function name. If you could send the disassembly of just the Completion method with and without the bug, that would be helpful. For example you might see a file like

...
"".(*Model).Completion t=1 size=1744 value=0 args=0x28 locals=0xf0
    0x0000 00000 (x.go:28)  TEXT    "".(*Model).Completion(SB), $240-40
    0x0000 00000 (x.go:28)  MOVQ    (TLS), CX
    0x0009 00009 (x.go:28)  LEAQ    -112(SP), AX
    0x000e 00014 (x.go:28)  CMPQ    AX, 16(CX)
...

and I'm asking for the part starting at the TEXT line until the section for the next function in the file.

Thanks.

@rsc rsc changed the title cmd/compile: Optimizer bug? cmd/compile: floating point optimization bug Nov 4, 2015
@calmh
Copy link
Contributor Author

calmh commented Nov 4, 2015

Here: https://gist.github.com/calmh/dccbd2e6d4d511067dea

Of those, both the variants with -l and without any option exhibit the bad behavior, only the -N version is good.

@rsc
Copy link
Contributor

rsc commented Nov 10, 2015

Thanks for the assembly. Please try the most recent version of the toolchain. Looking at the bad code, I believe this was fixed by https://go.googlesource.com/go/+/0bef88aa5fa5d8f92e2821d9df7c37caaac432dd on Sunday.

@randall77
Copy link
Contributor

The relevant fix is this one:

ffb2063

On Tue, Nov 10, 2015 at 11:22 AM, Russ Cox notifications@github.com wrote:

Thanks for the assembly. Please try the most recent version of the
toolchain. Looking at the bad code, I believe this was fixed by
https://go.googlesource.com/go/+/0bef88aa5fa5d8f92e2821d9df7c37caaac432dd
on Sunday.


Reply to this email directly or view it on GitHub
#13008 (comment).

@rsc rsc closed this as completed Dec 5, 2015
@calmh
Copy link
Contributor Author

calmh commented Dec 5, 2015

Hmm, so following the tags and commits around it looks like this is fixed in 1.6 and still broken in 1.5.2?

@calmh
Copy link
Contributor Author

calmh commented Dec 5, 2015

On the other hand, I can't seem to reproduce it with 1.5.2 so either something changed on my end to no longer trigger or I'm wrong. Either way, that's good.

@cespare
Copy link
Contributor

cespare commented Dec 5, 2015

@calmh wasn't this bug both introduced and fixed after 1.5? (Note that point releases comprise selected changes cherry-picked onto the original release; they are not cut from master.)

@calmh
Copy link
Contributor Author

calmh commented Dec 6, 2015

Yes, I'm super confused, sorry for the noise.

@golang golang locked and limited conversation to collaborators Dec 14, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

8 participants