Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: -race instrumentation leads to different behaviour due to 16 to 32-bit expansion #21963

Closed
martelletto opened this issue Sep 21, 2017 · 9 comments
Labels
CherryPickApproved Used during the release process for point releases FrozenDueToAge release-blocker
Milestone

Comments

@martelletto
Copy link

What version of Go are you using (go version)?

go version devel +eca45997df Thu Sep 21 03:00:51 2017 +0000 linux/amd64

Does this issue reproduce with the latest release?

Yes; in fact it does not happen with earlier releases.

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/pedro/go"
GORACE=""
GOROOT="/home/pedro/go-current"
GOTOOLDIR="/home/pedro/go-current/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build492614889=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"

What did you do?

Hi,

I have noticed a rather interesting phenomenon. Before expanding on it,
I would like to apologize in advance for my lack of familiarity with Go,
which is most likely at the root of the problem.

When compiled with Go >= 1.9 and -race, the behaviour of the function
rq.Mult changes. The problem cannot be reproduced with Go 1.7.6 or
1.8.3, with or without -race, or with Go 1.9 without -race. All the
information in this report relates to a x86_64 Arch Linux installation
running go version devel +eca45997df linux/amd64.

The issue can be reproduced by cloning the sntrup4591761 repository
and using the provided keygen.go tool to generate a public/private
key pair with a fixed randomness source:

$ git clone https://github.com/companyzero/sntrup4591761/
$ curl https://ambientworks.net/tmp/r > /tmp/r # 64kB worth of random bytes
$ go build examples/keygen/keygen.go
$ ./keygen /tmp/r /tmp/x0 /tmp/y0
$ sha256sum /tmp/[xy]0
a1f9db8f41d4a87b5464414dc6042e55a4803d9345c247317c959278a79c4e50 /tmp/x0
fba5a885bb31fd7845a893e48cce95e267c31d068a14de6886c1843d2480c4b5 /tmp/y0
$ go build -race examples/keygen/keygen.go
$ ./keygen /tmp/r /tmp/x1 /tmp/y1
$ sha256sum /tmp/[xy]1
699680a059bdf65ac98fff6a6dce62583297e2db56360a9d5d5bf8a730db25ea /tmp/x1
049758eb6ff0c63d0e4018787d17540ee56dbebe363d54c5218990ac01dd0e0c /tmp/y1

I tracked down the problem to the first multiplication in
modq.PlusProduct, which happens with A=0, B=-235, and C=-1. Without
-race, this is the prelude leading to the first multiplication in
rq.Mult:

Dump of assembler code for function github.com/companyzero/sntrup4591761/rq.Mult:
=> 0x48fa10 <+0>:       mov    %fs:0xfffffffffffffff8,%rcx
   0x48fa19 <+9>:       lea    -0xb70(%rsp),%rax
   0x48fa21 <+17>:      cmp    0x10(%rcx),%rax
   0x48fa25 <+21>:      jbe    0x48fcf9 <github.com/companyzero/sntrup4591761/rq.Mult+745>
   0x48fa2b <+27>:      sub    $0xbf0,%rsp
   0x48fa32 <+34>:      mov    %rbp,0xbe8(%rsp)
   0x48fa3a <+42>:      lea    0xbe8(%rsp),%rbp
   0x48fa42 <+50>:      movq   $0x0,0x6(%rsp)
   0x48fa4b <+59>:      lea    0x8(%rsp),%rdi
   0x48fa50 <+64>:      mov    $0x17c,%ecx
   0x48fa55 <+69>:      xor    %eax,%eax
   0x48fa57 <+71>:      rep stos %rax,%es:(%rdi)
   0x48fa5a <+74>:      mov    0xc00(%rsp),%rdx
   0x48fa62 <+82>:      mov    0xc08(%rsp),%rbx
   0x48fa6a <+90>:      xor    %eax,%eax
   0x48fa6c <+92>:      jmpq   0x48fb03 <github.com/companyzero/sntrup4591761/rq.Mult+243>
   0x48fa71 <+97>:      lea    0x1(%rcx),%r8
   0x48fa75 <+101>:     movzwl (%rdx,%rcx,2),%r9d
   0x48fa7a <+106>:     movswq %si,%r10
   0x48fa7e <+110>:     movswq %r9w,%r9
   0x48fa82 <+114>:     imul   %r9d,%eax

The first multiplication goes:

   0x48fa75 <+101>:     movzwl (%rdx,%rcx,2),%r9d
   0x48fa7a <+106>:     movswq %si,%r10
   0x48fa7e <+110>:     movswq %r9w,%r9
   0x48fa82 <+114>:     imul   %r9d,%eax

At 0x48fa82, r9d and eax contain:

(gdb) i r r9d
r9d            0xffffff15       -235
(gdb) i r eax
eax            0xffffffff       -1
(gdb) si
(gdb) i r eax
eax            0xeb     235

With -race, the corresponding rq.Mult text reads:

Dump of assembler code for function github.com/companyzero/sntrup4591761/rq.Mult:
=> 0x4dfa60 <+0>:       mov    %fs:0xfffffffffffffff8,%rcx
   0x4dfa69 <+9>:       lea    -0xbd8(%rsp),%rax
   0x4dfa71 <+17>:      cmp    0x10(%rcx),%rax
   0x4dfa75 <+21>:      jbe    0x4dff6a <github.com/companyzero/sntrup4591761/rq.Mult+1290>
   0x4dfa7b <+27>:      sub    $0xc58,%rsp
   0x4dfa82 <+34>:      mov    %rbp,0xc50(%rsp)
   0x4dfa8a <+42>:      lea    0xc50(%rsp),%rbp
   0x4dfa92 <+50>:      mov    0xc58(%rsp),%rax
   0x4dfa9a <+58>:      mov    %rax,(%rsp)
   0x4dfa9e <+62>:      callq  0x47c8d0 <runtime.racefuncenter>
   0x4dfaa3 <+67>:      movq   $0x0,0x4e(%rsp)
   0x4dfaac <+76>:      lea    0x50(%rsp),%rdi
   0x4dfab1 <+81>:      mov    $0x17c,%ecx
   0x4dfab6 <+86>:      xor    %eax,%eax
   0x4dfab8 <+88>:      rep stos %rax,%es:(%rdi)
   0x4dfabb <+91>:      xor    %eax,%eax
   0x4dfabd <+93>:      jmpq   0x4dfbce <github.com/companyzero/sntrup4591761/rq.Mult+366>
   0x4dfac2 <+98>:      mov    %cx,0x12(%rsp)
   0x4dfac7 <+103>:     mov    0xc68(%rsp),%rax
   0x4dfacf <+111>:     lea    (%rax,%rdx,2),%rcx
   0x4dfad3 <+115>:     mov    %rcx,(%rsp)
   0x4dfad7 <+119>:     callq  0x47c770 <runtime.raceread>
   0x4dfadc <+124>:     mov    0xc68(%rsp),%rax
   0x4dfae4 <+132>:     test   %al,(%rax)
   0x4dfae6 <+134>:     mov    0x18(%rsp),%rcx
   0x4dfaeb <+139>:     lea    0x1(%rcx),%rdx
   0x4dfaef <+143>:     movzwl (%rax,%rcx,2),%ecx
   0x4dfaf3 <+147>:     movzwl 0x10(%rsp),%ebx
   0x4dfaf8 <+152>:     movswq %bx,%rbx
   0x4dfafc <+156>:     movswq %cx,%rcx
   0x4dfb00 <+160>:     movzwl 0x12(%rsp),%esi
   0x4dfb05 <+165>:     imul   %ecx,%esi

Following the execution flow from 0x4dfac2:

(gdb) display/i $pc
1: x/i $pc
=> 0x4dfac2 <github.com/companyzero/sntrup4591761/rq.Mult+98>:  mov    %cx,0x12(%rsp)
(gdb) i r cx
cx             0xffff   -1
(gdb) x/x $rsp+0x12
0xc4200abcaa:   0x00000000
(gdb) si
1: x/i $pc
=> 0x4dfac7 <github.com/companyzero/sntrup4591761/rq.Mult+103>: mov    0xc68(%rsp),%rax
(gdb) x/x $rsp+0x12
0xc4200abcaa:   0x0000ffff
(gdb) b *0x4dfb00
(gdb) c
1: x/i $pc
=> 0x4dfb00 <github.com/companyzero/sntrup4591761/rq.Mult+160>: movzwl 0x12(%rsp),%esi
(gdb) i r ecx
ecx            0xffffff15       -235
(gdb)  x/x $rsp+0x12
0xc4200abcaa:   0x0000ffff
(gdb) si
1: x/i $pc
=> 0x4dfb05 <github.com/companyzero/sntrup4591761/rq.Mult+165>: imul   %ecx,%esi
(gdb) i r ecx
ecx            0xffffff15       -235
(gdb) i r esi
esi            0xffff   65535
(gdb) si
1: x/i $pc
=> 0x4dfb08 <github.com/companyzero/sntrup4591761/rq.Mult+168>: lea    (%rsi,%rbx,1),%ecx
(gdb) i r esi
esi            0xff1500eb       -15400725

At a first moment, cx holds -1 and is placed on the stack as a 16-bit
value. After the call to runtime.raceread, cx's previous contents are
retrieved from the stack, expanded from 16 to 32 bits, and placed into
esi. This expansion does not preserve the value's signedness, and -1
becomes 65535, which causes the subsequent multiplication to result in
-15400725 instead of 235.

I gave Go's source code a quick look, and nailed down the generation of
that particular movzwl instruction to loadByType. If I adjust
loadByType with the diff below, then go -race emits movswl instead of
movzwl, and keygen compiled with -race produces the expected output.

diff --git src/cmd/compile/internal/amd64/ssa.go src/cmd/compile/internal/amd64/ssa.go
index 8c92f07320..3399de1c46 100644
--- src/cmd/compile/internal/amd64/ssa.go
+++ src/cmd/compile/internal/amd64/ssa.go
@@ -45,7 +45,11 @@ func loadByType(t *types.Type) obj.As {
                if t.Size() == 1 {
                        return x86.AMOVBLZX
                } else {
-                       return x86.AMOVWLZX
+                       if t.IsSigned() {
+                               return x86.AMOVWLSX
+                       } else {
+                               return x86.AMOVWLZX
+                       }
                }
        }
        // Otherwise, there's no difference between load and store opcodes.

Please note that I am not suggesting the diff above as a fix, but just
as an additional data point. I am not sure whether the problem is in my
code (could it be relying on undefined behaviour?), or elsewhere. Any
input on this would be very much appreciated. Thank you for your time
and attention.

-p.

@mdempsky mdempsky changed the title -race instrumentation leads to different behaviour due to 16 to 32-bit expansion cmd/compile: -race instrumentation leads to different behaviour due to 16 to 32-bit expansion Sep 21, 2017
@randall77
Copy link
Contributor

Thanks for the detailed bug report.

I can reproduce the problem. Mostly to record this for myself, do:

GOSSAFUNC=Mult go build -a -race examples/keygen/keygen.go

Near the top of the listing there is:

v62	00029 (/Users/khr/gopath/src/github.com/companyzero/sntrup4591761/rq/rq.go:80)	MOVWLZX	""..autotmp_84-3134(SP), SI
v72	00030 (/Users/khr/gopath/src/github.com/companyzero/sntrup4591761/rq/rq.go:80)	IMULL	CX, SI

An unsigned 16-bit load followed by a signed multiply, no extension between.
Repros on 1.9 and tip. Only with -race.

I have no idea why -race is relevant.
I will investigate some more.

@randall77 randall77 self-assigned this Sep 21, 2017
@randall77 randall77 added this to the Go1.9.1 milestone Sep 21, 2017
@martelletto
Copy link
Author

Hi Keith,

My (otherwise unsubstantiated) guess is that -race causes a temporary
variable to be allocated on the stack, so that rcx can be reused for
the call to runtime.raceread.

-p.

@randall77
Copy link
Contributor

Yes, I think the bug is not -race specific. It's a bug in spill/restore, and -race just causes more runtime calls -> more spills.

@randall77
Copy link
Contributor

Small repro:

package main

import (
	"fmt"
	"runtime"
)

//go:noinline
func f(x []int32, y *int8) int32 {
	c := int32(int16(*y))
	runtime.GC()
	return x[0] * c
}

func main() {
	var x = [1]int32{5}
	var y int8 = -1
	if got, want := f(x[:], &y), int32(-5); got != want {
		panic(fmt.Sprintf("wanted %d, got %d", want, got))
	}
}

Prints

panic: wanted -5, got 327675

@gopherbot
Copy link

Change https://golang.org/cl/65290 mentions this issue: cmd/compile: fix sign-extension merging rules

@randall77
Copy link
Contributor

Reopening for cherry pick into 1.9.1.

@rsc
Copy link
Contributor

rsc commented Oct 13, 2017

CL 65290 OK for Go 1.9.2.
CL 70986 OK for Go 1.9.2. (fixed merge conflicts)

@rsc rsc added the CherryPickApproved Used during the release process for point releases label Oct 14, 2017
@gopherbot
Copy link

Change https://golang.org/cl/70986 mentions this issue: cmd/compile: fix sign-extension merging rules

gopherbot pushed a commit that referenced this issue Oct 25, 2017
If we have

  y = <int16> (MOVBQSX x)
  z = <int32> (MOVWQSX y)

We used to use this rewrite rule:

(MOVWQSX x:(MOVBQSX _)) -> x

But that resulted in replacing z with a value whose type
is only int16.  Then if z is spilled and restored, it gets
zero extended instead of sign extended.

Instead use the rule

(MOVWQSX (MOVBQSX x)) -> (MOVBQSX x)

The result is has the correct type, so it can be spilled
and restored correctly.  It might mean that a few more extension
ops might not be eliminated, but that's the price for correctness.

Fixes #21963

Change-Id: I6ec82c3d2dbe43cc1fee6fb2bd6b3a72fca3af00
Reviewed-on: https://go-review.googlesource.com/65290
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-on: https://go-review.googlesource.com/70986
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
@rsc
Copy link
Contributor

rsc commented Oct 26, 2017

go1.9.2 has been packaged and includes:

The release is posted at golang.org/dl.

— golang.org/x/build/cmd/releasebot, Oct 26 21:09:17 UTC

@rsc rsc closed this as completed Oct 26, 2017
@golang golang locked and limited conversation to collaborators Oct 26, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
CherryPickApproved Used during the release process for point releases FrozenDueToAge release-blocker
Projects
None yet
Development

No branches or pull requests

5 participants