|
|
Descriptionliblink: fix bad code generated for MOVFD/MOVDF when reg > 7
The byte that r is or'd into is already 0x7, so the failure to zero r only
impacts the generated machine code if the register is > 7.
Fixes issue 7044.
Patch Set 1 #Patch Set 2 : diff -r fb17c23eee74 https://code.google.com/p/go #Patch Set 3 : diff -r fb17c23eee74 https://code.google.com/p/go #
Total comments: 3
Patch Set 4 : diff -r fb17c23eee74 https://code.google.com/p/go #Patch Set 5 : diff -r ec6ee3cf226b https://code.google.com/p/go #
MessagesTotal messages: 20
Hello dave@cheney.net (cc: golang-codereviews@googlegroups.com), I'd like you to review this change to https://code.google.com/p/go
Sign in to reply to this message.
Thanks for taking a look at this Josh. I'll do some testing on arm5 and arm7 machines later today. This might also need testing on armv6 machines as they do not have a VFPv3 floating point unit, and so I *think* only have 1/2 the number of double precision fp registers. wrt, your comment that the compiler isn't making good use of the available registers, maybe this is just because the compiler hasn't hit a situation like this before. From the test you wrote, it has to be a pretty contrived bit of code before we can have that many live fp registers. On Tue, Mar 11, 2014 at 8:12 AM, <josharian@gmail.com> wrote: > Reviewers: dfc, > > Message: > Hello dave@cheney.net (cc: golang-codereviews@googlegroups.com), > > I'd like you to review this change to > https://code.google.com/p/go > > > Description: > liblink, cmd/5a: fix bad AMOVFD and AMOVDF generated code for registers >> >> 7 > > > The byte that r is or'd into is already 0x7, so the failure to zero r > only > impacts the generated machine code if the register is > 7. This > apparently > happens quite rarely. (That fact itself is concerning, as it suggests > that > we are not making good use of the available registers.) > > Fixes issue 7044. > > Please review this at https://codereview.appspot.com/73730043/ > > Affected files (+44, -1 lines): > M src/liblink/asm5.c > A test/fixedbugs/issue7044.go > > > Index: src/liblink/asm5.c > =================================================================== > --- a/src/liblink/asm5.c > +++ b/src/liblink/asm5.c > @@ -1635,7 +1635,7 @@ > r = p->reg; > if(r == NREG) { > r = rt; > - if(p->as == AMOVF || p->as == AMOVD || p->as == > ASQRTF || p->as == ASQRTD || p->as == AABSF || p->as == AABSD) > + if(p->as == AMOVF || p->as == AMOVD || p->as == > AMOVFD || p->as == AMOVDF || p->as == ASQRTF || p->as == ASQRTD || p->as == > AABSF || p->as == AABSD) > r = 0; > } > o1 |= rf | (r<<16) | (rt<<12); > Index: test/fixedbugs/issue7044.go > =================================================================== > new file mode 100644 > --- /dev/null > +++ b/test/fixedbugs/issue7044.go > @@ -0,0 +1,43 @@ > +// run > + > +// Copyright 2014 The Go Authors. All rights reserved. > +// Use of this source code is governed by a BSD-style > +// license that can be found in the LICENSE file. > + > +// Issue 7044: bad AMOVFD and AMOVDF assembly generation on > +// arm for registers above 7. > + > +package main > + > +import ( > + "fmt" > + "reflect" > +) > + > +func f() [16]float32 { > + f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11, f12, f13, f14, f15 > := > + float32(1), float32(1), float32(1), float32(1), float32(1), > float32(1), float32(1), float32(1), float32(1), float32(1), float32(1), > float32(1), float32(1), float32(1), float32(1), float32(1) > + // Use all 16 registers to do float32 --> float64 conversion. > + d0, d1, d2, d3, d4, d5, d6, d7, d8, d9, d10, d11, d12, d13, d14, d15 > := > + float64(f0), float64(f1), float64(f2), float64(f3), > float64(f4), float64(f5), float64(f6), float64(f7), float64(f8), > float64(f9), float64(f10), float64(f11), float64(f12), float64(f13), > float64(f14), float64(f15) > + // Use all 16 registers to do float64 --> float32 conversion. > + g0, g1, g2, g3, g4, g5, g6, g7, g8, g9, g10, g11, g12, g13, g14, g15 > := > + float32(d0), float32(d1), float32(d2), float32(d3), > float32(d4), float32(d5), float32(d6), float32(d7), float32(d8), > float32(d9), float32(d10), float32(d11), float32(d12), float32(d13), > float32(d14), float32(d15) > + // Force another conversion, so that the previous conversion doesn't > + // get optimized away into constructing the returned array. With > current > + // optimizations, constructing the returned array uses only > + // a single register. > + e0, e1, e2, e3, e4, e5, e6, e7, e8, e9, e10, e11, e12, e13, e14, e15 > := > + float64(g0), float64(g1), float64(g2), float64(g3), > float64(g4), float64(g5), float64(g6), float64(g7), float64(g8), > float64(g9), float64(g10), float64(g11), float64(g12), float64(g13), > float64(g14), float64(g15) > + return [16]float32{ > + float32(e0), float32(e1), float32(e2), float32(e3), > float32(e4), float32(e5), float32(e6), float32(e7), float32(e8), > float32(e9), float32(e10), float32(e11), float32(e12), float32(e13), > float32(e14), float32(e15), > + } > +} > + > +func main() { > + want := [16]float32{1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1} > + got := f() > + if !reflect.DeepEqual(got, want) { > + fmt.Printf("f() = %#v; want %#v\n", got, want) > + } > +} > >
Sign in to reply to this message.
> Thanks for taking a look at this Josh. I'll do some testing on arm5 > and arm7 machines later today. This might also need testing on armv6 > machines as they do not have a VFPv3 floating point unit, and so I > *think* only have 1/2 the number of double precision fp registers. Thanks, much appreciated. My testing has been limited to the Raspberry Pi and examining the disassembled binaries. (Hopper.app for OS X is proving to be pretty handy.) > wrt, your comment that the compiler isn't making good use of the > available registers, maybe this is just because the compiler hasn't > hit a situation like this before. From the test you wrote, it has to > be a pretty contrived bit of code before we can have that many live fp > registers. I'm not sure whether VFPv3 does instruction pipelining like NEON does, but if so, it looks like we're currently missing an opportunity to make good use of it in this code. All the writes to the returned array are done via dxx --> s2 --> memory, whereas I think that they could be pipelined using different s registers. This is the first gc-generated ARM assembly I've taken a close look at, and I see what look like a couple of optimization opportunities. I'm not sure how viable / valuable they are, though. I'm new to compilers and to broader ARM assembly; my prior experience is just from hand-tuned NEON for card.io. I'll file some issues and cc you, and perhaps you can provide some guidance. -josh
Sign in to reply to this message.
Ahh interesting. If you've already been testing on an RPi then my concerns are probably not relevant. Have a look at gcc -v, the magic compile flag is something like VFP-d16. If that is set the there probably isn't cause for concern. On 11 Mar 2014, at 8:30, Josh Bleecher Snyder <josharian@gmail.com> wrote: >> Thanks for taking a look at this Josh. I'll do some testing on arm5 >> and arm7 machines later today. This might also need testing on armv6 >> machines as they do not have a VFPv3 floating point unit, and so I >> *think* only have 1/2 the number of double precision fp registers. > > Thanks, much appreciated. My testing has been limited to the Raspberry > Pi and examining the disassembled binaries. (Hopper.app for OS X is > proving to be pretty handy.) > > >> wrt, your comment that the compiler isn't making good use of the >> available registers, maybe this is just because the compiler hasn't >> hit a situation like this before. From the test you wrote, it has to >> be a pretty contrived bit of code before we can have that many live fp >> registers. > > I'm not sure whether VFPv3 does instruction pipelining like NEON does, > but if so, it looks like we're currently missing an opportunity to > make good use of it in this code. All the writes to the returned array > are done via dxx --> s2 --> memory, whereas I think that they could be > pipelined using different s registers. > > This is the first gc-generated ARM assembly I've taken a close look > at, and I see what look like a couple of optimization opportunities. > I'm not sure how viable / valuable they are, though. I'm new to > compilers and to broader ARM assembly; my prior experience is just > from hand-tuned NEON for card.io. I'll file some issues and cc you, > and perhaps you can provide some guidance. > > -josh
Sign in to reply to this message.
This passes on armv5 # ../test real 7m44.494s user 6m45.730s sys 0m40.450s Josh has tested armv6, I'll test on armv7 later today. https://codereview.appspot.com/73730043/diff/40001/src/liblink/asm5.c File src/liblink/asm5.c (right): https://codereview.appspot.com/73730043/diff/40001/src/liblink/asm5.c#newcode... src/liblink/asm5.c:1638: if(p->as == AMOVF || p->as == AMOVD || p->as == AMOVFD || p->as == AMOVDF || p->as == ASQRTF || p->as == ASQRTD || p->as == AABSF || p->as == AABSD) this line is quite long, would you consider converting it to a switch for possibly better readability
Sign in to reply to this message.
I think we can use assembly to test the assemble of MOVDF and MOVFD directly. Although that involves enhancing test/run.go to support assembly. The other idea is that we write a Go program that use 5a and 5l to assemble and link a program and use objdump to examine its disassembly matches the input. There is no guarantee that the Go repro code could always trigger a bad instruction being emitted.
Sign in to reply to this message.
https://codereview.appspot.com/73730043/diff/40001/src/liblink/asm5.c File src/liblink/asm5.c (right): https://codereview.appspot.com/73730043/diff/40001/src/liblink/asm5.c#newcode... src/liblink/asm5.c:1638: if(p->as == AMOVF || p->as == AMOVD || p->as == AMOVFD || p->as == AMOVDF || p->as == ASQRTF || p->as == ASQRTD || p->as == AABSF || p->as == AABSD) On 2014/03/10 23:37:35, dfc wrote: > this line is quite long, would you consider converting it to a switch for > possibly better readability Converting to switch will take a lot of space (because each case must be on its own line), I suggest we just break the long line up.
Sign in to reply to this message.
Ah, I missed in the previous emails. The description should just be: liblink: fix bad code generated for MOVFD/MOVDF when reg > 7. (i.e. remove cmd/5a)
Sign in to reply to this message.
> Converting to switch will take a lot of space (because each case must be on > its own line), I suggest we just break the long line up. 10 lines sounds like a reasonable price to pay for readability. Anyway, it was just a suggestion.
Sign in to reply to this message.
PTAL Dave, did arm7 turn out ok? > The description should just be: > liblink: fix bad code generated for MOVFD/MOVDF when reg > 7. Done. Sorry about that. > There is no guarantee that the Go repro code could > always trigger a bad instruction being emitted. Good point. Given the nature of the fix, I'm not sure it's worth the gymnastics required to test it directly. I'm open to being persuaded, though. https://codereview.appspot.com/73730043/diff/40001/src/liblink/asm5.c File src/liblink/asm5.c (right): https://codereview.appspot.com/73730043/diff/40001/src/liblink/asm5.c#newcode... src/liblink/asm5.c:1638: if(p->as == AMOVF || p->as == AMOVD || p->as == AMOVFD || p->as == AMOVDF || p->as == ASQRTF || p->as == ASQRTD || p->as == AABSF || p->as == AABSD) On 2014/03/11 00:44:06, minux wrote: > On 2014/03/10 23:37:35, dfc wrote: > > this line is quite long, would you consider converting it to a switch for > > possibly better readability > Converting to switch will take a lot of space (because each case must be on > its own line), I suggest we just break the long line up. Tried both. The switch looked prettier but out of place, because it made case 54 much longer than its neighbors. Now broken into two lines.
Sign in to reply to this message.
LGTM on armv5 and armv7, please wait for another reviewer. On Tue, Mar 11, 2014 at 11:47 PM, <josharian@gmail.com> wrote: > PTAL > > > Dave, did arm7 turn out ok? > > > >> The description should just be: >> liblink: fix bad code generated for MOVFD/MOVDF when reg > 7. > > > Done. Sorry about that. > > > >> There is no guarantee that the Go repro code could >> always trigger a bad instruction being emitted. > > > Good point. Given the nature of the fix, I'm not sure it's worth the > gymnastics required to test it directly. I'm open to being persuaded, > though. > > > > > https://codereview.appspot.com/73730043/diff/40001/src/liblink/asm5.c > File src/liblink/asm5.c (right): > > https://codereview.appspot.com/73730043/diff/40001/src/liblink/asm5.c#newcode... > src/liblink/asm5.c:1638: if(p->as == AMOVF || p->as == AMOVD || p->as == > AMOVFD || p->as == AMOVDF || p->as == ASQRTF || p->as == ASQRTD || p->as > == AABSF || p->as == AABSD) > On 2014/03/11 00:44:06, minux wrote: >> >> On 2014/03/10 23:37:35, dfc wrote: >> > this line is quite long, would you consider converting it to a > > switch for >> >> > possibly better readability >> >> Converting to switch will take a lot of space (because each case must > > be on >> >> its own line), I suggest we just break the long line up. > > > Tried both. The switch looked prettier but out of place, because it made > case 54 much longer than its neighbors. Now broken into two lines. > > https://codereview.appspot.com/73730043/
Sign in to reply to this message.
R=rsc On Tue, Mar 11, 2014 at 6:24 AM, Dave Cheney <dave@cheney.net> wrote: > LGTM on armv5 and armv7, please wait for another reviewer. > > On Tue, Mar 11, 2014 at 11:47 PM, <josharian@gmail.com> wrote: > > PTAL > > > > > > Dave, did arm7 turn out ok? > > > > > > > >> The description should just be: > >> liblink: fix bad code generated for MOVFD/MOVDF when reg > 7. > > > > > > Done. Sorry about that. > > > > > > > >> There is no guarantee that the Go repro code could > >> always trigger a bad instruction being emitted. > > > > > > Good point. Given the nature of the fix, I'm not sure it's worth the > > gymnastics required to test it directly. I'm open to being persuaded, > > though. > > > > > > > > > > https://codereview.appspot.com/73730043/diff/40001/src/liblink/asm5.c > > File src/liblink/asm5.c (right): > > > > > https://codereview.appspot.com/73730043/diff/40001/src/liblink/asm5.c#newcode... > > src/liblink/asm5.c:1638: if(p->as == AMOVF || p->as == AMOVD || p->as == > > AMOVFD || p->as == AMOVDF || p->as == ASQRTF || p->as == ASQRTD || p->as > > == AABSF || p->as == AABSD) > > On 2014/03/11 00:44:06, minux wrote: > >> > >> On 2014/03/10 23:37:35, dfc wrote: > >> > this line is quite long, would you consider converting it to a > > > > switch for > >> > >> > possibly better readability > >> > >> Converting to switch will take a lot of space (because each case must > > > > be on > >> > >> its own line), I suggest we just break the long line up. > > > > > > Tried both. The switch looked prettier but out of place, because it made > > case 54 much longer than its neighbors. Now broken into two lines. > > > > https://codereview.appspot.com/73730043/ > > -- > You received this message because you are subscribed to the Google Groups > "golang-codereviews" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to golang-codereviews+unsubscribe@googlegroups.com. > For more options, visit https://groups.google.com/d/optout. >
Sign in to reply to this message.
LGTM. On Tue, Mar 11, 2014 at 8:47 AM, <josharian@gmail.com> wrote: > There is no guarantee that the Go repro code could >> always trigger a bad instruction being emitted. > > Good point. Given the nature of the fix, I'm not sure it's worth the > gymnastics required to test it directly. I'm open to being persuaded, > though. But the test is too complex when we can just write an assembly language function to test the code generated for MOVF/MOVD with a couple lines. I will need to have a test comparing our output with a disassembler output sooner or later. Russ, have you found the solution for an objdump in Go? This case makes a perfect test case for it.
Sign in to reply to this message.
LGTM file an issue marked Go1.3Maybe for a test once the new objdump is in
Sign in to reply to this message.
*** Submitted as https://code.google.com/p/go/source/detail?r=a910bd7561e4 *** liblink: fix bad code generated for MOVFD/MOVDF when reg > 7 The byte that r is or'd into is already 0x7, so the failure to zero r only impacts the generated machine code if the register is > 7. Fixes issue 7044. LGTM=dave, minux.ma, rsc R=dave, minux.ma, bradfitz, rsc CC=golang-codereviews https://codereview.appspot.com/73730043
Sign in to reply to this message.
Message was sent while issue was closed.
This CL appears to have broken the darwin-amd64-cheney builder.
Sign in to reply to this message.
This is new, but unrelated. Looks like maybe just a network hiccup? http://build.golang.org/log/1d1d4b832cc20abe4d650eb2a88e8a644fb52441 --- FAIL: TestVariousDeadlines4Proc (2.01 seconds) timeout_test.go:480: 1ns run 1/1 timeout_test.go:503: for 1ns run 1/1, good client timeout after 11.705us, reading 0 bytes timeout_test.go:513: for 1ns run 1/1: server in 628.079us wrote 187948, write tcp 127.0.0.1:54863: broken pipe timeout_test.go:480: 2ns run 1/1 timeout_test.go:503: for 2ns run 1/1, good client timeout after 2.983615ms, reading 0 bytes timeout_test.go:513: for 2ns run 1/1: server in 3.79554ms wrote 196608, write tcp 127.0.0.1:54864: broken pipe timeout_test.go:480: 5ns run 1/1 timeout_test.go:503: for 5ns run 1/1, good client timeout after 6.74us, reading 0 bytes timeout_test.go:513: for 5ns run 1/1: server in 616.614us wrote 187948, write tcp 127.0.0.1:54866: broken pipe timeout_test.go:480: 50ns run 1/1 timeout_test.go:503: for 50ns run 1/1, good client timeout after 8.837us, reading 0 bytes timeout_test.go:513: for 50ns run 1/1: server in 542.237us wrote 187948, write tcp 127.0.0.1:54867: broken pipe timeout_test.go:480: 100ns run 1/1 timeout_test.go:503: for 100ns run 1/1, good client timeout after 5.456us, reading 0 bytes timeout_test.go:513: for 100ns run 1/1: server in 533.785us wrote 187948, write tcp 127.0.0.1:54868: broken pipe timeout_test.go:480: 200ns run 1/1 timeout_test.go:503: for 200ns run 1/1, good client timeout after 8.98us, reading 0 bytes timeout_test.go:513: for 200ns run 1/1: server in 534.485us wrote 187948, write tcp 127.0.0.1:54869: broken pipe timeout_test.go:480: 500ns run 1/1 timeout_test.go:503: for 500ns run 1/1, good client timeout after 4.354us, reading 0 bytes timeout_test.go:513: for 500ns run 1/1: server in 112.186us wrote 32768, write tcp 127.0.0.1:54872: broken pipe timeout_test.go:480: 750ns run 1/1 timeout_test.go:503: for 750ns run 1/1, good client timeout after 9.211us, reading 0 bytes timeout_test.go:517: for 750ns run 1/1, timeout waiting for server to finish writing FAIL FAIL net 4.344s On Tue, Mar 11, 2014 at 2:07 PM, <gobot@golang.org> wrote: > This CL appears to have broken the darwin-amd64-cheney builder. > > https://codereview.appspot.com/73730043/
Sign in to reply to this message.
On Mar 11, 2014 2:10 PM, "Josh Bleecher Snyder" <josharian@gmail.com> wrote: > This is new, but unrelated. Looks like maybe just a network hiccup? > > http://build.golang.org/log/1d1d4b832cc20abe4d650eb2a88e8a644fb52441 > > --- FAIL: TestVariousDeadlines4Proc (2.01 seconds) nothing serious, this test is inherently flaky. > timeout_test.go:480: 1ns run 1/1 > timeout_test.go:503: for 1ns run 1/1, good client timeout after > 11.705us, reading 0 bytes > timeout_test.go:513: for 1ns run 1/1: server in 628.079us wrote > 187948, write tcp 127.0.0.1:54863: broken pipe > timeout_test.go:480: 2ns run 1/1 > timeout_test.go:503: for 2ns run 1/1, good client timeout after > 2.983615ms, reading 0 bytes > timeout_test.go:513: for 2ns run 1/1: server in 3.79554ms wrote > 196608, write tcp 127.0.0.1:54864: broken pipe > timeout_test.go:480: 5ns run 1/1 > timeout_test.go:503: for 5ns run 1/1, good client timeout after > 6.74us, reading 0 bytes > timeout_test.go:513: for 5ns run 1/1: server in 616.614us wrote > 187948, write tcp 127.0.0.1:54866: broken pipe > timeout_test.go:480: 50ns run 1/1 > timeout_test.go:503: for 50ns run 1/1, good client timeout after > 8.837us, reading 0 bytes > timeout_test.go:513: for 50ns run 1/1: server in 542.237us wrote > 187948, write tcp 127.0.0.1:54867: broken pipe > timeout_test.go:480: 100ns run 1/1 > timeout_test.go:503: for 100ns run 1/1, good client timeout after > 5.456us, reading 0 bytes > timeout_test.go:513: for 100ns run 1/1: server in 533.785us wrote > 187948, write tcp 127.0.0.1:54868: broken pipe > timeout_test.go:480: 200ns run 1/1 > timeout_test.go:503: for 200ns run 1/1, good client timeout after > 8.98us, reading 0 bytes > timeout_test.go:513: for 200ns run 1/1: server in 534.485us wrote > 187948, write tcp 127.0.0.1:54869: broken pipe > timeout_test.go:480: 500ns run 1/1 > timeout_test.go:503: for 500ns run 1/1, good client timeout after > 4.354us, reading 0 bytes > timeout_test.go:513: for 500ns run 1/1: server in 112.186us wrote > 32768, write tcp 127.0.0.1:54872: broken pipe > timeout_test.go:480: 750ns run 1/1 > timeout_test.go:503: for 750ns run 1/1, good client timeout after > 9.211us, reading 0 bytes > timeout_test.go:517: for 750ns run 1/1, timeout waiting for server to > finish writing > FAIL > FAIL net 4.344s > > > On Tue, Mar 11, 2014 at 2:07 PM, <gobot@golang.org> wrote: > > This CL appears to have broken the darwin-amd64-cheney builder. > > > > https://codereview.appspot.com/73730043/
Sign in to reply to this message.
On Tue, Mar 11, 2014 at 2:12 PM, minux <minux.ma@gmail.com> wrote: > > On Mar 11, 2014 2:10 PM, "Josh Bleecher Snyder" <josharian@gmail.com> > wrote: > > This is new, but unrelated. Looks like maybe just a network hiccup? > > > > http://build.golang.org/log/1d1d4b832cc20abe4d650eb2a88e8a644fb52441 > > > > --- FAIL: TestVariousDeadlines4Proc (2.01 seconds) > nothing serious, this test is inherently flaky. if it happens more than once per year, please make it not as flaky or turn it off. thanks. russ
Sign in to reply to this message.
> file an issue marked Go1.3Maybe for a test once the new objdump is in https://code.google.com/p/go/issues/detail?id=7515
Sign in to reply to this message.
|