Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: memset/memmove clobbers r9/r10 on Linux/ARM #3718

Closed
davecheney opened this issue Jun 10, 2012 · 13 comments
Closed

runtime: memset/memmove clobbers r9/r10 on Linux/ARM #3718

davecheney opened this issue Jun 10, 2012 · 13 comments

Comments

@davecheney
Copy link
Contributor

What steps will reproduce the problem?
1. run the attached code on linux/arm 

% go build profilesegv.go 
% ./profilesegv 
Segmentation fault

What is the expected output? What do you see instead?

This example is extracted from any benchmark code with -test.cpuprofile= supplied.
Generally rerunning the benchmark will eventually avoid the segfault, but the longer the
test, the less likely the results are. 

No crashes are observed if cpu profiling is not requested.

Please use labels and text to provide additional information.

This occurs on go1.0.1, 1.0.2-{2,3} and +tip. 

Running under gdb gives this result.

pando(~/src) % gdb ./profilesegv
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>;
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "arm-linux-gnueabihf".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>;...
Reading symbols from /home/dfc/src/profilesegv...(no debugging symbols found)...done.
(gdb) r
Starting program: /home/dfc/src/profilesegv
[New LWP 1157]

Program received signal SIGSEGV, Segmentation fault.
[Switching to LWP 1157]
0x000370a4 in runtime.sigtramp ()
(gdb) disassemble
Dump of assembler code for function runtime.sigtramp:
   0x00037098 <+0>:     str     lr, [sp, #-28]!
   0x0003709c <+4>:     mov     r3, r10
   0x000370a0 <+8>:     str     r10, [sp, #20]
=> 0x000370a4 <+12>:    ldr     r10, [r9, #44]  ; 0x2c
   0x000370a8 <+16>:    str     r0, [sp, #4]
   0x000370ac <+20>:    str     r1, [sp, #8]
   0x000370b0 <+24>:    str     r2, [sp, #12]
   0x000370b4 <+28>:    str     r3, [sp, #16]
   0x000370b8 <+32>:    bl      0x2ad9c <runtime.sighandler>
   0x000370bc <+36>:    ldr     r10, [sp, #20]
   0x000370c0 <+40>:    ldr     pc, [sp], #28
End of assembler dump.
(gdb) info registers
r0             0x1b     27
r1             0x10651c90       275061904
r2             0x10651d10       275062032
r3             0x0      0
r4             0x0      0
r5             0x0      0
r6             0x0      0
r7             0x0      0
r8             0x0      0
r9             0x0      0
r10            0x0      0
r11            0x0      0
r12            0xcafebabe       3405691582
sp             0x10651c74       0x10651c74
lr             0x370fc  225532
pc             0x370a4  0x370a4 <runtime.sigtramp+12>
cpsr           0x110    272

Which suggests there is a race between setting up the signal handler and ensuring m and
g (r9 and r10) are properly populated. Or maybe somethings is overwriting them. In other
crashes I've seen r9/10 set to invalid values (will start collecting core files)

Attachments:

  1. profilesegv.go (1691 bytes)
@minux
Copy link
Member

minux commented Jun 14, 2012

Comment 1:

i'm afraid the reason is not that simple.
i straced profilesegv, and found out the thread handled several SIGPROF signals before
the one that triggered SIGSEGV.
4435  --- SIGPROF (Profiling timer expired) @ 0 (0) --- 
4435  rt_sigreturn(0x1)                 = 0 
4435  --- SIGPROF (Profiling timer expired) @ 0 (0) --- 
4435  rt_sigreturn(0x1)                 = 2 
4435  --- SIGPROF (Profiling timer expired) @ 0 (0) --- 
4435  rt_sigreturn(0x1)                 = -94133986
4435  --- SIGPROF (Profiling timer expired) @ 0 (0) --- 
4435  rt_sigreturn(0x1)                 = 2 
4435  --- SIGPROF (Profiling timer expired) @ 0 (0) --- 
4435  --- SIGSEGV (Segmentation fault) @ 0 (0) --- 
4434  +++ killed by SIGSEGV +++ 
4433  +++ killed by SIGSEGV +++ 
because the strace log is very small, I attached it.

Attachments:

  1. strace.log (41566 bytes)

@minux
Copy link
Member

minux commented Jun 14, 2012

Comment 2:

finally, I've found the reason, it has nothing to do with signal masking (although we do
have a bug there, http://golang.org/cl/6297087/)
the signal occurred in runtime.memset, which clobbers r9 and r10;
I think we never anticipated that possibility (except kaib's TODO note in
memset_arm.s, but that only mentioned SIGSEGV triggered by memset itself).
any suggestions? should we fix memset/memmove to not clobber r9/r10 or
we block all signals in them?
@dfc, you have some improvements on these function, what do you think? can you
make memset/memmove not touching r9/r10?

Labels changed: added priority-soon, removed priority-triage.

Status changed to Accepted.

@rsc
Copy link
Contributor

rsc commented Jun 14, 2012

Comment 3:

The block copy uses
    MOVM.IA.W [R4-R11], (R(TO))
which is a 32-byte copy using 8 registers.
Is there a 16-byte form using 4 registers?
The simplest fix would be to stop using R8, R9, R10, and R11, and then make this
    MOVM.IA.W [R4-R7], (R(TO))
if that's an option.
Russ

@minux
Copy link
Member

minux commented Jun 14, 2012

Comment 4:

oh, movm.ia.w uses a bitmap for registers, so we can use any 8 registers.
i think we can use r12&r14(LR) instead of r9/r10, and still maintain the
32-byte copy efficiency.
i think we can use LR in leaf functions (provided we back it up on stack),
what do you think about this idea?

@rsc
Copy link
Contributor

rsc commented Jun 14, 2012

Comment 5:

Will using LR break tracebacks? It would be nice not to do that.
You could use R2 (TOE) if you saved it before the f32loop and
restored it afterward.
Russ

@davecheney
Copy link
Contributor Author

Comment 6:

http://golang.org/cl/6300043
Fixes r9/r10 during memset. I hadn't mailed it because it produced no measurable
improvement in performance.

@davecheney
Copy link
Contributor Author

Comment 7:

Brilliant diagnosis btw, thanks.

@davecheney
Copy link
Contributor Author

Comment 8:

I've confirmed avoiding r9/10 during memmove solves the problem
(http://golang.org/cl/6300096 _really_ not ready for submission). I'll continue
to try to find a nicer way to do memmove.

@davecheney
Copy link
Contributor Author

Comment 9:

Responding to comment #3, doing two back to back 16 byte moves, vs a single 32byte move
looks like it costs about 15% when the data fits in the cache.
pando(~/go/src/pkg/runtime) % ~/go/misc/benchcmp {old,new}.txt 
benchmark              old ns/op    new ns/op    delta
BenchmarkMemmove32           113          112   -0.88%
BenchmarkMemmove4K          1242         1437  +15.70%
BenchmarkMemmove64K        60473        68747  +13.68%
BenchmarkMemmove4M      28684690     28676750   -0.03%
BenchmarkMemmove64M    460833600    460650800   -0.04%
benchmark               old MB/s     new MB/s  speedup
BenchmarkMemmove32        282.05       284.38    1.01x
BenchmarkMemmove4K       3295.79      2849.69    0.86x
BenchmarkMemmove64K      1083.72       953.28    0.88x
BenchmarkMemmove4M        146.22       146.26    1.00x
BenchmarkMemmove64M       145.62       145.68    1.00x

@davecheney
Copy link
Contributor Author

Comment 10:

Here is an alternative proposal for memmove that avoids spilling r9/10 in favour of
spilling and reusing the start/end register during the block move section.
http://golang.org/cl/6305100
@minux: I could not make MOVM.IA.W [R4-R7,R11] style syntax work, possibly the 5a
assembler doesn't understand it.

@minux
Copy link
Member

minux commented Jun 16, 2012

Comment 11:

@dfc, you could use MOVM.IA.W [R11,R4-R7], (R0); note the register range is at the end
and only one register range is allowed. (we certainly could lift this restriction, but
do you
have any requirements for this flexibility?)

@davecheney
Copy link
Contributor Author

Comment 12:

I'm happy to keep specifying the register bitmap explicitly as I don't
think I'm going to be able to change the code to always generate 2
register sets let alone one.

@davecheney
Copy link
Contributor Author

Comment 13:

This issue was closed by revision e34079b.

Status changed to Fixed.

davecheney added a commit that referenced this issue May 11, 2015
««« backport 5ca8acc84025
runtime: avoid r9/r10 during memmove

Fixes #3718.

Requires CL 6300043.

R=rsc, minux.ma, extraterrestrial.neighbour
CC=golang-dev
https://golang.org/cl/6305100

»»»
@golang golang locked and limited conversation to collaborators Jun 24, 2016
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants