runtime: illegal instruction on ARMv5 #18694
No further development is happening on the Go 1.7 branch, except for security fixes. Can you try Go 1.8? |
What's your kernel version? It's probably too old and doesn't provide the
needed atomic operation helpers.
|
Updating to master doesn't seem to have made any difference.
Kernel version is 2.6.24.4. What's the minimum required for the ARM architecture? This is newer than the version given on the Minimum Requirements page. |
At least 3.1, as we need __kuser_cmpxchg64.
The documented requirement of 2.6.23 is for x86 only; for ARMv5 we need
much newer kernels.
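A quick way to check a target box against that minimum is to compare its kernel release string (the output of uname -r) with 3.1. The sketch below is a hypothetical helper, atLeast, that compares only the first two numeric fields and ignores any "-suffix"; it assumes the 3.1 figure quoted above and is not part of the Go runtime.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// atLeast reports whether a kernel release string such as "2.6.24.4" or
// "4.9.0-rc1" is at least major.minor. Only the first two numeric fields
// are compared; anything after the first '-' is ignored.
func atLeast(release string, major, minor int) (bool, error) {
	if i := strings.IndexByte(release, '-'); i >= 0 {
		release = release[:i] // drop a suffix such as "-rc1"
	}
	fields := strings.Split(release, ".")
	if len(fields) < 2 {
		return false, fmt.Errorf("unexpected kernel release %q", release)
	}
	gotMajor, err := strconv.Atoi(fields[0])
	if err != nil {
		return false, err
	}
	gotMinor, err := strconv.Atoi(fields[1])
	if err != nil {
		return false, err
	}
	if gotMajor != major {
		return gotMajor > major, nil
	}
	return gotMinor >= minor, nil
}

func main() {
	// 3.1 is the ARMv5 minimum quoted in this thread (for __kuser_cmpxchg64).
	for _, r := range []string{"2.6.24.4", "2.6.32", "3.1.0", "4.9.0-rc1"} {
		ok, err := atLeast(r, 3, 1)
		fmt.Println(r, ok, err)
	}
}
```

By this check the reporter's 2.6.24.4 kernel, and the 2.6.32 kernel used in the qemu attempt, both fall below the ARMv5 minimum.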
|
I couldn't find any use of the fault address (PC) in our source code,
though. Could you please run the program under gdb and see where the
jump to 0xffff0514 occurs? I'd like to understand more about the problem.
|
That is, I doubt the direct caller of that address is the one shown by gdb.
Perhaps you can set a hardware breakpoint on *0xffff0514 and see if it
catches the real caller?
The address is very special, and it's not within the range used for kernel
helpers. Maybe the problem lies elsewhere.
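To make the "not within the range for kernel helpers" point concrete: the ARM kernel user helpers live at fixed addresses at the very top of the vector page mapped at 0xffff0000, as listed in the kernel's documentation on kernel-provided user helpers (__kuser_cmpxchg64 at 0xffff0f60 only exists on 3.1+ kernels). The sketch below, with the hypothetical function inKuserHelpers, checks whether a PC falls in that region.

```go
package main

import "fmt"

// Fixed entry points of the ARM kernel user helpers, all near the top of the
// vector page that the kernel maps at 0xffff0000.
const (
	kuserCmpxchg64     uint32 = 0xffff0f60 // lowest helper (3.1+ kernels)
	kuserMemoryBarrier uint32 = 0xffff0fa0
	kuserCmpxchg       uint32 = 0xffff0fc0
	kuserGetTLS        uint32 = 0xffff0fe0
	kuserHelperVersion uint32 = 0xffff0ffc // word holding the helper version
	vectorPageEnd      uint32 = 0xffff1000 // first address past the page
)

// inKuserHelpers reports whether pc lies in [__kuser_cmpxchg64, end of page).
func inKuserHelpers(pc uint32) bool {
	return pc >= kuserCmpxchg64 && pc < vectorPageEnd
}

func main() {
	for _, pc := range []uint32{0xffff0514, kuserCmpxchg, kuserGetTLS} {
		fmt.Printf("%#x in kuser helpers: %v\n", pc, inKuserHelpers(pc))
	}
}
```

0xffff0514 sits well below the helper block, which is why it points at some other code in the vector page.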
|
I tried to reproduce with qemu (-M versatilepb) and a 2.6.32 kernel, but the
program runs fine and stops cleanly when interrupted with Ctrl-C.
|
CL https://golang.org/cl/35352 mentions this issue. |
For future reference:
the kernel helper page is in arch/arm/kernel/entry-armv.S (v2.6.24):
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/arm/kernel/entry-armv.S?h=v2.6.24&id=49914084e797530d9baaf51df9eda77babc98fa8
and it gets copied into the page mapped at 0xffff0000 by traps.c:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/arm/kernel/traps.c?h=v2.6.24&id=49914084e797530d9baaf51df9eda77babc98fa8#n706
OK, I know where 0xffff0514 comes from: it's the code the kernel uses to
implement signal return.
KERN_SIGRETURN_CODE is defined as (CONFIG_VECTORS_BASE + 0x00000500) in
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/arm/kernel/signal.h?h=v2.6.24&id=49914084e797530d9baaf51df9eda77babc98fa8#n10
(CONFIG_VECTORS_BASE should be 0xffff0000, so KERN_SIGRETURN_CODE is 0xffff0500.)
The code there is in:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/arm/kernel/signal.c?h=v2.6.24&id=49914084e797530d9baaf51df9eda77babc98fa8#n45
At offset 0x514, the instruction is a Thumb instruction,
SWI_THUMB_RT_SIGRETURN,
which is no doubt invalid in ARM mode.
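The arithmetic behind that identification can be checked mechanically. The sketch below assumes that sigreturn_codes[] at v2.6.24 holds six 32-bit words in the order listed (the ARM, SWI and Thumb variants for sigreturn, then for rt_sigreturn), which is the layout the reasoning above relies on; entryAt is a hypothetical helper, and the names are the kernel macro names, not decoded instructions.

```go
package main

import "fmt"

const (
	vectorsBase       uint32 = 0xffff0000          // CONFIG_VECTORS_BASE
	kernSigreturnCode uint32 = vectorsBase + 0x500 // KERN_SIGRETURN_CODE
)

// sigreturnCodes mirrors the entry order assumed for sigreturn_codes[] in
// arch/arm/kernel/signal.c at v2.6.24; each entry is one 32-bit word.
var sigreturnCodes = []string{
	"MOV_R7_NR_SIGRETURN",
	"SWI_SYS_SIGRETURN",
	"SWI_THUMB_SIGRETURN",
	"MOV_R7_NR_RT_SIGRETURN",
	"SWI_SYS_RT_SIGRETURN",
	"SWI_THUMB_RT_SIGRETURN",
}

// entryAt returns the name of the table entry containing addr, or "" if addr
// lies outside the table.
func entryAt(addr uint32) string {
	if addr < kernSigreturnCode {
		return ""
	}
	idx := (addr - kernSigreturnCode) / 4 // 4 bytes per entry
	if int(idx) >= len(sigreturnCodes) {
		return ""
	}
	return sigreturnCodes[idx]
}

func main() {
	pc := uint32(0xffff0514)
	fmt.Printf("%#x: offset %#x into the table, entry %q\n",
		pc, pc-kernSigreturnCode, entryAt(pc))
}
```

0xffff0514 - 0xffff0500 = 0x14, i.e. word index 5, the last entry: the Thumb rt_sigreturn sequence.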
I still don't understand why the kernel uses the Thumb version of sigreturn for
us, because our code is definitely ARM, and if the kernel thought our code
was Thumb, the process should have crashed before we even reached the RET
instruction in runtime.sigtramp.
Perhaps a safe workaround is to do our own rt_sigreturn syscall in
sigtramp. I will send a CL for this.
@RobHumphrey, please give the CL a try. Thanks.
|
The patch doesn't work yet; please hold off on trying it.
It needs to restore the stack pointer before the sigreturn syscall, so
essentially we need to go back to SA_RESTORER.
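Why SA_RESTORER sidesteps the vector page can be shown with a small model. When a handler is installed with SA_RESTORER, the kernel sets the handler's return address to the program's own sa_restorer function; without it, the kernel returns through its trampoline code at KERN_SIGRETURN_CODE in the vector page (the 0xffff05xx code discussed above). The flag values below are the ARM ones from the Linux uapi asm/signal.h; returnsViaVectorPage is a hypothetical, deliberately simplified predicate, not kernel or runtime code.

```go
package main

import "fmt"

// Flag values as defined for ARM in the Linux uapi <asm/signal.h>.
const (
	saSiginfo  uint32 = 0x00000004
	saRestorer uint32 = 0x04000000
	saOnstack  uint32 = 0x08000000
)

// returnsViaVectorPage models the kernel's choice of return address for a
// signal handler: with SA_RESTORER the handler returns to the program's own
// sa_restorer; without it, to the kernel trampoline in the vector page.
func returnsViaVectorPage(flags uint32) bool {
	return flags&saRestorer == 0
}

func main() {
	fmt.Println("without SA_RESTORER:", returnsViaVectorPage(saSiginfo|saOnstack))
	fmt.Println("with SA_RESTORER:   ", returnsViaVectorPage(saSiginfo|saOnstack|saRestorer))
}
```

With SA_RESTORER the program supplies the code that issues rt_sigreturn, so it also controls the stack pointer at that point, which matches the requirement stated above.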
|
On the command line I don't have a problem stopping the executable with Ctrl+C either. I have no experience using gdb, but hope some of the following is useful. Adding a hardware breakpoint with
Adding a breakpoint at main.main works. Neither resizing the window nor Using the code built from master gives
Adding a breakpoint at 0x43780 gives
I've been trying to find a way of seeing what's happening when the signal occurs. I can stop the code with Is there a way to add something simple into the loop (e.g. an increment or nop) that won't get optimised away? Hopefully that would then stop gdb detecting the infinite loop, and so allow me to step and see what's happening. |
The patch is ready. @RobHumphrey please give it a try. Thanks.
If you haven't used Gerrit before, download this zip, which contains a patch
for the latest master branch. I think it should also apply cleanly to Go
1.7.4.
https://go-review.googlesource.com/changes/35352/revisions/3/patch?zip
|
If you have time, I'd like to understand better why the kernel is giving us
a bad return address.
$ gdb ./loopTest
(gdb) b *'runtime.sigtramp'
(gdb) r
# Trigger the SIGILL crash as usual; gdb should stop
# the process as soon as it enters the function
# 'runtime.sigtramp'.
(gdb) info registers
# Please paste the output generated by this command.
That is, I want the registers when the process hits 'runtime.sigtramp'.
Please do this without using the patch.
Specifically, I'm interested in the LR and CPSR register values.
If my theory is right (that this is a kernel bug), LR should be
0xffff0514. If it's not, then something else is going on and my fix just
papers over the real cause.
|
Dumping the registers as requested, still without the patch applied:
The LR contains the same value I saw on the back trace last night. Stepping through until it crashed, the final few instructions were:
Putting a breakpoint at 0x437c4 (two instructions before the crash), looking also at the registers and the data at the SP address:
If you need more information than this then do let me know. |
Realised I should have added that the patch does stop it crashing, even if it's not fixing quite the bug you thought was happening. |
Ping @minux. I'm surprised by this:
It looks to me like we very slightly stomped the saved LR somehow. @RobHumphrey, thanks for all the debugging. If you can still reproduce this (sorry, I know it's been a while), could you redo your last experiment where you stopped right before the crash, but when you're on the |
@aclements, this isn't by chance related to your "runtime: save r11 in ARM addmoduledata" 87a51a0 fix, is it? |
It's possible it's related, but it's certainly not the same bug since the code modified in 87a51a0 only runs in dynamically linked Go libraries. But maybe we failed to follow the C ABI in some signal-related code. |
Timed out in state WaitingForInfo. Closing. (I am just a bot, though. Please speak up if this is a mistake or you have the requested information.) |
I too have a problem running a Go-compiled program on a WD My Book World Edition (White Light) NAS box, and I think my issue is identical. I realise this thread was closed, but was it ever resolved? I can provide further details if requested. |
@DrPeterVC, thanks for checking. Go ahead and open a new issue (with a link to this issue). |
I've been trying to get Syncthing working on a WD My Book World Edition (White Light) NAS box, but encountered illegal instructions. Whilst investigating I found issue #15869, which seemed very similar, so hopefully I can continue to help resolve this issue.
What version of Go are you using (go version)?
What operating system and processor architecture are you using (go env)?
What did you do?
Set up a simple infinite loop like:
Compile with:
Transfer the output across to the NAS box and run it.
If I resize the terminal window, it crashes with "Illegal instruction". Additionally, as reported in #15869, if I kill it with -SIGPIPE, it also crashes with "Illegal instruction".
cpuinfo of target machine:
Under gdb (I resized the window after starting it):