Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/sys/cpu: SIGILL on arm64 in readARM64Registers #57336

Closed
bradfitz opened this issue Dec 15, 2022 · 24 comments
Closed

x/sys/cpu: SIGILL on arm64 in readARM64Registers #57336

bradfitz opened this issue Dec 15, 2022 · 24 comments
Labels
arch-arm64 FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@bradfitz
Copy link
Contributor

On a Synology DS220J (Realtek RTD1296), an arm64 CPU, with x/sys@v0.1.0:

SIGILL: illegal instruction
PC=0x4e0d70 m=0 sigcode=1
instruction bytes: 0x0 0x6 0x38 0xd5 0xe0 0x7 0x0 0xf9 0xc0 0x3 0x5f 0xd6 0x0 0x0 0x0 0x0

goroutine 1 [running, locked to thread]:
golang.org/x/sys/cpu.getisar0()
	golang.org/x/sys@v0.1.0/cpu/cpu_arm64.s:14 fp=0x4000093510 sp=0x4000093510 pc=0x4e0d70
golang.org/x/sys/cpu.readARM64Registers()
	golang.org/x/sys@v0.1.0/cpu/cpu_arm64.go:65 +0x2c fp=0x4000093550 sp=0x4000093510 pc=0x4e05fc
golang.org/x/sys/cpu.doinit()
	golang.org/x/sys@v0.1.0/cpu/cpu_linux_arm64.go:38 +0x24 fp=0x4000093560 sp=0x4000093550 pc=0x4e08e4
golang.org/x/sys/cpu.archInit(...)
	golang.org/x/sys@v0.1.0/cpu/cpu_arm64.go:48
golang.org/x/sys/cpu.init.0()
	golang.org/x/sys@v0.1.0/cpu/cpu.go:199 +0x20 fp=0x4000093570 sp=0x4000093560 pc=0x4dfc80
runtime.doInit(0xfe7740)
	runtime/proc.go:6321 +0x128 fp=0x40000936b0 sp=0x4000093570 pc=0x56c38
runtime.doInit(0xfe9980)
	runtime/proc.go:6298 +0x68 fp=0x40000937f0 sp=0x40000936b0 pc=0x56b78
runtime.doInit(0xfee380)
	runtime/proc.go:6298 +0x68 fp=0x4000093930 sp=0x40000937f0 pc=0x56b78
runtime.doInit(0xff4da0)
	runtime/proc.go:6298 +0x68 fp=0x4000093a70 sp=0x4000093930 pc=0x56b78
runtime.doInit(0xff50a0)
	runtime/proc.go:6298 +0x68 fp=0x4000093bb0 sp=0x4000093a70 pc=0x56b78
runtime.doInit(0xffa960)
	runtime/proc.go:6298 +0x68 fp=0x4000093cf0 sp=0x4000093bb0 pc=0x56b78
runtime.doInit(0xff58a0)
	runtime/proc.go:6298 +0x68 fp=0x4000093e30 sp=0x4000093cf0 pc=0x56b78
runtime.doInit(0xff7080)
	runtime/proc.go:6298 +0x68 fp=0x4000093f70 sp=0x4000093e30 pc=0x56b78
runtime.main()
	runtime/proc.go:233 +0x1f8 fp=0x4000093fd0 sp=0x4000093f70 pc=0x48bb8
runtime.goexit()
	runtime/asm_arm64.s:1172 +0x4 fp=0x4000093fd0 sp=0x4000093fd0 pc=0x7a824

(originally reported in tailscale/tailscale#5793)

Although that's for x/sys v0.1, there have been no cpu changes since:

$ git diff -r v0.1.0..v0.3.0 cpu
$

@tklauser, looks like you touched that code last in 85b82a3

/cc @golang/runtime

@gopherbot gopherbot added this to the Unreleased milestone Dec 15, 2022
@bradfitz
Copy link
Contributor Author

bradfitz commented Dec 15, 2022

Apparently that Realtek RTD1296 CPU is:

  • ARMv8-A (64-bit)
  • 4x ARM Cortex-A53 @ 1.4 GHz

@bcmills
Copy link
Contributor

bcmills commented Dec 15, 2022

(attn @golang/arm)

@cherrymui
Copy link
Member

The instruction is mrs x0, ID_AA64ISAR0_EL1. Maybe that system register isn't accessible in user space on your machine? I assume the OS is Linux?

@prattmic
Copy link
Member

I think EL1 here is meant to refer to kernel mode. My copy of the ARM manual, section "D13.2.61 ID_AA64ISAR0_EL1, AArch64 Instruction Set Attribute Register 0" even says that access from EL0 (user mode) either traps or is undefined behavior. But clearly it must work on most systems.

@bradfitz
Copy link
Contributor Author

Looks like it's:

Linux fanli-218play 4.4.180+ #42661 SMP Mon Jun 27 15:09:04 CST 2022 aarch64 GNU/Linux synology_rtd1296_ds218play

On a https://www.synology.com/en-us/products/DS218play

@bradfitz
Copy link
Contributor Author

FWIW, on Synology, this program runs as non-root but with some additional capabilities (as in capabilities(7)).

@bradfitz
Copy link
Contributor Author

Looks like this is the same CPU across:

https://www.synology.com/en-us/products/DS220j#specs
https://www.synology.com/en-us/products/DS218#specs
https://www.synology.com/en-us/products/DS418#specs
https://www.synology.com/en-us/products/DS218play#specs
https://www.synology.com/en-us/products/DS118#specs

... which are the 5 of the top 6 arm64 Synology models per our stats. (the DS120j has a Marvell Armada 3700 88F3720)

@prattmic
Copy link
Member

It looks like these instructions do trap. Linux emulates the instructions and returns to userspace: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=77c97b4ee21290f5f083173d957843b615abbff2

It looks like emulation support was added in Linux 4.11, so it makes sense that 4.4 is crashing. From the patch, it seems that we should check for HWCAP_CPUID before executing these instructions.

@prattmic prattmic added the NeedsFix The path to resolution is known, but the work has not been done. label Dec 15, 2022
@bradfitz
Copy link
Contributor Author

Out of curiosity, I checked: no Synology model (for any architecture) runs a kernel newer than 4.4.x. Top kernel versions on Synology we see used:

kernel percent
4.4.180+ 75
3.10.108 13
4.4.59+ 10
3.10.105 8
3.2.101 3
3.2.40 3
3.10.102 2
2.6.32.12 2

But who knows how much was cherry-picked into those +. Apparently not that commit you referenced :)

@bradfitz
Copy link
Contributor Author

So in this case I guess the /proc/self/auxv read is failing, causing the readARM64Registers call, which then gets the SIGILL:

func doinit() {
        if err := readHWCAP(); err != nil {
                // failed to read /proc/self/auxv, try reading registers directly                                                                               
                readARM64Registers()
                return
        }
...

But these same hwcap flags are available in /proc/cpuinfo too, no?

pi@pi400arm64:~ $ uname -r
5.10.103-v8+
pi@pi400arm64:~ $ cat /proc/cpuinfo  | grep ^Features
Features	: fp asimd evtstrm crc32 cpuid
Features	: fp asimd evtstrm crc32 cpuid
Features	: fp asimd evtstrm crc32 cpuid
Features	: fp asimd evtstrm crc32 cpuid

Should we try parsing that first as another fallback before the mrs x0, ID_AA64ISAR0_EL1?

Or would y'all prefer looking at the kernel version and hope that's indicative of reality?

@bradfitz
Copy link
Contributor Author

Related issue with x/sys/cpu on arm64 but on OpenBSD is #42517

@bradfitz
Copy link
Contributor Author

@lovitus notes in tailscale/tailscale#5793 (comment) that they're able to cat /proc/cpuinfo:

fanli@fanli-218play:~$ cat /proc/cpuinfo
processor       : 0
model name      : ARMv8 Processor rev 4 (v8l)
BogoMIPS        : 54.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd03
CPU revision    : 4
...

@prattmic
Copy link
Member

prattmic commented Dec 16, 2022

I am fairly surprised that /proc/self/auxv is not readable, especially if /proc/cpuinfo is. From tailscale/tailscale#5793 (comment), it sounds like /proc/self/auxv is generally readable.

Is tailscale running inside of some kind of sandbox? Running under `strace -f`` could at least give a sense of what kind of error we get trying to open/read auxv.

@prattmic
Copy link
Member

(As an aside, I'm sad we can't just get auxv from runtime main. It shouldn't be necessary to open a file to read auxv...)

@prattmic
Copy link
Member

As far as changing x/sys/cpu to avoid SIGILL when /proc/self/auxv can't be read, I see a few options, none stellar:

  1. Attempt to read CPU capabilities from /proc/cpuinfo. I don't have high confidence in cpuinfo being available if auxv is not.
  2. Use uname to do a best-effort version check before readARM64Registers.
  3. Install a SIGILL handler and manually recover if the instructions fault. This is going to get really messy.
  4. Don't use readARM64Registers at all and assume minimal CPU features.
  5. Accept the status quo that this might crash on old kernels.

@prattmic
Copy link
Member

@tklauser For more context, I'm curious what your motivation for https://go.dev/cl/209478 was. What systems did not have auxv available that you wanted to support?

@bradfitz
Copy link
Contributor Author

Is tailscale running inside of some kind of sandbox? Running under `strace -f`` could at least give a sense of what kind of error we get trying to open/read auxv.

Synology has two major OS versions in the wild. In the newer one (DSM7), packages run non-root but can request certain capabilities(7) in their manifest. I'd never noticed a filesystem chroot or anything but our existing four Synology test devices don't yet include arm64. One is arriving Monday, though, and then I'll be able to see.

Checking uname kernel version seems easiest. We do that already elsewhere so should work.

@ianlancetaylor
Copy link
Contributor

x/sys/cpu already reaches into the standard library for a couple of cases. We could save the auxv information in sysauxv and provide a hidden runtime package function for x/sys/cpu to call to retrieve it.

@gopherbot
Copy link

Change https://go.dev/cl/458256 mentions this issue: runtime: expose auxv for use by x/sys/cpu

@bradfitz
Copy link
Contributor Author

x/sys/cpu already reaches into the standard library for a couple of cases. We could save the auxv information in sysauxv and provide a hidden runtime package function for x/sys/cpu to call to retrieve it.

Something like this? https://go-review.googlesource.com/c/go/+/458256

I then made a change to x/sys/cpu using that. Seems to work.

@bradfitz
Copy link
Contributor Author

Amazon arrived and I have yet another test device so I can reproduce now.

/proc/self/auxv isn't readable because the binary runs as a non-root user but its binary is additionally given /bin/setcap cap_net_admin,cap_net_raw+eip which changes the permissions in procfs it seems (per https://lore.kernel.org/lkml/6ddd00bd-87d9-484e-8f2a-06f15a75a4df@app.fastmail.com/T/#md2736e8bf1b5a87e3913f2f558887bc012ae75c8).

If I remove those capabilities, then x/sys/cpu reads /proc/self/auxv and is fine. (but runs in a degraded mode where certain operations aren't permitted due to the Synology sandbox)

@bradfitz
Copy link
Contributor Author

Minimal repro against Linux 4.4.180+ (arm64):

bradfitz@tsdev:~/src/golang.org/x/sys$ git di
diff --git a/cpu/cpu_linux_arm64.go b/cpu/cpu_linux_arm64.go
index 79a38a0..fee6f12 100644
--- a/cpu/cpu_linux_arm64.go
+++ b/cpu/cpu_linux_arm64.go
@@ -4,6 +4,11 @@
 
 package cpu
 
+import (
+       "io/ioutil"
+       "log"
+)
+
 // HWCAP/HWCAP2 bits. These are exposed by Linux.
 const (
        hwcap_FP       = 1 << 0
@@ -34,6 +39,9 @@ const (
 
 func doinit() {
        if err := readHWCAP(); err != nil {
+               log.Printf("XXX readHWCap error: %v", err)
+               cpuinfo, err := ioutil.ReadFile("/proc/cpuinfo")
+               log.Printf("XXX /proc/cpuinfo = %q, %v", cpuinfo, err)
                // failed to read /proc/self/auxv, try reading registers directly
                readARM64Registers()
                return

Then, after a go.mod replace:

$ cat cputest.go
package main

import (
        "fmt"

        "golang.org/x/sys/cpu"
)

func main() {
        fmt.Printf("%+v\n", cpu.ARM64)
}

$ GOARCH=arm64 go build  && scp cputest ts@10.2.201.2: && ssh root@10.2.201.2  /bin/setcap cap_net_admin,cap_net_raw+eip /var/services/homes/ts/cputest && ssh ts@10.2.201.2 ./cputest 2>&1 | head -20
cputest                                                                                                                                 100% 2029KB  59.2MB/s   00:00    
2022/12/18 18:50:46 XXX readHWCap error: open /proc/self/auxv: permission denied
2022/12/18 18:50:46 XXX /proc/cpuinfo = "processor\t: 0\nmodel name\t: ARMv8 Processor rev 4 (v8l)\nBogoMIPS\t: 54.00\nFeatures\t: fp asimd evtstrm aes pmull sha1 sha2 crc32\nCPU implementer\t: 0x41\nCPU architecture: 8\nCPU variant\t: 0x0\nCPU part\t: 0xd03\nCPU revision\t: 4\n\nprocessor\t: 1\nmodel name\t: ARMv8 Processor rev 4 (v8l)\nBogoMIPS\t: 54.00\nFeatures\t: fp asimd evtstrm aes pmull sha1 sha2 crc32\nCPU implementer\t: 0x41\nCPU architecture: 8\nCPU variant\t: 0x0\nCPU part\t: 0xd03\nCPU revision\t: 4\n\nprocessor\t: 2\nmodel name\t: ARMv8 Processor rev 4 (v8l)\nBogoMIPS\t: 54.00\nFeatures\t: fp asimd evtstrm aes pmull sha1 sha2 crc32\nCPU implementer\t: 0x41\nCPU architecture: 8\nCPU variant\t: 0x0\nCPU part\t: 0xd03\nCPU revision\t: 4\n\nprocessor\t: 3\nmodel name\t: ARMv8 Processor rev 4 (v8l)\nBogoMIPS\t: 54.00\nFeatures\t: fp asimd evtstrm aes pmull sha1 sha2 crc32\nCPU implementer\t: 0x41\nCPU architecture: 8\nCPU variant\t: 0x0\nCPU part\t: 0xd03\nCPU revision\t: 4\n\n", <nil>
SIGILL: illegal instruction
PC=0x9feb0 m=0 sigcode=1
instruction bytes: 0x0 0x6 0x38 0xd5 0xe0 0x7 0x0 0xf9 0xc0 0x3 0x5f 0xd6 0x0 0x0 0x0 0x0

goroutine 1 [running, locked to thread]:
golang.org/x/sys/cpu.getisar0()
        /home/bradfitz/src/golang.org/x/sys/cpu/cpu_arm64.s:14 fp=0x4000041c20 sp=0x4000041c20 pc=0x9feb0
golang.org/x/sys/cpu.readARM64Registers()
        /home/bradfitz/src/golang.org/x/sys/cpu/cpu_arm64.go:65 +0x2c fp=0x4000041c60 sp=0x4000041c20 pc=0x9f69c
golang.org/x/sys/cpu.doinit()
        /home/bradfitz/src/golang.org/x/sys/cpu/cpu_linux_arm64.go:46 +0xc8 fp=0x4000041ce0 sp=0x4000041c60 pc=0x9fa28
golang.org/x/sys/cpu.archInit(...)
        /home/bradfitz/src/golang.org/x/sys/cpu/cpu_arm64.go:48
golang.org/x/sys/cpu.init.0()
        /home/bradfitz/src/golang.org/x/sys/cpu/cpu.go:199 +0x20 fp=0x4000041cf0 sp=0x4000041ce0 pc=0x9ed20
runtime.doInit(0x1522e0)
        /home/bradfitz/sdk/go1.19/src/runtime/proc.go:6321 +0x128 fp=0x4000041e30 sp=0x4000041cf0 pc=0x51828
runtime.doInit(0x1516e0)

@prattmic wrote:

I don't have high confidence in cpuinfo being available if auxv is not.

Neither did I, but it seems to work.

I'll send a change to fall back to parsing that for now for Linux kernels prior to 4.11.

@gopherbot
Copy link

Change https://go.dev/cl/458315 mentions this issue: cpu: parse /proc/cpuinfo on linux/arm64 on old kernels when needed

gopherbot pushed a commit that referenced this issue Feb 16, 2023
Updates #57336

Change-Id: I181885f59bac59360b855d3990326ea2b268bd28
Reviewed-on: https://go-review.googlesource.com/c/go/+/458256
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@gopherbot
Copy link

Change https://go.dev/cl/465295 mentions this issue: cpu: get hwcap/auxv from the Go 1.21+ runtime

gopherbot pushed a commit to golang/sys that referenced this issue Feb 17, 2023
Depends on https://go.dev/cl/458256

This change only does Linux for now.

Updates golang/go#57336

Change-Id: I0659697c1bdc6e2577c6251b964a0df32047ee12
Reviewed-on: https://go-review.googlesource.com/c/sys/+/465295
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
johanbrandhorst pushed a commit to Pryz/go that referenced this issue Feb 22, 2023
Updates golang#57336

Change-Id: I181885f59bac59360b855d3990326ea2b268bd28
Reviewed-on: https://go-review.googlesource.com/c/go/+/458256
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@golang golang locked and limited conversation to collaborators Feb 16, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-arm64 FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

6 participants