runtime: NumCPU always 1 when using CGO with libgomp [linux/arm64 - openstack virtualization] #47365

Closed
willliu opened this issue Jul 24, 2021 · 9 comments

Comments

@willliu

willliu commented Jul 24, 2021

What version of Go are you using (go version)?

$ go version
go version go1.16.6 linux/arm64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

[machine 1 (in cloud) - error]

go env Output
$ go env
GO111MODULE="on"
GOARCH="arm64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="arm64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/root/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/root/go"
GOPRIVATE=""
GOPROXY="https://goproxy.cn,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_arm64"
GOVCS=""
GOVERSION="go1.16.6"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build1705612511=/tmp/go-build -gno-record-gcc-switches"

gcc --version & numactl -H & uname -a Output
$gcc --version
gcc (GCC) 7.3.0

$numactl -H
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 16021 MB
node 0 free: 10794 MB
node 1 cpus: 8 9 10 11 12 13 14 15
node 1 size: 15341 MB
node 1 free: 12248 MB
node distances:
node 0 1
0: 10 20
1: 20 10

$uname -a
Linux ecs-2207 4.19.90-2003.4.0.0036.oe1.aarch64 #1 SMP Mon Mar 23 19:06:43 UTC 2020 aarch64 aarch64 aarch64 GNU/Linux

sudo dmidecode Output
Handle 0x0100, DMI type 1, 27 bytes
System Information
	Manufacturer: OpenStack Foundation
	Product Name: OpenStack Nova
	Version: 13.2.1-20210526121246_99bdf3c
	Serial Number: 1e730be6-d3fc-46ee-8442-49bad98bee50
	UUID: 1e730be6-d3fc-46ee-8442-49bad98bee50
	Wake-up Type: Power Switch
	SKU Number: Not Specified
	Family: Virtual Machine

[machine 2 (bare metal) - correct]

go env Output

$ go env
GO111MODULE=""
GOARCH="arm64"
GOBIN=""
GOCACHE="/home/xxx/.cache/go-build"
GOENV="/home/xxx/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="arm64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/xxx/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/xxx/go"
GOPRIVATE=""
GOPROXY="https://goproxy.cn,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_arm64"
GOVCS=""
GOVERSION="go1.16.6"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build539472266=/tmp/go-build -gno-record-gcc-switches"

gcc --version & numactl -H & uname -a Output
$gcc --version
gcc (GCC) 7.3.0

$numactl -H
available: 1 nodes (0)
node 0 cpus: 0 1 2 3
node 0 size: 31299 MB
node 0 free: 29630 MB
node distances:
node 0
0: 10

$uname -a
Linux localhost 4.19.90-2012.4.0.0053.oe1.aarch64 #1 SMP Mon Dec 21 14:33:58 UTC 2020 aarch64 aarch64 aarch64 GNU/Linux

What did you do?

main.go:

package main

//#cgo CFLAGS:
//#cgo CXXFLAGS:
//#cgo LDFLAGS: -lgomp
//#
import "C"

import (
	"fmt"
	"runtime"
)

func main() {
	cpuNum := runtime.NumCPU()
	fmt.Println("number of current logic cpus: ", cpuNum)
}


$go run main.go

On machine 1 there are 16 logical CPUs, but when the program is linked with -lgomp through cgo, runtime.NumCPU returns 1. If I comment out -lgomp, the output is correct. Machine 1 is a virtual machine running in the cloud.

On machine 2 there are 4 logical CPUs; when linked with -lgomp through cgo, runtime.NumCPU returns 4 as expected. Machine 2 is bare metal.

What did you expect to see?

[machine 1]
number of current logic cpus: 16

[machine 2]
number of current logic cpus: 4

What did you see instead?

[machine 1 - error]
number of current logic cpus: 1

[machine 2 - correct]
number of current logic cpus: 4

@willliu willliu changed the title NumCPU always 1 when using CGO with libgomp runtime: NumCPU always 1 when using CGO with libgomp [linux/arm64] Jul 24, 2021
@n4j

n4j commented Jul 24, 2021

@willliu I tried this on my machine with and without -lgomp, and I get the expected value for logical CPUs.

➜  /tmp go run main.go
number of current logic cpus:  24
➜  /tmp go version
go version go1.16.6 linux/amd64

My environment info

➜  /tmp uname -a
Linux turing 5.12.8-051208-generic #202105281232 SMP Fri May 28 12:35:52 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
➜  /tmp gcc --version
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

@willliu
Author

willliu commented Jul 24, 2021

Thank you for your reply. I later checked my bare-metal machine and the result there is correct (same Go, GCC, and OS). The original case is in the cloud, so I suspect the hypervisor is involved.

P.S. My machines are ARM64, not AMD64.

@willliu willliu changed the title runtime: NumCPU always 1 when using CGO with libgomp [linux/arm64] runtime: NumCPU always 1 when using CGO with libgomp [linux/arm64 - openstack virtualization] Jul 24, 2021
@AlexRouSg
Contributor

Go should be reading and parsing the value from sched_getaffinity (https://linux.die.net/man/2/sched_getaffinity).
Try checking the value it returns; if that is wrong, then I don't think there is anything Go can do about it.

@willliu
Author

willliu commented Jul 24, 2021

Thank you, Alex. After checking the source code, I see that runtime.NumCPU traces back to func getproccount() int32 in src/runtime/os_linux.go, which calls sched_getaffinity as you pointed out.

Now I suspect that in some cases linking with -lgomp through cgo makes sched_getaffinity an unreliable way to obtain the number of CPUs.
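
For anyone who wants to check this from Go directly, here is a minimal sketch that counts the CPUs in the affinity mask with a raw sched_getaffinity system call and prints the result next to runtime.NumCPU. It only approximates what getproccount does; the helper name affinityCount and the 8192-byte buffer are my own choices for illustration, and it is Linux-only.

```go
package main

import (
	"fmt"
	"math/bits"
	"runtime"
	"syscall"
	"unsafe"
)

// affinityCount returns the number of CPUs set in the calling thread's
// affinity mask. The name and buffer size are illustrative assumptions.
func affinityCount() (int, error) {
	var mask [8192]byte // one bit per CPU, room for 65536 CPUs
	// The raw system call returns the number of bytes the kernel wrote.
	n, _, errno := syscall.RawSyscall(syscall.SYS_SCHED_GETAFFINITY,
		0, uintptr(len(mask)), uintptr(unsafe.Pointer(&mask[0])))
	if errno != 0 {
		return 0, errno
	}
	count := 0
	for _, b := range mask[:n] {
		count += bits.OnesCount8(b)
	}
	return count, nil
}

func main() {
	n, err := affinityCount()
	if err != nil {
		fmt.Println("sched_getaffinity failed:", err)
		return
	}
	fmt.Println("CPUs in affinity mask:", n)
	fmt.Println("runtime.NumCPU():     ", runtime.NumCPU())
}
```

If both numbers are 1 on a 16-CPU machine, the mask had already been narrowed by the time the Go runtime read it.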

By the way, I tested the following C++ code, linked with OpenMP, on the failing machine 1:

main.cpp

#include <iostream>
#include <string>
#include <omp.h>

using namespace std;

int main(){
    int numProcs = omp_get_num_procs();
    cout << "omp_get_num_procs() = " << numProcs << endl;
    return 0;
}

$ g++ main.cpp -fopenmp -lgomp -O3 -o main
$ ./main
omp_get_num_procs() = 16

It gives the correct number of logical CPUs, so OpenMP itself seems to work.

@AlexRouSg
Contributor

AlexRouSg commented Jul 24, 2021

omp_get_num_procs is a completely different function; we need to know whether sched_getaffinity works in C/C++.
If it doesn't work in C/C++, then the problem is not specific to cgo.

@willliu
Author

willliu commented Jul 24, 2021

Alex, you pointed me in the right direction, and I did find the source of the error (and a mystery).

main.c (adapted from stackoverflow.com/questions/10490756/how-to-use-sched-getaffinity-and-sched-setaffinity-in-linux-from-c)

#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

void print_affinity() {
    cpu_set_t mask;
    long nproc, i;

    if (sched_getaffinity(0, sizeof(cpu_set_t), &mask) == -1) {
        perror("sched_getaffinity");
        assert(false);
    }
    nproc = sysconf(_SC_NPROCESSORS_ONLN);
    printf("sched_getaffinity = ");
    for (i = 0; i < nproc; i++) {
        printf("%d ", CPU_ISSET(i, &mask));
    }
    printf("\n");
}

int main(void) {
    print_affinity();
    return 0;
}

$ gcc -o main main.c
$ ./main
sched_getaffinity = 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

Now linked with -lgomp:

$ gcc -lgomp -o main main.c
$ ./main
sched_getaffinity = 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

This agrees with Go's earlier behavior, where runtime.NumCPU returned 1. The mystery now is why linking with -lgomp changes the result on my virtual machine. Does it suggest that relying on sched_getaffinity to get the number of CPUs is unreliable? Could it happen on other virtual machines as well?

Thanks again, Alex.

@AlexRouSg
Contributor

I would guess gomp is calling sched_setaffinity and setting the mask to a single CPU, or doing something else unexpected.
You can try asking in gomp channels to see if they have any idea.
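
To illustrate the kind of effect this guess describes, the sketch below pins the current thread to a single CPU with sched_setaffinity and shows that a later sched_getaffinity then reports one CPU. It is not a description of what libgomp does internally; the type and helper names are hypothetical, and it is Linux-only.

```go
package main

import (
	"fmt"
	"math/bits"
	"runtime"
	"syscall"
	"unsafe"
)

// cpuMask holds one bit per CPU; 8192 bytes is an illustrative size.
type cpuMask [8192]byte

// getAffinity reads the calling thread's affinity mask.
func getAffinity() (cpuMask, error) {
	var m cpuMask
	_, _, errno := syscall.RawSyscall(syscall.SYS_SCHED_GETAFFINITY,
		0, uintptr(len(m)), uintptr(unsafe.Pointer(&m[0])))
	if errno != 0 {
		return m, errno
	}
	return m, nil
}

// setAffinity replaces the calling thread's affinity mask.
func setAffinity(m *cpuMask) error {
	_, _, errno := syscall.RawSyscall(syscall.SYS_SCHED_SETAFFINITY,
		0, uintptr(len(m)), uintptr(unsafe.Pointer(&m[0])))
	if errno != 0 {
		return errno
	}
	return nil
}

// countCPUs returns the number of set bits in the mask.
func countCPUs(m cpuMask) int {
	n := 0
	for _, b := range m {
		n += bits.OnesCount8(b)
	}
	return n
}

// pinToFirstCPU narrows the mask to its lowest CPU, reports the counts seen
// before and after, and restores the original mask before returning.
func pinToFirstCPU() (before, after int, err error) {
	runtime.LockOSThread() // affinity is per thread, so stay on this one
	defer runtime.UnlockOSThread()

	orig, err := getAffinity()
	if err != nil {
		return 0, 0, err
	}
	var one cpuMask
	for i, b := range orig {
		if b != 0 {
			one[i] = 1 << uint(bits.TrailingZeros8(b)) // keep lowest set bit
			break
		}
	}
	if err := setAffinity(&one); err != nil {
		return 0, 0, err
	}
	defer setAffinity(&orig) // restore the original mask

	now, err := getAffinity()
	if err != nil {
		return 0, 0, err
	}
	return countCPUs(orig), countCPUs(now), nil
}

func main() {
	before, after, err := pinToFirstCPU()
	if err != nil {
		fmt.Println("affinity call failed:", err)
		return
	}
	fmt.Printf("CPUs in mask before: %d, after pinning: %d\n", before, after)
}
```

Note that runtime.NumCPU is not affected by a change made this late, since the runtime reads the mask once at process startup. For NumCPU to report 1, the mask has to be narrowed before the Go runtime initializes.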

@willliu
Author

willliu commented Jul 24, 2021

@n4j @AlexRouSg Thank you all. Setting the environment variable with export OMP_PROC_BIND=false has solved this problem for now.
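
If exporting the variable in the shell is inconvenient, one option is to have the program set it and re-execute itself. This is a minimal sketch, assuming libgomp reads OMP_PROC_BIND when the library is loaded, so that calling os.Setenv inside main would be too late. ensureProcBindFalse is a hypothetical helper, not part of any library, and the re-exec through syscall.Exec is Unix-only.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strings"
	"syscall"
)

// ensureProcBindFalse returns a copy of env in which OMP_PROC_BIND is
// "false", and reports whether the environment had to be changed.
// Hypothetical helper for illustration.
func ensureProcBindFalse(env []string) ([]string, bool) {
	const key = "OMP_PROC_BIND="
	out := make([]string, 0, len(env)+1)
	changed := true
	for _, kv := range env {
		if strings.HasPrefix(kv, key) {
			if kv == key+"false" {
				changed = false // already set as wanted, keep it
				out = append(out, kv)
			}
			continue // drop any other value
		}
		out = append(out, kv)
	}
	if changed {
		out = append(out, key+"false")
	}
	return out, changed
}

func main() {
	env, changed := ensureProcBindFalse(os.Environ())
	if changed {
		exe, err := os.Executable()
		if err != nil {
			fmt.Fprintln(os.Stderr, "cannot locate executable:", err)
			os.Exit(1)
		}
		// Replace the process image so every library starts again and
		// sees the new environment from the beginning.
		if err := syscall.Exec(exe, os.Args, env); err != nil {
			fmt.Fprintln(os.Stderr, "re-exec failed:", err)
			os.Exit(1)
		}
	}
	fmt.Println("number of current logic cpus: ", runtime.NumCPU())
}
```

The sketch itself does not link libgomp; in the failing program the same main would sit below the cgo preamble with -lgomp.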

@ianlancetaylor
Contributor

It looks like this has been resolved, and it looks like there is no change required to the Go source code. Closing.

@golang golang locked and limited conversation to collaborators Jul 25, 2022