Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: TCPConn.Read Higher CPU usage than comparable operation in python3/c#/nodejs #53939

Closed
andrewaggb opened this issue Jul 18, 2022 · 3 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance

Comments

@andrewaggb
Copy link

What version of Go are you using (go version)?

$ go version
go version go1.18.4 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/root/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/root/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.18.4"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/root/GoServer/go.mod"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build3785892550=/tmp/go-build -gno-record-gcc-switches"
cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.4 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.4 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

What did you do?

I was debugging higher than expected CPU usage for async socket operations in c#/net6 on linux and ran a series of socket tests in nodejs, python3, c, and go as a references. I noticed that go uses about twice as much cpu as python3/c/.net6 and somewhat more than nodejs in my synchronous read test case. It's not a bug but I thought I'd share the findings in case somebody is interested at looking at it. I suspect there is some optimization that can be done to bring the go results in line with the others. I'm not a go developer but the code and are test are very simple so I don't believe it's naivety on my part - though I apologize if it is. These tests are done with the default settings, I'm not adjusting any socket settings, OS settings, etc. Just plain socket reads into an oversized buffer.

From another host on the network I run netcat to saturate the network (in my case the receiving server is 192.168.1.93 and my tests were with a gigabit network)
cat /dev/zero | nc 192.168.1.93 9989

Then I run a receiving application such as the go application below to read the data and print out some details when I kill the netcat. I monitored the cpu usage with top and confirmed the network usage with iftop in addition to the application output below.

using netcat as a listener
nc -l 9989 >/dev/null
uses about 8% cpu

using golang with a goroutine per connection receiving into a buffer
./GoServer (attached zip, but it's super simple and has the code)
uses about 14% cpu

using nodejs and the socket.on data callback
nodejs jsserver.js
uses about 10% cpu (not re-using a receive buffer so it's less efficient)

using python3 with a synchronous tcp socket receiving into a buffer
python3 pyserver.py
uses about 8% cpu

using .net6 with a synchronous tcp socket receiving into a buffer
dotnet run --configuration Release PerfTestTCPListener
uses about 6-8% cpu

Code for python3, nodejs, c#/net6
OtherSamples.zip

Code for Go
GoServer.zip

package main

import (
        "fmt"
        "io"
        "net"
        "os"
        "time"
)

func main() {
        port := "0.0.0.0:9989"
        listener, err := net.Listen("tcp", port)
        if err != nil {
                fmt.Println("failed to create listener, err:", err)
                os.Exit(1)
        }
        fmt.Printf("listening on %s\n", listener.Addr())

        for {
                conn, err := listener.Accept()
                if err != nil {
                        fmt.Println("failed to accept connection, err:", err)
                        continue
                }
                // run connection in a goroutine
                go handleConnection(conn)
        }
}

func handleConnection(conn net.Conn) {
        defer conn.Close()
        buf := make([]byte, 1048576)
        nBytes := 0
        start := time.Now()
        for {
                n, err := conn.Read(buf)
                if err != nil {
                        if err != io.EOF {
                                fmt.Println("failed to read data, err:", err)
                        }
                        t := time.Now()
                        elapsed := t.Sub(start)
                        fmt.Printf("Read: %d MB in %s, %f MB/s\n", nBytes/1048576, elapsed, float64(nBytes)/1048576.0/el                                                                       apsed.Seconds())
                        return
                }
                nBytes += n
        }
}

What did you expect to see?

Similar cpu usage to python/.net6/nodejs

What did you see instead?

~14% cpu usage vs ~8% cpu usage for python3/net6 and ~10% cpu usage for nodejs

@seankhliao seankhliao changed the title affected/package: net.Read Higher CPU usage than comparable operation in python3/c#/nodejs net: TCPConn.Read Higher CPU usage than comparable operation in python3/c#/nodejs Jul 18, 2022
@seankhliao seankhliao added Performance NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Jul 18, 2022
@dspezia-1a
Copy link

It sounds like an apples to oranges comparison. Although the programs do the same thing, they do not do it in the same way.

For Python and synchronous C# , I think there is no multiplexing in your test (i.e. no event loop), just a thread waiting on a unique system call to read the socket. Go and Node.js probably generate more system calls, because they automatically multiplex the I/Os (with epoll). Since you measure with a single connection, it is a worst case for I/O multiplexing.

Also the default configuration of the socket may be different from one language implementation to the other (Nagle algorithm, socket buffers, etc ...).

You could count the number of system calls using strace -c and compare the results.

@andrewaggb
Copy link
Author

Ok. I don't disagree with any of your remarks. I did run some trace results but it was casual and not for the exact same amount of time so the number of calls isn't directly comparable though the proportions (% time) should be fine. You're correct that nodejs and go appear to be using epoll and python3 and .net6 don't appear to be (from the strace results) for this scenario.

I'm fine if you want to close the issue, I just thought the results might be worth further investigation as the cpu discrepancy was repeatable and somewhat large but if that's expected then that's that. Nodejs did use less CPU than go in my testing but that may very well be down to deliberate design choices.

Go

strace -c ./GoServer
listening on [::]:9989
^Cstrace: Process 2728279 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 88.61    3.058293         923      3312         1 futex
  6.79    0.234224          30      7627      3172 read
  4.61    0.159014          30      5268           epoll_pwait
  0.00    0.000000           0         1           write
  0.00    0.000000           0         8           close
  0.00    0.000000           0         3           fstat
  0.00    0.000000           0        35           mmap
  0.00    0.000000           0         7           mprotect
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         3           brk
  0.00    0.000000           0       119           rt_sigaction
  0.00    0.000000           0        11           rt_sigprocmask
  0.00    0.000000           0        17           rt_sigreturn
  0.00    0.000000           0         8           pread64
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         4           socket
  0.00    0.000000           0         3           bind
  0.00    0.000000           0         1           listen
  0.00    0.000000           0         2           getsockname
  0.00    0.000000           0         8           setsockopt
  0.00    0.000000           0         4           clone
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         5           fcntl
  0.00    0.000000           0         2           sigaltstack
  0.00    0.000000           0         2         1 arch_prctl
  0.00    0.000000           0         1           gettid
  0.00    0.000000           0         1           sched_getaffinity
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         5           epoll_ctl
  0.00    0.000000           0         5           openat
  0.00    0.000000           0         1           set_robust_list
  0.00    0.000000           0         3         2 accept4
  0.00    0.000000           0         1           epoll_create1
  0.00    0.000000           0         1           pipe2
  0.00    0.000000           0         1           prlimit64
------ ----------- ----------- --------- --------- ----------------
100.00    3.451531                 16473      3177 total

python3

strace -c python3 pyserver.py
starting up on 192.168.1.93 port 9989
waiting for a connection
connection from ('192.168.1.128', 48620)
^Cstrace: Process 2722064 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
Traceback (most recent call last):
 99.12    1.237329          38     32450           recvfrom
  0.21    0.002662          14       186        40 stat
  0.09    0.001131          16        67           rt_sigaction
  0.09    0.001109          10       103           fstat
  0.09    0.001106          11        93           read
  0.07    0.000873          13        64         2 openat
  File "pyserver.py", line 26, in <module>
  0.07    0.000853           9        86         3 lseek
  0.07    0.000825          13        62           mmap
  0.06    0.000727          11        65           close
  0.04    0.000531          10        52        46 ioctl
  0.02    0.000257          12        20           getdents64
  0.02    0.000200          18        11           mprotect
  0.01    0.000159          19         8           pread64
  0.01    0.000158          12        13           brk
  0.01    0.000089          44         2         1 arch_prctl
  0.01    0.000063          21         3         2 readlink
  0.00    0.000052          17         3           dup
  0.00    0.000032          10         3           fcntl
  0.00    0.000024           1        16           munmap
  0.00    0.000023          23         1         1 access
  0.00    0.000019          19         1           futex
  0.00    0.000019          19         1           prlimit64
  0.00    0.000019          19         1           getrandom
  0.00    0.000018          18         1           lstat
  0.00    0.000018          18         1           rt_sigprocmask
  0.00    0.000018          18         1           set_tid_address
  0.00    0.000018          18         1           set_robust_list
  0.00    0.000017          17         1           getcwd
  0.00    0.000000           0         3           write
  0.00    0.000000           0         1           socket
  0.00    0.000000           0         1           bind
  0.00    0.000000           0         1           listen
  0.00    0.000000           0         1           getsockname
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           sysinfo
  0.00    0.000000           0         1           getuid
  0.00    0.000000           0         1           getgid
  0.00    0.000000           0         1           geteuid
  0.00    0.000000           0         1           getegid
  0.00    0.000000           0         1           sigaltstack
  0.00    0.000000           0         1           accept4
------ ----------- ----------- --------- --------- ----------------
100.00    1.248349                 33331        95 total
    nBytes = connection.recv_into(view, 1048576)

nodejs

strace -c nodejs jsserver.js
Server listening for connection requests on socket localhost:9989
A new connection has been established.
^Cstrace: Process 2722385 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 49.63    1.112156          29     38050           epoll_wait
 44.64    1.000243          25     39535           read
  3.03    0.067818          37      1791        17 futex
  1.92    0.043027          11      3686           mprotect
  0.41    0.009177          12       715           brk
  0.10    0.002218          18       122           mmap
  0.08    0.001768          16       109           write
  0.03    0.000620          17        36           getpid
  0.03    0.000562          18        30         8 openat
  0.02    0.000416          21        19           close
  0.02    0.000363          16        22           fstat
  0.02    0.000353          11        31        10 ioctl
  0.01    0.000275           7        35           munmap
  0.01    0.000270          45         6           clone
  0.01    0.000173          15        11           getgid
  0.01    0.000168          21         8           pread64
  0.01    0.000166          15        11           getegid
  0.01    0.000160          20         8           prlimit64
  0.01    0.000154          14        11           getuid
  0.01    0.000135          12        11           geteuid
  0.00    0.000074          24         3           pipe2
  0.00    0.000073          10         7           rt_sigaction
  0.00    0.000059          29         2         1 accept4
  0.00    0.000055           9         6           madvise
  0.00    0.000046           6         7           rt_sigprocmask
  0.00    0.000045          22         2           epoll_create1
  0.00    0.000043          21         2           eventfd2
  0.00    0.000040          10         4           epoll_ctl
  0.00    0.000032          32         1           getrandom
  0.00    0.000023           4         5           statx
  0.00    0.000016           8         2         1 arch_prctl
  0.00    0.000012          12         1           set_tid_address
  0.00    0.000012          12         1           set_robust_list
  0.00    0.000008           2         4           readlink
  0.00    0.000006           0        11         8 stat
  0.00    0.000006           3         2           fcntl
  0.00    0.000005           2         2           dup3
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         1           socket
  0.00    0.000000           0         1           bind
  0.00    0.000000           0         1           listen
  0.00    0.000000           0         2           setsockopt
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         2           getcwd
------ ----------- ----------- --------- --------- ----------------
100.00    2.240777                 84318        46 total

net6

strace -c dotnet run --configuration Release PerfTestTCPListener
PerfTestTCPClient NoDelay False, ReceiveBufferSize 131072, SendBufferSize 16384
Listener started on port 9989
Received connection request from 192.168.1.128:48862
^Cstrace: Process 2723416 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 19.45    0.033194          18      1751           read
 17.94    0.030625          74       409        39 futex
 16.66    0.028439          17      1581           lseek
  8.83    0.015076          12      1202           mprotect
  5.15    0.008783          12       717       186 lstat
  4.39    0.007486           6      1070       187 stat
  4.20    0.007173           6      1035           madvise
  4.10    0.007001           9       767           pread64
  2.74    0.004677          14       322           mmap
  2.69    0.004599          10       421        93 openat
  2.46    0.004200          10       382           fstat
  2.30    0.003923          11       329           close
  2.27    0.003878          18       207           rt_sigprocmask
  1.95    0.003331          10       332           flock
  0.86    0.001475           5       247           getcwd
  0.58    0.000995          41        24        12 access
  0.37    0.000628           7        83           fcntl
  0.36    0.000623          10        60           munmap
  0.32    0.000542          33        16           sched_yield
  0.31    0.000532           7        71           brk
  0.27    0.000468          42        11           clone
  0.27    0.000456          20        22           readlink
  0.27    0.000456          15        30           getdents64
  0.16    0.000272          10        27           rt_sigaction
  0.12    0.000207          41         5           getgid
  0.09    0.000159          26         6           getegid
  0.09    0.000158          22         7           geteuid
  0.07    0.000127          18         7           getuid
  0.07    0.000114          14         8           prlimit64
  0.06    0.000107          11         9           write
  0.06    0.000107          15         7           sysinfo
  0.06    0.000103          25         4           pipe2
  0.05    0.000092          18         5           membarrier
  0.05    0.000087          21         4         2 unlink
  0.04    0.000076          15         5           sched_getaffinity
  0.04    0.000067          67         1           vfork
  0.03    0.000058           9         6           getpid
  0.03    0.000057          28         2           mknod
  0.03    0.000051          17         3           statfs
  0.03    0.000045          45         1           bind
  0.02    0.000034          17         2           sigaltstack
  0.02    0.000031          31         1           fadvise64
  0.02    0.000026          26         1           socket
  0.01    0.000022          22         1           getgroups
  0.01    0.000020          20         1           fchmod
  0.01    0.000019          19         1           listen
  0.01    0.000019           9         2         1 arch_prctl
  0.01    0.000019          19         1           set_tid_address
  0.01    0.000019          19         1           set_robust_list
  0.01    0.000017           5         3           ioctl
  0.01    0.000017           8         2           getsid
  0.01    0.000017           8         2           gettid
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         2           sched_getparam
  0.00    0.000000           0         2           sched_setscheduler
  0.00    0.000000           0         2           sched_getscheduler
  0.00    0.000000           0         2           sched_get_priority_max
  0.00    0.000000           0         2           sched_get_priority_min
------ ----------- ----------- --------- --------- ----------------
100.00    0.170707                 11225       520 total

@seankhliao
Copy link
Member

Duplicate of #52341

@seankhliao seankhliao marked this as a duplicate of #52341 Jul 29, 2022
@seankhliao seankhliao closed this as not planned Won't fix, can't repro, duplicate, stale Jul 29, 2022
@golang golang locked and limited conversation to collaborators Jul 29, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
None yet
Development

No branches or pull requests

4 participants