Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: network read using a lot of CPU #52341

Open
vinoblade opened this issue Apr 14, 2022 · 9 comments
Open

runtime: network read using a lot of CPU #52341

vinoblade opened this issue Apr 14, 2022 · 9 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@vinoblade
Copy link

What version of Go are you using (go version)?

$ go version
go version go1.18 darwin/amd64

Does this issue reproduce with the latest release?

Yes. This issue is more of a feature request than a bug fix.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.18.1"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/go/snap-statsd/go.mod"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build3187482189=/tmp/go-build -gno-record-gcc-switches"

What did you do?

My code is running a tcp server and read from connection in a for loop

        // c is a connection
        r := bufio.NewReader(c)
	for {
		buf, err := r.ReadBytes('\n')
		if err != nil {
			_ = c.Close()
			return
		}
		if len(buf) > 0 {
			t.payloads <- buf
		}

	}

The problem is the syscall.read is 50% of the total CPU, see the profiling below
reduce-connection-1

Looking at https://github.com/golang/go/blob/master/src/internal/poll/fd_unix.go#L163-L167

n, err := ignoringEINTRIO(syscall.Read, fd.Sysfd, p)
		if err != nil {
			n = 0
			if err == syscall.EAGAIN && fd.pd.pollable() {
				if err = fd.pd.waitRead(fd.isFile); err == nil {
					continue
				}
			}
		}

My theory is that on line 163 it calls syscall.read to read the socket, and syscall.read returns error when there is nothing to read. Then on line 167 it calls fd.pd.waitRead() to wait for data. And when data arrives at the connection, the fd is notified and continue to read. I'm seeing two problems here,

  1. If on line 163 the syscall.Read returns nothing, it unnecessarily does one system call. Why don't we call fd.pd.waitRead() first and wait for data to arrive and then call syscall.Read? That way we will save one time system call.
  2. In current mechanism, whenever data arrives, we do one time syscall.Read, no matter how large the data is. Can we implement fd.pd.waitRead() to have it wait until some amount of time, or some amount of data accumulates? That way we can use much fewer syscall.Read to read the same amount of data, which should save CPU usage a lot.
@ianlancetaylor ianlancetaylor changed the title poll: fd_unix Read() seems using a lot of CPU runtime: network read using a lot of CPU Apr 14, 2022
@ianlancetaylor
Copy link
Contributor

If on line 163 the syscall.Read returns nothing, it unnecessarily does one system call. Why don't we call fd.pd.waitRead() first and wait for data to arrive and then call syscall.Read? That way we will save one time system call.

That would force us to use kqueue in level triggered mode which in common cases would lead to a lot of busywork by the poller receiving and ignoring notifications that we don't care about. While this does lead to an extra system call for the case you describe it still on average saves system calls.

In current mechanism, whenever data arrives, we do one time syscall.Read, no matter how large the data is. Can we implement fd.pd.waitRead() to have it wait until some amount of time, or some amount of data accumulates? That way we can use much fewer syscall.Read to read the same amount of data, which should save CPU usage a lot.

Unix kernels do not support this kind of operation, and of course we couldn't use it in general as the Read method has no way to say "don't return until you've read at least N bytes."

I'm not sure how to interpret your profile. You didn't show us the complete program. It's normal for a program that doesn't do anything other than read from the network to spend most of its time reading from the network; what else might it be doing? You also didn't tell us what kind of network traffic it is handling.

It's not clear to me that there is anything we can do here, though I would be happy to hear specific tested suggestions. The Go networking code is optimized for handling many different network connections simultaneously. That does mean that it is not optimized for handling a single network connection; that case works fine but it does use more CPU time than is strictly required.

@ianlancetaylor ianlancetaylor added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Apr 14, 2022
@vinoblade
Copy link
Author

Hey thanks for replying! You are right my application is not handling high number of connections. In most cases there would be at most 5 - 6 connections. And the connections are usually very busy, serving more than 5K QPS traffic per connection. And the network is in TCP. I understand Read doesn't have the capability to "preread", but is it possible to have waitRead() do such things?

@ianlancetaylor
Copy link
Contributor

is it possible to have waitRead() do such things?

I don't know of a way to do that.

@vinoblade
Copy link
Author

vinoblade commented Apr 14, 2022

Got it. Do you think it makes sense to do some event based implementations, using epoll explicitly etc, to improve CPU usage? My guts say it will not help because Accept and Read already implemented with epoll under the hood on Linux. Just wanna double check with you.

@ianlancetaylor
Copy link
Contributor

Yes, since the Go runtime already uses epoll under the hood (or kqueue on Darwin), using it in the main program won't make any difference.

@prattmic
Copy link
Member

prattmic commented Aug 5, 2022

go version go1.18 darwin/amd64

The profile shows lots of time in runtime.futex, which is only on Linux. So I assume this is actually from Linux.

This profile does spend a rather large amount of time waking sysmon in entersyscall_sysmon, which would be nice to improve. I am a bit surprised we don't see sysmon itself in the profile.

Based on this, I assume that this program is completely idle between receiving data? That would let sysmon go to sleep, which then requires we wake it up again on the next read.

@prattmic prattmic added this to the Backlog milestone Aug 5, 2022
@prattmic
Copy link
Member

prattmic commented Aug 5, 2022

That would let sysmon go to sleep, which then requires we wake it up again on the next read.

To be clear, this means that a program that is busier for any reason (other background work, handling other connections, etc) would see less overhead in this case.

@Natureshadow
Copy link

While this is hopefully being fixed soon, what is the best workaround at the moment?

I have implemented a server listening on Unix sockets, and the client (Postfix) keeps connections open indefenitely to re-use them. So I get 10 idle connections that do nothing most of the time, and my server burns CPU time as if it were mining some cryptocurrency.

My approach would have been to use a Listener that produces Conns that forcefully close the connection after some idle time (Postfix will re-open them once it needs them) to mitigate this issue for the time being. Does something like this exist? I had no success finding a module that does this, but cannot imagine I am the first one who wants to do that ;).

@Sina-Ghaderi
Copy link

same issue here, tcp proxy causing too much cpu time, pprof diagram is like the one above, is there any workaround for this?

@joedian joedian added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. and removed WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. labels Nov 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
Development

No branches or pull requests

7 participants