
internal/poll: use noescape or reuse iovec obj in Writev to avoid alloc mem in heap every time #26663

Closed
lintanghui opened this issue Jul 28, 2018 · 5 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Comments

@lintanghui

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go version go1.10.3 linux/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOOS="linux"

What did you do?

I use writev to write buffers into a TCP connection. In order to reduce GC pressure, I try to reuse objects as far as possible, and I did so in my code. But when I ran go tool pprof to analyze memory usage, I found that Writev allocates memory every time it is called.
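
For context, my write path goes through net.Buffers, whose WriteTo ends up in internal/poll.(*FD).Writev on Linux. A minimal sketch of that usage (simplified, with illustrative names, not the exact overlord code):

package example

import "net"

// writeChunks writes several byte slices to a TCP connection in a single
// writev by going through net.Buffers.WriteTo, which on Linux calls
// internal/poll.(*FD).Writev under the hood.
func writeChunks(conn *net.TCPConn, chunks [][]byte) (int64, error) {
	bufs := net.Buffers(chunks) // the [][]byte itself can be reused by the caller
	return bufs.WriteTo(conn)   // yet pprof shows allocations inside Writev on every call
}

The heap profile samples below show the allocation sites inside (*FD).Writev (writev.go:25 and writev.go:42):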

0: 0 [2990: 95680] @ 0x49225f 0x58df7b 0x58dd76 0x57ebe7 0x77232f 0x773aad 0x784329 0x79d09b 0x7a0958 0x45a851
#	0x49225e	internal/poll.(*FD).Writev+0xde				/usr/local/Cellar/go/1.10/libexec/src/internal/poll/writev.go:25
#	0x58df7a	net.(*netFD).writeBuffers+0x3a				/usr/local/Cellar/go/1.10/libexec/src/net/writev_unix.go:26
#	0x58dd75	net.(*conn).writeBuffers+0x55				/usr/local/Cellar/go/1.10/libexec/src/net/writev_unix.go:18
#	0x57ebe6	net.(*Buffers).WriteTo+0x1a6				/usr/local/Cellar/go/1.10/libexec/src/net/net.go:638
#	0x77232e	overlord/lib/net.(*Conn).Writev+0x6e			/work/go/src/overlord/lib/net/conn.go:115
#	0x773aac	overlord/lib/bufio.(*Writer).Flush+0x8c			/work/go/src/overlord/lib/bufio/io.go:174

1: 256 [1: 256] @ 0x492616 0x58df7b 0x58dd76 0x57ebe7 0x77232f 0x773aad 0x784329 0x79d09b 0x7a0958 0x45a851
#	0x492615	internal/poll.(*FD).Writev+0x495			/usr/local/Cellar/go/1.10/libexec/src/internal/poll/writev.go:42
#	0x58df7a	net.(*netFD).writeBuffers+0x3a				/usr/local/Cellar/go/1.10/libexec/src/net/writev_unix.go:26
#	0x58dd75	net.(*conn).writeBuffers+0x55				/usr/local/Cellar/go/1.10/libexec/src/net/writev_unix.go:18
#	0x57ebe6	net.(*Buffers).WriteTo+0x1a6				/usr/local/Cellar/go/1.10/libexec/src/net/net.go:638
#	0x77232e	overlord/lib/net.(*Conn).Writev+0x6e			/work/go/src/overlord/lib/net/conn.go:115
#	0x773aac	overlord/lib/bufio.(*Writer).Flush+0x8c			/work/go/src/overlord/lib/bufio/io.go:174

and go tool pprof shows:

 $go tool pprof -alloc_space  http://127.0.0.1:2110/debug/pprof/heap
(pprof) cum
(pprof) top
Showing nodes accounting for 303.01MB, 91.27% of 331.99MB total
Dropped 51 nodes (cum <= 1.66MB)
Showing top 10 nodes out of 24
      flat  flat%   sum%        cum   cum%
  302.51MB 91.12% 91.12%   302.51MB 91.12%  internal/poll.(*FD).Writev
         0     0% 91.12%   302.51MB 91.12%  net.(*Buffers).WriteTo
         0     0% 91.12%   302.51MB 91.12%  net.(*conn).writeBuffers
         0     0% 91.12%   302.51MB 91.12%  net.(*netFD).writeBuffers
         0     0% 91.12%   302.51MB 91.12%  overlord/lib/bufio.(*Writer).Flush
         0     0% 91.12%   302.51MB 91.12%  overlord/lib/net.(*Conn).Writev
         0     0% 91.12%   170.02MB 51.21%  overlord/proxy.(*Handler).handle

As pprof shows, almost all of the memory was allocated by Writev.

Escape analysis of the objects:

./writev.go:16:43: leaking param: fd
./writev.go:16:43: 	from fd (passed to call[argument escapes]) at ./writev.go:17:24
./writev.go:42:48: &chunk[0] escapes to heap
./writev.go:42:48: 	from syscall.Iovec literal (struct literal element) at ./writev.go:42:41
./writev.go:42:48: 	from append(iovecs, syscall.Iovec literal) (appended to slice) at ./writev.go:42:19
./writev.go:16:43: leaking param content: v
./writev.go:16:43: 	from *v (indirection) at ./writev.go:38:25
./writev.go:16:43: 	from * (*v) (indirection) at ./writev.go:38:25
./writev.go:16:43: 	from chunk (range-deref) at ./writev.go:38:19
./writev.go:16:43: 	from chunk[0] (dot of pointer) at ./writev.go:42:54
./writev.go:16:43: 	from &chunk[0] (address-of) at ./writev.go:42:48
./writev.go:16:43: 	from syscall.Iovec literal (struct literal element) at ./writev.go:42:41
./writev.go:16:43: 	from append(iovecs, syscall.Iovec literal) (appended to slice) at ./writev.go:42:19
./writev.go:55:15: &iovecs escapes to heap
./writev.go:55:15: 	from fd.iovecs (star-dot-equals) at ./writev.go:55:13
./writev.go:25:6: moved to heap: iovecs
./writev.go:72:8: syscall.Errno(e0) escapes to heap
./writev.go:72:8: 	from err (assigned) at ./writev.go:72:8
./writev.go:72:8: 	from ~r2 (return) at ./writev.go:82:2
./writev.go:21:17: (*FD).Writev fd.pd does not escape
./writev.go:44:26: (*FD).Writev iovecs[len(iovecs) - 1] does not escape
./writev.go:47:25: (*FD).Writev iovecs[len(iovecs) - 1] does not escape
./writev.go:59:27: (*FD).Writev &iovecs[0] does not escape
./writev.go:68:18: (*FD).Writev fd.pd does not escape

At writev.go line 42, escape analysis shows that &chunk[0] escapes to the heap, and pprof shows memory being allocated on every call.
Is it possible to make this not escape, so the objects are not allocated on the heap, or to reuse these objects to avoid GC?
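
For illustration, a hypothetical reduction of the pattern (the names fdLike and fill are mine, not the real internal/poll code): because &chunk[0] is appended to a slice whose address is then cached on the long-lived fd, the chunk pointers stay reachable after the call, which is why escape analysis reports the chunks as leaking.

package example

import "syscall"

// fdLike stands in for internal/poll.FD in this reduction.
type fdLike struct {
	iovecs *[]syscall.Iovec // cache that outlives each call
}

// fill mirrors the shape of (*FD).Writev. The local slice is "moved to heap"
// because its address is stored in fd.iovecs, and the chunk pointers appended
// to it stay reachable after the call. (SetLen handling is omitted for brevity.)
func (fd *fdLike) fill(v [][]byte) {
	var iovecs []syscall.Iovec
	for _, chunk := range v {
		if len(chunk) == 0 {
			continue
		}
		iovecs = append(iovecs, syscall.Iovec{Base: &chunk[0]})
	}
	fd.iovecs = &iovecs // caching the slice keeps the chunk pointers live beyond the call
}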

What did you expect to see?

The iovec objects should be allocated on the stack in order to avoid GC, but they escape to the heap.

What did you see instead?

The iovecs escape to the heap, allocating memory on every call and causing frequent GC.

@gopherbot gopherbot added this to the Proposal milestone Jul 28, 2018
@ALTree ALTree changed the title proposal:use noescape or reuse iovec obj in Writev to avoid alloc mem in heap every time internal/poll: use noescape or reuse iovec obj in Writev to avoid alloc mem in heap every time Jul 28, 2018
@ALTree ALTree added Performance NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. and removed Proposal labels Jul 28, 2018
@ALTree ALTree removed this from the Proposal milestone Jul 28, 2018
@randall77
Contributor

Unfortunately the iovecs cache prevents this. The address of the chunk survives in the iovecs cache, even though it is never used again. (A probably separate issue - the chunks are kept live longer than necessary because of this.)
I think if you turned the cache off (remove the line fd.iovecs = &iovecs) then the chunks would not escape. But then you'd be allocating Iovec structures every time.

Maybe we could put a few Iovec structures on the stack, or in the FD? We probably couldn't put as many as 1024, unfortunately.

@ianlancetaylor
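
One possible shape of that suggestion (a rough sketch only; the field name smallIovecs and the size 8 are hypothetical, not a real patch):

package example

import "syscall"

// FD with a small inline Iovec array: writes with few chunks could fill
// smallIovecs without any heap allocation, falling back to the existing
// heap-cached slice only when a write needs more entries.
type FD struct {
	smallIovecs [8]syscall.Iovec // hypothetical inline cache; size is a guess
	iovecs      *[]syscall.Iovec // existing heap cache for larger writes
}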

@lintanghui
Author

@randall77 Since iovecs is allocated on the heap and cached, and the chunks escape to the heap, is it possible to reuse the iovec structs by index, like iovecs[idx]? I tried to reuse them like below:

// Writev wraps the writev system call.
func (fd *FD) Writev(v *[][]byte) (int64, error) {
	if err := fd.writeLock(); err != nil {
		return 0, err
	}
	defer fd.writeUnlock()
	if err := fd.pd.prepareWrite(fd.isFile); err != nil {
		return 0, err
	}

	// TODO: read from sysconf(_SC_IOV_MAX)? The Linux default is
	// 1024 and this seems conservative enough for now. Darwin's
	// UIO_MAXIOV also seems to be 1024.
	maxVec := 1024
	length := len(*v)
	if length > maxVec {
		length = maxVec
	}
	if fd.iovecs == nil {
		var iovecs = make([]syscall.Iovec, len(*v))
		fd.iovecs = &iovecs
	}
	var n int64
	var err error
	if len(*fd.iovecs) < len(*v) {
		*fd.iovecs = make([]syscall.Iovec, length)
	}
	for len(*v) > 0 {
		// iovecs = iovecs[:0]
		i := 0
		for _, chunk := range *v {
			if len(chunk) == 0 {
				continue
			}
			(*fd.iovecs)[i].Base = &chunk[0]
			//iovecs = append(iovecs, syscall.Iovec{Base: &chunk[0]})
			if fd.IsStream && len(chunk) > 1<<30 {
				(*fd.iovecs)[i].SetLen(1 << 30)
				break // continue chunk on next writev
			}
			(*fd.iovecs)[i].SetLen(len(chunk))
			if i == maxVec {
				break
			}
			i++
		}
		if i == 0 {
			break
		}
		//	fd.iovecs = &iovecs // cache

		wrote, _, e0 := syscall.Syscall(syscall.SYS_WRITEV,
			uintptr(fd.Sysfd),
			uintptr(unsafe.Pointer(&(*fd.iovecs)[0])),
			uintptr(i))
		if wrote == ^uintptr(0) {
			wrote = 0
		}
		TestHookDidWritev(int(wrote))
		n += int64(wrote)
		consume(v, int64(wrote))
		if e0 == syscall.EAGAIN {
			if err = fd.pd.waitWrite(fd.isFile); err == nil {
				continue
			}
		} else if e0 != 0 {
			err = syscall.Errno(e0)
		}
		if err != nil {
			break
		}
		if n == 0 {
			err = io.ErrUnexpectedEOF
			break
		}
	}
	return n, err
}

It works well, with no more GC pressure from this path; latency is reduced and QPS is increased in my project.

@randall77
Contributor

I don't see how that change prevents the escape of the chunks.
You're still writing &chunk[0] to a heap location.

@ianlancetaylor ianlancetaylor added this to the Unplanned milestone Aug 1, 2018
@mdempsky
Member

Note that the &chunk[0] escapes to heap message is a little misleading, because it's not actually itself causing anything to be heap allocated. (Consequently that diagnostic message has been removed: golang.org/cl/170319)

The real problem here is that var iovecs []syscall.Iovec is always heap allocated, and we're constantly allocating and churning through slice headers every call to Writev. @lintanghui 's proposed change avoids this by changing it from an unconditional implicit allocation (through an escape var decl) to a conditional explicit allocation (through calling make guarded by an if statement).

(There's also the issue that, while Writev() should only cause the []byte slices to be heap allocated, it actually causes the [][]byte slices to be heap allocated too. This is tracked as #31504.)
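
To make that contrast concrete, a simplified sketch of the conditional explicit allocation (the unconditional implicit shape is the escaping var iovecs declaration discussed above; the names here are illustrative, not the actual internal/poll code or the pending CL):

package example

import "syscall"

// fdCache stands in for the iovecs cache on internal/poll.FD.
type fdCache struct {
	iovecs *[]syscall.Iovec
}

// fillExplicit calls make only when the cached slice is missing or too small,
// then writes the entries in place, so steady-state calls allocate neither a
// slice header nor a backing array.
func (fd *fdCache) fillExplicit(v [][]byte) {
	if fd.iovecs == nil || len(*fd.iovecs) < len(v) {
		s := make([]syscall.Iovec, len(v))
		fd.iovecs = &s
	}
	for i := range v {
		(*fd.iovecs)[i] = syscall.Iovec{Base: &v[i][0]} // assumes non-empty chunks; SetLen omitted
	}
}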

@gopherbot

Change https://golang.org/cl/172419 mentions this issue: internal/poll: avoid unnecessary memory allocation in Writev

@golang golang locked and limited conversation to collaborators Apr 16, 2020