net: different memory consumption on different VMs with net.Listen #32709

Closed
mokitoo opened this issue Jun 20, 2019 · 18 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.

Comments

@mokitoo

mokitoo commented Jun 20, 2019

What version of Go are you using (go version)?

1.11.1

Does this issue reproduce with the latest release?

Yes, it still reproduces in 1.12.6.

What operating system and processor architecture are you using (go env)?

CentOS 7

What did you do?

I use a for loop to listen on about 5000 ports. I have tried this on two different virtual machines, both running CentOS 7:
net.Listen("tcp", ":"+port)

Here is the uname -a output from each.
One is on DigitalOcean (https://cloud.digitalocean.com):
Linux 3.10.0-957.12.2.el7.x86_64 #1 SMP x86_64 GNU/Linux

The other was created with VMware on my own computer:
Linux . 3.10.0-229.1.2.el7.x86_64 #1 SMP x86_64 GNU/Linux

The VM on DigitalOcean consumes about 30 MB of memory in total,
while the VM on VMware consumes more than 2 GB of memory in total.
Why does this happen?

@mokitoo mokitoo changed the title memory consume differtent when use net.Listen() memory consumes differtent when use net.Listen() Jun 20, 2019
@mokitoo mokitoo changed the title memory consumes differtent when use net.Listen() Memory consumes differtent when use net.Listen() Jun 20, 2019
@tv42

tv42 commented Jun 21, 2019

Does the memory consumption of the Go process differ, or are your two different kernels or virtualization mechanisms just behaving differently?

@mokitoo
Author

mokitoo commented Jun 21, 2019

Does the memory consumption of the Go process differ, or are your two different kernels or virtualization mechanisms just behaving differently?

Hmm, maybe. I also tried this with Java and got the same kind of result:
memory consumption on VMware is much higher than on DigitalOcean (DigitalOcean uses KVM virtualization).
About 1 GB of memory is consumed on DigitalOcean.
More than 2 GB is consumed on VMware (the system killed the process because it ran out of memory).

But when I do this with Python, the result is different:
there is little difference in memory usage between the two virtual machines.
It's very strange.

One thing I should point out:
when I test on VMware, the memory consumed by the Go process itself is not that large.
I watched it with the top command, and the system killed the Go process because the system had run out of memory.
That much memory is consumed because so many ports are bound (about 5000 TCP ports).
@katiehockman katiehockman changed the title Memory consumes differtent when use net.Listen() src/net: different memory consumption on different VMs with net.Listen Jun 21, 2019
@katiehockman
Contributor

This might just be differences in the kernel as @tv42 said, but looping in others to see what they think.
/cc @mikioh

@katiehockman katiehockman added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jun 21, 2019
@agnivade
Contributor

emmmm , maybe ,

Please investigate further and tell us exactly what is happening, along with the code that you are using to test.

Please report the exact RSS and VIRT numbers for the Go process in both systems. Also check your syslogs and any other places for logs.

In short, we want to know whether this is a Go issue or something to do with the underlying OS/virtualization layer.
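
For anyone gathering the RSS and VIRT numbers requested above, the following is a minimal Go sketch, assuming a Linux system where `/proc/<pid>/status` exposes `VmRSS` and `VmSize` (the values `top` reports as RES and VIRT). The helper name `parseStatus` is hypothetical and exists only for this illustration.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseStatus extracts VmRSS and VmSize (both in kB) from the text of a
// Linux /proc/<pid>/status file. It is a hypothetical helper written for
// this illustration, not part of any library.
func parseStatus(status string) (rssKB, virtKB int64, err error) {
	rssKB, virtKB = -1, -1
	for _, line := range strings.Split(status, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue
		}
		switch fields[0] {
		case "VmRSS:":
			rssKB, err = strconv.ParseInt(fields[1], 10, 64)
		case "VmSize:":
			virtKB, err = strconv.ParseInt(fields[1], 10, 64)
		}
		if err != nil {
			return 0, 0, err
		}
	}
	if rssKB < 0 || virtKB < 0 {
		return 0, 0, fmt.Errorf("VmRSS or VmSize not found")
	}
	return rssKB, virtKB, nil
}

func main() {
	pid := "self" // pass the PID of the process under test as the first argument
	if len(os.Args) > 1 {
		pid = os.Args[1]
	}
	data, err := os.ReadFile("/proc/" + pid + "/status")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	rss, virt, err := parseStatus(string(data))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("RSS=%d kB VIRT=%d kB\n", rss, virt)
}
```

These figures cover only the process itself. Memory the kernel allocates on behalf of its sockets does not show up here, so a low RSS next to a full machine points at the kernel rather than the Go heap.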

@agnivade agnivade added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Jun 22, 2019
@mokitoo
Author

mokitoo commented Jun 23, 2019

emmmm , maybe ,

Please investigate further and tell us exactly what is happening, along with the code that you are using to test.

Please report the exact RSS and VIRT numbers for the Go process in both systems. Also check your syslogs and any other places for logs.

In short, we want to know whether this is a Go issue or something to do with the underlying OS/virtualization layer.

@agnivade Thanks for your reply. Here are some tests of my own.
First of all, I made a mistake about the virtualization layer: the VM on my computer runs on Hyper-V, not VMware.
Running the virt-what command on my CentOS 7 VM prints hyperv.

The code that listens on the TCP ports is just this:
ln, err := net.Listen("tcp", ":"+port)

The first time, I used Go to listen on 5000 TCP ports on Hyper-V. CPU and memory usage grew very quickly:
after 2 seconds my CentOS 7 VM had run out of its 2 GB of memory and the CPU had reached 100%.
But as you can see below, the RSS of my Go process is not that high.

[screenshot not preserved]

The second time, I used Go to listen on 3000 TCP ports on Hyper-V. Here is the result from the htop command
(htop shows actual memory use):
[screenshot not preserved]

Here is the output of /proc/meminfo:
cat /proc/meminfo
MemTotal: 1877752 kB
MemFree: 86148 kB
MemAvailable: 203124 kB
Buffers: 0 kB
Cached: 139260 kB
SwapCached: 15400 kB
Active: 116660 kB
Inactive: 84496 kB
Active(anon): 42444 kB
Inactive(anon): 21676 kB
Active(file): 74216 kB
Inactive(file): 62820 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 2097148 kB
SwapFree: 1966372 kB
Dirty: 12 kB
Writeback: 0 kB
AnonPages: 52244 kB
Mapped: 18208 kB
Shmem: 2224 kB
Slab: 191024 kB
SReclaimable: 148876 kB
SUnreclaim: 42148 kB
KernelStack: 5680 kB
PageTables: 6200 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 3036024 kB
Committed_AS: 554720 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 1341616 kB
VmallocChunk: 34358388164 kB
HardwareCorrupted: 0 kB
AnonHugePages: 6144 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 77760 kB
DirectMap2M: 2019328 kB
DirectMap1G: 0 kB

The third time, I used Python to listen on 5000 TCP ports on Hyper-V.
Here is the result:
[screenshot not preserved]

Python opens the TCP ports very slowly; it took about one minute to finish.
But memory usage is normal, just like when I use Go to open 5000 TCP ports on KVM.

I should say that Go is much quicker than Python at listening on TCP ports.

@agnivade
Contributor

Couple of follow-up questions-

  1. Please give us the exact code that you are using to run the tests. A one line snippet may hide other sources of discrepancies.
  2. Just to clarify, when you say golang is listening on 5000 ports, you are actually spawning 5000 processes and each of them listen to a separate port right ?
  3. When you listened on 5000 ports, the CPU spiked to 100% and the RSS was low. But when you listened on 3000 ports, CPU was normal, but RSS was higher than before ? It is very strange for the same process listening on a port to have different RSS and CPU characteristics, on different runs.
  4. From what I am seeing, your Python code is taking more RSS than the Go processes right ? I wonder if there is something else that is contributing to the overall memory.

@mokitoo
Author

mokitoo commented Jun 23, 2019

Because the system runs out of memory and then kills the Go process,
I had to take the screenshot within 2 seconds, so the final RSS in the screenshot may not be accurate for the Go test.
But I think the process RSS is not the main contributor to the overall memory use.

OK, here is a simple program to test this on Hyper-V:

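```go
package main

import (
	"log"
	"net"
	"os"
	"os/signal"
	"strconv"
	"syscall"
)

// listeners keeps every listener reachable so the garbage collector
// cannot finalize (and close) the sockets while the program waits.
var listeners []net.Listener

func main() {
	// Listen on 5000 consecutive TCP ports, 50000 through 54999.
	for i := 50000; i < 55000; i++ {
		port := strconv.Itoa(i)
		ln, err := net.Listen("tcp", ":"+port)
		if err != nil {
			log.Printf("error listening port %v: %v\n", port, err)
			os.Exit(1)
		}
		listeners = append(listeners, ln)
	}
	waitSignal()
}

// waitSignal blocks forever; SIGHUP is received and ignored.
func waitSignal() {
	sigChan := make(chan os.Signal, 1)
	signal.Notify(sigChan, syscall.SIGHUP)
	for sig := range sigChan {
		if sig == syscall.SIGHUP {
			// Ignore SIGHUP and keep waiting.
		} else {
			// is this going to happen?
			log.Printf("caught signal %v, exit", sig)
			os.Exit(0)
		}
	}
}
```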

Here is the result from the code above:
[screenshot not preserved]

As you mentioned before:

Python code is taking more RSS than the Go processes right ? I wonder if there is something else that is contributing to the overall memory

Yes. So maybe /proc/meminfo will help us find something, but this time I don't see anything strange in /proc/meminfo:

➜ ~ cat /proc/meminfo
MemTotal: 1877752 kB
MemFree: 68260 kB
MemAvailable: 53168 kB
Buffers: 0 kB
Cached: 47176 kB
SwapCached: 7544 kB
Active: 39212 kB
Inactive: 31648 kB
Active(anon): 7540 kB
Inactive(anon): 18724 kB
Active(file): 31672 kB
Inactive(file): 12924 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 2097148 kB
SwapFree: 1998588 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 18824 kB
Mapped: 11388 kB
Shmem: 2552 kB
Slab: 83424 kB
SReclaimable: 37840 kB
SUnreclaim: 45584 kB
KernelStack: 5392 kB
PageTables: 5204 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 3036024 kB
Committed_AS: 391184 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 1605676 kB
VmallocChunk: 34358129392 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 77760 kB
DirectMap2M: 2019328 kB
DirectMap1G: 0 kB

@mokitoo
Author

mokitoo commented Jun 23, 2019

As I mentioned before:

But i think RSS is not the main cause of the overall memory

I think the memory is being consumed by the kernel, not by the Go process.
Here is a command to total the vmalloc allocations:
grep vmalloc /proc/vmallocinfo | awk '{total+=$2}; END {print total}'

With Go, the total is about 1.6 GB:
➜ ~ grep vmalloc /proc/vmallocinfo | awk '{total+=$2}; END {print total}'
1639149568

With Python, the total is about 430 MB:
grep vmalloc /proc/vmallocinfo | awk '{total+=$2}; END {print total}'
430026752

Both were tested on Hyper-V.

On KVM, however, there is not much difference in memory use between Python and Go.

@mokitoo
Author

mokitoo commented Jun 24, 2019

@agnivade @rsc @bradfitz @katiehockman @mikioh
I have found something new to add.
First of all, /proc/vmallocinfo shows the caller of each vmalloc allocation.
As I mentioned above, the memory is consumed by the kernel.
After some tests, I finally found the cause: reqsk_queue_alloc.

Here are some results from cat /proc/vmallocinfo | grep reqsk_queue_alloc:

0xffffc9000dda1000-0xffffc9000dda7000   24576 reqsk_queue_alloc+0x65/0x110 pages=5 vmalloc N0=5
0xffffc9000dda7000-0xffffc9000ddad000   24576 reqsk_queue_alloc+0x65/0x110 pages=5 vmalloc N0=5

Each line corresponds to one port the system has bound.
The second column is the size in bytes of the kernel allocation.

Here are some tests, both run on Hyper-V.
With Python, it shows 5000 lines, because I listen on 5000 ports:

0xffffc90003bcb000-0xffffc90003bd1000   24576 reqsk_queue_alloc+0x65/0x110 pages=5 vmalloc N0=5
0xffffc90003bd1000-0xffffc90003bd7000   24576 reqsk_queue_alloc+0x65/0x110 pages=5 vmalloc N0=5
0xffffc90003bd7000-0xffffc90003bdd000   24576 reqsk_queue_alloc+0x65/0x110 pages=5 vmalloc N0=5

With Go, it also shows 5000 lines:

0xffffc9006fcc9000-0xffffc9006fd0b000  270336 reqsk_queue_alloc+0x65/0x110 pages=65 vmalloc N0=65
0xffffc9006fd0b000-0xffffc9006fd4d000  270336 reqsk_queue_alloc+0x65/0x110 pages=65 vmalloc N0=65
0xffffc9006fd4d000-0xffffc9006fd8f000  270336 reqsk_queue_alloc+0x65/0x110 pages=65 vmalloc N0=65

But as you can see above, Go uses 270336 bytes for each port while Python uses only 24576!

I googled reqsk_queue_alloc a little.
It is used (in C) when a socket starts listening on a port.
But I don't know why it makes such a big difference between Go and Python.
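
The per-caller breakdown described above can also be computed programmatically. The following is a rough Go sketch, assuming the `/proc/vmallocinfo` line layout shown in the samples (address range, size in bytes, then `caller+offset/length`). The names `usage` and `sumByCaller` are hypothetical, and lines that do not fit the layout are simply skipped.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// usage accumulates the number of regions and total bytes for one caller.
type usage struct {
	Regions int
	Bytes   uint64
}

// sumByCaller parses /proc/vmallocinfo text and groups allocation sizes
// by the calling kernel function (the "+0x65/0x110" suffix is dropped).
func sumByCaller(info string) map[string]usage {
	out := make(map[string]usage)
	for _, line := range strings.Split(info, "\n") {
		f := strings.Fields(line)
		if len(f) < 3 {
			continue
		}
		size, err := strconv.ParseUint(f[1], 10, 64)
		if err != nil {
			continue // not in the expected "range size caller" layout
		}
		caller := f[2]
		if i := strings.IndexByte(caller, '+'); i >= 0 {
			caller = caller[:i]
		}
		u := out[caller]
		u.Regions++
		u.Bytes += size
		out[caller] = u
	}
	return out
}

func main() {
	data, err := os.ReadFile("/proc/vmallocinfo") // typically requires root
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for caller, u := range sumByCaller(string(data)) {
		fmt.Printf("%-40s regions=%d bytes=%d\n", caller, u.Regions, u.Bytes)
	}
}
```

As a back-of-the-envelope check on the figures in this comment, 5000 regions of 270336 bytes come to 1,351,680,000 bytes, while 5000 regions of 24576 bytes come to 122,880,000 bytes.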

[screenshot not preserved]

@bradfitz
Contributor

In the kernel, it allocates proportional to nr_table_entries:

int reqsk_queue_alloc(struct request_sock_queue *queue,
		      unsigned int nr_table_entries)
{
	size_t lopt_size = sizeof(struct listen_sock);
	struct listen_sock *lopt = NULL;

	nr_table_entries = min_t(u32, nr_table_entries, sysctl_max_syn_backlog);
	nr_table_entries = max_t(u32, nr_table_entries, 8);
	nr_table_entries = roundup_pow_of_two(nr_table_entries + 1);
	lopt_size += nr_table_entries * sizeof(struct request_sock *);

	if (lopt_size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
		lopt = kzalloc(lopt_size, GFP_KERNEL |
					  __GFP_NOWARN |
					  __GFP_NORETRY);
...

So I suspect Python and Go are using different backlog values (see http://man7.org/linux/man-pages/man2/listen.2.html)

Go uses /proc/sys/net/core/somaxconn ... cranking it all the way up (func maxListenerBacklog)...

                case syscall.SOCK_STREAM, syscall.SOCK_SEQPACKET:
                        if err := fd.listenStream(laddr, listenerBacklog(), ctrlFn); err != nil {
                                fd.Close()
                                return nil, err
                        }
                        return fd, nil

As for why the allocation is different under VMware vs DigitalOcean... perhaps it's using a different memory allocator?

Or perhaps /proc/sys/net/core/somaxconn is just different on those two VMs.
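
To compare the setting across machines from Go, a small sketch along the following lines may help. It assumes Linux and only approximates what the comment above describes: read `/proc/sys/net/core/somaxconn` and fall back to `syscall.SOMAXCONN` when the value is unavailable. `parseBacklog` is a hypothetical helper, not the function in the net package, and the standard library's own version also limits very large values, which this sketch leaves out.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
	"syscall"
)

// parseBacklog parses the contents of /proc/sys/net/core/somaxconn.
// It returns fallback when the content is empty or not a positive integer.
// Hypothetical helper for illustration only.
func parseBacklog(content string, fallback int) int {
	f := strings.Fields(content)
	if len(f) == 0 {
		return fallback
	}
	n, err := strconv.Atoi(f[0])
	if err != nil || n <= 0 {
		return fallback
	}
	return n
}

func main() {
	data, err := os.ReadFile("/proc/sys/net/core/somaxconn")
	if err != nil {
		fmt.Println("cannot read somaxconn, falling back to", syscall.SOMAXCONN)
		return
	}
	fmt.Println("system-wide listen backlog limit:", parseBacklog(string(data), syscall.SOMAXCONN))
}
```

Running this on both VMs would show directly whether the two systems hand `net.Listen` different backlog values.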

@bradfitz bradfitz changed the title src/net: different memory consumption on different VMs with net.Listen net: different memory consumption on different VMs with net.Listen Jun 24, 2019
@mokitoo
Author

mokitoo commented Jun 24, 2019

This time I have two different VMs:
one from DigitalOcean, kernel version 3.10.0-957.12.2.el7.x86_64,
and another from a Japan IDC, kernel version 3.10.0-327.el7.x86_64.
Both run on KVM, and I tested with Go.

The VM in Japan behaves the same as my Hyper-V test even though it is KVM: memory grows very fast, and finally the system kills the Go process.

Here is the result from cat /proc/vmallocinfo | grep reqsk_queue_alloc:
0xffffc90010c04000-0xffffc90010c46000 270336 reqsk_queue_alloc+0x65/0x110 pages=65 vmalloc N0=65
0xffffc90010c46000-0xffffc90010c88000 270336 reqsk_queue_alloc+0x65/0x110 pages=65 vmalloc N0=65
0xffffc90010c88000-0xffffc90010cca000 270336 reqsk_queue_alloc+0x65/0x110 pages=65 vmalloc N0=65

As you can see, this is the same result as on Hyper-V.

On the DigitalOcean VM, memory usage is normal,
and I have not found any reqsk_queue_alloc entries in /proc/vmallocinfo.

Based on this, I downgraded the DigitalOcean VM's kernel to 3.10.0-327.el7.x86_64, the same
version as the Japan IDC VM.
The DigitalOcean VM's memory usage is still normal and there is still no reqsk_queue_alloc in /proc/vmallocinfo.

So it has nothing to do with the virtualization layer (KVM or Hyper-V) or the kernel version.
Maybe the main cause lies in reqsk_queue_alloc.

@mokitoo
Author

mokitoo commented Jun 24, 2019

In the kernel, it allocates proportional to nr_table_entries:

int reqsk_queue_alloc(struct request_sock_queue *queue,
		      unsigned int nr_table_entries)
{
	size_t lopt_size = sizeof(struct listen_sock);
	struct listen_sock *lopt = NULL;

	nr_table_entries = min_t(u32, nr_table_entries, sysctl_max_syn_backlog);
	nr_table_entries = max_t(u32, nr_table_entries, 8);
	nr_table_entries = roundup_pow_of_two(nr_table_entries + 1);
	lopt_size += nr_table_entries * sizeof(struct request_sock *);

	if (lopt_size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
		lopt = kzalloc(lopt_size, GFP_KERNEL |
					  __GFP_NOWARN |
					  __GFP_NORETRY);
...

So I suspect Python and Go are using different backlog values (see http://man7.org/linux/man-pages/man2/listen.2.html)

Go uses /proc/sys/net/core/somaxconn ... cranking it all the way up (func maxListenerBacklog)...

                case syscall.SOCK_STREAM, syscall.SOCK_SEQPACKET:
                        if err := fd.listenStream(laddr, listenerBacklog(), ctrlFn); err != nil {
                                fd.Close()
                                return nil, err
                        }
                        return fd, nil

As for why the allocation is different under VMware vs DigitalOcean... perhaps it's using a different memory allocator?

Or perhaps /proc/sys/net/core/somaxconn is just different on those two VMs.

@bradfitz
Yes, you are right!
Thanks for your reply.
I tried jemalloc (a memory allocator) in the VM, and it made no difference.
So /proc/sys/net/core/somaxconn is the key setting!

/proc/sys/net/core/somaxconn is 128 on DigitalOcean but 32768 on the Japan IDC VM.
After I changed it to 128 on the Japan IDC VM, memory usage became normal:

vim /etc/sysctl.conf
net.core.somaxconn=128

But I think a larger value may give better performance, since it sizes the queue that holds incoming TCP connections.

PS: the Python project sets the backlog in its code, to 1024, rather than taking the system default.
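
The sizing arithmetic in the quoted kernel snippet can be worked through in Go to see how these backlog values map to the page counts reported earlier. This is only a sketch under stated assumptions: 4096-byte pages and 8-byte pointers (x86_64), a placeholder of 64 bytes for `sizeof(struct listen_sock)` (any value from 1 to 4096 yields the same page counts here), and a `tcp_max_syn_backlog` of 16384, which the thread never reports and is assumed purely for illustration.

```go
package main

import "fmt"

const (
	pageSize         = 4096 // assumed x86_64 page size
	ptrSize          = 8    // sizeof(struct request_sock *) on x86_64
	listenSockHeader = 64   // ASSUMPTION: placeholder for sizeof(struct listen_sock)
)

// roundupPow2 returns the smallest power of two that is >= n.
func roundupPow2(n uint32) uint32 {
	p := uint32(1)
	for p < n {
		p <<= 1
	}
	return p
}

// tableEntries mirrors the three nr_table_entries lines of the quoted
// snippet: clamp to maxSynBacklog, enforce a minimum of 8, then round
// n+1 up to a power of two.
func tableEntries(backlog, maxSynBacklog uint32) uint32 {
	n := backlog
	if maxSynBacklog < n {
		n = maxSynBacklog
	}
	if n < 8 {
		n = 8
	}
	return roundupPow2(n + 1)
}

// pagesNeeded converts the resulting lopt_size into whole pages.
func pagesNeeded(backlog, maxSynBacklog uint32) uint32 {
	bytes := listenSockHeader + tableEntries(backlog, maxSynBacklog)*ptrSize
	return (bytes + pageSize - 1) / pageSize
}

func main() {
	const maxSyn = 16384 // ASSUMPTION: not stated anywhere in the thread
	for _, backlog := range []uint32{128, 1024, 32768} {
		fmt.Printf("backlog=%d entries=%d pages=%d\n",
			backlog, tableEntries(backlog, maxSyn), pagesNeeded(backlog, maxSyn))
	}
}
```

Under these assumptions a backlog of 1024 needs 5 pages and a backlog of 32768 needs 65 pages, which lines up with the `pages=5` and `pages=65` fields in the `/proc/vmallocinfo` output above. A backlog of 128 needs roughly 2 KB, which falls under the `kzalloc` branch of the quoted snippet; that is plausibly why no `reqsk_queue_alloc` entries showed up in `/proc/vmallocinfo` on the DigitalOcean VM.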

@mokitoo
Author

mokitoo commented Jun 24, 2019

@bradfitz
By the way, is there any way to set the backlog value in Go when listening on a port, instead of using the system default?

I have not found one yet.

@agnivade
Contributor

@mokitoo
Author

mokitoo commented Jun 25, 2019

https://golang.org/pkg/syscall/#Listen

Thanks for your reply. I have checked the source code:

[screenshot not preserved]

On Windows:
[screenshot not preserved]

On Linux:
[screenshot not preserved]

According to the source code, Go on Linux uses /proc/sys/net/core/somaxconn as the default backlog unless an error occurs, in which case it uses syscall.SOMAXCONN.
So maybe there is no direct way to set the backlog value in Go, am I right?

Or do you mean calling syscall.Listen() directly rather than net.Listen("tcp", ":"+port)?

@agnivade
Contributor

agnivade commented Jun 25, 2019

Or maybe you mean use syscall.listen() directly rather than use net.Listen("tcp", ":"+port) ?

Correct.

Please let us know if there is anything else to be done from our side on this.
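
For reference, the `syscall.Listen` route confirmed above might look roughly like the following on Linux. This is a sketch, not an officially recommended pattern: `listenWithBacklog` is a hypothetical helper, it handles IPv4 only, and it relies on `net.FileListener` to turn the raw descriptor back into a `net.Listener`.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"syscall"
)

// listenWithBacklog creates an IPv4 TCP listener on all interfaces with an
// explicit backlog. Hypothetical helper for illustration; Linux/Unix only.
func listenWithBacklog(port, backlog int) (net.Listener, error) {
	fd, err := syscall.Socket(syscall.AF_INET, syscall.SOCK_STREAM, 0)
	if err != nil {
		return nil, err
	}
	// Wrap the descriptor so it is closed on every return path.
	// net.FileListener works on a duplicate, so closing f is safe.
	f := os.NewFile(uintptr(fd), "listener")
	defer f.Close()

	if err := syscall.SetsockoptInt(fd, syscall.SOL_SOCKET, syscall.SO_REUSEADDR, 1); err != nil {
		return nil, err
	}
	if err := syscall.Bind(fd, &syscall.SockaddrInet4{Port: port}); err != nil {
		return nil, err
	}
	// The backlog is chosen here instead of being read from somaxconn.
	if err := syscall.Listen(fd, backlog); err != nil {
		return nil, err
	}
	return net.FileListener(f)
}

func main() {
	ln, err := listenWithBacklog(0, 128) // port 0 lets the kernel pick a free port
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer ln.Close()
	fmt.Println("listening on", ln.Addr())
}
```

The kernel still caps the requested backlog at `net.core.somaxconn`, so this approach can lower the per-socket value below the system setting but cannot raise it above it.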

@agnivade agnivade added WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. and removed WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. labels Jun 25, 2019
@mokitoo
Author

mokitoo commented Jun 25, 2019

Or maybe you mean use syscall.listen() directly rather than use net.Listen("tcp", ":"+port) ?

Correct.

Please let us know if there is anything else to be done from our side on this.

Thanks for your reply. I think this has already been discussed in issue 9661
[net: make some way to set socket options other than using File{Listener,Conn,PacketConn} #9661]

ListenConfig was added in Go 1.11 to customize socket options, but when the socket is initialized it still calls maxListenerBacklog to get the system default backlog value.
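
To illustrate the point about `ListenConfig`, here is a minimal sketch of its `Control` hook, assuming Linux. The hook receives the raw socket before it is bound, so options such as `SO_REUSEADDR` can be set there, but as described above it offers no parameter for the backlog. The helper name `listenReuseAddr` is hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"os"
	"syscall"
)

// listenReuseAddr listens on addr and uses ListenConfig.Control to set a
// socket option before bind. Hypothetical helper for illustration.
func listenReuseAddr(addr string) (net.Listener, error) {
	lc := net.ListenConfig{
		Control: func(network, address string, c syscall.RawConn) error {
			var sockErr error
			err := c.Control(func(fd uintptr) {
				sockErr = syscall.SetsockoptInt(int(fd), syscall.SOL_SOCKET, syscall.SO_REUSEADDR, 1)
			})
			if err != nil {
				return err
			}
			return sockErr
		},
	}
	// The backlog passed to listen(2) is still chosen inside the net package.
	return lc.Listen(context.Background(), "tcp", addr)
}

func main() {
	ln, err := listenReuseAddr("127.0.0.1:0")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer ln.Close()
	fmt.Println("listening on", ln.Addr())
}
```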

@bradfitz I would appreciate it if Go could provide an easier way to do this in the future, because lowering the kernel backlog value is not a good default fix for a Go project and it may affect every process on the system.

@mvdan
Member

mvdan commented Jun 15, 2021

Closing old issues that still have the WaitingForInfo label where enough details to investigate weren't provided. Feel free to leave a comment with more details and we can reopen.

@mvdan mvdan closed this as completed Jun 15, 2021
@golang golang locked and limited conversation to collaborators Jun 15, 2022
7 participants