Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: ReadFromUDP hangs (randomly) even when ReadDeadline is crossed #35876

Closed
CPerezz opened this issue Nov 27, 2019 · 23 comments
Closed

net: ReadFromUDP hangs (randomly) even when ReadDeadline is crossed #35876

CPerezz opened this issue Nov 27, 2019 · 23 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@CPerezz
Copy link

CPerezz commented Nov 27, 2019

What version of Go are you using (go version)?

$ go version
> go1.13.4 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

Ubuntu 18.04.2 LTS | Kernel version: 4.15.0-66-generic

What did you do?

This function is called as a gorutine:

// Listens infinitely for UDP packet arrivals and
// executes it's processing inside a gorutine.
func startUDPListener(netw string, router *Router) {
	lAddr := getLocalIPAddress()
	// Set listening port.
	lAddr.Port = int(router.myPeerInfo.port)
	PacketConnCreation:
	// listen to incoming udp packets
	pc, err := net.ListenUDP(netw, &lAddr)
	if err != nil {
		log.Panic(err)
	}
	// Set initial deadline.
	pc.SetReadDeadline(time.Now().Add(time.Minute))

	for {
		//simple read
		buffer := make([]byte, 1024)

		byteNum, uAddr, err := pc.ReadFromUDP(buffer)

		if err != nil {
                        // If we get an error, we just close the pc and create a new one.
			log.Printf("%v", err)
			pc.Close()
			goto PacketConnCreation
		} else {
			// Set a new deadline for the connection.
			pc.SetReadDeadline(time.Now().Add(5 * time.Minute))
			go processPacket(*uAddr, byteNum, buffer, router)
			
		}
	}
}

What did you expect to see?

I was expecting ReadFromUDP to ALWAYS return an error if the DeadLine is crossed and I don't recieve any UDP packet on my IP:Port interface before this happens.

What did you see instead?

Instead of that, it sometimes hangs completely without giving any kind of errors.
It's completely random, it sometimes hangs and sometimes doesn't (providing the corresponding i/o timeout)

It just stays running forever without accepting new packets (or even reading them). The gorutine gets completely freezed (or at least it seems so).

@ianlancetaylor
Copy link
Contributor

Can you show us a complete program that we can use to recreate the problem? Thanks.

@ianlancetaylor ianlancetaylor changed the title ReadFromUDP hangs (randomly) even when ReadDeadline is crossed net: ReadFromUDP hangs (randomly) even when ReadDeadline is crossed Nov 28, 2019
@ianlancetaylor ianlancetaylor added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Nov 28, 2019
@ianlancetaylor ianlancetaylor added this to the Go1.15 milestone Nov 28, 2019
@CPerezz
Copy link
Author

CPerezz commented Nov 28, 2019

Sure.

It is an implementation of https://eprint.iacr.org/2019/876.pdf

The whole code is here: https://github.com/dusk-network/dusk-blockchain/tree/kadcast/pkg/p2p/kadcast

I'll also attach here the functions that take part on this:
network.go

package kadcast

import (
	"log"
	"net"
	"time"
)

// Listens infinitely for UDP packet arrivals and
// executes it's processing inside a gorutine.
func startUDPListener(netw string, router *Router) {
	lAddr := getLocalIPAddress()
	// Set listening port.
	lAddr.Port = int(router.myPeerInfo.port)
	PacketConnCreation:
	// listen to incoming udp packets
	pc, err := net.ListenUDP(netw, &lAddr)
	if err != nil {
		log.Panic(err)
	}
	// Set initial deadline.
	pc.SetReadDeadline(time.Now().Add(time.Minute))

	for {
		//simple read
		buffer := make([]byte, 1024)

		byteNum, uAddr, err := pc.ReadFromUDP(buffer)

		if err != nil {
			log.Printf("%v", err)
			pc.Close()
			goto PacketConnCreation
		} else {
			// Set a new deadline for the connection.
			pc.SetReadDeadline(time.Now().Add(5 * time.Minute))
			go processPacket(*uAddr, byteNum, buffer, router)
			
		}
	}
}

// Gets the local address of the sender `Peer` and the UDPAddress of the
// reciever `Peer` and sends to it a UDP Packet with the payload inside.
func sendUDPPacket(netw string, addr net.UDPAddr, payload []byte) {
	localAddr := getLocalIPAddress()
	conn, err := net.DialUDP(netw, &localAddr, &addr)
	if err != nil {
		log.Println(err)
		return
	}
	
	// Simple write
	written, err := conn.Write(payload)
	if err != nil {
		log.Println(err)
	} else if written == len(payload) {
		log.Printf("Sent %v bytes to %v", written, addr.IP)
	}
	conn.Close()
}

The router struct is just used to retrieve the node info, you should be able to send an IP and a port instead of it. Also, the network parameter on the receiver is always "udp".

The gorutine used on the receiver can be avoided, the error is not comming from here.
Also, you'll need to configigure your firewall to enable the fordward and input chains to allow UDP and also your router NAT.

Apart from that, this code can reproduce it. Wait the ammount of time that the deadline needs to reach it's end and you'll see (at least this is what happens to me) that 1/3 or 1/4 times it completely hangs without giving any error.

// Listens infinitely for UDP packet arrivals and
// executes it's processing inside a gorutine.
func startUDPListener() {
	lAddr := // Get your local IP address of your interface as `UDPAddr`.
	// Set listening port.
	lAddr.Port = // Set the port where other nodes will try to write to.
	PacketConnCreation:
	// listen to incoming udp packets
	pc, err := net.ListenUDP("udp", &lAddr)
	if err != nil {
		log.Panic(err)
	}
	// Set initial deadline.
	pc.SetReadDeadline(time.Now().Add(time.Minute))

	for {
		//simple read
		buffer := make([]byte, 1024)

		byteNum, uAddr, err := pc.ReadFromUDP(buffer)

		if err != nil {
			log.Printf("%v", err)
			pc.Close()
			goto PacketConnCreation
		} else {
			// Set a new deadline for the connection.
			pc.SetReadDeadline(time.Now().Add(5 * time.Minute))
			// Execute a `gorutine` here that prints the bytes received or whatever,
			// it's not what is causing the hang (or it seems so at least..)
		}
	}
}

Once you listener is up, try making a writer to send some bytes to it and stop the writing. Then, wait for more than 5 min (so the deadline is crossed). And trigger the writer again. You'll see that the listener hangs and no error unblocks it from reading. And since the deadline is crossed, it can't read anything from the file descriptor, so it gets hanged there forever.

Code for the writter:

// Gets the local address of the sender `Peer` and the UDPAddress of the
// reciever `Peer` and sends to it a UDP Packet with the payload inside.
func sendUDPPacket(addr net.UDPAddr, payload []byte) {
	localAddr := //Get your local IP address of your interface as `UDPAddr`.
	conn, err := net.DialUDP("udp", &localAddr, //&ListenerUDPAddr)
	if err != nil {
		log.Println(err)
		return
	}
	
	// Simple write
	written, err := conn.Write(payload)
	if err != nil {
		log.Println(err)
	} else if written == len(payload) {
		log.Printf("Sent %v bytes to %v", written, addr.IP)
	}
	conn.Close()
}

I think this should be enough, if anything else is needed just ask for it and I'll try to provide it.

@networkimprov
Copy link

A "complete program" usually means a single file with package main, and if a second process is required, then a second file also with package main.

@CPerezz
Copy link
Author

CPerezz commented Nov 29, 2019

A "complete program" usually means a single file with package main, and if a second process is required, then a second file also with package main.

You just need to put package main instead of kadcast and change the listener function name to main that's all. I cannot put the IP's for you nor the ports.

Writter can be basically replaced for a Python script or whatever.

@networkimprov
Copy link

It would be a great help if you write that Python script, make the changes you described, and test it all to make sure it fails as described. You may discover that it doesn't fail.

@CPerezz
Copy link
Author

CPerezz commented Dec 3, 2019

I did this in order to kinda reproduce the behavior of the app:

package main

import (
	"log"
	"net"
	"time"
)

func main() {
        // Start several cliends that send UDPPackets and the server listener for ours.
	go startUDPListener()
	go sendUDPPacket()
	go sendUDPPacket()
	go sendUDPPacket()
	go sendUDPPacket()
	for {
		
	}
}

// Listens infinitely for UDP packet arrivals and
// executes it's processing inside a gorutine by sending
// the packets to the circularQueue.
func startUDPListener() {
	lAddr := net.UDPAddr {
		IP: []byte{127,0,0,1},
		Port: 25519,
		Zone: "N/A",
	}
	// Set listening port.
	lAddr.Port = 25519
PacketConnCreation:
	// listen to incoming udp packets
	pc, err := net.ListenUDP("udp", &lAddr)
	if err != nil {
		log.Panic(err)
	}
	// Set initial deadline.
	pc.SetReadDeadline(time.Now().Add(time.Minute))

	// Instanciate the buffer
	buffer := make([]byte, 1024)
	for {

		// Read UDP packet.
		_, _, err := pc.ReadFromUDP(buffer)

		if err != nil {
			log.Printf("%v", err)
			pc.Close()
			goto PacketConnCreation
		} else {
			// Set a new deadline for the connection.
			pc.SetReadDeadline(time.Now().Add(5 * time.Minute))
			// Do stuff with the packet
			log.Printf("%v", buffer)
		}
	}
}

func sendUDPPacket() {
	time.Sleep(15*time.Second)
	for {
		localAddr := getLocalUDPAddress()

		addr := net.UDPAddr {
			IP: []byte{127,0,0,1},
			Port: 25519,
			Zone: "N/A",
		}

		payload := []byte{56,56,56,56,56,56,56,56,5,65,6,56,56,56,56,56,56}

		conn, err := net.DialUDP("udp", &localAddr, &addr)
		if err != nil {
			log.Println(err)
			return
		}

		// Simple write
		written, err := conn.Write(payload)
		if err != nil {
			log.Println(err)
		} else if written == len(payload) {
			log.Printf("Sent %v bytes to %v", written, addr.IP)
			time.Sleep(7*time.Minute)
		}
		conn.Close()
	}
}

// Gets the local IP address of the machine where
// the node is running in `net.UDPAddr` format.
//
// Panics if it there's not connection.
func getLocalUDPAddress() net.UDPAddr {
	conn, err := net.Dial("udp", "8.8.8.8:80")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	localAddr := conn.LocalAddr().(*net.UDPAddr)
	return *localAddr
}

This never hangs, it always throws an error or receives packets.

The only difference now between this example and mine is that mine runs inside of a test while this runs in main.go and that this was executed against localhost since I cannot put an IP for you.

So I tried to move this to a test and stills working as expected.

At this point, it's weird.
I just know that it hangs when working with my cloud machines in WAN.

  1. Start the listener
  2. Send one packet
  3. Wait until the deadline is crossed
  4. Send packets again

And no error is thrown on the server nor packets are received.

@networkimprov
Copy link

networkimprov commented Dec 3, 2019

WAN firewall issue? Could you describe your cloud configuration?

@CPerezz
Copy link
Author

CPerezz commented Dec 3, 2019

Same as this (i removed the docker part of it since I'm not using it at all)

2319K 1896M ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED
  185  9859 ACCEPT     all  --  lo     *       0.0.0.0/0            0.0.0.0/0           
  495 78143 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate INVALID
    0     0 ACCEPT     icmp --  *      *       0.0.0.0/0            0.0.0.0/0            icmptype 8 ctstate NEW
  309 65818 UDP        udp  --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate NEW
 6638  269K TCP        tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp flags:0x17/0x02 ctstate NEW
  288 64616 REJECT     udp  --  *      *       0.0.0.0/0            0.0.0.0/0            reject-with icmp-port-unreachable
 6622  284K REJECT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            reject-with tcp-reset
  792 26940 REJECT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            reject-with icmp-proto-unreachable
    0     0 ACCEPT     udp  --  *      *       0.0.0.0/0            0.0.0.0/0            udp dpt:25519

Chain FORWARD (policy DROP 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate RELATED,ESTABLISHED

Chain OUTPUT (policy ACCEPT 653K packets, 61M bytes)
 pkts bytes target     prot opt in     out     source               destination         
 833K  293M ACCEPT     udp  --  *      *       0.0.0.0/0            0.0.0.0/0           

Chain TCP (1 references)
 pkts bytes target     prot opt in     out     source               destination         
   33  1588 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:22
   51  2160 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:80
   13   528 ACCEPT     tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp dpt:443

Chain UDP (1 references)
 pkts bytes target     prot opt in     out     source               destination         
    2   148 ACCEPT     udp  --  *      *       0.0.0.0/0            0.0.0.0/0            udp dpt:5353
    4   288 ACCEPT     udp  --  *      *       0.0.0.0/0            0.0.0.0/0            udp dpt:53
   15   766 ACCEPT     udp  --  *      *       0.0.0.0/0            0.0.0.0/0            udp dpt:25519

@networkimprov
Copy link

Also did you check for func init() in all files of kadcast?

@davecheney
Copy link
Contributor

davecheney commented Dec 3, 2019 via email

@CPerezz
Copy link
Author

CPerezz commented Dec 3, 2019

I see the idea of the select. And I thought the same, that with the for loop there's no time to breath for the gorutines and they basically can never get to execute code.

So I modified the code adding a Sleep statement in the main file that spawns the gorutines. And turns out that this makes the code work as expected.
Reporting the i/o timeout error when the deadline is closed and getting the packets otherwise.

@networkimprov
Copy link

To clarify, neither the for{} nor Sleep() variants reproduce the error you see on cloud machines?

@CPerezz
Copy link
Author

CPerezz commented Dec 3, 2019

Using the for{} without Sleep() on it has the error hang completely.

Using the for{} with a Sleep() inside the Listener fixes the issue on cloud and local machines.

Seeing that, looks like this is not related to my firewall/NAT config at all.

@davecheney
Copy link
Contributor

Please try precisely what I suggested. Don’t invent a new concept. If you want the main goroutine to go to sleep forever then select {} is the construct you should use.

@networkimprov
Copy link

Previously you wrote "This never hangs, it always throws an error or receives packets." Now you've contradicted that. And presumably there is no for{} in the kadcast code.

@CPerezz
Copy link
Author

CPerezz commented Dec 3, 2019

Previously you wrote "This never hangs, it always throws an error or receives packets." Now you've contradicted that. And presumably there is no for{} in the kadcast code.

I'm just explaining the behaviours I'm finding. Nothing else.
I told you I'm not able to reproduce this in a complete program working with localhost interface. So I cannot do anything else than explain what happens on the machines that are using other IP addresses.

@CPerezz
Copy link
Author

CPerezz commented Dec 3, 2019

Please try precisely what I suggested. Don’t invent a new concept. If you want the main goroutine to go to sleep forever then select {} is the construct you should use.

I don't want the main gorutine to sleep forever, that's the point, the for{} is there since in the future, the main gorutine will execute much more things than just spawn the listener.

So I used for{} to simulate that this main gorutine will be awake and doing stuff.

@CPerezz
Copy link
Author

CPerezz commented Dec 3, 2019

Please try precisely what I suggested. Don’t invent a new concept. If you want the main goroutine to go to sleep forever then select {} is the construct you should use.

Also I tested it and works correctly.
So again, looks like the for{} is not letting the gorutine that tracks the filedescriptor timeout to query and realize that it has been crossed. But the filedescriptor is no longer working.

Just my thoughts.

And guys, I'm trying to help, it's just that if I cannot reproduce it on my machine even I'm adapting the code, I cannot do more than just explaining what is going on..

@networkimprov
Copy link

I think you'll get more help with this by posting to the kadcast project, or golang-nuts. Maybe you've found a bug in kadcast.

@CPerezz
Copy link
Author

CPerezz commented Dec 3, 2019

I think you'll get more help with this by posting to the kadcast project, or golang-nuts. Maybe you've found a bug in kadcast.

As said on my first comment, I'm implementing kadcast. And I'm importing in in a main.go file to test how it works. This is why I'm here, I made the code for the listeners packet senders that kadcast uses.

@networkimprov
Copy link

Sorry, meant the dusk project. Anyway, golang-nuts would be a better resource.

@CPerezz
Copy link
Author

CPerezz commented Dec 3, 2019

Sorry, meant the dusk project. Anyway, golang-nuts would be a better resource.

Again, I'm part of the dusk project. I'm the implementor of the kadcast package. I just tried to adapt the package to a simple main.go file to allow you to analyze it since I see that you'll not look at the original code.

So, thanks for your time.

I will just leave this here as a workarround to solve this.

Just make the main gorutine sleep (or whatever other thing that makes it stop constantly looping) and then the listener gorutine will be able to see the crossed deadlines and throw the corresponding errors.

@CPerezz CPerezz closed this as completed Dec 3, 2019
@ianlancetaylor
Copy link
Contributor

For the record, never write for {} in a Go program. That will never work well. See #10958 for part of the problem.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

5 participants