
net: better DNS transport for builtin DNS stub resolver #23866

Open
mikioh opened this issue Feb 16, 2018 · 4 comments

mikioh (Contributor) commented Feb 16, 2018

Split from #18588. The existing DNS stub resolver uses a per-query DNS transport, which can cause socket descriptor exhaustion. It might be better to have some mechanism in the DNS transport to reduce the number of socket descriptors used: shared UDP endpoints for UDP-based transport, and connection pooling for TCP-based transport.
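
For context, a minimal sketch of what per-query transport means (illustrative only, not the actual net package code; queryOnce and its parameters are made up): every query dials, uses and closes its own socket, so N concurrent lookups hold N descriptors at once.

	package dnssketch

	import (
		"context"
		"net"
		"time"
	)

	// queryOnce is a per-query transport: a fresh socket for every query.
	func queryOnce(ctx context.Context, server string, msg []byte) ([]byte, error) {
		d := net.Dialer{Timeout: 5 * time.Second}
		c, err := d.DialContext(ctx, "udp", server) // new socket per query
		if err != nil {
			return nil, err
		}
		defer c.Close() // descriptor is held for the whole query
		if _, err := c.Write(msg); err != nil {
			return nil, err
		}
		buf := make([]byte, 512)
		n, err := c.Read(buf)
		if err != nil {
			return nil, err
		}
		return buf[:n], nil
	}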

meirf (Contributor) commented Apr 29, 2018

Some notes:

  • Despite large recent DNS changes, this issue is still relevant - the existing DNS stub resolver still uses per-query DNS transport.
  • Considering the DNS header ID is stored in two bytes (a uint16), there are only 65536 possible values. (In addition to the header ID, the code also checks type, class and name.) So it's possible that a shared endpoint could have two identical outstanding requests. If the responses came back in a different order than the requests were sent, there would be no way to tell that the later request got the newer data and the earlier request got the old data. We could either decide that isn't a problem, or ensure that a request can't be assigned to a connection/endpoint that already has an identical outstanding request.
  • Maybe the knob we want is Resolver.maxConnsPerNameServer, even if it isn't configurable (yet?) as Resolver.MaxConnsPerNameServer. If the knob is private, we would probably want some rather large default so as not to break existing code. If the knob is public, we could say that when it isn't set, we assume a large default (rough sketch below).

wdyt?
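
A rough sketch of that knob, purely hypothetical (neither the field nor the helper exists on net.Resolver today; the 1000 default is just a placeholder):

	type Resolver struct {
		// ...existing Resolver fields...

		// maxConnsPerNameServer caps concurrent connections to a single
		// name server. Zero means a large default so existing code keeps
		// working.
		maxConnsPerNameServer int
	}

	func (r *Resolver) connsPerNameServer() int {
		if r.maxConnsPerNameServer == 0 {
			return 1000 // assumed large default
		}
		return r.maxConnsPerNameServer
	}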

mwwaters (Contributor) commented Sep 1, 2018

I have code written to implement sharing sockets for DNS. The change ended up being more significant than I first realized, and I wanted to discuss the new code before asking for code review.

This is my first code review request, so I'm sorry if this description is too detailed and long. Can I ask for a code review based on this description?

The following two public options are added to Resolver:

	// For each unique (network, DNS resolver IP address) tuple, the
	// resolver will attempt to keep the same socket. If the same
	// socket is shared for UDP, the UDP port will remain the same. For
	// TCP, the connection and port remain the same. MaxConnsPerSocket
	// sets the maximum number of outstanding DNS requests on a socket
	// before a new socket is created. For MaxConnsPerSocket=0, a
	// default value of 1000 is used.
	MaxConnsPerSocket int

	// A DNS resolver connection can be kept for a limited time after all
	// outstanding requests are served. This value defaults to zero.
	KeepSocketTimeout time.Duration
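
If these fields were exported as proposed, usage might look roughly like this (hypothetical: neither field exists on net.Resolver today, and lookupWithSharedSockets is made up):

	func lookupWithSharedSockets(ctx context.Context, host string) ([]string, error) {
		r := &net.Resolver{
			PreferGo:          true,             // use the Go stub resolver
			MaxConnsPerSocket: 500,              // proposed: cap outstanding queries per socket
			KeepSocketTimeout: 30 * time.Second, // proposed: keep idle sockets around briefly
		}
		return r.LookupHost(ctx, host)
	}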

For implementation, the Resolver has a private map dnsConns and a mutex dnsConnsMu. The map key is the (network, server) tuple.

	dnsConnsMu sync.Mutex
	dnsConns map[dnsConnKey][]*dnsConn
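
(The key type isn't spelled out here; presumably it's something like the following, with hypothetical field names.)

	// Hypothetical shape of the map key: one entry per (network, server) pair.
	type dnsConnKey struct {
		network string // "udp" or "tcp"
		server  string // name server address, e.g. "192.0.2.1:53"
	}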

The type dnsConn has the following mutable fields:

type dnsConn struct {
	// ... internal fields ...

	// The fields below require Resolver.dnsConnsMu to be held.
	openRefs    int
	hasErr      bool
	lastNewConn time.Time
}

r.exchange now calls a general dnsRoundTrip() function in a new dns_conn.go file. dnsRoundTrip handles the r.dial(), Conn.Read/ReadFull, Conn.Write, and Conn.Close calls that were previously handled by the exchange, dnsPacketRoundTrip and dnsStreamRoundTrip functions.

dnsRoundTrip locks the dnsConns map. If no []*dnsConn exists for (network, server), the slice is created and a connection is opened with openRefs of 1. If a []*dnsConn exists for (network, server), dnsRoundTrip checks each existing dnsConn for one with openRefs < MaxConnsPerSocket; hasErr must also be false for the dnsConn to be reused. If dnsRoundTrip reaches the end of the slice, a new dnsConn is created.
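
A rough sketch of that selection logic, reusing the fields above (getConn and the maxConnsPerSocket helper, which applies the 1000 default, are hypothetical names):

	// Sketch only: reuse a dnsConn with spare capacity and no error,
	// otherwise create a new one.
	func (r *Resolver) getConn(key dnsConnKey) *dnsConn {
		r.dnsConnsMu.Lock()
		defer r.dnsConnsMu.Unlock()
		for _, dc := range r.dnsConns[key] {
			if !dc.hasErr && dc.openRefs < r.maxConnsPerSocket() {
				dc.openRefs++
				return dc
			}
		}
		if r.dnsConns == nil {
			r.dnsConns = make(map[dnsConnKey][]*dnsConn)
		}
		dc := newDnsConn(key) // starts the Open, Read and Loop goroutines
		dc.openRefs = 1
		r.dnsConns[key] = append(r.dnsConns[key], dc)
		return dc
	}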

The newDnsConn function starts three goroutines: Open, Read and Loop. There is also one goroutine created per individual request, to handle that request's context and timeout. There are several channels for the goroutines to communicate with each other:

	addIdCh: make(chan *dnsConnAddIdParam),
	hasOpenedCh: make(chan *dnsConnOpenReturn, 1),
	stopReadCh: make(chan chan struct{}),
	readErrorCh: make(chan error, 1),
	readCh: make(chan []byte),
	reqTimedOutCh: make(chan uint16),
	reqCancelledCh: make(chan uint16),
	reqTimerExitCh: make(chan uint16),
	closeCh: make(chan struct{}, 1),

For each request, dnsRoundTrip puts the request information on addIdCh, which is received by Loop. dnsConnAddIdParam carries a callback channel, and dnsRoundTrip blocks on that callback channel for each request. The callback channel receives a (dnsmessage.Parser, dnsmessage.Header, error) struct, including errors from opening the connection.
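
The shapes implied by that description might look roughly like this (field names are guesses, using the x/net dnsmessage types):

	// Hypothetical shapes implied by the description above.
	type dnsConnResult struct {
		p   dnsmessage.Parser
		h   dnsmessage.Header
		err error
	}

	type dnsConnAddIdParam struct {
		q      dnsmessage.Question // used to match responses to requests
		msg    []byte              // encoded query to write on the wire
		result chan dnsConnResult  // callback; dnsRoundTrip blocks on this
	}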

hasOpenedCh is first sent on by the Open goroutine, after r.dial either returns a Conn or an error. The Read and Loop goroutines each take the open result and put it back. Read takes the open result once to get the Conn, and exits if there was an open error. Loop, when it handles addIdCh, blocks until it receives on hasOpenedCh, then puts the result back into the channel. If there is an open error, Loop sends the error on the callback channel to dnsRoundTrip. If there is no open error, Loop gets a unique ID, adds the ID to the request map and writes the DNS message on the wire.
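
The take-and-put-back pattern on the 1-buffered hasOpenedCh looks roughly like this (using the hypothetical names from the previous sketch):

	// Sketch: hasOpenedCh has capacity 1 and holds the single dial result.
	open := <-c.hasOpenedCh // blocks until the Open goroutine has dialed
	c.hasOpenedCh <- open   // put it back so the other goroutine can see it too
	if open.err != nil {
		param.result <- dnsConnResult{err: open.err} // report the dial error to dnsRoundTrip
		return
	}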

stopReadCh carries a callback channel from Loop to Read and back to Loop. When Loop is closing the connection, it uses stopReadCh to ensure the Read goroutine exits.
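
Since stopReadCh is a chan chan struct{}, the shutdown handshake is roughly:

	// Sketch: Loop asks Read to stop and waits for confirmation.
	done := make(chan struct{})
	c.stopReadCh <- done // Read picks this up once it stops reading
	<-done               // Read signals done after it has exited its loop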

readErrorCh goes from Read to Loop in case of an error from c.Read. The channel is buffered and so does not block. Read then blocks until it receives on stopReadCh. Loop handles readErrorCh by sending the read error back to all dnsRoundTrip callbacks still listening.

readCh goes from Read to Loop and carries only the byte slice from a successful Conn.Read or io.ReadFull. Loop handles readCh by first parsing only the header. For both UDP and TCP connections, Loop ignores parsing errors; previously, parsing errors were ignored only for bad UDP responses. TCP connections need the same treatment because a TCP connection can deliver a correct response for a request that has already timed out. If the request ID is found and it matches the question recorded for that ID, the results are sent on the addIdCh callback and the goroutine for the request's timer is cancelled.
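
Header-only parsing and ID matching with the dnsmessage package looks roughly like this (sketch; the outstanding map and the field names are assumptions):

	// Sketch: parse only the header of an incoming message, then match it
	// against an outstanding request by ID and question.
	var p dnsmessage.Parser
	h, err := p.Start(buf)
	if err != nil {
		return // ignore the malformed response; a later one may still match
	}
	req, ok := outstanding[h.ID] // e.g. map[uint16]*dnsConnAddIdParam kept by Loop
	if !ok {
		return // unknown or already-completed ID (e.g. the request timed out)
	}
	q, err := p.Question()
	if err != nil || q != req.q {
		return // ID collision or forged response: the question does not match
	}
	req.result <- dnsConnResult{p: p, h: h} // wake the waiting dnsRoundTrip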

reqTimedOutCh, reqCancelledCh and reqTimerExitCh are sent on by the per-request goroutines to Loop. reqTimedOutCh is sent when timer.C fires, after the DNS request's timeout. reqCancelledCh handles ctx.Done for the specific request. There is also a buffered cancelTimer channel for every request, which Loop sends on when a request completes successfully. reqTimerExitCh is sent from the request goroutine to Loop whenever a request goroutine exits. Only after receiving on reqTimerExitCh does Loop delete the ID.
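
A sketch of that per-request goroutine (channel names from the list above; the timeout and ctx wiring are assumptions):

	// Sketch: each request gets a goroutine that races its timeout,
	// context cancellation and successful completion, then always tells
	// Loop that it has exited so the ID can be deleted.
	go func(id uint16, cancelTimer chan struct{}) {
		t := time.NewTimer(timeout)
		defer t.Stop()
		select {
		case <-t.C:
			c.reqTimedOutCh <- id // request timed out
		case <-ctx.Done():
			c.reqCancelledCh <- id // request context cancelled
		case <-cancelTimer: // Loop signals a successful response
		}
		c.reqTimerExitCh <- id // Loop deletes the ID only after this
	}(id, cancelTimer)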

closeCh goes from dnsRoundTrip to Loop. When dnsRoundTrip receives the result back on the callback it sent via addIdCh, it first locks the dnsConnsMu mutex. Then openRefs is decremented and, if dnsRoundTrip received an error, hasErr is set to true.

If openRefs is 0 and hasErr is true, dnsRoundTrip sends on closeCh and the dnsConn is deleted from dnsConns. If openRefs is 0 but the Conn is not in error, time.AfterFunc is called. The passed function locks dnsConnsMu and then sends on closeCh and deletes the dnsConn only if all three conditions hold: the dnsConn still exists, openRefs is 0, and lastNewConn does not indicate the connection was handed out again after time.AfterFunc was called. Loop's closeCh handler makes sure the Read and request-timer goroutines all exit, and then Loop itself exits.
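
The idle-close path might look roughly like this (sketch; stillPresent and removeConn are hypothetical helpers):

	// Sketch of the idle-close path after a request finishes.
	if dc.openRefs == 0 && !dc.hasErr {
		armed := time.Now()
		time.AfterFunc(r.KeepSocketTimeout, func() {
			r.dnsConnsMu.Lock()
			defer r.dnsConnsMu.Unlock()
			// Close only if the conn still exists, is still idle, and has
			// not been handed out again since the timer was armed.
			if stillPresent(r, key, dc) && dc.openRefs == 0 && !dc.lastNewConn.After(armed) {
				dc.closeCh <- struct{}{} // tells Loop to shut down Read and the timers
				removeConn(r, key, dc)   // drop it from r.dnsConns
			}
		})
	}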

For testing, along with some new tests, fakeDNSConn had a channel added from its Write function to its Read function. The previous tests assumed Write would be called before Read, but dnsConn likely starts Read before the first Write. So Read in fakeDNSConn now blocks until it receives a parsed question from Write, and Write starts a goroutine to send on that channel so that Write itself does not block.

Two existing tests needed to be substantially changed. TestIgnoreDNSForgeries called dnsPacketRoundTrip directly; it now calls LookupHost instead, with the fake connection still sending two bad responses and one good one for every request. TestStrictErrorsLookupIP had flaky failures on its last test case. That case sends a correct response for the A request and a timeout error for the AAAA request; if the dnsConn Loop received the error for the AAAA request first, it would return the timeout error for both requests. The rh function now checks that the correct A response was returned before the timeout for the AAAA request, which is how a DNS resolver that ignores AAAA records would behave.

@tsellers-r7

++ on addressing the root cause here. I believe that I've run into this as well with a highly concurrent system making many HTTP requests. Some of the requests are to the same API endpoint and we see results where one connection works and another fails with dial tcp: lookup $hostname: no such host.

This is on macOS where the default per-process file descriptor limit is 256. Increasing the limit resolved the problem.

