net/http: possible race on persistConn.roundtrip vs. persistConn.readloop in transport when using httptrace #59310
Labels: NeedsInvestigation (someone must examine and confirm this is a valid issue and not a duplicate of an existing one)
What version of Go are you using (`go version`)?

Does this issue reproduce with the latest release?
yes

What operating system and processor architecture are you using (`go env`)?

What did you do?
Given a simple TCP server that immediately closes an incoming connection:
Given a client that uses httptrace:
What did you expect to see?
What did you see instead?
Background:
The use case for this scenario is that we want to know whether we can safely retry POST requests. The hypothesis is that if `GotConn` or `WroteHeaders` on httptrace were not called during the request, we know that no data of the POST request has actually reached the backend. In our case, the backend is behind an L4 reverse proxy which always accepts connections first, but immediately closes them if it cannot forward the connection to its own backend. This has led to issues where the actual backend would not even accept connections, but we don't see that because, from our end, the connection was accepted by the reverse proxy. So we want some way of knowing whether we could not send HTTP headers. Unfortunately, the error `EOF` is very generic and we can't really make any assumptions based on this error alone, as it can happen anywhere during the lifecycle of a connection.

Technical details:
When the `GotConn` callback adds a minimal delay of 0.1ms, the behavior is different and `http: server closed idle connection` is returned. The race I observed is this:
- Without a delay, `persistConn.roundTrip` is called first, increasing `numExpectedResponses` to 1, which prevents `persistConn.readLoopPeekFailLocked` from being called later. `readLoopPeekFailLocked` is used to peek into the connection to see if it is dead or alive. If `EOF` is encountered during the peek, `errServerClosedIdle` is returned. In this case this does not work.
- With a delay in the `GotConn` callback function of `httptrace`, `persistConn.readLoop` is called first, while `numExpectedResponses` is still 0 (from zero value initialization, NOT from getting reduced after actually reading a response), which leads to `readLoopPeekFailLocked` getting called and the final `errServerClosedIdle` being returned to the caller.
- When the delay in `GotConn` is below 100 microseconds, the returned error starts flapping between `EOF` and `http: server closed idle connection`. Above 500 microseconds we consistently see `http: server closed idle connection`, below 10 microseconds we consistently see `EOF`.
- The `EOF` in the no-delay case is actually a `nothingWrittenError` which gets unwrapped to a `transportReadFromServerError`, which is then unwrapped to the `EOF` that is returned to `Transport.RoundTrip`, unfortunately stripping all helpful information.