|
|
Descriptionhttp: fix Transport deadlock
This patch intend to fix following issues.
http://code.google.com/p/go/issues/detail?id=2616
Fixes issue 2616.
Patch Set 1 #Patch Set 2 : diff -r 30af37601706 https://go.googlecode.com/hg/ #
Total comments: 2
Patch Set 3 : diff -r 518f09c59498 https://go.googlecode.com/hg/ #Patch Set 4 : diff -r 06129c716fbb https://go.googlecode.com/hg/ #
Total comments: 7
Patch Set 5 : diff -r 06129c716fbb https://go.googlecode.com/hg/ #MessagesTotal messages: 14
Okay, I've now processed your CLA, so this can be submitted once it's cleaned up. In addition to the things below, can you add a test? How do you reproduce this? I'm guessing that it involves a lot of traffic with a server closing at some point? http://codereview.appspot.com/5532057/diff/2001/src/pkg/net/http/transport.go File src/pkg/net/http/transport.go (right): http://codereview.appspot.com/5532057/diff/2001/src/pkg/net/http/transport.go... src/pkg/net/http/transport.go:498: func (pc *persistConn) expectingResponseOrClose() (ret bool) { don't name this result parameter. (remove "ret") but I don't like this part of the change, anyway: you're combining an accessor with an action. I would just rewrite the caller to do the "if" condition. that might involve just deleting this whole function. http://codereview.appspot.com/5532057/diff/2001/src/pkg/net/http/transport.go... src/pkg/net/http/transport.go:652: func (pc *persistConn) close(hasLock bool) { I don't like reading callsites with foo(false) and foo(true)... too hard to read. I would break this into two functions: close and closeLocked. You can have close just be: func (pc *persistConn) close() { pc.lk.Lock() defer pc.lk.Unlock() pc.closeLocked() }
Sign in to reply to this message.
Also, let's change the first line of the CL description to: http: fix Transport deadlock On Tue, Jan 10, 2012 at 3:57 PM, <bradfitz@golang.org> wrote: > Okay, I've now processed your CLA, so this can be submitted once it's > cleaned up. > > In addition to the things below, can you add a test? How do you > reproduce this? I'm guessing that it involves a lot of traffic with a > server closing at some point? > > > > http://codereview.appspot.com/**5532057/diff/2001/src/pkg/net/** > http/transport.go<http://codereview.appspot.com/5532057/diff/2001/src/pkg/net/http/transport.go> > File src/pkg/net/http/transport.go (right): > > http://codereview.appspot.com/**5532057/diff/2001/src/pkg/net/** > http/transport.go#newcode498<http://codereview.appspot.com/5532057/diff/2001/src/pkg/net/http/transport.go#newcode498> > src/pkg/net/http/transport.go:**498: func (pc *persistConn) > expectingResponseOrClose() (ret bool) { > don't name this result parameter. (remove "ret") > > but I don't like this part of the change, anyway: you're combining an > accessor with an action. > > I would just rewrite the caller to do the "if" condition. > > that might involve just deleting this whole function. > > http://codereview.appspot.com/**5532057/diff/2001/src/pkg/net/** > http/transport.go#newcode652<http://codereview.appspot.com/5532057/diff/2001/src/pkg/net/http/transport.go#newcode652> > src/pkg/net/http/transport.go:**652: func (pc *persistConn) close(hasLock > bool) { > I don't like reading callsites with foo(false) and foo(true)... too hard > to read. > > I would break this into two functions: close and closeLocked. You can > have close just be: > > func (pc *persistConn) close() { > pc.lk.Lock() > defer pc.lk.Unlock() > pc.closeLocked() > } > > http://codereview.appspot.com/**5532057/<http://codereview.appspot.com/5532057/> >
Sign in to reply to this message.
> > In addition to the things below, can you add a test? How do you > > reproduce this? I'm guessing that it involves a lot of traffic with a > > server closing at some point? Yes. So need to execute a stub http server for testing(be setting KeepAliveTimeout very low) For now a reproducable program is here. https://github.com/mocchira/digestw/ I can add a test and Other review points to me is LGTM. So I'll do it (maybe this week end) Thanks
Sign in to reply to this message.
Hello golang-dev@googlegroups.com, bradfitz@golang.org (cc: golang-dev@googlegroups.com), I'd like you to review this change to https://go.googlecode.com/hg/
Sign in to reply to this message.
summary 1. request.go's bug was fixed for the last few days so it's removed 2. previous review points was applied to transport.go 3. modified server.go. To reproduce this bugs, server closing at some point will be needed, so I implemented 'MaxKeepAliveRequests' as other http daemons do. This probably will be discussed at an another Thread? 4. added tests. Theare are three factors to reproduce this dead lock. Following values are specified. - MaxKeepAliveRequests (5) - Number of requests (1e5) - Timeout for detecting dead lock (30sec) These suitable values might depend on environments...
Sign in to reply to this message.
The fix and tests look good. The only part I don't like is adding MaxKeepAliveRequests, which I believe is just unnecessary complexity and implies that it needs to be set. The number of requests per connection doesn't mean something in a Go world like it does in a child-process-per-connection Apache-style world. Can you make the tests still work without MaxKeepAliveRequests? Also, the http.Server struct was modified in another CL, so that part of the CL wouldn't apply cleanly anyway. But let's try to do this without it. On Sun, Jan 15, 2012 at 12:37 AM, <nekotaroh@gmail.com> wrote: > summary > 1. request.go's bug was fixed for the last few days so it's removed > 2. previous review points was applied to transport.go > 3. modified server.go. To reproduce this bugs, server closing at some > point will be needed, so I implemented 'MaxKeepAliveRequests' as other > http daemons do. This probably will be discussed at an another Thread? > 4. added tests. Theare are three factors to reproduce this dead lock. > Following values are specified. > - MaxKeepAliveRequests (5) > - Number of requests (1e5) > - Timeout for detecting dead lock (30sec) > These suitable values might depend on environments... > > http://codereview.appspot.com/**5532057/<http://codereview.appspot.com/5532057/> >
Sign in to reply to this message.
> The only part I don't like is adding MaxKeepAliveRequests, which I believe > is just unnecessary complexity and implies that it needs to be set. The > number of requests per connection doesn't mean something in a Go world like > it does in a child-process-per-connection Apache-style world. absolutely. One of this purpose is to protect a process against some kind of memory leaks, so there is no need to do this. > Can you make the tests still work without MaxKeepAliveRequests? > > Also, the http.Server struct was modified in another CL, so that part of > the CL wouldn't apply cleanly anyway. But let's try to do this without it. I just realized it's time to use the Hijack... right? I'll do it tomorrow. Thank you for your time!
Sign in to reply to this message.
Hello golang-dev@googlegroups.com, bradfitz@golang.org, nekotaroh@gmail.com (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
Looking much better and simpler, thanks! Almost there. Few more comments in the test.... http://codereview.appspot.com/5532057/diff/12001/src/pkg/net/http/client_test.go File src/pkg/net/http/client_test.go (right): http://codereview.appspot.com/5532057/diff/12001/src/pkg/net/http/client_test... src/pkg/net/http/client_test.go:435: transport_dl_close_probability = 25 // probability(0-100) of server side close no underscores in variable names. you can also move this whole const ( .... ) block inside the TestClientTransportDeadline function, and then you can remove the "transport_dl_" prefix. const ( closeProbability = 25 // ... numRequests = 1e4 // ... parallel = 8 // ... ) var ( timeout = 30 * time.Second readTImeout = 100 * time.Millisecond ) http://codereview.appspot.com/5532057/diff/12001/src/pkg/net/http/client_test... src/pkg/net/http/client_test.go:442: func TestClientTransportDeadlock(t *testing.T) { can this test run for a long time? It looks like it? You'll probably want to skip this test in short mode. http://weekly.golang.org/pkg/testing/#Short if testing.Short() { t.Logf("skipping in short mode") return } http://codereview.appspot.com/5532057/diff/12001/src/pkg/net/http/client_test... src/pkg/net/http/client_test.go:443: client := &Client{Transport: &Transport{MaxIdleConnsPerHost: transport_dl_parallel}} break this into multiple lines. http://codereview.appspot.com/5532057/diff/12001/src/pkg/net/http/client_test... src/pkg/net/http/client_test.go:455: conn, buf, _ := hijacker.Hijack() nice. :-) http://codereview.appspot.com/5532057/diff/12001/src/pkg/net/http/client_test... src/pkg/net/http/client_test.go:465: timeout := func() <-chan time.Time { delete this function. http://codereview.appspot.com/5532057/diff/12001/src/pkg/net/http/client_test... src/pkg/net/http/client_test.go:481: case <-timeout(): case <-time.After(timeout): http://codereview.appspot.com/5532057/diff/12001/src/pkg/net/http/client_test... src/pkg/net/http/client_test.go:482: panic(fmt.Sprintf("probably dead locked at %v panic for stack trace", idx)) t.Fatalf("....") (people debugging this later can change this to a panic if they want backtraces.
Sign in to reply to this message.
Hello golang-dev@googlegroups.com, bradfitz@golang.org, nekotaroh@gmail.com (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
Unfortunately the test doesn't pass on my machine. I'll take a look at writing a different test, but I'm pretty sure I'll take your fix to the non-test part. On Sat, Jan 21, 2012 at 4:21 PM, <yoshiyuki.kanno@stoic.co.jp> wrote: > Hello golang-dev@googlegroups.com, bradfitz@golang.org, > nekotaroh@gmail.com (cc: golang-dev@googlegroups.com), > > Please take another look. > > > http://codereview.appspot.com/**5532057/<http://codereview.appspot.com/5532057/> >
Sign in to reply to this message.
Oops. Since This new test is executed in parallel(default is 8), this behavior might affect the failed result depends on num of core, GO_MAX_PLOCS, and so on. Previous one is equal to setting to 'parallel is 1' and 'numrequests is 1e5' . 2012/01/24 14:46 "Brad Fitzpatrick" <bradfitz@golang.org>: > > Unfortunately the test doesn't pass on my machine. > > I'll take a look at writing a different test, but I'm pretty sure I'll take your fix to the non-test part. > > > On Sat, Jan 21, 2012 at 4:21 PM, <yoshiyuki.kanno@stoic.co.jp> wrote: >> >> Hello golang-dev@googlegroups.com, bradfitz@golang.org, >> nekotaroh@gmail.com (cc: golang-dev@googlegroups.com), >> >> Please take another look. >> >> >> http://codereview.appspot.com/5532057/ > >
Sign in to reply to this message.
LGTM I've submitted a more reliable test. Submitting just the transport.go part (the actual fix) part of this CL. Thanks again! On Mon, Jan 23, 2012 at 11:13 PM, yoshiyuki kanno <nekotaroh@gmail.com>wrote: > Oops. > Since This new test is executed in parallel(default is 8), this behavior > might affect the failed result depends on num of core, GO_MAX_PLOCS, and so > on. Previous one is equal to setting to 'parallel is 1' and 'numrequests > is 1e5' . > 2012/01/24 14:46 "Brad Fitzpatrick" <bradfitz@golang.org>: > > > > > Unfortunately the test doesn't pass on my machine. > > > > I'll take a look at writing a different test, but I'm pretty sure I'll > take your fix to the non-test part. > > > > > > On Sat, Jan 21, 2012 at 4:21 PM, <yoshiyuki.kanno@stoic.co.jp> wrote: > >> > >> Hello golang-dev@googlegroups.com, bradfitz@golang.org, > >> nekotaroh@gmail.com (cc: golang-dev@googlegroups.com), > >> > >> Please take another look. > >> > >> > >> http://codereview.appspot.com/5532057/ > > > > > >
Sign in to reply to this message.
*** Submitted as http://code.google.com/p/go/source/detail?r=f57165f82a11 *** net/http: fix Transport deadlock This patch intend to fix following issues. http://code.google.com/p/go/issues/detail?id=2616 Fixes issue 2616. R=golang-dev, bradfitz, nekotaroh CC=golang-dev http://codereview.appspot.com/5532057 Committer: Brad Fitzpatrick <bradfitz@golang.org>
Sign in to reply to this message.
|