|
|
Description
net: band-aid for windows network poller
Fixes performance of the current windows network poller
with the new scheduler.
Gives runtime a hint when GetQueuedCompletionStatus() will block.
Fixes issue 5068.
benchmark                    old ns/op    new ns/op    delta
BenchmarkTCP4Persistent        4004000        33906   -99.15%
BenchmarkTCP4Persistent-2        21790        17513   -19.63%
BenchmarkTCP4Persistent-4        44760        34270   -23.44%
BenchmarkTCP4Persistent-6        45280        43000    -5.04%
Patch Set 1
Patch Set 2: diff -r 43eb97ed849a https://dvyukov%40google.com@code.google.com/p/go/
Patch Set 3: diff -r 43eb97ed849a https://dvyukov%40google.com@code.google.com/p/go/
Patch Set 4: diff -r 43eb97ed849a https://dvyukov%40google.com@code.google.com/p/go/
Total comments: 1
Patch Set 5: diff -r f78c5036b7f6 https://dvyukov%40google.com@code.google.com/p/go/
Patch Set 6: diff -r f78c5036b7f6 https://dvyukov%40google.com@code.google.com/p/go/
Patch Set 7: diff -r f78c5036b7f6 https://dvyukov%40google.com@code.google.com/p/go/
Messages
Total messages: 60
Hello golang-dev@googlegroups.com, I'd like you to review this change to https://dvyukov%40google.com@code.google.com/p/go/
Do you have a plan for a *general* solution? I suspect many other syscalls will be affected too.

Alex
On 2013/03/18 07:59:27, brainman wrote:
> Do you have a plan for a *general* solution? I suspect many other syscalls
> will be affected too.

What other syscalls are affected?
I don't know of any. But what do I do if one of my syscalls is suddenly significantly slower than before? How do I give a hint to the scheduler?

Alex
On 2013/03/18 08:27:31, brainman wrote:
> I don't know of any. But what do I do if one of my syscalls is suddenly
> significantly slower than before? How do I give a hint to the scheduler?

Hmm... perhaps we can make all syscalls blocking on windows?
I re-ran the net benchmarks to see the regression on windows. I compared two "clean" versions, before and after the new scheduler:

# ~/go/root/misc/benchcmp id16000.txt id16303.txt
benchmark                          old ns/op    new ns/op    delta
BenchmarkTCPPersistent                 87501       265628  +203.57%
BenchmarkTCPPersistent-2               42188        52813   +25.18%
BenchmarkTCPPersistent-4               39375        65625   +66.67%
BenchmarkTCPPersistentTimeout          94532       253128  +167.77%
BenchmarkTCPPersistentTimeout-2        43438        59375   +36.69%
BenchmarkTCPPersistentTimeout-4        42813        64688   +51.09%

I also compared your proposed change:

# ~/go/root/misc/benchcmp id16000.txt id16303cl7612045.txt
benchmark                          old ns/op    new ns/op    delta
BenchmarkTCPPersistent                 87501        57500   -34.29%
BenchmarkTCPPersistent-2               42188        58125   +37.78%
BenchmarkTCPPersistent-4               39375        70938   +80.16%
BenchmarkTCPPersistentTimeout          94532        62813   -33.55%
BenchmarkTCPPersistentTimeout-2        43438        63750   +46.76%
BenchmarkTCPPersistentTimeout-4        42813        71094   +66.06%

which is better, but still a regression on some tests.

Then I got rid of net.runtime_blockingSyscallHint altogether, as per your suggestion:

# hg diff
diff -r 1130cc72c645 src/pkg/runtime/cgocall.c
--- a/src/pkg/runtime/cgocall.c	Mon Mar 18 15:33:04 2013 -0700
+++ b/src/pkg/runtime/cgocall.c	Tue Mar 19 11:34:51 2013 +1100
@@ -145,7 +145,10 @@
 	 * so it is safe to call while "in a system call", outside
 	 * the $GOMAXPROCS accounting.
 	 */
-	runtime·entersyscall();
+	if(Windows) {
+		runtime·entersyscallblock();
+	} else
+		runtime·entersyscall();
 	runtime·asmcgocall(fn, arg);
 	runtime·exitsyscall();

# ~/go/root/misc/benchcmp id16000.txt id16303alex.txt
benchmark                          old ns/op    new ns/op    delta
BenchmarkTCPPersistent                 87501        92969    +6.25%
BenchmarkTCPPersistent-2               42188        62813   +48.89%
BenchmarkTCPPersistent-4               39375        72657   +84.53%
BenchmarkTCPPersistentTimeout          94532       103126    +9.09%
BenchmarkTCPPersistentTimeout-2        43438        78907   +81.65%
BenchmarkTCPPersistentTimeout-4        42813        77344   +80.66%

and it is still bad. I am not sure what to do.

Alex
On 2013/03/19 00:39:32, brainman wrote:
> [benchmark comparison snipped - see the previous message]

How about "if(windows && runtime.NumCPU()==1)"?
I'm sorry, I misunderstood the comparison. So does it mean the new scheduler is a regression for windows? Is it feasible to switch back to the old scheduler for windows?
On 2013/03/19 02:26:25, Ewan Chou wrote:
> I'm sorry, I misunderstood the comparison.

Yeah, your change does not help on my pc, because I have more than 1 cpu, so your change does nothing.

> So does it mean the new scheduler is a regression for windows?

Yes, on net benchmarks at least.

> Is it feasible to switch back to the old scheduler for windows?

I don't know.

Alex
On 2013/03/19 00:39:32, brainman wrote:
> [benchmark comparison snipped - see the previous message]

Thanks for the numbers. I don't have an explanation for why new sched + this patch is worse than old sched with GOMAXPROCS>1... Does profiling work on Windows? Can you try to collect profiles and CPU utilization?

I am also not sure why this patch is faster than tip on my machine and slower than tip on your machine with GOMAXPROCS>1. Are you sure this patch slows down the net benchmarks as compared to tip?

A proper fix (rewrite) for the windows network poller is a significant change. Most likely I won't have time for it until summer.
Profiling does work on windows, but I don't know what you want. If you could provide instructions, I will collect the info. I will also rerun all my benchmarks to verify. I tested everything on 386, if it matters.

I would also try to rewrite the netpoller myself, but I don't know where to start. If you would describe your plan, perhaps I will have a go.

Thank you.

Alex
On 2013/03/21 11:38:20, brainman wrote:
> I would also try to rewrite the netpoller myself, but I don't know where to
> start. If you would describe your plan, perhaps I will have a go.

Here are some rough ideas.

First, Read/Write generate garbage, which is unfortunate. Since there is at most 1 pending Read and 1 pending Write operation, it's possible to embed the required structures directly into netFD:

type netFD struct {
	...
	ro syscall.Overlapped
	rb syscall.WSABuf
	rt Timer
	wo syscall.Overlapped
	wb syscall.WSABuf
	wt Timer
}

The deadline timer should be set only once (when the deadline is set), instead of resetting it on each operation. An expired timer calls CancelIoEx(). Read/Write issue the operation (WSARecv/Send) and block on a lightweight semaphore (see runtime/netpoll.goc). runtime.netpoll() calls GetQueuedCompletionStatusEx() and unblocks a batch of ready goroutines. netFD.Close() calls CancelIoEx for outstanding operations.

So, why I expect to see a performance improvement:

1. Do not allocate readOp/writeOp and a timer on each operation.
2. Do not arm a timer on each operation (there is a global mutex).
3. Do not do the 3-chan select (this also allocates memory and locks mutexes; in particular the cancelc chan mutex is shared between read and write operations).
4. Dequeue batches of ready operations with GetQueuedCompletionStatusEx().
5. Better integration with the scheduler via runtime.netpoll().

It's also possible to look into the RIO windows API, however it's available only in new versions. I remember something about pinning WSA buffers into physical memory, so that the kernel does not need to pin/unpin them on every operation. But I can't find how to do it. Perhaps just VirtualLock() will do, but this requires investigation.

1 and 4 look quite simple and most likely can be done separately. 2 and 3 are harder to implement.
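To make points 1-3 above concrete, here is a toy Go sketch. It is not the eventual runtime code (which lives in C/.goc files): per-operation state is embedded in netFD so Read/Write allocate nothing per call, and a one-slot channel stands in for the runtime's lightweight semaphore. All names here are illustrative assumptions.

package netpollsketch

import "syscall"

// ioOp is the per-direction state embedded in netFD. There is at most one
// pending read and one pending write, which is what makes embedding safe.
type ioOp struct {
	o     syscall.Overlapped // reused for every operation in this direction
	buf   syscall.WSABuf     // reused WSA buffer descriptor
	sema  chan struct{}      // runtime.netpoll would unpark us; modeled as a channel
	qty   uint32             // bytes transferred, filled in by the poller
	errno syscall.Errno      // completion status, filled in by the poller
}

type netFD struct {
	sysfd syscall.Handle
	rop   ioOp
	wop   ioOp
}

// execIO issues the overlapped operation via submit (e.g. a WSARecv call)
// and parks until the poller signals completion. With IOCP a completion
// packet is queued even when the call completes synchronously, so waiting
// on the semaphore is correct in both cases.
func (fd *netFD) execIO(op *ioOp, submit func() error) (int, error) {
	if err := submit(); err != nil && err != syscall.ERROR_IO_PENDING {
		return 0, err
	}
	<-op.sema // parked; the poller sends here after dequeuing the completion
	if op.errno != 0 {
		return int(op.qty), op.errno
	}
	return int(op.qty), nil
}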
My reply is in http://play.golang.org/p/9bJ323Me7T. Codereview rejects my message because it is too long. Alex
On 2013/03/22 02:06:06, brainman wrote:
> My reply is in http://play.golang.org/p/9bJ323Me7T. Codereview rejects my
> message because it is too long.

When I first reported issue 5068, it was on amd64 windows 6g. But your results show that amd64 is not suffering from this issue. Maybe that was a patched result?
On 2013/03/22 02:25:38, Ewan Chou wrote:
> When I first reported issue 5068, it was on amd64 windows 6g. But your results
> show that amd64 is not suffering from this issue. Maybe that was a patched
> result?

What is your OS? How many cpus do you have? Can you do an investigation similar to mine on your computer and post the results here? Thank you.

Alex
On 2013/03/22 02:29:00, brainman wrote:
> What is your OS? How many cpus do you have? Can you do an investigation
> similar to mine on your computer and post the results here? Thank you.

I'm sorry, I don't know a graceful way to apply the patch and revert yet, so here is only the tip benchmark. It's windows 7 6g, and my CPU has 4 cores.

PASS
BenchmarkTCP4OneShot                     100   1890108 ns/op    6218 B/op   67 allocs/op
BenchmarkTCP4OneShot-2                  1000    193011 ns/op    6059 B/op   67 allocs/op
BenchmarkTCP4OneShot-3                  5000    181610 ns/op    6056 B/op   67 allocs/op
BenchmarkTCP4OneShotTimeout               50   2140122 ns/op    6730 B/op   75 allocs/op
BenchmarkTCP4OneShotTimeout-2            500    210012 ns/op    6729 B/op   75 allocs/op
BenchmarkTCP4OneShotTimeout-3           5000    206611 ns/op    6727 B/op   75 allocs/op
BenchmarkTCP4Persistent                  500    372021 ns/op    1659 B/op   15 allocs/op
BenchmarkTCP4Persistent-2              10000     19701 ns/op    1541 B/op   14 allocs/op
BenchmarkTCP4Persistent-3              20000     46102 ns/op    1539 B/op   14 allocs/op
BenchmarkTCP4PersistentTimeout           500    378021 ns/op    2364 B/op   23 allocs/op
BenchmarkTCP4PersistentTimeout-2        2000     58503 ns/op    2308 B/op   22 allocs/op
BenchmarkTCP4PersistentTimeout-3        2000     51002 ns/op    2325 B/op   23 allocs/op
BenchmarkTCP6OneShot                      50   2100120 ns/op    6169 B/op   67 allocs/op
BenchmarkTCP6OneShot-2                   500    232013 ns/op    6237 B/op   67 allocs/op
BenchmarkTCP6OneShot-3                  1000    214012 ns/op    6214 B/op   67 allocs/op
BenchmarkTCP6OneShotTimeout              100   1890108 ns/op    6862 B/op   75 allocs/op
BenchmarkTCP6OneShotTimeout-2           1000    219012 ns/op    6915 B/op   75 allocs/op
BenchmarkTCP6OneShotTimeout-3           1000    137007 ns/op    6925 B/op   75 allocs/op
BenchmarkTCP6Persistent                  500    318018 ns/op    1664 B/op   15 allocs/op
BenchmarkTCP6Persistent-2              10000     28501 ns/op    1542 B/op   14 allocs/op
BenchmarkTCP6Persistent-3               5000     47802 ns/op    1571 B/op   14 allocs/op
BenchmarkTCP6PersistentTimeout           500    402023 ns/op    2370 B/op   23 allocs/op
BenchmarkTCP6PersistentTimeout-2        5000     37002 ns/op    2273 B/op   22 allocs/op
BenchmarkTCP6PersistentTimeout-3        2000     52503 ns/op    2326 B/op   23 allocs/op
ok      net     7.974s
What version of go is this benchmark for? Use "hg id" to discover. Please supply benchmarks for 2 versions of go (just like I did): 16007 and 16362. You can bring your workspace files to the correct state by issuing "hg up -r 16007" or "hg up -r 16362". You will have to rebuild everything (make.bat) before running the benchmark.

Also, you didn't tell us what version of windows you have.

Alex
On 2013/03/22 03:25:25, brainman wrote:
> What version of go is this benchmark for? Use "hg id" to discover. Please
> supply benchmarks for 2 versions of go (just like I did): 16007 and 16362.
> [...]
> Also, you didn't tell us what version of windows you have.

The go version is f12b24ea373f+ tip, and the windows version is Windows 7 SP1 amd64. I don't expect my benchmark results would be much different from yours, except the one above.
On 2013/03/22 03:37:10, Ewan Chou wrote:
> The go version is f12b24ea373f+ tip, and the windows version is Windows 7 SP1
> amd64, ...

Well, you have version 16292 now (it is close enough to the 16362 that I tested).

# hg log -r f12b24ea373f
changeset:   16292:f12b24ea373f
user:        Mikio Hara <mikioh.mikioh@gmail.com>
date:        Sun Mar 17 19:50:01 2013 +0900
summary:     net: revert Zone in IPNet temporally

Can you re-run it for version 16007 now, so we can compare your two benchmarks? Thank you.

Alex
On 2013/03/22 03:48:14, brainman wrote:
> [...]

I can only supply a valid TCPOneShot benchmark, because sometimes I got the "An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full." error.

And are there any BAT or EXE versions of benchcmp?

16007:
BenchmarkTCPOneShot          2000    119006 ns/op
BenchmarkTCPOneShot-2        5000    108606 ns/op
BenchmarkTCPOneShot-3        5000     57803 ns/op

16292:
BenchmarkTCP4OneShot          100   1830105 ns/op
BenchmarkTCP4OneShot-2       1000    166009 ns/op
BenchmarkTCP4OneShot-3       5000     86404 ns/op
On 2013/03/22 04:52:50, Ewan Chou wrote:
> I can only supply a valid TCPOneShot benchmark, because sometimes I got the "An
> operation on a socket could not be performed because the system lacked
> sufficient buffer space or because a queue was full." error.

I had similar problems. I think the benchmark creates too many connections in 1s (the default benchmark time). So I used a shorter benchmark:

go test net -run=NONE -bench=TCP -benchmem -benchtime=100ms -cpu=1,2,3

Try a shorter benchtime until you see no errors.

> And are there any BAT or EXE versions of benchcmp?

No, but you should be able to rewrite the script into Go or something.

> 16007:
> BenchmarkTCPOneShot          2000    119006 ns/op
> [...]

This is your result:

benchmark                 old ns/op    new ns/op    delta
BenchmarkTCPOneShot          119006      1830105  +1437.83%
BenchmarkTCPOneShot-2        108606       166009    +52.85%
BenchmarkTCPOneShot-3         57803        86404    +49.48%

which shows a big slowdown. Are you sure you are using the versions as reported? Also, please make sure your benchmark runs end with no errors.

Thank you.

Alex
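For reference, the benchcmp shell script is easy enough to approximate in Go. A minimal sketch, assuming plain "go test -bench" output lines of the form "BenchmarkName  iterations  12345 ns/op" in the two input files; this is an illustration, not a drop-in replacement for misc/benchcmp:

package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strconv"
	"strings"
)

// readBench extracts "name -> ns/op" pairs from a benchmark output file.
func readBench(path string) map[string]float64 {
	f, err := os.Open(path)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	m := make(map[string]float64)
	s := bufio.NewScanner(f)
	for s.Scan() {
		fld := strings.Fields(s.Text())
		// keep only result lines: "BenchmarkName  iterations  12345 ns/op ..."
		if len(fld) >= 4 && strings.HasPrefix(fld[0], "Benchmark") && fld[3] == "ns/op" {
			if ns, err := strconv.ParseFloat(fld[2], 64); err == nil {
				m[fld[0]] = ns
			}
		}
	}
	return m
}

func main() {
	before, after := readBench(os.Args[1]), readBench(os.Args[2])
	fmt.Printf("%-34s %12s %12s %10s\n", "benchmark", "old ns/op", "new ns/op", "delta")
	// note: map iteration order is random; sort the keys for stable output
	for name, o := range before {
		if n, ok := after[name]; ok {
			fmt.Printf("%-34s %12.0f %12.0f %+9.2f%%\n", name, o, n, (n-o)/o*100)
		}
	}
}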
On 2013/03/22 06:04:12, brainman wrote:
> This is your result:
> [...]
> which shows a big slowdown. Are you sure you are using the versions as
> reported? Also, please make sure your benchmark runs end with no errors.

Yes, I'm sure.
If you see errors in the tcp benchmarks, try applying the following change:

--- a/src/pkg/net/tcp_test.go	Thu Mar 21 12:54:19 2013 +0400
+++ b/src/pkg/net/tcp_test.go	Fri Mar 22 10:08:15 2013 +0400
@@ -58,7 +58,7 @@
 func benchmarkTCP(b *testing.B, persistent, timeout bool, laddr string) {
 	const msgLen = 512
 	conns := b.N
-	numConcurrent := runtime.GOMAXPROCS(-1) * 16
+	numConcurrent := runtime.GOMAXPROCS(-1) * 4
 	msgs := 1
 	if persistent {
 		conns = numConcurrent

I had to apply it locally, because Linux also has some internal limits on tcp connections per second. When you hit some limit, the kernel throttles tcp activity unbelievably. Potentially this can solve the windows buffers bug.

On Fri, Mar 22, 2013 at 10:04 AM, <alex.brainman@gmail.com> wrote:
> [...]
>> 3. Do not do the 3-chan select (this also allocates memory and locks mutexes;
>> in particular the cancelc chan mutex is shared between read and write
>> operations).
> I do not understand the details here. Should I get rid of cancelc altogether,
> and just close resultc (given that it should move into netFD) instead? Maybe
> something else ...

Point 1 is that a 3-chan select is just a slow operation (relatively): it allocates memory, frees memory, sorts chans, locks and unlocks all the mutexes, etc. Point 2 is that AFAIK the HTTP/1.1 impl reads and writes from the tcp conn concurrently; this means that 2 goroutines execute a select involving the closec, and so they fight over the same closec internal mutex.
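For readers without fd_windows.go at hand, the wait being discussed has roughly this shape. This is an illustrative sketch, not the actual net package code; the channel names follow the discussion above:

package netpollsketch

import (
	"errors"
	"time"
)

var (
	errTimeout = errors.New("i/o timeout")
	errClosing = errors.New("use of closed network connection")
)

type ioResult struct {
	qty int
	err error
}

// waitIO models the per-operation wait in the old windows netFD: a
// three-way select over the IO result, the deadline timer, and the cancel
// channel. The select locks the internal mutex of every channel involved,
// and cancelc is shared between the reading and writing goroutines, which
// is the contention point described above.
func waitIO(resultc chan ioResult, deadline *time.Timer, cancelc chan struct{}) (int, error) {
	select {
	case r := <-resultc: // IO completed
		return r.qty, r.err
	case <-deadline.C: // deadline expired; the real code also cancels the IO
		return 0, errTimeout
	case <-cancelc: // fd is being closed
		return 0, errClosing
	}
}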
>> 5. Better integration with the scheduler via runtime.netpoll().
> What are the details, and why is it helpful?

The scheduler knows when to poll. The integrated version won't suffer from the blocking syscall problem, because the polling is done on an otherwise idle thread, so it does not need to wake up another thread to continue executing goroutines. Currently, if you issue a write, you need to block the current goroutine, switch to the poller, unblock the first goroutine, block the poller, and switch back to goroutine 1. With an integrated poller this happens much faster. An integrated poller also evenly distributes newly runnable goroutines across threads.
> On 386 (unlike amd64) all IO submission and cancellation happens on a single
> dedicated thread. But I cannot see why this path gets so much slower.

Aha! So most likely it's just due to the "single dedicated thread" requirement. Please patch the net package to do "single dedicated thread" on amd64 and benchmark it.
On Fri, Mar 22, 2013 at 10:21 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> Aha! So most likely it's just due to the "single dedicated thread" requirement.
> Please patch the net package to do "single dedicated thread" on amd64 and
> benchmark it.

Btw, is it actually relevant? Do people run high performance servers on old 386 boxes? It's good that we support them, but perhaps we do not need to care about performance much.
On Friday, 22 March 2013 17:21:26 UTC+11, Dmitry Vyukov wrote:
> Please patch the net package to do "single dedicated thread" on amd64 and
> benchmark it.

It gets slower, but marginally:

# ~/go/root/misc/benchcmp netstats/amd64/id16362.txt a
benchmark                           old ns/op    new ns/op    delta
BenchmarkTCP6OneShot                   285138       345681   +21.23%
BenchmarkTCP6OneShot-2                 318339       314433    -1.23%
BenchmarkTCP6OneShot-3                 293926       369117   +25.58%
BenchmarkTCP6OneShotTimeout            288067       378882   +31.53%
BenchmarkTCP6OneShotTimeout-2          291973       349587   +19.73%
BenchmarkTCP6OneShotTimeout-3          300762       380835   +26.62%
BenchmarkTCP6Persistent                 47262        66890   +41.53%
BenchmarkTCP6Persistent-2               50289        59371   +18.06%
BenchmarkTCP6Persistent-3               49020        74214   +51.40%
BenchmarkTCP6PersistentTimeout          53707        69819   +30.00%
BenchmarkTCP6PersistentTimeout-2        55660        69722   +25.26%
BenchmarkTCP6PersistentTimeout-3        55269        82026   +48.41%

benchmark                           old allocs   new allocs   delta
BenchmarkTCP6OneShot                        67           73    8.96%
BenchmarkTCP6OneShot-2                      67           73    8.96%
BenchmarkTCP6OneShot-3                      67           73    8.96%
BenchmarkTCP6OneShotTimeout                 75           81    8.00%
BenchmarkTCP6OneShotTimeout-2               75           81    8.00%
BenchmarkTCP6OneShotTimeout-3               75           81    8.00%
BenchmarkTCP6Persistent                     14           18   28.57%
BenchmarkTCP6Persistent-2                   14           18   28.57%
BenchmarkTCP6Persistent-3                   14           19   35.71%
BenchmarkTCP6PersistentTimeout              22           26   18.18%
BenchmarkTCP6PersistentTimeout-2            22           26   18.18%
BenchmarkTCP6PersistentTimeout-3            22           27   22.73%

benchmark                           old bytes    new bytes    delta
BenchmarkTCP6OneShot                     6222         7387   18.72%
BenchmarkTCP6OneShot-2                   6219         7395   18.91%
BenchmarkTCP6OneShot-3                   6217         7370   18.55%
BenchmarkTCP6OneShotTimeout              6927         8071   16.52%
BenchmarkTCP6OneShotTimeout-2            6921         8077   16.70%
BenchmarkTCP6OneShotTimeout-3            6972         8087   15.99%
BenchmarkTCP6Persistent                  1543         2337   51.46%
BenchmarkTCP6Persistent-2                1591         2326   46.20%
BenchmarkTCP6Persistent-3                1571         2381   51.56%
BenchmarkTCP6PersistentTimeout           2286         3058   33.77%
BenchmarkTCP6PersistentTimeout-2         2306         3043   31.96%
BenchmarkTCP6PersistentTimeout-3         2288         3089   35.01%
On Friday, 22 March 2013 17:22:38 UTC+11, Dmitry Vyukov wrote:
> Btw, is it actually relevant? Do people run high performance servers on
> old 386 boxes? It's good that we support them, but perhaps we do not need
> to care about performance much.

It is a large regression. At the very least, I would like to understand why it is happening.

Alex
On Friday, 22 March 2013 17:10:13 UTC+11, Dmitry Vyukov wrote:
> If you see errors in the tcp benchmarks, try applying the following change:
> [...]

Tried that; it is still too aggressive. I can still see a few failed connections. The only way to avoid them is to set a smaller -benchtime. Perhaps this benchmark is misleading. Perhaps windows will throttle activity like that down for one reason or the other. I don't know :-(

Alex
It is what against what?

For amd64/16362 you've posted very different numbers before:

16007 - 16362 - still very slow, especially for -cpu=1

# ~/go/root/misc/benchcmp id16007.txt id16362.txt
benchmark                          old ns/op    new ns/op    delta
BenchmarkTCPOneShot                   281255      2031289  +622.22%
BenchmarkTCPOneShot-2                 218754       281255   +28.57%
BenchmarkTCPOneShot-3                 218754       187503   -14.29%
BenchmarkTCPOneShotTimeout            281255      2187542  +677.78%
BenchmarkTCPOneShotTimeout-2          218754       281255   +28.57%
BenchmarkTCPOneShotTimeout-3          218754       218754    +0.00%
BenchmarkTCPPersistent                 87501       343756  +292.86%
BenchmarkTCPPersistent-2               46875        62501   +33.34%
BenchmarkTCPPersistent-3               43750        70313   +60.72%
BenchmarkTCPPersistentTimeout         101564       312506  +207.69%
BenchmarkTCPPersistentTimeout-2        50000        78126   +56.25%
BenchmarkTCPPersistentTimeout-3        50000        62501   +25.00%

On Fri, Mar 22, 2013 at 10:36 AM, brainman <alex.brainman@gmail.com> wrote:
> It gets slower, but marginally:
> [benchmark tables snipped]
Now I must admit that I am lost in all these numbers for 386/amd64, old tip/tip/patched tip, dedicated thread/no dedicated thread, GOMAXPROCS=1/>1... What is still slow, if we apply this patch?

On Fri, Mar 22, 2013 at 10:54 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> It is what against what?
> [...]
On Friday, 22 March 2013 17:54:43 UTC+11, Dmitry Vyukov wrote:
> It is what against what?
>
> For amd64/16362 you've posted very different numbers before:
>
> 16007 - 16362 - still very slow, especially for -cpu=1
> [...]

This is a diff between two clean versions (16007 is before the new scheduler and 16362 is the "current tip") on 386. This is the slowdown that I am concerned about. It is part of my earlier post http://play.golang.org/p/9bJ323Me7T.

Alex
On Friday, 22 March 2013 18:00:43 UTC+11, Dmitry Vyukov wrote:
> Now I must admit that I am lost in all these numbers for 386/amd64, old
> tip/tip/patched tip, dedicated thread/no dedicated thread, GOMAXPROCS=1/>1...
> What is still slow, if we apply this patch?

Do you mean your current CL? http://play.golang.org/p/9bJ323Me7T has some results:

...
# ~/go/root/misc/benchcmp id16007.txt id16362patched.txt
benchmark                          old ns/op    new ns/op    delta
BenchmarkTCPOneShot                   281255       281255    +0.00%
BenchmarkTCPOneShot-2                 218754       218754    +0.00%
BenchmarkTCPOneShot-3                 218754       203128    -7.14%
BenchmarkTCPOneShotTimeout            281255       281255    +0.00%
BenchmarkTCPOneShotTimeout-2          218754       218754    +0.00%
BenchmarkTCPOneShotTimeout-3          218754       218754    +0.00%
BenchmarkTCPPersistent                 87501        62501   -28.57%
BenchmarkTCPPersistent-2               46875        56251   +20.00%
BenchmarkTCPPersistent-3               43750        62501   +42.86%
BenchmarkTCPPersistentTimeout         101564        70313   -30.77%
BenchmarkTCPPersistentTimeout-2        50000        62501   +25.00%
BenchmarkTCPPersistentTimeout-3        50000        78126   +56.25%
...

I do not understand why your fix helps. Perhaps there is a better way to achieve that. That is my concern.

Alex
On Fri, Mar 22, 2013 at 10:36 AM, brainman <alex.brainman@gmail.com> wrote:
> It gets slower, but marginally:
>
> # ~/go/root/misc/benchcmp netstats/amd64/id16362.txt a
> benchmark                           old ns/op    new ns/op    delta
> BenchmarkTCP6OneShot                   285138       345681   +21.23%
> BenchmarkTCP6OneShot-2                 318339       314433    -1.23%
> BenchmarkTCP6OneShot-3                 293926       369117   +25.58%

So with this patch and "single dedicated thread", 386 is significantly *faster* than amd64, right? In your separate post you've posted the following numbers for 386:

# ~/go/root/misc/benchcmp id16007.txt id16362patched.txt
benchmark                old ns/op    new ns/op    delta
BenchmarkTCPOneShot         281255       281255    +0.00%
BenchmarkTCPOneShot-2       218754       218754    +0.00%
BenchmarkTCPOneShot-3       218754       203128    -7.14%
On Fri, Mar 22, 2013 at 1:23 PM, brainman <alex.brainman@gmail.com> wrote:
> I do not understand why your fix helps. Perhaps there is a better way to
> achieve that. That is my concern.

It helps because the new scheduler favors short/non-blocking syscalls (e.g. read/write), but penalizes blocking syscalls (e.g. GetQueuedCompletionStatus()). And it seems that the penalty on Windows is larger than on Linux. This change uses the old behaviour for GetQueuedCompletionStatus(), i.e. it does not penalize blocking syscalls too much. Actually it's even better, because it first tries a non-blocking GetQueuedCompletionStatus and only then a blocking one.
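Concretely, the two-phase wait described above looks roughly like this. A sketch assuming the syscall package's Windows wrappers; blockhint is a hypothetical stand-in for this CL's hint mechanism (runtime_blockingSyscallHint / g->blockingsyscall), whose real form lives in the runtime:

package netpollsketch

import "syscall"

// wait first polls the completion port with a zero timeout; only when
// nothing is ready does it hint the scheduler and issue the call that may
// block for an unbounded time.
func wait(iocp syscall.Handle, blockhint func()) (qty uint32, o *syscall.Overlapped, err error) {
	var key uint32
	// Non-blocking probe: fails with WAIT_TIMEOUT if no packet is queued.
	err = syscall.GetQueuedCompletionStatus(iocp, &qty, &key, &o, 0)
	if err == syscall.Errno(syscall.WAIT_TIMEOUT) {
		blockhint() // tell the scheduler the next syscall is expected to block
		err = syscall.GetQueuedCompletionStatus(iocp, &qty, &key, &o, syscall.INFINITE)
	}
	return qty, o, err
}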
On Friday, 22 March 2013 20:24:39 UTC+11, Dmitry Vyukov wrote:
> So with this patch and "single dedicated thread", 386 is significantly
> *faster* than amd64, right?

I wouldn't compare the 386 and amd64 results between themselves. These are two very different computers. But, yes, your patch helps on 386 (386 does "single dedicated thread" unconditionally - it cannot do anything else). Especially for the -cpu=1 case.

Alex
On 2013/03/22 09:29:09, dvyukov wrote:
> It helps because the new scheduler favors short/non-blocking syscalls
> (e.g. read/write), but penalizes blocking syscalls (e.g.
> GetQueuedCompletionStatus()). And it seems that the penalty on Windows is
> larger than on Linux.

How do you decide whether a particular syscall is short/non-blocking? None of the windows syscalls are marked as such in any way.

> This change uses the old behaviour for GetQueuedCompletionStatus(),
> i.e. it does not penalize blocking syscalls too much. Actually it's even
> better, because it first tries a non-blocking GetQueuedCompletionStatus and
> only then a blocking one.

Fair enough - it helps here. What happens in other places where some of our syscalls "block"? Do we get a similar regression there?

Also, we have now replaced every GetQueuedCompletionStatus call with two. It is quite a busy place here - every io completes here. Is there a price to pay?

Alex
On Fri, Mar 22, 2013 at 2:00 PM, <alex.brainman@gmail.com> wrote:
> How do you decide whether a particular syscall is short/non-blocking? None of
> the windows syscalls are marked as such in any way.

The scheduler initially assumes that every syscall is non-blocking, so it does nothing when a thread enters a syscall. Then a background thread sleeps for a while and checks the statuses of all threads. If it finds a thread that has been sitting in a syscall for more than X us, it assumes that the syscall is blocking and wakes up another worker thread to replace the blocked one.

> Fair enough - it helps here. What happens in other places where some of our
> syscalls "block"? Do we get a similar regression there?

It is possible.

> Also, we have now replaced every GetQueuedCompletionStatus call with two. It
> is quite a busy place here - every io completes here. Is there a price to pay?

I think it depends on the workload. If the non-blocking GetQueuedCompletionStatus usually discovers a new completion notification, then it's a win. If under a particular workload we execute the non-blocking GetQueuedCompletionStatus just to discover that there are no notifications, then it's a loss. You can benchmark a version with only the blocking GetQueuedCompletionStatus (but the net benchmarks are pretty synthetic, so it will be difficult to decide).
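As a toy illustration of that monitoring idea (the real logic is the runtime's background monitor, written in C at the time of this review; every name below is a simplified stand-in, not the runtime's API):

package netpollsketch

import (
	"sync/atomic"
	"time"
)

// syscallStart is 0 while the worker runs Go code, otherwise the UnixNano
// time at which it entered a syscall.
var syscallStart int64

func enterSyscall() { atomic.StoreInt64(&syscallStart, time.Now().UnixNano()) }
func exitSyscall()  { atomic.StoreInt64(&syscallStart, 0) }

// monitor periodically checks the worker; once a syscall has lasted longer
// than threshold, it assumes the call is blocking and invokes wake to start
// a replacement worker thread.
func monitor(threshold time.Duration, wake func()) {
	for {
		time.Sleep(threshold)
		if t := atomic.LoadInt64(&syscallStart); t != 0 &&
			time.Now().UnixNano()-t > int64(threshold) {
			wake()
		}
	}
}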
On 2013/03/22 10:12:53, dvyukov wrote:
> ..., so it will be difficult to decide.

I agree. :-) It raises more questions than it answers.

What about a different solution - we leave the GetQueuedCompletionStatus call as before, but call runtime_blockingSyscallHint right before it? GetQueuedCompletionStatus is about waiting, so let it block unconditionally.

If that works, perhaps we can provide new syscall.Syscall-like functions to give access to that functionality outside of runtime/syscall.

Do you think it is worth investigating? I am just talking ... here :-)

Alex
On Fri, Mar 22, 2013 at 2:44 PM, <alex.brainman@gmail.com> wrote:
> What about a different solution - we leave the GetQueuedCompletionStatus
> call as before, but call runtime_blockingSyscallHint right before it?
> GetQueuedCompletionStatus is about waiting, so let it block unconditionally.

Yes, that's what I meant. It should work.

> If that works, perhaps we can provide new syscall.Syscall-like functions to
> give access to that functionality outside of runtime/syscall.
>
> Do you think it is worth investigating? I am just talking ... here :-)

I had such a thing on Linux at one point (locally). So it's doable. But the general sentiment about adding additional knobs here is... not completely positive. And that's understandable, because by adding knobs we just transfer our problems onto users.

Actually, windows has a super cool feature called UMS (since Vista, AFAIR). It allows one to actually detect thread blocking (w/o guessing). But that would require considerable windows-specific parts in the scheduler, and UMS is not the most trivial API to program. This is not for Go 1.1 in any way, shape or form. But it would allow solving all the syscall problems in the long term.
On 2013/03/22 10:53:36, dvyukov wrote:
> Yes, that's what I meant. It should work.

I will try it next week.

> I had such a thing on Linux at one point (locally). So it's doable. But the
> general sentiment about adding additional knobs here is... not completely
> positive. And that's understandable, because by adding knobs we just transfer
> our problems onto users.

There is a large regression here. Perhaps knobs are unavoidable.

It LGTM, if you would like to submit now. But I would wait for Russ. What do I know :-)

Thank you for your patience.

Alex
Russ,

Can you take a look at this CL?

A brief summary of the discussion: Windows performance has degraded with the new scheduler because of the blocking GetQueuedCompletionStatus() call, especially for GOMAXPROCS=1. This CL brings the performance back to a level comparable with the old sched (somewhere worse, somewhere better, but no pathological degradation). A proper fix will be to integrate the Windows network poller with the runtime, but this is a lot of work.

Alex said LGTM.
LGTM

https://codereview.appspot.com/7612045/diff/4004/src/pkg/runtime/cgocall.c
File src/pkg/runtime/cgocall.c (right):

https://codereview.appspot.com/7612045/diff/4004/src/pkg/runtime/cgocall.c#ne...
src/pkg/runtime/cgocall.c:157: if(Windows && g->blockingsyscall) {
Is it terribly expensive to drop the Windows &&? If not, please drop it.
Ewan, does this CL help in your case?

Alex
On 2013/03/22 19:42:26, brainman wrote:
> Ewan, does this CL help in your case?

Yes, I applied this patch, and it fixed the issue.
On 2013/03/22 14:26:29, rsc wrote:
> LGTM
>
> src/pkg/runtime/cgocall.c:157: if(Windows && g->blockingsyscall) {
> Is it terribly expensive to drop the Windows &&?
> If not, please drop it.

Done
*** Submitted as https://code.google.com/p/go/source/detail?r=2be8c885acc8 ***

net: band-aid for windows network poller

Fixes performance of the current windows network poller
with the new scheduler.
Gives runtime a hint when GetQueuedCompletionStatus() will block.
Fixes issue 5068.

benchmark                    old ns/op    new ns/op    delta
BenchmarkTCP4Persistent        4004000        33906   -99.15%
BenchmarkTCP4Persistent-2        21790        17513   -19.63%
BenchmarkTCP4Persistent-4        44760        34270   -23.44%
BenchmarkTCP4Persistent-6        45280        43000    -5.04%

R=golang-dev, alex.brainman, coocood, rsc
CC=golang-dev
https://codereview.appspot.com/7612045
On 2013/03/21 18:17:53, dvyukov wrote:
> ...
> 5. Better integration with the scheduler via runtime.netpoll().

I tried to do that, and I need your advice. I am not sure what to do when net.pollDesc's WaitRead or WaitWrite returns errClosing or errTimeout. The problem is that, unlike the unix implementation, windows io is in flight at the time (the os is in the process of reading/writing the Go buffers), so I cannot just bail out of the current function. What I do in our current implementation is call windows CancelIo or CancelIoEx and then go back into waiting forever for my io to finish (it will complete successfully or with ERROR_OPERATION_ABORTED, depending on whether the io finished before my CancelIo call). I don't believe your current runtime netpoll design allows for this situation. What do you suggest I should do?

Thank you.

Alex
On Thu, Apr 11, 2013 at 7:04 PM, <alex.brainman@gmail.com> wrote:
> I am not sure what to do when net.pollDesc's WaitRead or WaitWrite returns
> errClosing or errTimeout. [...] What do you suggest I should do?

I think you need to do roughly the same as now. If WaitRead/Write returns errTimeout, it just means that a timer fired inside the runtime. Call CancelIo() and wait for operation completion. If the operation returns success, return success. If it is cancelled, return errTimeout. And similarly for errClosing.
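In code, that rule reads roughly as follows, reusing the hypothetical ioOp/netFD model from the sketch earlier in this thread; CancelIo is the real syscall wrapper, everything else is illustrative:

package netpollsketch

import "syscall"

// reapAfterWait applies the cancel-and-reap rule: waitErr is the
// errTimeout/errClosing returned by the poller's wait. The kernel may still
// own the buffer, so cancel the IO and wait for the completion notification
// before returning to the caller.
func (fd *netFD) reapAfterWait(op *ioOp, waitErr error) (int, error) {
	if waitErr == nil {
		return int(op.qty), nil // completed normally
	}
	syscall.CancelIo(fd.sysfd) // CancelIoEx where available
	<-op.sema                  // reap the (possibly aborted) completion
	if op.errno == 0 {
		return int(op.qty), nil // the IO finished before the cancel landed
	}
	return 0, waitErr // report timeout/close, not ERROR_OPERATION_ABORTED
}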
On 2013/04/12 02:28:50, dvyukov wrote:
> If WaitRead/Write returns errTimeout, it just means that a timer fired inside
> the runtime. Call CancelIo() and wait for operation completion. If the
> operation returns success, return success. If it is cancelled, return
> errTimeout. And similarly for errClosing.

That sounds good, but I am worried about the netpoll internal state. For example, when I call Evict from netFD.Close, netpoll marks this connection as "closing". But I need to call ReadWait again to wait for io completion after CancelIo. How do I handle that? Also, should netFD.Close() return immediately after calling Evict? The "wakeup / cancelio / wait-for-io-to-complete" should be quick, but it is up to the OS to decide.

Alex
On 2013/04/12 02:51:49, brainman wrote:
> That sounds good, but I am worried about the netpoll internal state.

I think for Windows you need a different impl of netpoll.goc.

> For example, when I call Evict from netFD.Close, netpoll marks this connection
> as "closing". But I need to call ReadWait again to wait for io completion
> after CancelIo. How do I handle that?

I would call CancelIoEx() during Close() or in the deadline timer func, so that Read()/Write() just wait once for completion; if Read()/Write() gets ERROR_OPERATION_ABORTED, then it needs to look at some variables to understand whether it's errClosing or errTimeout.

> Also, should netFD.Close() return immediately after calling Evict? The
> "wakeup / cancelio / wait-for-io-to-complete" should be quick, but it is up
> to the OS to decide.

I think that's fine if Close() returns immediately. The last outstanding Read/Write will close the descriptor when completed.

Here is a very dirty sketch of what I would try. This includes both the net and runtime parts. Part of the complexity is added by the fact that if an operation completes synchronously, I wait for the notification during the *next* operation (when the notification has most likely already happened). Probably CancelIoEx() can be moved out of the mutex. The state is quite complex, but it's always mutated under the mutex, so it's easy to ensure that it transfers from one legal state to another. This code contains bugs, but I hope you can get the idea. Then you just need to carefully handle all the state transitions.

type netFD struct {
	...
	ro, wo OVERLAPPED                 // to not allocate each time
	rpending, wpending bool           // ro/wo is pending (another thread can cancel ro/wo)
	rprevpending, wprevpending bool   // the previous op completed synchronously
	rcompleted, wcompleted bool       // overlapped operation has completed
	rg, wg *G                         // waiting goroutine
	rdeadline, wdeadline int64
	rerr, werr error                  // read/write result (valid when rcompleted/wcompleted == true)
	rn, wn int                        // read/write result (valid when rcompleted/wcompleted == true)
	closing bool
	Lock
}

func (fd *netFD) Read(buf []byte) (int, error) {
	// the same as now
	if err := fd.incref(false); err != nil {
		return 0, err
	}
	defer fd.decref()
	fd.rio.Lock()
	defer fd.rio.Unlock()
	// runtime part starts here
	// check that the socket has not yet timed out or been closed
	runtime.lock(fd)
	if(fd.closing) {
		runtime.unlock(fd)
		return errClosing
	}
	if(fd.rdeadline < 0) {
		runtime.unlock(fd)
		return errTimeout
	}
	if(fd.rprevpending && !fd.rcompleted) {
		fd.rg = g // wait for the *previous* read to complete
		runtime.park(runtime.unlock, fd, "IO wait")
		runtime.lock(fd)
	}
	fd.rcompleted = false
	fd.rprevpending = false
	runtime.unlock(fd)
	n, err := WSARecv(fd.sysfd, buf, ..., &fd.ro)
	if(err != nil && err != ERROR_IO_PENDING)
		return 0, err
	if(err == nil) {
		// operation completed synchronously
		runtime.lock(fd)
		if(fd.closing && !fd.rcompleted) {
			fd.rg = g // have to wait for the completion notification
			runtime.park(runtime.unlock, fd, "IO wait")
		}
		// let the next reader wait for the notification
		fd.rprevpending = true
		runtime.unlock(fd)
		return n, err
	}
	// err == ERROR_IO_PENDING
	runtime.lock(fd)
	if(fd.closing || fd.rdeadline < 0) {
		fd.rerr = fd.closing ? errClosing : errTimeout
		CancelIoEx(fd.sysfd, &fd.ro)
	}
	if(!fd.rcompleted) {
		fd.rpending = true
		fd.rg = g
		runtime.park(runtime.unlock, fd, "IO wait")
	} else
		runtime.unlock(fd)
	return fd.rn, fd.rerr
}

G *runtime.netpoll(...) {
	G *glist = nil;
	syscall.GetQueuedCompletionStatus(...)
	foreach returned fd, n, err, isread {
		runtime.lock(fd)
		if(isread) {
			fd.rn = n
			if(err != ERROR_OPERATION_ABORTED)
				fd.rerr = err // whoever cancels the io sets fd.rerr
			fd.rcompleted = true
			fd.rpending = false
			if(fd.rg != nil) {
				fd.rg->link = glist
				glist = fd.rg
			}
		} else {
			// the same for write
		}
		runtime.unlock(fd)
	}
	return glist
}

// the timer is set in setDeadline()
void rdeadline(...) { // deadline timer handler
	runtime.lock(fd)
	fd.rdeadline = -1
	if(fd.rpending) {
		fd.rerr = fd.closing ? errClosing : errTimeout
		CancelIoEx(fd.sysfd, &fd.ro)
	}
	runtime.unlock(fd)
}

void evict(...) {
	runtime.lock(fd)
	fd.closing = true
	fd.rerr = errClosing
	// cancel timers
	if(fd.rpending)
		CancelIoEx(fd.sysfd, &fd.ro)
	if(fd.wpending)
		CancelIoEx(fd.sysfd, &fd.wo)
	if(fd.rprevpending) {
		// have to wait here
		if(!fd.rcompleted) {
			fd.rg = g
			runtime.park(runtime.unlock, fd, "IO wait")
		}
		fd.rcompleted = false
		fd.rprevpending = false
	}
	if(fd.wprevpending) {
		// the same for writes
	}
	runtime.unlock(fd)
}
On 2013/04/15 05:18:39, dvyukov wrote:
> ...
> I think for Windows you need a different impl of netpoll.goc.
> ...

This looks very complicated to me :-( Also, CancelIoEx is present on some systems, but not on others. If it is not found, we use CancelIo instead. But CancelIo cancels all io issued by the calling thread, so we use a special goroutine (locked to an os thread) to do all IO submissions and cancellations. I am not sure how easy it is to implement all this in the runtime. Also, all runtime syscall addresses are resolved by the windows executable loader, so we cannot use api functions that we know might not be present on some systems.

I just tried to do all the cancellation logic inside of fd_windows.go instead. I have added a new function similar to runtime_pollWait, but one that does not call checkerr on entry or exit. Then I call CancelIoEx (when runtime_pollWait returns with errClosing or errTimeout) and go wait on that function. I discard any results and return the previously reported errClosing or errTimeout.

All tests pass now. But I don't see any benchmark improvements :-(. I haven't tested my changes on amd64 yet, so perhaps I will see some there. Still, I would expect some improvement, because I am no longer fiddling with timers. It is hard to benchmark these, because you cannot run them for very long (the os runs out of sockets).

I will keep at it, and report back. Thank you for your help.

Alex
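As an aside to the CancelIoEx-availability problem above: the probe itself is easy to do from Go code in the net package (the preloaded-syscall constraint only binds the runtime). A minimal sketch, with invented names, of how such a probe could look:

package main

import (
	"fmt"
	"syscall"
)

// canCancelIO would report whether the running Windows exports
// CancelIoEx (Vista and later). If it doesn't, the fallback is
// CancelIo plus a dedicated IO goroutine locked to its os thread,
// since CancelIo cancels all io issued by the calling thread.
var canCancelIO bool

func probeCancelIoEx() {
	k32, err := syscall.LoadLibrary("kernel32.dll")
	if err != nil {
		return // assume no CancelIoEx
	}
	defer syscall.FreeLibrary(k32)
	if proc, err := syscall.GetProcAddress(k32, "CancelIoEx"); err == nil && proc != 0 {
		canCancelIO = true
	}
}

func main() {
	probeCancelIoEx()
	fmt.Println("CancelIoEx available:", canCancelIO)
}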
Here are my benchmark results on windows/amd64:

# ~/go/root/misc/benchcmp old.txt new.txt
benchmark                          old ns/op    new ns/op    delta
BenchmarkTCP4OneShot                  314433       292950   -6.83%
BenchmarkTCP4OneShot-2                343728       347634   +1.14%
BenchmarkTCP4OneShot-3                339822       308574   -9.20%
BenchmarkTCP4OneShotTimeout           343728       338845   -1.42%
BenchmarkTCP4OneShotTimeout-2         345681       333963   -3.39%
BenchmarkTCP4OneShotTimeout-3         353493       341775   -3.31%
BenchmarkTCP4Persistent                41013        33591  -18.10%
BenchmarkTCP4Persistent-2              38669        35544   -8.08%
BenchmarkTCP4Persistent-3              40231        37692   -6.31%
BenchmarkTCP4PersistentTimeout         44528        33396  -25.00%
BenchmarkTCP4PersistentTimeout-2       43942        35349  -19.56%
BenchmarkTCP4PersistentTimeout-3       48239        36325  -24.70%

# cat old.txt
PASS
BenchmarkTCP4OneShot                 500       314433 ns/op
BenchmarkTCP4OneShot-2               500       343728 ns/op
BenchmarkTCP4OneShot-3               500       339822 ns/op
BenchmarkTCP4OneShotTimeout          500       343728 ns/op
BenchmarkTCP4OneShotTimeout-2        500       345681 ns/op
BenchmarkTCP4OneShotTimeout-3        500       353493 ns/op
BenchmarkTCP4Persistent             5000        41013 ns/op
BenchmarkTCP4Persistent-2           5000        38669 ns/op
BenchmarkTCP4Persistent-3           5000        40231 ns/op
BenchmarkTCP4PersistentTimeout      5000        44528 ns/op
BenchmarkTCP4PersistentTimeout-2    5000        43942 ns/op
BenchmarkTCP4PersistentTimeout-3    5000        48239 ns/op

# cat new.txt
PASS
BenchmarkTCP4OneShot                 500       292950 ns/op
BenchmarkTCP4OneShot-2              1000       347634 ns/op
BenchmarkTCP4OneShot-3               500       308574 ns/op
BenchmarkTCP4OneShotTimeout         1000       338845 ns/op
BenchmarkTCP4OneShotTimeout-2        500       333963 ns/op
BenchmarkTCP4OneShotTimeout-3        500       341775 ns/op
BenchmarkTCP4Persistent             5000        33591 ns/op
BenchmarkTCP4Persistent-2           5000        35544 ns/op
BenchmarkTCP4Persistent-3           5000        37692 ns/op
BenchmarkTCP4PersistentTimeout      5000        33396 ns/op
BenchmarkTCP4PersistentTimeout-2    5000        35349 ns/op
BenchmarkTCP4PersistentTimeout-3    5000        36325 ns/op
#

These are with just the windows runtime netpoll implemented. Speedups are visible here, especially where timers are involved. I still haven't removed any memory allocations during net IO. I reckon I should proceed with this; the code changes so far aren't substantial. I will clean up my code and post it for review.

Alex
On Mon, Apr 15, 2013 at 12:36 AM, <alex.brainman@gmail.com> wrote:
> On 2013/04/15 05:18:39, dvyukov wrote:
>> ...
>> I think for Windows you need a different impl of netpoll.goc.
>> ...
>
> This looks very complicated to me :-( Also, CancelIoEx is present on
> some systems, but not on others. If it is not found, we use CancelIo
> instead. But CancelIo cancels all io issued by the calling thread, so
> we use a special goroutine (locked to an os thread) to do all IO
> submissions and cancellations. I am not sure how easy it is to
> implement all this in the runtime. Also, all runtime syscall addresses
> are resolved by the windows executable loader, so we cannot use api
> functions that we know might not be present on some systems.

Do we care much about the performance of systems that do not have CancelIoEx? It has been present for about 6 years. I would leave the non-CancelIoEx code as is, and implement the integrated poller only for systems supporting CancelIoEx/GetQueuedCompletionStatusEx. It's possible to call CancelIoEx only from Go code in the net package, and to call back from the runtime into net to invoke it. But it's not that easy to do the same for GetQueuedCompletionStatusEx, because netpoll() cannot execute Go code (cannot allocate memory, switch stacks and so on).

> I just tried to do all the cancellation logic inside of fd_windows.go
> instead. I have added a new function similar to runtime_pollWait, but
> one that does not call checkerr on entry or exit. Then I call
> CancelIoEx (when runtime_pollWait returns with errClosing or
> errTimeout) and go wait on that function. I discard any results and
> return the previously reported errClosing or errTimeout.
>
> All tests pass now. But I don't see any benchmark improvements :-(. I
> haven't tested my changes on amd64 yet, so perhaps I will see some
> there. Still, I would expect some improvement, because I am no longer
> fiddling with timers. It is hard to benchmark these, because you cannot
> run them for very long (the os runs out of sockets).
>
> I will keep at it, and report back. Thank you for your help.
>
> Alex
>
> https://codereview.appspot.com/7612045/
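For reference, GetQueuedCompletionStatusEx is likewise absent from the syscall package, but Go code outside the runtime could reach it dynamically, the same way as CancelIoEx. A hedged sketch follows - the wrapper name is invented, and the overlappedEntry layout is transcribed from the Win32 OVERLAPPED_ENTRY documentation:

package main

import (
	"fmt"
	"syscall"
	"unsafe"
)

// overlappedEntry mirrors Win32's OVERLAPPED_ENTRY, which lets
// GetQueuedCompletionStatusEx dequeue many completions in one call.
type overlappedEntry struct {
	CompletionKey uintptr
	Overlapped    *syscall.Overlapped
	Internal      uintptr
	NumberOfBytes uint32
}

var (
	modkernel32 = syscall.NewLazyDLL("kernel32.dll")
	procGQCSEx  = modkernel32.NewProc("GetQueuedCompletionStatusEx")
)

// getQueuedCompletionStatusEx dequeues up to len(entries) completed
// operations from the completion port in a single system call.
func getQueuedCompletionStatusEx(iocp syscall.Handle, entries []overlappedEntry, timeoutMs uint32) (int, error) {
	var n uint32
	r, _, e := procGQCSEx.Call(
		uintptr(iocp),
		uintptr(unsafe.Pointer(&entries[0])),
		uintptr(len(entries)),
		uintptr(unsafe.Pointer(&n)),
		uintptr(timeoutMs),
		0) // fAlertable = FALSE
	if r == 0 {
		return 0, e
	}
	return int(n), nil
}

func main() {
	if err := procGQCSEx.Find(); err != nil {
		fmt.Println("GetQueuedCompletionStatusEx not available:", err)
		return
	}
	fmt.Println("GetQueuedCompletionStatusEx available")
}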
On 2013/04/17 03:59:49, dvyukov wrote:
> ...

How about https://codereview.appspot.com/8670044/ ?

I will split it into simpler CLs if it looks OK to you.

Alex
On Thu, Apr 18, 2013 at 12:40 AM, <alex.brainman@gmail.com> wrote:
> On 2013/04/17 03:59:49, dvyukov wrote:
>> ...
>
> How about https://codereview.appspot.com/8670044/ ?
>
> I will split it into simpler CLs if it looks OK to you.

I will look at it later, probably in a week or two. This is not for Go1.1, right? So we have plenty of time.
On 2013/04/20 03:36:22, dvyukov wrote:
> I will look at it later, probably in a week or two. This is not for Go1.1,
> right? So we have plenty of time.

Sure. No hurry. This is for after Go1.1. Thank you.

Alex
On 2013/04/20 03:36:22, dvyukov wrote:
> ...
> I will look at it later, probably in a week or two. This is not for Go1.1,
> right? So we have plenty of time.

Just a reminder: Mikio wants to move it along http://goo.gl/OuoNT. I am not asking for a thorough review - I just want to make sure there are no show stoppers. As I said earlier, I am planning to split this change into smaller CLs, so it is easier to review.

Thank you.

Alex