|
|
Descriptionruntime: faster entersyscall, exitsyscall
Uses atomic memory accesses to avoid the need to acquire
and release schedlock on fast paths.
benchmark old ns/op new ns/op delta
runtime_test.BenchmarkSyscall 73 31 -56.63%
runtime_test.BenchmarkSyscall-2 538 74 -86.23%
runtime_test.BenchmarkSyscall-3 508 103 -79.72%
runtime_test.BenchmarkSyscall-4 721 97 -86.52%
runtime_test.BenchmarkSyscallWork 920 873 -5.11%
runtime_test.BenchmarkSyscallWork-2 516 481 -6.78%
runtime_test.BenchmarkSyscallWork-3 550 343 -37.64%
runtime_test.BenchmarkSyscallWork-4 632 263 -58.39%
(Intel Core i7 L640 2.13 GHz-based Lenovo X201s)
Reduced a less artificial server benchmark
from 11.5r 12.0u 8.0s to 8.3r 9.1u 1.0s.
Patch Set 1 #Patch Set 2 : diff -r 2131931a8b68 https://go.googlecode.com/hg/ #Patch Set 3 : diff -r 2131931a8b68 https://go.googlecode.com/hg/ #Patch Set 4 : diff -r 353a5cdb24e0 https://go.googlecode.com/hg/ #
Total comments: 16
Patch Set 5 : diff -r 9b990418626f https://go.googlecode.com/hg #Patch Set 6 : diff -r 9b990418626f https://go.googlecode.com/hg #Patch Set 7 : diff -r 9b990418626f https://go.googlecode.com/hg #
Total comments: 2
Patch Set 8 : diff -r 00f0d6062b5d https://go.googlecode.com/hg #Patch Set 9 : diff -r d787393025dc https://go.googlecode.com/hg #
Total comments: 9
Patch Set 10 : diff -r dc95894c0148 https://go.googlecode.com/hg #
MessagesTotal messages: 23
Hello dvyukov, r (cc: golang-dev@googlegroups.com), I'd like you to review this change to https://go.googlecode.com/hg/
Sign in to reply to this message.
nice speedups. not sure i get every nuance but mostly it seems a straight translation of the ops into atomic ops. if any algorithmic changes happened, please say so in the CL. http://codereview.appspot.com/4723042/diff/6001/src/pkg/runtime/proc.c File src/pkg/runtime/proc.c (right): http://codereview.appspot.com/4723042/diff/6001/src/pkg/runtime/proc.c#newcode71 src/pkg/runtime/proc.c:71: // int32 mcpu; // number of ms executing on cpu ms reads like "milliseconds". maybe M's. (the apostrophe is ok when pluralizing initialisms, for clarity) http://codereview.appspot.com/4723042/diff/6001/src/pkg/runtime/proc.c#newcode73 src/pkg/runtime/proc.c:73: // int32 msyscall; // number of ms in system calls i found this mysterious even given the "(see below)". who about moving the definitions and sizes together? the info in lines 82 through 89 should be here, or else 71 through 73 should go down there. http://codereview.appspot.com/4723042/diff/6001/src/pkg/runtime/proc.c#newcod... src/pkg/runtime/proc.c:161: for(;;) { this is a do-while, but i'm perhaps the only person in the universe who likes do-while loops http://codereview.appspot.com/4723042/diff/6001/src/pkg/runtime/proc.c#newcod... src/pkg/runtime/proc.c:342: // gwait++ hard to understand. use words. // Atomically implement gwait++: the problem is that it's not clear, at least not with reading the whole file, that this is explanatory as opposed to disabled code http://codereview.appspot.com/4723042/diff/6001/src/pkg/runtime/proc.c#newcod... src/pkg/runtime/proc.c:366: // gwait-- ditto http://codereview.appspot.com/4723042/diff/6001/src/pkg/runtime/proc.c#newcod... src/pkg/runtime/proc.c:538: // waitstop-- (known to be 1) again http://codereview.appspot.com/4723042/diff/6001/src/pkg/runtime/proc.c#newcod... src/pkg/runtime/proc.c:552: // for the mark and sweep collector. d? http://codereview.appspot.com/4723042/diff/6001/src/pkg/runtime/proc.c#newcod... src/pkg/runtime/proc.c:561: // mcpumax = 1; d http://codereview.appspot.com/4723042/diff/6001/src/pkg/runtime/proc.c#newcod... src/pkg/runtime/proc.c:819: // msyscall++ again http://codereview.appspot.com/4723042/diff/6001/src/pkg/runtime/proc.c#newcod... src/pkg/runtime/proc.c:1443: // mcpumax = n again, or even delete. this isn't informative.
Sign in to reply to this message.
http://codereview.appspot.com/4723042/diff/6001/src/pkg/runtime/proc.c File src/pkg/runtime/proc.c (right): http://codereview.appspot.com/4723042/diff/6001/src/pkg/runtime/proc.c#newcod... src/pkg/runtime/proc.c:106: // fields can be done (holding schedlock) without fear of confilcts. conflicts http://codereview.appspot.com/4723042/diff/6001/src/pkg/runtime/proc.c#newcod... src/pkg/runtime/proc.c:695: // mcpu-- probably another ditto
Sign in to reply to this message.
On 15/07/2011, at 5:29 PM, bradfitz@golang.org wrote: > > http://codereview.appspot.com/4723042/diff/6001/src/pkg/runtime/proc.c > File src/pkg/runtime/proc.c (right): > > http://codereview.appspot.com/4723042/diff/6001/src/pkg/runtime/proc.c#newcod... > src/pkg/runtime/proc.c:106: // fields can be done (holding schedlock) > without fear of confilcts. > conflicts that's what happens when your fingers aren't mutexed. > > http://codereview.appspot.com/4723042/diff/6001/src/pkg/runtime/proc.c#newcod... > src/pkg/runtime/proc.c:695: // mcpu-- > probably another ditto > > http://codereview.appspot.com/4723042/
Sign in to reply to this message.
On Fri, Jul 15, 2011 at 02:29, <r@golang.org> wrote: > nice speedups. not sure i get every nuance but mostly it seems a > straight translation of the ops into atomic ops. if any algorithmic > changes happened, please say so in the CL. no algorithmic changes happened. the only hard part is making sure that entersyscall and exitsyscall being able to effectively preempt anything holding schedlock by jumping in and fiddling with .atomic doesn't break the rest of the code. i went through it all by hand last night and i think it works (and added comments why), and it's been pounded on hard in a real program. i may still write a spin model in a different CL.
Sign in to reply to this message.
http://codereview.appspot.com/4723042/diff/6001/src/pkg/runtime/proc.c File src/pkg/runtime/proc.c (right): http://codereview.appspot.com/4723042/diff/6001/src/pkg/runtime/proc.c#newcode70 src/pkg/runtime/proc.c:70: volatile uint32 atomic; // atomic scheduling word (see below) I would suggest renaming it to something like atomicstate to emphasize it's meaning rather than the way it's manipulated. http://codereview.appspot.com/4723042/diff/6001/src/pkg/runtime/proc.c#newcod... src/pkg/runtime/proc.c:132: #define atomic_gwait(v) (((v)>>gwaitShift)&1) Similarly, atomicstate_mcpu. atomic_xxx() looks like an atomic operation for me. http://codereview.appspot.com/4723042/diff/6001/src/pkg/runtime/proc.c#newcod... src/pkg/runtime/proc.c:807: w += (-1<<mcpuShift) + (1<<msyscallShift); msyscall can overflow here. The following test reports deadlock: func TestSyscallSwarm(t *testing.T) { const N = 1023 var n runtime.Note runtime.Noteclear(&n) for i := 0; i < N; i++ { go func() { runtime.Entersyscall() runtime.Notesleep(&n) runtime.Exitsyscall() }() } go func() {}() time.Sleep(1e9) runtime.Notewakeup(&n) } type Note struct { opaque [16]uint64 } func noteclear(n *Note) func notesleep(n *Note) func notewakeup(n *Note) var Noteclear = noteclear var Notesleep = notesleep var Notewakeup = notewakeup if you change 1023 to 1024 then the state will be silently messed. The problem one does not want artificially limit number of threads, because then it leads to so called system induced deadlocks (think 1024 threads are blocked on lockf() and a thread that must unblock all them is not scheduled because of the limit). A possible solution is to switch to uint64 for atomicstate (we need to check as to whether all relevant processors support CMPXCHG8B). http://codereview.appspot.com/4723042/diff/6001/src/pkg/runtime/proc.c#newcod... src/pkg/runtime/proc.c:863: if(atomic_mcpu(v) < atomic_mcpumax(v) && runtime·cas(&runtime·sched.atomic, v, w)) { I guess you meant: if(atomic_mcpu(v) >= atomic_mcpumax(v)) break; if(runtime·cas(&runtime·sched.atomic, v, w)) Otherwise it leads to bad active spinning and potentially deadlocks.
Sign in to reply to this message.
> http://codereview.appspot.com/4723042/diff/6001/src/pkg/runtime/proc.c#newcode70 > src/pkg/runtime/proc.c:70: volatile uint32 atomic; // atomic scheduling > word (see below) > I would suggest renaming it to something like atomicstate to emphasize > it's meaning rather than the way it's manipulated. I'm already unhappy with how wordy the code is. I'd rather just name the field 'a'. > http://codereview.appspot.com/4723042/diff/6001/src/pkg/runtime/proc.c#newcod... > src/pkg/runtime/proc.c:132: #define > atomic_gwait(v) (((v)>>gwaitShift)&1) > Similarly, atomicstate_mcpu. atomic_xxx() looks like an atomic operation > for me. I agree that atomic_gwait is a bad name. I've already renamed it to atomic_gwaiting to make it clear that it's not the gwait variable from the sched struct. I don't think atomic_mcpumax looks like an atomic operation. What does it mean to mcpumax something? > http://codereview.appspot.com/4723042/diff/6001/src/pkg/runtime/proc.c#newcod... > src/pkg/runtime/proc.c:807: w += (-1<<mcpuShift) + (1<<msyscallShift); > msyscall can overflow here. The following test reports deadlock: Ouch. I will find another way to do that test, without the msyscall variable. I don't think restricting users to 1024 active threads (not blocked in syscalls) is such a problem. Once msyscall is gone the limit can move up to 16k or so anyway. > http://codereview.appspot.com/4723042/diff/6001/src/pkg/runtime/proc.c#newcod... > src/pkg/runtime/proc.c:863: if(atomic_mcpu(v) < atomic_mcpumax(v) && > runtime·cas(&runtime·sched.atomic, v, w)) { > I guess you meant: > > if(atomic_mcpu(v) >= atomic_mcpumax(v)) > break; > if(runtime·cas(&runtime·sched.atomic, v, w)) Yes. I found this over the weekend too. I think it snuck in during some cleanup for uploading the CL. Russ
Sign in to reply to this message.
Hello dvyukov@google.com, r@golang.org, bradfitz@golang.org, r@google.com (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
This version addresses the comments made except for the msyscall range problem. I sent a separate CL to remove msyscall. Once that is submitted I will update this CL. This version also adds a Promela model for verifying that the algorithm is free of spurious sleeps and missed wakeups. The model did find one missed wakeup, which I fixed (with the cas at line 572). Russ
Sign in to reply to this message.
http://codereview.appspot.com/4723042/diff/11006/src/pkg/runtime/proc.c File src/pkg/runtime/proc.c (left): http://codereview.appspot.com/4723042/diff/11006/src/pkg/runtime/proc.c#oldco... src/pkg/runtime/proc.c:624: // entersyscall is going to return immediately after. Is it intended remove of "entersyscall" from the comment?
Sign in to reply to this message.
http://codereview.appspot.com/4723042/diff/11006/src/pkg/runtime/proc.c File src/pkg/runtime/proc.c (left): http://codereview.appspot.com/4723042/diff/11006/src/pkg/runtime/proc.c#oldco... src/pkg/runtime/proc.c:624: // entersyscall is going to return immediately after. On 2011/07/18 19:03:16, dvyukov wrote: > Is it intended remove of "entersyscall" from the comment? Fixed, thanks.
Sign in to reply to this message.
On 2011/07/18 16:09:42, rsc wrote: > This version addresses the comments made > except for the msyscall range problem. > I sent a separate CL to remove msyscall. > Once that is submitted I will update this CL. Humm... are you going to push that other CL into the same release? Otherwise existing program can break badly after messing up atomic scheduling state... > This version also adds a Promela model for > verifying that the algorithm is free of spurious > sleeps and missed wakeups. The model did > find one missed wakeup, which I fixed > (with the cas at line 572). As a self-advertisement, there is also my Relacy Race Detector: http://www.1024cores.net/home/relacy-race-detector/rrd-introduction It's not as well documented as Promela, however it can handle much large state spaces due to smart scheduling, it finds bugs that show up only on relaxed memory models and many more (not saying that it works with almost plain C code).
Sign in to reply to this message.
Other than that, looks good to me.
Sign in to reply to this message.
>> This version addresses the comments made >> except for the msyscall range problem. >> I sent a separate CL to remove msyscall. >> Once that is submitted I will update this CL. > > Humm... are you going to push that other CL into the same release? > Otherwise existing program can break badly after messing up atomic > scheduling state... I am going to submit that other CL (the one that removes msyscall) once it is reviewed, and only after that I will update this CL and send it out for another look. >> This version also adds a Promela model for >> verifying that the algorithm is free of spurious >> sleeps and missed wakeups. The model did >> find one missed wakeup, which I fixed >> (with the cas at line 572). > > As a self-advertisement, there is also my Relacy Race Detector: > http://www.1024cores.net/home/relacy-race-detector/rrd-introduction > It's not as well documented as Promela, however it can handle much large > state spaces due to smart scheduling, it finds bugs that show up only on > relaxed memory models and many more (not saying that it works with > almost plain C code). If you are so inclined, I'd be very interested to see a translation of proc.p into your system. A good test case would be to remove the d_step annotation in stoptheworld and verify that it finds the bug. Russ
Sign in to reply to this message.
Hello dvyukov@google.com, r@golang.org, bradfitz@golang.org, r@google.com (cc: golang-dev@googlegroups.com), Please take another look.
Sign in to reply to this message.
This version is ready for review + submission. It removes the limit on the number of goroutines that can be blocked in a system call. The limit on the number of active goroutines (GOMAXPROCS) is also raised from 1014 to 32758.
Sign in to reply to this message.
LGTM http://codereview.appspot.com/4723042/diff/7011/src/pkg/runtime/proc.c File src/pkg/runtime/proc.c (right): http://codereview.appspot.com/4723042/diff/7011/src/pkg/runtime/proc.c#newcode73 src/pkg/runtime/proc.c:73: int32 predawn; // running initialization, don't run new gs. s/gs/g's/ http://codereview.appspot.com/4723042/diff/7011/src/pkg/runtime/proc.c#newcod... src/pkg/runtime/proc.c:101: // waitstop, and gwait. They never write it. Thus, writes to those Not sure what "They never write it" means--do you need s/it/them/? http://codereview.appspot.com/4723042/diff/7011/src/pkg/runtime/proc.c#newcod... src/pkg/runtime/proc.c:101: // waitstop, and gwait. They never write it. Thus, writes to those s/gwait/gwaiting/ http://codereview.appspot.com/4723042/diff/7011/src/pkg/runtime/proc.c#newcod... src/pkg/runtime/proc.c:291: canaddmcpu(void) I find the name canaddmcpu somewhat misleading since it implies that it checks but does not actually increment mcpu. What do you think of maybeaddmcpu?
Sign in to reply to this message.
LGTM are the benchmark numbers still right after all those changes? http://codereview.appspot.com/4723042/diff/7011/src/pkg/runtime/proc.c File src/pkg/runtime/proc.c (right): http://codereview.appspot.com/4723042/diff/7011/src/pkg/runtime/proc.c#newcod... src/pkg/runtime/proc.c:291: canaddmcpu(void) i prefer 'can' over 'maybe'. 'can' to me means 'do it if you can'; 'maybe' means 'for some unspecified reason, do this or not'. thus 'can' seems determined; 'maybe' seems arbitrary. "can you do this?" "yes!" "maybe you can do this?" "yeah, maybe. let me finish my drink first." http://codereview.appspot.com/4723042/diff/7011/src/pkg/runtime/proc.c#newcod... src/pkg/runtime/proc.c:1442: // than procs running? Stop. s/Stop/If so, stop./
Sign in to reply to this message.
r@golang.org writes: > http://codereview.appspot.com/4723042/diff/7011/src/pkg/runtime/proc.c > File src/pkg/runtime/proc.c (right): > > http://codereview.appspot.com/4723042/diff/7011/src/pkg/runtime/proc.c#newcod... > src/pkg/runtime/proc.c:291: canaddmcpu(void) > i prefer 'can' over 'maybe'. 'can' to me means 'do it if you can'; > 'maybe' means 'for some unspecified reason, do this or not'. thus 'can' > seems determined; 'maybe' seems arbitrary. > > "can you do this?" "yes!" > "maybe you can do this?" "yeah, maybe. let me finish my drink first." Yeah, ok. I suppose there is addmcpuifok? Ian
Sign in to reply to this message.
canaddmcpu is consistent with cansemacquire, which already existed (and we had the same discussion then). cangget was the real inconsistency, because it didn't do anything. I renamed it to haveg. Russ
Sign in to reply to this message.
> are the benchmark numbers still right after all those changes? i don't have the machine handy but i expect so since the fast paths didn't change.
Sign in to reply to this message.
*** Submitted as http://code.google.com/p/go/source/detail?r=bc165ba3d931 *** runtime: faster entersyscall, exitsyscall Uses atomic memory accesses to avoid the need to acquire and release schedlock on fast paths. benchmark old ns/op new ns/op delta runtime_test.BenchmarkSyscall 73 31 -56.63% runtime_test.BenchmarkSyscall-2 538 74 -86.23% runtime_test.BenchmarkSyscall-3 508 103 -79.72% runtime_test.BenchmarkSyscall-4 721 97 -86.52% runtime_test.BenchmarkSyscallWork 920 873 -5.11% runtime_test.BenchmarkSyscallWork-2 516 481 -6.78% runtime_test.BenchmarkSyscallWork-3 550 343 -37.64% runtime_test.BenchmarkSyscallWork-4 632 263 -58.39% (Intel Core i7 L640 2.13 GHz-based Lenovo X201s) Reduced a less artificial server benchmark from 11.5r 12.0u 8.0s to 8.3r 9.1u 1.0s. R=dvyukov, r, bradfitz, r, iant, iant CC=golang-dev http://codereview.appspot.com/4723042
Sign in to reply to this message.
http://codereview.appspot.com/4723042/diff/7011/src/pkg/runtime/proc.c File src/pkg/runtime/proc.c (right): http://codereview.appspot.com/4723042/diff/7011/src/pkg/runtime/proc.c#newcod... src/pkg/runtime/proc.c:291: canaddmcpu(void) On 2011/07/18 22:18:17, r wrote: > i prefer 'can' over 'maybe'. 'can' to me means 'do it if you can'; 'maybe' means > 'for some unspecified reason, do this or not'. thus 'can' seems determined; > 'maybe' seems arbitrary. > > "can you do this?" "yes!" > "maybe you can do this?" "yeah, maybe. let me finish my drink first." I am not a native speaker, by 'can' for me also implies predicate rather than action. (cangget() && canaddmcpu()) look inconsistent because these 2 can's do very different things. Maybe tryaddmcpu() (like trylock())? Hummm... Now I see that comment above the function says "Try to increment mcpu. Report whether succeeded". I think if it's renamed to tryadd, the comment becomes unnecessary. http://codereview.appspot.com/4723042/diff/7011/src/pkg/runtime/proc.c#newcod... src/pkg/runtime/proc.c:809: } I think now it's logical to replace the CAS loop with: v = xadd(&runtime·sched.atomic, -1<<mcpuShift); if(!(atomic_gwaiting(v) || (atomic_waitstop(v) && atomic_mcpu(v)-1 <= atomic_mcpumax(v)))) return; and remove the decrement from the critical section. http://codereview.appspot.com/4723042/diff/7011/src/pkg/runtime/proc.c#newcod... src/pkg/runtime/proc.c:868: } Ditto. It slightly/subtly changes the algorithm, but as far as I see it's still correct. Verification with Promela won't do any harm of course.
Sign in to reply to this message.
|