Issue 7029044: code review 7029044: runtime: less aggressive per-thread stack segment caching

Issue 7029044: code review 7029044: runtime: less aggressive per-thread stack segment caching (Closed)

Can't Edit
Can't Publish+Mail
Start Review

Created:
12 years, 2 months ago by dvyukov

Modified:
12 years, 2 months ago

Reviewers:
albert.strasheim

CC:
sougou, nsf, rsc, golang-dev, msolomon

Visibility:
Public.

Description

runtime: less aggressive per-thread stack segment caching Introduce global stack segment cache and limit per-thread cache size. This greatly reduces StackSys memory on workloads that create lots of threads. benchmark old ns/op new ns/op delta BenchmarkStackGrowth 665 656 -1.35% BenchmarkStackGrowth-2 333 328 -1.50% BenchmarkStackGrowth-4 224 172 -23.21% BenchmarkStackGrowth-8 124 91 -26.13% BenchmarkStackGrowth-16 82 47 -41.94% BenchmarkStackGrowth-32 73 40 -44.79% BenchmarkStackGrowthDeep 97231 94391 -2.92% BenchmarkStackGrowthDeep-2 47230 58562 +23.99% BenchmarkStackGrowthDeep-4 24993 49356 +97.48% BenchmarkStackGrowthDeep-8 15105 30072 +99.09% BenchmarkStackGrowthDeep-16 10005 15623 +56.15% BenchmarkStackGrowthDeep-32 12517 13069 +4.41% TestStackMem#1,MB 310 12 -96.13% TestStackMem#2,MB 296 14 -95.27% TestStackMem#3,MB 479 14 -97.08% TestStackMem#1,sec 3.22 2.26 -29.81% TestStackMem#2,sec 2.43 2.15 -11.52% TestStackMem#3,sec 2.50 2.38 -4.80%

Patch Set 1 #

Patch Set 2 : diff -r 492fdf07797e https://dvyukov%40google.com@code.google.com/p/go/ #

Patch Set 3 : diff -r 492fdf07797e https://dvyukov%40google.com@code.google.com/p/go/ #

Patch Set 4 : diff -r 492fdf07797e https://dvyukov%40google.com@code.google.com/p/go/ #

Patch Set 5 : diff -r 492fdf07797e https://dvyukov%40google.com@code.google.com/p/go/ #

Patch Set 6 : diff -r 6867d442bb3d https://dvyukov%40google.com@code.google.com/p/go/ #

Patch Set 7 : diff -r 6867d442bb3d https://dvyukov%40google.com@code.google.com/p/go/ #

Patch Set 8 : diff -r 6867d442bb3d https://dvyukov%40google.com@code.google.com/p/go/ #

Patch Set 9 : diff -r d0d76b7fb219 https://dvyukov%40google.com@code.google.com/p/go/ #

Patch Set 10 : diff -r d0d76b7fb219 https://dvyukov%40google.com@code.google.com/p/go/ #

Patch Set 11 : diff -r 28ee66c9003f https://dvyukov%40google.com@code.google.com/p/go/ #

Total comments: 10

Patch Set 12 : diff -r 08a1396e9aa7 https://dvyukov%40google.com@code.google.com/p/go/ #

Patch Set 13 : diff -r 2a4cb557ddbd https://dvyukov%40google.com@code.google.com/p/go/ #

Patch Set 14 : diff -r 2a4cb557ddbd https://dvyukov%40google.com@code.google.com/p/go/ #

Created: 12 years, 2 months ago

Download [raw] [tar.bz2]

	Unified diffs	Side-by-side diffs	Delta from patch set	Stats (+154 lines, -12 lines)			Patch
M	src/pkg/runtime/malloc.goc	View	1 2 3 4 5 6 12	3 chunks	+83 lines, -2 lines	0 comments	Download
M	src/pkg/runtime/mgc0.c	View	1 2 3 4 5 6 7 8 9 10	2 chunks	+1 line, -5 lines	0 comments	Download
M	src/pkg/runtime/proc.c	View	1 2 3 4	1 chunk	+0 lines, -2 lines	0 comments	Download
M	src/pkg/runtime/proc_test.go	View	1	3 chunks	+10 lines, -2 lines	0 comments	Download
M	src/pkg/runtime/runtime.h	View	1 2 3 4 5 6 7 8 10	2 chunks	+11 lines, -1 line	0 comments	Download
M	src/pkg/runtime/stack_test.go	View	1 2 3 4 5 6 7 8	2 chunks	+49 lines, -0 lines	0 comments	Download

Messages

Total messages: 21

Expand All Messages | Collapse All Messages

dvyukov

Hello golang-dev@googlegroups.com, I'd like you to review this change to https://dvyukov%40google.com@code.google.com/p/go/

12 years, 2 months ago (2013-01-03 15:29:31 UTC) #1

dvyukov

On 2013/01/03 15:29:31, dvyukov wrote: > Hello mailto:golang-dev@googlegroups.com, > > I'd like you to review ...

12 years, 2 months ago (2013-01-03 17:45:38 UTC) #2

sougou

Reran vtocc benchmars, around 10M queries using 100 clients. Run 1: go version currently used ...

12 years, 2 months ago (2013-01-03 23:55:50 UTC) #3

dvyukov

On 2013/01/03 23:55:50, sougou wrote: > Reran vtocc benchmars, around 10M queries using 100 clients. ...

12 years, 2 months ago (2013-01-04 06:04:59 UTC) #4

nsf

On 2013/01/03 15:29:31, dvyukov wrote: > Hello mailto:golang-dev@googlegroups.com, > > I'd like you to review ...

12 years, 2 months ago (2013-01-04 07:19:15 UTC) #5

dvyukov

On 2013/01/04 07:19:15, nsf wrote: > On 2013/01/03 15:29:31, dvyukov wrote: > > Hello mailto:golang-dev@googlegroups.com, ...

12 years, 2 months ago (2013-01-04 07:36:59 UTC) #6

dvyukov

On 2013/01/04 07:36:59, dvyukov wrote: > On 2013/01/04 07:19:15, nsf wrote: > > On 2013/01/03 ...

12 years, 2 months ago (2013-01-04 07:39:15 UTC) #7

dvyukov

On 2013/01/04 06:04:59, dvyukov wrote: > On 2013/01/03 23:55:50, sougou wrote: > > Reran vtocc ...

12 years, 2 months ago (2013-01-04 07:48:43 UTC) #8

On 2013/01/04 06:04:59, dvyukov wrote:
> On 2013/01/03 23:55:50, sougou wrote:
> > Reran vtocc benchmars, around 10M queries using 100 clients.
> > 
> > Run 1: go version currently used on production 0a3866d6cc6b (Sep 24):
> > qps: 5832 StackSys: 86MB
> > 
> > Run 2: go @tip d0d76b7fb219 (Jan 3):
> > qps: 5543 StackSys: 77MB
> > 
> > Run 3: Using CL 6997052:
> > qps: 5673 StackSys: 3MB
> > 
> > Run 4: Using CL 7029044:
> > qps: 5699 StackSys: 15MB
> > 
> > Conclusion: Marginal difference in performance between the two CLs. The
older
> CL
> > uses less memory. Maybe it will be more pronounced if you passed large
objects
> > by value to functions?
> > The runtime @tip is slower than the one from September, an unrelated
> > observation.
> > 
> > This is just a summary. I can send you more detailed stats and pprof
captures
> if
> > needed.
> 
> Can you please test with varying values for StackCacheSize/StackCacheBatch in
> src/pkg/runtime/runtime.h?
> Currently they are set to 128/32. The CL/6997052 is using 16/8. I am inclined
> towards 32/16 for now (my synthetic tests show still minimal memory
consumption
> and good performance). Another possible point is 64/32.

Or perhaps it's already fine?
15 vs 79-80MB is a good win already. More importantly StackSys must not grow
over time now, it's bounded by 512kb per thread (while currently it slowly grows
to infinity).

Well, actually not that slowly. I've run the following funny test -- each line
is StackSys *increase*.

Current behavior:
$ go test -run=StackMem -v
-cpu=1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
2>&1 | grep "for stack mem"
stack_test.go:1569: 	Consumed 106MB for stack mem
stack_test.go:1569: 	Consumed 48MB for stack mem
stack_test.go:1569: 	Consumed 52MB for stack mem
stack_test.go:1569: 	Consumed 71MB for stack mem
stack_test.go:1569: 	Consumed 71MB for stack mem
stack_test.go:1569: 	Consumed 53MB for stack mem
stack_test.go:1569: 	Consumed 35MB for stack mem
stack_test.go:1569: 	Consumed 27MB for stack mem
stack_test.go:1569: 	Consumed 39MB for stack mem
stack_test.go:1569: 	Consumed 43MB for stack mem
stack_test.go:1569: 	Consumed 49MB for stack mem
stack_test.go:1569: 	Consumed 54MB for stack mem
stack_test.go:1569: 	Consumed 44MB for stack mem
stack_test.go:1569: 	Consumed 35MB for stack mem
stack_test.go:1569: 	Consumed 41MB for stack mem
stack_test.go:1569: 	Consumed 32MB for stack mem
stack_test.go:1569: 	Consumed 27MB for stack mem
stack_test.go:1569: 	Consumed 20MB for stack mem
stack_test.go:1569: 	Consumed 36MB for stack mem
stack_test.go:1569: 	Consumed 33MB for stack mem
stack_test.go:1569: 	Consumed 31MB for stack mem
stack_test.go:1569: 	Consumed 45MB for stack mem
stack_test.go:1569: 	Consumed 40MB for stack mem
stack_test.go:1569: 	Consumed 30MB for stack mem
stack_test.go:1569: 	Consumed 39MB for stack mem
stack_test.go:1569: 	Consumed 27MB for stack mem
stack_test.go:1569: 	Consumed 27MB for stack mem
stack_test.go:1569: 	Consumed 37MB for stack mem
stack_test.go:1569: 	Consumed 33MB for stack mem
stack_test.go:1569: 	Consumed 36MB for stack mem
stack_test.go:1569: 	Consumed 34MB for stack mem
stack_test.go:1569: 	Consumed 42MB for stack mem
stack_test.go:1569: 	Consumed 29MB for stack mem
stack_test.go:1569: 	Consumed 29MB for stack mem
stack_test.go:1569: 	Consumed 44MB for stack mem
stack_test.go:1569: 	Consumed 20MB for stack mem
stack_test.go:1569: 	Consumed 31MB for stack mem
stack_test.go:1569: 	Consumed 31MB for stack mem
stack_test.go:1569: 	Consumed 19MB for stack mem
stack_test.go:1569: 	Consumed 25MB for stack mem


New behavior:
$ go test -run=StackMem -v
-cpu=1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
2>&1 | grep "for stack mem"
stack_test.go:1569: 	Consumed 13MB for stack mem
stack_test.go:1569: 	Consumed 0MB for stack mem
stack_test.go:1569: 	Consumed 0MB for stack mem
stack_test.go:1569: 	Consumed 0MB for stack mem
stack_test.go:1569: 	Consumed 0MB for stack mem
stack_test.go:1569: 	Consumed 0MB for stack mem
stack_test.go:1569: 	Consumed 0MB for stack mem
stack_test.go:1569: 	Consumed 0MB for stack mem
stack_test.go:1569: 	Consumed 0MB for stack mem
stack_test.go:1569: 	Consumed 0MB for stack mem
...



Either somebody must come up with a good tuning methodology, or let's commit it
as-is and tune later.

msolomon

From our perspective, capping the growth is a big win and the performance tradeoff is ...

12 years, 2 months ago (2013-01-04 07:55:36 UTC) #9

From our perspective, capping the growth is a big win and the
performance tradeoff is worth it.  I'll let Sugu confirm that with a
production test. Once we can reason about good sizes, we can run a few
more tests, but I suspect it may be workload dependent. In the future
it might be worth considering an environment variable, but I tend to
dislike tunables.

Either way, thanks for putting so much time into this.

On Thu, Jan 3, 2013 at 11:48 PM,  <dvyukov@google.com> wrote:
> On 2013/01/04 06:04:59, dvyukov wrote:
>>
>> On 2013/01/03 23:55:50, sougou wrote:
>> > Reran vtocc benchmars, around 10M queries using 100 clients.
>> >
>> > Run 1: go version currently used on production 0a3866d6cc6b (Sep
>
> 24):
>>
>> > qps: 5832 StackSys: 86MB
>> >
>> > Run 2: go @tip d0d76b7fb219 (Jan 3):
>> > qps: 5543 StackSys: 77MB
>> >
>> > Run 3: Using CL 6997052:
>> > qps: 5673 StackSys: 3MB
>> >
>> > Run 4: Using CL 7029044:
>> > qps: 5699 StackSys: 15MB
>> >
>> > Conclusion: Marginal difference in performance between the two CLs.
>
> The older
>>
>> CL
>> > uses less memory. Maybe it will be more pronounced if you passed
>
> large objects
>>
>> > by value to functions?
>> > The runtime @tip is slower than the one from September, an unrelated
>> > observation.
>> >
>> > This is just a summary. I can send you more detailed stats and pprof
>
> captures
>>
>> if
>> > needed.
>
>
>> Can you please test with varying values for
>
> StackCacheSize/StackCacheBatch in
>>
>> src/pkg/runtime/runtime.h?
>> Currently they are set to 128/32. The CL/6997052 is using 16/8. I am
>
> inclined
>>
>> towards 32/16 for now (my synthetic tests show still minimal memory
>
> consumption
>>
>> and good performance). Another possible point is 64/32.
>
>
> Or perhaps it's already fine?
> 15 vs 79-80MB is a good win already. More importantly StackSys must not
> grow over time now, it's bounded by 512kb per thread (while currently it
> slowly grows to infinity).
>
> Well, actually not that slowly. I've run the following funny test --
> each line is StackSys *increase*.
>
> Current behavior:
> $ go test -run=StackMem -v
>
-cpu=1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
> 2>&1 | grep "for stack mem"
> stack_test.go:1569:     Consumed 106MB for stack mem
> stack_test.go:1569:     Consumed 48MB for stack mem
> stack_test.go:1569:     Consumed 52MB for stack mem
> stack_test.go:1569:     Consumed 71MB for stack mem
> stack_test.go:1569:     Consumed 71MB for stack mem
> stack_test.go:1569:     Consumed 53MB for stack mem
> stack_test.go:1569:     Consumed 35MB for stack mem
> stack_test.go:1569:     Consumed 27MB for stack mem
> stack_test.go:1569:     Consumed 39MB for stack mem
> stack_test.go:1569:     Consumed 43MB for stack mem
> stack_test.go:1569:     Consumed 49MB for stack mem
> stack_test.go:1569:     Consumed 54MB for stack mem
> stack_test.go:1569:     Consumed 44MB for stack mem
> stack_test.go:1569:     Consumed 35MB for stack mem
> stack_test.go:1569:     Consumed 41MB for stack mem
> stack_test.go:1569:     Consumed 32MB for stack mem
> stack_test.go:1569:     Consumed 27MB for stack mem
> stack_test.go:1569:     Consumed 20MB for stack mem
> stack_test.go:1569:     Consumed 36MB for stack mem
> stack_test.go:1569:     Consumed 33MB for stack mem
> stack_test.go:1569:     Consumed 31MB for stack mem
> stack_test.go:1569:     Consumed 45MB for stack mem
> stack_test.go:1569:     Consumed 40MB for stack mem
> stack_test.go:1569:     Consumed 30MB for stack mem
> stack_test.go:1569:     Consumed 39MB for stack mem
> stack_test.go:1569:     Consumed 27MB for stack mem
> stack_test.go:1569:     Consumed 27MB for stack mem
> stack_test.go:1569:     Consumed 37MB for stack mem
> stack_test.go:1569:     Consumed 33MB for stack mem
> stack_test.go:1569:     Consumed 36MB for stack mem
> stack_test.go:1569:     Consumed 34MB for stack mem
> stack_test.go:1569:     Consumed 42MB for stack mem
> stack_test.go:1569:     Consumed 29MB for stack mem
> stack_test.go:1569:     Consumed 29MB for stack mem
> stack_test.go:1569:     Consumed 44MB for stack mem
> stack_test.go:1569:     Consumed 20MB for stack mem
> stack_test.go:1569:     Consumed 31MB for stack mem
> stack_test.go:1569:     Consumed 31MB for stack mem
> stack_test.go:1569:     Consumed 19MB for stack mem
> stack_test.go:1569:     Consumed 25MB for stack mem
>
>
> New behavior:
> $ go test -run=StackMem -v
>
-cpu=1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
> 2>&1 | grep "for stack mem"
> stack_test.go:1569:     Consumed 13MB for stack mem
> stack_test.go:1569:     Consumed 0MB for stack mem
> stack_test.go:1569:     Consumed 0MB for stack mem
> stack_test.go:1569:     Consumed 0MB for stack mem
> stack_test.go:1569:     Consumed 0MB for stack mem
> stack_test.go:1569:     Consumed 0MB for stack mem
> stack_test.go:1569:     Consumed 0MB for stack mem
> stack_test.go:1569:     Consumed 0MB for stack mem
> stack_test.go:1569:     Consumed 0MB for stack mem
> stack_test.go:1569:     Consumed 0MB for stack mem
> ...
>
>
>
> Either somebody must come up with a good tuning methodology, or let's
> commit it as-is and tune later.
>
> https://codereview.appspot.com/7029044/

dvyukov

On Fri, Jan 4, 2013 at 11:55 AM, Mike Solomon <msolomon@google.com> wrote: > From our ...

12 years, 2 months ago (2013-01-04 08:05:25 UTC) #10

On Fri, Jan 4, 2013 at 11:55 AM, Mike Solomon <msolomon@google.com> wrote:
> From our perspective, capping the growth is a big win and the
> performance tradeoff is worth it.  I'll let Sugu confirm that with a
> production test. Once we can reason about good sizes, we can run a few
> more tests, but I suspect it may be workload dependent. In the future
> it might be worth considering an environment variable, but I tend to
> dislike tunables.


When/if I finally submit the improved scheduler, it will allow for
per-processor (per-GOMAXPROC) state. And stack caches along with other
stuff will move there. That will decrease stack mem further, and I
believe eliminate any need in tunables (e.g. now you have say 200
threads each with own cache, and then you will have only 8 procs with
caches).


> Either way, thanks for putting so much time into this.

You are welcome.


> On Thu, Jan 3, 2013 at 11:48 PM,  <dvyukov@google.com> wrote:
>> On 2013/01/04 06:04:59, dvyukov wrote:
>>>
>>> On 2013/01/03 23:55:50, sougou wrote:
>>> > Reran vtocc benchmars, around 10M queries using 100 clients.
>>> >
>>> > Run 1: go version currently used on production 0a3866d6cc6b (Sep
>>
>> 24):
>>>
>>> > qps: 5832 StackSys: 86MB
>>> >
>>> > Run 2: go @tip d0d76b7fb219 (Jan 3):
>>> > qps: 5543 StackSys: 77MB
>>> >
>>> > Run 3: Using CL 6997052:
>>> > qps: 5673 StackSys: 3MB
>>> >
>>> > Run 4: Using CL 7029044:
>>> > qps: 5699 StackSys: 15MB
>>> >
>>> > Conclusion: Marginal difference in performance between the two CLs.
>>
>> The older
>>>
>>> CL
>>> > uses less memory. Maybe it will be more pronounced if you passed
>>
>> large objects
>>>
>>> > by value to functions?
>>> > The runtime @tip is slower than the one from September, an unrelated
>>> > observation.
>>> >
>>> > This is just a summary. I can send you more detailed stats and pprof
>>
>> captures
>>>
>>> if
>>> > needed.
>>
>>
>>> Can you please test with varying values for
>>
>> StackCacheSize/StackCacheBatch in
>>>
>>> src/pkg/runtime/runtime.h?
>>> Currently they are set to 128/32. The CL/6997052 is using 16/8. I am
>>
>> inclined
>>>
>>> towards 32/16 for now (my synthetic tests show still minimal memory
>>
>> consumption
>>>
>>> and good performance). Another possible point is 64/32.
>>
>>
>> Or perhaps it's already fine?
>> 15 vs 79-80MB is a good win already. More importantly StackSys must not
>> grow over time now, it's bounded by 512kb per thread (while currently it
>> slowly grows to infinity).
>>
>> Well, actually not that slowly. I've run the following funny test --
>> each line is StackSys *increase*.
>>
>> Current behavior:
>> $ go test -run=StackMem -v
>>
-cpu=1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
>> 2>&1 | grep "for stack mem"
>> stack_test.go:1569:     Consumed 106MB for stack mem
>> stack_test.go:1569:     Consumed 48MB for stack mem
>> stack_test.go:1569:     Consumed 52MB for stack mem
>> stack_test.go:1569:     Consumed 71MB for stack mem
>> stack_test.go:1569:     Consumed 71MB for stack mem
>> stack_test.go:1569:     Consumed 53MB for stack mem
>> stack_test.go:1569:     Consumed 35MB for stack mem
>> stack_test.go:1569:     Consumed 27MB for stack mem
>> stack_test.go:1569:     Consumed 39MB for stack mem
>> stack_test.go:1569:     Consumed 43MB for stack mem
>> stack_test.go:1569:     Consumed 49MB for stack mem
>> stack_test.go:1569:     Consumed 54MB for stack mem
>> stack_test.go:1569:     Consumed 44MB for stack mem
>> stack_test.go:1569:     Consumed 35MB for stack mem
>> stack_test.go:1569:     Consumed 41MB for stack mem
>> stack_test.go:1569:     Consumed 32MB for stack mem
>> stack_test.go:1569:     Consumed 27MB for stack mem
>> stack_test.go:1569:     Consumed 20MB for stack mem
>> stack_test.go:1569:     Consumed 36MB for stack mem
>> stack_test.go:1569:     Consumed 33MB for stack mem
>> stack_test.go:1569:     Consumed 31MB for stack mem
>> stack_test.go:1569:     Consumed 45MB for stack mem
>> stack_test.go:1569:     Consumed 40MB for stack mem
>> stack_test.go:1569:     Consumed 30MB for stack mem
>> stack_test.go:1569:     Consumed 39MB for stack mem
>> stack_test.go:1569:     Consumed 27MB for stack mem
>> stack_test.go:1569:     Consumed 27MB for stack mem
>> stack_test.go:1569:     Consumed 37MB for stack mem
>> stack_test.go:1569:     Consumed 33MB for stack mem
>> stack_test.go:1569:     Consumed 36MB for stack mem
>> stack_test.go:1569:     Consumed 34MB for stack mem
>> stack_test.go:1569:     Consumed 42MB for stack mem
>> stack_test.go:1569:     Consumed 29MB for stack mem
>> stack_test.go:1569:     Consumed 29MB for stack mem
>> stack_test.go:1569:     Consumed 44MB for stack mem
>> stack_test.go:1569:     Consumed 20MB for stack mem
>> stack_test.go:1569:     Consumed 31MB for stack mem
>> stack_test.go:1569:     Consumed 31MB for stack mem
>> stack_test.go:1569:     Consumed 19MB for stack mem
>> stack_test.go:1569:     Consumed 25MB for stack mem
>>
>>
>> New behavior:
>> $ go test -run=StackMem -v
>>
-cpu=1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
>> 2>&1 | grep "for stack mem"
>> stack_test.go:1569:     Consumed 13MB for stack mem
>> stack_test.go:1569:     Consumed 0MB for stack mem
>> stack_test.go:1569:     Consumed 0MB for stack mem
>> stack_test.go:1569:     Consumed 0MB for stack mem
>> stack_test.go:1569:     Consumed 0MB for stack mem
>> stack_test.go:1569:     Consumed 0MB for stack mem
>> stack_test.go:1569:     Consumed 0MB for stack mem
>> stack_test.go:1569:     Consumed 0MB for stack mem
>> stack_test.go:1569:     Consumed 0MB for stack mem
>> stack_test.go:1569:     Consumed 0MB for stack mem
>> ...
>>
>>
>>
>> Either somebody must come up with a good tuning methodology, or let's
>> commit it as-is and tune later.
>>
>> https://codereview.appspot.com/7029044/

sougou

Sorry, I forgot to mention that StackSys was indefinitely growing in the first two runs ...

12 years, 2 months ago (2013-01-04 08:05:45 UTC) #11

dvyukov

On Fri, Jan 4, 2013 at 12:05 PM, Sugu Sougoumarane <sougou@google.com> wrote: > Sorry, I ...

12 years, 2 months ago (2013-01-04 08:11:35 UTC) #12

nsf

On 2013/01/04 07:39:15, dvyukov wrote: > Do you miss a part of the sentence? Not ...

12 years, 2 months ago (2013-01-04 12:59:28 UTC) #13

sougou

5M rows using 100 connections: 128/32: qps: 5885 StackSys: 14.7MB 64/32: qps: 5876 StackSys: 8.4MB ...

12 years, 2 months ago (2013-01-04 20:15:38 UTC) #14

dvyukov

OK, I will replace the constants with 32/16. And let's wait for Russ' blessing. On ...

12 years, 2 months ago (2013-01-04 20:36:32 UTC) #15

rsc

https://codereview.appspot.com/7029044/diff/22001/src/pkg/runtime/malloc.goc File src/pkg/runtime/malloc.goc (right): https://codereview.appspot.com/7029044/diff/22001/src/pkg/runtime/malloc.goc#newcode754 src/pkg/runtime/malloc.goc:754: StackCacheNode *next; The stack itself is way bigger than ...

12 years, 2 months ago (2013-01-07 04:18:57 UTC) #17

https://codereview.appspot.com/7029044/diff/22001/src/pkg/runtime/malloc.goc
File src/pkg/runtime/malloc.goc (right):

https://codereview.appspot.com/7029044/diff/22001/src/pkg/runtime/malloc.goc#...
src/pkg/runtime/malloc.goc:754: StackCacheNode *next;
The stack itself is way bigger than this struct. Listing all StackCacheBatch
elements in batch will make the code significantly simpler, and a comment would
help too:

// A StackCacheNode is a group of StackCacheBatch fixed-size stacks that
// can be transferred between a goroutine and a central cache (stackcache).
// The StackCacheNode contents are stored in one of the stacks, so they can
// only be used when the stacks are free.
typedef struct StackCacheNode StackCacheNode;
struct StackCacheNode
{
    StackCacheNode *next;
    void *batch[StackCacheBatch];
}

refill() {
    ...
    if(n == nil) {
        ...
        for(i = 0; i < StackCacheBatch; i++)
            n->batch[i] = (byte*)n + i*FixedStack;
    }
    pos = m->stackcachepos;
    for(i = 0; i < StackCacheBatch; i++) {
        m->stackcache[pos++] = n->batch[i];
        pos %= StackCacheSize;
    }
    ...
}

release() {
    ...
    n = (StackCacheNode*)m->stackcache[pos];
    for(i = 0; i < StackCacheBatch; i++) {
        n->batch[i] = m->stackcache[pos++];
        pos %= StackCacheSize;
    }

https://codereview.appspot.com/7029044/diff/22001/src/pkg/runtime/malloc.goc#...
src/pkg/runtime/malloc.goc:758: static StackCacheNode *stackcache;
static struct {
    Lock;
    StackCacheNode *top;
} stackcache;

And then

runtime.lock(&stackcache);
n = stackcache.top;
etc.

https://codereview.appspot.com/7029044/diff/22001/src/pkg/runtime/malloc.goc#...
src/pkg/runtime/malloc.goc:793: static void
// Release oldest StackCacheBatch stack segments to central free list.

https://codereview.appspot.com/7029044/diff/22001/src/pkg/runtime/malloc.goc#...
src/pkg/runtime/malloc.goc:799: pos = (m->stackcachepos - m->stackcachecnt) %
StackCacheSize;
This only works because (a) the left hand side of the % is an unsigned type, and
(b) StackCacheSize is a power of two. Please use

pos = (m->stackcachepos - m->stackcachecnt + StackCacheSize) % StackCacheSize

which does not rely on either assumption.

https://codereview.appspot.com/7029044/diff/22001/src/pkg/runtime/malloc.goc#...
src/pkg/runtime/malloc.goc:840: pos = (pos - 1) % StackCacheSize;
pos = (pos - 1 + StackCacheSize) % StackCacheSize;

dvyukov

PTAL https://codereview.appspot.com/7029044/diff/22001/src/pkg/runtime/malloc.goc File src/pkg/runtime/malloc.goc (right): https://codereview.appspot.com/7029044/diff/22001/src/pkg/runtime/malloc.goc#newcode754 src/pkg/runtime/malloc.goc:754: StackCacheNode *next; On 2013/01/07 04:18:57, rsc wrote: > ...

12 years, 2 months ago (2013-01-08 12:33:47 UTC) #18

PTAL

https://codereview.appspot.com/7029044/diff/22001/src/pkg/runtime/malloc.goc
File src/pkg/runtime/malloc.goc (right):

https://codereview.appspot.com/7029044/diff/22001/src/pkg/runtime/malloc.goc#...
src/pkg/runtime/malloc.goc:754: StackCacheNode *next;
On 2013/01/07 04:18:57, rsc wrote:
> The stack itself is way bigger than this struct. Listing all StackCacheBatch
> elements in batch will make the code significantly simpler, and a comment
would
> help too:
> 
> // A StackCacheNode is a group of StackCacheBatch fixed-size stacks that
> // can be transferred between a goroutine and a central cache (stackcache).
> // The StackCacheNode contents are stored in one of the stacks, so they can
> // only be used when the stacks are free.
> typedef struct StackCacheNode StackCacheNode;
> struct StackCacheNode
> {
>     StackCacheNode *next;
>     void *batch[StackCacheBatch];
> }
> 
> refill() {
>     ...
>     if(n == nil) {
>         ...
>         for(i = 0; i < StackCacheBatch; i++)
>             n->batch[i] = (byte*)n + i*FixedStack;
>     }
>     pos = m->stackcachepos;
>     for(i = 0; i < StackCacheBatch; i++) {
>         m->stackcache[pos++] = n->batch[i];
>         pos %= StackCacheSize;
>     }
>     ...
> }
> 
> release() {
>     ...
>     n = (StackCacheNode*)m->stackcache[pos];
>     for(i = 0; i < StackCacheBatch; i++) {
>         n->batch[i] = m->stackcache[pos++];
>         pos %= StackCacheSize;
>     }

Done.

https://codereview.appspot.com/7029044/diff/22001/src/pkg/runtime/malloc.goc#...
src/pkg/runtime/malloc.goc:758: static StackCacheNode *stackcache;
On 2013/01/07 04:18:57, rsc wrote:
> static struct {
>     Lock;
>     StackCacheNode *top;
> } stackcache;
> 
> And then
> 
> runtime.lock(&stackcache);
> n = stackcache.top;
> etc.

Done.

https://codereview.appspot.com/7029044/diff/22001/src/pkg/runtime/malloc.goc#...
src/pkg/runtime/malloc.goc:793: static void
On 2013/01/07 04:18:57, rsc wrote:
> // Release oldest StackCacheBatch stack segments to central free list.

Done.

https://codereview.appspot.com/7029044/diff/22001/src/pkg/runtime/malloc.goc#...
src/pkg/runtime/malloc.goc:799: pos = (m->stackcachepos - m->stackcachecnt) %
StackCacheSize;
On 2013/01/07 04:18:57, rsc wrote:
> This only works because (a) the left hand side of the % is an unsigned type,
and
> (b) StackCacheSize is a power of two. Please use
> 
> pos = (m->stackcachepos - m->stackcachecnt + StackCacheSize) % StackCacheSize
> 
> which does not rely on either assumption.

Done.

https://codereview.appspot.com/7029044/diff/22001/src/pkg/runtime/malloc.goc#...
src/pkg/runtime/malloc.goc:840: pos = (pos - 1) % StackCacheSize;
On 2013/01/07 04:18:57, rsc wrote:
> pos = (pos - 1 + StackCacheSize) % StackCacheSize;

Done.

dvyukov

*** Submitted as https://code.google.com/p/go/source/detail?r=88d31369e105 *** runtime: less aggressive per-thread stack segment caching Introduce global stack ...

12 years, 2 months ago (2013-01-10 05:59:56 UTC) #20

albert.strasheim

12 years, 2 months ago (2013-01-24 11:02:33 UTC) #21

Message was sent while issue was closed.

http://code.google.com/p/go/issues/detail?id=4698

Expand All Messages | Collapse All Messages