Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: new goroutines can spend excessive time in morestack #18138

Open
petermattis opened this issue Dec 1, 2016 · 61 comments
Open

runtime: new goroutines can spend excessive time in morestack #18138

petermattis opened this issue Dec 1, 2016 · 61 comments
Labels
NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. Performance
Milestone

Comments

@petermattis
Copy link

What version of Go are you using (go version)?

go version devel +41908a5 Thu Dec 1 02:54:21 2016 +0000 darwin/amd64 a.k.a go1.8beta1

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/pmattis/Development/go"
GORACE=""
GOROOT="/Users/pmattis/Development/go-1.8"
GOTOOLDIR="/Users/pmattis/Development/go-1.8/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/qc/fpqpgdqd167c70dtc6840xxh0000gn/T/go-build385423377=/tmp/go-build -gno-record-gcc-switches -fno-common"
CXX="clang++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"

What did you do?

A recent change to github.com/cockroachdb/cockroach replaced a synchronous call with one wrapped in a goroutine. This small change resulted in a significant slowdown in some benchmarks. The slowdown was traced to additional time being spent in runtime.morestack. The problematic goroutines are all hitting a single gRPC entrypoint Server.Batch and the code paths that fan out from this entrypoint tend to use an excessive amount of stack due to an over reliance on passing and returning by value instead of using pointers. Typical calls use 16-32 KB of stack.

The expensive part of runtime.morestack is the adjustment of existing pointers on the stack. And due to the incremental nature of the stack growth, I can see the stack growing in 4 steps from 2 KB to 32 KB. So we experimented with a hack to pre-grow the stack. Voila, the performance penalty of the change disappeared:

name               old time/op  new time/op  delta
KVInsert1_SQL-8     339µs ± 2%   312µs ± 1%   -7.89%  (p=0.000 n=10+10)
KVInsert10_SQL-8    485µs ± 2%   471µs ± 1%   -2.81%  (p=0.000 n=10+10)
KVInsert100_SQL-8  1.36ms ± 0%  1.35ms ± 0%   -0.95%  (p=0.000 n=10+10)
KVUpdate1_SQL-8     535µs ± 1%   487µs ± 1%   -9.02%   (p=0.000 n=10+9)
KVUpdate10_SQL-8    777µs ± 1%   730µs ± 1%   -6.03%   (p=0.000 n=10+9)
KVUpdate100_SQL-8  2.69ms ± 1%  2.66ms ± 1%   -1.16%  (p=0.000 n=10+10)
KVDelete1_SQL-8     479µs ± 1%   429µs ± 2%  -10.43%   (p=0.000 n=9+10)
KVDelete10_SQL-8    676µs ± 1%   637µs ± 1%   -5.80%    (p=0.000 n=9+9)
KVDelete100_SQL-8  2.23ms ± 5%  2.18ms ± 4%     ~     (p=0.105 n=10+10)
KVScan1_SQL-8       216µs ± 5%   179µs ± 1%  -17.12%  (p=0.000 n=10+10)
KVScan10_SQL-8      233µs ± 1%   201µs ± 1%  -13.76%  (p=0.000 n=10+10)
KVScan100_SQL-8     463µs ± 1%   437µs ± 0%   -5.64%   (p=0.000 n=10+8)

old are benchmarks gathered using go1.8beta1 and new are on go1.8beta1 with the hack to pre-grow the stack. The hack is a call at the beginning of server.Batch to a growStack method:

var growStackGlobal = false

//go:noinline
func growStack() {
	// Goroutine stacks currently start at 2 KB in size. The code paths through
	// the storage package often need a stack that is 32 KB in size. The stack
	// growth is mildly expensive making it useful to trick the runtime into
	// growing the stack early. Since goroutine stacks grow in multiples of 2 and
	// start at 2 KB in size, by placing a 16 KB object on the stack early in the
	// lifetime of a goroutine we force the runtime to use a 32 KB stack for the
	// goroutine.
	var buf [16 << 10] /* 16 KB */ byte
	if growStackGlobal {
		// Make sure the compiler doesn't optimize away buf.
		for i := range buf {
			buf[i] = byte(i)
		}
	}
}

The question here is whether this is copacetic and also to alert the runtime folks that there is a performance opportunity here. Note that the growStackGlobal is not currently necessary, but I wanted to future proof against the compiler deciding that buf is not necessary.

Longer term, the stack usage under server.Batch should be reduced on our side. I'm guessing that we could get the stack usage down to 8-16 KB without too many contortions. But even with such reductions, being able to pre-grow the stack for a goroutine looks beneficial.

@bradfitz bradfitz added this to the Go1.9 milestone Dec 1, 2016
@bradfitz
Copy link
Contributor

bradfitz commented Dec 1, 2016

/cc @aclements @randall77

@aclements
Copy link
Member

We've seen this a few times now. I'm not sure what the right answer is. My best thought so far is that the runtime could keep track of when particular go statements always lead to stack growth right away (for some value of "right away" and "always") and learn to start goroutines from that site with a larger stack. Of course, it would be hard to make this behavior predictable, but perhaps it would still be less surprising than the current behavior. If the runtime did learn to start a goroutine with a larger stack, it would still need a signal to learn if the stack should get smaller again, but we could do that efficiently by allocating the larger stack but setting the stack bounds to something smaller. Then the runtime could still observe whether or not the stack needs to grow, but the actual growth would be basically free until it reached the size of the allocation.

@randall77, thoughts, ideas?

/cc @RLH

@mrjrieke
Copy link

mrjrieke commented Dec 1, 2016

I like @petermattis idea of being able to hint stack size on a per goroutine basis, although this implies developers have the knowhow to identify and provide size estimates accurately. Could this be done with a compiler directive?

@bradfitz
Copy link
Contributor

bradfitz commented Dec 1, 2016

We don't want compiler directives in code. We have some used by the runtime out of necessity, but they're gross. Go prefers simplicity over tons of knobs.

@petermattis
Copy link
Author

Yes, please just make my code magically faster as you've been doing for the last several Go releases.

@mrjrieke
Copy link

mrjrieke commented Dec 1, 2016

I generally agree with not having compiler directives ... magic is nice, although they (compiler directives) do exist even in go. It's an interesting opportunity either way you decide.

@mrjrieke
Copy link

mrjrieke commented Dec 2, 2016

@bradfitz, your comment prompted me to look for the go guiding principles ( https://golang.org/doc/faq#principles). Thanks @adg as well for nicely worded principles.

@gopherbot
Copy link

CL https://golang.org/cl/45142 mentions this issue.

@aclements
Copy link
Member

@petermattis (or anyone who has a good reproducer for this), would you be able to try https://go-review.googlesource.com/45142? It's a trivial hack, but it might actually do the trick. I haven't benchmarked it on anything, so it may also slow things down.

@aclements aclements modified the milestones: Go1.10Early, Go1.9 Jun 8, 2017
@aclements aclements self-assigned this Jun 8, 2017
@petermattis
Copy link
Author

@aclements I'll try and test either tomorrow or next week.

@petermattis
Copy link
Author

@aclements Applying that patch to go1.8.3 resulted in no benefit (this is with the growStack hack disabled):

~/Development/go/src/github.com/cockroachdb/cockroach/pkg/sql master benchstat out.old out.new
name                old time/op  new time/op  delta
KV/Insert1_SQL-8     363µs ± 3%   369µs ± 2%  +1.43%  (p=0.043 n=10+10)
KV/Insert10_SQL-8    583µs ± 0%   581µs ± 1%    ~     (p=0.113 n=10+9)
KV/Insert100_SQL-8  2.05ms ± 0%  2.05ms ± 1%    ~     (p=0.912 n=10+10)
KV/Update1_SQL-8     578µs ± 1%   577µs ± 1%    ~     (p=0.968 n=9+10)
KV/Update10_SQL-8    913µs ± 1%   914µs ± 1%    ~     (p=0.931 n=9+9)
KV/Update100_SQL-8  3.80ms ± 1%  3.87ms ± 5%  +1.90%  (p=0.019 n=10+10)
KV/Delete1_SQL-8     517µs ± 2%   518µs ± 2%    ~     (p=0.912 n=10+10)
KV/Delete10_SQL-8    813µs ± 2%   809µs ± 1%    ~     (p=0.280 n=10+10)
KV/Delete100_SQL-8  3.22ms ± 2%  3.26ms ± 3%    ~     (p=0.052 n=10+10)
KV/Scan1_SQL-8       217µs ± 1%   216µs ± 0%    ~     (p=0.090 n=9+10)
KV/Scan10_SQL-8      238µs ± 0%   238µs ± 1%    ~     (p=0.122 n=10+8)
KV/Scan100_SQL-8     454µs ± 0%   455µs ± 1%    ~     (p=0.089 n=10+10)

Surprising to me this didn't have any effect. Compare this to the growStack hack mentioned earlier:

~/Development/go/src/github.com/cockroachdb/cockroach/pkg/sql master benchstat out.old out.grow-stack
name                old time/op  new time/op  delta
KV/Insert1_SQL-8     363µs ± 3%   331µs ± 2%   -8.82%  (p=0.000 n=10+10)
KV/Insert10_SQL-8    583µs ± 0%   561µs ± 1%   -3.80%  (p=0.000 n=10+10)
KV/Insert100_SQL-8  2.05ms ± 0%  2.03ms ± 0%   -0.88%  (p=0.000 n=10+8)
KV/Update1_SQL-8     578µs ± 1%   532µs ± 1%   -7.94%  (p=0.000 n=9+10)
KV/Update10_SQL-8    913µs ± 1%   872µs ± 1%   -4.47%  (p=0.000 n=9+9)
KV/Update100_SQL-8  3.80ms ± 1%  3.75ms ± 1%   -1.36%  (p=0.000 n=10+10)
KV/Delete1_SQL-8     517µs ± 2%   458µs ± 2%  -11.54%  (p=0.000 n=10+10)
KV/Delete10_SQL-8    813µs ± 2%   765µs ± 1%   -5.91%  (p=0.000 n=10+10)
KV/Delete100_SQL-8  3.22ms ± 2%  3.16ms ± 1%   -2.01%  (p=0.000 n=10+10)
KV/Scan1_SQL-8       217µs ± 1%   194µs ± 1%  -10.44%  (p=0.000 n=9+10)
KV/Scan10_SQL-8      238µs ± 0%   216µs ± 1%   -9.36%  (p=0.000 n=10+10)
KV/Scan100_SQL-8     454µs ± 0%   431µs ± 1%   -4.92%  (p=0.000 n=10+9)

@josharian
Copy link
Contributor

CL 43150 might help a little here.

@aclements
Copy link
Member

Sorry, I made a silly mistake in CL 45142. Would you mind trying the new version of that CL?

@petermattis
Copy link
Author

With your updated patch against go-tip (f363817) there is an improvement:

~/Development/go/src/github.com/cockroachdb/cockroach/pkg/sql master benchstat out.old out.new
name              old time/op  new time/op  delta
KV/Scan1_SQL-8     243µs ± 1%   224µs ± 0%  -7.57%  (p=0.000 n=9+9)
KV/Scan10_SQL-8    263µs ± 0%   247µs ± 0%  -6.20%  (p=0.000 n=9+10)
KV/Scan100_SQL-8   463µs ± 0%   444µs ± 0%  -4.05%  (p=0.000 n=10+10)

But the improvement is still not as good as the growStack hack:

~/Development/go/src/github.com/cockroachdb/cockroach/pkg/sql master benchstat out.new out.grow-stack
name              old time/op  new time/op  delta
KV/Scan1_SQL-8     224µs ± 0%   219µs ± 0%  -2.24%  (p=0.000 n=9+9)
KV/Scan10_SQL-8    247µs ± 0%   240µs ± 1%  -2.59%  (p=0.000 n=10+10)
KV/Scan100_SQL-8   444µs ± 0%   439µs ± 0%  -1.06%  (p=0.000 n=10+9)

There is a little more performance if we increase the initial stack size to 32 KB:

~/Development/go/src/github.com/cockroachdb/cockroach/pkg/sql master benchstat out.old out.new2
name              old time/op  new time/op  delta
KV/Scan1_SQL-8     243µs ± 1%   209µs ± 1%  -13.76%  (p=0.000 n=9+9)
KV/Scan10_SQL-8    263µs ± 0%   232µs ± 2%  -11.61%  (p=0.000 n=9+10)
KV/Scan100_SQL-8   463µs ± 0%   445µs ± 4%   -3.86%  (p=0.000 n=10+9)

Interestingly, all of these timings are lower than with go1.8.3.

@petermattis
Copy link
Author

Interestingly, all of these timings are lower than with go1.8.3.

Nothing to see here. This appears to be due to a change on our code between what I tested earlier today and now.

@bradfitz bradfitz added early-in-cycle A change that should be done early in the 3 month dev cycle. and removed early-in-cycle A change that should be done early in the 3 month dev cycle. labels Jun 14, 2017
@bradfitz bradfitz modified the milestones: Go1.10Early, Go1.10 Jun 14, 2017
@petermattis
Copy link
Author

I did some more testing of this patch and the performance improvements carry over to production settings. morestack disappears from profiles. Note this is using a version of the patch which uses a 32KB initial stack size.

@petermattis
Copy link
Author

It is early in the 1.10 cycle and wanted to bring this issue forward again. See cockroachdb/cockroach#17242 for a graph showing the benefit of a larger initial stack size.

@petermattis
Copy link
Author

Is there any update on this issue? A larger initial goroutine stack size provides a nice performance boost for our system.

@randall77 randall77 removed the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 11, 2023
@richardartoul
Copy link

Just chiming in that I'm still working around this issue in my performance-sensitive applications manually for the same reasons mentioned above, an average over the entire application is not reflective of the needs of my critical path.

rabbbit added a commit to uber-go/zap that referenced this issue Jul 25, 2023
We have identified an issue where zap.Any can cause performance
degradation due to stack size.

This is apparently cased by the compiler assigning 4.8kb (a zap.Field
per arm of the switch statement) for zap.Any on stack. This can result
in an unnecessary runtime.newstack+runtime.copystack.
A github issue against Go language is pending.

This can be particularly bad if `zap.Any` was to be used in a new
goroutine, since the default goroutine sizes can be as low as 2kb (it can
vary depending on average stack size - see golang/go#18138).

This is an alternative to #1301, @CDVR and me were talking about this,
and he inspired this idea with the closure.

By using a function and a closure we're able to reduce the size and
remove the degradation.
At least on my laptop, this change result in a new performance gain,
as all benchmarks show reduced time.

Before:
```
❯  go test -bench BenchmarkAny -benchmem -run Any

goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAnyInGoroutine/any-12           3977013               316.1 ns/op            64 B/op          1 allocs/op
BenchmarkAnyInGoroutine/int-12           4172178               289.6 ns/op            64 B/op          1 allocs/op
BenchmarkAnyInGoroutine/goroutine-12     5018606               253.1 ns/op             0 B/op          0 allocs/op
BenchmarkAnyInGoroutine/int-in-go-12     2167634               561.5 ns/op            88 B/op          2 allocs/op
BenchmarkAnyInGoroutine/any-in-go-12     1875784               637.6 ns/op            88 B/op          2 allocs/op
BenchmarkAnyInGoroutine/int-in-go-with-stack-12                  2500544               446.7 ns/op            88 B/op          2 allocs/op
BenchmarkAnyInGoroutine/any-in-go-with-stack-12                  1769258               653.0 ns/op            88 B/op          2 allocs/op
PASS
```

After:
```
❯  go test -bench BenchmarkAny -benchmem -run Any

goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAnyInGoroutine/any-12           3967261               288.2 ns/op            64 B/op          1 allocs/op
BenchmarkAnyInGoroutine/int-12           4571604               260.6 ns/op            64 B/op          1 allocs/op
BenchmarkAnyInGoroutine/goroutine-12     5481904               216.4 ns/op             0 B/op          0 allocs/op
BenchmarkAnyInGoroutine/int-in-go-12     2097002               583.1 ns/op            88 B/op          2 allocs/op
BenchmarkAnyInGoroutine/any-in-go-12     2164167               551.6 ns/op            88 B/op          2 allocs/op
BenchmarkAnyInGoroutine/int-in-go-with-stack-12                  2989617               378.2 ns/op            88 B/op          2 allocs/op
BenchmarkAnyInGoroutine/any-in-go-with-stack-12                  3123987               381.8 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 11.486s
```

10 runs.
```
❯ benchstat before.txt after.txt
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
                                       │ before.txt  │              after.txt              │
                                       │   sec/op    │   sec/op     vs base                │
AnyInGoroutine/any-12                    305.2n ± 3%   290.0n ± 1%   -5.00% (p=0.000 n=10)
AnyInGoroutine/int-12                    288.0n ± 0%   265.5n ± 1%   -7.85% (p=0.000 n=10)
AnyInGoroutine/goroutine-12              218.3n ± 6%   216.9n ± 2%        ~ (p=0.469 n=10)
AnyInGoroutine/int-in-go-12              592.7n ± 2%   578.5n ± 0%   -2.40% (p=0.001 n=10)
AnyInGoroutine/any-in-go-12              666.5n ± 4%   557.8n ± 1%  -16.30% (p=0.000 n=10)
AnyInGoroutine/int-in-go-with-stack-12   474.4n ± 4%   470.3n ± 6%        ~ (p=0.631 n=10)
AnyInGoroutine/any-in-go-with-stack-12   617.6n ± 4%   475.9n ± 3%  -22.95% (p=0.000 n=10)
geomean                                  417.8n        382.8n        -8.36%

                                       │  before.txt  │              after.txt              │
                                       │     B/op     │    B/op     vs base                 │
AnyInGoroutine/any-12                    64.00 ± 0%     64.00 ± 0%       ~ (p=1.000 n=10) ¹
AnyInGoroutine/int-12                    64.00 ± 0%     64.00 ± 0%       ~ (p=1.000 n=10) ¹
AnyInGoroutine/goroutine-12              0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
AnyInGoroutine/int-in-go-12              88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10)
AnyInGoroutine/any-in-go-12              88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
AnyInGoroutine/int-in-go-with-stack-12   88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
AnyInGoroutine/any-in-go-with-stack-12   88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
geomean                                             ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                                       │  before.txt  │              after.txt              │
                                       │  allocs/op   │ allocs/op   vs base                 │
AnyInGoroutine/any-12                    1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
AnyInGoroutine/int-12                    1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
AnyInGoroutine/goroutine-12              0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
AnyInGoroutine/int-in-go-12              2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
AnyInGoroutine/any-in-go-12              2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
AnyInGoroutine/int-in-go-with-stack-12   2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
AnyInGoroutine/any-in-go-with-stack-12   2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean                                             ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean
```
rabbbit added a commit to uber-go/zap that referenced this issue Jul 25, 2023
We have identified an issue where zap.Any can cause performance
degradation due to stack size.

This is apparently cased by the compiler assigning 4.8kb (a zap.Field
per arm of the switch statement) for zap.Any on stack. This can result
in an unnecessary runtime.newstack+runtime.copystack.
A github issue against Go language is pending.

This can be particularly bad if `zap.Any` was to be used in a new
goroutine, since the default goroutine sizes can be as low as 2kb (it can
vary depending on average stack size - see golang/go#18138).

This is an alternative to #1301, @CDVR and me were talking about this,
and he inspired this idea with the closure.

By using a function and a closure we're able to reduce the size and
remove the degradation.
At least on my laptop, this change result in a new performance gain,
as all benchmarks show reduced time.

10 runs.
```
❯ benchstat ~/before.txt ~/after-after.txt
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
                                       │ /Users/pawel/before.txt │     /Users/pawel/after-after.txt      │
                                       │         sec/op          │   sec/op     vs base                  │
AnyInGoroutine/any-12                                305.2n ± 3%   297.0n ± 0%        ~ (p=0.085 n=10)
AnyInGoroutine/int-12                                288.0n ± 0%   270.1n ± 1%   -6.25% (p=0.000 n=10)
AnyInGoroutine/goroutine-12                          218.3n ± 6%   209.5n ± 5%   -4.05% (p=0.015 n=10)
AnyInGoroutine/int-in-go-12                          592.7n ± 2%   573.9n ± 2%   -3.17% (p=0.000 n=10)
AnyInGoroutine/any-in-go-12                          666.5n ± 4%   552.3n ± 1%  -17.13% (p=0.000 n=10)
AnyInGoroutine/int-in-go-with-stack-12               474.4n ± 4%   459.4n ± 6%        ~ (p=0.447 n=10+9)
AnyInGoroutine/any-in-go-with-stack-12               617.6n ± 4%   468.8n ± 4%  -24.09% (p=0.000 n=10)
geomean                                              417.8n        380.1n        -9.01%

                                       │ /Users/pawel/before.txt │     /Users/pawel/after-after.txt      │
                                       │          B/op           │    B/op     vs base                   │
AnyInGoroutine/any-12                               64.00 ± 0%     64.00 ± 0%       ~ (p=1.000 n=10)   ¹
AnyInGoroutine/int-12                               64.00 ± 0%     64.00 ± 0%       ~ (p=1.000 n=10)   ¹
AnyInGoroutine/goroutine-12                         0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10)   ¹
AnyInGoroutine/int-in-go-12                         88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10)
AnyInGoroutine/any-in-go-12                         88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10)
AnyInGoroutine/int-in-go-with-stack-12              88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10+9) ¹
AnyInGoroutine/any-in-go-with-stack-12              88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10)   ¹
geomean                                                        ²               +0.00%                  ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                                       │ /Users/pawel/before.txt │     /Users/pawel/after-after.txt      │
                                       │        allocs/op        │ allocs/op   vs base                   │
AnyInGoroutine/any-12                               1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10)   ¹
AnyInGoroutine/int-12                               1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10)   ¹
AnyInGoroutine/goroutine-12                         0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10)   ¹
AnyInGoroutine/int-in-go-12                         2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10)   ¹
AnyInGoroutine/any-in-go-12                         2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10)   ¹
AnyInGoroutine/int-in-go-with-stack-12              2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10+9) ¹
AnyInGoroutine/any-in-go-with-stack-12              2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10)   ¹
geomean                                                        ²               +0.00%                  ²
¹ all samples are equal
² summaries must be >0 to compute geomean
```
rabbbit added a commit to uber-go/zap that referenced this issue Jul 26, 2023
We have identified an issue where zap.Any can cause performance
degradation due to stack size.

This is apparently cased by the compiler assigning 4.8kb (a zap.Field
per arm of the switch statement) for zap.Any on stack. This can result
in an unnecessary runtime.newstack+runtime.copystack.
A github issue against Go language is pending.

This can be particularly bad if `zap.Any` was to be used in a new
goroutine, since the default goroutine sizes can be as low as 2kb (it can
vary depending on average stack size - see golang/go#18138).

This is an alternative to #1301, @CDVR and me were talking about this,
and he inspired this idea with the closure.

By using a function and a closure we're able to reduce the size and
remove the degradation.
At least on my laptop, this change result in a new performance gain,
as all benchmarks show reduced time.

10 runs.
```
❯ benchstat ~/before.txt ~/after-after.txt
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
                                       │ /Users/pawel/before.txt │     /Users/pawel/after-after.txt      │
                                       │         sec/op          │   sec/op     vs base                  │
AnyInGoroutine/any-12                                305.2n ± 3%   297.0n ± 0%        ~ (p=0.085 n=10)
AnyInGoroutine/int-12                                288.0n ± 0%   270.1n ± 1%   -6.25% (p=0.000 n=10)
AnyInGoroutine/goroutine-12                          218.3n ± 6%   209.5n ± 5%   -4.05% (p=0.015 n=10)
AnyInGoroutine/int-in-go-12                          592.7n ± 2%   573.9n ± 2%   -3.17% (p=0.000 n=10)
AnyInGoroutine/any-in-go-12                          666.5n ± 4%   552.3n ± 1%  -17.13% (p=0.000 n=10)
AnyInGoroutine/int-in-go-with-stack-12               474.4n ± 4%   459.4n ± 6%        ~ (p=0.447 n=10+9)
AnyInGoroutine/any-in-go-with-stack-12               617.6n ± 4%   468.8n ± 4%  -24.09% (p=0.000 n=10)
geomean                                              417.8n        380.1n        -9.01%

                                       │ /Users/pawel/before.txt │     /Users/pawel/after-after.txt      │
                                       │          B/op           │    B/op     vs base                   │
AnyInGoroutine/any-12                               64.00 ± 0%     64.00 ± 0%       ~ (p=1.000 n=10)   ¹
AnyInGoroutine/int-12                               64.00 ± 0%     64.00 ± 0%       ~ (p=1.000 n=10)   ¹
AnyInGoroutine/goroutine-12                         0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10)   ¹
AnyInGoroutine/int-in-go-12                         88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10)
AnyInGoroutine/any-in-go-12                         88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10)
AnyInGoroutine/int-in-go-with-stack-12              88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10+9) ¹
AnyInGoroutine/any-in-go-with-stack-12              88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10)   ¹
geomean                                                        ²               +0.00%                  ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                                       │ /Users/pawel/before.txt │     /Users/pawel/after-after.txt      │
                                       │        allocs/op        │ allocs/op   vs base                   │
AnyInGoroutine/any-12                               1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10)   ¹
AnyInGoroutine/int-12                               1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10)   ¹
AnyInGoroutine/goroutine-12                         0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10)   ¹
AnyInGoroutine/int-in-go-12                         2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10)   ¹
AnyInGoroutine/any-in-go-12                         2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10)   ¹
AnyInGoroutine/int-in-go-with-stack-12              2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10+9) ¹
AnyInGoroutine/any-in-go-with-stack-12              2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10)   ¹
geomean                                                        ²               +0.00%                  ²
¹ all samples are equal
² summaries must be >0 to compute geomean
```
rabbbit added a commit to uber-go/zap that referenced this issue Jul 26, 2023
We have identified an issue where zap.Any can cause performance
degradation due to stack size.

This is apparently cased by the compiler assigning 4.8kb (a zap.Field
per arm of the switch statement) for zap.Any on stack. This can result
in an unnecessary runtime.newstack+runtime.copystack.
A github issue against Go language is pending.

This can be particularly bad if `zap.Any` was to be used in a new
goroutine, since the default goroutine sizes can be as low as 2kb (it can
vary depending on average stack size - see golang/go#18138).

This is an alternative to #1301, @CDVR and me were talking about this,
and he inspired this idea with the closure.

By using a function and a closure we're able to reduce the size and
remove the degradation.
At least on my laptop, this change result in a new performance gain,
as all benchmarks show reduced time.

10 runs.
```
❯ benchstat ~/before.txt ~/after-after.txt
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
                                       │ /Users/pawel/before.txt │     /Users/pawel/after-after.txt      │
                                       │         sec/op          │   sec/op     vs base                  │
AnyInGoroutine/any-12                                305.2n ± 3%   297.0n ± 0%        ~ (p=0.085 n=10)
AnyInGoroutine/int-12                                288.0n ± 0%   270.1n ± 1%   -6.25% (p=0.000 n=10)
AnyInGoroutine/goroutine-12                          218.3n ± 6%   209.5n ± 5%   -4.05% (p=0.015 n=10)
AnyInGoroutine/int-in-go-12                          592.7n ± 2%   573.9n ± 2%   -3.17% (p=0.000 n=10)
AnyInGoroutine/any-in-go-12                          666.5n ± 4%   552.3n ± 1%  -17.13% (p=0.000 n=10)
AnyInGoroutine/int-in-go-with-stack-12               474.4n ± 4%   459.4n ± 6%        ~ (p=0.447 n=10+9)
AnyInGoroutine/any-in-go-with-stack-12               617.6n ± 4%   468.8n ± 4%  -24.09% (p=0.000 n=10)
geomean                                              417.8n        380.1n        -9.01%

                                       │ /Users/pawel/before.txt │     /Users/pawel/after-after.txt      │
                                       │          B/op           │    B/op     vs base                   │
AnyInGoroutine/any-12                               64.00 ± 0%     64.00 ± 0%       ~ (p=1.000 n=10)   ¹
AnyInGoroutine/int-12                               64.00 ± 0%     64.00 ± 0%       ~ (p=1.000 n=10)   ¹
AnyInGoroutine/goroutine-12                         0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10)   ¹
AnyInGoroutine/int-in-go-12                         88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10)
AnyInGoroutine/any-in-go-12                         88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10)
AnyInGoroutine/int-in-go-with-stack-12              88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10+9) ¹
AnyInGoroutine/any-in-go-with-stack-12              88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10)   ¹
geomean                                                        ²               +0.00%                  ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                                       │ /Users/pawel/before.txt │     /Users/pawel/after-after.txt      │
                                       │        allocs/op        │ allocs/op   vs base                   │
AnyInGoroutine/any-12                               1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10)   ¹
AnyInGoroutine/int-12                               1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10)   ¹
AnyInGoroutine/goroutine-12                         0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10)   ¹
AnyInGoroutine/int-in-go-12                         2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10)   ¹
AnyInGoroutine/any-in-go-12                         2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10)   ¹
AnyInGoroutine/int-in-go-with-stack-12              2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10+9) ¹
AnyInGoroutine/any-in-go-with-stack-12              2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10)   ¹
geomean                                                        ²               +0.00%                  ²
¹ all samples are equal
² summaries must be >0 to compute geomean
```
rabbbit added a commit to uber-go/zap that referenced this issue Jul 26, 2023
We have identified an issue where zap.Any can cause performance
degradation due to stack size.

This is apparently cased by the compiler assigning 4.8kb (a zap.Field
per arm of the switch statement) for zap.Any on stack. This can result
in an unnecessary runtime.newstack+runtime.copystack.
A github issue against Go language is pending.

This can be particularly bad if `zap.Any` was to be used in a new
goroutine, since the default goroutine sizes can be as low as 2kb (it can
vary depending on average stack size - see golang/go#18138).

This is an alternative to #1301, @CDVR and me were talking about this,
and he inspired this idea with the closure.

By using a function and a closure we're able to reduce the size and
remove the degradation.
At least on my laptop, this change result in a new performance gain,
as all benchmarks show reduced time.

10 runs.
```
❯ benchstat ~/before.txt ~/after-after.txt
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
                                       │ /Users/pawel/before.txt │     /Users/pawel/after-after.txt      │
                                       │         sec/op          │   sec/op     vs base                  │
AnyInGoroutine/any-12                                305.2n ± 3%   297.0n ± 0%        ~ (p=0.085 n=10)
AnyInGoroutine/int-12                                288.0n ± 0%   270.1n ± 1%   -6.25% (p=0.000 n=10)
AnyInGoroutine/goroutine-12                          218.3n ± 6%   209.5n ± 5%   -4.05% (p=0.015 n=10)
AnyInGoroutine/int-in-go-12                          592.7n ± 2%   573.9n ± 2%   -3.17% (p=0.000 n=10)
AnyInGoroutine/any-in-go-12                          666.5n ± 4%   552.3n ± 1%  -17.13% (p=0.000 n=10)
AnyInGoroutine/int-in-go-with-stack-12               474.4n ± 4%   459.4n ± 6%        ~ (p=0.447 n=10+9)
AnyInGoroutine/any-in-go-with-stack-12               617.6n ± 4%   468.8n ± 4%  -24.09% (p=0.000 n=10)
geomean                                              417.8n        380.1n        -9.01%

                                       │ /Users/pawel/before.txt │     /Users/pawel/after-after.txt      │
                                       │          B/op           │    B/op     vs base                   │
AnyInGoroutine/any-12                               64.00 ± 0%     64.00 ± 0%       ~ (p=1.000 n=10)   ¹
AnyInGoroutine/int-12                               64.00 ± 0%     64.00 ± 0%       ~ (p=1.000 n=10)   ¹
AnyInGoroutine/goroutine-12                         0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10)   ¹
AnyInGoroutine/int-in-go-12                         88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10)
AnyInGoroutine/any-in-go-12                         88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10)
AnyInGoroutine/int-in-go-with-stack-12              88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10+9) ¹
AnyInGoroutine/any-in-go-with-stack-12              88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10)   ¹
geomean                                                        ²               +0.00%                  ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                                       │ /Users/pawel/before.txt │     /Users/pawel/after-after.txt      │
                                       │        allocs/op        │ allocs/op   vs base                   │
AnyInGoroutine/any-12                               1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10)   ¹
AnyInGoroutine/int-12                               1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10)   ¹
AnyInGoroutine/goroutine-12                         0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10)   ¹
AnyInGoroutine/int-in-go-12                         2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10)   ¹
AnyInGoroutine/any-in-go-12                         2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10)   ¹
AnyInGoroutine/int-in-go-with-stack-12              2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10+9) ¹
AnyInGoroutine/any-in-go-with-stack-12              2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10)   ¹
geomean                                                        ²               +0.00%                  ²
¹ all samples are equal
² summaries must be >0 to compute geomean
```
rabbbit added a commit to uber-go/zap that referenced this issue Jul 26, 2023
We have identified an issue where zap.Any can cause performance
degradation due to stack size.

This is apparently cased by the compiler assigning 4.8kb (a zap.Field
per arm of the switch statement) for zap.Any on stack. This can result
in an unnecessary runtime.newstack+runtime.copystack.
A github issue against Go language is pending.

This can be particularly bad if `zap.Any` was to be used in a new
goroutine, since the default goroutine sizes can be as low as 2kb (it can
vary depending on average stack size - see golang/go#18138).

This is an alternative to #1301, @CDVR and me were talking about this,
and he inspired this idea with the closure.

By using a function and a closure we're able to reduce the size and
remove the degradation.
At least on my laptop, this change result in a new performance gain,
as all benchmarks show reduced time.

10 runs.
```
❯ benchstat ~/before2.txt ~/after2.txt
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
                            │ /Users/pawel/before2.txt │       /Users/pawel/after2.txt        │
                            │          sec/op          │    sec/op     vs base                │
Any/str-no-logger-12                      3.344n ±  1%   3.029n ±  1%   -9.40% (p=0.000 n=10)
Any/any-no-logger-12                      13.80n ±  4%   18.67n ±  1%  +35.29% (p=0.000 n=10)
Any/str-with-logger-12                    372.4n ±  3%   363.6n ±  1%   -2.35% (p=0.001 n=10)
Any/any-with-logger-12                    369.2n ±  1%   363.6n ±  1%   -1.52% (p=0.002 n=10)
Any/str-in-go-12                          587.2n ±  2%   587.0n ±  1%        ~ (p=0.617 n=10)
Any/any-in-go-12                          666.5n ±  3%   567.6n ±  1%  -14.85% (p=0.000 n=10)
Any/str-in-go-with-stack-12               448.6n ± 18%   403.4n ± 13%        ~ (p=0.280 n=10)
Any/any-in-go-with-stack-12               564.9n ±  7%   443.2n ±  4%  -21.55% (p=0.000 n=10)
geomean                                   167.8n         160.7n         -4.23%

                            │ /Users/pawel/before2.txt │       /Users/pawel/after2.txt       │
                            │           B/op           │    B/op     vs base                 │
Any/str-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-with-logger-12                    64.00 ± 0%     64.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-with-logger-12                    64.00 ± 0%     64.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-12                          88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-12                          88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-with-stack-12               88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-with-stack-12               88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
geomean                                              ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                            │ /Users/pawel/before2.txt │       /Users/pawel/after2.txt       │
                            │        allocs/op         │ allocs/op   vs base                 │
Any/str-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-with-logger-12                    1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-with-logger-12                    1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-12                          2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-12                          2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-with-stack-12               2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-with-stack-12               2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean
```
rabbbit added a commit to uber-go/zap that referenced this issue Jul 27, 2023
We have identified an issue where zap.Any can cause performance
degradation due to stack size.

This is apparently cased by the compiler assigning 4.8kb (a zap.Field
per arm of the switch statement) for zap.Any on stack. This can result
in an unnecessary runtime.newstack+runtime.copystack.
A github issue against Go language is pending.

This can be particularly bad if `zap.Any` was to be used in a new
goroutine, since the default goroutine sizes can be as low as 2kb (it can
vary depending on average stack size - see golang/go#18138).

This is an alternative to #1301, @CDVR and me were talking about this,
and he inspired this idea with the closure.

By using a function and a closure we're able to reduce the size and
remove the degradation.
At least on my laptop, this change result in a new performance gain,
as all benchmarks show reduced time.

10 runs.
```
❯ benchstat ~/before2.txt ~/after2.txt
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
                            │ /Users/pawel/before2.txt │       /Users/pawel/after2.txt        │
                            │          sec/op          │    sec/op     vs base                │
Any/str-no-logger-12                      3.344n ±  1%   3.029n ±  1%   -9.40% (p=0.000 n=10)
Any/any-no-logger-12                      13.80n ±  4%   18.67n ±  1%  +35.29% (p=0.000 n=10)
Any/str-with-logger-12                    372.4n ±  3%   363.6n ±  1%   -2.35% (p=0.001 n=10)
Any/any-with-logger-12                    369.2n ±  1%   363.6n ±  1%   -1.52% (p=0.002 n=10)
Any/str-in-go-12                          587.2n ±  2%   587.0n ±  1%        ~ (p=0.617 n=10)
Any/any-in-go-12                          666.5n ±  3%   567.6n ±  1%  -14.85% (p=0.000 n=10)
Any/str-in-go-with-stack-12               448.6n ± 18%   403.4n ± 13%        ~ (p=0.280 n=10)
Any/any-in-go-with-stack-12               564.9n ±  7%   443.2n ±  4%  -21.55% (p=0.000 n=10)
geomean                                   167.8n         160.7n         -4.23%

                            │ /Users/pawel/before2.txt │       /Users/pawel/after2.txt       │
                            │           B/op           │    B/op     vs base                 │
Any/str-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-with-logger-12                    64.00 ± 0%     64.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-with-logger-12                    64.00 ± 0%     64.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-12                          88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-12                          88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-with-stack-12               88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-with-stack-12               88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
geomean                                              ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                            │ /Users/pawel/before2.txt │       /Users/pawel/after2.txt       │
                            │        allocs/op         │ allocs/op   vs base                 │
Any/str-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-with-logger-12                    1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-with-logger-12                    1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-12                          2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-12                          2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-with-stack-12               2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-with-stack-12               2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean
```

On absolute terms:

Before:
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/str-with-logger             3154850               387.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           3239221               371.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           3273285               363.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           3251991               372.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             2944020               401.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           2984863               368.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           3265248               363.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           3301592               365.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    764239              1423 ns/op             140 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                 1510189               753.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 3013986               369.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 2128927               540.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    464083              2551 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  818104              1347 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 1587925               698.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2452558               466.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         767626              1440 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2      1534382               771.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      2384058               433.4 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      3146942               450.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         434194              2524 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       851312              1304 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      1570944               710.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2546115               604.6 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 50.238s
```

After:
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/str-with-logger             3191725               382.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           3159882               367.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           2998960               373.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           3264657               361.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             3168627               386.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           3169394               364.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           3271981               368.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           3293463               362.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    793905              1388 ns/op             143 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                 1724048               748.4 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 2536380               444.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 2177941               586.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    890155              1237 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                 1836302               719.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 3671503               322.2 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2257405               540.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         811408              1457 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2      1384990               729.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      3228151               381.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      2678596               450.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         821092              1386 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2      1747638               662.3 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      3747934               341.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2678191               463.6 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 53.238s
```
rabbbit added a commit to uber-go/zap that referenced this issue Jul 27, 2023
This is an alternative to #1301 and #1302. It's not as fast as these two
options, but it still gives us half the stack reduction without the
`unsafe` usage.

Interestingly it seems that on both arm64 and amd64 the new code, with
the closure, is faster than the plain old switch.
We do see a ~5-10ns delay on `Any` creation if it's used without
`logger`, but that's minimal and not realistic.

Bunch of credit for this goes to @cdvr1993, we started independently,
I was about to give up but the conversations pushed me forward. In the
end he ended up going into a more advanced land where I dare not to enter.

Longer version:

We have identified an issue where zap.Any can cause performance
degradation due to stack size.

This is apparently cased by the compiler assigning 4.8kb (a zap.Field
per arm of the switch statement) for zap.Any on stack. This can result
in an unnecessary runtime.newstack+runtime.copystack.
A github issue against Go language is pending.

This can be particularly bad if `zap.Any` was to be used in a new
goroutine, since the default goroutine sizes can be as low as 2kb (it can
vary depending on average stack size - see golang/go#18138).

This is an alternative to #1301, @CDVR and me were talking about this,
and he inspired this idea with the closure.

By using a function and a closure we're able to reduce the size and
remove the degradation.
At least on my laptop, this change result in a new performance gain,
as all benchmarks show reduced time.

10 runs.
```
❯ benchstat ~/before2.txt ~/after2.txt
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
                            │ /Users/pawel/before2.txt │       /Users/pawel/after2.txt        │
                            │          sec/op          │    sec/op     vs base                │
Any/str-no-logger-12                      3.344n ±  1%   3.029n ±  1%   -9.40% (p=0.000 n=10)
Any/any-no-logger-12                      13.80n ±  4%   18.67n ±  1%  +35.29% (p=0.000 n=10)
Any/str-with-logger-12                    372.4n ±  3%   363.6n ±  1%   -2.35% (p=0.001 n=10)
Any/any-with-logger-12                    369.2n ±  1%   363.6n ±  1%   -1.52% (p=0.002 n=10)
Any/str-in-go-12                          587.2n ±  2%   587.0n ±  1%        ~ (p=0.617 n=10)
Any/any-in-go-12                          666.5n ±  3%   567.6n ±  1%  -14.85% (p=0.000 n=10)
Any/str-in-go-with-stack-12               448.6n ± 18%   403.4n ± 13%        ~ (p=0.280 n=10)
Any/any-in-go-with-stack-12               564.9n ±  7%   443.2n ±  4%  -21.55% (p=0.000 n=10)
geomean                                   167.8n         160.7n         -4.23%

                            │ /Users/pawel/before2.txt │       /Users/pawel/after2.txt       │
                            │           B/op           │    B/op     vs base                 │
Any/str-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-with-logger-12                    64.00 ± 0%     64.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-with-logger-12                    64.00 ± 0%     64.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-12                          88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-12                          88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-with-stack-12               88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-with-stack-12               88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
geomean                                              ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                            │ /Users/pawel/before2.txt │       /Users/pawel/after2.txt       │
                            │        allocs/op         │ allocs/op   vs base                 │
Any/str-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-with-logger-12                    1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-with-logger-12                    1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-12                          2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-12                          2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-with-stack-12               2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-with-stack-12               2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean
```

On absolute terms:

Before, on arm64:
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/str-with-logger             3154850               387.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           3239221               371.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           3273285               363.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           3251991               372.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             2944020               401.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           2984863               368.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           3265248               363.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           3301592               365.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    764239              1423 ns/op             140 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                 1510189               753.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 3013986               369.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 2128927               540.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    464083              2551 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  818104              1347 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 1587925               698.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2452558               466.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         767626              1440 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2      1534382               771.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      2384058               433.4 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      3146942               450.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         434194              2524 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       851312              1304 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      1570944               710.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2546115               604.6 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 50.238s
```

After:
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/str-with-logger             3191725               382.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           3159882               367.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           2998960               373.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           3264657               361.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             3168627               386.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           3169394               364.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           3271981               368.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           3293463               362.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    793905              1388 ns/op             143 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                 1724048               748.4 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 2536380               444.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 2177941               586.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    890155              1237 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                 1836302               719.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 3671503               322.2 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2257405               540.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         811408              1457 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2      1384990               729.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      3228151               381.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      2678596               450.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         821092              1386 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2      1747638               662.3 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      3747934               341.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2678191               463.6 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 53.238s
```

And amd64, before:
```
 % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8
goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
BenchmarkAny/str-no-logger              100000000               11.58 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-2            100000000               11.52 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-4            100000000               11.56 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-8            100000000               11.50 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger              39399811                30.35 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-2            39448304                30.63 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-4            39647024                30.32 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-8            39479619                30.46 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-with-logger             1798702               669.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           1862551               647.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           1848636               642.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           1878465               656.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             1776140               684.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           1868102               668.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           1869589               639.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           1782540               648.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    458112              2594 ns/op              91 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                  820398              1344 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 1392148               969.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 1790403               644.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    220327              4897 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  494391              2701 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                  823672              1399 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 1591206               746.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         384094              2820 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2       809073              1530 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      1464598               933.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      1943251               578.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         233019              4967 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       356689              2848 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4       791342              1385 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      1662126               746.0 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 51.671s
```

After:
```
goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
BenchmarkAny/str-no-logger              100000000               11.65 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-2            100000000               11.64 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-4            100000000               11.63 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-8            100000000               11.65 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger              27779637                44.20 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-2            27881986                42.96 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-4            27587953                43.39 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-8            26861058                43.43 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-with-logger             1749990               690.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           1807341               660.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           1821039               654.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           1865083               650.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             1677643               741.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           1905400               689.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           1843364               646.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           1899883               645.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    453326              2479 ns/op              92 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                  724555              1580 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 1358790               953.3 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 1805985               585.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    466447              2395 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  874053              1487 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 1457768               834.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 1795317               632.5 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         407620              2749 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2       725614              1597 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      1303908               863.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      1957864               609.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         497640              2401 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       648355              1549 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      1486416               869.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2116040               568.8 ns/op            88 B/op          2 allocs/op
PASS
```
rabbbit added a commit to uber-go/zap that referenced this issue Jul 28, 2023
This is an alternative to #1301 and #1302. It's not as fast as these two
options, but it still gives us half the stack reduction without the
`unsafe` usage.

Interestingly it seems that on both arm64 and amd64 the new code, with
the closure, is faster than the plain old switch.
We do see a ~5-10ns delay on `Any` creation if it's used without
`logger`, but that's minimal and not realistic.

Bunch of credit for this goes to @cdvr1993, we started independently,
I was about to give up but the conversations pushed me forward. In the
end he ended up going into a more advanced land where I dare not to enter.

Longer version:

We have identified an issue where zap.Any can cause performance
degradation due to stack size.

This is apparently cased by the compiler assigning 4.8kb (a zap.Field
per arm of the switch statement) for zap.Any on stack. This can result
in an unnecessary runtime.newstack+runtime.copystack.
A github issue against Go language is pending.

This can be particularly bad if `zap.Any` was to be used in a new
goroutine, since the default goroutine sizes can be as low as 2kb (it can
vary depending on average stack size - see golang/go#18138).

This is an alternative to #1301, @CDVR and me were talking about this,
and he inspired this idea with the closure.

By using a function and a closure we're able to reduce the size and
remove the degradation.
At least on my laptop, this change result in a new performance gain,
as all benchmarks show reduced time.

10 runs.
```
❯ benchstat ~/before2.txt ~/after2.txt
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
                            │ /Users/pawel/before2.txt │       /Users/pawel/after2.txt        │
                            │          sec/op          │    sec/op     vs base                │
Any/str-no-logger-12                      3.344n ±  1%   3.029n ±  1%   -9.40% (p=0.000 n=10)
Any/any-no-logger-12                      13.80n ±  4%   18.67n ±  1%  +35.29% (p=0.000 n=10)
Any/str-with-logger-12                    372.4n ±  3%   363.6n ±  1%   -2.35% (p=0.001 n=10)
Any/any-with-logger-12                    369.2n ±  1%   363.6n ±  1%   -1.52% (p=0.002 n=10)
Any/str-in-go-12                          587.2n ±  2%   587.0n ±  1%        ~ (p=0.617 n=10)
Any/any-in-go-12                          666.5n ±  3%   567.6n ±  1%  -14.85% (p=0.000 n=10)
Any/str-in-go-with-stack-12               448.6n ± 18%   403.4n ± 13%        ~ (p=0.280 n=10)
Any/any-in-go-with-stack-12               564.9n ±  7%   443.2n ±  4%  -21.55% (p=0.000 n=10)
geomean                                   167.8n         160.7n         -4.23%

                            │ /Users/pawel/before2.txt │       /Users/pawel/after2.txt       │
                            │           B/op           │    B/op     vs base                 │
Any/str-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-with-logger-12                    64.00 ± 0%     64.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-with-logger-12                    64.00 ± 0%     64.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-12                          88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-12                          88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-with-stack-12               88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-with-stack-12               88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
geomean                                              ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                            │ /Users/pawel/before2.txt │       /Users/pawel/after2.txt       │
                            │        allocs/op         │ allocs/op   vs base                 │
Any/str-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-with-logger-12                    1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-with-logger-12                    1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-12                          2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-12                          2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-with-stack-12               2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-with-stack-12               2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean
```

On absolute terms:

Before, on arm64:
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/str-with-logger             3154850               387.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           3239221               371.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           3273285               363.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           3251991               372.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             2944020               401.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           2984863               368.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           3265248               363.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           3301592               365.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    764239              1423 ns/op             140 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                 1510189               753.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 3013986               369.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 2128927               540.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    464083              2551 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  818104              1347 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 1587925               698.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2452558               466.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         767626              1440 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2      1534382               771.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      2384058               433.4 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      3146942               450.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         434194              2524 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       851312              1304 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      1570944               710.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2546115               604.6 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 50.238s
```

After:
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/str-with-logger             3191725               382.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           3159882               367.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           2998960               373.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           3264657               361.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             3168627               386.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           3169394               364.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           3271981               368.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           3293463               362.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    793905              1388 ns/op             143 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                 1724048               748.4 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 2536380               444.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 2177941               586.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    890155              1237 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                 1836302               719.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 3671503               322.2 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2257405               540.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         811408              1457 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2      1384990               729.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      3228151               381.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      2678596               450.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         821092              1386 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2      1747638               662.3 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      3747934               341.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2678191               463.6 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 53.238s
```

And amd64, before:
```
 % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8
goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
BenchmarkAny/str-no-logger              100000000               11.58 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-2            100000000               11.52 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-4            100000000               11.56 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-8            100000000               11.50 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger              39399811                30.35 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-2            39448304                30.63 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-4            39647024                30.32 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-8            39479619                30.46 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-with-logger             1798702               669.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           1862551               647.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           1848636               642.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           1878465               656.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             1776140               684.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           1868102               668.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           1869589               639.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           1782540               648.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    458112              2594 ns/op              91 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                  820398              1344 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 1392148               969.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 1790403               644.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    220327              4897 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  494391              2701 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                  823672              1399 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 1591206               746.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         384094              2820 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2       809073              1530 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      1464598               933.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      1943251               578.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         233019              4967 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       356689              2848 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4       791342              1385 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      1662126               746.0 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 51.671s
```

After:
```
goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
BenchmarkAny/str-no-logger              100000000               11.65 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-2            100000000               11.64 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-4            100000000               11.63 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-8            100000000               11.65 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger              27779637                44.20 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-2            27881986                42.96 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-4            27587953                43.39 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-8            26861058                43.43 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-with-logger             1749990               690.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           1807341               660.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           1821039               654.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           1865083               650.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             1677643               741.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           1905400               689.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           1843364               646.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           1899883               645.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    453326              2479 ns/op              92 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                  724555              1580 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 1358790               953.3 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 1805985               585.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    466447              2395 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  874053              1487 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 1457768               834.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 1795317               632.5 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         407620              2749 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2       725614              1597 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      1303908               863.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      1957864               609.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         497640              2401 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       648355              1549 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      1486416               869.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2116040               568.8 ns/op            88 B/op          2 allocs/op
PASS
```
rabbbit added a commit to uber-go/zap that referenced this issue Jul 28, 2023
This is an alternative to #1301 and #1302. It's not as fast as these two
options, but it still gives us half the stack reduction without the
`unsafe` usage.

Interestingly it seems that on both arm64 and amd64 the new code, with
the closure, is faster than the plain old switch.
We do see a ~5-10ns delay on `Any` creation if it's used without
`logger`, but that's minimal and not realistic.

Bunch of credit for this goes to @cdvr1993, we started independently,
I was about to give up but the conversations pushed me forward. In the
end he ended up going into a more advanced land where I dare not to enter.

Longer version:

We have identified an issue where zap.Any can cause performance
degradation due to stack size.

This is apparently cased by the compiler assigning 4.8kb (a zap.Field
per arm of the switch statement) for zap.Any on stack. This can result
in an unnecessary runtime.newstack+runtime.copystack.
A github issue against Go language is pending.

This can be particularly bad if `zap.Any` was to be used in a new
goroutine, since the default goroutine sizes can be as low as 2kb (it can
vary depending on average stack size - see golang/go#18138).

This is an alternative to #1301, @CDVR and me were talking about this,
and he inspired this idea with the closure.

By using a function and a closure we're able to reduce the size and
remove the degradation.
At least on my laptop, this change result in a new performance gain,
as all benchmarks show reduced time.

10 runs.
```
❯ benchstat ~/before2.txt ~/after2.txt
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
                            │ /Users/pawel/before2.txt │       /Users/pawel/after2.txt        │
                            │          sec/op          │    sec/op     vs base                │
Any/str-no-logger-12                      3.344n ±  1%   3.029n ±  1%   -9.40% (p=0.000 n=10)
Any/any-no-logger-12                      13.80n ±  4%   18.67n ±  1%  +35.29% (p=0.000 n=10)
Any/str-with-logger-12                    372.4n ±  3%   363.6n ±  1%   -2.35% (p=0.001 n=10)
Any/any-with-logger-12                    369.2n ±  1%   363.6n ±  1%   -1.52% (p=0.002 n=10)
Any/str-in-go-12                          587.2n ±  2%   587.0n ±  1%        ~ (p=0.617 n=10)
Any/any-in-go-12                          666.5n ±  3%   567.6n ±  1%  -14.85% (p=0.000 n=10)
Any/str-in-go-with-stack-12               448.6n ± 18%   403.4n ± 13%        ~ (p=0.280 n=10)
Any/any-in-go-with-stack-12               564.9n ±  7%   443.2n ±  4%  -21.55% (p=0.000 n=10)
geomean                                   167.8n         160.7n         -4.23%

                            │ /Users/pawel/before2.txt │       /Users/pawel/after2.txt       │
                            │           B/op           │    B/op     vs base                 │
Any/str-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-with-logger-12                    64.00 ± 0%     64.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-with-logger-12                    64.00 ± 0%     64.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-12                          88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-12                          88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-with-stack-12               88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-with-stack-12               88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
geomean                                              ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                            │ /Users/pawel/before2.txt │       /Users/pawel/after2.txt       │
                            │        allocs/op         │ allocs/op   vs base                 │
Any/str-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-with-logger-12                    1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-with-logger-12                    1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-12                          2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-12                          2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-with-stack-12               2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-with-stack-12               2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean
```

On absolute terms:

Before, on arm64:
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/str-with-logger             3154850               387.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           3239221               371.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           3273285               363.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           3251991               372.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             2944020               401.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           2984863               368.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           3265248               363.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           3301592               365.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    764239              1423 ns/op             140 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                 1510189               753.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 3013986               369.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 2128927               540.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    464083              2551 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  818104              1347 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 1587925               698.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2452558               466.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         767626              1440 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2      1534382               771.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      2384058               433.4 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      3146942               450.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         434194              2524 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       851312              1304 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      1570944               710.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2546115               604.6 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 50.238s
```

After:
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/str-with-logger             3191725               382.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           3159882               367.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           2998960               373.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           3264657               361.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             3168627               386.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           3169394               364.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           3271981               368.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           3293463               362.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    793905              1388 ns/op             143 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                 1724048               748.4 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 2536380               444.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 2177941               586.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    890155              1237 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                 1836302               719.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 3671503               322.2 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2257405               540.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         811408              1457 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2      1384990               729.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      3228151               381.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      2678596               450.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         821092              1386 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2      1747638               662.3 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      3747934               341.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2678191               463.6 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 53.238s
```

And amd64, before:
```
 % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8
goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
BenchmarkAny/str-no-logger              100000000               11.58 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-2            100000000               11.52 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-4            100000000               11.56 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-8            100000000               11.50 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger              39399811                30.35 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-2            39448304                30.63 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-4            39647024                30.32 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-8            39479619                30.46 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-with-logger             1798702               669.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           1862551               647.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           1848636               642.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           1878465               656.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             1776140               684.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           1868102               668.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           1869589               639.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           1782540               648.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    458112              2594 ns/op              91 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                  820398              1344 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 1392148               969.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 1790403               644.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    220327              4897 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  494391              2701 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                  823672              1399 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 1591206               746.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         384094              2820 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2       809073              1530 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      1464598               933.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      1943251               578.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         233019              4967 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       356689              2848 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4       791342              1385 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      1662126               746.0 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 51.671s
```

After:
```
goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
BenchmarkAny/str-no-logger              100000000               11.65 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-2            100000000               11.64 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-4            100000000               11.63 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-8            100000000               11.65 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger              27779637                44.20 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-2            27881986                42.96 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-4            27587953                43.39 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-8            26861058                43.43 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-with-logger             1749990               690.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           1807341               660.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           1821039               654.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           1865083               650.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             1677643               741.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           1905400               689.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           1843364               646.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           1899883               645.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    453326              2479 ns/op              92 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                  724555              1580 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 1358790               953.3 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 1805985               585.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    466447              2395 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  874053              1487 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 1457768               834.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 1795317               632.5 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         407620              2749 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2       725614              1597 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      1303908               863.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      1957864               609.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         497640              2401 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       648355              1549 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      1486416               869.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2116040               568.8 ns/op            88 B/op          2 allocs/op
PASS
```
rabbbit added a commit to uber-go/zap that referenced this issue Jul 28, 2023
This is an alternative to:
- #1301 and #1302 and #1304 - a series of PRs that are faster than this
  one. However, they rely on unsafe.
- #1303 - my own PR that uses closures, to reduce the stack size by 60%.

This PR reduces the stack size from:
```
 field.go:420          0xd16c3                 4881ecf8120000          SUBQ $0x12f8, SP   // 4856
```
to
```
  field.go:420          0xcb603                 4881ecb8000000          SUBQ $0xb8, SP // 184
```
so by ~96%. More crucially, `zap.Any` is now as fast as correctly typed
methods, like `zap.String`, etc.

The downside is the (slight) incrase in the code maitenance - we unroll
as much as we can and rely on the compiler correctly re-using small
variable sizes. While this is not pretty, it feels safe - the changes
were purely mechanical. Future changes and extensions should be easy to
review.

Additionally, the new code is (slightly) faster in all cases since we
remove 1-2 function calls from all paths. The "in new goroutine" path is
most affected, as shown in benchmarks below.

This was largely inspired by conversations with @cdvr1993. We started
looking at this in parallel, but I would have given up if it wasn't for
our conversations.
This particular version was inspired by an earlier version of #1304 -
where I realized that @cdvr1993 is doing a similar dispatching mechanism
that zap is already doing via `zapcore` - a possible optimization.

Longer version:

We have identified an issue where zap.Any can cause performance
degradation due to stack size.

This is apparently cased by the compiler assigning 4.8kb (a zap.Field
per arm of the switch statement) for zap.Any on stack. This can result
in an unnecessary runtime.newstack+runtime.copystack.
A github issue against Go language is pending.

This can be particularly bad if `zap.Any` was to be used in a new
goroutine, since the default goroutine sizes can be as low as 2kb (it can
vary depending on average stack size - see golang/go#18138).

*Most crucially, `zap.Any` is now as fast as a direct dispatch like
`zap.String`.*

10 runs.
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
                           │ before-final.txt │           after-final.txt            │
                           │      sec/op      │    sec/op     vs base                │
Any/str-no-logger                3.106n ±  2%   3.160n ±  1%   +1.75% (p=0.025 n=10)
Any/str-no-logger-2              3.171n ±  4%   3.142n ±  1%        ~ (p=0.593 n=10)
Any/str-no-logger-4              3.108n ±  3%   3.139n ±  2%   +0.97% (p=0.004 n=10)
Any/str-no-logger-8              3.099n ±  2%   3.143n ±  2%        ~ (p=0.086 n=10)
Any/any-no-logger                13.89n ±  2%   12.98n ±  2%   -6.59% (p=0.000 n=10)
Any/any-no-logger-2              13.97n ±  2%   12.96n ±  2%   -7.27% (p=0.000 n=10)
Any/any-no-logger-4              13.83n ±  2%   12.89n ±  2%   -6.83% (p=0.000 n=10)
Any/any-no-logger-8              13.77n ±  2%   12.88n ±  2%   -6.43% (p=0.000 n=10)
Any/str-with-logger              384.1n ±  2%   383.9n ±  6%        ~ (p=0.810 n=10)
Any/str-with-logger-2            367.8n ±  2%   368.5n ±  3%        ~ (p=0.971 n=10)
Any/str-with-logger-4            372.4n ±  2%   368.6n ±  4%        ~ (p=0.912 n=10)
Any/str-with-logger-8            369.8n ±  3%   368.3n ±  3%        ~ (p=0.698 n=10)
Any/any-with-logger              383.8n ±  3%   383.3n ±  6%        ~ (p=0.838 n=10)
Any/any-with-logger-2            370.0n ±  3%   367.6n ±  1%        ~ (p=0.239 n=10)
Any/any-with-logger-4            370.0n ±  3%   368.2n ±  4%        ~ (p=0.631 n=10)
Any/any-with-logger-8            367.6n ±  2%   369.7n ±  3%        ~ (p=0.756 n=10)
Any/str-in-go                    1.334µ ±  3%   1.347µ ±  3%        ~ (p=0.271 n=10)
Any/str-in-go-2                  754.5n ±  3%   744.8n ±  5%        ~ (p=0.481 n=10)
Any/str-in-go-4                  420.2n ± 11%   367.7n ± 31%        ~ (p=0.086 n=10)
Any/str-in-go-8                  557.6n ±  4%   547.1n ± 12%        ~ (p=0.579 n=10)
Any/any-in-go                    2.562µ ±  4%   1.447µ ±  3%  -43.53% (p=0.000 n=10)
Any/any-in-go-2                 1361.0n ±  4%   761.4n ±  7%  -44.06% (p=0.000 n=10)
Any/any-in-go-4                  732.1n ±  9%   397.1n ± 11%  -45.76% (p=0.000 n=10)
Any/any-in-go-8                  541.3n ± 13%   564.6n ±  5%   +4.30% (p=0.041 n=10)
Any/str-in-go-with-stack         1.420µ ±  1%   1.428µ ±  3%        ~ (p=0.670 n=10)
Any/str-in-go-with-stack-2       749.5n ±  4%   771.8n ±  4%        ~ (p=0.123 n=10)
Any/str-in-go-with-stack-4       433.2n ± 15%   400.7n ± 14%        ~ (p=0.393 n=10)
Any/str-in-go-with-stack-8       494.0n ±  7%   490.1n ± 10%        ~ (p=0.853 n=10)
Any/any-in-go-with-stack         2.586µ ±  3%   1.471µ ±  4%  -43.14% (p=0.000 n=10)
Any/any-in-go-with-stack-2      1343.0n ±  3%   773.7n ±  4%  -42.39% (p=0.000 n=10)
Any/any-in-go-with-stack-4       697.7n ±  8%   403.4n ±  9%  -42.17% (p=0.000 n=10)
Any/any-in-go-with-stack-8       490.8n ±  9%   492.8n ±  8%        ~ (p=0.796 n=10)
geomean                          206.3n         182.9n        -11.35%
```

On absolute terms:

Before, on arm64:
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/str-with-logger             3154850               387.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           3239221               371.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           3273285               363.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           3251991               372.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             2944020               401.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           2984863               368.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           3265248               363.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           3301592               365.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    764239              1423 ns/op             140 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                 1510189               753.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 3013986               369.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 2128927               540.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    464083              2551 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  818104              1347 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 1587925               698.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2452558               466.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         767626              1440 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2      1534382               771.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      2384058               433.4 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      3146942               450.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         434194              2524 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       851312              1304 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      1570944               710.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2546115               604.6 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 50.238s
```

After:
```
❯  go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8

goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/str-with-logger             3202051               382.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           3301683               371.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           3186028               364.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           3061030               371.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             3203704               378.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           3281462               372.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           3252879               371.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           3246148               373.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    804132              1404 ns/op             133 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                 1686093               758.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 3075596               430.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 2101650               543.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    845822              1424 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                 1531311               736.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 2618665               464.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2130280               536.2 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         818583              1440 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2      1533379               739.4 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      2507131               399.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      2348804               453.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         807199              1526 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2      1590476               783.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      3026263               383.3 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2615467               493.8 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 51.077s
```

And amd64, before:
```
 % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8
goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
BenchmarkAny/str-no-logger              100000000               11.58 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-2            100000000               11.52 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-4            100000000               11.56 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-8            100000000               11.50 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger              39399811                30.35 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-2            39448304                30.63 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-4            39647024                30.32 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-8            39479619                30.46 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-with-logger             1798702               669.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           1862551               647.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           1848636               642.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           1878465               656.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             1776140               684.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           1868102               668.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           1869589               639.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           1782540               648.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    458112              2594 ns/op              91 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                  820398              1344 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 1392148               969.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 1790403               644.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    220327              4897 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  494391              2701 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                  823672              1399 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 1591206               746.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         384094              2820 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2       809073              1530 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      1464598               933.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      1943251               578.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         233019              4967 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       356689              2848 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4       791342              1385 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      1662126               746.0 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 51.671s
```

After:
```
 % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8
goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
BenchmarkAny/str-no-logger              100000000               11.77 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-2            100000000               11.75 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-4            100000000               11.76 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-8            100000000               11.69 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger              49795383                24.33 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-2            48821454                24.31 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-4            49452686                24.79 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-8            49359926                24.26 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-with-logger             1808188               700.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           1894179               643.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           1858263               649.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           1879894               645.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             1817276               663.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           1906438               637.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           1837354               641.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           1909658               648.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    468484              2463 ns/op              96 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                  726475              1465 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 1285284               958.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 1746547               573.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    426568              2715 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  611106              1703 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 1000000              1017 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2220459               625.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         429721              2673 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2       637306              1593 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      1301713               902.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      2012583               651.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         391810              2833 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       675589              1639 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      1219318               970.3 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      1825632               574.6 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 50.294s
```
rabbbit added a commit to uber-go/zap that referenced this issue Jul 28, 2023
This is an alternative to:
- #1301 and #1302 and #1304 - a series of PRs that are faster than this
  one. However, they rely on unsafe.
- #1303 - my own PR that uses closures, to reduce the stack size by 60%.

This PR reduces the stack size from:
```
 field.go:420          0xd16c3                 4881ecf8120000          SUBQ $0x12f8, SP   // 4856
```
to
```
  field.go:420          0xcb603                 4881ecb8000000          SUBQ $0xb8, SP // 184
```
so by ~96%. More crucially, `zap.Any` is now as fast as correctly typed
methods, like `zap.String`, etc.

The downside is the (slight) incrase in the code maitenance - we unroll
as much as we can and rely on the compiler correctly re-using small
variable sizes. While this is not pretty, it feels safe - the changes
were purely mechanical. Future changes and extensions should be easy to
review.

Additionally, the new code is (slightly) faster in all cases since we
remove 1-2 function calls from all paths. The "in new goroutine" path is
most affected, as shown in benchmarks below.

This was largely inspired by conversations with @cdvr1993. We started
looking at this in parallel, but I would have given up if it wasn't for
our conversations.
This particular version was inspired by an earlier version of #1304 -
where I realized that @cdvr1993 is doing a similar dispatching mechanism
that zap is already doing via `zapcore` - a possible optimization.

Longer version:

We have identified an issue where zap.Any can cause performance
degradation due to stack size.

This is apparently cased by the compiler assigning 4.8kb (a zap.Field
per arm of the switch statement) for zap.Any on stack. This can result
in an unnecessary runtime.newstack+runtime.copystack.
A github issue against Go language is pending.

This can be particularly bad if `zap.Any` was to be used in a new
goroutine, since the default goroutine sizes can be as low as 2kb (it can
vary depending on average stack size - see golang/go#18138).

*Most crucially, `zap.Any` is now as fast as a direct dispatch like
`zap.String`.*

10 runs.
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
                           │ before-final.txt │           after-final.txt            │
                           │      sec/op      │    sec/op     vs base                │
Any/str-no-logger                3.106n ±  2%   3.160n ±  1%   +1.75% (p=0.025 n=10)
Any/str-no-logger-2              3.171n ±  4%   3.142n ±  1%        ~ (p=0.593 n=10)
Any/str-no-logger-4              3.108n ±  3%   3.139n ±  2%   +0.97% (p=0.004 n=10)
Any/str-no-logger-8              3.099n ±  2%   3.143n ±  2%        ~ (p=0.086 n=10)
Any/any-no-logger                13.89n ±  2%   12.98n ±  2%   -6.59% (p=0.000 n=10)
Any/any-no-logger-2              13.97n ±  2%   12.96n ±  2%   -7.27% (p=0.000 n=10)
Any/any-no-logger-4              13.83n ±  2%   12.89n ±  2%   -6.83% (p=0.000 n=10)
Any/any-no-logger-8              13.77n ±  2%   12.88n ±  2%   -6.43% (p=0.000 n=10)
Any/str-with-logger              384.1n ±  2%   383.9n ±  6%        ~ (p=0.810 n=10)
Any/str-with-logger-2            367.8n ±  2%   368.5n ±  3%        ~ (p=0.971 n=10)
Any/str-with-logger-4            372.4n ±  2%   368.6n ±  4%        ~ (p=0.912 n=10)
Any/str-with-logger-8            369.8n ±  3%   368.3n ±  3%        ~ (p=0.698 n=10)
Any/any-with-logger              383.8n ±  3%   383.3n ±  6%        ~ (p=0.838 n=10)
Any/any-with-logger-2            370.0n ±  3%   367.6n ±  1%        ~ (p=0.239 n=10)
Any/any-with-logger-4            370.0n ±  3%   368.2n ±  4%        ~ (p=0.631 n=10)
Any/any-with-logger-8            367.6n ±  2%   369.7n ±  3%        ~ (p=0.756 n=10)
Any/str-in-go                    1.334µ ±  3%   1.347µ ±  3%        ~ (p=0.271 n=10)
Any/str-in-go-2                  754.5n ±  3%   744.8n ±  5%        ~ (p=0.481 n=10)
Any/str-in-go-4                  420.2n ± 11%   367.7n ± 31%        ~ (p=0.086 n=10)
Any/str-in-go-8                  557.6n ±  4%   547.1n ± 12%        ~ (p=0.579 n=10)
Any/any-in-go                    2.562µ ±  4%   1.447µ ±  3%  -43.53% (p=0.000 n=10)
Any/any-in-go-2                 1361.0n ±  4%   761.4n ±  7%  -44.06% (p=0.000 n=10)
Any/any-in-go-4                  732.1n ±  9%   397.1n ± 11%  -45.76% (p=0.000 n=10)
Any/any-in-go-8                  541.3n ± 13%   564.6n ±  5%   +4.30% (p=0.041 n=10)
Any/str-in-go-with-stack         1.420µ ±  1%   1.428µ ±  3%        ~ (p=0.670 n=10)
Any/str-in-go-with-stack-2       749.5n ±  4%   771.8n ±  4%        ~ (p=0.123 n=10)
Any/str-in-go-with-stack-4       433.2n ± 15%   400.7n ± 14%        ~ (p=0.393 n=10)
Any/str-in-go-with-stack-8       494.0n ±  7%   490.1n ± 10%        ~ (p=0.853 n=10)
Any/any-in-go-with-stack         2.586µ ±  3%   1.471µ ±  4%  -43.14% (p=0.000 n=10)
Any/any-in-go-with-stack-2      1343.0n ±  3%   773.7n ±  4%  -42.39% (p=0.000 n=10)
Any/any-in-go-with-stack-4       697.7n ±  8%   403.4n ±  9%  -42.17% (p=0.000 n=10)
Any/any-in-go-with-stack-8       490.8n ±  9%   492.8n ±  8%        ~ (p=0.796 n=10)
geomean                          206.3n         182.9n        -11.35%
```

On absolute terms:

Before, on arm64:
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/str-with-logger             3154850               387.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           3239221               371.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           3273285               363.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           3251991               372.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             2944020               401.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           2984863               368.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           3265248               363.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           3301592               365.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    764239              1423 ns/op             140 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                 1510189               753.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 3013986               369.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 2128927               540.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    464083              2551 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  818104              1347 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 1587925               698.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2452558               466.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         767626              1440 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2      1534382               771.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      2384058               433.4 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      3146942               450.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         434194              2524 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       851312              1304 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      1570944               710.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2546115               604.6 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 50.238s
```

After:
```
❯  go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8

goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/str-with-logger             3202051               382.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           3301683               371.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           3186028               364.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           3061030               371.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             3203704               378.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           3281462               372.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           3252879               371.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           3246148               373.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    804132              1404 ns/op             133 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                 1686093               758.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 3075596               430.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 2101650               543.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    845822              1424 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                 1531311               736.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 2618665               464.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2130280               536.2 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         818583              1440 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2      1533379               739.4 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      2507131               399.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      2348804               453.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         807199              1526 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2      1590476               783.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      3026263               383.3 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2615467               493.8 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 51.077s
```

And amd64, before:
```
 % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8
goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
BenchmarkAny/str-no-logger              100000000               11.58 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-2            100000000               11.52 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-4            100000000               11.56 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-8            100000000               11.50 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger              39399811                30.35 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-2            39448304                30.63 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-4            39647024                30.32 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-8            39479619                30.46 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-with-logger             1798702               669.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           1862551               647.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           1848636               642.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           1878465               656.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             1776140               684.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           1868102               668.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           1869589               639.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           1782540               648.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    458112              2594 ns/op              91 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                  820398              1344 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 1392148               969.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 1790403               644.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    220327              4897 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  494391              2701 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                  823672              1399 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 1591206               746.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         384094              2820 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2       809073              1530 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      1464598               933.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      1943251               578.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         233019              4967 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       356689              2848 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4       791342              1385 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      1662126               746.0 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 51.671s
```

After:
```
 % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8
goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
BenchmarkAny/str-no-logger              100000000               11.77 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-2            100000000               11.75 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-4            100000000               11.76 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-8            100000000               11.69 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger              49795383                24.33 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-2            48821454                24.31 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-4            49452686                24.79 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-8            49359926                24.26 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-with-logger             1808188               700.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           1894179               643.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           1858263               649.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           1879894               645.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             1817276               663.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           1906438               637.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           1837354               641.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           1909658               648.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    468484              2463 ns/op              96 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                  726475              1465 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 1285284               958.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 1746547               573.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    426568              2715 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  611106              1703 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 1000000              1017 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2220459               625.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         429721              2673 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2       637306              1593 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      1301713               902.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      2012583               651.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         391810              2833 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       675589              1639 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      1219318               970.3 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      1825632               574.6 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 50.294s
```
rabbbit added a commit to uber-go/zap that referenced this issue Jul 28, 2023
This is an alternative to:
- #1301 and #1302 and #1304 - a series of PRs that are faster than this
  one. However, they rely on unsafe.
- #1303 - my own PR that uses closures, to reduce the stack size by 60%.

This PR reduces the stack size from:
```
 field.go:420          0xd16c3                 4881ecf8120000          SUBQ $0x12f8, SP   // 4856
```
to
```
  field.go:420          0xcb603                 4881ecb8000000          SUBQ $0xb8, SP // 184
```
so by ~96%. More crucially, `zap.Any` is now as fast as correctly typed
methods, like `zap.String`, etc.

The downside is the (slight) incrase in the code maitenance - we unroll
as much as we can and rely on the compiler correctly re-using small
variable sizes. While this is not pretty, it feels safe - the changes
were purely mechanical. Future changes and extensions should be easy to
review.

Additionally, the new code is (slightly) faster in all cases since we
remove 1-2 function calls from all paths. The "in new goroutine" path is
most affected, as shown in benchmarks below.

This was largely inspired by conversations with @cdvr1993. We started
looking at this in parallel, but I would have given up if it wasn't for
our conversations.
This particular version was inspired by an earlier version of #1304 -
where I realized that @cdvr1993 is doing a similar dispatching mechanism
that zap is already doing via `zapcore` - a possible optimization.

Longer version:

We have identified an issue where zap.Any can cause performance
degradation due to stack size.

This is apparently cased by the compiler assigning 4.8kb (a zap.Field
per arm of the switch statement) for zap.Any on stack. This can result
in an unnecessary runtime.newstack+runtime.copystack.
A github issue against Go language is pending.

This can be particularly bad if `zap.Any` was to be used in a new
goroutine, since the default goroutine sizes can be as low as 2kb (it can
vary depending on average stack size - see golang/go#18138).

*Most crucially, `zap.Any` is now as fast as a direct dispatch like
`zap.String`.*

10 runs.
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
                           │ before-final.txt │           after-final.txt            │
                           │      sec/op      │    sec/op     vs base                │
Any/str-no-logger                3.106n ±  2%   3.160n ±  1%   +1.75% (p=0.025 n=10)
Any/str-no-logger-2              3.171n ±  4%   3.142n ±  1%        ~ (p=0.593 n=10)
Any/str-no-logger-4              3.108n ±  3%   3.139n ±  2%   +0.97% (p=0.004 n=10)
Any/str-no-logger-8              3.099n ±  2%   3.143n ±  2%        ~ (p=0.086 n=10)
Any/any-no-logger                13.89n ±  2%   12.98n ±  2%   -6.59% (p=0.000 n=10)
Any/any-no-logger-2              13.97n ±  2%   12.96n ±  2%   -7.27% (p=0.000 n=10)
Any/any-no-logger-4              13.83n ±  2%   12.89n ±  2%   -6.83% (p=0.000 n=10)
Any/any-no-logger-8              13.77n ±  2%   12.88n ±  2%   -6.43% (p=0.000 n=10)
Any/str-with-logger              384.1n ±  2%   383.9n ±  6%        ~ (p=0.810 n=10)
Any/str-with-logger-2            367.8n ±  2%   368.5n ±  3%        ~ (p=0.971 n=10)
Any/str-with-logger-4            372.4n ±  2%   368.6n ±  4%        ~ (p=0.912 n=10)
Any/str-with-logger-8            369.8n ±  3%   368.3n ±  3%        ~ (p=0.698 n=10)
Any/any-with-logger              383.8n ±  3%   383.3n ±  6%        ~ (p=0.838 n=10)
Any/any-with-logger-2            370.0n ±  3%   367.6n ±  1%        ~ (p=0.239 n=10)
Any/any-with-logger-4            370.0n ±  3%   368.2n ±  4%        ~ (p=0.631 n=10)
Any/any-with-logger-8            367.6n ±  2%   369.7n ±  3%        ~ (p=0.756 n=10)
Any/str-in-go                    1.334µ ±  3%   1.347µ ±  3%        ~ (p=0.271 n=10)
Any/str-in-go-2                  754.5n ±  3%   744.8n ±  5%        ~ (p=0.481 n=10)
Any/str-in-go-4                  420.2n ± 11%   367.7n ± 31%        ~ (p=0.086 n=10)
Any/str-in-go-8                  557.6n ±  4%   547.1n ± 12%        ~ (p=0.579 n=10)
Any/any-in-go                    2.562µ ±  4%   1.447µ ±  3%  -43.53% (p=0.000 n=10)
Any/any-in-go-2                 1361.0n ±  4%   761.4n ±  7%  -44.06% (p=0.000 n=10)
Any/any-in-go-4                  732.1n ±  9%   397.1n ± 11%  -45.76% (p=0.000 n=10)
Any/any-in-go-8                  541.3n ± 13%   564.6n ±  5%   +4.30% (p=0.041 n=10)
Any/str-in-go-with-stack         1.420µ ±  1%   1.428µ ±  3%        ~ (p=0.670 n=10)
Any/str-in-go-with-stack-2       749.5n ±  4%   771.8n ±  4%        ~ (p=0.123 n=10)
Any/str-in-go-with-stack-4       433.2n ± 15%   400.7n ± 14%        ~ (p=0.393 n=10)
Any/str-in-go-with-stack-8       494.0n ±  7%   490.1n ± 10%        ~ (p=0.853 n=10)
Any/any-in-go-with-stack         2.586µ ±  3%   1.471µ ±  4%  -43.14% (p=0.000 n=10)
Any/any-in-go-with-stack-2      1343.0n ±  3%   773.7n ±  4%  -42.39% (p=0.000 n=10)
Any/any-in-go-with-stack-4       697.7n ±  8%   403.4n ±  9%  -42.17% (p=0.000 n=10)
Any/any-in-go-with-stack-8       490.8n ±  9%   492.8n ±  8%        ~ (p=0.796 n=10)
geomean                          206.3n         182.9n        -11.35%
```

On absolute terms:

Before, on arm64:
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/str-with-logger             3154850               387.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           3239221               371.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           3273285               363.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           3251991               372.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             2944020               401.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           2984863               368.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           3265248               363.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           3301592               365.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    764239              1423 ns/op             140 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                 1510189               753.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 3013986               369.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 2128927               540.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    464083              2551 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  818104              1347 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 1587925               698.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2452558               466.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         767626              1440 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2      1534382               771.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      2384058               433.4 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      3146942               450.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         434194              2524 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       851312              1304 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      1570944               710.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2546115               604.6 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 50.238s
```

After:
```
❯  go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8

goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/str-with-logger             3202051               382.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           3301683               371.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           3186028               364.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           3061030               371.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             3203704               378.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           3281462               372.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           3252879               371.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           3246148               373.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    804132              1404 ns/op             133 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                 1686093               758.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 3075596               430.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 2101650               543.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    845822              1424 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                 1531311               736.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 2618665               464.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2130280               536.2 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         818583              1440 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2      1533379               739.4 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      2507131               399.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      2348804               453.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         807199              1526 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2      1590476               783.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      3026263               383.3 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2615467               493.8 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 51.077s
```

And amd64, before:
```
 % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8
goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
BenchmarkAny/str-no-logger              100000000               11.58 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-2            100000000               11.52 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-4            100000000               11.56 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-8            100000000               11.50 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger              39399811                30.35 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-2            39448304                30.63 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-4            39647024                30.32 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-8            39479619                30.46 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-with-logger             1798702               669.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           1862551               647.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           1848636               642.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           1878465               656.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             1776140               684.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           1868102               668.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           1869589               639.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           1782540               648.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    458112              2594 ns/op              91 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                  820398              1344 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 1392148               969.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 1790403               644.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    220327              4897 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  494391              2701 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                  823672              1399 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 1591206               746.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         384094              2820 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2       809073              1530 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      1464598               933.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      1943251               578.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         233019              4967 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       356689              2848 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4       791342              1385 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      1662126               746.0 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 51.671s
```

After:
```
 % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8
goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
BenchmarkAny/str-no-logger              100000000               11.77 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-2            100000000               11.75 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-4            100000000               11.76 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-8            100000000               11.69 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger              49795383                24.33 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-2            48821454                24.31 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-4            49452686                24.79 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-8            49359926                24.26 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-with-logger             1808188               700.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           1894179               643.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           1858263               649.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           1879894               645.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             1817276               663.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           1906438               637.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           1837354               641.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           1909658               648.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    468484              2463 ns/op              96 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                  726475              1465 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 1285284               958.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 1746547               573.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    426568              2715 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  611106              1703 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 1000000              1017 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2220459               625.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         429721              2673 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2       637306              1593 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      1301713               902.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      2012583               651.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         391810              2833 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       675589              1639 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      1219318               970.3 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      1825632               574.6 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 50.294s
```
rabbbit added a commit to uber-go/zap that referenced this issue Jul 29, 2023
This is an alternative to:
- #1301 and #1302 and #1304 - a series of PRs that are faster than this
  one. However, they rely on unsafe.
- #1303 - my own PR that uses closures, to reduce the stack size by 60%.

This PR reduces the stack size from:
```
 field.go:420          0xd16c3                 4881ecf8120000          SUBQ $0x12f8, SP   // 4856
```
to
```
  field.go:420          0xcb603                 4881ecb8000000          SUBQ $0xb8, SP // 184
```
so by ~96%. More crucially, `zap.Any` is now as fast as correctly typed
methods, like `zap.String`, etc.

The downside is the (slight) incrase in the code maitenance - we unroll
as much as we can and rely on the compiler correctly re-using small
variable sizes. While this is not pretty, it feels safe - the changes
were purely mechanical. Future changes and extensions should be easy to
review.

Additionally, the new code is (slightly) faster in all cases since we
remove 1-2 function calls from all paths. The "in new goroutine" path is
most affected, as shown in benchmarks below.

This was largely inspired by conversations with @cdvr1993. We started
looking at this in parallel, but I would have given up if it wasn't for
our conversations.
This particular version was inspired by an earlier version of #1304 -
where I realized that @cdvr1993 is doing a similar dispatching mechanism
that zap is already doing via `zapcore` - a possible optimization.

Longer version:

We have identified an issue where zap.Any can cause performance
degradation due to stack size.

This is apparently cased by the compiler assigning 4.8kb (a zap.Field
per arm of the switch statement) for zap.Any on stack. This can result
in an unnecessary runtime.newstack+runtime.copystack.
A github issue against Go language is pending.

This can be particularly bad if `zap.Any` was to be used in a new
goroutine, since the default goroutine sizes can be as low as 2kb (it can
vary depending on average stack size - see golang/go#18138).

*Most crucially, `zap.Any` is now as fast as a direct dispatch like
`zap.String`.*

10 runs.
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
                           │ before-final.txt │           after-final.txt            │
                           │      sec/op      │    sec/op     vs base                │
Any/str-no-logger                3.106n ±  2%   3.160n ±  1%   +1.75% (p=0.025 n=10)
Any/str-no-logger-2              3.171n ±  4%   3.142n ±  1%        ~ (p=0.593 n=10)
Any/str-no-logger-4              3.108n ±  3%   3.139n ±  2%   +0.97% (p=0.004 n=10)
Any/str-no-logger-8              3.099n ±  2%   3.143n ±  2%        ~ (p=0.086 n=10)
Any/any-no-logger                13.89n ±  2%   12.98n ±  2%   -6.59% (p=0.000 n=10)
Any/any-no-logger-2              13.97n ±  2%   12.96n ±  2%   -7.27% (p=0.000 n=10)
Any/any-no-logger-4              13.83n ±  2%   12.89n ±  2%   -6.83% (p=0.000 n=10)
Any/any-no-logger-8              13.77n ±  2%   12.88n ±  2%   -6.43% (p=0.000 n=10)
Any/str-with-logger              384.1n ±  2%   383.9n ±  6%        ~ (p=0.810 n=10)
Any/str-with-logger-2            367.8n ±  2%   368.5n ±  3%        ~ (p=0.971 n=10)
Any/str-with-logger-4            372.4n ±  2%   368.6n ±  4%        ~ (p=0.912 n=10)
Any/str-with-logger-8            369.8n ±  3%   368.3n ±  3%        ~ (p=0.698 n=10)
Any/any-with-logger              383.8n ±  3%   383.3n ±  6%        ~ (p=0.838 n=10)
Any/any-with-logger-2            370.0n ±  3%   367.6n ±  1%        ~ (p=0.239 n=10)
Any/any-with-logger-4            370.0n ±  3%   368.2n ±  4%        ~ (p=0.631 n=10)
Any/any-with-logger-8            367.6n ±  2%   369.7n ±  3%        ~ (p=0.756 n=10)
Any/str-in-go                    1.334µ ±  3%   1.347µ ±  3%        ~ (p=0.271 n=10)
Any/str-in-go-2                  754.5n ±  3%   744.8n ±  5%        ~ (p=0.481 n=10)
Any/str-in-go-4                  420.2n ± 11%   367.7n ± 31%        ~ (p=0.086 n=10)
Any/str-in-go-8                  557.6n ±  4%   547.1n ± 12%        ~ (p=0.579 n=10)
Any/any-in-go                    2.562µ ±  4%   1.447µ ±  3%  -43.53% (p=0.000 n=10)
Any/any-in-go-2                 1361.0n ±  4%   761.4n ±  7%  -44.06% (p=0.000 n=10)
Any/any-in-go-4                  732.1n ±  9%   397.1n ± 11%  -45.76% (p=0.000 n=10)
Any/any-in-go-8                  541.3n ± 13%   564.6n ±  5%   +4.30% (p=0.041 n=10)
Any/str-in-go-with-stack         1.420µ ±  1%   1.428µ ±  3%        ~ (p=0.670 n=10)
Any/str-in-go-with-stack-2       749.5n ±  4%   771.8n ±  4%        ~ (p=0.123 n=10)
Any/str-in-go-with-stack-4       433.2n ± 15%   400.7n ± 14%        ~ (p=0.393 n=10)
Any/str-in-go-with-stack-8       494.0n ±  7%   490.1n ± 10%        ~ (p=0.853 n=10)
Any/any-in-go-with-stack         2.586µ ±  3%   1.471µ ±  4%  -43.14% (p=0.000 n=10)
Any/any-in-go-with-stack-2      1343.0n ±  3%   773.7n ±  4%  -42.39% (p=0.000 n=10)
Any/any-in-go-with-stack-4       697.7n ±  8%   403.4n ±  9%  -42.17% (p=0.000 n=10)
Any/any-in-go-with-stack-8       490.8n ±  9%   492.8n ±  8%        ~ (p=0.796 n=10)
geomean                          206.3n         182.9n        -11.35%
```

On absolute terms:

Before, on arm64:
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/str-with-logger             3154850               387.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           3239221               371.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           3273285               363.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           3251991               372.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             2944020               401.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           2984863               368.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           3265248               363.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           3301592               365.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    764239              1423 ns/op             140 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                 1510189               753.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 3013986               369.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 2128927               540.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    464083              2551 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  818104              1347 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 1587925               698.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2452558               466.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         767626              1440 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2      1534382               771.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      2384058               433.4 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      3146942               450.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         434194              2524 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       851312              1304 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      1570944               710.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2546115               604.6 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 50.238s
```

After:
```
❯  go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8

goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/str-with-logger             3202051               382.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           3301683               371.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           3186028               364.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           3061030               371.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             3203704               378.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           3281462               372.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           3252879               371.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           3246148               373.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    804132              1404 ns/op             133 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                 1686093               758.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 3075596               430.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 2101650               543.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    845822              1424 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                 1531311               736.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 2618665               464.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2130280               536.2 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         818583              1440 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2      1533379               739.4 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      2507131               399.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      2348804               453.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         807199              1526 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2      1590476               783.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      3026263               383.3 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2615467               493.8 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 51.077s
```

And amd64, before:
```
 % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8
goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
BenchmarkAny/str-no-logger              100000000               11.58 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-2            100000000               11.52 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-4            100000000               11.56 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-8            100000000               11.50 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger              39399811                30.35 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-2            39448304                30.63 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-4            39647024                30.32 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-8            39479619                30.46 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-with-logger             1798702               669.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           1862551               647.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           1848636               642.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           1878465               656.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             1776140               684.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           1868102               668.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           1869589               639.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           1782540               648.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    458112              2594 ns/op              91 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                  820398              1344 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 1392148               969.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 1790403               644.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    220327              4897 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  494391              2701 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                  823672              1399 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 1591206               746.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         384094              2820 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2       809073              1530 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      1464598               933.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      1943251               578.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         233019              4967 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       356689              2848 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4       791342              1385 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      1662126               746.0 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 51.671s
```

After:
```
 % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8
goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
BenchmarkAny/str-no-logger              100000000               11.77 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-2            100000000               11.75 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-4            100000000               11.76 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-8            100000000               11.69 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger              49795383                24.33 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-2            48821454                24.31 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-4            49452686                24.79 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-8            49359926                24.26 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-with-logger             1808188               700.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           1894179               643.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           1858263               649.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           1879894               645.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             1817276               663.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           1906438               637.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           1837354               641.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           1909658               648.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    468484              2463 ns/op              96 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                  726475              1465 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 1285284               958.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 1746547               573.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    426568              2715 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  611106              1703 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 1000000              1017 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2220459               625.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         429721              2673 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2       637306              1593 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      1301713               902.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      2012583               651.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         391810              2833 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       675589              1639 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      1219318               970.3 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      1825632               574.6 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 50.294s
```
prashantv pushed a commit to uber-go/zap that referenced this issue Jul 31, 2023
This is an alternative to:
- #1301 and #1302 and #1304 - a series of PRs that are faster than this
  one. However, they rely on unsafe.
- #1303 - my own PR that uses closures, to reduce the stack size by 60%.

This PR reduces the stack size from:
```
 field.go:420          0xd16c3                 4881ecf8120000          SUBQ $0x12f8, SP   // 4856
```
to
```
  field.go:420          0xcb603                 4881ecb8000000          SUBQ $0xb8, SP // 184
```
so by ~96%. More crucially, `zap.Any` is now as fast as correctly typed
methods, like `zap.String`, etc.

The downside is the (slight) incrase in the code maitenance - we unroll
as much as we can and rely on the compiler correctly re-using small
variable sizes. While this is not pretty, it feels safe - the changes
were purely mechanical. Future changes and extensions should be easy to
review.

Additionally, the new code is (slightly) faster in all cases since we
remove 1-2 function calls from all paths. The "in new goroutine" path is
most affected, as shown in benchmarks below.

This was largely inspired by conversations with @cdvr1993. We started
looking at this in parallel, but I would have given up if it wasn't for
our conversations.
This particular version was inspired by an earlier version of #1304 -
where I realized that @cdvr1993 is doing a similar dispatching mechanism
that zap is already doing via `zapcore` - a possible optimization.

Longer version:

We have identified an issue where zap.Any can cause performance
degradation due to stack size.

This is apparently cased by the compiler assigning 4.8kb (a zap.Field
per arm of the switch statement) for zap.Any on stack. This can result
in an unnecessary runtime.newstack+runtime.copystack.
A github issue against Go language is pending.

This can be particularly bad if `zap.Any` was to be used in a new
goroutine, since the default goroutine sizes can be as low as 2kb (it can
vary depending on average stack size - see golang/go#18138).

*Most crucially, `zap.Any` is now as fast as a direct dispatch like
`zap.String`.*

10 runs.
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
                           │ before-final.txt │           after-final.txt            │
                           │      sec/op      │    sec/op     vs base                │
Any/str-no-logger                3.106n ±  2%   3.160n ±  1%   +1.75% (p=0.025 n=10)
Any/str-no-logger-2              3.171n ±  4%   3.142n ±  1%        ~ (p=0.593 n=10)
Any/str-no-logger-4              3.108n ±  3%   3.139n ±  2%   +0.97% (p=0.004 n=10)
Any/str-no-logger-8              3.099n ±  2%   3.143n ±  2%        ~ (p=0.086 n=10)
Any/any-no-logger                13.89n ±  2%   12.98n ±  2%   -6.59% (p=0.000 n=10)
Any/any-no-logger-2              13.97n ±  2%   12.96n ±  2%   -7.27% (p=0.000 n=10)
Any/any-no-logger-4              13.83n ±  2%   12.89n ±  2%   -6.83% (p=0.000 n=10)
Any/any-no-logger-8              13.77n ±  2%   12.88n ±  2%   -6.43% (p=0.000 n=10)
Any/str-with-logger              384.1n ±  2%   383.9n ±  6%        ~ (p=0.810 n=10)
Any/str-with-logger-2            367.8n ±  2%   368.5n ±  3%        ~ (p=0.971 n=10)
Any/str-with-logger-4            372.4n ±  2%   368.6n ±  4%        ~ (p=0.912 n=10)
Any/str-with-logger-8            369.8n ±  3%   368.3n ±  3%        ~ (p=0.698 n=10)
Any/any-with-logger              383.8n ±  3%   383.3n ±  6%        ~ (p=0.838 n=10)
Any/any-with-logger-2            370.0n ±  3%   367.6n ±  1%        ~ (p=0.239 n=10)
Any/any-with-logger-4            370.0n ±  3%   368.2n ±  4%        ~ (p=0.631 n=10)
Any/any-with-logger-8            367.6n ±  2%   369.7n ±  3%        ~ (p=0.756 n=10)
Any/str-in-go                    1.334µ ±  3%   1.347µ ±  3%        ~ (p=0.271 n=10)
Any/str-in-go-2                  754.5n ±  3%   744.8n ±  5%        ~ (p=0.481 n=10)
Any/str-in-go-4                  420.2n ± 11%   367.7n ± 31%        ~ (p=0.086 n=10)
Any/str-in-go-8                  557.6n ±  4%   547.1n ± 12%        ~ (p=0.579 n=10)
Any/any-in-go                    2.562µ ±  4%   1.447µ ±  3%  -43.53% (p=0.000 n=10)
Any/any-in-go-2                 1361.0n ±  4%   761.4n ±  7%  -44.06% (p=0.000 n=10)
Any/any-in-go-4                  732.1n ±  9%   397.1n ± 11%  -45.76% (p=0.000 n=10)
Any/any-in-go-8                  541.3n ± 13%   564.6n ±  5%   +4.30% (p=0.041 n=10)
Any/str-in-go-with-stack         1.420µ ±  1%   1.428µ ±  3%        ~ (p=0.670 n=10)
Any/str-in-go-with-stack-2       749.5n ±  4%   771.8n ±  4%        ~ (p=0.123 n=10)
Any/str-in-go-with-stack-4       433.2n ± 15%   400.7n ± 14%        ~ (p=0.393 n=10)
Any/str-in-go-with-stack-8       494.0n ±  7%   490.1n ± 10%        ~ (p=0.853 n=10)
Any/any-in-go-with-stack         2.586µ ±  3%   1.471µ ±  4%  -43.14% (p=0.000 n=10)
Any/any-in-go-with-stack-2      1343.0n ±  3%   773.7n ±  4%  -42.39% (p=0.000 n=10)
Any/any-in-go-with-stack-4       697.7n ±  8%   403.4n ±  9%  -42.17% (p=0.000 n=10)
Any/any-in-go-with-stack-8       490.8n ±  9%   492.8n ±  8%        ~ (p=0.796 n=10)
geomean                          206.3n         182.9n        -11.35%
```

On absolute terms:

Before, on arm64:
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/str-with-logger             3154850               387.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           3239221               371.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           3273285               363.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           3251991               372.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             2944020               401.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           2984863               368.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           3265248               363.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           3301592               365.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    764239              1423 ns/op             140 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                 1510189               753.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 3013986               369.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 2128927               540.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    464083              2551 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  818104              1347 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 1587925               698.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2452558               466.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         767626              1440 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2      1534382               771.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      2384058               433.4 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      3146942               450.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         434194              2524 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       851312              1304 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      1570944               710.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2546115               604.6 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 50.238s
```

After:
```
❯  go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8

goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/str-with-logger             3202051               382.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           3301683               371.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           3186028               364.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           3061030               371.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             3203704               378.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           3281462               372.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           3252879               371.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           3246148               373.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    804132              1404 ns/op             133 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                 1686093               758.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 3075596               430.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 2101650               543.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    845822              1424 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                 1531311               736.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 2618665               464.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2130280               536.2 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         818583              1440 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2      1533379               739.4 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      2507131               399.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      2348804               453.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         807199              1526 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2      1590476               783.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      3026263               383.3 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2615467               493.8 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 51.077s
```

And amd64, before:
```
 % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8
goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
BenchmarkAny/str-no-logger              100000000               11.58 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-2            100000000               11.52 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-4            100000000               11.56 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-8            100000000               11.50 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger              39399811                30.35 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-2            39448304                30.63 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-4            39647024                30.32 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-8            39479619                30.46 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-with-logger             1798702               669.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           1862551               647.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           1848636               642.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           1878465               656.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             1776140               684.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           1868102               668.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           1869589               639.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           1782540               648.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    458112              2594 ns/op              91 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                  820398              1344 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 1392148               969.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 1790403               644.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    220327              4897 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  494391              2701 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                  823672              1399 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 1591206               746.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         384094              2820 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2       809073              1530 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      1464598               933.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      1943251               578.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         233019              4967 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       356689              2848 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4       791342              1385 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      1662126               746.0 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 51.671s
```

After:
```
 % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8
goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
BenchmarkAny/str-no-logger              100000000               11.77 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-2            100000000               11.75 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-4            100000000               11.76 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-8            100000000               11.69 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger              49795383                24.33 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-2            48821454                24.31 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-4            49452686                24.79 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-8            49359926                24.26 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-with-logger             1808188               700.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           1894179               643.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           1858263               649.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           1879894               645.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             1817276               663.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           1906438               637.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           1837354               641.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           1909658               648.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    468484              2463 ns/op              96 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                  726475              1465 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 1285284               958.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 1746547               573.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    426568              2715 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  611106              1703 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 1000000              1017 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2220459               625.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         429721              2673 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2       637306              1593 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      1301713               902.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      2012583               651.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         391810              2833 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       675589              1639 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      1219318               970.3 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      1825632               574.6 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 50.294s
```
rabbbit added a commit to uber-go/zap that referenced this issue Aug 1, 2023
This is an alternative to #1301 and #1302. It's not as fast as these two
options, but it still gives us half the stack reduction without the
`unsafe` usage.

Interestingly it seems that on both arm64 and amd64 the new code, with
the closure, is faster than the plain old switch.
We do see a ~5-10ns delay on `Any` creation if it's used without
`logger`, but that's minimal and not realistic.

Bunch of credit for this goes to @cdvr1993, we started independently,
I was about to give up but the conversations pushed me forward. In the
end he ended up going into a more advanced land where I dare not to enter.

Longer version:

We have identified an issue where zap.Any can cause performance
degradation due to stack size.

This is apparently cased by the compiler assigning 4.8kb (a zap.Field
per arm of the switch statement) for zap.Any on stack. This can result
in an unnecessary runtime.newstack+runtime.copystack.
A github issue against Go language is pending.

This can be particularly bad if `zap.Any` was to be used in a new
goroutine, since the default goroutine sizes can be as low as 2kb (it can
vary depending on average stack size - see golang/go#18138).

This is an alternative to #1301, @CDVR and me were talking about this,
and he inspired this idea with the closure.

By using a function and a closure we're able to reduce the size and
remove the degradation.
At least on my laptop, this change result in a new performance gain,
as all benchmarks show reduced time.

10 runs.
```
❯ benchstat ~/before2.txt ~/after2.txt
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
                            │ /Users/pawel/before2.txt │       /Users/pawel/after2.txt        │
                            │          sec/op          │    sec/op     vs base                │
Any/str-no-logger-12                      3.344n ±  1%   3.029n ±  1%   -9.40% (p=0.000 n=10)
Any/any-no-logger-12                      13.80n ±  4%   18.67n ±  1%  +35.29% (p=0.000 n=10)
Any/str-with-logger-12                    372.4n ±  3%   363.6n ±  1%   -2.35% (p=0.001 n=10)
Any/any-with-logger-12                    369.2n ±  1%   363.6n ±  1%   -1.52% (p=0.002 n=10)
Any/str-in-go-12                          587.2n ±  2%   587.0n ±  1%        ~ (p=0.617 n=10)
Any/any-in-go-12                          666.5n ±  3%   567.6n ±  1%  -14.85% (p=0.000 n=10)
Any/str-in-go-with-stack-12               448.6n ± 18%   403.4n ± 13%        ~ (p=0.280 n=10)
Any/any-in-go-with-stack-12               564.9n ±  7%   443.2n ±  4%  -21.55% (p=0.000 n=10)
geomean                                   167.8n         160.7n         -4.23%

                            │ /Users/pawel/before2.txt │       /Users/pawel/after2.txt       │
                            │           B/op           │    B/op     vs base                 │
Any/str-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-with-logger-12                    64.00 ± 0%     64.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-with-logger-12                    64.00 ± 0%     64.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-12                          88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-12                          88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-with-stack-12               88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-with-stack-12               88.00 ± 0%     88.00 ± 0%       ~ (p=1.000 n=10) ¹
geomean                                              ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                            │ /Users/pawel/before2.txt │       /Users/pawel/after2.txt       │
                            │        allocs/op         │ allocs/op   vs base                 │
Any/str-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-no-logger-12                      0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-with-logger-12                    1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-with-logger-12                    1.000 ± 0%     1.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-12                          2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-12                          2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/str-in-go-with-stack-12               2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
Any/any-in-go-with-stack-12               2.000 ± 0%     2.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean
```

On absolute terms:

Before, on arm64:
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/str-with-logger             3154850               387.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           3239221               371.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           3273285               363.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           3251991               372.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             2944020               401.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           2984863               368.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           3265248               363.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           3301592               365.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    764239              1423 ns/op             140 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                 1510189               753.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 3013986               369.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 2128927               540.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    464083              2551 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  818104              1347 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 1587925               698.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2452558               466.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         767626              1440 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2      1534382               771.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      2384058               433.4 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      3146942               450.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         434194              2524 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       851312              1304 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      1570944               710.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2546115               604.6 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 50.238s
```

After:
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/str-with-logger             3191725               382.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           3159882               367.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           2998960               373.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           3264657               361.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             3168627               386.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           3169394               364.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           3271981               368.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           3293463               362.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    793905              1388 ns/op             143 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                 1724048               748.4 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 2536380               444.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 2177941               586.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    890155              1237 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                 1836302               719.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 3671503               322.2 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2257405               540.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         811408              1457 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2      1384990               729.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      3228151               381.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      2678596               450.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         821092              1386 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2      1747638               662.3 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      3747934               341.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2678191               463.6 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 53.238s
```

And amd64, before:
```
 % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8
goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
BenchmarkAny/str-no-logger              100000000               11.58 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-2            100000000               11.52 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-4            100000000               11.56 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-8            100000000               11.50 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger              39399811                30.35 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-2            39448304                30.63 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-4            39647024                30.32 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-8            39479619                30.46 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-with-logger             1798702               669.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           1862551               647.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           1848636               642.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           1878465               656.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             1776140               684.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           1868102               668.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           1869589               639.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           1782540               648.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    458112              2594 ns/op              91 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                  820398              1344 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 1392148               969.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 1790403               644.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    220327              4897 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  494391              2701 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                  823672              1399 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 1591206               746.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         384094              2820 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2       809073              1530 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      1464598               933.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      1943251               578.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         233019              4967 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       356689              2848 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4       791342              1385 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      1662126               746.0 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 51.671s
```

After:
```
goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
BenchmarkAny/str-no-logger              100000000               11.65 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-2            100000000               11.64 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-4            100000000               11.63 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-8            100000000               11.65 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger              27779637                44.20 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-2            27881986                42.96 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-4            27587953                43.39 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-8            26861058                43.43 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-with-logger             1749990               690.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           1807341               660.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           1821039               654.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           1865083               650.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             1677643               741.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           1905400               689.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           1843364               646.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           1899883               645.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    453326              2479 ns/op              92 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                  724555              1580 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 1358790               953.3 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 1805985               585.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    466447              2395 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  874053              1487 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 1457768               834.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 1795317               632.5 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         407620              2749 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2       725614              1597 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      1303908               863.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      1957864               609.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         497640              2401 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       648355              1549 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      1486416               869.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2116040               568.8 ns/op            88 B/op          2 allocs/op
PASS
```
rabbbit added a commit to uber-go/zap that referenced this issue Aug 1, 2023
This is an alternative to:
- #1301 and #1302 and #1304 - a series of PRs that are faster than this
  one. However, they rely on unsafe.
- #1303 - my own PR that uses closures, to reduce the stack size by 60%.

This PR reduces the stack size from:
```
 field.go:420          0xd16c3                 4881ecf8120000          SUBQ $0x12f8, SP   // 4856
```
to
```
  field.go:420          0xcb603                 4881ecb8000000          SUBQ $0xb8, SP // 184
```
so by ~96%. More crucially, `zap.Any` is now as fast as correctly typed
methods, like `zap.String`, etc.

The downside is the (slight) incrase in the code maitenance - we unroll
as much as we can and rely on the compiler correctly re-using small
variable sizes. While this is not pretty, it feels safe - the changes
were purely mechanical. Future changes and extensions should be easy to
review.

Additionally, the new code is (slightly) faster in all cases since we
remove 1-2 function calls from all paths. The "in new goroutine" path is
most affected, as shown in benchmarks below.

This was largely inspired by conversations with @cdvr1993. We started
looking at this in parallel, but I would have given up if it wasn't for
our conversations.
This particular version was inspired by an earlier version of #1304 -
where I realized that @cdvr1993 is doing a similar dispatching mechanism
that zap is already doing via `zapcore` - a possible optimization.

Longer version:

We have identified an issue where zap.Any can cause performance
degradation due to stack size.

This is apparently cased by the compiler assigning 4.8kb (a zap.Field
per arm of the switch statement) for zap.Any on stack. This can result
in an unnecessary runtime.newstack+runtime.copystack.
A github issue against Go language is pending.

This can be particularly bad if `zap.Any` was to be used in a new
goroutine, since the default goroutine sizes can be as low as 2kb (it can
vary depending on average stack size - see golang/go#18138).

*Most crucially, `zap.Any` is now as fast as a direct dispatch like
`zap.String`.*

10 runs.
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
                           │ before-final.txt │           after-final.txt            │
                           │      sec/op      │    sec/op     vs base                │
Any/str-no-logger                3.106n ±  2%   3.160n ±  1%   +1.75% (p=0.025 n=10)
Any/str-no-logger-2              3.171n ±  4%   3.142n ±  1%        ~ (p=0.593 n=10)
Any/str-no-logger-4              3.108n ±  3%   3.139n ±  2%   +0.97% (p=0.004 n=10)
Any/str-no-logger-8              3.099n ±  2%   3.143n ±  2%        ~ (p=0.086 n=10)
Any/any-no-logger                13.89n ±  2%   12.98n ±  2%   -6.59% (p=0.000 n=10)
Any/any-no-logger-2              13.97n ±  2%   12.96n ±  2%   -7.27% (p=0.000 n=10)
Any/any-no-logger-4              13.83n ±  2%   12.89n ±  2%   -6.83% (p=0.000 n=10)
Any/any-no-logger-8              13.77n ±  2%   12.88n ±  2%   -6.43% (p=0.000 n=10)
Any/str-with-logger              384.1n ±  2%   383.9n ±  6%        ~ (p=0.810 n=10)
Any/str-with-logger-2            367.8n ±  2%   368.5n ±  3%        ~ (p=0.971 n=10)
Any/str-with-logger-4            372.4n ±  2%   368.6n ±  4%        ~ (p=0.912 n=10)
Any/str-with-logger-8            369.8n ±  3%   368.3n ±  3%        ~ (p=0.698 n=10)
Any/any-with-logger              383.8n ±  3%   383.3n ±  6%        ~ (p=0.838 n=10)
Any/any-with-logger-2            370.0n ±  3%   367.6n ±  1%        ~ (p=0.239 n=10)
Any/any-with-logger-4            370.0n ±  3%   368.2n ±  4%        ~ (p=0.631 n=10)
Any/any-with-logger-8            367.6n ±  2%   369.7n ±  3%        ~ (p=0.756 n=10)
Any/str-in-go                    1.334µ ±  3%   1.347µ ±  3%        ~ (p=0.271 n=10)
Any/str-in-go-2                  754.5n ±  3%   744.8n ±  5%        ~ (p=0.481 n=10)
Any/str-in-go-4                  420.2n ± 11%   367.7n ± 31%        ~ (p=0.086 n=10)
Any/str-in-go-8                  557.6n ±  4%   547.1n ± 12%        ~ (p=0.579 n=10)
Any/any-in-go                    2.562µ ±  4%   1.447µ ±  3%  -43.53% (p=0.000 n=10)
Any/any-in-go-2                 1361.0n ±  4%   761.4n ±  7%  -44.06% (p=0.000 n=10)
Any/any-in-go-4                  732.1n ±  9%   397.1n ± 11%  -45.76% (p=0.000 n=10)
Any/any-in-go-8                  541.3n ± 13%   564.6n ±  5%   +4.30% (p=0.041 n=10)
Any/str-in-go-with-stack         1.420µ ±  1%   1.428µ ±  3%        ~ (p=0.670 n=10)
Any/str-in-go-with-stack-2       749.5n ±  4%   771.8n ±  4%        ~ (p=0.123 n=10)
Any/str-in-go-with-stack-4       433.2n ± 15%   400.7n ± 14%        ~ (p=0.393 n=10)
Any/str-in-go-with-stack-8       494.0n ±  7%   490.1n ± 10%        ~ (p=0.853 n=10)
Any/any-in-go-with-stack         2.586µ ±  3%   1.471µ ±  4%  -43.14% (p=0.000 n=10)
Any/any-in-go-with-stack-2      1343.0n ±  3%   773.7n ±  4%  -42.39% (p=0.000 n=10)
Any/any-in-go-with-stack-4       697.7n ±  8%   403.4n ±  9%  -42.17% (p=0.000 n=10)
Any/any-in-go-with-stack-8       490.8n ±  9%   492.8n ±  8%        ~ (p=0.796 n=10)
geomean                          206.3n         182.9n        -11.35%
```

On absolute terms:

Before, on arm64:
```
goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/str-with-logger             3154850               387.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           3239221               371.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           3273285               363.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           3251991               372.4 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             2944020               401.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           2984863               368.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           3265248               363.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           3301592               365.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    764239              1423 ns/op             140 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                 1510189               753.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 3013986               369.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 2128927               540.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    464083              2551 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  818104              1347 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 1587925               698.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2452558               466.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         767626              1440 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2      1534382               771.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      2384058               433.4 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      3146942               450.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         434194              2524 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       851312              1304 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      1570944               710.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2546115               604.6 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 50.238s
```

After:
```
❯  go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8

goos: darwin
goarch: arm64
pkg: go.uber.org/zap
BenchmarkAny/str-with-logger             3202051               382.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           3301683               371.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           3186028               364.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           3061030               371.0 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             3203704               378.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           3281462               372.8 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           3252879               371.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           3246148               373.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    804132              1404 ns/op             133 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                 1686093               758.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 3075596               430.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 2101650               543.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    845822              1424 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                 1531311               736.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 2618665               464.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2130280               536.2 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         818583              1440 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2      1533379               739.4 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      2507131               399.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      2348804               453.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         807199              1526 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2      1590476               783.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      3026263               383.3 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      2615467               493.8 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 51.077s
```

And amd64, before:
```
 % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8
goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
BenchmarkAny/str-no-logger              100000000               11.58 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-2            100000000               11.52 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-4            100000000               11.56 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-8            100000000               11.50 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger              39399811                30.35 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-2            39448304                30.63 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-4            39647024                30.32 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-8            39479619                30.46 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-with-logger             1798702               669.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           1862551               647.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           1848636               642.2 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           1878465               656.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             1776140               684.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           1868102               668.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           1869589               639.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           1782540               648.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    458112              2594 ns/op              91 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                  820398              1344 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 1392148               969.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 1790403               644.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    220327              4897 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  494391              2701 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                  823672              1399 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 1591206               746.8 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         384094              2820 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2       809073              1530 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      1464598               933.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      1943251               578.0 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         233019              4967 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       356689              2848 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4       791342              1385 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      1662126               746.0 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 51.671s
```

After:
```
 % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8
goos: linux
goarch: amd64
pkg: go.uber.org/zap
cpu: AMD EPYC 7B13
BenchmarkAny/str-no-logger              100000000               11.77 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-2            100000000               11.75 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-4            100000000               11.76 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-no-logger-8            100000000               11.69 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger              49795383                24.33 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-2            48821454                24.31 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-4            49452686                24.79 ns/op            0 B/op          0 allocs/op
BenchmarkAny/any-no-logger-8            49359926                24.26 ns/op            0 B/op          0 allocs/op
BenchmarkAny/str-with-logger             1808188               700.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-2           1894179               643.9 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-4           1858263               649.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-with-logger-8           1879894               645.1 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger             1817276               663.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-2           1906438               637.7 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-4           1837354               641.5 ns/op            64 B/op          1 allocs/op
BenchmarkAny/any-with-logger-8           1909658               648.3 ns/op            64 B/op          1 allocs/op
BenchmarkAny/str-in-go                    468484              2463 ns/op              96 B/op          2 allocs/op
BenchmarkAny/str-in-go-2                  726475              1465 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-4                 1285284               958.9 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-8                 1746547               573.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go                    426568              2715 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-2                  611106              1703 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-4                 1000000              1017 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-8                 2220459               625.7 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack         429721              2673 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-2       637306              1593 ns/op              88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-4      1301713               902.1 ns/op            88 B/op          2 allocs/op
BenchmarkAny/str-in-go-with-stack-8      2012583               651.6 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack         391810              2833 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-2       675589              1639 ns/op              88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-4      1219318               970.3 ns/op            88 B/op          2 allocs/op
BenchmarkAny/any-in-go-with-stack-8      1825632               574.6 ns/op            88 B/op          2 allocs/op
PASS
ok      go.uber.org/zap 50.294s
```
@aclements
Copy link
Member

It would be useful if someone still having problems with the stack size estimator could post a benchmark that demonstrates the problem. (Not that we don't believe you, it's just hard to iterate on solutions without something to measure!)

Stack sizing is definitely something we've considered for PGO, but also it seems like something where, anything we could do with PGO, we could probably do on the fly. This would have a slower start, but could also be more adaptive.

@marcogrecopriolo
Copy link

@aclements sorry - I don't have an issue with the stack size estimator anymore, but that's just because I moved from Couchbase to Ditto, and I am now battling with Rust instead ;-)

@prattmic
Copy link
Member

prattmic commented Nov 9, 2023

A commonly proposed refinement to the existing average stack size adjustment is to track average stack size based on goroutine start PCs or perhaps one or more frames at the point the goroutine is created.

One place where I worry this won't work well are things like gRPC servers. This servers launch one goroutine per request, and some of these requests may require lots of stack space (FooServer.Complicated()) while others are cheap (FooServer.Cheap()). Ideally, we have a large starting stack for the former, and small for the latter.

As far as I can tell [1], gRPC starts this per-request goroutine before determining which request it is handling. This means that no matter the amount of context the runtime uses prior to starting the goroutine, it won't be able to deterministically differentiate between the different types of requests.

This is rather unfortunate, though perhaps if we did this projects like gRPC would be motivated to refactor and start unique goroutine start PCs for each request type.

[1] I've love to be wrong here, but I believe https://github.com/grpc/grpc-go/blob/f1a1fcd042af436be0eb3ac08ed81e2e29fcde4f/server.go#L1024 is the point the goroutine is started.

@petermattis
Copy link
Author

One place where I worry this won't work well are things like gRPC servers. This servers launch one goroutine per request, and some of these requests may require lots of stack space (FooServer.Complicated()) while others are cheap (FooServer.Cheap()). Ideally, we have a large starting stack for the former, and small for the latter.

The first time a stack is resized the runtime would likely have sufficient stack context to know what to resize it to. When I posted this issue originally what I noticed is that it was the growth from 2KB -> 4KB -> 8KB -> 16 KB -> 32KB that was slow. The growStack hack to force the stack size to grow directly from 2KB to 32KB completely eliminated the overhead (i.e. I couldn't measure a difference between that approach and starting the stack size at 32KB).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. Performance
Projects
None yet
Development

No branches or pull requests