Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/crypto/cryptobyte: non allocating AddUint*LengthPrefixed and AddASN1 API #54059

Open
mateusz834 opened this issue Jul 26, 2022 · 28 comments · May be fixed by golang/crypto#228 or golang/crypto#232
Open
Labels
Proposal Proposal-Accepted Proposal-Crypto Proposal related to crypto packages or other security issues
Milestone

Comments

@mateusz834
Copy link
Member

mateusz834 commented Jul 26, 2022

The (*Builder).AddUint*LengthPrefixed allocates on every call a new *Builder.
This allocation cause ~17% of all heap allocations of a crypto/tls tls handshake.

Proposal:

Add new non allocating API Uint*LengthPrefixed:

p := NewBuilder(make([]byte,0,512))
p.Uint8LengthPrefixed(func () {
	p.Uint8LengthPrefixed(func () {
		p.AddUint8(42)
	})
	p.AddUint8(11)
})

The new methods will not return a *Builder in the func(), as the original API does.
Instedad it will require the usage of the original *Builder.
All writes to the original *Builder, inside Uint*LengthPrexied methods will
write them accordingly to the LengthPrefixed part in which they were executed.

Implementation example: commit

Benchmarks:

BenchmarkLengthPrefixed-4            15930202                70.18 ns/op            0 B/op          0 allocs/op
BenchmarkAddLengthPrefixedOld-4          2160042               762.2 ns/op           292 B/op          7 allocs/op
b := NewBuilder(buf)
b.Uint8LengthPrefixed(func() {
	b.AddBytes([]byte("123456"))
})
b.Uint8LengthPrefixed(func() {
	b.AddBytes([]byte("abcdef"))
})
b.Uint8LengthPrefixed(func() {
        b.AddBytes([]byte("qwerty"))
})
@gopherbot gopherbot added this to the Proposal milestone Jul 26, 2022
@ianlancetaylor ianlancetaylor added the Proposal-Crypto Proposal related to crypto packages or other security issues label Jul 26, 2022
@ianlancetaylor
Copy link
Contributor

CC @golang/security

@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) Jul 26, 2022
@FiloSottile
Copy link
Contributor

This seems like a good use case for a sync.Pool, to solve the performance issue without duplicating APIs. Could you try benchmarking that?

@mateusz834
Copy link
Member Author

mateusz834 commented Jul 27, 2022

I am benchmarking now more AddUintLengthPrefixeds in single iteration.

a := NewBuilder(buf)
a.AddUint8LengthPrefixed(func(b *Builder) {
	b.AddBytes([]byte("123456"))
	b.AddUint8LengthPrefixed(func(b *Builder) {
		b.AddBytes([]byte("123456"))
	})
	b.AddUint8LengthPrefixed(func(b *Builder) {
		b.AddBytes([]byte("123456"))
	})
	b.AddUint8LengthPrefixed(func(b *Builder) {
		b.AddBytes([]byte("123456"))
	})
})
a.AddUint8LengthPrefixed(func(b *Builder) {
	b.AddBytes([]byte("123456"))
	b.AddUint8LengthPrefixed(func(b *Builder) {
		b.AddBytes([]byte("123456"))
	})
})
a.AddUint8LengthPrefixed(func(b *Builder) {
	b.AddBytes([]byte("123456"))
	b.AddUint8LengthPrefixed(func(b *Builder) {
		b.AddBytes([]byte("123456"))
	})
})

BenchmarkAddLengthPrefixedOld (now uses sync.Pool)

BenchmarkLengthPrefixed-4       7148973             172.2 ns/op             0 B/op          0 allocs/op
BenchmarkAddLengthPrefixedOld-4           643360              2352 ns/op             958 B/op          6 allocs/op

(in this BenchmarkAddLengthPrefixedOld I also removed 2 additional allocations per AddUintLengthPrefixed)
The performance is even worse now.

@mateusz834
Copy link
Member Author

mateusz834 commented Jul 27, 2022

But we can add a new method called (maybe?) NoParentUsageCheck that will cause the AddUint*LengthPrefixed methods to return the original Builder, but then we would not get the parent builder usage panic (because we are reusing the original *Builder).

BenchmarkLengthPrefixed-4             6922315               169.9 ns/op             0 B/op       0 allocs/op
BenchmarkAddLengthPrefixedOld-4          4758037               313.0 ns/op            80 B/op       1 allocs/op

Edit: we might also just past nil as the *Builder in LengthPrefixed arg (after calling EmptyBuilder), this way we can remove this last 80B allocation. It looks like the heap analysis is smart, and it might remove the alloc.

BenchmarkLengthPrefixed-4            69277897               174.8 ns/op             0 B/op            0 allocs/op
BenchmarkAddLengthPrefixedOld-4         58995422               192.4 ns/op             0 B/op            0 allocs/op

This is how it could be used then:

b := NewBuilder(buf)
b.EmptyBuilder()
b.AddUint8LengthPrefixed(func(_ *Builder) {
	b.AddUint8LengthPrefixed(func(_ *Builder) {
		b.AddBytes([]byte("123456"))
	})
})

@ianlancetaylor ianlancetaylor changed the title proposal: x/crypto/cryptobyte: non allocating AddUint*LenghtPrefixed API proposal: x/crypto/cryptobyte: non allocating AddUint*LengthPrefixed API Jul 27, 2022
@rsc
Copy link
Contributor

rsc commented Jul 27, 2022

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc moved this from Incoming to Active in Proposals (old) Jul 27, 2022
@rsc
Copy link
Contributor

rsc commented Aug 3, 2022

It would be good to find some way to do this optimization without doubling the API here.

@mateusz834
Copy link
Member Author

mateusz834 commented Aug 3, 2022

@rsc #54059 (comment) it will only introduce new method NoParentUsageCheck or EmptyBuilder.

@mateusz834
Copy link
Member Author

mateusz834 commented Aug 3, 2022

So to summary that, we have 2 possibilities:

  1. NoParentUsageCheck - child is equal to b, the *Builder inside AddUint*LengthPrefixed is just the original Builder.
b := NewBuilder(buf)
b.NoParentUsageCheck()
b.AddUint8LengthPrefixed(func(child *Builder) { {
	child.AddBytes([]byte("123456"))
})
  1. EmptyBuilder - *Builder inside AddUint*LengthPrefixed is nil.
b := NewBuilder(buf)
b.EmptyBuilder()
b.AddUint8LengthPrefixed(func(_ *Builder) { {
	b.AddBytes([]byte("123456"))
})

NoParentUsageCheck 313.0 ns/op 80 B/op 1 allocs/op
EmptyBuilder 192.4 ns/op 0 B/op 0 allocs/op
(it benchmarks code as in: #54059 (comment))

I feel like the EmptyBuilder way is more explicit what we are doing. NoParentUsageCheck might cause mistakes, which would be really confusing. (it will work, we cannot panic on parent usage, we are reusing *Builder):

b := NewBuilder(buf)
b.NoParentUsageCheck()
b.AddUint8LengthPrefixed(func(c *Builder) { {
	c.AddBytes([]byte("123456"))
	b.AddBytes([]byte("123456"))
        c.AddUint8LengthPrefixed(func(cc *Builder) { {
	        cc.AddBytes([]byte("123456"))
	        c.AddBytes([]byte("123456"))
        })
})

It will still work as expected, AddBytes() will be written accordingly to the LengthPrefixed part in which in was executed.

@rsc
Copy link
Contributor

rsc commented Aug 10, 2022

/cc @golang/security

@rsc
Copy link
Contributor

rsc commented Aug 17, 2022

The b.NoParentUsageCheck() seems nice, but at that point why not make that the default, so that existing code all runs faster? I can't think of any case where it's valid to use c after the callback returns, and we know the callback never uses b (or cryptobyte would panic). So code should just run faster, with no API additions at all. Thoughts?

@mateusz834
Copy link
Member Author

mateusz834 commented Aug 17, 2022

I thought that we shouldn't get rid of this panic, that was indeed the reason why i proposed the new api. but as I think about it again it should be fine to do so. Overall I am fine with your idea, it will also simplify the implementation.

@rsc
Copy link
Contributor

rsc commented Aug 31, 2022

@golang/security any thoughts about #54059 (comment)? That is, just start passing the parent Builder as the child Builder and let all code get faster.

@FiloSottile
Copy link
Contributor

I am trying to figure out a Chesterton's Fence explanation for why the callbacks take a Builder in the first place, and why the panic is implemented and tested. If it's just because it's more explicit than requiring closures and to allow passing a func(*Builder), then it feels reasonable to sacrifice the panic to reduce allocations.

Trying to use the parent Builder in the callback is not something developers might do with an expectation that it would do something else than write to the length-prefixed span, and Builders are not to be used concurrently.

👍 from me.

@rolandshoemaker
Copy link
Member

Seems reasonable. This will require a not-insignificant change to the length prefixed logic to use a single builder rather than nested builders, but I think it makes for a slightly less confusing (and faster) API overall.

@mateusz834
Copy link
Member Author

@rsc does it have to be a proposal then? It will not introduce any new API.

@gopherbot
Copy link

Change https://go.dev/cl/428475 mentions this issue: cryptobyte: AddUint*LengthPrefixed API perfomance optimization

@mateusz834
Copy link
Member Author

mateusz834 commented Sep 5, 2022

I prototyped an optimization in the CL (above).

@rsc
Copy link
Contributor

rsc commented Sep 7, 2022

Re Chesterton, I think once you have child Builders vs parent Builders you have to have the panic to catch misuse due to confusion. But now what was formerly confused misuse would become correct code.

Even though there are no new API functions, it's a significant enough semantic change to be worth continuing the proposal process to a resolution.

@mateusz834
Copy link
Member Author

mateusz834 commented Sep 7, 2022

Also I think that we should also add some strict rules to the docs of BuilderContinuation. Like: "inside func(child *Builder) you can only use the Builder supplied from the argument, there are no guarantees what would happen while using the parent builders".

@mateusz834
Copy link
Member Author

Also it is worth noting that this will also affect AddASN1 method (not only the LengthPrefixed methods)

@rsc rsc changed the title proposal: x/crypto/cryptobyte: non allocating AddUint*LengthPrefixed API proposal: x/crypto/cryptobyte: non allocating AddUint*LengthPrefixed and AddASN1 API Sep 7, 2022
@rsc
Copy link
Contributor

rsc commented Sep 7, 2022

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@rsc
Copy link
Contributor

rsc commented Sep 21, 2022

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: x/crypto/cryptobyte: non allocating AddUint*LengthPrefixed and AddASN1 API x/crypto/cryptobyte: non allocating AddUint*LengthPrefixed and AddASN1 API Sep 21, 2022
@rsc rsc modified the milestones: Proposal, Backlog Sep 21, 2022
@andig
Copy link
Contributor

andig commented Sep 21, 2022

I may not have followed the discussion closely enough, but why is the sync.Pool not an option (as log does)?

@mateusz834
Copy link
Member Author

@andig using sync.Pool the performance was even worse than the original implementation.

@andig
Copy link
Contributor

andig commented Sep 21, 2022

@mateusz834 please forgive me, responding as a learning exercise for me. Please don't consider me rude. I gave it a quick try as I happened to look into log's usage of sync.Pool time ago. Here's what I did:

var pool = sync.Pool{
	New: func() interface{} {
		return new(Builder)
	},
}

In addLengthPrefixed:

// b.child = &Builder{
// 	result:         b.result,
// 	fixedSize:      b.fixedSize,
// 	offset:         offset,
// 	pendingLenLen:  lenLen,
// 	pendingIsASN1:  isASN1,
// 	inContinuation: b.inContinuation,
// }

bb := pool.Get().(*Builder)

// don't need to init child; it must be nil to be put into the pool
bb.err = nil
bb.result = b.result
bb.fixedSize = b.fixedSize
bb.offset = offset
bb.pendingLenLen = lenLen
bb.pendingIsASN1 = isASN1
bb.inContinuation = b.inContinuation

b.child = bb

plus

pool.Put(bb)

in the end. This is the benchmark difference:

func BenchmarkLengthPrefixed(tb *testing.B) {
	b := NewBuilder(make([]byte, 0, 512))

	for i := 0; i < tb.N; i++ {
		b.AddUint8LengthPrefixed(func(b *Builder) {
			b.AddBytes([]byte("123456"))
		})
		b.AddUint8LengthPrefixed(func(b *Builder) {
			b.AddBytes([]byte("abcdef"))
		})
		b.AddUint8LengthPrefixed(func(b *Builder) {
			b.AddBytes([]byte("qwerty"))
		})
	}
}

Goes from

BenchmarkLengthPrefixed-8   	 4758802	       224.2 ns/op	     420 B/op	       6 allocs/op

to

BenchmarkLengthPrefixed-8   	11953297	        94.57 ns/op	     128 B/op	       3 allocs/op

All tests pass except TestASN1(U)Int64 which I couldn't get to compile for some weird reason.

Maybe it's of any use...

@mateusz834
Copy link
Member Author

@andig
Oh, it seems that I've made a mistake in benchmark with sync.Pool. I improved your code and I was able to get to 0 allocs/op.

benchmark                     old ns/op     new ns/op     delta
BenchmarkLengthPrefixed-4     1660          371           -77.63%

benchmark                     old allocs     new allocs     delta
BenchmarkLengthPrefixed-4     17             0              -100.00%

benchmark                     old bytes     new bytes     delta
BenchmarkLengthPrefixed-4     777           0             -100.00%

@andig Thanks for pointing this out.

Not sure which method we should choose now.
The proposed approach (reusing *Builder) is a bit faster, but not significantly. For BenchmarkLengthPrefixed it is about ~100ns/op.

Apologies for the mistake.

@gopherbot
Copy link

Change https://go.dev/cl/433503 mentions this issue: cryptobyte: AddUint*LengthPrefixed API perfomance optimization with sync.Pool

@andig
Copy link
Contributor

andig commented Sep 26, 2022

Not sure which method we should choose now.

Imho one potential downside of the pool approach is that the Builder.result will remain in the pool until the Builder is retrieved. Not sure if that could be a potential security issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Proposal Proposal-Accepted Proposal-Crypto Proposal related to crypto packages or other security issues
Projects
Status: Accepted
7 participants