
encoding/gob: optimize decoder for single object streams #46089

Closed
tmm1 opened this issue May 10, 2021 · 6 comments
Labels
FeatureRequest FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. Performance

Comments

@tmm1
Contributor

tmm1 commented May 10, 2021

What version of Go are you using (go version)?

$ go version
go version go1.16.3 darwin/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="auto"
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/tmm1/Library/Caches/go-build"
GOENV="/Users/tmm1/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/tmm1/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/tmm1/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.16.3/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.16.3/libexec/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.16.3"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/_2/hljyy_zj3912lv9qqpy70t5w0000gn/T/go-build1674747577=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

I'm using encoding/gob to encode and decode single objects and store them in a persistent object database.

I realize that encoding/gob is optimized for streams of objects. The documentation recommends using a single Encoder and Decoder to amortize the cost of compilation.

That said, encoding/gob has several advantages that make it attractive for non-stream use cases, such as:

  • space-optimized minimal-length encoding of values
  • zero value elements are automatically ignored
  • automatic handling of indirection and dereferencing
  • support for golang interfaces

I am looking for ways to optimize the gob.Decoder implementation to make it suitable for use in non-streaming contexts.


I created a benchmark to demonstrate the order-of-magnitude difference in performance between using a single Decoder and creating/compiling a new Decoder for every instance:

package main

import (
	"bytes"
	"encoding/gob"
	"testing"
)

type obj struct {
	Str   string
	KV    map[string]interface{}
	List  []string
	FList []float64
}

var o = obj{
	Str: "baz",
	KV: map[string]interface{}{
		"foo": "bar",
		"num": 123,
		"pct": 98.7623,
	},
	List:  []string{"a", "bb", "ccc"},
	FList: []float64{1.1, 2.2, 3.3, 4.4, 5.5},
}

func BenchmarkEncode(b *testing.B) {
	b.Run("WithTypes", func(b *testing.B) {
		buf := new(bytes.Buffer)

		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			enc := gob.NewEncoder(buf)
			err := enc.Encode(o)
			if err != nil {
				panic(err)
			}
		}
	})
	b.Run("WithoutTypes", func(b *testing.B) {
		buf := new(bytes.Buffer)
		enc := gob.NewEncoder(buf)
		err := enc.Encode(o)
		if err != nil {
			panic(err)
		}

		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			err := enc.Encode(o)
			if err != nil {
				panic(err)
			}
		}
	})
}

func BenchmarkDecode(b *testing.B) {
	buf := new(bytes.Buffer)
	enc := gob.NewEncoder(buf)
	err := enc.Encode(o)
	if err != nil {
		panic(err)
	}
	// Copy the bytes out: buf.Bytes() aliases the buffer's backing array,
	// which the Reset/Encode below will overwrite.
	gobTyped := append([]byte(nil), buf.Bytes()...)

	buf.Reset()
	err = enc.Encode(o)
	if err != nil {
		panic(err)
	}
	gobUntyped := append([]byte(nil), buf.Bytes()...)

	b.Run("WithTypes", func(b *testing.B) {
		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			var j obj
			buf := bytes.NewReader(gobTyped)
			dec := gob.NewDecoder(buf)
			err := dec.Decode(&j)
			if err != nil {
				panic(err)
			}
		}
	})
	b.Run("WithoutTypes", func(b *testing.B) {
		buf := bytes.NewReader(gobTyped)
		var j obj
		dec := gob.NewDecoder(buf)
		err := dec.Decode(&j)
		if err != nil {
			panic(err)
		}

		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			buf.Reset(gobUntyped)
			err := dec.Decode(&j)
			if err != nil {
				panic(err)
			}
		}
	})
}
goos: darwin
goarch: amd64
cpu: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
BenchmarkEncode/WithTypes-8         	  197048	      6146 ns/op
BenchmarkEncode/WithoutTypes-8      	  629750	      1752 ns/op
BenchmarkDecode/WithTypes-8         	   52821	     22553 ns/op
BenchmarkDecode/WithoutTypes-8      	  586774	      2024 ns/op
PASS

Profiling the decoder shows that, as expected, most of the time is spent compiling the decoder instance (encoding/gob.(*Decoder).compileDec).

I'm looking to understand what, if anything, could be done to optimize and cache the decoder to improve performance.

My first attempt was simply to re-use a single Decoder instance and feed it multiple streams. This doesn't work: it results in an "extra data in buffer" error (or "duplicate type received" as of b3235b7), because the Decoder does not expect to receive type definitions more than once.

Next, I tried adding more caching. One hotspot in compileDec is reflection, and I was able to cache the result of the reflect call with a patch like so:

diff --git a/src/encoding/gob/decode.go b/src/encoding/gob/decode.go
index d2f6c749b1..6d3bc8991f 100644
--- a/src/encoding/gob/decode.go
+++ b/src/encoding/gob/decode.go
@@ -1115,7 +1115,7 @@ func (dec *Decoder) compileDec(remoteId typeId, ut *userTypeInfo) (engine *decEn
 		}
 		ovfl := overflow(wireField.Name)
 		// Find the field of the local type with the same name.
-		localField, present := srt.FieldByName(wireField.Name)
+		localField, present := ut.fieldByName(wireField.Name)
 		// TODO(r): anonymous names
 		if !present || !isExported(wireField.Name) {
 			op := dec.decIgnoreOpFor(wireField.Id, make(map[typeId]*decOp))
diff --git a/src/encoding/gob/type.go b/src/encoding/gob/type.go
index 31c0ef7af1..9a0f357fc5 100644
--- a/src/encoding/gob/type.go
+++ b/src/encoding/gob/type.go
@@ -27,6 +27,7 @@ type userTypeInfo struct {
 	externalDec int          // xGob, xBinary or xText
 	encIndir    int8         // number of indirections to reach the receiver type; may be negative
 	decIndir    int8         // number of indirections to reach the receiver type; may be negative
+	fieldCache  sync.Map     // map[string]reflect.Type
 }
 
 // externalEncoding bits
@@ -102,6 +103,18 @@ func validUserType(rt reflect.Type) (*userTypeInfo, error) {
 	return ui.(*userTypeInfo), nil
 }
 
+func (u *userTypeInfo) fieldByName(name string) (reflect.StructField, bool) {
+	if f, ok := u.fieldCache.Load(name); ok {
+		return f.(reflect.StructField), true
+	}
+
+	f, ok := u.base.FieldByName(name)
+	if ok {
+		u.fieldCache.Store(name, f)
+	}
+	return f, ok
+}
+
 var (
 	gobEncoderInterfaceType        = reflect.TypeOf((*GobEncoder)(nil)).Elem()
 	gobDecoderInterfaceType        = reflect.TypeOf((*GobDecoder)(nil)).Elem()

This has only a minor impact, because the rest of compileDec is still quite slow and repeats a lot of work.


At this point, I am wondering:

  • Is it worth trying to make encoding/gob better suited for this use case?
    I have explored other encoding libraries, but they seem to lack many of the bells and whistles that gob already has, which makes things like supporting interfaces very complicated.

  • If encoding/gob can be improved, is it worth doing that via a new flag or feature, vs forking encoding/gob and modifying it for this new use-case?

  • How can the gob decoder compiler best be optimized? Can compiled decoders be cached, or does the design of the library make that impractical?
    Would it be possible, for instance, to define more global types like the built-in ones, so that a decoder instance didn't have to recompile decOps for the same types over and over?

cc @robpike

@tmm1
Contributor Author

tmm1 commented May 10, 2021

My first attempt was simply to re-use a single Decoder instance and feed it multiple streams. This doesn't work: it results in an "extra data in buffer" error (or "duplicate type received" as of b3235b7), because the Decoder does not expect to receive type definitions more than once.

One idea would be to re-use a Decoder instance anyway, and ignore the duplicate type definitions. This is simple enough:

--- a/src/encoding/gob/decoder.go
+++ b/src/encoding/gob/decoder.go
@@ -57,7 +57,7 @@ func NewDecoder(r io.Reader) *Decoder {
 // recvType loads the definition of a type.
 func (dec *Decoder) recvType(id typeId) {
        // Have we already seen this type? That's an error
-       if id < firstUserId || dec.wireType[id] != nil {
+       if id < firstUserId /*|| dec.wireType[id] != nil*/ {
                dec.err = errors.New("gob: duplicate type received")
                return
        }

and does yield the type of performance improvement that I'm looking for:

BenchmarkDecode/WithTypes-8                55258             21711 ns/op
BenchmarkDecode/WithoutTypes-8            607665              2002 ns/op
BenchmarkDecode/WithReuse-8               269878              4509 ns/op

However, I'm not sure how best to do this safely.

Since type ids are generated on demand, and values are only sent when non-zero, it seems that two objects would have overlapping type ids assigned if different fields were empty. Is that correct?

In that case, the id alone would not be enough and the patch would need to be smarter like so:

--- a/src/encoding/gob/decoder.go
+++ b/src/encoding/gob/decoder.go
@@ -57,7 +57,7 @@ func NewDecoder(r io.Reader) *Decoder {
 // recvType loads the definition of a type.
 func (dec *Decoder) recvType(id typeId) {
        // Have we already seen this type? That's an error
-       if id < firstUserId || dec.wireType[id] != nil {
+       if id < firstUserId {
                dec.err = errors.New("gob: duplicate type received")
                return
        }
@@ -68,6 +68,12 @@ func (dec *Decoder) recvType(id typeId) {
        if dec.err != nil {
                return
        }
+       if old := dec.wireType[id]; old != nil {
+               if same(old, wire) {
+                       // Same as old type, ignore and use existing
+                       return
+               }
+       }
        // Remember we've seen this type.
        dec.wireType[id] = wire
 }

Would something like this be feasible, or is there other state inside the decoder that would also need to be updated to make this work reliably?

@ALTree ALTree added Performance NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. FeatureRequest labels May 10, 2021
@tmm1
Contributor Author

tmm1 commented May 10, 2021

Since type ids are generated on demand, and values are only sent when non-zero, it seems that two objects would have overlapping type ids assigned if different fields were empty. Is that correct?

This assumption appears to be incorrect. From what I can see, type information is sent for every nested type regardless of value. If that's true in all cases, then it suggests that type identifiers would remain stable across encoder invocations as long as the underlying type has the same fields in the same order.

If type ids are in fact stable in this case, then it becomes much safer to re-use decoder instances with the patch above. The same() check in the patch can become a simple (*wireType).string() comparison.

If gob.Decoder had a mode where it ignored the "duplicate type received" errors, and only complained if the underlying type also changed, that would be sufficient for my needs. As an API user, I could cache and re-use a Decoder instance per type. With a new "conflicting type id received" error I could detect and invalidate my cached Decoder in the rare case where it tried to parse two different generations of values sequentially.

Does this seem like a reasonable approach?

@tmm1
Contributor Author

tmm1 commented May 10, 2021

If type ids are in fact stable in this case, then it becomes much safer to re-use decoder instances with the patch above. The same() check in the patch can become a simple (*wireType).string() comparison.

While this is a reasonable heuristic, it is not perfect. Specifically, (*wireType).string() only refers to the type's name, so the dec.wireType map looks something like:

65 = obj
66 = map[string]interface {}
67 = []string
68 = []float64

By checking to ensure the same type name and id are received, we can detect when fields are added or reordered, but only as long as their types also change.

So if someone simply reordered two fields of the same type (e.g. []string), re-using a decoder instance with the simplistic patch above would deserialize values into the wrong fields.

Perhaps a better checksum could be computed elsewhere, one that also incorporates the field names, to error out more reliably when the underlying structure differs from the compiled decoder?

@tmm1
Contributor Author

tmm1 commented May 10, 2021

Specifically, (*wireType).string() only refers to the type's name

Instead of using this for the sameness check, I tried reflect.DeepEqual and confirmed it properly detects field reordering within struct types. However, its reflection overhead was quite bad, so I implemented a simple same() instead.

Below is an updated version of my patch, which allows me to reuse a single decoder instance across streams. An error is returned only when the same decoder is used to decode two different generations of an object, signaling to the caller that the fields have changed and the cached decoder needs to be recreated and the decode retried.

The performance is 4-5x better compared to creating a fresh decoder every time.

If this new behavior makes sense and is worth including in the stdlib (optionally behind a new setting), I can take some time to write tests and submit a CL. If it's unlikely to be accepted, I will just maintain a fork.

BenchmarkDecode/WithTypes-8         	   38244	     31949 ns/op
BenchmarkDecode/WithoutTypes-8      	  441136	      2676 ns/op
BenchmarkDecode/WithReuse-8         	  174122	      6856 ns/op
diff --git a/src/encoding/gob/decoder.go b/src/encoding/gob/decoder.go
index b52aabe54b..5957a7435d 100644
--- a/src/encoding/gob/decoder.go
+++ b/src/encoding/gob/decoder.go
@@ -56,9 +56,9 @@ func NewDecoder(r io.Reader) *Decoder {
 
 // recvType loads the definition of a type.
 func (dec *Decoder) recvType(id typeId) {
-	// Have we already seen this type? That's an error
-	if id < firstUserId || dec.wireType[id] != nil {
-		dec.err = errors.New("gob: duplicate type received")
+	// Is this type id within allowed range?
+	if id < firstUserId {
+		dec.err = errors.New("gob: reserved type id received")
 		return
 	}
 
@@ -68,6 +68,13 @@ func (dec *Decoder) recvType(id typeId) {
 	if dec.err != nil {
 		return
 	}
+
+	// Have we already seen this type id for a different type? That's an error
+	if old := dec.wireType[id]; old != nil && !old.same(wire) {
+		dec.err = errors.New("gob: conflicting type id received")
+		return
+	}
+
 	// Remember we've seen this type.
 	dec.wireType[id] = wire
 }
diff --git a/src/encoding/gob/type.go b/src/encoding/gob/type.go
index 31c0ef7af1..a41c90c99d 100644
--- a/src/encoding/gob/type.go
+++ b/src/encoding/gob/type.go
@@ -670,6 +670,53 @@ func (w *wireType) string() string {
 	return unknown
 }
 
+func (w *wireType) same(v *wireType) bool {
+	if w == v {
+		return true
+	}
+	switch {
+	case w.ArrayT != nil &&
+		v.ArrayT != nil:
+		return w.ArrayT.Name == v.ArrayT.Name &&
+			w.ArrayT.Elem == v.ArrayT.Elem &&
+			w.ArrayT.Len == v.ArrayT.Len
+	case w.SliceT != nil &&
+		v.SliceT != nil:
+		return w.SliceT.Name == v.SliceT.Name &&
+			w.SliceT.Elem == v.SliceT.Elem
+	case w.StructT != nil &&
+		v.StructT != nil:
+		if w.StructT.Name != v.StructT.Name {
+			return false
+		}
+		if len(w.StructT.Field) != len(v.StructT.Field) {
+			return false
+		}
+		for i := range w.StructT.Field {
+			if w.StructT.Field[i].Name != v.StructT.Field[i].Name ||
+				w.StructT.Field[i].Id != v.StructT.Field[i].Id {
+				return false
+			}
+		}
+		return true
+	case w.MapT != nil &&
+		v.MapT != nil:
+		return w.MapT.Name == v.MapT.Name &&
+			w.MapT.Key == v.MapT.Key &&
+			w.MapT.Elem == v.MapT.Elem
+	case w.GobEncoderT != nil &&
+		v.GobEncoderT != nil:
+		return w.GobEncoderT.Name == v.GobEncoderT.Name
+	case w.BinaryMarshalerT != nil &&
+		v.BinaryMarshalerT != nil:
+		return w.BinaryMarshalerT.Name == v.BinaryMarshalerT.Name
+	case w.TextMarshalerT != nil &&
+		v.TextMarshalerT != nil:
+		return w.TextMarshalerT.Name == v.TextMarshalerT.Name
+	}
+	return false
+}
+
 type typeInfo struct {
 	id      typeId
 	encInit sync.Mutex   // protects creation of encoder

@tmm1
Contributor Author

tmm1 commented May 11, 2021

FYI, wiring this cacheable decoder into my app resulted in a 3x performance improvement in the gob unmarshal path:

BenchmarkUnmarshal/WithoutCache-8         	    9024	    132112 ns/op
BenchmarkUnmarshal/WithCache-8            	   27282	     43448 ns/op

This is with a fairly complex struct with lots of nested types.

@tmm1
Contributor Author

tmm1 commented May 11, 2021

Though things looked great in my app benchmarks, with a real-world dataset there is enough variation across object generations that many of the advantages of caching are lost.

I think it's fair to say that the design of encoding/gob makes it unsuitable for my use-case.

@tmm1 tmm1 closed this as completed May 12, 2021
@golang golang locked and limited conversation to collaborators May 13, 2022