Source file src/encoding/gob/doc.go

     1	// Copyright 2009 The Go Authors. All rights reserved.
     2	// Use of this source code is governed by a BSD-style
     3	// license that can be found in the LICENSE file.
     4	
     5	/*
     6	Package gob manages streams of gobs - binary values exchanged between an
     7	Encoder (transmitter) and a Decoder (receiver).  A typical use is transporting
     8	arguments and results of remote procedure calls (RPCs) such as those provided by
     9	package "net/rpc".
    10	
    11	The implementation compiles a custom codec for each data type in the stream and
    12	is most efficient when a single Encoder is used to transmit a stream of values,
    13	amortizing the cost of compilation.
    14	
    15	Basics
    16	
    17	A stream of gobs is self-describing.  Each data item in the stream is preceded by
    18	a specification of its type, expressed in terms of a small set of predefined
    19	types.  Pointers are not transmitted, but the things they point to are
    20	transmitted; that is, the values are flattened. Nil pointers are not permitted,
    21	as they have no value. Recursive types work fine, but
    22	recursive values (data with cycles) are problematic.  This may change.
    23	
    24	To use gobs, create an Encoder and present it with a series of data items as
    25	values or addresses that can be dereferenced to values.  The Encoder makes sure
    26	all type information is sent before it is needed.  At the receive side, a
    27	Decoder retrieves values from the encoded stream and unpacks them into local
    28	variables.
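
The flow just described can be sketched as follows. This is a minimal
illustration, not part of the package: the Point type and the roundTrip helper
are hypothetical names, and a bytes.Buffer stands in for a real connection.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Point is a hypothetical type used only for this illustration.
type Point struct {
	X, Y int
}

// roundTrip encodes p into an in-memory buffer and decodes it back.
func roundTrip(p Point) (Point, error) {
	var network bytes.Buffer // stands in for a network connection
	enc := gob.NewEncoder(&network)
	dec := gob.NewDecoder(&network)

	// The Encoder accepts a value or a pointer to one; the pointer is
	// dereferenced and only the value it points to is transmitted.
	if err := enc.Encode(&p); err != nil {
		return Point{}, err
	}

	// The Decoder unpacks the next value in the stream into a local variable.
	var q Point
	err := dec.Decode(&q)
	return q, err
}

func main() {
	q, err := roundTrip(Point{X: 22, Y: 33})
	fmt.Println(q, err)
}
```

Reusing one Encoder for many values on the same stream means the type
description is sent only once.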
    29	
    30	Types and Values
    31	
    32	The source and destination values/types need not correspond exactly.  For structs,
    33	fields (identified by name) that are in the source but absent from the receiving
    34	variable will be ignored.  Fields that are in the receiving variable but missing
    35	from the transmitted type or value will be ignored in the destination.  If a field
    36	with the same name is present in both, their types must be compatible. Both the
    37	receiver and transmitter will do all necessary indirection and dereferencing to
    38	convert between gobs and actual Go values.  For instance, a gob type that is
    39	schematically,
    40	
    41		struct { A, B int }
    42	
    43	can be sent from or received into any of these Go types:
    44	
    45		struct { A, B int }	// the same
    46		*struct { A, B int }	// extra indirection of the struct
    47		struct { *A, **B int }	// extra indirection of the fields
    48		struct { A, B int64 }	// different concrete value type; see below
    49	
    50	It may also be received into any of these:
    51	
    52		struct { A, B int }	// the same
    53		struct { B, A int }	// ordering doesn't matter; matching is by name
    54		struct { A, B, C int }	// extra field (C) ignored
    55		struct { B int }	// missing field (A) ignored; data will be dropped
    56		struct { B, C int }	// missing field (A) ignored; extra field (C) ignored.
    57	
    58	Attempting to receive into these types will draw a decode error:
    59	
    60		struct { A int; B uint }	// change of signedness for B
    61		struct { A int; B float64 }	// change of type for B
    62		struct { }			// no field names in common
    63		struct { C, D int }		// no field names in common
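
The matching rules above can be tried out with a short program. This is only a
sketch: the transfer helper and the type names Sent, Reordered, Partial,
Unsigned and Disjoint are hypothetical and chosen to mirror the listings above.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// transfer encodes src and decodes the resulting stream into dst,
// which must be a pointer. It is a helper for this illustration only.
func transfer(src, dst interface{}) error {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(src); err != nil {
		return err
	}
	return gob.NewDecoder(&buf).Decode(dst)
}

// Sent is the transmitted type.
type Sent struct{ A, B int }

// Reordered has the same fields in a different order; matching is by name.
type Reordered struct{ B, A int }

// Partial lacks A (its data is dropped) and adds C (left untouched).
type Partial struct{ B, C int }

// Unsigned changes the signedness of B, which is not compatible.
type Unsigned struct {
	A int
	B uint
}

// Disjoint shares no field names with Sent.
type Disjoint struct{ C, D int }

func main() {
	src := Sent{A: 1, B: 2}

	var r Reordered
	fmt.Println(transfer(src, &r), r)

	var p Partial
	fmt.Println(transfer(src, &p), p)

	var u Unsigned
	fmt.Println(transfer(src, &u)) // expected to report a decode error

	var d Disjoint
	fmt.Println(transfer(src, &d)) // expected to report a decode error
}
```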
    64	
    65	Integers are transmitted two ways: arbitrary precision signed integers or
    66	arbitrary precision unsigned integers.  There is no int8, int16 etc.
    67	discrimination in the gob format; there are only signed and unsigned integers.  As
    68	described below, the transmitter sends the value in a variable-length encoding;
    69	the receiver accepts the value and stores it in the destination variable.
    70	Floating-point numbers are always sent using IEEE-754 64-bit precision (see
    71	below).
    72	
    73	Signed integers may be received into any signed integer variable: int, int16, etc.;
    74	unsigned integers may be received into any unsigned integer variable; and floating
    75	point values may be received into any floating point variable.  However,
    76	the destination variable must be able to represent the value or the decode
    77	operation will fail.
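
As a small sketch of the range check, the hypothetical helper below sends an
int and receives it into an int8. A value that fits is stored; a value that
does not fit is expected to make Decode return an error.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// decodeInt8 transmits v as a gob signed integer and receives it into an
// int8. The helper name is an assumption made for this illustration.
func decodeInt8(v int) (int8, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(v); err != nil {
		return 0, err
	}
	var out int8
	err := gob.NewDecoder(&buf).Decode(&out)
	return out, err
}

func main() {
	fmt.Println(decodeInt8(100)) // 100 fits in an int8
	fmt.Println(decodeInt8(300)) // 300 does not; an error is reported
}
```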
    78	
    79	Structs, arrays and slices are also supported. Structs encode and decode only
    80	exported fields. Strings and arrays of bytes are supported with a special,
    81	efficient representation (see below). When a slice is decoded, if the existing
    82	slice has capacity the slice will be extended in place; if not, a new array is
    83	allocated. Regardless, the length of the resulting slice reports the number of
    84	elements decoded.
    85	
    86	In general, if allocation is required, the decoder will allocate memory. If not,
    87	it will update the destination variables with values read from the stream. It does
    88	not initialize them first, so if the destination is a compound value such as a
    89	map, struct, or slice, the decoded values will be merged elementwise into the
    90	existing variables.
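
The merging behavior is easy to overlook, so here is a minimal sketch of it.
The Point type and the two helper functions are hypothetical. In the struct
case the zero-valued field X is not transmitted, so the destination keeps its
previous X; in the map case the received entries are added to those already
present.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Point is a hypothetical type used only for this illustration.
type Point struct{ X, Y int }

// mergeStruct decodes Point{0, 5} into a destination that already holds
// Point{9, 9}.
func mergeStruct() (Point, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(Point{X: 0, Y: 5}); err != nil {
		return Point{}, err
	}
	dst := Point{X: 9, Y: 9} // not reset by the decoder
	err := gob.NewDecoder(&buf).Decode(&dst)
	return dst, err
}

// mergeMap decodes a one-entry map into a destination map that already has
// a different entry.
func mergeMap() (map[string]int, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(map[string]int{"b": 2}); err != nil {
		return nil, err
	}
	dst := map[string]int{"a": 1}
	err := gob.NewDecoder(&buf).Decode(&dst)
	return dst, err
}

func main() {
	fmt.Println(mergeStruct())
	fmt.Println(mergeMap())
}
```

To avoid merging, decode into a freshly declared variable.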
    91	
    92	Functions and channels will not be sent in a gob. Attempting to encode such a value
    93	at the top level will fail. A struct field of chan or func type is treated exactly
    94	like an unexported field and is ignored.
    95	
    96	Gob can encode a value of any type implementing the GobEncoder or
    97	encoding.BinaryMarshaler interfaces by calling the corresponding method,
    98	in that order of preference.
    99	
   100	Gob can decode a value of any type implementing the GobDecoder or
   101	encoding.BinaryUnmarshaler interfaces by calling the corresponding method,
   102	again in that order of preference.
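
A sketch of a type that controls its own representation follows. The
Temperature type, its unexported field and its 4-byte big-endian format are
assumptions made for this illustration; only the GobEncode and GobDecode
method signatures come from the GobEncoder and GobDecoder interfaces.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"encoding/gob"
	"errors"
	"fmt"
)

// Temperature has no exported fields, so it relies on its own methods
// to be transmitted.
type Temperature struct {
	milliKelvin uint32
}

// GobEncode implements gob.GobEncoder.
func (t Temperature) GobEncode() ([]byte, error) {
	b := make([]byte, 4)
	binary.BigEndian.PutUint32(b, t.milliKelvin)
	return b, nil
}

// GobDecode implements gob.GobDecoder. It needs a pointer receiver so that
// it can modify the destination.
func (t *Temperature) GobDecode(b []byte) error {
	if len(b) != 4 {
		return errors.New("Temperature: want exactly 4 bytes")
	}
	t.milliKelvin = binary.BigEndian.Uint32(b)
	return nil
}

// roundTrip sends in through an Encoder and Decoder pair.
func roundTrip(in Temperature) (Temperature, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(in); err != nil {
		return Temperature{}, err
	}
	var out Temperature
	err := gob.NewDecoder(&buf).Decode(&out)
	return out, err
}

func main() {
	out, err := roundTrip(Temperature{milliKelvin: 273150})
	fmt.Println(out.milliKelvin, err)
}
```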
   103	
   104	Encoding Details
   105	
   106	This section documents the encoding, details that are not important for most
   107	users. Details are presented bottom-up.
   108	
   109	An unsigned integer is sent one of two ways.  If it is less than 128, it is sent
   110	as a byte with that value.  Otherwise it is sent as a minimal-length big-endian
   111	(high byte first) byte stream holding the value, preceded by one byte holding the
   112	byte count, negated.  Thus 0 is transmitted as (00), 7 is transmitted as (07) and
   113	256 is transmitted as (FE 01 00).
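
The rule can be written out as a short function. This is an illustrative
reimplementation based only on the description above, not the package's own
code; encodeUnsigned and hexOf are names chosen for this sketch.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

// encodeUnsigned returns the gob encoding of u as described above.
func encodeUnsigned(u uint64) []byte {
	if u < 128 {
		return []byte{byte(u)} // small values are a single byte
	}
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], u)
	n := 8 - bits.LeadingZeros64(u)/8 // minimal number of bytes holding u
	out := []byte{byte(-n)}           // negated byte count, FE for 2 bytes
	return append(out, buf[8-n:]...)
}

// hexOf formats bytes as space-separated lower-case hex.
func hexOf(b []byte) string {
	return fmt.Sprintf("% x", b)
}

func main() {
	for _, u := range []uint64{0, 7, 256} {
		fmt.Printf("%d -> (%s)\n", u, hexOf(encodeUnsigned(u)))
	}
}
```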
   114	
   115	A boolean is encoded within an unsigned integer: 0 for false, 1 for true.
   116	
   117	A signed integer, i, is encoded within an unsigned integer, u.  Within u, bits 1
   118	upward contain the value; bit 0 says whether they should be complemented upon
   119	receipt.  The encode algorithm looks like this:
   120	
   121		var u uint
   122		if i < 0 {
   123			u = (^uint(i) << 1) | 1 // complement i, bit 0 is 1
   124		} else {
   125			u = (uint(i) << 1) // do not complement i, bit 0 is 0
   126		}
   127		encodeUnsigned(u)
   128	
   129	The low bit is therefore analogous to a sign bit, but making it the complement bit
   130	instead guarantees that the largest negative integer is not a special case.  For
   131	example, -129=^128=(^256>>1) encodes as (FE 01 01).
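
A self-contained version of the algorithm, together with its inverse, might
look like the sketch below. The function names are chosen for this
illustration, and encodeUnsigned is a reimplementation of the unsigned rule
described earlier rather than the package's own code.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
	"math/bits"
)

// encodeUnsigned implements the unsigned integer rule described earlier.
func encodeUnsigned(u uint64) []byte {
	if u < 128 {
		return []byte{byte(u)}
	}
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], u)
	n := 8 - bits.LeadingZeros64(u)/8
	return append([]byte{byte(-n)}, buf[8-n:]...)
}

// signedToUnsigned moves the sign into bit 0 as a complement flag.
func signedToUnsigned(i int64) uint64 {
	if i < 0 {
		return (^uint64(i) << 1) | 1 // complement i, bit 0 is 1
	}
	return uint64(i) << 1 // do not complement i, bit 0 is 0
}

// unsignedToSigned is the receiver's inverse of signedToUnsigned.
func unsignedToSigned(u uint64) int64 {
	if u&1 != 0 {
		return ^int64(u >> 1)
	}
	return int64(u >> 1)
}

// encodeSigned returns the gob encoding of a signed integer.
func encodeSigned(i int64) []byte {
	return encodeUnsigned(signedToUnsigned(i))
}

// hexOf formats bytes as space-separated lower-case hex.
func hexOf(b []byte) string {
	return fmt.Sprintf("% x", b)
}

func main() {
	fmt.Println(hexOf(encodeSigned(-129)))
	// The largest negative integer needs no special case.
	fmt.Println(unsignedToSigned(signedToUnsigned(math.MinInt64)) == math.MinInt64)
}
```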
   132	
   133	Floating-point numbers are always sent as a representation of a float64 value.
   134	That value is converted to a uint64 using math.Float64bits.  The uint64 is then
   135	byte-reversed and sent as a regular unsigned integer.  The byte-reversal means the
   136	exponent and high-precision part of the mantissa go first.  Since the low bits are
   137	often zero, this can save encoding bytes.  For instance, 17.0 is encoded in only
   138	three bytes (FE 31 40).
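
The steps can be reproduced in a few lines. As before, this is a sketch
written from the description, with encodeFloat, encodeUnsigned and hexOf as
names assumed for the illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
	"math/bits"
)

// encodeUnsigned implements the unsigned integer rule described earlier.
func encodeUnsigned(u uint64) []byte {
	if u < 128 {
		return []byte{byte(u)}
	}
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], u)
	n := 8 - bits.LeadingZeros64(u)/8
	return append([]byte{byte(-n)}, buf[8-n:]...)
}

// encodeFloat converts f to its IEEE-754 bits, reverses the bytes so the
// exponent ends up in the low-order bytes, and sends the result as an
// unsigned integer. Trailing zero mantissa bytes become leading zeros and
// are dropped by the minimal-length encoding.
func encodeFloat(f float64) []byte {
	u := bits.ReverseBytes64(math.Float64bits(f))
	return encodeUnsigned(u)
}

// hexOf formats bytes as space-separated lower-case hex.
func hexOf(b []byte) string {
	return fmt.Sprintf("% x", b)
}

func main() {
	fmt.Println(hexOf(encodeFloat(17.0)))
}
```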
   139	
   140	Strings and slices of bytes are sent as an unsigned count followed by that many
   141	uninterpreted bytes of the value.
   142	
   143	All other slices and arrays are sent as an unsigned count followed by that many
   144	elements using the standard gob encoding for their type, recursively.
   145	
   146	Maps are sent as an unsigned count followed by that many key, element
   147	pairs. Empty but non-nil maps are sent, so if the receiver has not allocated
   148	one already, one will always be allocated on receipt unless the transmitted map
   149	is nil and not at the top level.
   150	
   151	In slices and arrays, as well as maps, every element is transmitted, including
   152	zero-valued elements, even if all the elements are zero.
   153	
   154	Structs are sent as a sequence of (field number, field value) pairs.  The field
   155	value is sent using the standard gob encoding for its type, recursively.  If a
   156	field has the zero value for its type (except for arrays; see above), it is omitted
   157	from the transmission.  The field number is defined by the type of the encoded
   158	struct: the first field of the encoded type is field 0, the second is field 1,
   159	etc.  When encoding a value, the field numbers are delta encoded for efficiency
   160	and the fields are always sent in order of increasing field number; the deltas are
   161	therefore unsigned.  The initialization for the delta encoding sets the field
   162	number to -1, so an unsigned integer field 0 with value 7 is transmitted as unsigned
   163	delta = 1, unsigned value = 7 or (01 07).  Finally, after all the fields have been
   164	sent a terminating mark denotes the end of the struct.  That mark is a delta=0
   165	value, which has representation (00).
   166	
   167	Interface types are not checked for compatibility; all interface types are
   168	treated, for transmission, as members of a single "interface" type, analogous to
   169	int or []byte - in effect they're all treated as interface{}.  Interface values
   170	are transmitted as a string identifying the concrete type being sent (a name
   171	that must be pre-defined by calling Register), followed by a byte count of the
   172	length of the following data (so the value can be skipped if it cannot be
   173	stored), followed by the usual encoding of the concrete (dynamic) value stored in
   174	the interface value.  (A nil interface value is identified by the empty string
   175	and transmits no value.) Upon receipt, the decoder verifies that the unpacked
   176	concrete item satisfies the interface of the receiving variable.
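
From the user's side, transmitting an interface value might look like the
following sketch. Shape and Square are hypothetical types; Register is the
package function mentioned above. The interface is passed by address so that
the Encoder sees the interface type rather than the concrete type inside it.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Shape is a hypothetical interface used only for this illustration.
type Shape interface {
	Area() float64
}

// Square is a hypothetical concrete type that satisfies Shape.
type Square struct {
	Side float64
}

// Area returns the area of the square.
func (s Square) Area() float64 { return s.Side * s.Side }

func init() {
	// The concrete type must be registered on both the sending and the
	// receiving side so that its name can be resolved.
	gob.Register(Square{})
}

// roundTrip sends an interface value and receives it into another
// interface variable.
func roundTrip(in Shape) (Shape, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(&in); err != nil {
		return nil, err
	}
	var out Shape
	err := gob.NewDecoder(&buf).Decode(&out)
	return out, err
}

func main() {
	out, err := roundTrip(Square{Side: 3})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%T %v\n", out, out.Area())
}
```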
   177	
   178	The representation of types is described below.  When a type is defined on a given
   179	connection between an Encoder and Decoder, it is assigned a signed integer type
   180	id.  When Encoder.Encode(v) is called, it makes sure there is an id assigned for
   181	the type of v and all its elements and then it sends the pair (typeid, encoded-v)
   182	where typeid is the type id of the encoded type of v and encoded-v is the gob
   183	encoding of the value v.
   184	
   185	To define a type, the encoder chooses an unused, positive type id and sends the
   186	pair (-type id, encoded-type) where encoded-type is the gob encoding of a wireType
   187	description, constructed from these types:
   188	
   189		type wireType struct {
   190			ArrayT  *arrayType
   191			SliceT  *sliceType
   192			StructT *structType
   193			MapT    *mapType
   194		}
   195		type arrayType struct {
   196			CommonType
   197			Elem typeId
   198			Len  int
   199		}
   200		type CommonType struct {
   201			Name string // the name of the struct type
   202			Id   int    // the id of the type, repeated so it's inside the type
   203		}
   204		type sliceType struct {
   205			CommonType
   206			Elem typeId
   207		}
   208		type structType struct {
   209			CommonType
   210			Field []*fieldType // the fields of the struct.
   211		}
   212		type fieldType struct {
   213			Name string // the name of the field.
   214			Id   int    // the type id of the field, which must be already defined
   215		}
   216		type mapType struct {
   217			CommonType
   218			Key  typeId
   219			Elem typeId
   220		}
   221	
   222	If there are nested type ids, the types for all inner type ids must be defined
   223	before the top-level type id is used to describe an encoded-v.
   224	
   225	For simplicity in setup, the connection is defined to understand these types a
   226	priori, as well as the basic gob types int, uint, etc.  Their ids are:
   227	
   228		bool        1
   229		int         2
   230		uint        3
   231		float       4
   232		[]byte      5
   233		string      6
   234		complex     7
   235		interface   8
   236		// gap for reserved ids.
   237		WireType    16
   238		ArrayType   17
   239		CommonType  18
   240		SliceType   19
   241		StructType  20
   242		FieldType   21
   243		// 22 is slice of fieldType.
   244		MapType     23
   245	
   246	Finally, each message created by a call to Encode is preceded by an encoded
   247	unsigned integer count of the number of bytes remaining in the message.  After
   248	the initial type name, interface values are wrapped the same way; in effect, the
   249	interface value acts like a recursive invocation of Encode.
   250	
   251	In summary, a gob stream looks like
   252	
   253		(byteCount (-type id, encoding of a wireType)* (type id, encoding of a value))*
   254	
   255	where * signifies zero or more repetitions and the type id of a value must
   256	be predefined or be defined before the value in the stream.
   257	
   258	Compatibility: Any future changes to the package will endeavor to maintain
   259	compatibility with streams encoded using previous versions.  That is, any released
   260	version of this package should be able to decode data written with any previously
   261	released version, subject to issues such as security fixes. See the Go compatibility
   262	document for background: https://golang.org/doc/go1compat
   263	
   264	See "Gobs of data" for a design discussion of the gob wire format:
   265	https://blog.golang.org/gobs-of-data
   266	*/
   267	package gob
   268	
   269	/*
   270	Grammar:
   271	
   272	Tokens starting with a lower case letter are terminals; int(n)
   273	and uint(n) represent the signed/unsigned encodings of the value n.
   274	
   275	GobStream:
   276		DelimitedMessage*
   277	DelimitedMessage:
   278		uint(lengthOfMessage) Message
   279	Message:
   280		TypeSequence TypedValue
   281	TypeSequence:
   282		(TypeDefinition DelimitedTypeDefinition*)?
   283	DelimitedTypeDefinition:
   284		uint(lengthOfTypeDefinition) TypeDefinition
   285	TypedValue:
   286		int(typeId) Value
   287	TypeDefinition:
   288		int(-typeId) encodingOfWireType
   289	Value:
   290		SingletonValue | StructValue
   291	SingletonValue:
   292		uint(0) FieldValue
   293	FieldValue:
   294		builtinValue | ArrayValue | MapValue | SliceValue | StructValue | InterfaceValue
   295	InterfaceValue:
   296		NilInterfaceValue | NonNilInterfaceValue
   297	NilInterfaceValue:
   298		uint(0)
   299	NonNilInterfaceValue:
   300		ConcreteTypeName TypeSequence InterfaceContents
   301	ConcreteTypeName:
   302		uint(lengthOfName) [already read=n] name
   303	InterfaceContents:
   304		int(concreteTypeId) DelimitedValue
   305	DelimitedValue:
   306		uint(length) Value
   307	ArrayValue:
   308		uint(n) FieldValue*n [n elements]
   309	MapValue:
   310		uint(n) (FieldValue FieldValue)*n  [n (key, value) pairs]
   311	SliceValue:
   312		uint(n) FieldValue*n [n elements]
   313	StructValue:
   314		(uint(fieldDelta) FieldValue)*
   315	*/
   316	
   317	/*
   318	For implementers and the curious, here is an encoded example.  Given
   319		type Point struct {X, Y int}
   320	and the value
   321		p := Point{22, 33}
   322	the bytes transmitted that encode p will be:
   323		1f ff 81 03 01 01 05 50 6f 69 6e 74 01 ff 82 00
   324		01 02 01 01 58 01 04 00 01 01 59 01 04 00 00 00
   325		07 ff 82 01 2c 01 42 00
   326	They are determined as follows.
   327	
   328	Since this is the first transmission of type Point, the type descriptor
   329	for Point itself must be sent before the value.  This is the first type
   330	we've sent on this Encoder, so it has type id 65 (0 through 64 are
   331	reserved).
   332	
   333		1f	// This item (a type descriptor) is 31 bytes long.
   334		ff 81	// The negative of the id for the type we're defining, -65.
   335			// This is one byte (indicated by FF = -1) followed by
   336			// ^-65<<1 | 1.  The low 1 bit signals to complement the
   337			// rest upon receipt.
   338	
   339		// Now we send a type descriptor, which is itself a struct (wireType).
   340		// The type of wireType itself is known (it's built in, as is the type of
   341		// all its components), so we just need to send a *value* of type wireType
   342		// that represents type "Point".
   343		// Here starts the encoding of that value.
   344		// Set the field number implicitly to -1; this is done at the beginning
   345		// of every struct, including nested structs.
   346		03	// Add 3 to field number; now 2 (wireType.structType; this is a struct).
   347			// structType starts with an embedded CommonType, which appears
   348			// as a regular structure here too.
   349		01	// add 1 to field number (now 0); start of embedded CommonType.
   350		01	// add 1 to field number (now 0, the name of the type)
   351		05	// string is (unsigned) 5 bytes long
   352		50 6f 69 6e 74	// wireType.structType.CommonType.name = "Point"
   353		01	// add 1 to field number (now 1, the id of the type)
   354		ff 82	// wireType.structType.CommonType._id = 65
   355		00	// end of embedded wireType.structType.CommonType struct
   356		01	// add 1 to field number (now 1, the field array in wireType.structType)
   357		02	// There are two fields in the type (len(structType.field))
   358		01	// Start of first field structure; add 1 to get field number 0: field[0].name
   359		01	// 1 byte
   360		58	// structType.field[0].name = "X"
   361		01	// Add 1 to get field number 1: field[0].id
   362		04	// structType.field[0].typeId is 2 (signed int).
   363		00	// End of structType.field[0]; start structType.field[1]; set field number to -1.
   364		01	// Add 1 to get field number 0: field[1].name
   365		01	// 1 byte
   366		59	// structType.field[1].name = "Y"
   367		01	// Add 1 to get field number 1: field[1].id
   368		04	// structType.field[1].typeId is 2 (signed int).
   369		00	// End of structType.field[1]; end of structType.field.
   370		00	// end of wireType.structType structure
   371		00	// end of wireType structure
   372	
   373	Now we can send the Point value.  Again the field number resets to -1:
   374	
   375		07	// this value is 7 bytes long
   376		ff 82	// the type number, 65 (1 byte (-FF) followed by 65<<1)
   377		01	// add one to field number, yielding field 0
   378		2c	// encoding of signed "22" (0x2c = 44 = 22<<1); Point.X = 22
   379		01	// add one to field number, yielding field 1
   380		42	// encoding of signed "33" (0x42 = 66 = 33<<1); Point.Y = 33
   381		00	// end of structure
   382	
   383	The type encoding is long and fairly intricate but we send it only once.
   384	If p is transmitted a second time, the type is already known so the
   385	output will be just:
   386	
   387		07 ff 82 01 2c 01 42 00
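
Readers who want to compare these listings with real output can capture the
bytes of each Encode call, as in the sketch below. The messages and hexOf
helpers are hypothetical. The exact type id depends on which types the program
has already given to the package, so the bytes may differ from the listing if
Point is not the first user type encountered.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Point matches the type used in the walkthrough above.
type Point struct{ X, Y int }

// messages encodes p twice on a single Encoder and returns the bytes
// written by each call.
func messages(p Point) (first, second []byte, err error) {
	var buf bytes.Buffer
	enc := gob.NewEncoder(&buf)

	// The first call sends the type descriptor followed by the value.
	if err = enc.Encode(p); err != nil {
		return nil, nil, err
	}
	first = append([]byte(nil), buf.Bytes()...)
	buf.Reset()

	// The second call sends only the value; the type is already known.
	if err = enc.Encode(p); err != nil {
		return nil, nil, err
	}
	second = append([]byte(nil), buf.Bytes()...)
	return first, second, nil
}

// hexOf formats bytes as space-separated lower-case hex.
func hexOf(b []byte) string {
	return fmt.Sprintf("% x", b)
}

func main() {
	first, second, err := messages(Point{22, 33})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(hexOf(first))
	fmt.Println(hexOf(second))
}
```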
   388	
   389	A single non-struct value at top level is transmitted like a field with
   390	delta tag 0.  For instance, a signed integer with value 3 presented as
   391	the argument to Encode will emit:
   392	
   393		03 04 00 06
   394	
   395	Which represents:
   396	
   397		03	// this value is 3 bytes long
   398		04	// the type number, 2, represents an integer
   399		00	// tag delta 0
   400		06	// value 3
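
This case is small enough to check directly. The sketch below encodes the
integer 3 at top level and prints the resulting bytes; encodeTopLevel and
hexOf are names assumed for the illustration.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// encodeTopLevel returns the bytes produced by encoding a single int on a
// fresh Encoder. No type descriptor is needed because int is predefined.
func encodeTopLevel(v int) ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(v); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// hexOf formats bytes as space-separated lower-case hex.
func hexOf(b []byte) string {
	return fmt.Sprintf("% x", b)
}

func main() {
	b, err := encodeTopLevel(3)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(hexOf(b))
}
```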
   401	
   402	*/
   403	
