Source file src/pkg/encoding/gob/doc.go

// Copyright 2009 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

/*
Package gob manages streams of gobs: binary values exchanged between an
Encoder (transmitter) and a Decoder (receiver).  A typical use is transporting
arguments and results of remote procedure calls (RPCs) such as those provided by
package "net/rpc".

A stream of gobs is self-describing.  Each data item in the stream is preceded by
a specification of its type, expressed in terms of a small set of predefined
types.  Pointers are not transmitted, but the things they point to are
transmitted; that is, the values are flattened.  Recursive types work fine, but
recursive values (data with cycles) are problematic.  This may change.

To use gobs, create an Encoder and present it with a series of data items as
values or addresses that can be dereferenced to values.  The Encoder makes sure
all type information is sent before it is needed.  At the receive side, a
Decoder retrieves values from the encoded stream and unpacks them into local
variables.
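A minimal round trip through the package's public API (NewEncoder and Encode on
one side, NewDecoder and Decode on the other) might look like the sketch below.
The Point type is hypothetical, and a bytes.Buffer stands in for the network
connection; any io.Writer and io.Reader pair would serve.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"log"
)

// Point is a hypothetical type used only for illustration.
type Point struct{ X, Y int }

// roundTrip (hypothetical helper) sends p through a fresh Encoder and reads
// it back with a fresh Decoder; the buffer plays the role of the connection.
func roundTrip(p Point) (Point, error) {
	var conn bytes.Buffer
	if err := gob.NewEncoder(&conn).Encode(p); err != nil {
		return Point{}, err
	}
	var q Point
	err := gob.NewDecoder(&conn).Decode(&q)
	return q, err
}

func main() {
	q, err := roundTrip(Point{22, 33})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(q) // {22 33}
}
```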

The source and destination values/types need not correspond exactly.  For structs,
fields (identified by name) that are in the source but absent from the receiving
variable will be ignored.  Fields that are in the receiving variable but missing
from the transmitted type or value will be ignored in the destination.  If a field
with the same name is present in both, their types must be compatible.  Both the
receiver and transmitter will do all necessary indirection and dereferencing to
convert between gobs and actual Go values.  For instance, a gob type that is
schematically,

	struct { A, B int }

can be sent from or received into any of these Go types:

	struct { A, B int }	// the same
	*struct { A, B int }	// extra indirection of the struct
	struct { *A, **B int }	// extra indirection of the fields
	struct { A, B int64 }	// different concrete value type; see below

It may also be received into any of these:

	struct { A, B int }	// the same
	struct { B, A int }	// ordering doesn't matter; matching is by name
	struct { A, B, C int }	// extra field (C) ignored
	struct { B int }	// missing field (A) ignored; data will be dropped
	struct { B, C int }	// missing field (A) ignored; extra field (C) ignored
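The sketch below checks two of these claims against the real Decoder: the same
struct { A, B int } gob is received once into a type that drops A and adds C,
and once into a type whose fields add levels of indirection.  The type names
AB, BC and PtrAB are hypothetical.  Both values travel on one stream, so a
single Decoder unpacks one wire type into two different local types.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"log"
)

// AB is the (hypothetical) sender's type; on the wire it is struct { A, B int }.
type AB struct{ A, B int }

// BC lacks A (its data is dropped) and adds C (left untouched).
type BC struct{ B, C int }

// PtrAB receives the same fields through extra levels of indirection.
type PtrAB struct {
	A *int
	B **int
}

// receiveBoth (hypothetical helper) sends AB{1, 2} twice on one stream and
// decodes the first copy into a BC and the second into a PtrAB.
func receiveBoth() (BC, PtrAB, error) {
	var buf bytes.Buffer
	enc := gob.NewEncoder(&buf)
	for i := 0; i < 2; i++ {
		if err := enc.Encode(AB{A: 1, B: 2}); err != nil {
			return BC{}, PtrAB{}, err
		}
	}
	dec := gob.NewDecoder(&buf)
	var bc BC
	var p PtrAB
	if err := dec.Decode(&bc); err != nil {
		return bc, p, err
	}
	err := dec.Decode(&p)
	return bc, p, err
}

func main() {
	bc, p, err := receiveBoth()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(bc.B, bc.C, *p.A, **p.B) // 2 0 1 2
}
```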

Attempting to receive into these types will draw a decode error:

	struct { A int; B uint }	// change of signedness for B
	struct { A int; B float64 }	// change of type for B
	struct { }			// no field names in common
	struct { C, D int }		// no field names in common
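A quick way to observe these failures is to encode struct { A, B int } and
decode it into two of the listed receivers.  The type names below are
hypothetical, and the sketch checks only that Decode returns a non-nil error,
not the exact message text, which may differ between releases.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// AB is the (hypothetical) sender's type; on the wire it is struct { A, B int }.
type AB struct{ A, B int }

// signFlip changes the signedness of B.
type signFlip struct {
	A int
	B uint
}

// noCommon shares no field names with AB.
type noCommon struct{ C, D int }

// decodeInto (hypothetical helper) encodes AB{1, 2} and tries to decode it
// into dst, returning the Decoder's error, if any.
func decodeInto(dst interface{}) error {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(AB{1, 2}); err != nil {
		return err
	}
	return gob.NewDecoder(&buf).Decode(dst)
}

func main() {
	fmt.Println(decodeInto(&signFlip{}) != nil) // true
	fmt.Println(decodeInto(&noCommon{}) != nil) // true
}
```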

Integers are transmitted two ways: arbitrary precision signed integers or
arbitrary precision unsigned integers.  There is no int8, int16 etc.
discrimination in the gob format; there are only signed and unsigned integers.  As
described below, the transmitter sends the value in a variable-length encoding;
the receiver accepts the value and stores it in the destination variable.
Floating-point numbers are always sent using IEEE-754 64-bit precision (see
below).

Signed integers may be received into any signed integer variable: int, int16, etc.;
unsigned integers may be received into any unsigned integer variable; and floating
point values may be received into any floating point variable.  However,
the destination variable must be able to represent the value or the decode
operation will fail.
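For example, a value sent from an int64 can be received into an int8 only
while it fits.  In this sketch, decodeAsInt8 is a hypothetical helper: 100
decodes cleanly, while 300 makes Decode return an error.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// decodeAsInt8 (hypothetical helper) sends v, a signed integer on the wire,
// and receives it into an int8 variable.
func decodeAsInt8(v int64) (int8, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(v); err != nil {
		return 0, err
	}
	var small int8
	err := gob.NewDecoder(&buf).Decode(&small)
	return small, err
}

func main() {
	fmt.Println(decodeAsInt8(100)) // 100 <nil>
	_, err := decodeAsInt8(300)    // 300 does not fit in an int8
	fmt.Println(err != nil)        // true
}
```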

Structs, arrays and slices are also supported.  Structs encode and
decode only exported fields.  Strings and arrays of bytes are supported
with a special, efficient representation (see below).  When a slice
is decoded, if the existing slice has capacity the slice will be
extended in place; if not, a new array is allocated.  Regardless,
the length of the resulting slice reports the number of elements
decoded.

Functions and channels cannot be sent in a gob.  Attempting
to encode a value that contains one will fail.
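The top-level case is easy to confirm: handing a channel or a function value
directly to Encode yields an error, while an ordinary int does not.  The
encodeErr helper is hypothetical, and the sketch does not exercise struct
fields of these kinds, whose treatment has changed in later releases.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// encodeErr (hypothetical helper) reports the error, if any, from encoding v
// on a fresh Encoder.
func encodeErr(v interface{}) error {
	var buf bytes.Buffer
	return gob.NewEncoder(&buf).Encode(v)
}

func main() {
	fmt.Println(encodeErr(make(chan int)) != nil) // true
	fmt.Println(encodeErr(func() {}) != nil)      // true
	fmt.Println(encodeErr(42) != nil)             // false
}
```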

The rest of this comment documents the encoding, details that are not important
for most users.  Details are presented bottom-up.

An unsigned integer is sent one of two ways.  If it is less than 128, it is sent
as a byte with that value.  Otherwise it is sent as a minimal-length big-endian
(high byte first) byte stream holding the value, preceded by one byte holding the
byte count, negated.  Thus 0 is transmitted as (00), 7 is transmitted as (07) and
256 is transmitted as (FE 01 00).

A boolean is encoded within an unsigned integer: 0 for false, 1 for true.

A signed integer, i, is encoded within an unsigned integer, u.  Within u, bits 1
upward contain the value; bit 0 says whether they should be complemented upon
receipt.  The encode algorithm looks like this:

	var u uint
	if i < 0 {
		u = (^uint(i) << 1) | 1	// complement i, bit 0 is 1
	} else {
		u = (uint(i) << 1)	// do not complement i, bit 0 is 0
	}
	encodeUnsigned(u)

The low bit is therefore analogous to a sign bit, but making it the complement bit
instead guarantees that the largest negative integer is not a special case.  For
example, -129=^128=(^256>>1) encodes as (FE 01 01).

Floating-point numbers are always sent as a representation of a float64 value.
That value is converted to a uint64 using math.Float64bits.  The uint64 is then
byte-reversed and sent as a regular unsigned integer.  The byte-reversal means the
exponent and high-precision part of the mantissa go first.  Since the low bits are
often zero, this can save encoding bytes.  For instance, 17.0 is encoded in only
three bytes (FE 31 40).

Strings and slices of bytes are sent as an unsigned count followed by that many
uninterpreted bytes of the value.

All other slices and arrays are sent as an unsigned count followed by that many
elements using the standard gob encoding for their type, recursively.

Maps are sent as an unsigned count followed by that many key, element
pairs.  Empty but non-nil maps are sent, so if the sender has allocated
a map, the receiver will allocate a map even if no elements are
transmitted.
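This distinction survives a round trip through the real Encoder and Decoder.
In the sketch below the Inventory type and transmit helper are hypothetical: an
allocated but empty map field arrives as a non-nil empty map, while a nil map
field stays nil on the receiving side.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"log"
)

// Inventory is a hypothetical type with a map field.
type Inventory struct {
	Owner  string
	Counts map[string]int
}

// transmit (hypothetical helper) encodes inv and decodes it into a fresh value.
func transmit(inv Inventory) (Inventory, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(inv); err != nil {
		return Inventory{}, err
	}
	var got Inventory
	err := gob.NewDecoder(&buf).Decode(&got)
	return got, err
}

func main() {
	got, err := transmit(Inventory{Owner: "gopher", Counts: map[string]int{}})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(got.Counts != nil, len(got.Counts)) // true 0
}
```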

Structs are sent as a sequence of (field number, field value) pairs.  The field
value is sent using the standard gob encoding for its type, recursively.  If a
field has the zero value for its type, it is omitted from the transmission.  The
field number is defined by the type of the encoded struct: the first field of the
encoded type is field 0, the second is field 1, etc.  When encoding a value, the
field numbers are delta encoded for efficiency and the fields are always sent in
order of increasing field number; the deltas are therefore unsigned.  The
initialization for the delta encoding sets the field number to -1, so an unsigned
integer field 0 with value 7 is transmitted as unsigned delta = 1, unsigned value
= 7 or (01 07).  Finally, after all the fields have been sent a terminating mark
denotes the end of the struct.  That mark is a delta=0 value, which has
representation (00).

Interface types are not checked for compatibility; all interface types are
treated, for transmission, as members of a single "interface" type, analogous to
int or []byte; in effect they're all treated as interface{}.  Interface values
are transmitted as a string identifying the concrete type being sent (a name
that must be pre-defined by calling Register), followed by a byte count of the
length of the following data (so the value can be skipped if it cannot be
stored), followed by the usual encoding of the concrete (dynamic) value stored in
the interface value.  (A nil interface value is identified by the empty string
and transmits no value.)  Upon receipt, the decoder verifies that the unpacked
concrete item satisfies the interface of the receiving variable.
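In practice this means registering each concrete type before it is sent through
an interface.  The sketch below uses the real gob.Register with hypothetical
Shape and Square types.  Encode is given a pointer to the interface variable, so
the Encoder sees the interface type rather than the concrete Square.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"log"
)

// Shape and Square are hypothetical types used for illustration.
type Shape interface{ Area() float64 }

type Square struct{ Side float64 }

func (s Square) Area() float64 { return s.Side * s.Side }

func init() {
	// Register the concrete type so its name can be sent and resolved.
	gob.Register(Square{})
}

// sendShape (hypothetical helper) encodes s as an interface value and
// decodes it into a fresh interface variable.
func sendShape(s Shape) (Shape, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(&s); err != nil {
		return nil, err
	}
	var got Shape
	err := gob.NewDecoder(&buf).Decode(&got)
	return got, err
}

func main() {
	got, err := sendShape(Square{Side: 3})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(got.Area()) // 9
}
```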

The representation of types is described below.  When a type is defined on a given
connection between an Encoder and Decoder, it is assigned a signed integer type
id.  When Encoder.Encode(v) is called, it makes sure there is an id assigned for
the type of v and all its elements and then it sends the pair (typeid, encoded-v)
where typeid is the type id of the encoded type of v and encoded-v is the gob
encoding of the value v.

To define a type, the encoder chooses an unused, positive type id and sends the
pair (-type id, encoded-type) where encoded-type is the gob encoding of a wireType
description, constructed from these types:

	type wireType struct {
		ArrayT  *arrayType
		SliceT  *sliceType
		StructT *structType
		MapT    *mapType
	}
	type arrayType struct {
		CommonType
		Elem typeId
		Len  int
	}
	type CommonType struct {
		Name string // the name of the type
		Id   typeId // the id of the type, repeated so it's inside the type
	}
	type sliceType struct {
		CommonType
		Elem typeId
	}
	type structType struct {
		CommonType
		Field []*fieldType // the fields of the struct.
	}
	type fieldType struct {
		Name string // the name of the field.
		Id   typeId // the type id of the field, which must be already defined
	}
	type mapType struct {
		CommonType
		Key  typeId
		Elem typeId
	}

If there are nested type ids, the types for all inner type ids must be defined
before the top-level type id is used to describe an encoded-v.

For simplicity in setup, the connection is defined to understand these types a
priori, as well as the basic gob types int, uint, etc.  Their ids are:

	bool        1
	int         2
	uint        3
	float       4
	[]byte      5
	string      6
	complex     7
	interface   8
	// gap for reserved ids.
	wireType    16
	arrayType   17
	CommonType  18
	sliceType   19
	structType  20
	fieldType   21
	// 22 is slice of fieldType.
	mapType     23

Finally, each message created by a call to Encode is preceded by an encoded
unsigned integer count of the number of bytes remaining in the message.  After
the initial type name, interface values are wrapped the same way; in effect, the
interface value acts like a recursive invocation of Encode.

In summary, a gob stream looks like

	(byteCount (-type id, encoding of a wireType)* (type id, encoding of a value))*

where * signifies zero or more repetitions and the type id of a value must
be predefined or be defined before the value in the stream.

See "Gobs of data" for a design discussion of the gob wire format:
http://golang.org/doc/articles/gobs_of_data.html
*/
package gob

/*
Grammar:

Tokens starting with a lower case letter are terminals; int(n)
and uint(n) represent the signed/unsigned encodings of the value n.

GobStream:
	DelimitedMessage*
DelimitedMessage:
	uint(lengthOfMessage) Message
Message:
	TypeSequence TypedValue
TypeSequence:
	(TypeDefinition DelimitedTypeDefinition*)?
DelimitedTypeDefinition:
	uint(lengthOfTypeDefinition) TypeDefinition
TypedValue:
	int(typeId) Value
TypeDefinition:
	int(-typeId) encodingOfWireType
Value:
	SingletonValue | StructValue
SingletonValue:
	uint(0) FieldValue
FieldValue:
	builtinValue | ArrayValue | MapValue | SliceValue | StructValue | InterfaceValue
InterfaceValue:
	NilInterfaceValue | NonNilInterfaceValue
NilInterfaceValue:
	uint(0)
NonNilInterfaceValue:
	ConcreteTypeName TypeSequence InterfaceContents
ConcreteTypeName:
	uint(lengthOfName) [already read=n] name
InterfaceContents:
	int(concreteTypeId) DelimitedValue
DelimitedValue:
	uint(length) Value
ArrayValue:
	uint(n) FieldValue*n [n elements]
MapValue:
	uint(n) (FieldValue FieldValue)*n  [n (key, value) pairs]
SliceValue:
	uint(n) FieldValue*n [n elements]
StructValue:
	(uint(fieldDelta) FieldValue)*
*/

/*
For implementers and the curious, here is an encoded example.  Given
	type Point struct {X, Y int}
and the value
	p := Point{22, 33}
the bytes transmitted that encode p will be:
	1f ff 81 03 01 01 05 50 6f 69 6e 74 01 ff 82 00
	01 02 01 01 58 01 04 00 01 01 59 01 04 00 00 00
	07 ff 82 01 2c 01 42 00
They are determined as follows.

Since this is the first transmission of type Point, the type descriptor
for Point itself must be sent before the value.  This is the first type
we've sent on this Encoder, so it has type id 65 (0 through 64 are
reserved).

	1f	// This item (a type descriptor) is 31 bytes long.
	ff 81	// The negative of the id for the type we're defining, -65.
		// This is one byte (indicated by FF = -1) followed by
		// ^-65<<1 | 1.  The low 1 bit signals to complement the
		// rest upon receipt.

	// Now we send a type descriptor, which is itself a struct (wireType).
	// The type of wireType itself is known (it's built in, as is the type of
	// all its components), so we just need to send a *value* of type wireType
	// that represents type "Point".
	// Here starts the encoding of that value.
	// Set the field number implicitly to -1; this is done at the beginning
	// of every struct, including nested structs.
	03	// Add 3 to field number; now 2 (wireType.StructT; this is a struct).
		// structType starts with an embedded CommonType, which appears
		// as a regular structure here too.
	01	// add 1 to field number (now 0); start of embedded CommonType.
	01	// add 1 to field number (now 0, the name of the type)
	05	// string is (unsigned) 5 bytes long
	50 6f 69 6e 74	// wireType.StructT.CommonType.Name = "Point"
	01	// add 1 to field number (now 1, the id of the type)
	ff 82	// wireType.StructT.CommonType.Id = 65
	00	// end of embedded wireType.StructT.CommonType struct
	01	// add 1 to field number (now 1, the Field slice in wireType.StructT)
	02	// There are two fields in the type (len(structType.Field))
	01	// Start of first field structure; add 1 to get field number 0: Field[0].Name
	01	// 1 byte
	58	// structType.Field[0].Name = "X"
	01	// Add 1 to get field number 1: Field[0].Id
	04	// structType.Field[0].Id is 2 (signed int).
	00	// End of structType.Field[0]; start structType.Field[1]; set field number to -1.
	01	// Add 1 to get field number 0: Field[1].Name
	01	// 1 byte
	59	// structType.Field[1].Name = "Y"
	01	// Add 1 to get field number 1: Field[1].Id
	04	// structType.Field[1].Id is 2 (signed int).
	00	// End of structType.Field[1]; end of structType.Field.
	00	// end of wireType.StructT structure
	00	// end of wireType structure

Now we can send the Point value.  Again the field number resets to -1:

	07	// this value is 7 bytes long
	ff 82	// the type number, 65 (FF = -1 says one byte follows; 0x82 = 130 = 65<<1)
	01	// add one to field number, yielding field 0
	2c	// encoding of signed "22" (0x2c = 44 = 22<<1); Point.X = 22
	01	// add one to field number, yielding field 1
	42	// encoding of signed "33" (0x42 = 66 = 33<<1); Point.Y = 33
	00	// end of structure

The type encoding is long and fairly intricate but we send it only once.
If p is transmitted a second time, the type is already known so the
output will be just:

	07 ff 82 01 2c 01 42 00
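Going the other way, a receiver applies the inverse rules.  The hypothetical
decoder below handles only this shape of message (a framed value whose type is
a struct of signed integers), trusts its input, and skips the type-id lookup a
real Decoder performs; applied to the eight bytes above it recovers type id 65
and the field values 22 and 33.

```go
package main

import "fmt"

// decodeUint (hypothetical helper) reads one unsigned integer from b and
// returns it with the number of bytes consumed.
func decodeUint(b []byte) (uint64, int) {
	if b[0] < 128 {
		return uint64(b[0]), 1
	}
	n := int(-int8(b[0])) // the count byte holds the negated length
	var x uint64
	for _, c := range b[1 : 1+n] {
		x = x<<8 | uint64(c) // big-endian: high byte first
	}
	return x, 1 + n
}

// decodeInt (hypothetical helper) undoes the signed folding: bit 0 says
// whether the remaining bits must be complemented.
func decodeInt(b []byte) (int64, int) {
	u, n := decodeUint(b)
	if u&1 != 0 {
		return ^int64(u >> 1), n
	}
	return int64(u >> 1), n
}

// decodeIntStruct (hypothetical helper) parses one framed message holding a
// struct of signed integers, returning the type id and field number -> value.
func decodeIntStruct(msg []byte) (int64, map[int]int64) {
	length, n := decodeUint(msg)
	body := msg[n : n+int(length)]
	typeID, pos := decodeInt(body)
	fields := map[int]int64{}
	field := -1 // field numbers start at -1 in every struct
	for {
		delta, dn := decodeUint(body[pos:])
		pos += dn
		if delta == 0 {
			return typeID, fields // delta 0 marks the end of the struct
		}
		field += int(delta)
		v, vn := decodeInt(body[pos:])
		pos += vn
		fields[field] = v
	}
}

func main() {
	msg := []byte{0x07, 0xff, 0x82, 0x01, 0x2c, 0x01, 0x42, 0x00}
	id, fields := decodeIntStruct(msg)
	fmt.Println(id, fields[0], fields[1]) // 65 22 33
}
```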

A single non-struct value at top level is transmitted like a field with
delta tag 0.  For instance, a signed integer with value 3 presented as
the argument to Encode will emit:

	03 04 00 06

This represents:

	03	// this value is 3 bytes long
	04	// the type number, 2, represents an integer
	00	// tag delta 0
	06	// value 3

*/
