Description
go.net/publicsuffix: tighten the encoding from 8 bytes per node to 4.
On the full list (running gen.go with -subset=false):
Before, there were 6086 nodes (at 8 bytes per node). After,
there were 6086 nodes (at 4 bytes per node) plus 354 children entries
(at 4 bytes per entry). The difference is 22928 bytes.
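The 22928-byte figure can be re-derived from the counts above. The snippet below is only a worked check of that arithmetic; tableBytes is a hypothetical helper for illustration, not something taken from gen.go.

```go
package main

import "fmt"

// tableBytes returns the size in bytes of a table holding n entries of
// entrySize bytes each. Hypothetical helper, used only for this check.
func tableBytes(n, entrySize int) int {
	return n * entrySize
}

func main() {
	// Before: one table of 6086 nodes at 8 bytes per node.
	before := tableBytes(6086, 8)
	// After: 6086 nodes at 4 bytes each, plus 354 children entries at 4 bytes each.
	after := tableBytes(6086, 4) + tableBytes(354, 4)
	fmt.Println(before, after, before-after) // prints "48688 25760 22928"
}
```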
In comparison, the (crushed) text is 21082 bytes, and for the curious,
the longest label is 36 bytes: "xn--correios-e-telecomunicaes-ghc29a".
All 32 bits in the nodes table are used, but there's wiggle room to
accommodate future changes to effective_tld_names.dat:
The largest children index is 353 (in 9 bits, so max is 511).
The largest node type is 2 (in 2 bits, so max is 3).
The largest text offset is 21080 (in 15 bits, so max is 32767).
The largest text length is 36 (in 6 bits, so max is 63).
benchmark                old ns/op    new ns/op    delta
BenchmarkPublicSuffix        19948        19744   -1.02%
Patch Set 1
Patch Set 2 : diff -r 3a318dc9be38 https://code.google.com/p/go.net
Patch Set 3 : diff -r 3a318dc9be38 https://code.google.com/p/go.net
Total comments: 12
Patch Set 4 : diff -r 3a318dc9be38 https://code.google.com/p/go.net
Patch Set 5 : diff -r b0dd3b602c14 https://code.google.com/p/go.net