Using `go1.7beta1`, `HuffmanOnly` mode on `flate.Writer` sometimes encodes data as raw blocks instead of dynamic blocks when there is something to gain.

The attached file (repeats.bin inside the zip file) has the following byte histogram:
As can be seen, it is not completely random, and it is clear that some symbols appear more frequently than others. Currently, on `go1.7beta1`, the `flate.Writer` in `HuffmanOnly` mode does not even attempt to encode this dataset (and resorts to outputting raw blocks).

From an entropy perspective, we should be able to encode each symbol with 7.63 bits/symbol. With 100000 symbols, we should be able to compress this to approximately 95375 bytes (100000 × 7.63 / 8). So unless the Huffman table definition itself would have occupied more than 4625 bytes (unlikely), the writer should have chosen to use a dynamic block instead of a raw block.
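
The entropy estimate above can be reproduced from the byte histogram alone. The following is a minimal sketch, not code from `compress/flate`; the helper names `entropyBits` and `entropyBoundBytes` are hypothetical. It computes the order-0 Shannon entropy in bits per byte and the corresponding payload size, ignoring the cost of the Huffman table itself.

```go
package main

import "math"

// entropyBits returns the order-0 Shannon entropy of data in bits per byte,
// computed from its byte histogram. (Hypothetical helper for illustration.)
func entropyBits(data []byte) float64 {
	if len(data) == 0 {
		return 0
	}
	var hist [256]int
	for _, b := range data {
		hist[b]++
	}
	n := float64(len(data))
	var h float64
	for _, c := range hist {
		if c == 0 {
			continue // symbols that never occur contribute nothing
		}
		p := float64(c) / n
		h -= p * math.Log2(p)
	}
	return h
}

// entropyBoundBytes returns the entropy-based estimate of the encoded payload
// size in bytes: bits/symbol * symbols / 8, rounded up. It does not include
// the space needed to transmit the Huffman table.
func entropyBoundBytes(data []byte) int {
	bits := entropyBits(data) * float64(len(data))
	return int(math.Ceil(bits / 8))
}
```

Comparing `entropyBoundBytes(data)` against `len(data)` gives a rough idea of how many bytes a dynamic block could save before the table overhead is accounted for. A real Huffman code uses whole-bit code lengths, so its output will be somewhat larger than this bound.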
In `HuffmanOnly` mode, it may be worth exploring how we split up each block.
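
To check whether a given input is affected, one option is to compress it with `flate.HuffmanOnly` and compare the output size with the input size. The sketch below uses only the public `compress/flate` API; the helper name `huffmanOnlySize` is hypothetical. An output that is no smaller than the input suggests (but does not prove) that the writer fell back to raw blocks.

```go
package main

import (
	"bytes"
	"compress/flate"
)

// huffmanOnlySize compresses data with flate.HuffmanOnly and returns the
// number of bytes written. (Hypothetical helper for illustration.)
func huffmanOnlySize(data []byte) (int, error) {
	var buf bytes.Buffer
	zw, err := flate.NewWriter(&buf, flate.HuffmanOnly)
	if err != nil {
		return 0, err
	}
	if _, err := zw.Write(data); err != nil {
		return 0, err
	}
	// Close flushes any pending data and writes the final block.
	if err := zw.Close(); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}
```

Running this on the contents of repeats.bin and on slices of it with different lengths would be one way to see how the block boundaries influence the result.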