hash: add Combine functions #12297

Closed
dsnet opened this issue Aug 24, 2015 · 5 comments

dsnet commented Aug 24, 2015

I propose that the standard library packages hash/crc32, hash/crc64, and hash/adler32 each add a Combine function that does the following:

Let AB be the string concatenation of strings A and B. Then it is possible to define an efficient function Combine that computes Hash(AB) given only Hash(A), Hash(B), and Len(B).

Hash(AB) == Combine(Hash(A), Hash(B), Len(B))

Thus, usage of Combine in hash/crc32 may look like:

var s1, s2 []byte // Let these be two large byte strings
crc1 := crc32.ChecksumIEEE(s1)
crc2 := crc32.ChecksumIEEE(s2)
len2 := int64(len(s2))

// crc3 is equivalent to crc32.ChecksumIEEE(append(s1, s2...))
crc3 := crc32.Combine(crc32.IEEE, crc1, crc2, len2) 

The zlib library provides these functions and they are extremely useful for merging large data segments under a single hash or for computing the hash of a large dataset in parallel.
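
For reference, the CRC case can be done with GF(2) matrix squaring, as zlib's crc32_combine does. Below is a rough sketch of the technique for the IEEE polynomial only; combineCRC32IEEE and its helpers are hypothetical names, not existing API:

package main

import (
	"fmt"
	"hash/crc32"
)

// gf2MatrixTimes multiplies the 32x32 GF(2) matrix mat by the bit vector vec.
func gf2MatrixTimes(mat *[32]uint32, vec uint32) uint32 {
	var sum uint32
	for i := 0; vec != 0; i, vec = i+1, vec>>1 {
		if vec&1 != 0 {
			sum ^= mat[i]
		}
	}
	return sum
}

// gf2MatrixSquare sets square to mat*mat over GF(2).
func gf2MatrixSquare(square, mat *[32]uint32) {
	for i := range square {
		square[i] = gf2MatrixTimes(mat, mat[i])
	}
}

// combineCRC32IEEE (hypothetical) computes the IEEE CRC-32 of the
// concatenation A+B given crc1 = CRC(A), crc2 = CRC(B), and len2 = len(B),
// in O(log len2) time. It mirrors zlib's crc32_combine.
func combineCRC32IEEE(crc1, crc2 uint32, len2 int64) uint32 {
	if len2 <= 0 {
		return crc1
	}
	var even, odd [32]uint32

	// Operator for one zero bit (reflected IEEE polynomial).
	odd[0] = 0xedb88320
	for i, row := 1, uint32(1); i < 32; i, row = i+1, row<<1 {
		odd[i] = row
	}
	// Operators for two and then four zero bits.
	gf2MatrixSquare(&even, &odd)
	gf2MatrixSquare(&odd, &even)

	// Apply len2 zero bytes to crc1; the first squaring below yields the
	// operator for eight zero bits, i.e. one zero byte.
	for {
		gf2MatrixSquare(&even, &odd)
		if len2&1 != 0 {
			crc1 = gf2MatrixTimes(&even, crc1)
		}
		len2 >>= 1
		if len2 == 0 {
			break
		}
		gf2MatrixSquare(&odd, &even)
		if len2&1 != 0 {
			crc1 = gf2MatrixTimes(&odd, crc1)
		}
		len2 >>= 1
		if len2 == 0 {
			break
		}
	}
	return crc1 ^ crc2
}

func main() {
	s1, s2 := []byte("hello, "), []byte("gopher")
	crc1 := crc32.ChecksumIEEE(s1)
	crc2 := crc32.ChecksumIEEE(s2)
	fmt.Printf("combined: %08x\n", combineCRC32IEEE(crc1, crc2, int64(len(s2))))
	fmt.Printf("direct:   %08x\n", crc32.ChecksumIEEE(append(s1, s2...)))
}

The 32x32 GF(2) matrices encode the effect of feeding zero bits through the CRC register; repeated squaring reduces len2 zero bytes to O(log len2) matrix applications.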

Advantages

  • Adds functionality that is clearly related to each of these hash packages, has obvious uses, and saves users from having to implement these non-trivial functions themselves.

Disadvantages

  • The hash libraries will be slightly asymmetrical since hash/fnv won't have a Combine function. I haven't spent too much time on it, but it may be possible to define a Combine function for fnv as well.

I'd be interested in adding these if other people find them useful as well.

@randall77 (Contributor)

You can already do CRC of multiple large byte slices without copying.

var s1, s2 []byte // Let these be two large byte strings
c := crc32.NewIEEE()
c.Write(s1)
c.Write(s2)
r := c.Sum32()

Your proposal would still be useful for computing the checksums in parallel (or at different times). I'm not sure how often that comes up.

dsnet commented Aug 24, 2015

@randall77

The advantage comes not from avoiding copying, but from avoiding the need to recompute the full hash. If s1 and s2 have lengths n and m, and you already know crc1, you could use the Update function, but that still has a runtime of O(m), which can be significant.

The algorithm used by zlib can combine crc1, crc2, and len2 in O(log m) time for CRC and O(1) for adler32.
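
For illustration, the Adler-32 case is a direct translation of zlib's adler32_combine; combineAdler32 is a hypothetical name, not an existing API. It relies on the identities s1(AB) = s1(A) + s1(B) - 1 and s2(AB) = s2(A) + s2(B) + len(B)·(s1(A) - 1), both mod 65521, so the merge is O(1):

package main

import (
	"fmt"
	"hash/adler32"
)

const adlerBase = 65521 // largest prime smaller than 2^16

// combineAdler32 (hypothetical) computes the Adler-32 of the concatenation
// A+B given adler1 = Adler32(A), adler2 = Adler32(B), and len2 = len(B).
// It follows zlib's adler32_combine and runs in constant time.
func combineAdler32(adler1, adler2 uint32, len2 int64) uint32 {
	if len2 < 0 {
		return 0xffffffff // match zlib: flag invalid input
	}
	rem := uint32(len2 % adlerBase)
	sum1 := adler1 & 0xffff
	sum2 := (rem * sum1) % adlerBase
	sum1 += (adler2 & 0xffff) + adlerBase - 1
	sum2 += (adler1 >> 16) + (adler2 >> 16) + adlerBase - rem
	if sum1 >= adlerBase {
		sum1 -= adlerBase
	}
	if sum1 >= adlerBase {
		sum1 -= adlerBase
	}
	if sum2 >= adlerBase<<1 {
		sum2 -= adlerBase << 1
	}
	if sum2 >= adlerBase {
		sum2 -= adlerBase
	}
	return sum1 | sum2<<16
}

func main() {
	s1, s2 := []byte("hello, "), []byte("gopher")
	a1 := adler32.Checksum(s1)
	a2 := adler32.Checksum(s2)
	fmt.Printf("combined: %08x\n", combineAdler32(a1, a2, int64(len(s2))))
	fmt.Printf("direct:   %08x\n", adler32.Checksum(append(s1, s2...)))
}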

Useful applications of this:

  • Merging any packets that use a CRC. For example, back-to-back gzip files can be merged into a single gzip file without recomputing the full hash. In other cases, data is sent in chunks, each with its own CRC; with Combine, the CRCs of all of these chunks could be merged in less than linear time.
  • Computing a CRC in parallel (see the sketch after this list).
    • As a practical example, github.com/klauspost/pgzip could benefit from having each goroutine compute the CRC of its own chunk and have the master just merge the results.
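
A rough sketch of the parallel case, assuming it lives in the same package as the combineCRC32IEEE sketch above (with "hash/crc32" and "sync" imported); parallelCRC32IEEE is a hypothetical name:

// parallelCRC32IEEE (hypothetical) hashes data with up to nWorkers goroutines
// and merges the partial results with the combineCRC32IEEE sketch above.
func parallelCRC32IEEE(data []byte, nWorkers int) uint32 {
	if len(data) == 0 || nWorkers < 1 {
		return crc32.ChecksumIEEE(data)
	}
	chunkSize := (len(data) + nWorkers - 1) / nWorkers
	nChunks := (len(data) + chunkSize - 1) / chunkSize

	type part struct {
		crc uint32
		n   int64
	}
	parts := make([]part, nChunks)

	var wg sync.WaitGroup
	for i := 0; i < nChunks; i++ {
		off := i * chunkSize
		end := off + chunkSize
		if end > len(data) {
			end = len(data)
		}
		chunk := data[off:end]
		parts[i].n = int64(len(chunk))
		wg.Add(1)
		go func(i int, chunk []byte) {
			defer wg.Done()
			parts[i].crc = crc32.ChecksumIEEE(chunk)
		}(i, chunk)
	}
	wg.Wait()

	// Fold the partial CRCs left to right; the CRC-32 of the empty string is 0.
	var crc uint32
	for _, p := range parts {
		crc = combineCRC32IEEE(crc, p.crc, p.n)
	}
	return crc
}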

@dr2chase (Contributor)

I would be very amused to see this happen. I did this work for Java (in particular, fork/join CRC calculation) and it was judged to be too scary for incorporation into the Java standard library because it used threads. For CRCs that use different polynomials than the one wired into the CRC32 instruction, you can (on sufficiently modern AMD64) use carryless multiply to get a nice serial speed boost, too.

@klauspost (Contributor)

(Searched my name by accident ;)

Wouldn't this be a nice feature for a third-party package? I don't see much general use for it, so IMO it doesn't belong in the standard library. Even in pgzip it wouldn't really help, since the CRC is computed in the calling goroutine, which doesn't carry the "heavy" load.

dsnet commented Oct 13, 2015

@klauspost I agree. I still want this feature, but most people probably won't need it, so it fits better in a third-party package.

Closing this issue, unless many other people voice a need for it.

@dsnet dsnet closed this as completed Oct 13, 2015
@golang golang locked and limited conversation to collaborators Oct 12, 2016