Skip to content

hash/maphash: purego implementation is not updated #69940

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ErikPelli opened this issue Oct 18, 2024 · 7 comments
Closed

hash/maphash: purego implementation is not updated #69940

ErikPelli opened this issue Oct 18, 2024 · 7 comments
Labels
help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@ErikPelli
Copy link
Contributor

According to lines 33 & 34 of hash/maphash/maphash_purego.go:

// This is a port of wyhash implementation in runtime/hash64.go,
// without using unsafe for purego.

However, when I look to the effective code inside this file, the implementation looks very different from the one in runtime/hash64.go, and I think it should be changed to match it accordingly.

Among the differences:

  1. In the runtime/hash64.go, the variable hashkey is used often, but it's totally absent from the maphash_purego.go implementation. This is a [4]uint64 array that contains random values, generated during the package initialization inside runtime/alg.go. I think an init() function should be added to the maphash_purego.go file to generate random uint64 in a private global variable, similar to the hashkey one.
  2. On April 2024 (6 months ago), hash64.go implementation has been updated, in commits ef2f339 and 35c6193, now the global hashkey variable is used even more (should we add an equivalent in maphash too? Related to point 1) and 4 out of 5 constants have been removed, these changes are currently missing from the hashmap_purego.go implementation.
  3. Although the comments says the the code afterward is just a porting from hash64.go, the code structure looks very different. For example: the default case of the hash64 implementation, in this implementation is written before the whole switch, and a lot of cases are grouped under a new default case. Maybe these differences are correct, but there is no documentation to explain them, and I don't understand why they are not directly part of runtime/hash64.go too if the code does the same thing? Maybe they are less performant? Someone has any hint about this one?
@ianlancetaylor
Copy link
Member

CC @randall77

@prattmic prattmic added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Oct 21, 2024
@prattmic prattmic added this to the Backlog milestone Oct 21, 2024
@randall77
Copy link
Contributor

I don't think the intent of these two hashes is that they match exactly. Differences are ok. Having said that, though, we should probably mirror any changes that were added for collision resistance.

I'll mark this for 1.24.

@randall77 randall77 modified the milestones: Backlog, Go1.24 Oct 22, 2024
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/621756 mentions this issue: hash/maphash: sync wyhash with runtime implementation

@ErikPelli
Copy link
Contributor Author

ErikPelli commented Oct 22, 2024

For the point 3, maybe should we check if there is any performance improvement by backporting the current maphash_purego implementation to runtime/hash64?
Idk, I checked the history and it seems that the maphash_purego.go has been created in 9b221f9, @cuonglm do you recall anything about the reasons behind it? In the gerrit discussion I don't see anything about it.

@cuonglm
Copy link
Member

cuonglm commented Oct 22, 2024

Maybe should we check if there is any performance improvement by backporting the current maphash_purego implementation to runtime/hash64?

Idk, I checked the history and it seems that the maphash_purego.go has been created in 9b221f9, @cuonglm do you recall anything about the reasons behind it? In the gerrit discussion I don't see anything about it.

What do you mean by "backporting the current maphash_purego implementation to runtime/hash64"? The purego implementation should be updated to stay up-to-date with runtime one, to improve hash collisions resistance.

maphash_purego was created to be used by compiler/runtime where unsafe is unavailable.

@ErikPelli
Copy link
Contributor Author

What do you mean by "backporting the current maphash_purego implementation to runtime/hash64"? The purego implementation should be updated to stay up-to-date with runtime one, to improve hash collisions resistance.

maphash_purego was created to be used by compiler/runtime where unsafe is unavailable.

I mean, if you check the two implementations, the multiple cases in the switch that were present in runtime, in the maphash are grouped in a single one with bitwise operations, was there a purpose behind this?

@cuonglm
Copy link
Member

cuonglm commented Oct 22, 2024

What do you mean by "backporting the current maphash_purego implementation to runtime/hash64"? The purego implementation should be updated to stay up-to-date with runtime one, to improve hash collisions resistance.
maphash_purego was created to be used by compiler/runtime where unsafe is unavailable.

I mean, if you check the two implementations, the multiple cases in the switch that were present in runtime, in the maphash are grouped in a single one with bitwise operations, was there a purpose behind this?

Like @randall77 said above, the runtime implementation maybe different, since these two hashes are not intended to be matched exactly.

The implementation in maphash_purego follows this wyhash_final4: https://github.com/wangyi-fudan/wyhash/blob/master/old_versions/wyhash_final4.h#L118

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

6 participants