Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: allow map hashes with different tradeoffs #36915

Open
eric-s-raymond opened this issue Jan 31, 2020 · 10 comments
Open

runtime: allow map hashes with different tradeoffs #36915

eric-s-raymond opened this issue Jan 31, 2020 · 10 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@eric-s-raymond
Copy link

What version of Go are you using (go version)?

$ go version
go version go1.13.4 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
go version go1.13.4 linux/amd64
esr@snark:~/WWW/reposurgeon$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/esr/.cache/go-build"
GOENV="/home/esr/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/esr/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/lib/go-1.13"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go-1.13/pkg/tool/linux_amd64"
GCCGO="/usr/bin/gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/esr/WWW/reposurgeon/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build890210982=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Wrote a complicated data transformation using maps heavily. Observed that after I did enough optimization, map hashing costs actually competed with GC as a performance bottleneck.

This is where the standard bug template stops being useful. The problem I see is that the choice of aeshash512 is more expensive than is right for my application, and I can't fix that.

I'm not arguing with the choice to optimize for cryptographic hardness over performance by default; given Golang's intended role as a language for network servers this was sensible. But my application - reposurgeon - doesn't need that hardening, and is paying a significant performance cost for it.

Therefore, this RFE: Offer a runtime switch that allows plugging in a weaker but faster hash.

@randall77
Copy link
Contributor

aeshash512

Where did 512 come from? That doesn't exist.

What do your keys look like (type, size, etc.)? Do you have a suggestion for an alternative, and how fast do you think it would be?

@eric-s-raymond
Copy link
Author

randall77: I thought I saw that suffix in the profiles.

My keys are typically strings or 32-bit ints (in the latter case these numbers are revision IDs in a repository event sequence).

I'm not an expert on hash functions, so I hesitate to suggest a concrete alternative. What I want is lookup speed.

@mvdan
Copy link
Member

mvdan commented Jan 31, 2020

Have you explored using a custom map implementation? For example, if you want to optimize for performance, and you know that the keys are 32-bit integers, I'm sure it's possible to beat map[int32]Value with a custom type. For example, see https://github.com/brentp/intintmap - though I'm not an expert either, and I haven't tried that particular package.

You'd have to replace the nice syntax with methods, but I think that's an OK tradeoff for the niche use cases where map isn't enough.

@odeke-em odeke-em changed the title RFE: Allow map hashes with different tradeoffs runtime: allow map hashes with different tradeoffs Jan 31, 2020
@seebs
Copy link
Contributor

seebs commented Jan 31, 2020

https://godoc.org/modernc.org/hash <-- this also exists

having seen the nightmares the Java people get from allowing users to specify hashes, I am moderately inclined to think that Go's choice of "don't let user specify a hash" is a good one for the innate map implementation.

@networkimprov
Copy link

Re "nightmares the Java people get," could you elaborate?

@seebs
Copy link
Contributor

seebs commented Jan 31, 2020

A while back Java started switching buckets to tree structures under load because it's fairly common for user-provided hash choices to be nigh-pessimal, producing a handful of buckets which end up with most of the items in them. I was told once that this may be in part because some IDE's prefilled template for a hash implementation provided a spectacularly poorly-considered suggested default hash, but that could be apocryphal. But generally, it turns out that if you allow people to pick hash functions, they do, and most of us are genuinely not qualified to pick hash functions, but also most of us don't realize this. Allowing a bit more specialization without actually letting users pick the function might work, but I'd be wary.

I also note that, at least in my experience, if I am seeing the map lookup get expensive, it's possible that I need to coalesce operations -- extract a value, do operations on it, put it back in, rather than writing a long sequence of operations using the map key. That's bitten me more than once.

@randall77
Copy link
Contributor

@networkimprov There is also the danger of writing a hash function and an equals function that aren't compatible (two equal values hash to different things). When that happens, it leads to very confusing behavior.

@eric-s-raymond
Copy link
Author

eric-s-raymond commented Jan 31, 2020 via email

@beoran
Copy link

beoran commented Jan 31, 2020

Maybe this could be done by adding a third argument to make() for maps, where the third argument is a string with the description of the hashing function/strategy. The compiler errors if the third argument is unknown to it.

Then again, if we get Go generics, then a generic hash map library should solve this issue just a easily and perhaps with even better performance.

@CAFxX
Copy link
Contributor

CAFxX commented Feb 1, 2020

Maybe this could be done by adding a third argument to make() for maps, where the third argument is a string with the description of the hashing function/strategy. The compiler errors if the third argument is unknown to it.

I would rather avoid extra parameters, and have maps start with a faster/weaker hash: if then the map detects extremely skewed bucket overflows (that IIRC we already use to trigger map growth) it could switch to a slower hash with better confusion properties (like the current aeshash) during map growth.

@cagedmantis cagedmantis added this to the Backlog milestone Feb 7, 2020
@cagedmantis cagedmantis added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 7, 2020
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
Status: Triage Backlog
Development

No branches or pull requests

9 participants