Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

math/rand: panic generating NormFloat64 #66086

Closed
ranlavanet opened this issue Mar 4, 2024 · 7 comments
Closed

math/rand: panic generating NormFloat64 #66086

ranlavanet opened this issue Mar 4, 2024 · 7 comments

Comments

@ranlavanet
Copy link

ranlavanet commented Mar 4, 2024

Go version

go 1.20.5

Output of go env in your module/workspace:

GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/user/.cache/go-build"
GOENV="/home/user/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/user/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/user/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/user/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/user/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.20.5"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/user/go/src/lava/go.mod"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build4204543259=/tmp/go-build -gno-record-gcc-switches"

What did you do?

firstly we generate a global random instance:

// rand wrapper for protocol structs which hosts the same seed for deterministic unified random distribution
// we set the seed once for the entire process.

var protocolRand *rand.Rand

func Initialized() bool {
	return protocolRand != nil
}

func InitRandomSeed() {
	seed := time.Now().UnixNano()
	protocolRand = rand.New(rand.NewSource(seed))
}

And later on we are running:

func pertrubWithNormalGaussian(orig, percentage float64) float64 {
	perturb := rand.NormFloat64() * percentage * orig
	return orig + perturb
} 

generating the float caused our application to panic, Important to note this is very rare we used this method probably millions of times before it crashed.

What did you see happen?

panic: runtime error: index out of range [-1]

goroutine 18216 [running]:
math/rand.(*rngSource).Uint64(...)
	math/rand/rng.go:249
math/rand.(*rngSource).Int63(0x4d7c6d00?)
	math/rand/rng.go:234 +0x92
math/rand.(*Rand).Int63(...)
	math/rand/rand.go:95
math/rand.(*Rand).Uint32(...)
	math/rand/rand.go:98
math/rand.(*Rand).NormFloat64(0xc0015f7a40)
	math/rand/normal.go:39 +0x35
github.com/lavanet/lava/utils/rand.NormFloat64(...)
	github.com/lavanet/lava/utils/rand/rand.go:71

What did you expect to see?

Getting a random float without a panic :)

@Jorropo
Copy link
Member

Jorropo commented Mar 4, 2024

Looking at the code:

go/src/math/rand/rng.go

Lines 239 to 249 in 6f5d774

rng.tap--
if rng.tap < 0 {
rng.tap += rngLen
}
rng.feed--
if rng.feed < 0 {
rng.feed += rngLen
}
x := rng.vec[rng.feed] + rng.vec[rng.tap]

This bug looks impossible.
Because rng.tap and rng.feed are always updated with -- and += rngLen adds 603 if it is negative, so the index can't be -1 (as if it were += 603 would make it 602).
Is this function called concurrently ? What does running your code with -race says ?

@Jorropo Jorropo changed the title math/rand/rng.go: panic generating NormFloat64 math/rand: panic generating NormFloat64 Mar 4, 2024
@Jorropo Jorropo added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Mar 4, 2024
@ranlavanet
Copy link
Author

ranlavanet commented Mar 4, 2024

Yes, we are calling rand.NormFloat64() using different go routines so it is called concurrently, we didn't try to run it with -race this was a rare case and wouldn't reproduce easily, just thought of reporting this in case it was not reported earlier.

our repo is open source this is the file:

https://github.com/lavanet/lava/blob/main/protocol/provideroptimizer/provider_optimizer.go

and full log of the panic:


goroutine 18216 [running]:
math/rand.(*rngSource).Uint64(...)
	math/rand/rng.go:249
math/rand.(*rngSource).Int63(0x4d7c6d00?)
	math/rand/rng.go:234 +0x92
math/rand.(*Rand).Int63(...)
	math/rand/rand.go:95
math/rand.(*Rand).Uint32(...)
	math/rand/rand.go:98
math/rand.(*Rand).NormFloat64(0xc0015f7a40)
	math/rand/normal.go:39 +0x35
github.com/lavanet/lava/utils/rand.NormFloat64(...)
	github.com/lavanet/lava/utils/rand/rand.go:71
github.com/lavanet/lava/protocol/provideroptimizer.pertrubWithNormalGaussian(...)
	github.com/lavanet/lava/protocol/provideroptimizer/provider_optimizer.go:463
github.com/lavanet/lava/protocol/provideroptimizer.(*ProviderOptimizer).ChooseProvider(0xc0031880c0, {0xc04cd73600, 0xb, 0x0?}, 0x0?, 0x0?, 0xfffffffffffffffe, 0x3fb999999999999a)
	github.com/lavanet/lava/protocol/provideroptimizer/provider_optimizer.go:146 +0x286
github.com/lavanet/lava/protocol/lavasession.(*ConsumerSessionManager).getValidProviderAddresses(0xc002cde180, 0xc01a9730b0, 0xc01adb69b0?, 0x4529d4?, {0x0, 0x0}, {0x0, 0x0, 0x0}, 0x0)
	github.com/lavanet/lava/protocol/lavasession/consumer_session_manager.go:497 +0x175
github.com/lavanet/lava/protocol/lavasession.(*ConsumerSessionManager).getValidConsumerSessionsWithProvider(0xc002cde180, 0xc097d26cd8, 0x0?, 0x0?, {0x0, 0x0}, {0x0, 0x0, 0x0}, 0x0, ...)
	github.com/lavanet/lava/protocol/lavasession/consumer_session_manager.go:534 +0x2b1
github.com/lavanet/lava/protocol/lavasession.(*ConsumerSessionManager).GetSessions(0xc002cde180, {0x37663e8, 0xc01a972f90}, 0xa, 0xc01a9730b0, 0xc013508700?, {0x0, 0x0}, {0x0, 0x0, ...}, ...)
	github.com/lavanet/lava/protocol/lavasession/consumer_session_manager.go:336 +0x28f
github.com/lavanet/lava/protocol/rpcconsumer.(*RPCConsumerServer).sendRelayToProvider(0xc002cf42c0, {0x37663e8?, 0xc01a972f90}, {0x377bc20, 0xc01a959680}, 0xc01ac826e0, {0x2abe5e3, 0xd}, {0xc01aca06c0, 0x1e}, ...)
	github.com/lavanet/lava/protocol/rpcconsumer/rpcconsumer_server.go:486 +0x789
github.com/lavanet/lava/protocol/rpcconsumer.(*RPCConsumerServer).SendRelay(0xc002cf42c0, {0x37663e8?, 0xc01a972f90}, {0x0, 0x0}, {0xc01a976d70, 0x44}, {0x2aaf4fa, 0x4}, {0x2abe5e3, ...}, ...)
	github.com/lavanet/lava/protocol/rpcconsumer/rpcconsumer_server.go:301 +0x79e
github.com/lavanet/lava/protocol/chainlib.(*JsonRPCChainListener).Serve.func3(0xc01ac93200)
	github.com/lavanet/lava/protocol/chainlib/jsonRPC.go:454 +0x76e
github.com/gofiber/fiber/v2.(*App).next(0xc001130500, 0xc01ac93200)
	github.com/gofiber/fiber/v2@v2.50.0/router.go:145 +0x1bf
github.com/gofiber/fiber/v2.(*Ctx).Next(0xc01acca930?)
	github.com/gofiber/fiber/v2@v2.50.0/ctx.go:1014 +0x53
github.com/lavanet/lava/protocol/chainlib.createAndSetupBaseAppListener.func1(0xc01ac93200)
	github.com/lavanet/lava/protocol/chainlib/common.go:340 +0x1be
github.com/gofiber/fiber/v2.(*Ctx).Next(0xc01ac93200?)
	github.com/gofiber/fiber/v2@v2.50.0/ctx.go:1011 +0x43
github.com/gofiber/fiber/v2/middleware/favicon.New.func1(0xc01ac93200)
	github.com/gofiber/fiber/v2@v2.50.0/middleware/favicon/favicon.go:121 +0xae
github.com/gofiber/fiber/v2.(*App).next(0xc001130500, 0xc01ac93200)
	github.com/gofiber/fiber/v2@v2.50.0/router.go:145 +0x1bf
github.com/gofiber/fiber/v2.(*App).handler(0xc001130500, 0x4dde77?)
	github.com/gofiber/fiber/v2@v2.50.0/router.go:172 +0x87
github.com/valyala/fasthttp.(*Server).serveConn(0xc001396600, {0x3776198?, 0xc00e2cca98})
	github.com/valyala/fasthttp@v1.50.0/server.go:2359 +0x11d3
github.com/valyala/fasthttp.(*workerPool).workerFunc(0xc00195fcc0, 0xc01aadf3c0)
	github.com/valyala/fasthttp@v1.50.0/workerpool.go:224 +0xa9
github.com/valyala/fasthttp.(*workerPool).getCh.func1()
	github.com/valyala/fasthttp@v1.50.0/workerpool.go:196 +0x38
created by github.com/valyala/fasthttp.(*workerPool).getCh
	github.com/valyala/fasthttp@v1.50.0/workerpool.go:195 +0x1b0```


@Jorropo
Copy link
Member

Jorropo commented Mar 4, 2024

You should try -race, you need extremely specific luck for the panic to happen normally because of the extremely short delay between the if and indexing and cache coherence but -race manually keep tracks of unsynchronized accesses, it can catch things much faster.

@ranlavanet
Copy link
Author

ok an update here this happens quite frequently, I would hope it would get fixed. for now we will just use some other methodology

@Jorropo
Copy link
Member

Jorropo commented Mar 4, 2024

what happens frequently ? The race ?

@ranlavanet
Copy link
Author

yes, we are using this method when traffic enters our servers, when the load increased this happens every few seconds. crashing our server.

@Jorropo
Copy link
Member

Jorropo commented Mar 4, 2024

There is this mention at the top of math/rand:

Random numbers are generated by a Source, usually wrapped in a Rand. Both types should be used by a single goroutine at a time: sharing among multiple goroutines requires some kind of synchronization.

I'm closing this as wontfix if that a confirmed race as you are improperly using a non thread safe object concurrently.
Might be worth reopening as documentation issue, if this notice were on math/rand.Rand it's more likely to show up when using gopls.

@Jorropo Jorropo closed this as not planned Won't fix, can't repro, duplicate, stale Mar 4, 2024
@Jorropo Jorropo removed the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Mar 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants