Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

strings: unexpected nil pointer dereference on Replace #24541

Closed
blind-oracle opened this issue Mar 26, 2018 · 9 comments
Closed

strings: unexpected nil pointer dereference on Replace #24541

blind-oracle opened this issue Mar 26, 2018 · 9 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Milestone

Comments

@blind-oracle
Copy link

What version of Go are you using (go version)?

go version go1.10 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/root/go"
GORACE=""
GOROOT="/opt/go"
GOTMPDIR=""
GOTOOLDIR="/opt/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build061548367=/tmp/go-build"

What did you do?

Sadly I can't reproduce the issue, I can't even imagine how it could happen.

Very rarely our service crashes in strings.Replace, because (for some impossible reason) it receives the first (string) argument as 0x0 (nil), according to the backtrace:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x458675]

goroutine 25438489 [running]:
strings.Count(0x0, 0x7, 0xdd60a1, 0x1, 0xc456b46a68)
        /opt/go/src/strings/strings_amd64.go:85 +0x96
strings.Replace(0x0, 0x7, 0xdd60a1, 0x1, 0xdd6242, 0x2, 0xffffffffffffffff, 0x20, 0x7)
        /opt/go/src/strings/strings.go:847 +0x9b
.../logrusformat.(*MTLogrusFormat).Format(0xc42018c600, 0xc45479ca50, 0x7ff3e3065c34, 0xc456b46c02, 0x21, 0xc456b46cf0, 0x411e2d)
        .../logrusformat/format.go:80 +0x29b
.../log/logrus-syslog-hook.(*SyslogHook).Fire(0xc42014cdc0, 0xc45479ca50, 0x2, 0xc420258238)
        .../log/logrus-syslog-hook/syslog.go:80 +0x4dd

The sample code snippet is:

type F struct {}

func (f *F) a() {
 x := "whatever"
 y := strings.Replace(f.b(x), `"`, `\"`, -1)
}

func (f *F) b(i interface{}) string {
 return fmt.Sprintf("%v", i)
}

So I'm calling some formatting method, which takes an interface and returns string.
By the way, I don't see this function call in the backtrace, it goes directly to Replace()->Count()
And then somehow Replace() gets a nil address as a first arg.

From what I can tell string can't be nil and this should not happen.

What did you expect to see?

no panic

What did you see instead?

panic

@mvdan
Copy link
Member

mvdan commented Mar 26, 2018

Are you positive there isn't a data race in your code? Try running the tests or one of the services with -race.

Did this start happening recently? For example, did it never happen with 1.9, and start happening with 1.10?

@mvdan mvdan added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Mar 26, 2018
@mvdan mvdan changed the title Panic in strings.Replace strings: unexpected nil pointer dereference on Replace Mar 26, 2018
@blind-oracle
Copy link
Author

blind-oracle commented Mar 26, 2018

I can't be 100% certain, of course, but I've checked the code thoroughly and I'm more or less sure that the call to this function is covered by a mutex, because there are not that much places where it's called from.

I'll try to run it with race, but I don't know whether I can manage to do this on production load for long, since it happens only on 1-2 requests out of 60-70 million a day.

Yes, I'm seeing this for the first time after we've moved to go1.10.
I will try to downgrade and see if it helps.

@mvdan
Copy link
Member

mvdan commented Mar 26, 2018

I'll ping some people familiar with the runtime and compiler, in case they have any ideas. Would be great if you could confirm whether this is something new in 1.10.

/cc @aclements @randall77

@blind-oracle
Copy link
Author

blind-oracle commented Mar 26, 2018

Thanks!

I've given it a second thought and came to a conclusion that even a data race in this case can't be the reason. The string returned by the b() function shouldn't be subject to any races because it's not a pointer (under the hood it's anyway a pointer, but from go perspective it shouldn't be) and it's not shared in any way. I may be wrong, of course, but that's how I see it currently...

@aclements
Copy link
Member

Perhaps more concerning is not that the string is nil, but that it's nil with a length of 7:

strings.Replace(0x0, 0x7, 0xdd60a1, 0x1, 0xdd6242, 0x2, 0xffffffffffffffff, 0x20, 0x7)

The string is actually the first two words of arguments: the base pointer and the length.

A race still seems like the most likely candidate. I don't know what your specific code looks like, but, for example, if two goroutines can access the same string value and one writes an empty string to it at the same time the other writes a 7 byte long string to it, the writes could shear and result in a nil string pointer with length 7. If that were then passed to strings.Replace, this is exactly what you would see.

@andybons andybons added this to the Go1.11 milestone Mar 26, 2018
@blind-oracle
Copy link
Author

Yes, I was searching for possible races but I couldn't find any.

I have a reentrant function (that takes an interface{}, processes it somehow and returns string), which is called from this panicking goroutine. If I understand Go correctly then this string cannot be shared with other goroutines, unless I'll share the pointer to this string somehow through globals or channel, which I'm not doing of course.

@davecheney
Copy link
Contributor

davecheney commented Mar 27, 2018 via email

@mvdan mvdan added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label May 29, 2018
@mvdan
Copy link
Member

mvdan commented May 29, 2018

@blind-oracle reminder that we are waiting on the confirmation that you can reproduce the crash while running the program with the race detector enabled. Otherwise, there is little that can be done about this issue without a way to reproduce it, or any other details on how it happens.

@blind-oracle
Copy link
Author

@mvdan Sorry for silence - we haven't encountered this crash again not with race detector, nor without it. So I guess we can close this one for now, will reopen in case it appears again. Thanks!

@golang golang locked and limited conversation to collaborators May 30, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Projects
None yet
Development

No branches or pull requests

6 participants