Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: possible garbage collector error? #21726

Closed
davidrenne opened this issue Sep 1, 2017 · 12 comments
Closed

runtime: possible garbage collector error? #21726

davidrenne opened this issue Sep 1, 2017 · 12 comments

Comments

@davidrenne
Copy link

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

1.8

Does this issue reproduce with the latest release?

We could try it later.

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/ubuntu/"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build023866510=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"

What did you do?

Hello I am having some sort of garbage collector or memory error where data just disappears on me.

I wrote the following code which is calling the SaveWithTran function many times. SaveWithTran takes in a transaction Id and essentially builds up all the mongo records we wish to flush to the database in an in memory map[string]*transactionsToPersist. Only if all the records in our transaction are without error, do we finally loop through the persisted transactions and finally flush to mongo on our Commit function.

calling code:

https://gist.github.com/davidrenne/8b80246fcfe8182722c1d000026a2014#file-gistfile1-txt-L75-L87

I have highlighted the area where errors are being thrown because data is just disappearing. Now it is important to note that no errors are thrown when I use this code:

https://gist.github.com/davidrenne/8b80246fcfe8182722c1d000026a2014#file-gistfile1-txt-L57-L72

macro.go:

This is a collection model file that is auto generated from https://github.com/DanielRenne/GoCore/blob/master/core/dbServices/createDBServices.go which basically outputs this file which we use to save data to mongo:

https://gist.github.com/davidrenne/4b1c361b9ebb92f7e71e1bc4235864b0

model.go:

defines the structs for tQueue and modelEntity
https://gist.github.com/davidrenne/3a46e1d24a340cfe12f9e02809a4e0b0

And finally here are my logs with the log.Println() in various places we return in SaveWithTran() and logging the transaction queue.

As you read the code and read the log please identify that the BSON ID 59a90a8c1d41c86e12e9b653 is the problematic one that just disappears. I had over 30000 line items in my log before this where the slice contained the BSON id 59a90a8c1d41c86e12e9b653 and then just magically the data disappeared not because of a race condition where our mutexes were faulty, but because golang just deleted it from the map (I suspect because of the range’s temporary variable for “macro” was garbage collected and since https://gist.github.com/davidrenne/3a46e1d24a340cfe12f9e02809a4e0b0#file-gistfile1-txt-L87 newItems is not a pointer and thus when it assigned an entityTransaction object into the slice and since that object was passed from a range, that the garbage collector is deleting references that are still in use). Does this make any sense to anyone? I am here to explain further and it will be nearly impossible to recreate without you guys having our system and build in front of you, but I feel that the garbage collector has a bug and is deleting stuff out of my slice!!

https://gist.github.com/davidrenne/33ee3c46ec501650f1312a1ca6bbe48a#file-gistfile1-txt-L2349

Any explanation of maybe some theories how this could be happening I would really appreciate! Thanks for any insight!

My work around to battle the golang garbage collector (or other issue in my code) is to create a simple map[string][]string

https://gist.github.com/davidrenne/4b1c361b9ebb92f7e71e1bc4235864b0#file-gistfile1-txt-L467

(I commented it out so I could provide the failed log where the code bails out and logs log.Println("exit 12441241") because it cannot find a committed record in the database)

https://gist.github.com/davidrenne/4b1c361b9ebb92f7e71e1bc4235864b0#file-gistfile1-txt-L433-L441

And I re-inject the missing data that golang stole from me!! BAD BAD GOPHER (or bad code!! ;-P )

What did you expect to see?

That my map[string] does not just disappear out of the list.

What did you see instead?

My map[string] 59a90a8c1d41c86e12e9b653 disappeared and thus errored

@davecheney
Copy link
Contributor

davecheney commented Sep 1, 2017 via email

@cznic
Copy link
Contributor

cznic commented Sep 1, 2017

@davidrenne It would be helpful if you could reduce the failure-reproducing code to a minimal, self-contained program. Links to snippets are not as useful and I didn't read them. Nonetheless, can you please answer some questions?

  • Does the program use CGO? (Possible source of memory corruption.)
  • Does the program use package unsafe directly? (ditto)
  • Does the program run cleanly under the race detector?

@davidrenne
Copy link
Author

Hey guys. Thanks for the quick responses. I just tested in golang 1.9 and it fails still.

No we dont use CGO.

No we dont use the unsafe package.

Yes there are some race conditions in the application but in other areas of the code outside of the impacted modules, there are no race conditions.

I have a hard time spending 8 hours on the job to create a perfect unit test. But I am more just seeking advice of the types of things people think might be able to happen if you can imagine the flow of the code and the structs we have and the behaviors of how many times SaveWithTran() was called along with the log how possibly the BSON ID of 59a90a8c1d41c86e12e9b653 just disappears from a slice. No other code is interfacing with this dedicated mutex and value used to track all of the data.

If someone brilliant who knows more of the internals of go thinks about my caller which works in golang 1.8 and 1.9

			//for i := range newMacros {
			//	macId := newMacros[i].Id.Hex()
			//	session_functions.VelocityDump(macId)
			//
			//	for idx, command := range newMacros[i].MacroCommands {
			//		if command.DeviceId == d.DeviceId {
			//			newMacros[i].MacroCommands[idx].DeviceId = vmDevice.Device.Id.Hex()
			//			err = newMacros[i].SaveWithTran(t)
			//			if err != nil {
			//				err = errors.Wrap(err, core.Debug.ErrLineAndFile(err))
			//				message = constants.ERRORS_ROOM_COPY
			//				return
			//			}
			//		}
			//	}
			//}

But this one fails I think because the reference to macro was garabage collected and then subsequently the slice index was deleted out of our stack

			for _, macro := range newMacros {
				for idx, command := range macro.MacroCommands {
					if command.DeviceId == d.DeviceId {
						macro.MacroCommands[idx].DeviceId = vmDevice.Device.Id.Hex()
						err = macro.SaveWithTran(t)
						if err != nil {
							err = errors.Wrap(err, core.Debug.ErrLineAndFile(err))
							message = constants.ERRORS_ROOM_COPY
							return
						}
					}
				}
			}

If someone has an idea that it could be a bug in golang with any evidence, I would gladly work with my team to come up with a smaller unit test, but I didnt want to waste the time until someone brilliant says they might think it could be a bug.

Thanks for your time gophers!

@ianlancetaylor
Copy link
Contributor

I don't know what is happening, but I want to make clear that the garbage collector does not remove individual items from a map or slice. If items are disappearing unexpectedly from a slice then my guess would be that you have unexpected slice aliasing.

@ianlancetaylor ianlancetaylor changed the title Possible Garbage Collector Error? runtime: possible garbage collector error? Sep 1, 2017
@davidrenne
Copy link
Author

@ianlancetaylor what is sclice aliasing? Got a link?

@cznic
Copy link
Contributor

cznic commented Sep 1, 2017

Eg.

a := []int{1, 2, 3}
b := a[1:]
...
b[0] = 0
println(a[1] == 0) // true

Also

Yes there are some race conditions in the application but in other areas of the code outside of the impacted modules, there are no race conditions.

It's really necessary to get rid of any data races first, wherever they are. Only after that other conclusions can be reasonably made.

@davidrenne
Copy link
Author

Thanks @cznic ok. Well sometimes races happen when our code falls into certain scenarios.

We always build in dev with -race, but is it plausible if I see in my log that during a test that golang says there was no race conditions then that test is verified as cleared?

Yea I know with this particular slice we are never indexing and assigning stuff like that. I am merely calling our SaveWithTransaction() function many times and the data disappears.

Ill try to reproduce and make sure my logs say no race conditions happen and update the thread.

@cznic
Copy link
Contributor

cznic commented Sep 1, 2017

Unfortunately, happy race detector is not a proof of a race free program. The opposite works though, if the race detector complains, the program has a real race condition (no false positives).

BTW, there are other ways a slice item can get (unexpectedly) mutated, for example:

a := []int{1, 2, 3}
p := &a[1]
...
*p = 0
println(a[1] == 0) // true

@davidrenne
Copy link
Author

Also I would be glad to do a screenshare and single step debugger if anyone is interested I have setup in golang using pycharm, I can get to the exact loop where it drops off easily.

@davidrenne
Copy link
Author

wow thanks @cznic for the golang trickery. Ill provide those to my brother.

@ianlancetaylor
Copy link
Contributor

I assume that there is nothing for us to fix on the Go side. Please comment if you disagree.

@davidrenne
Copy link
Author

Nah we’re good, it’s been so long I don’t even know how to recreate it. We’ve moved on and I’m sure it’s just bad code on our side somewhere not the language

@golang golang locked and limited conversation to collaborators Mar 30, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants