crypto/rand: Panic fetching random boundary string #9205

Closed
gopherbot opened this issue Dec 4, 2014 · 23 comments

@gopherbot

by birkirb@stoicviking.net:

What happened?

On creating a random boundary the following panic occurred:

panic: read /dev/urandom: resource temporarily unavailable

This unfortunately crashed my program, since the panic happened in a goroutine separate from the main
thread, which had the panic recovery.

Is having a panic recovery for all go-routines the Right Thing to do?

What should have happened instead?

An error in this rare case would have been nice. I'm not sure what causes /dev/urandom to be
unavailable.
@ianlancetaylor
Contributor

Comment 1:

What system did this happen on?
I don't understand why /dev/urandom would be unavailable.  But I also don't understand
why mime/multipart needs secure random strings.  As far as I can see mime/multipart
could reasonably use math/rand instead.
Brad, you switched from math/rand to crypto/rand in
http://golang.org/cl/4532089 .  Any special reason?
It's also possible that this indicates a bug in crypto/rand.

Labels changed: added repo-main, release-go1.5.

@bradfitz
Contributor

bradfitz commented Dec 4, 2014

Comment 2:

Russ told me to.
https://golang.org/cl/4530054/diff/19001/src/pkg/mime/multipart/writer.go#newcode48
"I know it seems like overkill but I think you'd be better
off importing and using crypto/rand here.  Package rand is
not great if your goal is to avoid collisions.  For example
since the generator is not typically seeded this will use
the same boundary for the first message each time it gets run.  Even if it is
seeded, you're only getting 64 bits of
randomness out of this, and you're trying to pull out 
quite a bit more."
And I still agree with that.
I also have no clue why /dev/urandom would be temporarily unavailable. 
If there was a well-seeded random mechanism available, I would use it, but I don't
believe there is.

@gopherbot
Author

Comment 3 by birkirb@stoicviking.net:

System was:
$ uname -a
Linux ####### 3.14.19-17.43.amzn1.x86_64 #1 SMP Wed Sep 17 22:14:52 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

@ianlancetaylor
Contributor

Comment 4:

Which version of Go did you use to build your program?

@ianlancetaylor
Contributor

Comment 5:

I don't see any way that the Linux kernel could return EAGAIN for a read of /dev/urandom
when it is in blocking mode.  I don't see any Go code that could set the file descriptor
to nonblocking mode.
Is this problem repeatable?  If so, please run it under strace -f and attach the results.

@gopherbot
Author

Comment 6 by birkirb@stoicviking.net:

Sorry, I thought I had pasted a complete path in the subject.
It was here:
https://code.google.com/p/go/source/browse/src/pkg/mime/multipart/writer.go?name=release-branch.go1.3#75
So it was compiled with Go 1.3.
It happened twice, in two separate goroutines that were both uploading files. One of
those goroutines was using goamz, which caught the panic and returned an error, but the
other didn't, which crashed my server :(
It's definitely not common; I've run the same code for over two weeks without this
issue, and it's probably not easily repeatable since I'm a bit clueless as to why
/dev/urandom would be unavailable.  I had recently restarted my application, which uses
goagain for seamless restarts.
Would having a fallback to math/rand make sense there? I wasn't really expecting a panic
in this package.

@bradfitz
Contributor

bradfitz commented Dec 4, 2014

Comment 7:

Could it be the lazy os.Open that returned EAGAIN? That would also be weird.
In Go 1.4 with a modern kernel we use the getrandom system call so no fd opens/reads are
involved.

@minux
Member

minux commented Dec 4, 2014

Comment 8:

Could it be the case that the VM has just started and there is not enough entropy
in the random pool so even /dev/urandom returns EAGAIN?
(the kernel shouldn't do this, but on VMs, this has been known to create security
issues, e.g. when generating ssh hostkeys, so perhaps the kernel contains a patch)
Also, according to this post,
https://groups.google.com/forum/#!topic/cryptopp-users/mEMkHo3Gafk
the poster asserted that it's possible for read from urandom to return EAGAIN and
EINTR.
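One way to probe this hypothesis on an affected VM is to record the kernel's entropy estimate, which Linux exposes in `/proc/sys/kernel/random/entropy_avail`. A minimal sketch; `parseEntropy` is a hypothetical helper. A low reading right after boot would be consistent with the hypothesis, but would not by itself show that a urandom read can return EAGAIN.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// entropyPath is the Linux procfs file holding the kernel's current
// entropy estimate, in bits.
const entropyPath = "/proc/sys/kernel/random/entropy_avail"

// parseEntropy converts the file's contents (for example "3072\n") to an int.
func parseEntropy(s string) (int, error) {
	return strconv.Atoi(strings.TrimSpace(s))
}

func main() {
	data, err := os.ReadFile(entropyPath)
	if err != nil {
		fmt.Println("cannot read entropy estimate:", err)
		return
	}
	bits, err := parseEntropy(string(data))
	if err != nil {
		fmt.Println("unexpected contents:", err)
		return
	}
	fmt.Println("entropy_avail:", bits)
}
```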

@ianlancetaylor
Contributor

Comment 9:

OK, I guess crypto/rand should handle a read from /dev/urandom returning EAGAIN (note
that we should not see EINTR as all signals should have SA_RESTART set).

Status changed to Accepted.

@bradfitz bradfitz modified the milestone: Go1.5 Dec 16, 2014
@bradfitz
Contributor

@rsc
Contributor

rsc commented Dec 22, 2014

Reopening. We need to understand this better.

@rsc rsc reopened this Dec 22, 2014
@minux
Member

minux commented Dec 28, 2014

I did some reading of relevant kernel source code, and still
couldn't find out how the kernel can return EAGAIN for
reading /dev/urandom.

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/char/random.c?id=refs/heads/master

@ianlancetaylor
Contributor

I also don't see how the kernel could possibly return EAGAIN here, which is why I requested the strace -f. I was convinced that we should handle this by your comment 8. I also wonder if some people "ln -sf /dev/random /dev/urandom".

@birkirb

birkirb commented Feb 11, 2015

Hi, I'm getting this more and more and from a few different sources, not just multipart, but anything using /dev/urandom. Some examples:

&os.PathError{Op:"read", Path:"/dev/urandom", Err:0xb}
&url.Error{Op:"Post", URL:"https://platform-api.newrelic.com/platform/v1/metrics", Err:(*errors.errorString)(0xc20844fbd0)}
tls: short read from Rand: read /dev/urandom: resource temporarily unavailable

I've found a way for this to happen consistently so I'll give some background:

I'm running an API based on Revel. It's been modified by me to use GoAgain to fork the process, shut down, and start again with a new listener.

This didn't happen from the start, but at some point, probably given increased traffic through the API, this started to happen consistently every time I restarted the app using the fork method. I start to get a massive amount of errors. I can alleviate those by stopping the process completely and starting it again (no forks).

So something must be broken, not cleaned up or otherwise causing issues. This error makes it really tough for me to run updates to my application since I have to manually restart servers that exhibit the errors, not all do, but some will.

I'd really appreciate any help in getting to the bottom of this.
Do the symptoms described here give an idea of any possible causes?

@bradfitz
Contributor

If you could write a standalone program (or Test function) to reliably reproduce this, that would be most helpful. I'd start with a bunch of goroutines reading from /dev/urandom while another few repeatedly run child processes. Maybe mix in some TCP network I/O to yourself.

@birkirb

birkirb commented Feb 18, 2015

I'll try when I have some time for it, hopefully soonish. Another weird thing I noticed just today is that it takes almost 10 minutes after my restarts before this starts happening, and then almost simultaneously on two independent machines. After that it just keeps failing until I stop and start again.

@rsc rsc removed accepted labels Apr 14, 2015
@rsc
Contributor

rsc commented Jun 29, 2015

@birkirb Is this still happening for you? If so, can you send us information about the kernel you are running? At the least, 'uname -a' output, also 'cat /etc/lsb-release', and if you are using a VM hosting provider, which one? Also useful might be the output of 'ls -l /dev/*random', to test Ian's hypothesis that urandom is configured as random (which I kind of doubt, but worth checking).

It really seems like this is a kernel problem, but maybe we can help find it. Thanks.

@rsc rsc modified the milestones: Go1.5Maybe, Go1.5 Jun 29, 2015
@birkirb

birkirb commented Jul 1, 2015

This is still happening, yes, but there has been a development. I started building with go1.4, after which the exact same symptoms now cause a different error.

Instead of the os.PathError, I get:

x509: failed to load system roots and no roots provided

and the url.Error is now:

&url.Error{Op:"Post", URL:"https://platform-api.newrelic.com/platform/v1/metrics", Err:x509.SystemRootsError{}}

@jjhuff

jjhuff commented Jul 1, 2015

We've also run into this from time to time, always starting after a restart using GoAgain.

Caveats:

  • Go 1.2
  • Similarly old GoAgain

I'll be updating those shortly. I'll also try to put together a repro case.

@rsc
Contributor

rsc commented Jul 14, 2015

Not enough information for Go 1.5.

@rsc rsc modified the milestones: Go1.6, Go1.5Maybe Jul 14, 2015
@birkirb

birkirb commented Jul 23, 2015

To give you the info you requested earlier:

$ uname -a
Linux hostname 3.14.35-28.38.amzn1.x86_64 #1 SMP Wed Mar 11 22:50:37 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
$ ls -l /dev/*random
crw-rw-rw- 1 root root 1, 8 Jun 28 12:26 /dev/random
crw-rw-rw- 1 root root 1, 9 Jun 28 12:26 /dev/urandom

I'll attempt to switch from GoAgain to FB's Grace package, see if it will work as a workaround.

@birkirb

birkirb commented Jul 31, 2015

Grace was a successful workaround for me. Haven't seen this issue yet after a week of redeploys. //cc @jjhuff

@rsc
Contributor

rsc commented Oct 16, 2015

Closing this as not reproducible.

@rsc rsc closed this as completed Oct 16, 2015
@golang golang locked and limited conversation to collaborators Oct 17, 2016