Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crypto/rand: warn about very slow (60-second) /dev/urandom reads #22614

Closed
ncw opened this issue Nov 7, 2017 · 17 comments
Closed

crypto/rand: warn about very slow (60-second) /dev/urandom reads #22614

ncw opened this issue Nov 7, 2017 · 17 comments
Labels
FrozenDueToAge help wanted NeedsFix The path to resolution is known, but the work has not been done. Proposal-Accepted Security
Milestone

Comments

@ncw
Copy link
Contributor

ncw commented Nov 7, 2017

What version of Go are you using (go version)?

go version go1.9.2 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

This issue shows itself on linux/arm

What did you do?

Compile this program and run it on a linux/arm server.

What did you expect to see?

If it works it should produce output similar to this

go version devel +0ba4eba864 Mon Nov 6 08:37:30 2017 +0000
Status: 401: 401 Unauthorized
Body:
{
  "code": "missing_auth_token",
  "message": "Missing authorization token",
  "status": 401
}

What did you see instead?

However when compiled with go1.9 or go-tip it hangs indefinitely on some ARM servers. It works fine with go-1.8.

This issue came up as part of rclone/rclone#1794

I haven't managed to replicate the hang on my raspberry Pi 3, but both @HptmHavoc and @freespace have on scaleway arm servers using different kernels.

I'm not 100% sure what is going on but it is clearly a regression.


From the above @freespace wrote:

$ time ./b2get-go-tip
go version devel +0ba4eba864 Mon Nov 6 08:37:30 2017 +0000
^C

real    0m7.447s
user    0m0.020s
sys     0m0.010s
steve@mercury:~/tmp/b2get
$ time ./b2get-go1.9.2
go version go1.9.2
^C

real    0m10.927s
user    0m0.030s
sys     0m0.010s
steve@mercury:~/tmp/b2get
$ time ./b2get-go1.8.3
go version go1.8.3
Status: 401: 401 Unauthorized
Body:
{
  "code": "missing_auth_token",
  "message": "Missing authorization token",
  "status": 401
}
real    0m0.842s
user    0m0.230s
sys     0m0.010s
$ uname -a
Linux mercury 4.5.7-std-4 #1 SMP Tue Jul 12 11:00:06 UTC 2016 armv7l GNU/Linux
$ cat /proc/cpuinfo | head -20
processor       : 0
model name      : ARMv7 Processor rev 2 (v7l)
BogoMIPS        : 50.00
Features        : half thumb fastmult vfp edsp thumbee vfpv3 tls idiva idivt vfpd32 lpae
CPU implementer : 0x56
CPU architecture: 7
CPU variant     : 0x2
CPU part        : 0x584
CPU revision    : 2

...
<snip>
...
Hardware        : Marvell Armada 370/XP (Device Tree)
Revision        : 0000
Serial          : 0000000000000000

Mine is a scaleway arm server.


And from the above @HptmHavoc wrote:

Here pretty much the same, also a scaleway arm server.

time ./b2get-go-tip 
go version devel +0ba4eba864 Mon Nov 6 08:37:30 2017 +0000
^CCommand terminated by signal 2
real	0m 34.52s
user	0m 0.00s
sys	0m 0.01s

time ./b2get-go1.9.2 
go version go1.9.2
^CCommand terminated by signal 2
real	3m 2.82s
user	0m 0.02s
sys	0m 0.00s

time ./b2get-go1.8.3 
go version go1.8.3
Status: 401: 401 Unauthorized
Body:
{
  "code": "missing_auth_token",
  "message": "Missing authorization token",
  "status": 401
}real	0m 0.78s
user	0m 0.16s
sys	0m 0.01s
$ uname -a
Linux scw-553c2c 4.4.96-mainline-rev1 #1 SMP Thu Nov 2 11:27:34 UTC 2017 armv7l Linux
$ cat /proc/cpuinfo | head -20
processor	: 0
model name	: ARMv7 Processor rev 2 (v7l)
BogoMIPS	: 50.00
Features	: half thumb fastmult vfp edsp thumbee vfpv3 tls idiva idivt vfpd32 lpae 
CPU implementer	: 0x56
CPU architecture: 7
CPU variant	: 0x2
CPU part	: 0x584
CPU revision	: 2

...
<snip>
...

Hardware	: Marvell Armada 370/XP (Device Tree)
Revision	: 0000
Serial		: 0000000000000000
@ncw
Copy link
Contributor Author

ncw commented Nov 7, 2017

@HptmHavoc and/or @freespace it would be useful if you could kill the hung program (running the go-tip binary) with SIGQUIT (CTRL-\) and post the output - that will give us more idea what is going on.

@ianlancetaylor ianlancetaylor added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker labels Nov 7, 2017
@ianlancetaylor ianlancetaylor added this to the Go1.10 milestone Nov 7, 2017
@HptmHavoc
Copy link

Here is the Output with SIGQUIT

time ./b2get-go-tip 
go version devel +0ba4eba864 Mon Nov 6 08:37:30 2017 +0000
^\SIGQUIT: quit
PC=0x6a4cc m=0 sigcode=128

goroutine 33 [syscall]:
runtime.notetsleepg(0x340ff4, 0xfb9a8811, 0x6, 0x33e758)
	/opt/go/go-tip/src/runtime/lock_futex.go:227 +0x2c fp=0x1062678c sp=0x10626774 pc=0x1f078
runtime.timerproc(0x340fe0)
	/opt/go/go-tip/src/runtime/time.go:260 +0x430 fp=0x106267e4 sp=0x1062678c pc=0x5d340
runtime.goexit()
	/opt/go/go-tip/src/runtime/asm_arm.s:1015 +0x4 fp=0x106267e4 sp=0x106267e4 pc=0x698ec
created by runtime.(*timersBucket).addtimerLocked
	/opt/go/go-tip/src/runtime/time.go:159 +0x124

goroutine 1 [select]:
net/http.(*Transport).getConn(0x33c3f8, 0x1064aa40, 0x0, 0x226a6a, 0x5, 0x106143a0, 0x17, 0x0, 0x0, 0x114)
	/opt/go/go-tip/src/net/http/transport.go:948 +0x3bc
net/http.(*Transport).RoundTrip(0x33c3f8, 0x10694200, 0x33c3f8, 0x0, 0x0)
	/opt/go/go-tip/src/net/http/transport.go:400 +0x410
net/http.send(0x10694200, 0x249228, 0x33c3f8, 0x0, 0x0, 0x0, 0x0, 0x0, 0x1060c568, 0x10694200, ...)
	/opt/go/go-tip/src/net/http/client.go:252 +0x134
net/http.(*Client).send(0x33e9f0, 0x10694200, 0x0, 0x0, 0x0, 0x0, 0x0, 0x2, 0x0, 0x33f088, ...)
	/opt/go/go-tip/src/net/http/client.go:176 +0xb8
net/http.(*Client).Do(0x33e9f0, 0x10694200, 0x226a6a, 0x39, 0x0)
	/opt/go/go-tip/src/net/http/client.go:615 +0x1d4
net/http.(*Client).Get(0x33e9f0, 0x226a6a, 0x39, 0x3b, 0x0, 0x0)
	/opt/go/go-tip/src/net/http/client.go:396 +0x78
net/http.Get(0x226a6a, 0x39, 0x10633fa4, 0x1, 0x1)
	/opt/go/go-tip/src/net/http/client.go:370 +0x30
main.main()
	/home/ncw/Go/b2get/b2get.go:14 +0x8c

goroutine 5 [syscall]:
syscall.Syscall(0x180, 0x10600c9e, 0x20, 0x0, 0x0, 0x106de030, 0x11c9b8)
	/opt/go/go-tip/src/syscall/asm_linux_arm.s:17 +0x8
internal/syscall/unix.GetRandom(0x10600c9e, 0x20, 0x20, 0x0, 0x1, 0x0, 0x219c6a)
	/opt/go/go-tip/src/internal/syscall/unix/getrandom_linux.go:38 +0x60
crypto/rand.getRandomLinux(0x10600c9e, 0x20, 0x20, 0x0)
	/opt/go/go-tip/src/crypto/rand/rand_linux.go:46 +0x34
crypto/rand.batched.func1(0x10600c9e, 0x20, 0x20, 0x219a01)
	/opt/go/go-tip/src/crypto/rand/rand_linux.go:34 +0xd4
crypto/rand.(*devReader).Read(0x1064a180, 0x10600c9e, 0x20, 0x20, 0x0, 0x0, 0x0)
	/opt/go/go-tip/src/crypto/rand/rand_unix.go:48 +0x2dc
io.ReadAtLeast(0x2490a8, 0x1064a180, 0x10600c9e, 0x20, 0x20, 0x20, 0x0, 0x0, 0x10600cf4)
	/opt/go/go-tip/src/io/io.go:309 +0x6c
io.ReadFull(0x2490a8, 0x1064a180, 0x10600c9e, 0x20, 0x20, 0x1, 0x1, 0x5ce34)
	/opt/go/go-tip/src/io/io.go:327 +0x40
crypto/tls.(*Config).serverInit(0x10600c40, 0x0)
	/opt/go/go-tip/src/crypto/tls/common.go:597 +0x1f0
crypto/tls.(*Config).Clone.func1()
	/opt/go/go-tip/src/crypto/tls/common.go:541 +0x24
sync.(*Once).Do(0x10600ce8, 0x1063ec3c)
	/opt/go/go-tip/src/sync/once.go:44 +0xb8
crypto/tls.(*Config).Clone(0x10600c40, 0x106fc018)
	/opt/go/go-tip/src/crypto/tls/common.go:541 +0x38
net/http.cloneTLSConfig(0x10600c40, 0x0)
	/opt/go/go-tip/src/net/http/transport.go:2269 +0x24
net/http.(*persistConn).addTLS(0x10682140, 0x106143a0, 0x13, 0x0, 0x106143b4, 0x3)
	/opt/go/go-tip/src/net/http/transport.go:1037 +0x28
net/http.(*Transport).dialConn(0x33c3f8, 0x24a478, 0x10610098, 0x0, 0x226a6a, 0x5, 0x106143a0, 0x17, 0x0, 0x0, ...)
	/opt/go/go-tip/src/net/http/transport.go:1139 +0x11a8
net/http.(*Transport).getConn.func4(0x33c3f8, 0x24a478, 0x10610098, 0x1064aa60, 0x10618180)
	/opt/go/go-tip/src/net/http/transport.go:943 +0x38
created by net/http.(*Transport).getConn
	/opt/go/go-tip/src/net/http/transport.go:942 +0x258

trap    0x0
error   0x0
oldmask 0x0
r0      0xfffffffc
r1      0x0
r2      0x0
r3      0x1062674c
r4      0x0
r5      0x0
r6      0x1
r7      0xf0
r8      0x0
r9      0x0
r10     0x10600ee0
fp      0x3394a4
ip      0x1
sp      0x1062672c
lr      0x388a8
pc      0x6a4cc
cpsr    0x20000010
fault   0x0
./b2get-go-tip  0.02s user 0.00s system 0% cpu 14.071 total

@bradfitz
Copy link
Contributor

bradfitz commented Nov 7, 2017

Probably #19274 if it's blocking in the getrandom system call? Not enough entropy, I guess.

@agl, thoughts?

@freespace
Copy link

@ncw output after killing with SIGQUIT on b2get-go-tip is below:

$ ./b2get-go-tip
go version devel +0ba4eba864 Mon Nov 6 08:37:30 2017 +0000
^\SIGQUIT: quit
PC=0x6a4cc m=0 sigcode=128

goroutine 33 [syscall]:
runtime.notetsleepg(0x341034, 0xfbd39db2, 0x6, 0x0)
        /opt/go/go-tip/src/runtime/lock_futex.go:227 +0x2c fp=0x1070a78c sp=0x1070a774 pc=0x1f078
runtime.timerproc(0x341020)
        /opt/go/go-tip/src/runtime/time.go:260 +0x430 fp=0x1070a7e4 sp=0x1070a78c pc=0x5d340
runtime.goexit()
        /opt/go/go-tip/src/runtime/asm_arm.s:1015 +0x4 fp=0x1070a7e4 sp=0x1070a7e4 pc=0x698ec
created by runtime.(*timersBucket).addtimerLocked
        /opt/go/go-tip/src/runtime/time.go:159 +0x124

goroutine 1 [select]:
net/http.(*Transport).getConn(0x33c3f8, 0x1064ea40, 0x0, 0x226a6a, 0x5, 0x10610440, 0x17, 0x0, 0x0, 0x20104)
        /opt/go/go-tip/src/net/http/transport.go:948 +0x3bc
net/http.(*Transport).RoundTrip(0x33c3f8, 0x10698200, 0x33c3f8, 0x0, 0x0)
        /opt/go/go-tip/src/net/http/transport.go:400 +0x410
net/http.send(0x10698200, 0x249228, 0x33c3f8, 0x0, 0x0, 0x0, 0x0, 0x0, 0x1060c568, 0x10698200, ...)
        /opt/go/go-tip/src/net/http/client.go:252 +0x134
net/http.(*Client).send(0x33e9f0, 0x10698200, 0x0, 0x0, 0x0, 0x0, 0x0, 0x2, 0x0, 0x33f088, ...)
        /opt/go/go-tip/src/net/http/client.go:176 +0xb8
net/http.(*Client).Do(0x33e9f0, 0x10698200, 0x226a6a, 0x39, 0x0)
        /opt/go/go-tip/src/net/http/client.go:615 +0x1d4
net/http.(*Client).Get(0x33e9f0, 0x226a6a, 0x39, 0x3b, 0x0, 0x0)
        /opt/go/go-tip/src/net/http/client.go:396 +0x78
net/http.Get(0x226a6a, 0x39, 0x1063dfa4, 0x1, 0x1)
        /opt/go/go-tip/src/net/http/client.go:370 +0x30
main.main()
        /home/ncw/Go/b2get/b2get.go:14 +0x8c

goroutine 5 [syscall]:
syscall.Syscall(0x180, 0x10600c9e, 0x20, 0x0, 0x0, 0x106e8038, 0x11c9b8)
        /opt/go/go-tip/src/syscall/asm_linux_arm.s:17 +0x8
internal/syscall/unix.GetRandom(0x10600c9e, 0x20, 0x20, 0x0, 0x1, 0x0, 0x219c6a)
        /opt/go/go-tip/src/internal/syscall/unix/getrandom_linux.go:38 +0x60
crypto/rand.getRandomLinux(0x10600c9e, 0x20, 0x20, 0x0)
        /opt/go/go-tip/src/crypto/rand/rand_linux.go:46 +0x34
crypto/rand.batched.func1(0x10600c9e, 0x20, 0x20, 0x219a01)
        /opt/go/go-tip/src/crypto/rand/rand_linux.go:34 +0xd4
crypto/rand.(*devReader).Read(0x1064e180, 0x10600c9e, 0x20, 0x20, 0x0, 0x0, 0x0)
        /opt/go/go-tip/src/crypto/rand/rand_unix.go:48 +0x2dc
io.ReadAtLeast(0x2490a8, 0x1064e180, 0x10600c9e, 0x20, 0x20, 0x20, 0x0, 0x0, 0x10600cf4)
        /opt/go/go-tip/src/io/io.go:309 +0x6c
io.ReadFull(0x2490a8, 0x1064e180, 0x10600c9e, 0x20, 0x20, 0x1, 0x1, 0x5ce34)
        /opt/go/go-tip/src/io/io.go:327 +0x40
crypto/tls.(*Config).serverInit(0x10600c40, 0x0)
        /opt/go/go-tip/src/crypto/tls/common.go:597 +0x1f0
crypto/tls.(*Config).Clone.func1()
        /opt/go/go-tip/src/crypto/tls/common.go:541 +0x24
sync.(*Once).Do(0x10600ce8, 0x10642c3c)
        /opt/go/go-tip/src/sync/once.go:44 +0xb8
crypto/tls.(*Config).Clone(0x10600c40, 0x10700010)
        /opt/go/go-tip/src/crypto/tls/common.go:541 +0x38
net/http.cloneTLSConfig(0x10600c40, 0x0)
        /opt/go/go-tip/src/net/http/transport.go:2269 +0x24
net/http.(*persistConn).addTLS(0x10686140, 0x10610440, 0x13, 0x0, 0x10610454, 0x3)
        /opt/go/go-tip/src/net/http/transport.go:1037 +0x28
net/http.(*Transport).dialConn(0x33c3f8, 0x24a478, 0x106120d8, 0x0, 0x226a6a, 0x5, 0x10610440, 0x17, 0x0, 0x0, ...)
        /opt/go/go-tip/src/net/http/transport.go:1139 +0x11a8
net/http.(*Transport).getConn.func4(0x33c3f8, 0x24a478, 0x106120d8, 0x1064ea60, 0x10650140)
        /opt/go/go-tip/src/net/http/transport.go:943 +0x38
created by net/http.(*Transport).getConn
        /opt/go/go-tip/src/net/http/transport.go:942 +0x258

trap    0x6
error   0x0
oldmask 0x0
r0      0xfffffffc
r1      0x0
r2      0x0
r3      0x1070a74c
r4      0x0
r5      0x0
r6      0x1
r7      0xf0
r8      0x0
r9      0x0
r10     0x10600ee0
fp      0x3394a4
ip      0x1
sp      0x1070a72c
lr      0x388a8
pc      0x6a4cc
cpsr    0x20000010
fault   0x0

@freespace
Copy link

@bradfitz based on your suggestion I checked
/proc/sys/kernel/random/entropy_avail and it was 0. I then installed haveged to get some entropy and my entropy_avail increased to 2176. After a little while it seem to have solved the problem:

$ time ./b2get-go-tip 
go version devel +0ba4eba864 Mon Nov 6 08:37:30 2017 +0000
Status: 401: 401 Unauthorized
Body:
{
  "code": "missing_auth_token",
  "message": "Missing authorization token",
  "status": 401
}
real    0m1.561s
user    0m0.930s
sys     0m0.070s

$ cat /proc/sys/kernel/random/entropy_avail 
1920

@agl
Copy link
Contributor

agl commented Nov 7, 2017

The backtrace does show getrandom blocking. It looks like it's probably blocking when generating the session-ticket key but, if it weren't that, it would quickly block in something else.

TLS needs entropy and the system doesn't have any. Previously we ignored that state of affairs and ran insecurely. Now we'll block until the entropy pool has hit 128 bits at some point in the system's history, which is the minimal (and "correct") thing to do. Sadly, since there's a history of ignoring these issues in all sorts of software (due to OSes not providing the correct interface), VMs aren't correctly plumbed to seed entropy from the host.

It is not an easy call to make. If the fallout proves to be too large, we may have made this move too soon.

@bradfitz
Copy link
Contributor

bradfitz commented Nov 7, 2017

@agl, maybe we should make Go log to stderr after N seconds of blocking on gathering entropy, telling the user what's happening? Maybe.

@agl
Copy link
Contributor

agl commented Nov 7, 2017

@agl, maybe we should make Go log to stderr after N seconds of blocking on gathering entropy, telling the user what's happening? Maybe.

I have done that in the past in a different context and the experience was that lots of automated processes were unhappy about the unexpected stderr output since the condition was frequently transient.

However, that case was triggered in a more flakey way than this and perhaps the delay (which was only a handful of seconds, as I recall) was too small.

@ncw
Copy link
Contributor Author

ncw commented Nov 7, 2017

@bradfitz that seems like a nice pragmatic solution and there is form for log message like this, eg the log.Printf~s in net/http/transport.go. Not my call though!

@bradfitz bradfitz changed the title net/http: Get an https address from certain ARM servers hangs indefinitely crypto/rand: Read hangs for some entropy-lacking users on Linux causing user confusion Nov 27, 2017
@bradfitz bradfitz modified the milestones: Go1.10, Go1.11 Nov 27, 2017
@bradfitz bradfitz added NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. and removed release-blocker labels Nov 27, 2017
@gopherbot gopherbot removed the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Nov 27, 2017
@bradfitz
Copy link
Contributor

Now that we know what this was about, I removed the release blocker label. We can decide what to do during the Go 1.11 cycle.

@bradfitz
Copy link
Contributor

Ping @FiloSottile and @agl. What to do here?

What's the rebuttal to the https://www.2uo.de/myths-about-urandom/ argument that we should always use urandom?

@tv42
Copy link

tv42 commented May 31, 2018

@bradfitz The getrandom syscall (without the GRND_RANDOM flag) already fetches from the same source as /dev/urandom. It just waits until the kernel gathers enough entropy once to provide a secure urandom service from there on. Go is doing exactly what that link recommends, no rebuttal needed. Specific quote:

This syscall does the right thing: blocking until it has gathered enough initial entropy, and never blocking after that point.

@tv42
Copy link

tv42 commented May 31, 2018

Oh, and a quick concrete fix for a low-entropy box (usually some embedded SBC) is to save entropy from a previous run on disk, and feed it back to the kernel at boot. haveged (userspace entropy sources) obviously helps too, but it might take it a while to gather enough entropy; saving some from the previous run gets everything going faster.

Resources:

https://www.freedesktop.org/software/systemd/man/systemd-random-seed.service.html

http://man7.org/linux/man-pages/man4/random.4.html (search for e.g. RNDADDENTROPY)

@rsc
Copy link
Contributor

rsc commented Jun 11, 2018

Per discussion with proposal review:

  • Make devReader track whether it has ever seen a successful read.
  • If it has, then just read (no timeout overheads).
  • But otherwise (first read, potentially many first reads in multiple goroutines) set a N-second timeout. The first one that fires causes a print to standard error (just print, don't import log) warning about reading /dev/urandom taking a long time. Just one print, ever.

What should N be? @agl says too small is annoying because it causes spurious warnings. What is a time interval after which it's likely that the entropy is just not coming?

@rsc rsc modified the milestones: Go1.11, Go1.12 Jun 11, 2018
@rsc rsc changed the title crypto/rand: Read hangs for some entropy-lacking users on Linux causing user confusion crypto/rand: warn about very slow /dev/urandom reads Jun 11, 2018
@rsc
Copy link
Contributor

rsc commented Jun 11, 2018

For NeedsDecision, only decision left is to pick N (number of seconds before printing warning).

ns-codereview pushed a commit to couchbase/query that referenced this issue Aug 28, 2018
uuid() (which is used to set requests ids) can be quite slow because of

golang/go#22614

essentially access to /dev/urandom has to be serialized, and when everyone is piling in,
the cost of assigning a request id is (measured) 3.5% of the whole execution time!
This change makes the access to /dev/urandom asynchronous, reading in 32k chunks (one read
every 4k requests), which eliminates the cost completely.

It has been used in anger, at a rate of 50k requests/second, and it has not misbehaved.

Change-Id: I47e524ef11344aadafff770444f214093fe6e008
Reviewed-on: http://review.couchbase.org/98891
Reviewed-by: Sitaram Vemulapalli <sitaram.vemulapalli@couchbase.com>
Reviewed-by: Johan Larson <johan.larson@couchbase.com>
Tested-by: Marco Greco <marco.greco@couchbase.com>
@rsc
Copy link
Contributor

rsc commented Oct 3, 2018

Let's start with N=60 seconds. This will happen mainly at boot time and you want a note about why the system has stopped booting. Most people will wait a minute in that context. That should eliminate any concern about flakes.

@rsc rsc changed the title crypto/rand: warn about very slow /dev/urandom reads crypto/rand: warn about very slow (60-second) /dev/urandom reads Oct 3, 2018
@rsc rsc added Proposal-Accepted help wanted and removed NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. labels Oct 3, 2018
@ianlancetaylor ianlancetaylor added the NeedsFix The path to resolution is known, but the work has not been done. label Oct 3, 2018
@gopherbot
Copy link

Change https://golang.org/cl/139419 mentions this issue: crypto/rand: warn to stderr if blocked over 60 seconds in Reader.Read

@golang golang locked and limited conversation to collaborators Oct 3, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge help wanted NeedsFix The path to resolution is known, but the work has not been done. Proposal-Accepted Security
Projects
None yet
Development

No branches or pull requests

9 participants