Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: time.Sleep not monotonic if you import os/user & have used setcap to give binding capabilities #42150

Closed
Radisovik opened this issue Oct 22, 2020 · 11 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Linux WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Milestone

Comments

@Radisovik
Copy link

What version of Go are you using (go version)?

Tried both these versions and both reproduced the issue
go version go1.12.7 linux/amd64
go version go1.15.3 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN="/home/eric/go/bin"
GOCACHE="/home/eric/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/eric/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go1.12.7"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go1.12.7/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build154110967=/tmp/go-build -gno-record-gcc-switches"

What did you do?

package main

import (
	"fmt"
	"os/user"
	"time"
)

func main() {
	start := time.Now()
	for {
		fmt.Printf("%v Uptime: %v\n", time.Now(), time.Now().Sub(start))
		time.Sleep(5 * time.Second)
	}
	cu := user.Current
	fmt.Printf("never get here %v", cu)
}

If I compile that into a binary, and give the bind capability to the binary via:

setcap cap_net_bind_service=+ep <binary>

And then I set the clock backwards 5minutes.

When I took out the import to os/user -- the problem goes away.

What did you expect to see?

I would expect to continue to see the "Uptime" messages to be printed every 5 seconds

What did you see instead?

Output stopped

@seankhliao
Copy link
Member

perhaps CGo related due to os/user import but I can't reproduce

19:32:08 ~/testrepo-199 0:00:00
main » go build .

19:32:16 ~/testrepo-199 0:00:01
main » sudo setcap cap_net_bind_service=+ep testrepo-199

19:32:20 ~/testrepo-199 0:00:00
main » ./testrepo-199
2020-10-22 19:32:27.263129711 +0200 CEST m=+0.000057595 Uptime: 154ns
2020-10-22 19:32:32.263498017 +0200 CEST m=+5.000426013 Uptime: 5.000368853s
2020-10-22 19:32:37.263981624 +0200 CEST m=+10.000909642 Uptime: 10.000852462s
2020-10-22 19:20:00.410276497 +0200 CEST m=+15.001422131 Uptime: 15.001364944s
2020-10-22 19:20:05.410748758 +0200 CEST m=+20.001894382 Uptime: 20.001837211s
2020-10-22 19:20:10.411244842 +0200 CEST m=+25.002390468 Uptime: 25.002333275s
^C

19:20:11 ~/testrepo-199 0:12:16
main » cat main.go
package main

import (
	"fmt"
	"os/user"
	"time"
)

func main() {
	start := time.Now()
	for {
		fmt.Printf("%v Uptime: %v\n", time.Now(), time.Now().Sub(start))
		time.Sleep(5 * time.Second)
	}
	cu := user.Current
	fmt.Printf("never get here %v", cu)
}

while the program was running:

19:32:23 ~ 0:00:00
» sudo timedatectl set-ntp 0

19:32:30 ~ 0:00:00
» sudo timedatectl set-time 19:20:00

@ianlancetaylor
Copy link
Contributor

If you import os/user, and set the capability, and run the program, and set the clock backward, you say that the output stops? Do you mean that the program keeps running, it just doesn't print anything? What happens if you wait until the clock catches up?

A call to time.Sleep will use the monotonic clock (clock_gettime with CLOCK_MONOTONIC) so it's hard to see why changing the wall clock time would matter. At least in 1.15 the core sleep step will be a call to epoll_wait which passes a timeout in milliseconds, so again it's hard to see why the wall clock time matters.

It might help to run your program, in the failing case, under strace -f, and attach the output here.

@ianlancetaylor ianlancetaylor changed the title Sleep not monotonic if you import os/user & have used setcap to give binding capabilities runtime: time.Sleep not monotonic if you import os/user & have used setcap to give binding capabilities Oct 22, 2020
@ianlancetaylor ianlancetaylor added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Linux labels Oct 22, 2020
@ianlancetaylor ianlancetaylor added this to the Backlog milestone Oct 22, 2020
@ianlancetaylor ianlancetaylor added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Oct 22, 2020
@Radisovik
Copy link
Author

If you import os/user, and set the capability, and run the program, and set the clock backward, you say that the output stops? Do you mean that the program keeps running, it just doesn't print anything? What happens if you wait until the clock catches up?

In this incarnation we don't return from the sleep. When this was originally encountered on a full sized application every sleep that was in progress also waited for (original sleep time + time we set the clock back). We did still see inbound web requests being served.. so the whole application wasn't stuck.. just things stuck on sleep.

I should add that while we didn't reproduce this in the sample code provided.. a ticker also suffered the lack of monotomic clock-ness.

A call to time.Sleep will use the monotonic clock (clock_gettime with CLOCK_MONOTONIC) so it's hard to see why changing the wall clock time would matter. At least in 1.15 the core sleep step will be a call to epoll_wait which passes a timeout in milliseconds, so again it's hard to see why the wall clock time matters.

We were able to reproduce this in:

go version go1.12.7 linux/amd64
go version go1.15.3 linux/amd64

If that helps isolate.

So far we can reproduce this on our embedded device plus a vm that is running a standard ubuntu (18.04?).. while another ubuntu box doesn't exhibit the problem. Very much feels like a "per system" problem. I'm not sure how go determines if a monotomic clock is available..etc.. maybe that is busted up.

It might help to run your program, in the failing case, under strace -f, and attach the output here.

Let me do that.

@Radisovik
Copy link
Author

strace -f

Well.. the plot thickens. Doing a:

strace -f <ourbinary>

The problem goes away. Tried it several times with several pairs of eyes.

@ianlancetaylor
Copy link
Contributor

I do see a potential problem here. The Go runtime will check for the VDSO clock_gettime call, and use that if available. Through 1.15, if there is no VDSO, the Go runtime will call the fixed gettimeofday call at 0xffffffffff600000. But gettimeofday uses wall clock time whereas clock_gettime uses monotonic time. So this suggests that when you set the capability, and when you link dynamically, it somehow disables the VDSO. I do not know why that would happen.

Interestingly, this has been fixed on tip for the future 1.16 release, by https://golang.org/cl/257982. So while I don't know what is going wrong, it's possible that this will be fixed in 1.16. You might try applying that patch to your 1.15 sources to see if it helps.

@Radisovik
Copy link
Author

I do see a potential problem here. The Go runtime will check for the VDSO clock_gettime call, and use that if available. Through 1.15, if there is no VDSO, the Go runtime will call the fixed gettimeofday call at 0xffffffffff600000. But gettimeofday uses wall clock time whereas clock_gettime uses monotonic time. So this suggests that when you set the capability, and when you link dynamically, it somehow disables the VDSO. I do not know why that would happen.

Interestingly, this has been fixed on tip for the future 1.16 release, by https://golang.org/cl/257982. So while I don't know what is going wrong, it's possible that this will be fixed in 1.16. You might try applying that patch to your 1.15 sources to see if it helps.

I do remember something about setcap possibly mucking with LD_LIBRARY_PATH in an attempt to avoid escalation -- but my skillset didn't let me dig deeper into that angle.
Building statically did clear things up. So that is another possible workaround.
I don't mind building go from source and giving that ago. What branch should I build from?

@ianlancetaylor
Copy link
Contributor

Either release-branch.go1.15 with https://golang.org/cl/257982 applied on top, or just plain master. Thanks.

@Radisovik
Copy link
Author

Either release-branch.go1.15 with https://golang.org/cl/257982 applied on top, or just plain master. Thanks.

Master does indeed rectify the problem.

Is there anything else I can do to help out? It is very puzzling how os/user works into things..

It looks like our workarounds are:

  1. static link
  2. wait for 1.16
  3. stop using os/user and any code that also uses os/user (shell out for whoami..etc..)
  4. don't use setcap.. instead forward traffic with some port forwarding magic

@prattmic
Copy link
Member

FWIW, I dug into the kernel a bit and verified that it does unconditionally include the VDSO auxv entry, so it shouldn't be missing at the time of exec.

It seems most likely then that the dynamic linker would be removing / changing the entry, though I've never heard of such behavior (some sort of "protection" for "setuid" binaries?). Do you know which dynamic linker your system is using? I looked at glibc but didn't see anything along these lines at a quick glance.

This would align with importing "os/user", as that will cause the program to become dynamically linked by default.

@prattmic
Copy link
Member

And to your question, it seems like static linking (by disabling cgo, I assume that is what you mean?) seems like the cleanest workaround to me, provided the pure Go versions of "os/user" and "net" are sufficient for your application.

@ianlancetaylor
Copy link
Contributor

Since this will be fixed in the upcoming Go 1.16 release, I'm going to close this issue.

@golang golang locked and limited conversation to collaborators Oct 23, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Linux WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Projects
None yet
Development

No branches or pull requests

5 participants