Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os/signal: notify should filter out runtime-generated SIGURG #37942

Open
minux opened this issue Mar 19, 2020 · 12 comments
Open

os/signal: notify should filter out runtime-generated SIGURG #37942

minux opened this issue Mar 19, 2020 · 12 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@minux
Copy link
Member

minux commented Mar 19, 2020

This program, when running under go 1.14, will show a huge number of SIGURG are received even though none should be (no network IO, no explicit kills).

https://play.golang.org/p/x7hFjPZnrg5

package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	ch := make(chan os.Signal, 1)
	signal.Notify(ch, syscall.SIGURG)

	for {
		select {
		case sig := <-ch:
			fmt.Printf("received %v: %s\n", sig, time.Now())
		default:
			_ = new(int) // generate some GC activities
		}
	}
}

https://go.googlesource.com/proposal/+/master/design/24543-non-cooperative-preemption.md changes to use SIGURG to preempt goroutines, which is fine, but I think os/signal.Notify should filter out those runtime generated SIGURGs.

The reason I found this is that I have a program that happens to use SIGURG as a custom signal like SIGUSR1, which works fine with previous Go releases. I understand that it should probably use SIGUSR1, but still I think os/signal should hide any signals generated by the runtime as it's irrelevant for the user and an implementation detail.

/cc @aclements thoughts?

@minux minux changed the title os/signal.Notfiy: filter out runtime-generated SIGURG? os/signal.Notify: filter out runtime-generated SIGURG? Mar 19, 2020
@aclements
Copy link
Member

The reason I found this is that I have a program that happens to use SIGURG as a custom signal like SIGUSR1

Right. It was an explicit decision to pass these through to the program, in case the program is using SIGURG. Unfortunately, the runtime can't tell if a signal is "runtime generated" and thus can't filter it out. Because the kernel combines signals, it's not even well-specified what it would mean for a received signal to be runtime-generated because the kernel could have combined a runtime-generated signal with a signal from another source.

@changkun, I don't really understand your program. First, I don't understand the need for Gosched there. There's no tight loop, and even if there were, I don't see what that Gosched would accomplish. Second, if a program is going to subscribe to all signals, rather than passing specific signals of interest to signal.Notify, then it should be prepared to receive any signal. There are lots of reasons lots of signals get sent. For example, if I ran that program in a terminal emulator and started resizing the window, it would begin streaming warnings about unexpected SIGWINCHes.

@ianlancetaylor
Copy link
Contributor

@minux I don't see how it is possible even in principle for os/signal to filter out signals generated from the runtime package. Do you have any suggestions as to how that could be done?

@cherrymui
Copy link
Member

cherrymui commented Mar 20, 2020

signal.Notify is to capture process-targeted signal, right? The preemption signals are thread-targeted. Maybe we can filter signals of which sigcode is SI_TKILL (at least on Linux)?

@cherrymui
Copy link
Member

That would not work if the kernel combines signals with different sigcode. Does it?

@ianlancetaylor
Copy link
Contributor

As far as I can tell, the Linux 5.3.12 kernel does combine signals with a different si_code value. In __send_signal I see

	if (legacy_queue(pending, sig))
		goto ret;

and

static inline bool legacy_queue(struct sigpending *signals, int sig)
{
	return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
}

where signals->signal is the set of pending signals.

@cagedmantis cagedmantis changed the title os/signal.Notify: filter out runtime-generated SIGURG? os/signal: notify should filter out runtime-generated SIGURG Mar 20, 2020
@cagedmantis cagedmantis added this to the Backlog milestone Mar 20, 2020
@cagedmantis cagedmantis added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Mar 20, 2020
@minux
Copy link
Member Author

minux commented Mar 21, 2020 via email

bfix added a commit to bfix/gnunet-go that referenced this issue Mar 31, 2020
Probably a Go1.14 issue; see golang/go#37942
@mengzhuo
Copy link
Contributor

mengzhuo commented Apr 3, 2020

I found a lot of SIGURGs will timeout the runtime/regex testes of mips64le.

The origin test of runtime only takes 43s to finish, now it's 600s.

https://farmer.golang.org/temporarylogs?name=linux-mips64le-mengzhuo&rev=dcf0929de6a12103a8fd7097abd6e797188c366d&st=0xc010fd8ea0

rt_sigreturn({mask=[]})                 = -1 ENOENT (No such file or directory)     
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=19904, si_uid=1000} ---       
rt_sigreturn({mask=[]})                 = 824642013144                              
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=19904, si_uid=1000} ---       
rt_sigreturn({mask=[]})                 = -1 (errno 824635811712)                   
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=19904, si_uid=1000} ---       
rt_sigreturn({mask=[]})                 = 18446744073709551615                      
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=19904, si_uid=1000} ---       
rt_sigreturn({mask=[]})                 = -1 (errno 824642012824)                   
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=19904, si_uid=1000} ---       
rt_sigreturn({mask=[]})                 = -1 (errno 824642012416)                   
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=19904, si_uid=1000} ---       
rt_sigreturn({mask=[]})                 = -1 (errno 824642002944)                   
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=19904, si_uid=1000} ---       
rt_sigreturn({mask=[]})                 = -1 EDOM (Numerical argument out of domain)
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=19904, si_uid=1000} ---       
rt_sigreturn({mask=[]})                 = 0                                         
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=19904, si_uid=1000} ---       
rt_sigreturn({mask=[]})                 = -1 (errno 1098664997869)                  
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=19904, si_uid=1000} ---       
rt_sigreturn({mask=[]})                 = 18446744073709551615                      
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=19904, si_uid=1000} ---       
rt_sigreturn({mask=[]})                 = 27                                        
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=19904, si_uid=1000} ---       
rt_sigreturn({mask=[]})                 = -1 EFBIG (File too large)                 
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=19904, si_uid=1000} ---       
rt_sigreturn({mask=[]})                 = -1 ESRCH (No such process)                
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=19904, si_uid=1000} ---       
rt_sigreturn({mask=[]})                 = -1 EPERM (Operation not permitted)        
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=19904, si_uid=1000} ---       
rt_sigreturn({mask=[]})                 = 18446744073709551615                      
sched_yield()                           = 0                                         
--- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=19904, si_uid=1000} ---       

@ianlancetaylor
Copy link
Contributor

@mengzhuo Can you open a separate issue for that? It sounds like the program is having a hard time making progress, which seems more like #37741. Thanks.

@aclements
Copy link
Member

Perhaps we can repurpose this issue to "runtime: only use forced preemption when necessary"?

I've considered delaying the first attempt to use signals until some time after we've tried cooperative preemption and it's failed. Most of the time cooperative preemption responds very quickly. I'm not sure what to set that threshold to, though, since every millisecond of delay there is a millisecond added to STW if there's anything not cooperating.

@networkimprov
Copy link

Could you also provide a build option to enable (or never enable) preemptive? Runtime env flags would presumably override that...

@aclements
Copy link
Member

That option already exists (GODEBUG=asyncpreemptoff=1), but, as it says on the tin, it's meant for debugging.

@networkimprov
Copy link

networkimprov commented Apr 4, 2020

Um, I meant a compiler flag to set the default scheduler, which the env var could override.

mrsdizzie added a commit to mrsdizzie/gitea that referenced this issue Apr 28, 2020
…14 in git env

As seen in trouble shooting go-gitea#11032 the new feature of Go 1.14 is causing several second delays in startup in certain situations. Debugging shows it spending several seconds handling SIGURG commands during init:

```
6922:04:51.984234 trace init() ./modules/queue/unique_queue_wrapped.go
remote: ) = 69 <0.000012>
remote: [pid 15984] 22:04:51 write(1, "\ttime taken: 236.761\302\265s\n\n", 25    time taken: 236.761µs
remote:
remote: ) = 25 <0.000011>
remote: [pid 15984] 22:04:51 --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=15984, si_uid=0} ---
remote: [pid 15984] 22:04:52 --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=15984, si_uid=0} ---
remote: [pid 15984] 22:04:52 --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=15984, si_uid=0} ---
```

This causes up to 20 seconds added to a push in some cases as it happens for each call of the gitea hook command. This is likely the cause of go-gitea#10661 as well and would start to effect users once we release 1.12 which would be the first release compiled with Go 1.14. I suspect this is just a slight issue with the upstream implementatation as there have been a few very similar bugs fixed and reported:

 golang/go#37741
 golang/go#37942

We should revisit this in the future and see if a newer version of Go has solved it, but for now disable this option in the environment that gitea hook runs in to avoid it.
techknowlogick pushed a commit to go-gitea/gitea that referenced this issue Apr 28, 2020
…14 in git env (#11237)

As seen in trouble shooting #11032 the new feature of Go 1.14 is causing several second delays in startup in certain situations. Debugging shows it spending several seconds handling SIGURG commands during init:

```
6922:04:51.984234 trace init() ./modules/queue/unique_queue_wrapped.go
remote: ) = 69 <0.000012>
remote: [pid 15984] 22:04:51 write(1, "\ttime taken: 236.761\302\265s\n\n", 25    time taken: 236.761µs
remote:
remote: ) = 25 <0.000011>
remote: [pid 15984] 22:04:51 --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=15984, si_uid=0} ---
remote: [pid 15984] 22:04:52 --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=15984, si_uid=0} ---
remote: [pid 15984] 22:04:52 --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=15984, si_uid=0} ---
```

This causes up to 20 seconds added to a push in some cases as it happens for each call of the gitea hook command. This is likely the cause of #10661 as well and would start to effect users once we release 1.12 which would be the first release compiled with Go 1.14. I suspect this is just a slight issue with the upstream implementatation as there have been a few very similar bugs fixed and reported:

 golang/go#37741
 golang/go#37942

We should revisit this in the future and see if a newer version of Go has solved it, but for now disable this option in the environment that gitea hook runs in to avoid it.
ydelafollye pushed a commit to ydelafollye/gitea that referenced this issue Jul 31, 2020
…14 in git env (go-gitea#11237)

As seen in trouble shooting go-gitea#11032 the new feature of Go 1.14 is causing several second delays in startup in certain situations. Debugging shows it spending several seconds handling SIGURG commands during init:

```
6922:04:51.984234 trace init() ./modules/queue/unique_queue_wrapped.go
remote: ) = 69 <0.000012>
remote: [pid 15984] 22:04:51 write(1, "\ttime taken: 236.761\302\265s\n\n", 25    time taken: 236.761µs
remote:
remote: ) = 25 <0.000011>
remote: [pid 15984] 22:04:51 --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=15984, si_uid=0} ---
remote: [pid 15984] 22:04:52 --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=15984, si_uid=0} ---
remote: [pid 15984] 22:04:52 --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=15984, si_uid=0} ---
```

This causes up to 20 seconds added to a push in some cases as it happens for each call of the gitea hook command. This is likely the cause of go-gitea#10661 as well and would start to effect users once we release 1.12 which would be the first release compiled with Go 1.14. I suspect this is just a slight issue with the upstream implementatation as there have been a few very similar bugs fixed and reported:

 golang/go#37741
 golang/go#37942

We should revisit this in the future and see if a newer version of Go has solved it, but for now disable this option in the environment that gitea hook runs in to avoid it.
pothos added a commit to flatcar-archive/coreos-overlay that referenced this issue Jan 12, 2021
When Docker/containerd binaries are compiled with Go 1.15 the
containers generate many signal 23 (SIGURG) events which flood
monitoring systems:
  kubernetes/kops#10388
The SIGURG signal does not kill the process but is generated by Go
runtime scheduling:
  https://go.googlesource.com/proposal/+/master/design/24543-non-cooperative-preemption.md)
Because the Go runtime does not know if the process expects external
SIGURG signals, the signal is not filtered out but reported to the
process: golang/go#37942
The process has to filter this signal out itself before forwarding it
to, e.g,. children processes or logs.
This change was introduced with the Go 1.15 update (actually Go 1.14
but Flatcar skipped that for Stable), however, while containerd has
some workarounds in place, e.g., in
containerd/containerd#4532 but there are still
areas where the signal is not handled correctly.
Until this is the case, downgrade to use the Go 1.13 compiler for
Docker/containerd binaries.

See flatcar/Flatcar#315
donob4n added a commit to donob4n/gopass that referenced this issue Oct 5, 2022
workaround for golang/go#37942

Signed-off-by: Donoban <donoban@riseup.net>
donob4n added a commit to donob4n/gopass that referenced this issue Oct 9, 2022
workaround for golang/go#37942

Signed-off-by: Donoban <donoban@riseup.net>
donob4n added a commit to donob4n/gopass that referenced this issue Oct 10, 2022
workaround for golang/go#37942

Signed-off-by: Donoban <donoban@riseup.net>
vr009 added a commit to tarantool/tt that referenced this issue Feb 3, 2023
This patch fixes the problem, which was occurring when the watchdog
process received a signal SIGURG from go runtime[1][2] and passed it
to the forked process before the exec call. Fixed by adding a call of
the Ignore function. Receiving this signal is unexpected, cause tt
doesn't work with sockets at all.

[1] - https://go.googlesource.com/proposal/+/master/design/24543-non-cooperative-preemption.md
[2] - golang/go#37942

Closes #325
vr009 added a commit to tarantool/tt that referenced this issue Feb 3, 2023
This patch fixes the problem, which was occurring when the watchdog
process received a signal SIGURG from go runtime[1][2] and passed it
to the forked process before the exec call. Fixed by adding a call of
the Ignore function. Receiving this signal is unexpected, cause tt
doesn't work with sockets at all.

[1] - https://go.googlesource.com/proposal/+/master/design/24543-non-cooperative-preemption.md
[2] - golang/go#37942

Closes #325
vr009 added a commit to tarantool/tt that referenced this issue Feb 3, 2023
This patch fixes the problem, which was occurring when the watchdog
process received a signal SIGURG from go runtime[1][2] and passed it
to the forked process before the exec call. Fixed by adding a call of
the Ignore function. Receiving this signal is unexpected, cause tt
doesn't work with sockets at all.

[1] - https://go.googlesource.com/proposal/+/master/design/24543-non-cooperative-preemption.md
[2] - golang/go#37942

Closes #325
vr009 added a commit to tarantool/tt that referenced this issue Feb 4, 2023
This patch fixes the problem, which was occurring when the watchdog
process received a signal SIGURG from go runtime[1][2] and passed it
to the forked process before the exec call. Fixed by adding a call of
the Ignore function. Receiving this signal is unexpected, cause tt
doesn't work with sockets at all.

[1] - https://go.googlesource.com/proposal/+/master/design/24543-non-cooperative-preemption.md
[2] - golang/go#37942

Closes #325
vr009 added a commit to tarantool/tt that referenced this issue Feb 4, 2023
This patch fixes the problem, which was occurring when the watchdog
process received a signal SIGURG from go runtime[1][2] and passed it
to the forked process before the exec call. Fixed by adding a call of
the Ignore function. Receiving this signal is unexpected, cause tt
doesn't work with sockets at all.

[1] - https://go.googlesource.com/proposal/+/master/design/24543-non-cooperative-preemption.md
[2] - golang/go#37942

Closes #325
vr009 added a commit to tarantool/tt that referenced this issue Feb 6, 2023
This patch fixes the problem, which was occurring when the watchdog
process received a signal SIGURG from go runtime[1][2] and passed it
to the forked process before the exec call. Fixed by adding a call of
the Ignore function. Receiving this signal is unexpected, cause tt
doesn't work with sockets at all.

[1] - https://go.googlesource.com/proposal/+/master/design/24543-non-cooperative-preemption.md
[2] - golang/go#37942

Closes #325
vr009 added a commit to tarantool/tt that referenced this issue Feb 7, 2023
This patch fixes the problem, which was occurring when the watchdog
process received a signal SIGURG from go runtime[1][2] and passed it
to the forked process before the exec call. Fixed by adding a call of
the Ignore function. Receiving this signal is unexpected, cause tt
doesn't work with sockets at all.

[1] - https://go.googlesource.com/proposal/+/master/design/24543-non-cooperative-preemption.md
[2] - golang/go#37942

Closes #325
vr009 added a commit to tarantool/tt that referenced this issue Feb 8, 2023
This patch fixes the problem, which was occurring when the watchdog
process received a signal SIGURG from go runtime[1][2] and passed it
to the forked process before the exec call. Fixed by adding a call of
the Ignore function. Receiving this signal is unexpected, cause tt
doesn't work with sockets at all.

[1] - https://go.googlesource.com/proposal/+/master/design/24543-non-cooperative-preemption.md
[2] - golang/go#37942

Closes #325
vr009 added a commit to tarantool/tt that referenced this issue Feb 8, 2023
This patch fixes the problem, which was occurring when the watchdog
process received a signal SIGURG from go runtime[1][2] and passed it
to the forked process before the exec call. Fixed by adding a call of
the Ignore function. Receiving this signal is unexpected, cause tt
doesn't work with sockets at all.

[1] - https://go.googlesource.com/proposal/+/master/design/24543-non-cooperative-preemption.md
[2] - golang/go#37942

Closes #325
vr009 added a commit to tarantool/tt that referenced this issue Feb 8, 2023
This patch fixes the problem, which was occurring when the watchdog
process received a signal SIGURG from go runtime[1][2] and passed it
to the forked process before the exec call. Fixed by adding a call of
the Ignore function. Receiving this signal is unexpected, cause tt
doesn't work with sockets at all.

[1] - https://go.googlesource.com/proposal/+/master/design/24543-non-cooperative-preemption.md
[2] - golang/go#37942

Closes #325
vr009 added a commit to tarantool/tt that referenced this issue Feb 8, 2023
This patch fixes the problem, which was occurring when the watchdog
process received a signal SIGURG from go runtime[1][2] and passed it
to the forked process before the exec call. Fixed by adding a call of
the Ignore function. Receiving this signal is unexpected, cause tt
doesn't work with sockets at all.

[1] - https://go.googlesource.com/proposal/+/master/design/24543-non-cooperative-preemption.md
[2] - golang/go#37942

Closes #325
vr009 added a commit to tarantool/tt that referenced this issue Feb 8, 2023
This patch fixes the problem, which was occurring when the watchdog
process received a signal SIGURG from go runtime[1][2] and passed it
to the forked process before the exec call. Fixed by adding a call of
the Ignore function. Receiving this signal is unexpected, cause tt
doesn't work with sockets at all.

[1] - https://go.googlesource.com/proposal/+/master/design/24543-non-cooperative-preemption.md
[2] - golang/go#37942

Closes #325
vr009 added a commit to tarantool/tt that referenced this issue Feb 8, 2023
This patch fixes the problem, which was occurring when the watchdog
process received a signal SIGURG from go runtime[1][2] and passed it
to the forked process before the exec call. Fixed by adding a call of
the Ignore function. Receiving this signal is unexpected, cause tt
doesn't work with sockets at all.

[1] - https://go.googlesource.com/proposal/+/master/design/24543-non-cooperative-preemption.md
[2] - golang/go#37942

Closes #325
vr009 added a commit to tarantool/tt that referenced this issue Feb 10, 2023
This patch fixes the problem, which was occurring when the watchdog
process received a signal SIGURG from go runtime[1][2] and passed it
to the forked process before the exec call. Fixed by adding a call of
the Ignore function. Receiving this signal is unexpected, cause tt
doesn't work with sockets at all.

[1] - https://go.googlesource.com/proposal/+/master/design/24543-non-cooperative-preemption.md
[2] - golang/go#37942

Closes #325
LeonidVas pushed a commit to tarantool/tt that referenced this issue Feb 10, 2023
This patch fixes the problem, which was occurring when the watchdog
process received a signal SIGURG from go runtime[1][2] and passed it
to the forked process before the exec call. Fixed by adding a call of
the Ignore function. Receiving this signal is unexpected, cause tt
doesn't work with sockets at all.

[1] - https://go.googlesource.com/proposal/+/master/design/24543-non-cooperative-preemption.md
[2] - golang/go#37942

Closes #325
n-e added a commit to n-e/secrets-init that referenced this issue May 26, 2023
Since go 1.14, SIGURG is emitted internally by the fo runtime, see:

- golang/go#37942
- https://go.googlesource.com/proposal/+/master/design/24543-non-cooperative-preemption.md#other-considerations "Choosing a signal"

This causes two problems:
- This will interfere with processes that use SIGURG for a real purpose (though this is unlikely)
- If SIGURG is emitted after the child process has exited but before secrets-init, there will be a spurious error message (see doitintl#16)
n-e added a commit to n-e/secrets-init that referenced this issue Jun 5, 2023
Since go 1.14, SIGURG is emitted internally by the fo runtime, see:

- golang/go#37942
- https://go.googlesource.com/proposal/+/master/design/24543-non-cooperative-preemption.md#other-considerations "Choosing a signal"

This causes two problems:
- This will interfere with processes that use SIGURG for a real purpose (though this is unlikely)
- If SIGURG is emitted after the child process has exited but before secrets-init, there will be a spurious error message (see doitintl#16)
n-e added a commit to n-e/secrets-init that referenced this issue Jun 5, 2023
Since go 1.14, SIGURG is emitted internally by the fo runtime, see:

- golang/go#37942
- https://go.googlesource.com/proposal/+/master/design/24543-non-cooperative-preemption.md#other-considerations "Choosing a signal"

This causes two problems:
- This will interfere with processes that use SIGURG for a real purpose (though this is unlikely)
- If SIGURG is emitted after the child process has exited but before secrets-init, there will be a spurious error message (see doitintl#16)
greyerof added a commit to greyerof/cnf-certification-test that referenced this issue Jul 5, 2023
Not sure why, but "ps" seems not to be able to handle the URG signal,
returning an error status code. As per some comments in github, looks
like the golang runtime might have sent the socket related URG signal.
See:
golang/go#37942
https://stackoverflow.com/questions/72568024/go-preempt-can-exit-out-of-a-loop

The ps program shows this output:
[command.go: 72] stderr: Signal 23 (URG) caught by ps (3.3.15).

Apparently, ps works well, but it just returns with an error status
code due to the received SIGURG.

A retry mechanism doesn't work well here: in my tests, I saw the ps
command failing up to 5 times in a row, so I think it's better trapping
the signal with the bash built-in command "trap". This is the new
command sent to the node's shell to get all the processes:

const command = "trap \"\" SIGURG ; ps -e -o pidns,pid,args"

Also, I refactored the way the pids from the container namespaces are
retrieved, as the nsenter used by the previous code is not actually
needed. Regular "lsns" and "ps" commands running in the node's debug
container (with chroot /root) are good enough to get the final lists of
pids from processes that are running in the same namespace as the
container's one. In fact, as per the stdout logs of the nsenter
commands, the "ps" that was running with nsenter was showing all the
processes running in that node.

After the refactor, the schedulling UT TestGetProcessCPUScheduling was
failing because it was mocking a function that was not used inside the
target function. I tried fixing it, but then I decided to remove it as
I realized it was not covering more code than the existing UT
TestExistsSchedulingPolicyAndPriority.
greyerof added a commit to test-network-function/cnf-certification-test that referenced this issue Jul 6, 2023
Not sure why, but "ps" seems not to be able to handle the URG signal,
returning an error status code. As per some comments in github, looks
like the golang runtime might have sent the socket related URG signal.
See:
golang/go#37942
https://stackoverflow.com/questions/72568024/go-preempt-can-exit-out-of-a-loop

The ps program shows this output:
[command.go: 72] stderr: Signal 23 (URG) caught by ps (3.3.15).

Apparently, ps works well, but it just returns with an error status
code due to the received SIGURG.

A retry mechanism doesn't work well here: in my tests, I saw the ps
command failing up to 5 times in a row, so I think it's better trapping
the signal with the bash built-in command "trap". This is the new
command sent to the node's shell to get all the processes:

const command = "trap \"\" SIGURG ; ps -e -o pidns,pid,args"

Also, I refactored the way the pids from the container namespaces are
retrieved, as the nsenter used by the previous code is not actually
needed. Regular "lsns" and "ps" commands running in the node's debug
container (with chroot /root) are good enough to get the final lists of
pids from processes that are running in the same namespace as the
container's one. In fact, as per the stdout logs of the nsenter
commands, the "ps" that was running with nsenter was showing all the
processes running in that node.

After the refactor, the schedulling UT TestGetProcessCPUScheduling was
failing because it was mocking a function that was not used inside the
target function. I tried fixing it, but then I decided to remove it as
I realized it was not covering more code than the existing UT
TestExistsSchedulingPolicyAndPriority.
mheon added a commit to mheon/libpod that referenced this issue Sep 19, 2023
Also, we don't proxy SIGURG (Golang uses it internally for waking
threads, so Go processes get it constantly (see [1] for more
details).

[1] golang/go#37942

Signed-off-by: Matt Heon <mheon@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
Status: Triage Backlog
Development

No branches or pull requests

8 participants