
gccgo: high cpu utilization on aix #26634

Closed

amandeepgautam opened this issue Jul 27, 2018 · 8 comments

Labels: FrozenDueToAge, NeedsInvestigation (someone must examine and confirm this is a valid issue and not a duplicate of an existing one)
Milestone: Gccgo

@amandeepgautam

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go version go1.10 gccgo (GCC) 8.0.1 20180305 (experimental) aix/ppc

Does this issue reproduce with the latest release?

Not sure.

What operating system and processor architecture are you using (go env)?

GOARCH="ppc"
GOBIN=""
GOCACHE="/.cache/go-build"
GOEXE=""
GOHOSTARCH="ppc"
GOHOSTOS="aix"
GOOS="aix"
GOPATH="/go"
GORACE=""
GOROOT="/opt/freeware"
GOTMPDIR=""
GOTOOLDIR="/opt/freeware/libexec/gcc/powerpc-ibm-aix7.2.0.0/8.0.1"
GCCGO="/opt/freeware/bin/gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build699131618=/tmp/go-build -gno-record-gcc-switches -funwind-tables"

What did you do?

Wrote a simple grpc server program and started the server.

The sample program lives here:
https://drive.google.com/file/d/1w6alI6PNicFCiA_7mCvEYvqTm-whbJH0/view?usp=sharing
To compile the program, modify the makefile to use the correct protoc and Go compiler, then run:
make -f Makefile.aix grpc_server

What did you expect to see?

The CPU usage should have been more or less stable.

What did you see instead?

The CPU usage goes up to 103% of entitled capacity on an AIX LPAR. Please look at the Entc% column.

Topas Monitor for host:lpar1host                EVENTS/QUEUES    FILE/TTY        
Thu Jul 26 16:05:56 2018   Interval:2           Cswitch      85  Readch     1329 
                                                Syscall  494.0K  Writech     652 
CPU     User% Kern% Wait% Idle%   Physc  Entc%  Reads         1  Rawin         0 
Total    16.3  39.8   0.0  43.9    0.76  95.01  Writes        0  Ttyout      318 
                                                Forks         0  Igets         0 
Network    BPS  I-Pkts  O-Pkts    B-In   B-Out  Execs         0  Namei         3 
Total    1.42K    9.50    2.50   612.5   846.0  Runqueue   3.00  Dirblk        0 
                                                Waitqueue   0.0                  
Disk    Busy%      BPS     TPS  B-Read  B-Writ                   MEMORY          
Total     0.0        0       0       0       0  PAGING           Real,MB   12288 
                                                Faults        0  % Comp     25   
FileSystem          BPS    TPS  B-Read  B-Writ  Steals        0  % Noncomp  16   
Total             1.30K   1.00   1.30K       0  PgspIn        0  % Client   16   
                                                PgspOut       0                  
Name           PID  CPU%  PgSp Owner            PageIn        0  PAGING SPACE    
grpc_ser   14746080 44.9 20.3M root             PageOut       0  Size,MB    4096 
topas      12845400  0.4 3.39M root             Sios          0  % Used      0   
gil         3146084  0.0  960K root                              % Free    100   
hostmibd    6291912  0.0 1.23M root             NFS (calls/sec)  
lock_rcv    4587932  0.0  448K root             SerV2         0  WPAR Activ    0 
sshd        7602566  0.0 1.18M root             CliV2         0  WPAR Total    0 
nfssync_    5177788  0.0  704K root             SerV3         0  Press: "h"-help 
rpc.lock    6684968  0.0 1.19M root             CliV3         0         "q"-quit 
aso         6554064  0.0 1.42M root             SerV4         0  
sendmail    6488522  0.0 1.14M root             CliV4         0  
sec_rcv      590164  0.0  448K root            
reaffin      459026  0.0  640K root            
snmpdv3n    6095302  0.0 1020K root            
netm        3080546  0.0  448K root            
syncd       3866946  0.0  616K root            
kbiod       4260338  0.0 3.50M root            
vtiol        655640  0.0 1.81M root            
xmgc         852254  0.0  448K root            
rtcmd       5243344  0.0  512K root            
topasrec   12190076  0.0 1.50M root            

Without the grpc server running we have:

Topas Monitor for host:lpar1host                EVENTS/QUEUES    FILE/TTY        
Thu Jul 26 16:11:04 2018   Interval:2           Cswitch      83  Readch     1077 
                                                Syscall     140  Writech     426 
CPU     User% Kern% Wait% Idle%   Physc  Entc%  Reads         1  Rawin         0 
Total     0.0   0.3   0.0  99.7    0.01   0.82  Writes        1  Ttyout      204 
                                                Forks         0  Igets         0 
Network    BPS  I-Pkts  O-Pkts    B-In   B-Out  Execs         0  Namei         2 
Total    775.0    9.50    0.50   520.0   255.0  Runqueue   2.50  Dirblk        0 
                                                Waitqueue   0.0                  
Disk    Busy%      BPS     TPS  B-Read  B-Writ                   MEMORY          
Total     0.0        0       0       0       0  PAGING           Real,MB   12288 
                                                Faults        0  % Comp     25   
FileSystem          BPS    TPS  B-Read  B-Writ  Steals        0  % Noncomp  16   
Total             1.05K   1.00   1.05K       0  PgspIn        0  % Client   16   
                                                PgspOut       0                  
Name           PID  CPU%  PgSp Owner            PageIn        0  PAGING SPACE    
topas      12845400  0.1 3.41M root             PageOut       0  Size,MB    4096 
gil         3146084  0.0  960K root             Sios          0  % Used      0   
lock_rcv    4587932  0.0  448K root                              % Free    100   
sshd        7602566  0.0 1.18M root             NFS (calls/sec)  
rpc.lock    6684968  0.0 1.19M root             SerV2         0  WPAR Activ    0 
sec_rcv      590164  0.0  448K root             CliV2         0  WPAR Total    0 
reaffin      459026  0.0  640K root             SerV3         0  Press: "h"-help 
netm        3080546  0.0  448K root             CliV3         0         "q"-quit 
syncd       3866946  0.0  616K root             SerV4         0  
kbiod       4260338  0.0 3.50M root             CliV4         0  
vtiol        655640  0.0 1.81M root            
xmgc         852254  0.0  448K root            
rtcmd       5243344  0.0  512K root            
nfssync_    5177788  0.0  704K root            
topasrec   12190076  0.0 1.50M root            
aixmibd     6422988  0.0 1.34M root            
lrud         131336  0.0  832K root            
aso         6554064  0.0 1.42M root            
sendmail    6488522  0.0 1.14M root            
snmpdv3n    6095302  0.0 1020K root            
@gopherbot gopherbot added this to the Gccgo milestone Jul 27, 2018
@ianlancetaylor ianlancetaylor added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jul 27, 2018
@amandeepgautam (Author)

In fact, we do not need a grpc server for it; I am seeing the same pattern with a simple HTTP server as well.

package main

import (
        "fmt"
        "log"
        "net/http"
)

func main() {
        http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
                fmt.Fprintf(w, "Welcome to my website!")
        })

        fs := http.FileServer(http.Dir("static/"))
        http.Handle("/static/", http.StripPrefix("/static/", fs))

        // Report the error if the server fails to start or stops.
        log.Fatal(http.ListenAndServe(":10002", nil))
}

The usage is stable when there is no request, but as soon as a request is made the CPU utilization increases to around 95% of entitled capacity. After the request is answered, the usage remains at 95%. Please look at the "Entc%" field in the sample output.

Topas output with the server running (before it has answered any request):

Topas Monitor for host:lpar1host                EVENTS/QUEUES    FILE/TTY        
Fri Jul 27 10:40:09 2018   Interval:2           Cswitch      83  Readch      284 
                                                Syscall     139  Writech     586 
CPU     User% Kern% Wait% Idle%   Physc  Entc%  Reads         0  Rawin         0 
Total     0.0   0.3   0.0  99.7    0.01   0.85  Writes        0  Ttyout      284 
                                                Forks         0  Igets         0 
Network    BPS  I-Pkts  O-Pkts    B-In   B-Out  Execs         0  Namei         2 
Total    646.0    6.00    0.50   311.0   335.0  Runqueue   2.00  Dirblk        0 
                                                Waitqueue   0.0                  
Disk    Busy%      BPS     TPS  B-Read  B-Writ                   MEMORY          
Total     0.0        0       0       0       0  PAGING           Real,MB   12288 
                                                Faults        0  % Comp     25   
FileSystem          BPS    TPS  B-Read  B-Writ  Steals        0  % Noncomp  17   
Total             284.5   0.50   284.5       0  PgspIn        0  % Client   17   
                                                PgspOut       0                  
Name           PID  CPU%  PgSp Owner            PageIn        0  PAGING SPACE    
topas      14811502 13.0 6.28M root             PageOut       0  Size,MB    4096 
gil         3146084  0.0  960K root             Sios          0  % Used      0   
lock_rcv    4587932  0.0  448K root                              % Free    100   
sshd        7930212  0.0 1.18M root             NFS (calls/sec)  
rpc.lock    6684968  0.0 1.19M root             SerV2         0  WPAR Activ    0 
aso         6554064  0.0 1.42M root             CliV2         0  WPAR Total    0 
sec_rcv      590164  0.0  448K root             SerV3         0  Press: "h"-help 
reaffin      459026  0.0  640K root             CliV3         0         "q"-quit 
netm        3080546  0.0  448K root             SerV4         0  
syncd       3866946  0.0  616K root             CliV4         0  
kbiod       4260338  0.0 3.50M root            
vtiol        655640  0.0 1.81M root            
xmgc         852254  0.0  448K root            
rtcmd       5243344  0.0  512K root            
nfssync_    5177788  0.0  704K root            
topasrec   12190076  0.0 1.50M root            
aixmibd     6422988  0.0 1.34M root            
lrud         131336  0.0  832K root            
sendmail    6488522  0.0 1.14M root            
snmpdv3n    6095302  0.0 1020K root            

Topas output after the server has answered a request:

Topas Monitor for host:lpar1host                EVENTS/QUEUES    FILE/TTY        
Fri Jul 27 10:42:17 2018   Interval:2           Cswitch      83  Readch      349 
                                                Syscall  453.9K  Writech     715 
CPU     User% Kern% Wait% Idle%   Physc  Entc%  Reads         0  Rawin         0 
Total    14.6  41.4   0.0  43.9    0.76  95.23  Writes        0  Ttyout      349 
                                                Forks         0  Igets         0 
Network    BPS  I-Pkts  O-Pkts    B-In   B-Out  Execs         0  Namei         1 
Total    755.0    5.50    1.00   323.0   432.0  Runqueue   3.00  Dirblk        0 
                                                Waitqueue   0.0                  
Disk    Busy%      BPS     TPS  B-Read  B-Writ                   MEMORY          
Total     0.0        0       0       0       0  PAGING           Real,MB   12288 
                                                Faults        0  % Comp     25   
FileSystem          BPS    TPS  B-Read  B-Writ  Steals        0  % Noncomp  17   
Total             349.5   0.50   349.5       0  PgspIn        0  % Client   17   
                                                PgspOut       0                  
Name           PID  CPU%  PgSp Owner            PageIn        0  PAGING SPACE    
http_ser   12845450 55.8 17.7M root             PageOut       0  Size,MB    4096 
topas      14811502  0.1 6.28M root             Sios          0  % Used      0   
gil         3146084  0.0  960K root                              % Free    100   
lock_rcv    4587932  0.0  448K root             NFS (calls/sec)  
sshd        7930212  0.0 1.18M root             SerV2         0  WPAR Activ    0 
rpc.lock    6684968  0.0 1.19M root             CliV2         0  WPAR Total    0 
sendmail    6488522  0.0 1.14M root             SerV3         0  Press: "h"-help 
reaffin      459026  0.0  640K root             CliV3         0         "q"-quit 
sec_rcv      590164  0.0  448K root             SerV4         0  
qdaemon     7209224  0.0  240K root             CliV4         0  
aixmibd     6422988  0.0 1.34M root            
snmpdv3n    6095302  0.0 1020K root            
pilegc       786716  0.0  640K root            
netm        3080546  0.0  448K root            
syncd       3866946  0.0  616K root            
kbiod       4260338  0.0 3.50M root            
vtiol        655640  0.0 1.81M root            
xmgc         852254  0.0  448K root            
rtcmd       5243344  0.0  512K root            
nfssync_    5177788  0.0  704K root            

@amandeepgautam (Author)

It would seem that the high CPU utilization is the result of continuous polling. truss -p output before the request is made:

-bash-4.4# truss -p 13173136
_pollset_poll(0, 0x00000000)    (sleeping...)
                                                Err#78 ETIMEDOUT
_select(0, 0x00000000, 0x00000000, 0x00000000, 0x20457958) = 0
_pollset_poll(0, 0x00000000)    (sleeping...)

After responding to the request, we see this being printed continuously:

_pollset_poll(0, 0x2050C430)                    = 3
_pollset_poll(0, 0x2050C430)                    = 3
_pollset_poll(0, 0x2050C430)                    = 3
_pollset_poll(0, 0x2050C430)                    = 3
_pollset_poll(0, 0x2050C430)                    = 3
_pollset_poll(0, 0x2050C430)                    = 3
_pollset_poll(0, 0x2050C430)                    = 3
_pollset_poll(0, 0x2050C430)                    = 3
_pollset_poll(0, 0x2050C430)                    = 3
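
[Editor's note] Poll-style interfaces are level-triggered: a descriptor keeps being reported ready on every call until whatever made it ready is consumed, which matches the repeated "= 3" returns above. A minimal sketch illustrating this behavior (an editor's illustration built on golang.org/x/sys/unix, not output from the AIX system; the pipe is a stand-in for the runtime's actual descriptors):

package main

import (
        "fmt"

        "golang.org/x/sys/unix"
)

func main() {
        // A pipe with unread data is permanently "ready for reading".
        var p [2]int
        if err := unix.Pipe(p[:]); err != nil {
                panic(err)
        }
        unix.Write(p[1], []byte("x"))

        fds := []unix.PollFd{{Fd: int32(p[0]), Events: unix.POLLIN}}
        for i := 0; i < 3; i++ {
                // Each call returns immediately with n == 1: the event is
                // never cleared because nothing reads the pipe.
                n, _ := unix.Poll(fds, 1000)
                fmt.Println("poll returned", n)
        }

        // Consuming the data is what clears a level-triggered event.
        buf := make([]byte, 1)
        unix.Read(p[0], buf)
        n, _ := unix.Poll(fds, 0)
        fmt.Println("after draining:", n) // prints 0: nothing is ready
}

A poller that keeps waking up without servicing or re-arming the ready descriptors spins exactly as seen in the truss trace.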

@amandeepgautam (Author)

@ianlancetaylor is there something we can do to debug this at our end? The problem is that even the profiler has a bug (#26595), so it is really hard to tell what is going on here.

@ianlancetaylor (Contributor)

I don't have access to an AIX system myself, so I hope that you can debug it (I have sent an e-mail to the folks who contributed the AIX port--I don't know if they are on GitHub).

The code in question is in libgo/go/runtime/netpoll_aix.go. I don't know how the AIX pollset code works. The problem description sounds like some step needs to be taken to clear an event once it occurs.

@gopherbot

Change https://golang.org/cl/126857 mentions this issue: runtime: use poll rather than pollset for netpoll on AIX

gopherbot pushed a commit to golang/gofrontend that referenced this issue Aug 1, 2018
Updates golang/go#26634

Change-Id: I5b97ceceed9a4c726831d2875360b27022e381cb
Reviewed-on: https://go-review.googlesource.com/126857
Reviewed-by: Ian Lance Taylor <iant@golang.org>
hubot pushed a commit to gcc-mirror/gcc that referenced this issue Aug 1, 2018
Updates golang/go#26634
Reviewed-on: https://go-review.googlesource.com/126857
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@263186 138bc75d-0d04-0410-961f-82ee72b054a4
@trex58 commented Aug 7, 2018

We had noticed a performance issue on our side some days ago. It was due to the use of the pollset_poll() routine to implement libgo/go/runtime/netpoll_aix.go.
We have now changed this to use the poll() routine instead.
After making the change on our side, our performance issue disappeared.
I have since built and delivered RPMs 8.1.0-3 (AIX 7.2) and 8.2.0-1 (AIX 7.2 & 7.1) on the BullFreeware web-site with this change.
When looking with nm at the libgo.a in versions 8.1.0-1 and 8.1.0-2 of the libgo RPM, I can see the pollset_poll() routine. In the libgo.a of version 8.2.0, I no longer see it.
So either you are not using the latest versions of the libgo RPMs, or there is another issue, I think.
About the pprof utility: we still have libgo pprof test failures on AIX. I'll see if we can improve this.
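
[Editor's note] For readers unfamiliar with the change described above, here is a simplified, self-contained sketch of what a poll()-based wait loop can look like. This is not the runtime's actual code: the pollLoop name, the wakeup pipe, and the fd bookkeeping are illustrative assumptions, and golang.org/x/sys/unix is used in place of the runtime's direct libc calls.

package pollsketch

import "golang.org/x/sys/unix"

// pollLoop blocks in poll(2) on the registered descriptors plus the read end
// of a wakeup pipe (wakeRd). Another thread writes a byte to the pipe's write
// end to interrupt the wait, e.g. when the descriptor set changes.
func pollLoop(fds []unix.PollFd, wakeRd int32) {
        all := append([]unix.PollFd{{Fd: wakeRd, Events: unix.POLLIN}}, fds...)
        for {
                n, err := unix.Poll(all, -1) // block until something is ready
                if err == unix.EINTR {
                        continue
                }
                if err != nil || n == 0 {
                        return
                }
                if all[0].Revents&unix.POLLIN != 0 {
                        // Drain the wakeup pipe so it does not stay ready and
                        // spin the loop: the exact failure mode seen above.
                        var buf [16]byte
                        unix.Read(int(wakeRd), buf[:])
                }
                for i := range all[1:] {
                        fd := &all[i+1]
                        if fd.Revents != 0 {
                                // Hand the ready descriptor to the scheduler
                                // here, then reset Revents for the next round.
                                fd.Revents = 0
                        }
                }
        }
}

The key point is that every readiness event gets serviced (the pipe is drained, ready descriptors are handed off), so the next poll() call blocks instead of returning immediately.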

@trex58 commented Aug 7, 2018

Please run the "ldd" command on your agent executable and run "rpm -q --whatprovides ...." on each library which belongs to /opt/freeware in order to check which libraries are used for building your executable.
Also, use a debugger for getting the stack when _pollset_poll is called, in order to know which code get it called.

@trex58 commented Aug 7, 2018

Based on your email, the issue seems fixed. Please close this defect.

hubot pushed a commit to gcc-mirror/gcc that referenced this issue Aug 7, 2018
Updates golang/go#26634
Reviewed-on: https://go-review.googlesource.com/126857
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@263364 138bc75d-0d04-0410-961f-82ee72b054a4
@golang golang locked and limited conversation to collaborators Aug 7, 2019