gccgo: big increase in compile time of cmplxdivide.go results in test timeout on ppc64le #43573

Open
laboger opened this issue Jan 7, 2021 · 4 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
@laboger
Contributor

laboger commented Jan 7, 2021

The test cmplxdivide.go recently had a dramatic increase in compile time, resulting in a compile timeout for the test:

Executing on host: /home/boger/gccgo.git/bld/gcc/testsuite/go/../../gccgo -B/home/boger/gccgo.git/bld/gcc/testsuite/go/../../ /home/boger/gccgo.git/gcc/gcc/testsuite/go.test/test/cmplxdivide.go /home/boger/gccgo.git/gcc/gcc/testsuite/go.test/test/cmplxdivide1.go   -fdiagnostics-plain-output  -I/home/boger/gccgo.git/bld/powerpc64le-linux/./libgo   -pedantic-errors  -L/home/boger/gccgo.git/bld/powerpc64le-linux/./libgo -L/home/boger/gccgo.git/bld/powerpc64le-linux/./libgo/.libs  -lm  -o ./cmplxdivide.o    (timeout = 300)
spawn -ignore SIGHUP /home/boger/gccgo.git/bld/gcc/testsuite/go/../../gccgo -B/home/boger/gccgo.git/bld/gcc/testsuite/go/../../ /home/boger/gccgo.git/gcc/gcc/testsuite/go.test/test/cmplxdivide.go /home/boger/gccgo.git/gcc/gcc/testsuite/go.test/test/cmplxdivide1.go -fdiagnostics-plain-output -I/home/boger/gccgo.git/bld/powerpc64le-linux/./libgo -pedantic-errors -L/home/boger/gccgo.git/bld/powerpc64le-linux/./libgo -L/home/boger/gccgo.git/bld/powerpc64le-linux/./libgo/.libs -lm -o ./cmplxdivide.o^M
WARNING: program timed out
compiler exited with status 1
exit status is 1
FAIL: go.test/test/cmplxdivide.go

This started happening after this change:

commit cd34d5f2c40f3c65407f4b0bee0b49fc84e4a4ab
Author: Ian Lance Taylor <iant@golang.org>
Date:   Tue Dec 1 18:59:18 2020 -0800

    compiler: defer to middle-end for complex division
    
    Go used to use slightly different semantics than C99 for complex division,
    so we used runtime routines to handle the different.  The gc compiler
    has changes its behavior to match C99, so changes ours as well.
    
    For golang/go#14644
    
    Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/274213

Compilation now takes more than 5 minutes. Before the commit, it took about 12 seconds.

@gopherbot gopherbot added this to the Gccgo milestone Jan 7, 2021
@ianlancetaylor ianlancetaylor added the NeedsInvestigation label Jan 7, 2021
@ianlancetaylor
Member

On my x86 laptop, building the test takes about 30 seconds. With GCC 10, the test took about 4 seconds.

I think the key change here is not the change to the compiler, but the change to the test. The test used to have a lot of references to the constant 0. Now they refer to the variable zero. That seems to slow things down a fair amount.

Unfortunately I don't see a quick fix.
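
To illustrate the difference described above, here is a minimal sketch of the two table styles. It is not the actual contents of cmplxdivide1.go: the `entry` type, its field names, and the table values are hypothetical. What it relies on is a plain language fact: `complex(0, 0)` is a constant expression that the compiler can evaluate itself, while `complex(zero, zero)` refers to a variable and is therefore not a constant expression. Whether this alone explains the extra time spent in the GCC back end is not established in this thread.

```go
package main

import "fmt"

// entry is a hypothetical test-table row: f / g is expected to equal out.
type entry struct {
	f, g, out complex128
}

// zero is a package-level variable, so any expression that mentions it
// is not a constant expression.
var zero = 0.0

// Older style (assumed): operands written with the constant 0.
var constStyle = []entry{
	{complex(0, 0), complex(1, 0), complex(0, 0)},
	{complex(1, 0), complex(1, 0), complex(1, 0)},
}

// Newer style (assumed): the same values written with the variable zero.
var varStyle = []entry{
	{complex(zero, zero), complex(1, 0), complex(zero, zero)},
	{complex(1, zero), complex(1, zero), complex(1, zero)},
}

// check returns the number of rows whose quotient differs from out.
func check(table []entry) int {
	failures := 0
	for _, e := range table {
		if e.f/e.g != e.out {
			failures++
		}
	}
	return failures
}

func main() {
	fmt.Println("const-style failures:", check(constStyle))
	fmt.Println("var-style failures:  ", check(varStyle))
}
```

Both tables hold the same values at run time. The only difference is how the initializers are written, which is the kind of change the comment above points at.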

@laboger
Contributor Author

laboger commented Jan 11, 2021

I profiled the compile. Most of the time is now spent in the functions below, which look like register allocation.

  31.42%  go1       go1                        [.] bitmap_clear_bit
  27.88%  go1       go1                        [.] assign_by_spills
  25.15%  go1       go1                        [.] bitmap_set_bit
   8.21%  go1       go1                        [.] insert_in_live_range_start_chain

I didn't realize the test was changed. If the increase is expected, is there a way the timeout can be increased?

@ianlancetaylor
Member

If the test really takes 5 minutes to compile on PPC, we may just want to skip it for now.

Just in case it helps, can you compile the test with -ftime-report and show us the passes that take the most time? Thanks.

@laboger
Contributor Author

laboger commented Jan 11, 2021

Skipping it is fine. I will ask someone on the gcc team about this. Here is the report.

[boger@trout test]$ gccgo -ftime-report cmplxdivide.go cmplxdivide1.go

Time variable                                   usr           sys          wall           GGC
 phase parsing                      :   0.24 (  0%)   0.00 (  0%)   0.24 (  0%)  8536k (  4%)
 phase opt and generate             : 400.72 (100%)   0.18 (100%) 400.87 (100%)   186M ( 96%)
 garbage collection                 :   0.28 (  0%)   0.00 (  0%)   0.29 (  0%)     0  (  0%)
 dump files                         :   0.00 (  0%)   0.01 (  6%)   0.01 (  0%)     0  (  0%)
 callgraph construction             :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)  2161k (  1%)
 callgraph optimization             :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 callgraph ipa passes               :   0.62 (  0%)   0.06 ( 33%)   0.67 (  0%)  5612k (  3%)
 ipa dead code removal              :   0.00 (  0%)   0.00 (  0%)   0.02 (  0%)     0  (  0%)
 ipa inlining heuristics            :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)     0  (  0%)
 CFG verifier                       :   0.64 (  0%)   0.00 (  0%)   0.62 (  0%)     0  (  0%)
 trivially dead code                :   0.06 (  0%)   0.00 (  0%)   0.06 (  0%)     0  (  0%)
 df scan insns                      :   0.04 (  0%)   0.00 (  0%)   0.04 (  0%)  3264  (  0%)
 df live regs                       :   0.04 (  0%)   0.00 (  0%)   0.05 (  0%)     0  (  0%)
 df reg dead/unused notes           :   0.05 (  0%)   0.00 (  0%)   0.06 (  0%)  3416k (  2%)
 register information               :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)     0  (  0%)
 alias analysis                     :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)  4112k (  2%)
 rebuild jump labels                :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)     0  (  0%)
 parser (global)                    :   0.24 (  0%)   0.00 (  0%)   0.24 (  0%)  8535k (  4%)
 inline parameters                  :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)  7088  (  0%)
 tree gimplify                      :   0.06 (  0%)   0.00 (  0%)   0.05 (  0%)    10M (  5%)
 tree CFG construction              :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)    93k (  0%)
 tree SSA rewrite                   :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)  2555k (  1%)
 tree SSA other                     :   0.00 (  0%)   0.03 ( 17%)   0.03 (  0%)  4944  (  0%)
 tree operand scan                  :   0.01 (  0%)   0.02 ( 11%)   0.03 (  0%)  3010k (  2%)
 tree SSA verifier                  :   0.34 (  0%)   0.02 ( 11%)   0.30 (  0%)     0  (  0%)
 tree STMT verifier                 :   0.40 (  0%)   0.00 (  0%)   0.46 (  0%)     0  (  0%)
 callgraph verifier                 :   0.02 (  0%)   0.00 (  0%)   0.02 (  0%)     0  (  0%)
 dominance computation              :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)     0  (  0%)
 expand vars                        :   0.02 (  0%)   0.00 (  0%)   0.01 (  0%)  4723k (  2%)
 expand                             :   0.14 (  0%)   0.00 (  0%)   0.16 (  0%)    29M ( 15%)
 post expand cleanups               :   0.02 (  0%)   0.00 (  0%)   0.01 (  0%)    21k (  0%)
 integrated RA                      :   1.89 (  0%)   0.02 ( 11%)   1.90 (  0%)    25M ( 13%)
 LRA non-specific                   :   2.23 (  1%)   0.01 (  6%)   2.29 (  1%)    54M ( 28%)
 LRA virtuals elimination           :   0.24 (  0%)   0.02 ( 11%)   0.21 (  0%)   682k (  0%)
 LRA reload inheritance             :   0.45 (  0%)   0.00 (  0%)   0.46 (  0%)   582k (  0%)
 LRA create live ranges             :   2.15 (  1%)   0.00 (  0%)   2.14 (  1%)  3804k (  2%)
 LRA hard reg assignment            : 389.11 ( 97%)   0.05 ( 28%) 389.13 ( 97%)     0  (  0%)
 LRA coalesce pseudo regs           :   0.02 (  0%)   0.00 (  0%)   0.03 (  0%)     0  (  0%)
 reload                             :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)     0  (  0%)
 thread pro- & epilogue             :   0.20 (  0%)   0.00 (  0%)   0.19 (  0%)   212k (  0%)
 shorten branches                   :   0.07 (  0%)   0.00 (  0%)   0.06 (  0%)   330k (  0%)
 final                              :   0.56 (  0%)   0.00 (  0%)   0.54 (  0%)   861k (  0%)
 rest of compilation                :   0.45 (  0%)   0.00 (  0%)   0.42 (  0%)    40M ( 21%)
 verify RTL sharing                 :   1.10 (  0%)   0.00 (  0%)   1.16 (  0%)     0  (  0%)
 TOTAL                              : 400.96          0.18        401.11          195M
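
For long reports like the one above, a small helper can rank the passes by wall time. The following is a sketch only: the `passTime` type and `slowestPasses` function are hypothetical names, the parser assumes the column layout shown in this report (usr, sys, wall, each followed by a percentage), and it skips the aggregate `phase ...` rows and the `TOTAL` row.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
	"strings"
)

// passTime holds one row of the report (hypothetical helper type).
type passTime struct {
	Name string
	Wall float64 // wall-clock seconds
}

// timeCol matches a time column such as "389.13 ( 97%)".
// The GGC column has no decimal point, so it never matches.
var timeCol = regexp.MustCompile(`([0-9]+\.[0-9]+)\s*\(\s*[0-9]+%\)`)

// slowestPasses returns the n passes with the largest wall time.
func slowestPasses(report string, n int) []passTime {
	var passes []passTime
	for _, line := range strings.Split(report, "\n") {
		name, rest, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		name = strings.TrimSpace(name)
		if strings.HasPrefix(name, "phase ") {
			continue // aggregate rows, not individual passes
		}
		cols := timeCol.FindAllStringSubmatch(rest, -1)
		if len(cols) < 3 {
			continue // e.g. the TOTAL row, which has no percentages
		}
		wall, err := strconv.ParseFloat(cols[2][1], 64) // third column is wall
		if err != nil {
			continue
		}
		passes = append(passes, passTime{Name: name, Wall: wall})
	}
	sort.SliceStable(passes, func(i, j int) bool {
		return passes[i].Wall > passes[j].Wall
	})
	if len(passes) > n {
		passes = passes[:n]
	}
	return passes
}

// sample is a few rows copied from the report above.
const sample = `
 phase opt and generate             : 400.72 (100%)   0.18 (100%) 400.87 (100%)   186M ( 96%)
 integrated RA                      :   1.89 (  0%)   0.02 ( 11%)   1.90 (  0%)    25M ( 13%)
 LRA non-specific                   :   2.23 (  1%)   0.01 (  6%)   2.29 (  1%)    54M ( 28%)
 LRA hard reg assignment            : 389.11 ( 97%)   0.05 ( 28%) 389.13 ( 97%)     0  (  0%)
 TOTAL                              : 400.96          0.18        401.11          195M
`

func main() {
	for _, p := range slowestPasses(sample, 3) {
		fmt.Printf("%-28s %8.2fs\n", p.Name, p.Wall)
	}
}
```

On the sample rows, "LRA hard reg assignment" ranks first by a wide margin, matching the 97% figure in the report.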
