Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

math/big: large fibonacci calculation is faster in GMP #30943

Open
Borthralla opened this issue Mar 20, 2019 · 21 comments
Open

math/big: large fibonacci calculation is faster in GMP #30943

Borthralla opened this issue Mar 20, 2019 · 21 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@Borthralla
Copy link

What version of Go are you using (go version)?

go1.12.1 windows/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
set GOARCH=amd64
set GOBIN=
set GOCACHE=C:\Users\Philip\AppData\Local\go-build
set GOEXE=.exe
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOOS=windows
set GOPATH=C:\Users\Philip\go
set GOPROXY=
set GORACE=
set GOROOT=C:\Go
set GOTMPDIR=
set GOTOOLDIR=C:\Go\pkg\tool\windows_amd64
set GCCGO=gccgo
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set GOMOD=
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0 -fdebug-prefix-map=C:\Users\Philip\AppData\Local\Temp\go-build593709294=/tmp/go-build -gno-record-gcc-switches

What did you do?

I wrote a Fibonacci calculator in Go using the math/big library. It was ~instant in run time up until n=10^7, after which it was extremely slow. I then used https://github.com/ncw/gmp to replace math/big with a gmp wrapper. No code was changed.

What did you expect to see?

A small performance benefit, maybe 2x or 3x.

What did you see instead?

With the same set of code, gmp quickly calculates n=10^8 in 5 seconds, while the normal math/big library takes more than 5 minutes. Gmp is even able to calculate n=10^9 after 1 minute.

@Borthralla
Copy link
Author

Borthralla commented Mar 20, 2019

Here is the fibonacci implementation:

func fib(n int) *big.Int {

	left := big.NewInt(1)
	right := big.NewInt(1)

	helper1 := big.NewInt(0)
	helper2 := big.NewInt(0)


	bin := fmt.Sprintf("%b", n)
	for i := 1; i < len(bin); i++ {

		helper1.Mul(big.NewInt(2), right)
		helper1.Sub(helper1, left)
		helper1.Mul(left, helper1)

		helper2.Mul(right, right)
		left.Mul(left, left)
		helper2.Add(helper2, left)

		if (bin[i] == '0') {
			left.Set(helper1)
			right.Set(helper2)
		} else {
			left.Set(helper2)
			right.Add(helper1, helper2)
		}
	}

	return left
}

Using math/big n=3*10^7:

     flat  flat%   sum%        cum   cum%
    31.93s 54.18% 54.18%     31.93s 54.18%  math/big.mulAddVWW
    23.17s 39.32% 93.50%     23.17s 39.32%  math/big.subVV
     2.34s  3.97% 97.47%      2.34s  3.97%  math/big.addMulVVW
     0.31s  0.53% 98.00%      0.31s  0.53%  runtime.cgocall
     0.15s  0.25% 98.25%      1.79s  3.04%  math/big.basicSqr
     0.14s  0.24% 98.49%     55.11s 93.52%  math/big.nat.divLarge
     0.13s  0.22% 98.71%      0.99s  1.68%  math/big.basicMul
     0.02s 0.034% 98.74%      0.38s  0.64%  internal/poll.(*FD).writeConsole
     0.01s 0.017% 98.76%      1.19s  2.02%  math/big.karatsuba
         0     0% 98.76%     55.71s 94.54%  fmt.(*pp).doPrintf

Using https://github.com/ncw/gmp at n=3*10^7 (same code):

     flat  flat%   sum%        cum   cum%
     1.07s 89.92% 89.92%      1.07s 89.92%  runtime.cgocall
     0.04s  3.36% 93.28%      0.04s  3.36%  runtime.memmove
     0.02s  1.68% 94.96%      0.02s  1.68%  runtime.stdcall2
     0.02s  1.68% 96.64%      0.02s  1.68%  unicode/utf16.Encode
     0.01s  0.84% 97.48%      0.01s  0.84%  indexbytebody
     0.01s  0.84% 98.32%      0.01s  0.84%  runtime.makemap_small
     0.01s  0.84% 99.16%      0.01s  0.84%  runtime.stdcall1
     0.01s  0.84%   100%      0.01s  0.84%  runtime.stdcall3
         0     0%   100%      0.53s 44.54%  fmt.(*pp).doPrintf
         0     0%   100%      0.53s 44.54%  fmt.(*pp).handleMethods

Using https://github.com/ncw/gmp at n=10^9 (same code):

flat  flat%   sum%        cum   cum%
    50.01s 91.81% 91.81%     50.01s 91.81%  runtime.cgocall
     1.14s  2.09% 93.90%      1.14s  2.09%  runtime.memmove
     0.54s  0.99% 94.90%     10.37s 19.04%  internal/poll.(*FD).writeConsole
     0.50s  0.92% 95.81%      0.50s  0.92%  runtime.procyield
     0.48s  0.88% 96.70%      0.60s  1.10%  unicode/utf16.Encode
     0.34s  0.62% 97.32%      0.34s  0.62%  runtime.memclrNoHeapPointers
     0.29s  0.53% 97.85%      0.29s  0.53%  unicode/utf8.DecodeRune
     0.01s 0.018% 97.87%      1.17s  2.15%  runtime.systemstack
         0     0% 97.87%     34.33s 63.03%  fmt.(*pp).doPrintf
         0     0% 97.87%     34.33s 63.03%  fmt.(*pp).handleMethods

@agnivade
Copy link
Contributor

gmp uses CGo. math/big is pure Go.

And according to their site https://gmplib.org/

GMP is carefully designed to be as fast as possible, both for small operands and for huge operands. The speed is achieved by using fullwords as the basic arithmetic type, by using fast algorithms, with highly optimised assembly code for the most common inner loops for a lot of CPUs, and by a general emphasis on speed.

It is expected that highly optimized C code will be faster than Go.

@crvv
Copy link
Contributor

crvv commented Mar 20, 2019

0.14s  0.24% 98.49%     55.11s 93.52%  math/big.nat.divLarge

divLarge is slow.
Please see #21960.

@crvv
Copy link
Contributor

crvv commented Mar 20, 2019

The code doesn't use divLarge, so the pprof result posted previously includes (*big.Int).String().

Without (*big.Int).String(), for n = 1e8, math/big uses 32 s and gmp uses 1.6 s on my machine.

The time complexity of math/big is about n**(log(3)/log(2)), which is the complexity of karatsuba algorithm.
For gmp, it is is about n**(log(2.3)/log(2))

The pprof result of math/big is

      flat  flat%   sum%        cum   cum%
    17.27s 59.80% 59.80%     17.27s 59.80%  math/big.addMulVVW
     1.88s  6.51% 66.31%      1.88s  6.51%  runtime.memmove
     1.87s  6.48% 72.78%      1.87s  6.48%  math/big.subVV
     1.51s  5.23% 78.01%      1.51s  5.23%  runtime.madvise
     1.38s  4.78% 82.79%     12.85s 44.49%  math/big.basicSqr
     1.34s  4.64% 87.43%      1.34s  4.64%  math/big.addVV
     1.09s  3.77% 91.20%      9.13s 31.61%  math/big.basicMul
     0.74s  2.56% 93.77%      0.74s  2.56%  runtime.memclrNoHeapPointers
     0.31s  1.07% 94.84%      0.31s  1.07%  math/big.shlVU
     0.24s  0.83% 95.67%      0.69s  2.39%  runtime.stkbucket

@thombrem
Copy link

Though I am not yet a Go developer, I am tempted to chime in, since I have implemented this in Python3.

This is not how I would find the nth Fibonacci Number. Instead use the direct formula using the Golden Ration (Phi) to achieve results in O(1) time!

I am uploading the code to a Public repository soon, will share link once done.

Thx
Milind

@thombrem
Copy link

I believe the correct term to Google would be Binet's Formula... and here it is..

F(n) = round( Phi^n / √5 )

This is yet another reason to know good math and not just code!

Regards
Milind

@Borthralla
Copy link
Author

I believe the correct term to Google would be Binet's Formula... and here it is..

F(n) = round( Phi^n / √5 )

This is yet another reason to know good math and not just code!

Regards
Milind

While the fibonacci implementation is not the focus of this thread, it is clear that you have some fundamental misunderstandings about floating point as well as big integer arithmetic. Floating point numbers are only o(1) multiplication because they have finite precision. Big Integers, on the other hand, have arbitrary precision. In particular, the size we are referring to in this thread are in the millions of digits.
At the level of precision necessary to get an exact answer, you would need extremely precise floating point values which would take much longer to multiply. Additionally, assuming you are using exponentiation by squaring and not naive exponentiation, it is the same number of multiplications (log(n)) as in the function I posted.
My implementation is about as fast as the native gmp implementation of fibonacci, which means that it doesn't really get much faster.
If you're in doubt, try plugging in n=10000000 to that function and see if you get the correct 2-million digit answer.

@thombrem
Copy link

@Borthralla Thanks for the insight! I do have a misunderstanding about the difference between floats and Big Integers. I will read this up in a day or two.

On a side note, this is yet another reason for me to be on this forum, thanks @Borthralla.

AND Exponentiation does take O(n) you are right yet again!

However, also note that there may be a great benefit in writing this in a single line of code as well, esp for (smaller) integer numbers.

@Borthralla
Copy link
Author

The code doesn't use divLarge, so the pprof result posted previously includes (*big.Int).String().

Without (*big.Int).String(), for n = 1e8, math/big uses 32 s and gmp uses 1.6 s on my machine.

The time complexity of math/big is about n**(log(3)/log(2)), which is the complexity of karatsuba algorithm.
For gmp, it is is about n**(log(2.3)/log(2))

The pprof result of math/big is

      flat  flat%   sum%        cum   cum%
    17.27s 59.80% 59.80%     17.27s 59.80%  math/big.addMulVVW
     1.88s  6.51% 66.31%      1.88s  6.51%  runtime.memmove
     1.87s  6.48% 72.78%      1.87s  6.48%  math/big.subVV
     1.51s  5.23% 78.01%      1.51s  5.23%  runtime.madvise
     1.38s  4.78% 82.79%     12.85s 44.49%  math/big.basicSqr
     1.34s  4.64% 87.43%      1.34s  4.64%  math/big.addVV
     1.09s  3.77% 91.20%      9.13s 31.61%  math/big.basicMul
     0.74s  2.56% 93.77%      0.74s  2.56%  runtime.memclrNoHeapPointers
     0.31s  1.07% 94.84%      0.31s  1.07%  math/big.shlVU
     0.24s  0.83% 95.67%      0.69s  2.39%  runtime.stkbucket

@crvv Thanks for the explanation. Are there plans to support faster operations for extremely large integers in math/big similar to gmp? Is gmp's method prohibitively complex?

@josharian
Copy link
Contributor

cc @griesemer @bmkessler

Do you know offhand whether gmp uses karatsuba or a different algorithm for this?

@ALTree
Copy link
Member

ALTree commented Mar 20, 2019

@josharian

Definitely not. gmp stops using karatsuba pretty early:

NxN limb multiplications and squares are done using one of seven algorithms, as the size N increases.

Algorithm Threshold
Basecase (none)
Karatsuba MUL_TOOM22_THRESHOLD
Toom-3 MUL_TOOM33_THRESHOLD
Toom-4 MUL_TOOM44_THRESHOLD
Toom-6.5 MUL_TOOM6H_THRESHOLD
Toom-8.5 MUL_TOOM8H_THRESHOLD
FFT MUL_FFT_THRESHOLD

Source: https://gmplib.org/manual/Multiplication-Algorithms.html#Multiplication-Algorithms

@ALTree
Copy link
Member

ALTree commented Mar 20, 2019

The thresholds are calibrated by a calibration procedure, but for reference MUL_TOOM33_THRESHOLD (the point where they stop using karatsuba) is pretty low. Like in the order of a 100 limbs (= machine words): https://gmplib.org/repo/gmp/file/tip/mpn/x86/coreisbr/gmp-mparam.h#l56

@ericlagergren
Copy link
Contributor

ericlagergren commented Mar 20, 2019

@josharian GMP uses schoolbook -> Toom-Cook -> Schönhage-Strassen for multiplication, depending on the input size.

@Borthralla
Copy link
Author

Does math/big include an implementation of Schönhage-Strassen?

@ericlagergren
Copy link
Contributor

No, it uses karatsuba for large numbers.

@bmkessler
Copy link
Contributor

For faster asymptotic multiplication, @remyoudompheng has https://github.com/remyoudompheng/bigfft which is much faster than math/big and only ~2x slower than gmp for very large numbers. Maybe he would like to contribute that implementation to math/big.

Also, for printing large integers @crvv noted above see #20906

@griesemer
Copy link
Contributor

Most observations have already been made here. To summarize:

  • We know that certain low-level ops could be faster (e.g., addMulVVW)
  • math/big only implements Karatsuba. It should use the various Toom versions and FFT for very large numbers.
  • Division is slow (for output).

One should probably start with the Tooms.

@Borthralla
Copy link
Author

Borthralla commented Mar 25, 2019

Good new, guys! They found an nlog(n) multiplication algorithm, we should just use that!
https://hal.archives-ouvertes.fr/hal-02070778/document
(Mostly joking, but the algorithm is real and it just came out today)

@gopherbot
Copy link

Change https://golang.org/cl/172018 mentions this issue: math/big: implement recursive algorithm for division

@thombrem
Copy link

I can think of a O(log(log(n))) best case time complexity algorithm for computing exponents, which means Binet's formula might be doable for fibonacci(n).

Sorry to bring this up again
~Milind

@julieqiu julieqiu changed the title math/big: Large Fibonacci Calculation is Tremendously Faster in GMP math/big: large fibonacci calculation is faster in GMP May 28, 2019
@julieqiu julieqiu added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label May 28, 2019
@bcmills bcmills added this to the Unplanned milestone May 28, 2019
gopherbot pushed a commit that referenced this issue Nov 12, 2019
The current division algorithm produces one word of result at a time,
using 2-word division to compute the top word and mulAddVWW to compute
the remainder. The top word may need to be adjusted by 1 or 2 units.

The recursive version, based on Burnikel, Ziegler, "Fast Recursive Division",
uses the same principles, but in a multi-word setting, so that
multiplication benefits from the Karatsuba algorithm (and possibly later
improvements).

benchmark                             old ns/op        new ns/op      delta
BenchmarkDiv/20/10-4                  38.2             38.3           +0.26%
BenchmarkDiv/40/20-4                  38.7             38.5           -0.52%
BenchmarkDiv/100/50-4                 62.5             62.6           +0.16%
BenchmarkDiv/200/100-4                238              259            +8.82%
BenchmarkDiv/400/200-4                311              338            +8.68%
BenchmarkDiv/1000/500-4               604              649            +7.45%
BenchmarkDiv/2000/1000-4              1214             1278           +5.27%
BenchmarkDiv/20000/10000-4            38279            36510          -4.62%
BenchmarkDiv/200000/100000-4          3022057          1359615        -55.01%
BenchmarkDiv/2000000/1000000-4        310827664        54012939       -82.62%
BenchmarkDiv/20000000/10000000-4      33272829421      1965401359     -94.09%
BenchmarkString/10/Base10-4           158              156            -1.27%
BenchmarkString/100/Base10-4          797              792            -0.63%
BenchmarkString/1000/Base10-4         3677             3814           +3.73%
BenchmarkString/10000/Base10-4        16633            17116          +2.90%
BenchmarkString/100000/Base10-4       5779029          1793808        -68.96%
BenchmarkString/1000000/Base10-4      889840820        85524031       -90.39%
BenchmarkString/10000000/Base10-4     134338236860     4935657026     -96.33%

Fixes #21960
Updates #30943

Change-Id: I134c6f81a47870c688ca95b6081eb9211def15a2
Reviewed-on: https://go-review.googlesource.com/c/go/+/172018
Reviewed-by: Robert Griesemer <gri@golang.org>
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@gopherbot
Copy link

Change https://golang.org/cl/266201 mentions this issue: math/big: implement Schönhage–Strassen fft algorithm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
None yet
Development

No branches or pull requests