Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

math/big: GCD too slow #15833

Closed
TuomLarsen opened this issue May 25, 2016 · 14 comments
Closed

math/big: GCD too slow #15833

TuomLarsen opened this issue May 25, 2016 · 14 comments

Comments

@TuomLarsen
Copy link

  1. What version of Go are you using (go version)?

go1.6.1

  1. What operating system and processor architecture are you using (go env)?

linux/amd64

  1. What did you do?

Recently I found out that the bottleneck in my program is the computation of greatest common divisor. CPython 3.5 uses Lehmer's algorithm and runs almost 8x faster. (For very large integers there are even faster algorithms.)

package main

import (
    "fmt"
    "math/big"
    "time"
)

func main() {

    a, _ := new(big.Int).SetString("162446598402363997673493179860749206481570626148774722480464916217519998582106439959196344215073041683672767407342600465416394761370312812652387014077287533579464234592025270747189685651849855568219894257068017511394846018248946392555048664581847849090691446151175543365199139365693828091221296651564817601708173519333480541557916813321048094446738924484963396047875101501912007668905398696596051247666259795425384659975708544211774775625965457167714100319140254438470405084497675188274084057803534621424329554676703869536840619010332513253366351288042987043711316273802950111102675169897156927510722936205243934593761221732977646338069405205393572558504296440846844654915994056051080138684289188618799496532006547298248428910424360291035687113795912784248641162116970198942033380494094025012100218442198181604277428406480729238384656494840163546922286562841336237622223094660753989052636783736736730051815788481272143711008904458168016828251588196506829619252076432097202449298271511577883563293048604787739890118213783329525568780738050849374005547963863874473115226603810373230563816359760094564252571054145767564049725284205901365726049711717022372349529374435379628254191205730293550748181543891163723281674777750059512413586588828314084535671582111735660379159552799083036587458583555292293962217151577840489845885188592504493431102401049590622433757873291425858434918697762180295687147582650879514071801716939533681208102052410202696047460966976913845601990028270324422645505289049580017114480330931133826162290356916061574256540132190171694393638181573510025576808968516762515657212669911734833269039513284253725561578543014863535935809511150296456155470508927211411840144748006797665686217130525378029137093695488159569875226988769985175538075995372140625158000424680862039472332990803861888330870444263274715912499029641167624951512824402013661747715825752288086558167740183961939884790195081886224217177152869350897055478137773963535642498587936119572587325769768711699791252679336168006908091872154677467014758193802812900921873466740528890306650385439501239427958166273861080306246501556064395044538359837700353874621248486133241226805310121486599455004347652284602607212474583483132546550311381260917418037865509077618272877792484328607271857806356", 10)
    b, _ := new(big.Int).SetString("101784352192926179474691847690295364412146644944337147893018137607751766917729750758328166579925569899765459590972517709626629422864683438193290125043063518436017102066763671846630815217722251657259307553099132772201209113951039569557515190691205890052686828351751884803712581582093817758343808197167567247907540122605321423275330224580551817963867742145560163431380567362003538753856228060374866951036259004102917220472306669995666505172223628113212045865085784184336530203837908112187832616046195538504845768119298751526550607442806456808908879070037709190448307774015887964660855587392536795762810554496174847323701119347378258018659963334626467144614011455875192800289417692534392258372638105441472555031300945385673635743106163778851484848839379940456113937207911595141513680574367846648636353220309193154584372103496857062568070324916862436127133316127989083231670428229702525886963156897871633715273896484904539939624236288247009844489720556858425695244949730267384132132441488654009449316215449359564589915898961035516198007811431244695623908883885142048155515509053435754497981693091320819719426795331186500179253864761247316755487225443097006915430516922716275267385443349373289960634713632572446080364203373228601555282278183545440347991778185107399327758589346260248433939637585528206559787605105051710971698307363677567320558959899872756874248500394667863120050131533976493034552093698090232503340912301095656404952421422358796326151351099379863658968843482129929673487863396575147218726229000469299309801093105964119662371898258712567149943154476612246157910925561894764017985426180288097588055722886924974722109733733123413154279708050676188619796030777587411061473509487256419260726999082015872941209426850601596056518119145537343224544999826199864836904118369929404501365516712854950664599501540383745231341313574626357265253728062815992109070862862165578453391683605892693204046869207638687457235701255574852208310767024525571635865352089126234994777689748203394347241246694312344558416290642852901684770713683262440625612790798884009774609112528636374285113491984040948539722165924050855717036844286332805554836911465419002994297397099377669991903562736772713728306241921475535135050281780162431545636315231851527333384539192469227100575974540", 10)

    t := time.Now()

    for i := 0; i < 10000; i++ {
        new(big.Int).GCD(nil, nil, a, b)
    }

    fmt.Println(time.Since(t))

}

(play.golang.org: "process took too long")

  1. What did you expect to see?

Following Python program runs under CPython 3.5 in 1.18s

from math import gcd
from time import time

a = 162446598402363997673493179860749206481570626148774722480464916217519998582106439959196344215073041683672767407342600465416394761370312812652387014077287533579464234592025270747189685651849855568219894257068017511394846018248946392555048664581847849090691446151175543365199139365693828091221296651564817601708173519333480541557916813321048094446738924484963396047875101501912007668905398696596051247666259795425384659975708544211774775625965457167714100319140254438470405084497675188274084057803534621424329554676703869536840619010332513253366351288042987043711316273802950111102675169897156927510722936205243934593761221732977646338069405205393572558504296440846844654915994056051080138684289188618799496532006547298248428910424360291035687113795912784248641162116970198942033380494094025012100218442198181604277428406480729238384656494840163546922286562841336237622223094660753989052636783736736730051815788481272143711008904458168016828251588196506829619252076432097202449298271511577883563293048604787739890118213783329525568780738050849374005547963863874473115226603810373230563816359760094564252571054145767564049725284205901365726049711717022372349529374435379628254191205730293550748181543891163723281674777750059512413586588828314084535671582111735660379159552799083036587458583555292293962217151577840489845885188592504493431102401049590622433757873291425858434918697762180295687147582650879514071801716939533681208102052410202696047460966976913845601990028270324422645505289049580017114480330931133826162290356916061574256540132190171694393638181573510025576808968516762515657212669911734833269039513284253725561578543014863535935809511150296456155470508927211411840144748006797665686217130525378029137093695488159569875226988769985175538075995372140625158000424680862039472332990803861888330870444263274715912499029641167624951512824402013661747715825752288086558167740183961939884790195081886224217177152869350897055478137773963535642498587936119572587325769768711699791252679336168006908091872154677467014758193802812900921873466740528890306650385439501239427958166273861080306246501556064395044538359837700353874621248486133241226805310121486599455004347652284602607212474583483132546550311381260917418037865509077618272877792484328607271857806356
b = 101784352192926179474691847690295364412146644944337147893018137607751766917729750758328166579925569899765459590972517709626629422864683438193290125043063518436017102066763671846630815217722251657259307553099132772201209113951039569557515190691205890052686828351751884803712581582093817758343808197167567247907540122605321423275330224580551817963867742145560163431380567362003538753856228060374866951036259004102917220472306669995666505172223628113212045865085784184336530203837908112187832616046195538504845768119298751526550607442806456808908879070037709190448307774015887964660855587392536795762810554496174847323701119347378258018659963334626467144614011455875192800289417692534392258372638105441472555031300945385673635743106163778851484848839379940456113937207911595141513680574367846648636353220309193154584372103496857062568070324916862436127133316127989083231670428229702525886963156897871633715273896484904539939624236288247009844489720556858425695244949730267384132132441488654009449316215449359564589915898961035516198007811431244695623908883885142048155515509053435754497981693091320819719426795331186500179253864761247316755487225443097006915430516922716275267385443349373289960634713632572446080364203373228601555282278183545440347991778185107399327758589346260248433939637585528206559787605105051710971698307363677567320558959899872756874248500394667863120050131533976493034552093698090232503340912301095656404952421422358796326151351099379863658968843482129929673487863396575147218726229000469299309801093105964119662371898258712567149943154476612246157910925561894764017985426180288097588055722886924974722109733733123413154279708050676188619796030777587411061473509487256419260726999082015872941209426850601596056518119145537343224544999826199864836904118369929404501365516712854950664599501540383745231341313574626357265253728062815992109070862862165578453391683605892693204046869207638687457235701255574852208310767024525571635865352089126234994777689748203394347241246694312344558416290642852901684770713683262440625612790798884009774609112528636374285113491984040948539722165924050855717036844286332805554836911465419002994297397099377669991903562736772713728306241921475535135050281780162431545636315231851527333384539192469227100575974540

t = time()

for i in range(10000):
    gcd(a, b)

print(time() - t)
  1. What did you see instead?

The above go program runs on my machine 9.11s

@ghost
Copy link

ghost commented May 25, 2016

go1.6.2
linux/amd64
Intel i3 CPU

Same experience of me, math/big is very slow, I have not see the math/big's source code yet, so I try gccgo-4.9 for better performance, BUT it's even slower then gc, 3x slower then gc

go build -compiler gc gcd.go
time ./gcd
real 0m6.428s

go build -compiler gccgo -gccgoflags "-O3 -ffast-math -march=native" gcd.go
time ./gcd
real 0m18.032s

At last I try https://github.com/ncw/gmp, It's a libgmp wrapper for go, this will be more faster than above

@odeke-em
Copy link
Member

odeke-em commented Jun 4, 2016

I was taking a look at this as well as the code in misc/cgo and noticed that in the tree under CGO examples ie misc/cgo/gmp we have a wrapper for the gmp library https://github.com/golang/go/blob/master/misc/cgo/gmp/gmp.go#L365-L372. It was added in 2009, but am sure @ianlancetaylor and @griesemer know about it since y'all edited that file before https://github.com/golang/go/commits/master/misc/cgo/gmp/gmp.go. The code isn't directly a drop in replacement for math/big due to slightly different signatures, but pretty much is.

--- cmd/gcd.go  2016-06-04 12:56:10.000000000 -0700
+++ orig.go 2016-06-04 12:56:59.000000000 -0700
@@ -2,24 +2,21 @@

 import (
    "fmt"
+   "math/big"
    "time"
-
-   big "github.com/odeke-em/gmp"
 )

 func main() {
-   a := new(big.Int)
-   _ = a.SetString("162446598402363997673493179860749206481570626148774722480464916217519998582106439959196344215073041683672767407342600465416394761370312812652387014077287533579464234592025270747189685651849855568219894257068017511394846018248946392555048664581847849090691446151175543365199139365693828091221296651564817601708173519333480541557916813321048094446738924484963396047875101501912007668905398696596051247666259795425384659975708544211774775625965457167714100319140254438470405084497675188274084057803534621424329554676703869536840619010332513253366351288042987043711316273802950111102675169897156927510722936205243934593761221732977646338069405205393572558504296440846844654915994056051080138684289188618799496532006547298248428910424360291035687113795912784248641162116970198942033380494094025012100218442198181604277428406480729238384656494840163546922286562841336237622223094660753989052636783736736730051815788481272143711008904458168016828251588196506829619252076432097202449298271511577883563293048604787739890118213783329525568780738050849374005547963863874473115226603810373230563816359760094564252571054145767564049725284205901365726049711717022372349529374435379628254191205730293550748181543891163723281674777750059512413586588828314084535671582111735660379159552799083036587458583555292293962217151577840489845885188592504493431102401049590622433757873291425858434918697762180295687147582650879514071801716939533681208102052410202696047460966976913845601990028270324422645505289049580017114480330931133826162290356916061574256540132190171694393638181573510025576808968516762515657212669911734833269039513284253725561578543014863535935809511150296456155470508927211411840144748006797665686217130525378029137093695488159569875226988769985175538075995372140625158000424680862039472332990803861888330870444263274715912499029641167624951512824402013661747715825752288086558167740183961939884790195081886224217177152869350897055478137773963535642498587936119572587325769768711699791252679336168006908091872154677467014758193802812900921873466740528890306650385439501239427958166273861080306246501556064395044538359837700353874621248486133241226805310121486599455004347652284602607212474583483132546550311381260917418037865509077618272877792484328607271857806356", 10)
-   b := new(big.Int)
-   _ = b.SetString("101784352192926179474691847690295364412146644944337147893018137607751766917729750758328166579925569899765459590972517709626629422864683438193290125043063518436017102066763671846630815217722251657259307553099132772201209113951039569557515190691205890052686828351751884803712581582093817758343808197167567247907540122605321423275330224580551817963867742145560163431380567362003538753856228060374866951036259004102917220472306669995666505172223628113212045865085784184336530203837908112187832616046195538504845768119298751526550607442806456808908879070037709190448307774015887964660855587392536795762810554496174847323701119347378258018659963334626467144614011455875192800289417692534392258372638105441472555031300945385673635743106163778851484848839379940456113937207911595141513680574367846648636353220309193154584372103496857062568070324916862436127133316127989083231670428229702525886963156897871633715273896484904539939624236288247009844489720556858425695244949730267384132132441488654009449316215449359564589915898961035516198007811431244695623908883885142048155515509053435754497981693091320819719426795331186500179253864761247316755487225443097006915430516922716275267385443349373289960634713632572446080364203373228601555282278183545440347991778185107399327758589346260248433939637585528206559787605105051710971698307363677567320558959899872756874248500394667863120050131533976493034552093698090232503340912301095656404952421422358796326151351099379863658968843482129929673487863396575147218726229000469299309801093105964119662371898258712567149943154476612246157910925561894764017985426180288097588055722886924974722109733733123413154279708050676188619796030777587411061473509487256419260726999082015872941209426850601596056518119145537343224544999826199864836904118369929404501365516712854950664599501540383745231341313574626357265253728062815992109070862862165578453391683605892693204046869207638687457235701255574852208310767024525571635865352089126234994777689748203394347241246694312344558416290642852901684770713683262440625612790798884009774609112528636374285113491984040948539722165924050855717036844286332805554836911465419002994297397099377669991903562736772713728306241921475535135050281780162431545636315231851527333384539192469227100575974540", 10)
+
+   a, _ := new(big.Int).SetString("162446598402363997673493179860749206481570626148774722480464916217519998582106439959196344215073041683672767407342600465416394761370312812652387014077287533579464234592025270747189685651849855568219894257068017511394846018248946392555048664581847849090691446151175543365199139365693828091221296651564817601708173519333480541557916813321048094446738924484963396047875101501912007668905398696596051247666259795425384659975708544211774775625965457167714100319140254438470405084497675188274084057803534621424329554676703869536840619010332513253366351288042987043711316273802950111102675169897156927510722936205243934593761221732977646338069405205393572558504296440846844654915994056051080138684289188618799496532006547298248428910424360291035687113795912784248641162116970198942033380494094025012100218442198181604277428406480729238384656494840163546922286562841336237622223094660753989052636783736736730051815788481272143711008904458168016828251588196506829619252076432097202449298271511577883563293048604787739890118213783329525568780738050849374005547963863874473115226603810373230563816359760094564252571054145767564049725284205901365726049711717022372349529374435379628254191205730293550748181543891163723281674777750059512413586588828314084535671582111735660379159552799083036587458583555292293962217151577840489845885188592504493431102401049590622433757873291425858434918697762180295687147582650879514071801716939533681208102052410202696047460966976913845601990028270324422645505289049580017114480330931133826162290356916061574256540132190171694393638181573510025576808968516762515657212669911734833269039513284253725561578543014863535935809511150296456155470508927211411840144748006797665686217130525378029137093695488159569875226988769985175538075995372140625158000424680862039472332990803861888330870444263274715912499029641167624951512824402013661747715825752288086558167740183961939884790195081886224217177152869350897055478137773963535642498587936119572587325769768711699791252679336168006908091872154677467014758193802812900921873466740528890306650385439501239427958166273861080306246501556064395044538359837700353874621248486133241226805310121486599455004347652284602607212474583483132546550311381260917418037865509077618272877792484328607271857806356", 10)
+   b, _ := new(big.Int).SetString("101784352192926179474691847690295364412146644944337147893018137607751766917729750758328166579925569899765459590972517709626629422864683438193290125043063518436017102066763671846630815217722251657259307553099132772201209113951039569557515190691205890052686828351751884803712581582093817758343808197167567247907540122605321423275330224580551817963867742145560163431380567362003538753856228060374866951036259004102917220472306669995666505172223628113212045865085784184336530203837908112187832616046195538504845768119298751526550607442806456808908879070037709190448307774015887964660855587392536795762810554496174847323701119347378258018659963334626467144614011455875192800289417692534392258372638105441472555031300945385673635743106163778851484848839379940456113937207911595141513680574367846648636353220309193154584372103496857062568070324916862436127133316127989083231670428229702525886963156897871633715273896484904539939624236288247009844489720556858425695244949730267384132132441488654009449316215449359564589915898961035516198007811431244695623908883885142048155515509053435754497981693091320819719426795331186500179253864761247316755487225443097006915430516922716275267385443349373289960634713632572446080364203373228601555282278183545440347991778185107399327758589346260248433939637585528206559787605105051710971698307363677567320558959899872756874248500394667863120050131533976493034552093698090232503340912301095656404952421422358796326151351099379863658968843482129929673487863396575147218726229000469299309801093105964119662371898258712567149943154476612246157910925561894764017985426180288097588055722886924974722109733733123413154279708050676188619796030777587411061473509487256419260726999082015872941209426850601596056518119145537343224544999826199864836904118369929404501365516712854950664599501540383745231341313574626357265253728062815992109070862862165578453391683605892693204046869207638687457235701255574852208310767024525571635865352089126234994777689748203394347241246694312344558416290642852901684770713683262440625612790798884009774609112528636374285113491984040948539722165924050855717036844286332805554836911465419002994297397099377669991903562736772713728306241921475535135050281780162431545636315231851527333384539192469227100575974540", 10)

    t := time.Now()

    for i := 0; i < 10000; i++ {
-       divisor := new(big.Int)
-       x, y := new(big.Int), new(big.Int)
-       big.GcdInt(divisor, x, y, a, b)
+       new(big.Int).GCD(nil, nil, a, b)
    }

    fmt.Println(time.Since(t))
-}
+
+}
$ go run gcd.go 
1.347481406s

@gopherbot
Copy link

CL https://golang.org/cl/50530 mentions this issue.

@bmkessler
Copy link
Contributor

CL above is a minor optimization for the extended euclidean algorithm of only calculating one set of cofactors inside the division loop. On my machine it speeds up the extended GCD about 20% and should improve the ModInverse more since only only coefficient is needed in that case. Since the euclidean algorithm is the base case for Lehmer's algorithm, this should still prove useful when Lehmer's algorithm is implemented.

gopherbot pushed a commit that referenced this issue Aug 16, 2017
The current implementation of the extended Euclidean GCD algorithm
calculates both cosequences x and y inside the division loop. This
is unneccessary since the second Bezout coefficient can be obtained
at the end of calculation via a multiplication, subtraction and a
division.  In case only one coefficient is needed, e.g. ModInverse
this calculation can be skipped entirely.  This is a standard
optimization, see e.g.

"Handbook of Elliptic and Hyperelliptic Curve Cryptography"
Cohen et al pp 191
Available at:
http://cs.ucsb.edu/~koc/ccs130h/2013/EllipticHyperelliptic-CohenFrey.pdf

Updates #15833

Change-Id: I1e0d2e63567cfed97fd955048fe6373d36f22757
Reviewed-on: https://go-review.googlesource.com/50530
Reviewed-by: Robert Griesemer <gri@golang.org>
@bmkessler
Copy link
Contributor

@griesemer I had a go at implementing a basic version of Lehmer's algorithm and got the following results for the problem benchmark above on my machine.

Current binary GCD code:
14.189023248s

Lehmer's algorithm:
2.247663627s

For around a 7x speed up, but it is still about a factor of two slower than the gmp wrapper 1.010165176s

It should be straightforward to use Lehmer's algorithm on the extended GCD and speed that up as well.

@gopherbot
Copy link

Change https://golang.org/cl/59450 mentions this issue: math/big: implement lehmer's GCD algorithm

gopherbot pushed a commit that referenced this issue Oct 24, 2017
Updates #15833

Lehmer's GCD algorithm uses single precision calculations
to simulate several steps of multiple precision calculations
in Euclid's GCD algorithm which leads to a considerable
speed up.  This implementation uses Collins' simplified
testing condition on the single digit cosequences which
requires only one quotient and avoids any possibility of
overflow.

name                          old time/op  new time/op  delta
GCD10x10/WithoutXY-4          1.82µs ±24%  0.28µs ± 6%  -84.40%  (p=0.008 n=5+5)
GCD10x10/WithXY-4             1.69µs ± 6%  1.71µs ± 6%     ~     (p=0.595 n=5+5)
GCD10x100/WithoutXY-4         1.87µs ± 2%  0.56µs ± 4%  -70.13%  (p=0.008 n=5+5)
GCD10x100/WithXY-4            2.61µs ± 2%  2.65µs ± 4%     ~     (p=0.635 n=5+5)
GCD10x1000/WithoutXY-4        2.75µs ± 2%  1.48µs ± 1%  -46.06%  (p=0.008 n=5+5)
GCD10x1000/WithXY-4           5.29µs ± 2%  5.25µs ± 2%     ~     (p=0.548 n=5+5)
GCD10x10000/WithoutXY-4       10.7µs ± 2%  10.3µs ± 0%   -4.38%  (p=0.008 n=5+5)
GCD10x10000/WithXY-4          22.3µs ± 6%  22.1µs ± 1%     ~     (p=1.000 n=5+5)
GCD10x100000/WithoutXY-4      93.7µs ± 2%  99.4µs ± 2%   +6.09%  (p=0.008 n=5+5)
GCD10x100000/WithXY-4          196µs ± 2%   199µs ± 2%     ~     (p=0.222 n=5+5)
GCD100x100/WithoutXY-4        10.1µs ± 2%   2.5µs ± 2%  -74.84%  (p=0.008 n=5+5)
GCD100x100/WithXY-4           21.4µs ± 2%  21.3µs ± 7%     ~     (p=0.548 n=5+5)
GCD100x1000/WithoutXY-4       11.3µs ± 2%   4.4µs ± 4%  -60.87%  (p=0.008 n=5+5)
GCD100x1000/WithXY-4          24.7µs ± 3%  23.9µs ± 1%     ~     (p=0.056 n=5+5)
GCD100x10000/WithoutXY-4      26.6µs ± 1%  20.0µs ± 2%  -24.82%  (p=0.008 n=5+5)
GCD100x10000/WithXY-4         78.7µs ± 2%  78.2µs ± 2%     ~     (p=0.690 n=5+5)
GCD100x100000/WithoutXY-4      174µs ± 2%   171µs ± 1%     ~     (p=0.056 n=5+5)
GCD100x100000/WithXY-4         563µs ± 4%   561µs ± 2%     ~     (p=1.000 n=5+5)
GCD1000x1000/WithoutXY-4       120µs ± 5%    29µs ± 3%  -75.71%  (p=0.008 n=5+5)
GCD1000x1000/WithXY-4          355µs ± 4%   358µs ± 2%     ~     (p=0.841 n=5+5)
GCD1000x10000/WithoutXY-4      140µs ± 2%    49µs ± 2%  -65.07%  (p=0.008 n=5+5)
GCD1000x10000/WithXY-4         626µs ± 3%   628µs ± 9%     ~     (p=0.690 n=5+5)
GCD1000x100000/WithoutXY-4     340µs ± 4%   259µs ± 6%  -23.79%  (p=0.008 n=5+5)
GCD1000x100000/WithXY-4       3.76ms ± 4%  3.82ms ± 5%     ~     (p=0.310 n=5+5)
GCD10000x10000/WithoutXY-4    3.11ms ± 3%  0.54ms ± 2%  -82.74%  (p=0.008 n=5+5)
GCD10000x10000/WithXY-4       7.96ms ± 3%  7.69ms ± 3%     ~     (p=0.151 n=5+5)
GCD10000x100000/WithoutXY-4   3.88ms ± 1%  1.27ms ± 2%  -67.21%  (p=0.008 n=5+5)
GCD10000x100000/WithXY-4      38.1ms ± 2%  38.8ms ± 1%     ~     (p=0.095 n=5+5)
GCD100000x100000/WithoutXY-4   208ms ± 1%    25ms ± 4%  -88.07%  (p=0.008 n=5+5)
GCD100000x100000/WithXY-4      533ms ± 5%   525ms ± 4%     ~     (p=0.548 n=5+5)

Change-Id: Ic1e007eb807b93e75f4752e968e98c1f0cb90e43
Reviewed-on: https://go-review.googlesource.com/59450
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
@TuomLarsen
Copy link
Author

I saw your discussion on Gerrit about when is it worth to switch to sub-quadratic algorithm w.r.t. to number of words. This is how GMP goes it, thought you might find it interesting:
https://gmplib.org/devel/thres/HGCD_THRESHOLD.html

@bmkessler thanks a lot for the implementation!

@gopherbot
Copy link

Change https://golang.org/cl/78755 mentions this issue: math/big: implement Lehmer's extended GCD algorithm

gopherbot pushed a commit that referenced this issue May 8, 2018
Updates #15833

The extended GCD algorithm can be implemented using
Lehmer's algorithm with additional updates for the
cosequences following Algorithm 10.45 from Cohen et al.
"Handbook of Elliptic and Hyperelliptic Curve Cryptography" pp 192.
This brings the speed of the extended GCD calculation within
~2x of the base GCD calculation.  There is a slight degradation in
the non-extended GCD speed for small inputs (1-2 words) due to the
additional code to handle the extended updates.

name                          old time/op    new time/op    delta
GCD10x10/WithoutXY-4             262ns ± 1%     266ns ± 2%     ~     (p=0.333 n=5+5)
GCD10x10/WithXY-4               1.42µs ± 2%    0.74µs ± 3%  -47.90%  (p=0.008 n=5+5)
GCD10x100/WithoutXY-4            520ns ± 2%     539ns ± 1%   +3.81%  (p=0.008 n=5+5)
GCD10x100/WithXY-4              2.32µs ± 1%    1.67µs ± 0%  -27.80%  (p=0.008 n=5+5)
GCD10x1000/WithoutXY-4          1.40µs ± 1%    1.45µs ± 2%   +3.26%  (p=0.016 n=4+5)
GCD10x1000/WithXY-4             4.78µs ± 1%    3.43µs ± 1%  -28.37%  (p=0.008 n=5+5)
GCD10x10000/WithoutXY-4         10.0µs ± 0%    10.2µs ± 3%   +1.80%  (p=0.008 n=5+5)
GCD10x10000/WithXY-4            20.9µs ± 3%    17.9µs ± 1%  -14.20%  (p=0.008 n=5+5)
GCD10x100000/WithoutXY-4        96.8µs ± 0%    96.3µs ± 1%     ~     (p=0.310 n=5+5)
GCD10x100000/WithXY-4            196µs ± 3%     159µs ± 2%  -18.61%  (p=0.008 n=5+5)
GCD100x100/WithoutXY-4          2.53µs ±15%    2.34µs ± 0%   -7.35%  (p=0.008 n=5+5)
GCD100x100/WithXY-4             19.3µs ± 0%     3.9µs ± 1%  -79.58%  (p=0.008 n=5+5)
GCD100x1000/WithoutXY-4         4.23µs ± 0%    4.17µs ± 3%     ~     (p=0.127 n=5+5)
GCD100x1000/WithXY-4            22.8µs ± 1%     7.5µs ±10%  -67.00%  (p=0.008 n=5+5)
GCD100x10000/WithoutXY-4        19.1µs ± 0%    19.0µs ± 0%     ~     (p=0.095 n=5+5)
GCD100x10000/WithXY-4           75.1µs ± 2%    30.5µs ± 2%  -59.38%  (p=0.008 n=5+5)
GCD100x100000/WithoutXY-4        170µs ± 5%     167µs ± 1%     ~     (p=1.000 n=5+5)
GCD100x100000/WithXY-4           542µs ± 2%     267µs ± 2%  -50.79%  (p=0.008 n=5+5)
GCD1000x1000/WithoutXY-4        28.0µs ± 0%    27.1µs ± 0%   -3.29%  (p=0.008 n=5+5)
GCD1000x1000/WithXY-4            329µs ± 0%      42µs ± 1%  -87.12%  (p=0.008 n=5+5)
GCD1000x10000/WithoutXY-4       47.2µs ± 0%    46.4µs ± 0%   -1.65%  (p=0.016 n=5+4)
GCD1000x10000/WithXY-4           607µs ± 9%     123µs ± 1%  -79.70%  (p=0.008 n=5+5)
GCD1000x100000/WithoutXY-4       260µs ±17%     245µs ± 0%     ~     (p=0.056 n=5+5)
GCD1000x100000/WithXY-4         3.64ms ± 1%    0.93ms ± 1%  -74.41%  (p=0.016 n=4+5)
GCD10000x10000/WithoutXY-4       513µs ± 0%     507µs ± 0%   -1.22%  (p=0.008 n=5+5)
GCD10000x10000/WithXY-4         7.44ms ± 1%    1.00ms ± 0%  -86.58%  (p=0.008 n=5+5)
GCD10000x100000/WithoutXY-4     1.23ms ± 0%    1.23ms ± 1%     ~     (p=0.056 n=5+5)
GCD10000x100000/WithXY-4        37.3ms ± 0%     7.3ms ± 1%  -80.45%  (p=0.008 n=5+5)
GCD100000x100000/WithoutXY-4    24.2ms ± 0%    24.2ms ± 0%     ~     (p=0.841 n=5+5)
GCD100000x100000/WithXY-4        505ms ± 1%      56ms ± 1%  -88.92%  (p=0.008 n=5+5)

Change-Id: I25f42ab8c55033acb83cc32bb03c12c1963925e8
Reviewed-on: https://go-review.googlesource.com/78755
Reviewed-by: Robert Griesemer <gri@golang.org>
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@TuomLarsen
Copy link
Author

An update: using Go 1.11.1 it now completes here in 1.74s (down from 9.11s)

@griesemer
Copy link
Contributor

@TuomLarsen I am going to close this. If you feel there's need for more improvement, feel free to re-open. Thanks.

@TuomLarsen
Copy link
Author

@griesemer Well, it still almost 50% slower than Python 3, which does not use any assembler subroutines. But then it is also more than 5x faster than it used to be so I'm not complaining. On the contrary so thank you @bmkessler very much for the improvement!

@pjebs
Copy link
Contributor

pjebs commented Apr 29, 2019

why is it slower than python? What does python do to be faster. Is it implemented in C?

@griesemer
Copy link
Contributor

@TuomLarsen, @pjebs I'm guessing here but I suspect Python uses GMP underneath (which is highly optimized and does use assembly cores, like we do). There's definitively more work that we could do on the Go side (e.g. faster division - there's a pending CL I believe). We have open issues for more improvements. In short, while our code is actually written entirely in Go, with some assembly support at the bottom, the Python code is very likely just a thin wrapper around GMP.

@TuomLarsen
Copy link
Author

TuomLarsen commented May 1, 2019

@griesemer Python uses their own C-only implementation of Lehmer's algorithm (the one also implemented in Go). The relevant parts in Go are written in assembly but, and now I'm guessing, the small things accumulated: not inlining assembler, passing arguments via memory and not registers, not distinguishing GCD and extended GCD, function calls apparently being slower in Go than in C, etc. I'm not complaining, just trying to come up with an explanation.

@golang golang locked and limited conversation to collaborators Apr 30, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

8 participants