proposal: MIPS32: pass ISA level with GOMIPS #59415

Closed
wzssyqa opened this issue Apr 4, 2023 · 16 comments

@wzssyqa
Contributor

wzssyqa commented Apr 4, 2023

Currently GOMIPS accepts hardfloat (default) and softfloat.

We would like GOMIPS to also accept r2/r5.

Design:

  1. , (comma) is used to separate options.
  2. Two groups of options are supported:
    hardfloat and softfloat
    r2 and r5
  3. For hardfloat with r2/r5, FPXX is used.
    For hardfloat without any rN, FP32 is used.
  4. For FP32, to load a float64 from memory to FPR:
	lwc1 $fEVEN,IMM(GPR)
	nop
	lwc1 $f(EVEN+1),(IMM+4)(GPR)
	nop
  5. For FP32, to store a float64 from FPR to memory:
	swc1 $fEVEN,IMM(GPR)
	nop
	swc1 $f(EVEN+1),(IMM+4)(GPR)
	nop
  6. For FPXX, to load a float64 from memory to FPR:
	lwc1 $fEVEN,IMM(GPR)
	nop
	lw   REGTMP,(IMM+4)(GPR)
	mthc1 REGTMP, $fEVEN
  7. For FPXX, to store a float64 from FPR to memory:
	swc1 $fEVEN,IMM(GPR)
	nop
	mfhc1 REGTMP, $fEVEN
	sw   REGTMP,(IMM+4)(GPR)
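
To make the load/store sequences above concrete, here is a small Go sketch of the memory layout they appear to assume: a float64 occupies two 32-bit words at IMM and IMM+4, and each word is moved separately. The sketch assumes a little-endian target (mipsle), where the low word sits at the lower address. The helper names are hypothetical and are not part of the Go toolchain.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// storeFloat64 mimics the two 32-bit stores (one at IMM, one at IMM+4)
// by writing the low and high words of the value separately.
// Assumption: little-endian layout, low word at the lower address.
func storeFloat64(mem []byte, imm int, v float64) {
	bits := math.Float64bits(v)
	binary.LittleEndian.PutUint32(mem[imm:], uint32(bits))       // low word
	binary.LittleEndian.PutUint32(mem[imm+4:], uint32(bits>>32)) // high word
}

// loadFloat64 mimics the two 32-bit loads and reassembles the value.
func loadFloat64(mem []byte, imm int) float64 {
	lo := binary.LittleEndian.Uint32(mem[imm:])
	hi := binary.LittleEndian.Uint32(mem[imm+4:])
	return math.Float64frombits(uint64(hi)<<32 | uint64(lo))
}

func main() {
	mem := make([]byte, 16)
	storeFloat64(mem, 8, 3.5)
	fmt.Println(loadFloat64(mem, 8))
}
```

In FP32 mode the high word travels through the odd register of the pair; in FPXX mode it travels through REGTMP and mthc1/mfhc1. The bytes in memory are the same either way.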
@gopherbot gopherbot added this to the Proposal milestone Apr 4, 2023
@wzssyqa
Contributor Author

wzssyqa commented Apr 4, 2023

@seankhliao
Member

cc @golang/mips

@cherrymui
Member

The mips32 part is redundant. Can we just use r2, r5, etc?

@wzssyqa
Contributor Author

wzssyqa commented Apr 10, 2023

The mips32 part is redundant. Can we just use r2, r5, etc?

Yes. Modified.

@HeliC829
Contributor

HeliC829 commented May 5, 2023

I agree with this proposal. It is also suitable for MIPS64x.

@cherrymui
Member

@HeliC829 you may want to file a separate proposal for changing the ISA level on MIPS64. Thanks.

@HeliC829
Contributor

HeliC829 commented May 9, 2023

I tried introducing some MIPS R2 instructions on MIPS32. The following data shows the test results and the performance improvement we could get by supporting MIPS R2 on mips32x.

goos: linux
goarch: mipsle
pkg: crypto/tls
                                                 │    oldtls    │               newtls               │
                                                 │    sec/op    │   sec/op     vs base               │
CertCache/0-4                                       6.804m ± 5%   6.751m ± 2%        ~ (p=0.485 n=6)
CertCache/1-4                                       6.735m ± 3%   6.698m ± 3%        ~ (p=0.699 n=6)
CertCache/2-4                                       6.896m ± 4%   6.832m ± 3%        ~ (p=0.589 n=6)
CertCache/3-4                                       6.859m ± 4%   6.842m ± 4%        ~ (p=0.485 n=6)
HandshakeServer/RSA-4                               15.06m ± 0%   14.97m ± 0%   -0.57% (p=0.002 n=6)
HandshakeServer/ECDHE-P256-RSA/TLSv13-4             45.95m ± 0%   45.43m ± 4%        ~ (p=0.065 n=6)
HandshakeServer/ECDHE-P256-RSA/TLSv12-4             48.09m ± 0%   48.02m ± 0%   -0.16% (p=0.041 n=6)
HandshakeServer/ECDHE-P256-ECDSA-P256/TLSv13-4      37.67m ± 0%   37.09m ± 0%   -1.55% (p=0.002 n=6)
HandshakeServer/ECDHE-P256-ECDSA-P256/TLSv12-4      39.85m ± 0%   39.82m ± 0%   -0.08% (p=0.015 n=6)
HandshakeServer/ECDHE-X25519-ECDSA-P256/TLSv13-4    32.11m ± 0%   31.90m ± 0%   -0.65% (p=0.002 n=6)
HandshakeServer/ECDHE-X25519-ECDSA-P256/TLSv12-4    33.69m ± 0%   33.59m ± 0%   -0.30% (p=0.002 n=6)
HandshakeServer/ECDHE-P521-ECDSA-P521/TLSv13-4      641.4m ± 0%   641.2m ± 0%   -0.04% (p=0.041 n=6)
HandshakeServer/ECDHE-P521-ECDSA-P521/TLSv12-4      796.3m ± 0%   795.9m ± 0%   -0.04% (p=0.041 n=6)
Throughput/MaxPacket/1MB/TLSv12-4                   756.3m ± 0%   450.2m ± 0%  -40.47% (p=0.002 n=6)
Throughput/MaxPacket/1MB/TLSv13-4                   758.5m ± 0%   450.5m ± 8%  -40.61% (p=0.002 n=6)
Throughput/MaxPacket/2MB/TLSv12-4                  1501.0m ± 0%   829.8m ± 1%  -44.72% (p=0.002 n=6)
Throughput/MaxPacket/2MB/TLSv13-4                  1509.2m ± 0%   832.0m ± 0%  -44.87% (p=0.002 n=6)
Throughput/MaxPacket/4MB/TLSv12-4                    2.932 ± 0%    1.615 ± 1%  -44.92% (p=0.002 n=6)
Throughput/MaxPacket/4MB/TLSv13-4                    2.948 ± 1%    1.620 ± 0%  -45.05% (p=0.002 n=6)
Throughput/MaxPacket/8MB/TLSv12-4                    5.795 ± 0%    3.158 ± 0%  -45.50% (p=0.002 n=6)
Throughput/MaxPacket/8MB/TLSv13-4                    5.826 ± 0%    3.171 ± 2%  -45.57% (p=0.002 n=6)
Throughput/MaxPacket/16MB/TLSv12-4                  11.520 ± 0%    6.229 ± 0%  -45.93% (p=0.002 n=6)
Throughput/MaxPacket/16MB/TLSv13-4                  11.588 ± 0%    6.271 ± 0%  -45.88% (p=0.002 n=6)
Throughput/MaxPacket/32MB/TLSv12-4                   22.99 ± 0%    12.42 ± 0%  -45.97% (p=0.002 n=6)
Throughput/MaxPacket/32MB/TLSv13-4                   23.10 ± 0%    12.49 ± 1%  -45.93% (p=0.002 n=6)
Throughput/MaxPacket/64MB/TLSv12-4                   45.87 ± 0%    24.73 ± 0%  -46.09% (p=0.002 n=6)
Throughput/MaxPacket/64MB/TLSv13-4                   46.17 ± 0%    24.88 ± 0%  -46.10% (p=0.002 n=6)
Throughput/DynamicPacket/1MB/TLSv12-4               749.2m ± 0%   447.1m ± 0%  -40.33% (p=0.002 n=6)
Throughput/DynamicPacket/1MB/TLSv13-4               750.8m ± 1%   446.9m ± 1%  -40.48% (p=0.002 n=6)
Throughput/DynamicPacket/2MB/TLSv12-4              1493.8m ± 1%   826.7m ± 0%  -44.66% (p=0.002 n=6)
Throughput/DynamicPacket/2MB/TLSv13-4              1500.9m ± 1%   828.5m ± 0%  -44.80% (p=0.002 n=6)
Throughput/DynamicPacket/4MB/TLSv12-4                2.925 ± 0%    1.616 ± 0%  -44.77% (p=0.002 n=6)
Throughput/DynamicPacket/4MB/TLSv13-4                2.940 ± 0%    1.623 ± 0%  -44.82% (p=0.002 n=6)
Throughput/DynamicPacket/8MB/TLSv12-4                5.786 ± 0%    3.155 ± 0%  -45.47% (p=0.002 n=6)
Throughput/DynamicPacket/8MB/TLSv13-4                5.821 ± 0%    3.175 ± 0%  -45.44% (p=0.002 n=6)
Throughput/DynamicPacket/16MB/TLSv12-4              11.514 ± 0%    6.253 ± 0%  -45.69% (p=0.002 n=6)
Throughput/DynamicPacket/16MB/TLSv13-4              11.576 ± 0%    6.271 ± 1%  -45.82% (p=0.002 n=6)
Throughput/DynamicPacket/32MB/TLSv12-4               22.97 ± 0%    12.41 ± 0%  -45.95% (p=0.002 n=6)
Throughput/DynamicPacket/32MB/TLSv13-4               23.09 ± 0%    12.47 ± 0%  -45.99% (p=0.002 n=6)
Throughput/DynamicPacket/64MB/TLSv12-4               45.90 ± 1%    24.78 ± 0%  -46.01% (p=0.002 n=6)
Throughput/DynamicPacket/64MB/TLSv13-4               46.12 ± 0%    24.87 ± 0%  -46.06% (p=0.002 n=6)
Latency/MaxPacket/200kbps/TLSv12-4                  779.1m ± 0%   772.3m ± 0%   -0.87% (p=0.002 n=6)
Latency/MaxPacket/200kbps/TLSv13-4                  769.9m ± 0%   763.4m ± 0%   -0.84% (p=0.002 n=6)
Latency/MaxPacket/500kbps/TLSv12-4                  363.2m ± 0%   356.7m ± 0%   -1.79% (p=0.002 n=6)
Latency/MaxPacket/500kbps/TLSv13-4                  352.7m ± 0%   346.2m ± 0%   -1.85% (p=0.002 n=6)
Latency/MaxPacket/1000kbps/TLSv12-4                 224.7m ± 0%   218.3m ± 0%   -2.85% (p=0.002 n=6)
Latency/MaxPacket/1000kbps/TLSv13-4                 219.9m ± 0%   213.3m ± 0%   -3.03% (p=0.002 n=6)
Latency/MaxPacket/2000kbps/TLSv12-4                 155.6m ± 0%   149.0m ± 0%   -4.23% (p=0.002 n=6)
Latency/MaxPacket/2000kbps/TLSv13-4                 154.4m ± 1%   147.6m ± 0%   -4.37% (p=0.002 n=6)
Latency/MaxPacket/5000kbps/TLSv12-4                 113.6m ± 1%   107.1m ± 0%   -5.70% (p=0.002 n=6)
Latency/MaxPacket/5000kbps/TLSv13-4                 115.0m ± 0%   108.1m ± 0%   -6.06% (p=0.002 n=6)
Latency/DynamicPacket/200kbps/TLSv12-4              208.6m ± 0%   206.9m ± 0%   -0.83% (p=0.002 n=6)
Latency/DynamicPacket/200kbps/TLSv13-4              197.3m ± 0%   195.7m ± 0%   -0.81% (p=0.002 n=6)
Latency/DynamicPacket/500kbps/TLSv12-4              129.5m ± 0%   127.7m ± 0%   -1.37% (p=0.002 n=6)
Latency/DynamicPacket/500kbps/TLSv13-4              116.4m ± 0%   114.9m ± 0%   -1.27% (p=0.002 n=6)
Latency/DynamicPacket/1000kbps/TLSv12-4             103.4m ± 0%   101.5m ± 0%   -1.81% (p=0.002 n=6)
Latency/DynamicPacket/1000kbps/TLSv13-4             96.17m ± 1%   94.43m ± 0%   -1.81% (p=0.002 n=6)
Latency/DynamicPacket/2000kbps/TLSv12-4             90.39m ± 1%   88.33m ± 0%   -2.28% (p=0.002 n=6)
Latency/DynamicPacket/2000kbps/TLSv13-4             86.59m ± 1%   84.82m ± 1%   -2.05% (p=0.002 n=6)
Latency/DynamicPacket/5000kbps/TLSv12-4             82.09m ± 0%   80.19m ± 0%   -2.32% (p=0.002 n=6)
Latency/DynamicPacket/5000kbps/TLSv13-4             80.72m ± 1%   79.29m ± 1%   -1.77% (p=0.002 n=6)
geomean                                             608.8m        459.2m       -24.57%

                                       │    oldtls    │               newtls                │
                                       │     B/s      │     B/s       vs base               │
Throughput/MaxPacket/1MB/TLSv12-4        1.326Mi ± 0%   2.222Mi ± 0%  +67.63% (p=0.002 n=6)
Throughput/MaxPacket/1MB/TLSv13-4        1.316Mi ± 0%   2.222Mi ± 8%  +68.84% (p=0.002 n=6)
Throughput/MaxPacket/2MB/TLSv12-4        1.335Mi ± 0%   2.413Mi ± 1%  +80.71% (p=0.002 n=6)
Throughput/MaxPacket/2MB/TLSv13-4        1.326Mi ± 0%   2.403Mi ± 0%  +81.29% (p=0.002 n=6)
Throughput/MaxPacket/4MB/TLSv12-4        1.364Mi ± 0%   2.480Mi ± 1%  +81.82% (p=0.002 n=6)
Throughput/MaxPacket/4MB/TLSv13-4        1.354Mi ± 1%   2.470Mi ± 0%  +82.39% (p=0.002 n=6)
Throughput/MaxPacket/8MB/TLSv12-4        1.383Mi ± 0%   2.537Mi ± 0%  +83.45% (p=0.002 n=6)
Throughput/MaxPacket/8MB/TLSv13-4        1.373Mi ± 0%   2.522Mi ± 2%  +83.68% (p=0.002 n=6)
Throughput/MaxPacket/16MB/TLSv12-4       1.392Mi ± 0%   2.565Mi ± 0%  +84.25% (p=0.002 n=6)
Throughput/MaxPacket/16MB/TLSv13-4       1.383Mi ± 0%   2.551Mi ± 0%  +84.48% (p=0.002 n=6)
Throughput/MaxPacket/32MB/TLSv12-4       1.392Mi ± 0%   2.575Mi ± 0%  +84.93% (p=0.002 n=6)
Throughput/MaxPacket/32MB/TLSv13-4       1.383Mi ± 0%   2.561Mi ± 1%  +85.17% (p=0.002 n=6)
Throughput/MaxPacket/64MB/TLSv12-4       1.392Mi ± 0%   2.584Mi ± 0%  +85.62% (p=0.002 n=6)
Throughput/MaxPacket/64MB/TLSv13-4       1.383Mi ± 0%   2.575Mi ± 0%  +86.21% (p=0.002 n=6)
Throughput/DynamicPacket/1MB/TLSv12-4    1.335Mi ± 0%   2.241Mi ± 0%  +67.86% (p=0.002 n=6)
Throughput/DynamicPacket/1MB/TLSv13-4    1.335Mi ± 1%   2.241Mi ± 1%  +67.86% (p=0.002 n=6)
Throughput/DynamicPacket/2MB/TLSv12-4    1.335Mi ± 1%   2.422Mi ± 0%  +81.43% (p=0.002 n=6)
Throughput/DynamicPacket/2MB/TLSv13-4    1.335Mi ± 1%   2.413Mi ± 0%  +80.71% (p=0.002 n=6)
Throughput/DynamicPacket/4MB/TLSv12-4    1.364Mi ± 0%   2.480Mi ± 0%  +81.82% (p=0.002 n=6)
Throughput/DynamicPacket/4MB/TLSv13-4    1.364Mi ± 1%   2.465Mi ± 0%  +80.77% (p=0.002 n=6)
Throughput/DynamicPacket/8MB/TLSv12-4    1.383Mi ± 0%   2.537Mi ± 0%  +83.45% (p=0.002 n=6)
Throughput/DynamicPacket/8MB/TLSv13-4    1.373Mi ± 1%   2.518Mi ± 0%  +83.33% (p=0.002 n=6)
Throughput/DynamicPacket/16MB/TLSv12-4   1.392Mi ± 0%   2.556Mi ± 1%  +83.56% (p=0.002 n=6)
Throughput/DynamicPacket/16MB/TLSv13-4   1.383Mi ± 0%   2.551Mi ± 1%  +84.48% (p=0.002 n=6)
Throughput/DynamicPacket/32MB/TLSv12-4   1.392Mi ± 0%   2.580Mi ± 0%  +85.27% (p=0.002 n=6)
Throughput/DynamicPacket/32MB/TLSv13-4   1.383Mi ± 0%   2.565Mi ± 0%  +85.52% (p=0.002 n=6)
Throughput/DynamicPacket/64MB/TLSv12-4   1.392Mi ± 1%   2.584Mi ± 0%  +85.62% (p=0.002 n=6)
Throughput/DynamicPacket/64MB/TLSv13-4   1.392Mi ± 1%   2.575Mi ± 0%  +84.93% (p=0.002 n=6)
geomean                                  1.366Mi        2.476Mi       +81.23%
goos: linux
goarch: mipsle
pkg: math/bits
                  │   oldbits    │               newbits                │
                  │    sec/op    │   sec/op     vs base                 │
LeadingZeros-4      6.429n ±  0%   6.428n ± 0%        ~ (p=0.671 n=6)
LeadingZeros8-4     7.033n ±  0%   7.038n ± 0%        ~ (p=0.288 n=6)
LeadingZeros16-4    7.032n ±  0%   7.038n ± 0%        ~ (p=0.119 n=6)
LeadingZeros32-4    6.397n ±  0%   6.410n ± 0%   +0.20% (p=0.013 n=6)
LeadingZeros64-4    17.07n ±  3%   17.08n ± 0%        ~ (p=0.141 n=6)
TrailingZeros-4     8.211n ±  1%   8.055n ± 0%   -1.90% (p=0.002 n=6)
TrailingZeros8-4    9.038n ±  0%   9.042n ± 0%   +0.05% (p=0.006 n=6)
TrailingZeros16-4   11.05n ±  0%   11.07n ± 0%        ~ (p=0.061 n=6)
TrailingZeros32-4   8.180n ±  1%   8.053n ± 0%   -1.55% (p=0.002 n=6)
TrailingZeros64-4   21.09n ±  0%   21.10n ± 0%        ~ (p=0.123 n=6)
OnesCount-4         16.07n ±  0%   16.08n ± 0%        ~ (p=0.448 n=6)
OnesCount8-4        5.523n ±  0%   5.028n ± 0%   -8.96% (p=0.002 n=6)
OnesCount16-4       11.64n ±  1%   11.66n ± 0%        ~ (p=0.165 n=6)
OnesCount32-4       14.06n ±  0%   14.07n ± 0%        ~ (p=0.167 n=6)
OnesCount64-4       43.20n ±  0%   43.23n ± 0%   +0.07% (p=0.011 n=6)
RotateLeft-4        8.035n ±  0%   4.271n ± 0%  -46.85% (p=0.002 n=6)
RotateLeft8-4       7.030n ±  2%   7.032n ± 0%        ~ (p=0.738 n=6)
RotateLeft16-4      7.030n ±  0%   7.030n ± 0%        ~ (p=0.502 n=6)
RotateLeft32-4      7.032n ±  0%   4.018n ± 0%  -42.86% (p=0.002 n=6)
RotateLeft64-4      15.07n ±  0%   15.07n ± 0%        ~ (p=0.076 n=6)
Reverse-4           24.11n ±  0%   24.11n ± 0%        ~ (p=0.182 n=6)
Reverse8-4          4.018n ±  0%   4.020n ± 0%        ~ (p=1.000 n=6)
Reverse16-4         7.036n ±  0%   7.282n ± 0%   +3.49% (p=0.002 n=6)
Reverse32-4         22.82n ±  0%   23.11n ± 0%   +1.25% (p=0.002 n=6)
Reverse64-4         52.24n ±  0%   52.25n ± 0%        ~ (p=0.602 n=6)
ReverseBytes-4      9.038n ±  0%   9.040n ± 0%   +0.02% (p=0.032 n=6)
ReverseBytes16-4    4.256n ± 12%   5.022n ± 0%  +18.00% (p=0.002 n=6)
ReverseBytes32-4    10.04n ±  0%   10.04n ± 0%        ~ (p=0.545 n=6)
ReverseBytes64-4    18.08n ±  0%   18.08n ± 0%        ~ (p=0.773 n=6)
Add-4               6.027n ±  0%   5.049n ± 2%  -16.23% (p=0.002 n=6)
Add32-4             6.027n ±  0%   6.352n ± 0%   +5.38% (p=0.002 n=6)
Add64-4             11.05n ±  0%   11.05n ± 0%        ~ (p=1.000 n=6) ¹
Add64multiple-4     23.10n ±  0%   23.10n ± 0%        ~ (p=0.455 n=6)
Sub-4               8.537n ±  0%   8.537n ± 0%        ~ (p=0.870 n=6)
Sub32-4             8.536n ±  0%   8.537n ± 0%        ~ (p=0.162 n=6)
Sub64-4             14.12n ±  1%   12.05n ± 0%  -14.69% (p=0.002 n=6)
Sub64multiple-4     25.11n ±  0%   25.11n ± 0%        ~ (p=1.000 n=6)
Mul-4               5.023n ±  0%   5.021n ± 0%        ~ (p=0.450 n=6)
Mul32-4             4.688n ±  0%   4.268n ± 0%   -8.95% (p=0.002 n=6)
Mul64-4             143.6n ±  0%   143.6n ± 0%        ~ (p=0.455 n=6)
Div-4               333.4n ±  0%   322.9n ± 0%   -3.15% (p=0.002 n=6)
Div32-4             321.9n ±  0%   322.4n ± 0%   +0.16% (p=0.002 n=6)
Div64-4             717.2n ±  1%   716.6n ± 0%        ~ (p=0.310 n=6)
geomean             14.29n         13.80n        -3.44%
goos: linux
goarch: mipsle
pkg: crypto/hmac
                │   oldhmac   │              newhmac              │
                │   sec/op    │   sec/op     vs base              │
HMACSHA256_1K-4   66.18µ ± 0%   59.79µ ± 0%  -9.66% (p=0.002 n=6)
HMACSHA256_32-4   15.28µ ± 0%   14.60µ ± 0%  -4.49% (p=0.002 n=6)
NewWriteSum-4     27.69µ ± 0%   26.29µ ± 1%  -5.06% (p=0.002 n=6)
geomean           30.37µ        28.41µ       -6.43%

                │   oldhmac    │               newhmac               │
                │     B/s      │     B/s       vs base               │
HMACSHA256_1K-4   14.75Mi ± 0%   16.34Mi ± 0%  +10.73% (p=0.002 n=6)
HMACSHA256_32-4   1.993Mi ± 0%   2.089Mi ± 0%   +4.78% (p=0.002 n=6)
NewWriteSum-4     1.106Mi ± 1%   1.163Mi ± 1%   +5.17% (p=0.002 n=6)
geomean           3.192Mi        3.411Mi        +6.86%
goos: linux
goarch: mipsle
pkg: crypto/sha1
                   │   oldsha1   │              newsha1              │
                   │   sec/op    │   sec/op     vs base              │
Hash8Bytes/New-4     4.652µ ± 0%   4.499µ ± 0%  -3.29% (p=0.002 n=6)
Hash8Bytes/Sum-4     4.913µ ± 0%   4.758µ ± 0%  -3.15% (p=0.002 n=6)
Hash320Bytes/New-4   15.06µ ± 0%   14.16µ ± 0%  -5.99% (p=0.002 n=6)
Hash320Bytes/Sum-4   15.30µ ± 0%   14.39µ ± 0%  -5.89% (p=0.002 n=6)
Hash1K/New-4         37.91µ ± 0%   35.39µ ± 0%  -6.67% (p=0.002 n=6)
Hash1K/Sum-4         38.17µ ± 1%   35.62µ ± 0%  -6.69% (p=0.002 n=6)
Hash8K/New-4         270.6µ ± 9%   251.4µ ± 0%  -7.08% (p=0.002 n=6)
Hash8K/Sum-4         270.9µ ± 0%   251.6µ ± 0%  -7.12% (p=0.002 n=6)
geomean              29.40µ        27.71µ       -5.75%

                   │   oldsha1    │              newsha1               │
                   │     B/s      │     B/s       vs base              │
Hash8Bytes/New-4     1.640Mi ± 0%   1.698Mi ± 0%  +3.49% (p=0.002 n=6)
Hash8Bytes/Sum-4     1.554Mi ± 0%   1.602Mi ± 0%  +3.07% (p=0.002 n=6)
Hash320Bytes/New-4   20.26Mi ± 0%   21.56Mi ± 0%  +6.40% (p=0.002 n=6)
Hash320Bytes/Sum-4   19.95Mi ± 0%   21.20Mi ± 0%  +6.26% (p=0.002 n=6)
Hash1K/New-4         25.76Mi ± 0%   27.60Mi ± 0%  +7.15% (p=0.002 n=6)
Hash1K/Sum-4         25.58Mi ± 1%   27.42Mi ± 0%  +7.18% (p=0.002 n=6)
Hash8K/New-4         28.87Mi ± 8%   31.08Mi ± 0%  +7.63% (p=0.002 n=6)
Hash8K/Sum-4         28.84Mi ± 0%   31.05Mi ± 0%  +7.66% (p=0.002 n=6)
geomean              12.42Mi        13.17Mi       +6.09%
goos: linux
goarch: mipsle
pkg: crypto/sha256
                    │  oldsha256  │             newsha256             │
                    │   sec/op    │   sec/op     vs base              │
Hash8Bytes/New-4      6.720µ ± 0%   6.411µ ± 0%  -4.61% (p=0.002 n=6)
Hash8Bytes/Sum224-4   7.153µ ± 0%   6.845µ ± 0%  -4.31% (p=0.002 n=6)
Hash8Bytes/Sum256-4   7.196µ ± 0%   6.885µ ± 0%  -4.33% (p=0.002 n=6)
Hash1K/New-4          57.62µ ± 1%   52.36µ ± 0%  -9.14% (p=0.002 n=6)
Hash1K/Sum224-4       58.02µ ± 0%   52.77µ ± 0%  -9.05% (p=0.002 n=6)
Hash1K/Sum256-4       58.10µ ± 0%   52.83µ ± 0%  -9.07% (p=0.002 n=6)
Hash8K/New-4          411.7µ ± 0%   371.6µ ± 0%  -9.72% (p=0.002 n=6)
Hash8K/Sum224-4       411.9µ ± 0%   372.1µ ± 0%  -9.68% (p=0.002 n=6)
Hash8K/Sum256-4       412.2µ ± 0%   372.1µ ± 0%  -9.71% (p=0.002 n=6)
geomean               55.12µ        50.84µ       -7.76%

                    │  oldsha256   │              newsha256              │
                    │     B/s      │     B/s       vs base               │
Hash8Bytes/New-4      1.135Mi ± 0%   1.192Mi ± 0%   +5.04% (p=0.002 n=6)
Hash8Bytes/Sum224-4   1.068Mi ± 0%   1.116Mi ± 0%   +4.46% (p=0.002 n=6)
Hash8Bytes/Sum256-4   1.059Mi ± 0%   1.106Mi ± 0%   +4.50% (p=0.002 n=6)
Hash1K/New-4          16.95Mi ± 1%   18.65Mi ± 0%  +10.07% (p=0.002 n=6)
Hash1K/Sum224-4       16.83Mi ± 0%   18.51Mi ± 0%   +9.97% (p=0.002 n=6)
Hash1K/Sum256-4       16.80Mi ± 0%   18.48Mi ± 0%   +9.99% (p=0.002 n=6)
Hash8K/New-4          18.98Mi ± 0%   21.02Mi ± 0%  +10.75% (p=0.002 n=6)
Hash8K/Sum224-4       18.97Mi ± 0%   21.00Mi ± 0%  +10.68% (p=0.002 n=6)
Hash8K/Sum256-4       18.96Mi ± 0%   21.00Mi ± 0%  +10.74% (p=0.002 n=6)
geomean               7.031Mi        7.624Mi        +8.43%

@randall77
Contributor

@HeliC829 Are you putting those new instructions in assembly? Or is the compiler generating them?
For the former, would it be enough to have a runtime switch? Other architectures do that (e.g., see crypto/sha1/sha1block_amd64.go).
This issue is more about the latter, giving the compiler the ability to pick new instructions.
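
A runtime switch of the kind mentioned here usually means picking an implementation once at startup, based on a detected CPU feature. A minimal sketch of the pattern follows. The `hasR2` flag and all function names are hypothetical; real code would set the flag from CPU feature detection and provide the fast path in assembly.

```go
package main

import (
	"fmt"
	"math/bits"
)

// hasR2 stands in for a runtime CPU feature check (hypothetical;
// hard-coded here so the example runs anywhere).
var hasR2 = false

// rotr32Generic is the portable fallback: two shifts and an OR.
func rotr32Generic(x uint32, k uint) uint32 {
	k &= 31
	return x>>k | x<<((32-k)&31)
}

// rotr32Fast stands in for a hand-written assembly routine.
func rotr32Fast(x uint32, k uint) uint32 {
	return bits.RotateLeft32(x, -int(k&31))
}

// rotr32 is selected once at init time; this is the "runtime switch".
var rotr32 = rotr32Generic

func init() {
	if hasR2 {
		rotr32 = rotr32Fast
	}
}

func main() {
	fmt.Printf("%#x\n", rotr32(0x12345678, 8))
}
```

A switch like this only covers functions that are written to use it; it does not change the instructions the compiler emits for ordinary Go code.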

@HeliC829
Contributor

HeliC829 commented May 9, 2023

@HeliC829 Are you putting those new instructions in assembly? Or is the compiler generating them? For the former, would it be enough to have a runtime switch? Other architectures do that (e.g., see crypto/sha1/sha1block_amd64.go). This issue is more about the latter, giving the compiler the ability to pick new instructions.

Those new instructions were generated by my local compiler.

For this proposal, a runtime switch alone is not enough. I think that approach only suits specific algorithms that run efficiently with advanced instruction sets such as AVX2/SVE2, and the code must also live in the Go source tree. With only a runtime switch, other encryption algorithms that are not in the Go source tree, like chacha20 (CL 301711), can't benefit from a higher ISA level.

Besides, MIPS R2 also has some instructions, such as indexed floating-point loads, that are still waiting to be added. Those instructions can't be generated or used through a runtime switch. I showed the math/bits, crypto/tls, crypto/sha1, crypto/sha256, ... test results above because they show an obvious performance improvement.

Thanks a lot.

@randall77
Contributor

Those new instructions were generated by my local compiler.

Which instructions are you adding?

@HeliC829
Contributor

HeliC829 commented May 9, 2023

Those new instructions were generated by my local compiler.

Which instructions are you adding?

For MIPS32: WSBH/ROTR/ROTRV
For MIPS64: WSBH/DSBH/DSHD/ROTR/ROTRV/DROTR/DROTRV/CLZ/DCLZ

Ref: The MIPS64® Instruction Set, Revision 5.04

@randall77
Contributor

Ok, so byte swaps, rotates, and leading zeroes.

@rsc
Contributor

rsc commented Jun 7, 2023

It sounds like the proposal is that GOMIPS is now a comma-separated list of the following choices:

  • hardfloat: assume hardware floating point
  • softfloat: use software floating point
  • r2: ???
  • r5: ???

hardfloat and softfloat are mutually exclusive and presumably r2 and r5 are as well. Can someone define r2 and r5? It sounds like they correspond to what gcc calls MIPS32R2 and MIPS32R5 in their documentation. Are those the official names?
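
The option format summarized here can be sketched as a small parser. This only illustrates the rules stated in this thread: comma-separated options, one float mode, at most one ISA level, FPXX for hardfloat with r2/r5, and FP32 for hardfloat otherwise. The type and function names are hypothetical and do not reflect the Go toolchain's actual implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// mipsConfig is a hypothetical parsed form of a GOMIPS value.
type mipsConfig struct {
	Float string // "hardfloat" or "softfloat"
	ISA   string // "", "r2", or "r5"
}

// parseGOMIPS splits a comma-separated GOMIPS value and rejects
// conflicting or unknown options.
func parseGOMIPS(v string) (mipsConfig, error) {
	cfg := mipsConfig{}
	if v == "" {
		cfg.Float = "hardfloat" // hardfloat is the default
		return cfg, nil
	}
	for _, opt := range strings.Split(v, ",") {
		switch opt {
		case "hardfloat", "softfloat":
			if cfg.Float != "" && cfg.Float != opt {
				return cfg, fmt.Errorf("GOMIPS: %s conflicts with %s", opt, cfg.Float)
			}
			cfg.Float = opt
		case "r2", "r5":
			if cfg.ISA != "" && cfg.ISA != opt {
				return cfg, fmt.Errorf("GOMIPS: %s conflicts with %s", opt, cfg.ISA)
			}
			cfg.ISA = opt
		default:
			return cfg, fmt.Errorf("GOMIPS: unknown option %q", opt)
		}
	}
	if cfg.Float == "" {
		cfg.Float = "hardfloat"
	}
	return cfg, nil
}

// fpABI applies the proposal's rule for choosing the FP register ABI.
func fpABI(cfg mipsConfig) string {
	switch {
	case cfg.Float != "hardfloat":
		return "" // softfloat does not use FP registers
	case cfg.ISA != "":
		return "FPXX"
	default:
		return "FP32"
	}
}

func main() {
	for _, v := range []string{"", "softfloat", "hardfloat,r2", "r5", "r2,r5"} {
		cfg, err := parseGOMIPS(v)
		fmt.Printf("%q -> %+v abi=%q err=%v\n", v, cfg, fpABI(cfg), err)
	}
}
```

Treating each group as "at most one value" keeps the two settings independent, so a value such as `softfloat,r2` still parses even though only the ISA level then affects code generation.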

@rsc
Contributor

rsc commented Jun 7, 2023

From the doc linked in #60072:

Screenshot 2023-06-07 at 1 56 15 PM

@rsc
Contributor

rsc commented Jun 7, 2023

Let's have one discussion. Marking this a duplicate of #60072.

@rsc
Contributor

rsc commented Jun 7, 2023

This proposal is a duplicate of a previously discussed proposal, as noted above,
and there is no significant new information to justify reopening the discussion.
The issue has therefore been declined as a duplicate.
— rsc for the proposal review group

@rsc rsc closed this as completed Jun 7, 2023