Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

image/png: use branch-free abs function in abs8 #63492

Closed
haruyama480 opened this issue Oct 10, 2023 · 7 comments
Closed

image/png: use branch-free abs function in abs8 #63492

haruyama480 opened this issue Oct 10, 2023 · 7 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance

Comments

@haruyama480
Copy link
Contributor

This is not a bug issue, but propose of optimization.
So, I change subjects of pr-templates.

overview

In the past CL, a branch-free abs was applied to optimize Paeth filters.
https://codereview.appspot.com/117290043

Applying this to abs8 in write.go also improved the latency of BenchmarkEncodeRGBA by 38%.

result

goos: darwin
goarch: arm64
pkg: image/png
                            │   old.txt   │               new.txt               │
                            │   sec/op    │   sec/op     vs base                │
Paeth-10                      2.536n ± 1%   2.534n ± 0%        ~ (p=0.378 n=10)
DecodeGray-10                 354.2µ ± 1%   354.5µ ± 0%        ~ (p=0.353 n=10)
DecodeNRGBAGradient-10        1.510m ± 0%   1.513m ± 0%   +0.16% (p=0.043 n=10)
DecodeNRGBAOpaque-10          1.212m ± 0%   1.215m ± 0%   +0.28% (p=0.011 n=10)
DecodePaletted-10             190.3µ ± 2%   190.2µ ± 2%        ~ (p=0.684 n=10)
DecodeRGB-10                  1.085m ± 1%   1.085m ± 0%        ~ (p=0.912 n=10)
DecodeInterlacing-10          1.393m ± 0%   1.391m ± 0%        ~ (p=0.052 n=10)
EncodeGray-10                 1.178m ± 0%   1.127m ± 0%   -4.39% (p=0.000 n=10)
EncodeGrayWithBufferPool-10   1.104m ± 0%   1.052m ± 0%   -4.65% (p=0.000 n=10)
EncodeNRGBOpaque-10           3.566m ± 0%   3.449m ± 1%   -3.29% (p=0.000 n=10)
EncodeNRGBA-10                3.891m ± 1%   3.727m ± 0%   -4.23% (p=0.000 n=10)
EncodePaletted-10             1.541m ± 0%   1.535m ± 0%   -0.39% (p=0.011 n=10)
EncodeRGBOpaque-10            3.567m ± 0%   3.445m ± 0%   -3.41% (p=0.000 n=10)
EncodeRGBA-10                 24.58m ± 0%   15.25m ± 1%  -37.97% (p=0.000 n=10)
geomean                       616.8µ        587.4µ        -4.75%
@gopherbot
Copy link

Change https://go.dev/cl/534375 mentions this issue: image/png: use branch-free abs function in abs8

@haruyama480
Copy link
Contributor Author

The fix is a few lines, so I created the CL.

@AlexRouSg
Copy link
Contributor

if you've tested and compared the output of your function to the old one you'll see that your "optimized" function doesn't even do the same thing on values >= 129
https://go.dev/play/p/NCH4u6HM3if

@seankhliao seankhliao added Performance NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Oct 10, 2023
@nakulbajaj
Copy link
Contributor

Given the above, the best optimization I could come up with based on a 2's complement approach was the following. Benchmark result differences weren't statistically significant compared to the branching version, so I think we should leave as-is?

func abs8(d uint8) int {
	m := uint8(int8(d) >> 7)
	return int((d ^ m) - m)
}

@haruyama480
Copy link
Contributor Author

You are right. My first CL is wrong.

It was embarrassing that I ignored the following code comment because all test passed...

The absolute value of a byte interpreted as a signed int8

@haruyama480
Copy link
Contributor Author

nakulbajaj's suggestion seems to work.
https://go.dev/play/p/MlsoDr9ZE3N

And,

Benchmark result differences weren't statistically significant compared to the branching version, so I think we should leave as-is?

Yes, I agree that. The differences include even worse cases.

goos: darwin
goarch: arm64
pkg: image/png
                            │  old1.txt   │              new1.txt              │
                            │   sec/op    │   sec/op     vs base               │
Paeth-10                      2.466n ± 1%   2.476n ± 2%       ~ (p=0.288 n=10)
DecodeGray-10                 342.3µ ± 0%   341.9µ ± 1%       ~ (p=0.529 n=10)
DecodeNRGBAGradient-10        1.467m ± 0%   1.463m ± 1%       ~ (p=0.218 n=10)
DecodeNRGBAOpaque-10          1.176m ± 1%   1.177m ± 0%       ~ (p=0.631 n=10)
DecodePaletted-10             183.7µ ± 1%   184.1µ ± 1%       ~ (p=0.353 n=10)
DecodeRGB-10                  1.049m ± 0%   1.064m ± 2%  +1.41% (p=0.000 n=10)
DecodeInterlacing-10          1.346m ± 1%   1.351m ± 1%       ~ (p=0.190 n=10)
EncodeGray-10                 1.138m ± 1%   1.196m ± 1%  +5.07% (p=0.000 n=10)
EncodeGrayWithBufferPool-10   1.072m ± 0%   1.115m ± 1%  +4.05% (p=0.000 n=10)
EncodeNRGBOpaque-10           3.448m ± 1%   3.615m ± 0%  +4.85% (p=0.000 n=10)
EncodeNRGBA-10                3.762m ± 1%   3.997m ± 1%  +6.24% (p=0.000 n=10)
EncodePaletted-10             1.496m ± 1%   1.499m ± 1%       ~ (p=0.143 n=10)
EncodeRGBOpaque-10            3.452m ± 2%   3.619m ± 2%  +4.84% (p=0.000 n=10)
EncodeRGBA-10                 23.89m ± 1%   22.77m ± 1%  -4.72% (p=0.000 n=10)
geomean                       597.4µ        606.8µ       +1.58%

@haruyama480
Copy link
Contributor Author

Thanks for your confirmation!
If you need, I'm happly to add a test for abs8.
Or, there seems to be no improvement, and I will close CL and this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
None yet
Development

No branches or pull requests

5 participants