runtime: significant number of bounds checks in growslice #60071

egonelbre · 2023-05-09T09:04:23Z

While inspecting the runtime.growslice code I noticed that it contains a lot of bounds checks. They all come from runtime.roundupsize.

What version of Go are you using (`go version`)?

$ go version
go version devel go1.21-8d5065ce6e Tue May 9 11:45:59 2023 +0300 windows/amd64

However also happens with go1.20.4.

Does this issue reproduce with the latest release?

Yes and with tip.

What did you do?

Inspecting runtime.growslice and seeing:

$ go tool objdump -s runtime.growslice ./somebinary
...
  0x445f0f              488d052a320500          LEAQ type:*+45376(SB), AX
  0x445f16              488d1dfb3c0800          LEAQ runtime.buildVersion.str+720(SB), BX
  0x445f1d              0f1f00                  NOPL 0(AX)
  0x445f20              e87bd0feff              CALL runtime.gopanic(SB)
                        return uintptr(class_to_size[size_to_class128[divRoundUp(size-smallSizeMax, largeSizeDiv)]])
  0x445f25              b944000000              MOVL $0x44, CX
  0x445f2a              e891860100              CALL runtime.panicIndex(SB)
  0x445f2f              b9f9000000              MOVL $0xf9, CX
  0x445f34              e8a7860100              CALL runtime.panicIndexU(SB)
                        return uintptr(class_to_size[size_to_class8[divRoundUp(size, smallSizeDiv)]])
  0x445f39              b944000000              MOVL $0x44, CX
  0x445f3e              6690                    NOPW
  0x445f40              e87b860100              CALL runtime.panicIndex(SB)
  0x445f45              b981000000              MOVL $0x81, CX
  0x445f4a              e891860100              CALL runtime.panicIndexU(SB)
                        return uintptr(class_to_size[size_to_class128[divRoundUp(size-smallSizeMax, largeSizeDiv)]])
...

What did you expect to see?

I'm not sure how much they affect performance, but triggering that many in roundupsize does seem a bit concerning.

I tried to do the trivial patch with zero-filling the tables to exactly 256 elements https://go-review.googlesource.com/c/go/+/493796, however, I'm not convinced that this is the best way to address this. Either way, I'm willing to take a stab at fixing it.

The text was updated successfully, but these errors were encountered:

gopherbot · 2023-05-09T09:25:12Z

Change https://go.dev/cl/493796 mentions this issue: runtime: avoid bounds checks in roundup

cherrymui · 2023-05-09T16:13:36Z

Do you have some benchmark results for how much the CL helps? Thanks.

egonelbre · 2023-05-09T17:54:49Z

So, benchmarking all the ^Append things in runtime:

goos: windows
goarch: amd64
pkg: runtime
cpu: AMD Ryzen Threadripper 2950X 16-Core Processor
                                 │ append_baseline.txt~ │         append_roundup.txt~         │
                                 │        sec/op        │   sec/op     vs base                │
Append-32                                   12.29n ± 2%   15.56n ± 1%  +26.65% (p=0.000 n=10)
AppendGrowByte-32                           1.869m ± 3%   1.902m ± 4%        ~ (p=0.247 n=10)
AppendGrowString-32                         41.46m ± 5%   38.72m ± 3%   -6.61% (p=0.002 n=10)
AppendSlice/1Bytes-32                       2.197n ± 1%   2.214n ± 1%   +0.80% (p=0.004 n=10)
AppendSlice/4Bytes-32                       1.950n ± 0%   1.966n ± 1%   +0.79% (p=0.001 n=10)
AppendSlice/7Bytes-32                       2.271n ± 0%   2.282n ± 1%   +0.53% (p=0.008 n=10)
AppendSlice/8Bytes-32                       1.991n ± 1%   2.012n ± 1%   +1.05% (p=0.006 n=10)
AppendSlice/15Bytes-32                      2.285n ± 0%   2.293n ± 1%        ~ (p=0.101 n=10)
AppendSlice/16Bytes-32                      1.989n ± 1%   2.002n ± 0%   +0.65% (p=0.000 n=10)
AppendSlice/32Bytes-32                      2.191n ± 0%   2.200n ± 1%   +0.41% (p=0.033 n=10)
AppendSliceLarge/1024Bytes-32               339.0n ± 3%   338.3n ± 1%        ~ (p=0.684 n=10)
AppendSliceLarge/4096Bytes-32               1.275µ ± 3%   1.284µ ± 1%        ~ (p=0.591 n=10)
AppendSliceLarge/16384Bytes-32              4.261µ ± 5%   4.348µ ± 1%   +2.03% (p=0.035 n=10)
AppendSliceLarge/65536Bytes-32              13.44µ ± 3%   13.04µ ± 6%        ~ (p=0.089 n=10)
AppendSliceLarge/262144Bytes-32             50.48µ ± 3%   48.35µ ± 2%   -4.21% (p=0.000 n=10)
AppendSliceLarge/1048576Bytes-32            161.5µ ± 6%   154.5µ ± 3%   -4.30% (p=0.029 n=10)
AppendStr/1Bytes-32                         2.331n ± 0%   2.334n ± 0%        ~ (p=0.423 n=10)
AppendStr/4Bytes-32                         1.998n ± 0%   2.001n ± 1%        ~ (p=0.224 n=10)
AppendStr/8Bytes-32                         2.034n ± 1%   2.045n ± 1%        ~ (p=0.107 n=10)
AppendStr/16Bytes-32                        2.248n ± 1%   2.251n ± 0%        ~ (p=0.323 n=10)
AppendStr/32Bytes-32                        2.353n ± 1%   2.356n ± 0%        ~ (p=0.956 n=10)
AppendSpecialCase-32                        10.22n ± 0%   10.21n ± 0%        ~ (p=0.194 n=10)
AppendInPlace/NoGrow/Byte-32                400.1n ± 1%   403.9n ± 1%   +0.96% (p=0.007 n=10)
AppendInPlace/NoGrow/1Ptr-32                1.279µ ± 2%   1.275µ ± 2%        ~ (p=0.868 n=10)
AppendInPlace/NoGrow/2Ptr-32                2.160µ ± 2%   2.173µ ± 1%        ~ (p=0.288 n=10)
AppendInPlace/NoGrow/3Ptr-32                2.649µ ± 3%   2.666µ ± 3%        ~ (p=0.700 n=10)
AppendInPlace/NoGrow/4Ptr-32                3.985µ ± 3%   4.073µ ± 2%        ~ (p=0.143 n=10)
AppendInPlace/Grow/Byte-32                  228.8n ± 2%   221.9n ± 2%   -2.99% (p=0.001 n=10)
AppendInPlace/Grow/1Ptr-32                  238.3n ± 2%   223.7n ± 1%   -6.13% (p=0.000 n=10)
AppendInPlace/Grow/2Ptr-32                  329.3n ± 2%   323.7n ± 2%   -1.72% (p=0.009 n=10)
AppendInPlace/Grow/3Ptr-32                  397.8n ± 3%   414.2n ± 1%   +4.14% (p=0.000 n=10)
AppendInPlace/Grow/4Ptr-32                  465.8n ± 4%   472.3n ± 3%        ~ (p=0.239 n=10)
geomean                                     176.1n        176.9n        +0.42%

I haven't yet dug into the reasons for things getting slower.

It might also make sense to try and see how it will interact with #49480 and https://go-review.googlesource.com/c/go/+/493836.

egonelbre · 2023-05-09T18:30:01Z

Hmm, maybe it's due to needing to convert the division result into a uint8 and hence ending up emitting a MOVZX:

  msize.go:16		0x4444b1		400fb6ff		MOVZX DI, DI					
  msize.go:16		0x4444b5		4c8d0504c80d00		LEAQ runtime.size_to_class8(SB), R8		
  msize.go:16		0x4444bc		420fb63c07		MOVZX 0(DI)(R8*1), DI				
  msize.go:16		0x4444c1		4c8d0538cc0d00		LEAQ runtime.class_to_size(SB), R8		
  msize.go:16		0x4444c8		410fb73c78		MOVZX 0(R8)(DI*2), DI

Also, maybe it's possible to use a single lookup instead of a two lookup.

egonelbre · 2023-05-09T19:39:50Z

I guess one option would be to teach BCE about division and calculations, such as for https://go.godbolt.org/z/ccf4Gbs8E. But, I have no clue how complicated that would be.

package runtime

const (
	_PageSize       = 4*1024
	_MaxSmallSize   = 32768
	smallSizeDiv    = 8
	smallSizeMax    = 1024
	largeSizeDiv    = 128
)

func roundupsize(size uintptr) uintptr {
	if size < _MaxSmallSize {
		if size <= smallSizeMax-8 {
			// here we know upper bounds:
			//     size = 1024-8 = 1016
			//     (n + a - 1) / a) = (1016 + 8 - 1) / 8 = 127 // guaranteed to not overflow
			//     smallSizeMax/smallSizeDiv + 1 =  129
			return uintptr(size_to_class_size[divRoundUp(size, smallSizeDiv)])
		} else {
			// similarly here
			return uintptr(size_to_class128_size[divRoundUp(size-smallSizeMax, largeSizeDiv)])
		}
	}
	if size+_PageSize < size {
		return size
	}
	return alignUp(size, _PageSize)
}

// alignUp rounds n up to a multiple of a. a must be a power of 2.
func alignUp(n, a uintptr) uintptr {
	return (n + a - 1) &^ (a - 1)
}

// divRoundUp returns ceil(n / a).
func divRoundUp(n, a uintptr) uintptr {
	// a is generally a power of two. This will get inlined and
	// the compiler will optimize the division.
	return (n + a - 1) / a
}

var size_to_class_size = [smallSizeMax/smallSizeDiv + 1]uint16{}
var size_to_class128_size = [(_MaxSmallSize-smallSizeMax)/largeSizeDiv + 1]uint16{}

egonelbre · 2023-05-17T12:49:04Z

I haven't yet dug into the reasons for things getting slower.

My best guess is that the performance is very sensitive to code alignment on Threadripper 2950X amd64 Windows... for some reason. I'm seeing Append benchmark vary between 8ns, 12ns, 15ns depending on the commit, which don't seem directly impact the code.

Anyways, I submitted the basic change.

egonelbre · 2023-05-19T17:15:34Z

From my experiments and benchmarks, it seems that the benefit is at most ~2ns... and almost imperceptible due to code alignment affecting benchmark results about the same.

So, it seems there's no performance win here. However, supporting division in BCE might be a good idea nevertheless.

gopherbot added the compiler/runtime label May 9, 2023

cherrymui added Performance NeedsInvestigation labels May 9, 2023

cherrymui added this to the Unplanned milestone May 9, 2023

gabyhelp mentioned this issue Apr 3, 2025

cmd/compile: unnecessary bounds check when slicing after using slices.Grow #73145

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

runtime: significant number of bounds checks in growslice #60071

runtime: significant number of bounds checks in growslice #60071

egonelbre commented May 9, 2023

gopherbot commented May 9, 2023

cherrymui commented May 9, 2023

egonelbre commented May 9, 2023

egonelbre commented May 9, 2023 •

edited

Loading

egonelbre commented May 9, 2023 •

edited

Loading

egonelbre commented May 17, 2023 •

edited

Loading

egonelbre commented May 19, 2023

runtime: significant number of bounds checks in growslice #60071

runtime: significant number of bounds checks in growslice #60071

Comments

egonelbre commented May 9, 2023

What version of Go are you using (go version)?

Does this issue reproduce with the latest release?

What did you do?

What did you expect to see?

gopherbot commented May 9, 2023

cherrymui commented May 9, 2023

egonelbre commented May 9, 2023

egonelbre commented May 9, 2023 • edited Loading

egonelbre commented May 9, 2023 • edited Loading

egonelbre commented May 17, 2023 • edited Loading

egonelbre commented May 19, 2023

What version of Go are you using (`go version`)?

egonelbre commented May 9, 2023 •

edited

Loading

egonelbre commented May 9, 2023 •

edited

Loading

egonelbre commented May 17, 2023 •

edited

Loading