bytes, strings: optimize Trim for ASCII cutsets #46446

dsnet · 2021-05-29T18:32:10Z

I analyzed usages of strings.Trim (and related functions in both strings and bytes) in all source code known by the module proxy:

76.6% have a cutset of len=1, and
13.4% have a cutset of len=2.

Given that a vast majority of usages only have a cutset of len=1, I argue we should more heavily optimize for that situation. Currently, there is some optimization for cutsets of len=1, but it’s within the internal makeCutsetFunc function. This is sub-optimal as it incurs an allocation in the internal makeCutsetFunc for the closure over that single byte. I believe we should place special handling of cutsets with len=1 directly in Trim, TrimRight, and TrimLeft.

Example benchmark for strings.TrimRight("hello==", "="):

BenchmarkStd   	18299240	        69.16 ns/op	      24 B/op	       1 allocs/op
BenchmarkOpt   	226575349	         5.330 ns/op	       0 B/op	       0 allocs/op

The suggested optimization results in a >10x speedup for this common case.

P.S. There is currently an optimization for cutsets that are all ASCII. This optimization is still justified to keep as 99.3% of all cutsets are pure ASCII.

The text was updated successfully, but these errors were encountered:

ianlancetaylor · 2021-05-29T21:03:03Z

Out of curiosity, is there any particular byte that predominantly appears in the set of single-byte cutsets?

dsnet · 2021-05-30T00:27:01Z

Out of curiosity, is there any particular byte that predominantly appears in the set of single-byte cutsets?

The top-ten single-byte cutsets are:

'/' at 30.2%
'\n' at 15.7%
' ' at 10.1%
'.' at 7.0%
'\x00' at 5.7%
'0' at 5.0%
',' at 3.5%
'=' at 3.4%
'-' at 2.6%
'|' at 2.5%
everything else at 14.3%

dsnet · 2021-05-30T02:07:51Z

https://go-review.googlesource.com/c/go/+/323315 is an example of speedup in application code if bytes.Trim were to be optimized.

gopherbot · 2021-05-30T02:18:29Z

Change https://golang.org/cl/323318 mentions this issue: bytes, strings: optimize Trim for single byte cutsets

nightlyone · 2021-05-30T07:17:27Z

Maybe is is useful to do the same optimization as LastIndexAny and IndexAny where the cutover to use a cutset is 8 bytes.

dsnet · 2021-05-30T07:52:12Z

@nightlyone, can you elaborate more what you have in mind?

Currently, the logic for LastIndexAny is already fairly complex. Any optimizations of LastIndexAny should be driven by how Go code at scale uses it, which is an analysis I haven't performed yet.

I should note that the we could conceivably delete the optimization in LastIndexAny for a cutset with len=1, which was added in golang.org/cl/156998. I don't find that specific optimization particularly compelling since:

LastIndexByte already exists, and
a previous commit by mine indicates that only 6% of usages have 1 character cutsets.

josharian · 2021-06-01T02:19:23Z

I recently cobbled together https://go-review.googlesource.com/c/go/+/271669 to work around that allocation. As I noted there, "another alternative is to duplicate a bunch of the trimming code to work directly on asciiSets", which might be a viable option to consider.

@bradfitz also had some thoughts about this. I've forgotten what they were, but perhaps he remembers. :)

dsnet · 2021-06-01T02:53:41Z

Interesting idea caching the cutset functions. I think inlining the single-byte handling is still preferred since a quick check against a single byte is dramatically faster than always calling a func(rune) bool.

That said, if it provides any insights, the top most common cutsets with len>1 are:

"\r\n" at 4.6%
" \t" at 2.3%
"\n\r" at 0.8%
"//" at 0.7%
", " at 0.5%
" \t\r\n" at 0.5%
"\t\n\f\r " at 0.4%
" \t\n\f\r" at 0.4%
"./" at 0.3%
"01234567" at 0.3%

Observations of the above:

Most of them deal with ASCII whitespace characters.
Trimming ASCII digits seems common.
One of them is useless (e.g., "//" is an identical cutset to "/").

gopherbot · 2021-06-05T07:00:57Z

Change https://golang.org/cl/325251 mentions this issue: strings,bytes: optimize the Trim/TrimLeft/TrimRight fast paths

There is evidence that the vast majority of uses for Trim* involve cutsets with a single ASCII character, and the vast majority of remaining uses involve cutsets with a small (<4) ASCII characters. For this reason it makes sense to provide better fast paths for these common cases. Note that the regressions in TrimSpace are likely due to code alignment, since TrimSpace shares no code with the changes in this CL. bytes: name old time/op new time/op delta TrimSpace/NoTrim 3.60ns ± 2% 3.75ns ±10% +4.13% (p=0.027 n=10+10) TrimSpace/ASCII 6.65ns ± 6% 7.30ns ±12% +9.73% (p=0.017 n=9+10) TrimSpace/SomeNonASCII 72.0ns ± 2% 71.7ns ± 3% ~ (p=0.604 n=10+9) TrimSpace/JustNonASCII 116ns ± 2% 115ns ± 1% -1.16% (p=0.037 n=10+10) TrimASCII/1:1 40.9ns ± 8% 4.0ns ± 4% -90.29% (p=0.000 n=10+10) TrimASCII/1:2 62.3ns ± 3% 9.1ns ± 3% -85.45% (p=0.000 n=10+10) TrimASCII/1:4 62.9ns ± 2% 10.3ns ± 2% -83.54% (p=0.000 n=9+10) TrimASCII/1:8 69.6ns ± 8% 69.2ns ± 2% ~ (p=0.959 n=8+8) TrimASCII/1:16 78.2ns ± 1% 83.1ns ±12% ~ (p=0.274 n=8+10) TrimASCII/16:1 83.5ns ± 1% 13.4ns ± 4% -83.89% (p=0.000 n=8+9) TrimASCII/16:2 116ns ±10% 34ns ± 3% -71.18% (p=0.000 n=10+10) TrimASCII/16:4 106ns ± 4% 49ns ± 2% -53.29% (p=0.000 n=10+10) TrimASCII/16:8 110ns ± 3% 114ns ± 2% +3.67% (p=0.000 n=10+10) TrimASCII/16:16 119ns ± 3% 124ns ± 2% +4.19% (p=0.000 n=9+10) TrimASCII/256:1 721ns ± 6% 182ns ± 6% -74.79% (p=0.000 n=10+10) TrimASCII/256:2 739ns ± 1% 437ns ± 8% -40.84% (p=0.000 n=8+10) TrimASCII/256:4 746ns ± 2% 653ns ± 2% -12.38% (p=0.000 n=10+9) TrimASCII/256:8 745ns ± 2% 801ns ± 2% +7.49% (p=0.000 n=10+10) TrimASCII/256:16 808ns ±11% 813ns ± 2% ~ (p=0.954 n=10+10) TrimASCII/4096:1 11.4µs ±13% 2.6µs ± 2% -76.95% (p=0.000 n=10+9) TrimASCII/4096:2 10.9µs ± 1% 5.4µs ± 2% -50.42% (p=0.000 n=10+10) TrimASCII/4096:4 11.3µs ± 6% 10.3µs ± 4% -9.10% (p=0.000 n=10+10) TrimASCII/4096:8 11.1µs ± 3% 11.4µs ± 5% +3.32% (p=0.029 n=10+10) TrimASCII/4096:16 10.9µs ± 3% 12.0µs ±12% +10.21% (p=0.001 n=10+10) strings: name old time/op new time/op delta Trim 1.33µs ± 4% 0.70µs ± 3% -47.20% (p=0.000 n=10+10) TrimASCII/1:1 39.4ns ± 2% 4.6ns ± 3% -88.24% (p=0.000 n=10+10) TrimASCII/1:2 63.1ns ± 3% 8.8ns ± 3% -86.02% (p=0.000 n=9+9) TrimASCII/1:4 63.9ns ± 3% 10.6ns ± 2% -83.39% (p=0.000 n=9+10) TrimASCII/1:8 68.9ns ± 3% 69.0ns ± 5% ~ (p=1.000 n=9+9) TrimASCII/1:16 82.7ns ± 9% 83.6ns ±15% ~ (p=0.684 n=10+10) TrimASCII/16:1 79.5ns ± 5% 14.6ns ± 2% -81.60% (p=0.000 n=10+9) TrimASCII/16:2 107ns ± 5% 34ns ± 3% -68.30% (p=0.000 n=9+10) TrimASCII/16:4 107ns ± 4% 49ns ± 2% -54.30% (p=0.000 n=10+10) TrimASCII/16:8 111ns ± 5% 114ns ± 2% ~ (p=0.068 n=9+10) TrimASCII/16:16 118ns ± 3% 121ns ± 4% +2.26% (p=0.014 n=10+10) TrimASCII/256:1 611ns ± 8% 183ns ± 8% -70.10% (p=0.000 n=9+10) TrimASCII/256:2 811ns ± 8% 442ns ± 9% -45.54% (p=0.000 n=10+10) TrimASCII/256:4 760ns ± 5% 650ns ± 2% -14.41% (p=0.000 n=10+9) TrimASCII/256:8 754ns ± 3% 765ns ± 3% ~ (p=0.101 n=10+10) TrimASCII/256:16 757ns ± 4% 773ns ± 3% +2.19% (p=0.012 n=10+10) TrimASCII/4096:1 8.88µs ± 2% 2.70µs ± 3% -69.64% (p=0.000 n=10+9) TrimASCII/4096:2 11.0µs ± 2% 5.5µs ± 3% -50.06% (p=0.000 n=10+10) TrimASCII/4096:4 11.0µs ± 3% 10.3µs ± 5% -6.53% (p=0.000 n=10+10) TrimASCII/4096:8 11.9µs ±13% 12.0µs ±12% ~ (p=0.796 n=10+10) TrimASCII/4096:16 11.6µs ±10% 12.0µs ±12% ~ (p=0.247 n=10+10) TrimSpace/NoTrim 3.65ns ± 2% 3.73ns ± 2% +2.13% (p=0.004 n=10+10) TrimSpace/ASCII 6.09ns ± 3% 6.71ns ± 3% +10.13% (p=0.000 n=9+10) TrimSpace/SomeNonASCII 71.6ns ± 1% 72.5ns ± 2% ~ (p=0.077 n=9+9) TrimSpace/JustNonASCII 109ns ± 2% 111ns ± 5% +2.19% (p=0.030 n=10+10) name old alloc/op new alloc/op delta Trim 576B ± 0% 168B ± 0% -70.83% (p=0.000 n=10+10) name old allocs/op new allocs/op delta Trim 23.0 ± 0% 6.0 ± 0% -73.91% (p=0.000 n=10+10) Fixes golang#46446 Change-Id: I87bf0d621eecb876f61c6efeff0bf5c3213973b5

gopherbot · 2021-07-04T09:37:36Z

Change https://golang.org/cl/332771 mentions this issue: strings,bytes: optimize the Trim/TrimLeft/TrimRight fast paths

CAFxX · 2021-07-05T06:56:03Z

https://golang.org/cl/332771 removes the allocation in all cases, and in general speeds up Trim/TrimLeft/TrimRight 2x~10x depending on the cutset, with the biggest speedups for the case of a cutset composed of a single ASCII character.

andig · 2021-07-31T11:57:44Z

Out of curiosity as the speedup seems significant: what is missing to move either of the CLs forward? A decision on fast path vs pooling? Combining both?

josharian · 2021-07-31T13:53:13Z

The CL from @CAFxX looks like a good approach to me. Hopefully once the tree opens again we can get it reviewed and submitted.

dsnet · 2021-08-25T19:33:05Z

Re-opening since there's some interest in optimizing for multi-byte ASCII cutsets. The evidence for supporting single-byte cutsets is very strong given that they are >75% of all usages. For the remaining 25% where the cutset is not a single byte, 97% of them are all ASCII.

cespare · 2021-08-25T19:54:55Z

@dsnet what do you think about opening a new issues and reverting the title of this one? I think it's confusing that this was originally about 1-byte cutsets, and then that was implemented, and now the issue is reopened and is about something new.

At minimum, it should be retitled to be about both optimizations.

dsnet · 2021-08-25T20:07:58Z

I renamed the issue to be about ASCII cutsets, which I think accurately captured both optimizations being discussed.

CAFxX · 2021-09-04T06:49:53Z

I just rebased and re-ran the benchmarks in https://golang.org/cl/332771. Would be ideal if someone could take a look at it. It completely eliminates all memory allocations and significantly speeds up both the ASCII fast path and the slow path.

dsnet added the Performance label May 29, 2021

ALTree added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label May 29, 2021

CAFxX mentioned this issue Jun 3, 2021

cmd/vet: report strings.Trim/TrimLeft/TrimRight with duplicate runes in cutset #46533

Open

mtrmac mentioned this issue Aug 21, 2021

Pass OriginalDigest and UncompressedDigest to c/storage.Store.PutLayer containers/image#1352

Merged

dsnet mentioned this issue Aug 25, 2021

proposal: cmd/vet warns when the cutset of strings.{Trim,TrimLeft,TrimRight} is not in order #47822

Closed

gopherbot closed this as completed in 5baf60d Aug 25, 2021

dsnet changed the title ~~bytes, strings: optimize Trim for 1-byte cutsets~~ bytes, strings: optimize Trim for multi-byte cutsets Aug 25, 2021

dsnet reopened this Aug 25, 2021

dsnet changed the title ~~bytes, strings: optimize Trim for multi-byte cutsets~~ bytes, strings: optimize Trim for ASCII cutsets Aug 25, 2021

gopherbot closed this as completed in 4002616 Oct 6, 2021

dmitshur removed the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Nov 19, 2021

dmitshur added the NeedsFix The path to resolution is known, but the work has not been done. label Nov 19, 2021

dmitshur added this to the Go1.18 milestone Nov 19, 2021

mtrmac mentioned this issue Dec 10, 2021

archive parsing: use strings.Trim...Func() instead of strings.Trim() vbatts/tar-split#56

Closed

seankhliao mentioned this issue Mar 18, 2022

bytes: Trim returns empty slice instead of nil in 1.18 #51793

Closed

golang locked and limited conversation to collaborators Nov 19, 2022

gopherbot added the FrozenDueToAge label Nov 19, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bytes, strings: optimize Trim for ASCII cutsets #46446

bytes, strings: optimize Trim for ASCII cutsets #46446

dsnet commented May 29, 2021 •

edited

ianlancetaylor commented May 29, 2021

dsnet commented May 30, 2021

dsnet commented May 30, 2021

gopherbot commented May 30, 2021

nightlyone commented May 30, 2021

dsnet commented May 30, 2021 •

edited

josharian commented Jun 1, 2021

dsnet commented Jun 1, 2021

gopherbot commented Jun 5, 2021

gopherbot commented Jul 4, 2021

CAFxX commented Jul 5, 2021

andig commented Jul 31, 2021

josharian commented Jul 31, 2021

dsnet commented Aug 25, 2021 •

edited

cespare commented Aug 25, 2021

dsnet commented Aug 25, 2021

CAFxX commented Sep 4, 2021

bytes, strings: optimize Trim for ASCII cutsets #46446

bytes, strings: optimize Trim for ASCII cutsets #46446

Comments

dsnet commented May 29, 2021 • edited

ianlancetaylor commented May 29, 2021

dsnet commented May 30, 2021

dsnet commented May 30, 2021

gopherbot commented May 30, 2021

nightlyone commented May 30, 2021

dsnet commented May 30, 2021 • edited

josharian commented Jun 1, 2021

dsnet commented Jun 1, 2021

gopherbot commented Jun 5, 2021

gopherbot commented Jul 4, 2021

CAFxX commented Jul 5, 2021

andig commented Jul 31, 2021

josharian commented Jul 31, 2021

dsnet commented Aug 25, 2021 • edited

cespare commented Aug 25, 2021

dsnet commented Aug 25, 2021

CAFxX commented Sep 4, 2021

dsnet commented May 29, 2021 •

edited

dsnet commented May 30, 2021 •

edited

dsnet commented Aug 25, 2021 •

edited