Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/exp/slices: use dual-pivot quicksort instead of pdqsort #52789

Closed
PeterRK opened this issue May 9, 2022 · 17 comments
Closed

x/exp/slices: use dual-pivot quicksort instead of pdqsort #52789

PeterRK opened this issue May 9, 2022 · 17 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@PeterRK
Copy link

PeterRK commented May 9, 2022

New benchmark result with go 1.20rc2

Xeon-8374C

name                       old time/op  new time/op  delta
SlicesSortInts-4           7.41ms ± 0%  6.26ms ± 0%  -15.49%  (p=0.000 n=10+10)
SlicesSortInts_Sorted-4    70.2µs ± 0%  51.9µs ± 0%  -26.15%  (p=0.000 n=10+10)
SlicesSortInts_Reversed-4   110µs ± 0%    89µs ± 0%  -18.96%  (p=0.000 n=10+10)
SlicesSortInts_Mixed-4     5.67ms ± 0%  4.58ms ± 0%  -19.14%  (p=0.000 n=10+10)
SlicesSortStrings-4        23.8ms ± 0%  22.2ms ± 0%   -6.82%  (p=0.000 n=10+10)
SortFuncStructs-4          14.2ms ± 1%  13.3ms ± 1%   -6.56%  (p=0.000 n=10+10)

EPYC-7K83

name                       old time/op  new time/op  delta
SlicesSortInts-4           6.71ms ± 0%  5.62ms ± 0%  -16.15%  (p=0.000 n=10+9)
SlicesSortInts_Sorted-4    71.6µs ± 0%  46.7µs ± 1%  -34.73%  (p=0.000 n=9+10)
SlicesSortInts_Reversed-4   107µs ± 1%    76µs ± 1%  -29.25%  (p=0.000 n=9+10)
SlicesSortInts_Mixed-4     5.04ms ± 0%  4.12ms ± 0%  -18.22%  (p=0.000 n=10+9)
SlicesSortStrings-4        23.0ms ± 0%  21.6ms ± 0%   -6.15%  (p=0.000 n=10+10)
SortFuncStructs-4          13.0ms ± 1%  12.4ms ± 1%   -4.77%  (p=0.000 n=10+10)

Pdqsort works well, but I found dual-pivot quicksort may run faster generally. Here is my implement, a variant of dual-pivot quicksort. Pdqsort wins a lot in percentage when the input is already in order, but absolute gaps are small. It's reasonable to optimize the slow cases first.

Benchmark result on Xeon-8372C

name                          old time/op  new time/op   delta
SlicesSortInts-4              7.13ms ± 0%   5.87ms ± 0%  -17.69%  (p=0.000 n=10+9)
SlicesSortInts_Sorted-4       80.2µs ± 4%  109.0µs ± 1%  +35.90%  (p=0.000 n=10+8)
SlicesSortInts_Reversed-4      110µs ± 0%    162µs ± 0%  +47.69%  (p=0.000 n=8+10)
SlicesSortInts_Mixed-4        5.47ms ± 0%   4.38ms ± 0%  -19.86%  (p=0.000 n=9+10)
SlicesSortInts_Pattern30-4     802µs ± 0%    705µs ± 0%  -12.01%  (p=0.000 n=10+10)
SlicesSortInts_Pattern60-4     504µs ± 0%    465µs ± 0%   -7.81%  (p=0.000 n=7+10)
SlicesSortInts_Pattern75-4     354µs ± 0%    343µs ± 0%   -2.97%  (p=0.000 n=10+10)
SlicesSortInts_Pattern90-4     204µs ± 0%    222µs ± 0%   +8.72%  (p=0.000 n=9+10)
SlicesSortInts_Pattern96-4     146µs ± 0%    171µs ± 1%  +16.80%  (p=0.000 n=9+10)
SlicesSortStrings-4           22.8ms ± 0%   21.4ms ± 0%   -6.22%  (p=0.000 n=10+10)
SortFuncStructs-4             13.6ms ± 1%   12.6ms ± 1%   -7.22%  (p=0.000 n=10+10)
SortFuncTinyStructs-4         10.5ms ± 0%    9.7ms ± 0%   -7.28%  (p=0.000 n=10+10)

Benchmark result on EPYC-7K83

name                          old time/op  new time/op  delta
SlicesSortInts-4              6.76ms ± 0%  5.70ms ± 0%  -15.74%  (p=0.000 n=10+9)
SlicesSortInts_Sorted-4        103µs ± 1%   139µs ± 2%  +35.37%  (p=0.000 n=9+10)
SlicesSortInts_Reversed-4      120µs ± 1%   179µs ± 1%  +48.92%  (p=0.000 n=9+10)
SlicesSortInts_Mixed-4        5.10ms ± 0%  4.32ms ± 0%  -15.19%  (p=0.000 n=8+8)
SlicesSortInts_Pattern30-4     974µs ± 0%   851µs ± 1%  -12.62%  (p=0.000 n=9+9)
SlicesSortInts_Pattern60-4     596µs ± 0%   542µs ± 0%   -9.01%  (p=0.000 n=10+9)
SlicesSortInts_Pattern75-4     406µs ± 0%   389µs ± 1%   -4.24%  (p=0.000 n=10+9)
SlicesSortInts_Pattern90-4     217µs ± 0%   235µs ± 0%   +8.05%  (p=0.000 n=10+10)
SlicesSortInts_Pattern96-4     144µs ± 1%   176µs ± 1%  +22.09%  (p=0.000 n=10+10)
SlicesSortStrings-4           22.9ms ± 0%  21.7ms ± 1%   -5.08%  (p=0.000 n=8+10)
SortFuncStructs-4             13.2ms ± 1%  12.4ms ± 1%   -6.05%  (p=0.000 n=8+10)
SortFuncTinyStructs-4         9.92ms ± 0%  9.71ms ± 0%   -2.08%  (p=0.000 n=9+10)

From benchmark result, we can see that pattern-defeating quicksort may have better performance than dual-pivot quicksort in average when more than 75% of workload are in special patterns. I don't it's common to have so much input with special patterns in a real job.

Unlike pdqsort, dual-pivot quicksort is not a new algorithm. It has been used by OpenJDK about 10 years ago. We can find many papers about it. Dual-pivot quicksort brings significantly more swaps, but swap operation is much cheaper now with generics.

@gopherbot gopherbot added this to the Proposal milestone May 9, 2022
@ianlancetaylor ianlancetaylor changed the title proposal: exp/slices: use dual-pivot quicksort instead of pdqsort x/exp/slices: use dual-pivot quicksort instead of pdqsort May 9, 2022
@ianlancetaylor
Copy link
Contributor

There is no API change here, so taking it out of the proposal process.

CC @eliben @zhangyunhao116

@ianlancetaylor ianlancetaylor added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label May 9, 2022
@ianlancetaylor ianlancetaylor modified the milestones: Proposal, Unreleased May 9, 2022
@zhangyunhao116
Copy link
Contributor

zhangyunhao116 commented May 10, 2022

The pdqsort in exp/slices is not the fastest version, because we need to keep the same algorithm in both sort and slices, which means we have to use the same file to generate these codes(src/sort/gen_sort_variants.go).

For this reason, pdqsort loses many opportunities for optimization, such as removing a, b int every time we pass the slice, and some parameter optimizations. This will reduce the performance of the generic version by about 5% ~ 30% in different cases.

Run the Benchmark in https://github.com/zhangyunhao116/pdqsort (original generic version), it contains more benchmark scenarios, new is the dual-pivot quicksort. (AMD 3700x amd64/linux)

name                       old time/op  new time/op   delta
Random/sort_64             1.35µs ± 0%   1.20µs ± 0%  -10.96%  (p=0.008 n=5+5)
Random/sort_256            7.09µs ± 0%   6.06µs ± 0%  -14.57%  (p=0.008 n=5+5)
Random/sort_1024           34.6µs ± 0%   29.5µs ± 0%  -14.74%  (p=0.008 n=5+5)
Random/sort_4096            164µs ± 0%    141µs ± 1%  -14.06%  (p=0.008 n=5+5)
Random/sort_65536          3.44ms ± 0%   2.96ms ± 0%  -14.02%  (p=0.008 n=5+5)
Sorted/sort_64             98.9ns ± 2%  107.7ns ± 0%   +8.92%  (p=0.008 n=5+5)
Sorted/sort_256             244ns ± 1%    291ns ± 1%  +19.53%  (p=0.008 n=5+5)
Sorted/sort_1024            780ns ± 0%    928ns ± 1%  +18.91%  (p=0.008 n=5+5)
Sorted/sort_4096           2.96µs ± 1%   3.50µs ± 1%  +18.36%  (p=0.008 n=5+5)
Sorted/sort_65536          46.9µs ± 3%   54.8µs ± 3%  +16.86%  (p=0.008 n=5+5)
NearlySorted/sort_64        483ns ± 0%    463ns ± 0%   -4.20%  (p=0.016 n=5+4)
NearlySorted/sort_256      2.61µs ± 1%   2.25µs ± 1%  -13.94%  (p=0.008 n=5+5)
NearlySorted/sort_1024     11.6µs ± 0%   12.1µs ± 1%   +3.65%  (p=0.008 n=5+5)
NearlySorted/sort_4096     51.9µs ± 0%   62.1µs ± 0%  +19.80%  (p=0.008 n=5+5)
NearlySorted/sort_65536    1.01ms ± 0%   1.34ms ± 1%  +33.40%  (p=0.016 n=4+5)
Reversed/sort_64            123ns ± 1%    129ns ± 1%   +4.74%  (p=0.008 n=5+5)
Reversed/sort_256           322ns ± 1%    371ns ± 0%  +15.43%  (p=0.008 n=5+5)
Reversed/sort_1024         1.05µs ± 0%   1.29µs ± 2%  +23.10%  (p=0.008 n=5+5)
Reversed/sort_4096         3.97µs ± 1%   4.91µs ± 0%  +23.87%  (p=0.016 n=5+4)
Reversed/sort_65536        62.2µs ± 0%   78.1µs ± 5%  +25.67%  (p=0.016 n=4+5)
NearlyReversed/sort_64      569ns ± 1%    634ns ± 0%  +11.39%  (p=0.008 n=5+5)
NearlyReversed/sort_256    3.00µs ± 0%   3.14µs ± 0%   +4.70%  (p=0.008 n=5+5)
NearlyReversed/sort_1024   13.9µs ± 0%   15.3µs ± 0%   +9.75%  (p=0.008 n=5+5)
NearlyReversed/sort_4096   64.2µs ± 1%   72.2µs ± 0%  +12.54%  (p=0.008 n=5+5)
NearlyReversed/sort_65536  1.27ms ± 0%   1.46ms ± 1%  +15.18%  (p=0.008 n=5+5)
Mod8/sort_64                384ns ± 1%    386ns ± 2%     ~     (p=0.548 n=5+5)
Mod8/sort_256              1.19µs ± 1%   1.86µs ± 1%  +56.10%  (p=0.008 n=5+5)
Mod8/sort_1024             3.98µs ± 1%   4.19µs ± 2%   +5.29%  (p=0.008 n=5+5)
Mod8/sort_4096             14.5µs ± 2%   14.1µs ± 2%   -2.73%  (p=0.016 n=5+5)
Mod8/sort_65536             254µs ± 0%    357µs ± 4%  +40.47%  (p=0.016 n=4+5)
AllEqual/sort_64            101ns ± 2%    100ns ± 3%     ~     (p=0.421 n=5+5)
AllEqual/sort_256           246ns ± 1%    241ns ± 0%   -2.17%  (p=0.008 n=5+5)
AllEqual/sort_1024          783ns ± 1%    862ns ± 9%  +10.06%  (p=0.032 n=5+5)
AllEqual/sort_4096         2.97µs ± 1%   3.41µs ± 2%  +15.00%  (p=0.008 n=5+5)
AllEqual/sort_65536        47.0µs ± 3%   55.0µs ± 3%  +16.83%  (p=0.008 n=5+5)

Based on this benchmark, in the common patterns, the performance of the two algorithms is similar. In the random case(the most common scenario), the dual-pivot quicksort causes performance regression.

For compatibility reasons, we can't use different algorithms in different versions, for example sort.Interface and slices.SortFunc must get the same result when sorting the same slice, it would be good if you could use the same algorithm in sort.Interface to benchmark it again.


Updated the benchmark results, because the new algorithm is not on the default branch of go-exp(https://github.com/PeterRK/go-exp/tree/rk/slices), the previous result is wrong.

Based on the new benchmark, in most of the common patterns, the dual-pivot quicksort causes performance regression. In the random case, the dual-pivot quicksort is better. Not sure if this is a performance enhancement overall.

cc @PeterRK

@PeterRK
Copy link
Author

PeterRK commented May 10, 2022

For compatibility reasons, we can't use different algorithms in different versions

Is it necessary to keep stability of unstable sort?

@zhangyunhao116
Copy link
Contributor

For compatibility reasons, we can't use different algorithms in different versions

Is it necessary to keep stability of unstable sort?

  • If the stability means how the algorithm treats equal elements.
    No, we don't need to keep that. So it is ok to replace the algorithm.
  • If the stability means that the user used sort and slices to sort the same sequence(with equal elements) and got the same result.
    Yes, we need to keep that rule. I discussed the same issue with Keith earlier. He said In any case, I think we want the corresponding sorts in sort and exp/slices (both unstable and stable, generic and via interface) to generate the same results. e.g. for x of type []float64 (https://go-review.googlesource.com/c/go/+/400534/)

@randall77
Copy link
Contributor

I don't think I intended my statement to be that absolute. When we were developing the first version of sort in exp/slices, it helped to keep the sorts exactly the same. Especially when they were both code-generated from the same source. Longer term, we can consider differences if there is a motivating reason for doing so.

@zhangyunhao116
Copy link
Contributor

I don't think I intended my statement to be that absolute. When we were developing the first version of sort in exp/slices, it helped to keep the sorts exactly the same. Especially when they were both code-generated from the same source. Longer term, we can consider differences if there is a motivating reason for doing so.

Thanks! I misunderstood something, and I am wondering that do we need to keep the sort.Interface and sort.SortFunc to generate the same result?

@zhangyunhao116
Copy link
Contributor

Related test for this case.

type diffint struct {
	data int
	diff int
}

type diffintSlice []diffint

func (x diffintSlice) Len() int           { return len(x) }
func (x diffintSlice) Less(i, j int) bool { return x[i].data < x[j].data }
func (x diffintSlice) Swap(i, j int)      { x[i], x[j] = x[j], x[i] }

func TestSameResult(t *testing.T) {
	x := make([]diffint, 100)
	y := make([]diffint, 100)
	for i := range x {
		x[i].data = rand.Intn(5)
		x[i].diff = rand.Int()
		y[i] = x[i]
	}

	sort.Sort(sort.Interface(diffintSlice(x)))
	SortFunc(y, func(a, b diffint) bool {
		return a.data < b.data
	})

	for i := range x {
		if x[i] != y[i] {
			t.Errorf("Different result in %d, %v != %v", i, x[i], y[i])
		}
	}
}

@PeterRK
Copy link
Author

PeterRK commented May 10, 2022

We see that pdqsort's effort on optimizing some common patterns, but we also see those patterns usually are not the bottleneck of a real job. Dual-pivot quicksort works better with the slowest unordered pattern. I am not sure it's faster in average. But improving the slowest case is worthy.

I ran benchmark on cloud machine, because most of my go programs run on them. More benchmark results on other platforms are needed to figure out which algorithm to choose.

@zhangyunhao116
Copy link
Contributor

Agree! The performance is depends. It would be great if we could find a better way to improve the performance in random cases, could you please post some references for this variant?

@PeterRK
Copy link
Author

PeterRK commented May 10, 2022

The most famous variant of dual-pivot quicksort may be that in OpenJDK. However, the java version looks a little too complex. I like simple implement and make one. It's perhaps not the best.

@PeterRK
Copy link
Author

PeterRK commented May 15, 2022

Facts form benchmark:

  1. pattern-defeating quicksort run faster on data with good patterns;
  2. dual-pivot quicksort run faster on random data;
  3. both algorithms need more time to handle random data than data with good patterns, so dual-pivot quicksort has lower max latency;
  4. pattern-defeating quicksort may have better throughput when more than 75% inputs have good patterns.

Do you think it's resonable to use dual-pivot quicksort instead of pattern-defeating quicksort?
@ianlancetaylor @eliben @zhangyunhao116 @randall77

@randall77
Copy link
Contributor

I'm happy to explore faster sort algorithms. I agree that faster worst case time is probably more important than faster already-sorted time. But I don't see any fundamental reason why we need to make that tradeoff - the ideas behind pdqsort and dual-pivot quicksort can probably be combined.

@PeterRK
Copy link
Author

PeterRK commented May 16, 2022

But I don't see any fundamental reason why we need to make that tradeoff - the ideas behind pdqsort and dual-pivot quicksort can probably be combined.

I have tried to add pattern-defeating technique into dual-pivot quicksort. But it’s not easy to implement such technique in a 3-way algorithm.

PeterRK added a commit to PeterRK/go-exp that referenced this issue May 16, 2022
Dual-pivot quicksort has smaller recursion depth. Some pattern-defeating
technique is added to make this variant.

And use double reversion instead of blockswap for symmerge.

Benchmark results on Xeon-8372C
name                          old time/op   new time/op   delta
SlicesSortInts-4              7.01ms ± 0%  5.84ms ± 0%  -16.64%  (p=0.000 n=10+10)
SlicesSortInts_Sorted-4       80.1µs ± 1%  69.1µs ± 0%  -13.76%  (p=0.000 n=8+10)
SlicesSortInts_Reversed-4      116µs ± 1%   108µs ± 0%   -6.72%  (p=0.000 n=10+10)
SlicesSortInts_Mixed-4        5.36ms ± 0%  4.31ms ± 0%  -19.53%  (p=0.000 n=10+9)
SlicesSortStrings-4           22.5ms ± 0%  20.9ms ± 1%   -7.12%  (p=0.000 n=10+10)
SortFuncStructs-4             13.4ms ± 1%  12.5ms ± 1%   -7.00%  (p=0.000 n=10+10)
SortFuncStructs_Stable-4      31.1ms ± 1%  30.7ms ± 1%   -1.25%  (p=0.000 n=10+10)

For golang/go#52789
PeterRK added a commit to PeterRK/go-exp that referenced this issue May 16, 2022
Dual-pivot quicksort has smaller recursion depth. Some pattern-defeating
technique is added to make this variant.

And use double reversion instead of blockswap for symmerge.

Benchmark results on Xeon-8372C
name                          old time/op   new time/op   delta
SlicesSortInts-4              7.01ms ± 0%  5.84ms ± 0%  -16.64%  (p=0.000 n=10+10)
SlicesSortInts_Sorted-4       80.1µs ± 1%  69.1µs ± 0%  -13.76%  (p=0.000 n=8+10)
SlicesSortInts_Reversed-4      116µs ± 1%   108µs ± 0%   -6.72%  (p=0.000 n=10+10)
SlicesSortInts_Mixed-4        5.36ms ± 0%  4.31ms ± 0%  -19.53%  (p=0.000 n=10+9)
SlicesSortStrings-4           22.5ms ± 0%  20.9ms ± 1%   -7.12%  (p=0.000 n=10+10)
SortFuncStructs-4             13.4ms ± 1%  12.5ms ± 1%   -7.00%  (p=0.000 n=10+10)
SortFuncStructs_Stable-4      31.1ms ± 1%  30.7ms ± 1%   -1.25%  (p=0.000 n=10+10)

For golang/go#52789

Change-Id: I1e572923ee90c7d4ec6792c7dba9a443134f9e61
PeterRK added a commit to PeterRK/go-exp that referenced this issue May 16, 2022
Dual-pivot quicksort has smaller recursion depth. Some pattern-defeating
technique is added to make this variant.

And use double reversion instead of blockswap for symmerge.

Benchmark results on Xeon-8372C
name                          old time/op   new time/op   delta
SlicesSortInts-4              7.01ms ± 0%  5.84ms ± 0%  -16.64%  (p=0.000 n=10+10)
SlicesSortInts_Sorted-4       80.1µs ± 1%  69.1µs ± 0%  -13.76%  (p=0.000 n=8+10)
SlicesSortInts_Reversed-4      116µs ± 1%   108µs ± 0%   -6.72%  (p=0.000 n=10+10)
SlicesSortInts_Mixed-4        5.36ms ± 0%  4.31ms ± 0%  -19.53%  (p=0.000 n=10+9)
SlicesSortStrings-4           22.5ms ± 0%  20.9ms ± 1%   -7.12%  (p=0.000 n=10+10)
SortFuncStructs-4             13.4ms ± 1%  12.5ms ± 1%   -7.00%  (p=0.000 n=10+10)
SortFuncStructs_Stable-4      31.1ms ± 1%  30.7ms ± 1%   -1.25%  (p=0.000 n=10+10)

For golang/go#52789
PeterRK added a commit to PeterRK/go-exp that referenced this issue May 16, 2022
Dual-pivot quicksort has smaller recursion depth.
Some pattern-defeating technique is added to make this variant.

And use double reversion instead of blockswap for symmerge.

Benchmark results on Xeon-8372C
name                          old time/op   new time/op   delta
SlicesSortInts-4              7.01ms ± 0%  5.84ms ± 0%  -16.64%  (p=0.000 n=10+10)
SlicesSortInts_Sorted-4       80.1µs ± 1%  69.1µs ± 0%  -13.76%  (p=0.000 n=8+10)
SlicesSortInts_Reversed-4      116µs ± 1%   108µs ± 0%   -6.72%  (p=0.000 n=10+10)
SlicesSortInts_Mixed-4        5.36ms ± 0%  4.31ms ± 0%  -19.53%  (p=0.000 n=10+9)
SlicesSortStrings-4           22.5ms ± 0%  20.9ms ± 1%   -7.12%  (p=0.000 n=10+10)
SortFuncStructs-4             13.4ms ± 1%  12.5ms ± 1%   -7.00%  (p=0.000 n=10+10)
SortFuncStructs_Stable-4      31.1ms ± 1%  30.7ms ± 1%   -1.25%  (p=0.000 n=10+10)

For golang/go#52789
@gopherbot
Copy link

Change https://go.dev/cl/406594 mentions this issue: slices: use dual-pivot quicksort

PeterRK added a commit to PeterRK/go-exp that referenced this issue May 16, 2022
Dual-pivot quicksort has smaller recursion depth.
Some pattern-defeating technique is added to make this variant.

And use double reversion instead of blockswap for symmerge.

Benchmark results on Xeon-8372C
name                          old time/op   new time/op   delta
SlicesSortInts-4              7.01ms ± 0%  5.84ms ± 0%  -16.64%  (p=0.000 n=10+10)
SlicesSortInts_Sorted-4       80.1µs ± 1%  69.1µs ± 0%  -13.76%  (p=0.000 n=8+10)
SlicesSortInts_Reversed-4      116µs ± 1%   108µs ± 0%   -6.72%  (p=0.000 n=10+10)
SlicesSortInts_Mixed-4        5.36ms ± 0%  4.31ms ± 0%  -19.53%  (p=0.000 n=10+9)
SlicesSortStrings-4           22.5ms ± 0%  20.9ms ± 1%   -7.12%  (p=0.000 n=10+10)
SortFuncStructs-4             13.4ms ± 1%  12.5ms ± 1%   -7.00%  (p=0.000 n=10+10)
SortFuncStructs_Stable-4      31.1ms ± 1%  30.7ms ± 1%   -1.25%  (p=0.000 n=10+10)

For golang/go#52789
PeterRK added a commit to PeterRK/go-exp that referenced this issue May 17, 2022
Dual-pivot quicksort has smaller recursion depth.
Some pattern-defeating technique is added to make this variant.

And use double reversion instead of blockswap for symmerge.

Benchmark results on Xeon-8372C
name                          old time/op   new time/op   delta
SlicesSortInts-4              7.01ms ± 0%  5.84ms ± 0%  -16.64%  (p=0.000 n=10+10)
SlicesSortInts_Sorted-4       80.1µs ± 1%  69.1µs ± 0%  -13.76%  (p=0.000 n=8+10)
SlicesSortInts_Reversed-4      116µs ± 1%   108µs ± 0%   -6.72%  (p=0.000 n=10+10)
SlicesSortInts_Mixed-4        5.36ms ± 0%  4.31ms ± 0%  -19.53%  (p=0.000 n=10+9)
SlicesSortStrings-4           22.5ms ± 0%  20.9ms ± 1%   -7.12%  (p=0.000 n=10+10)
SortFuncStructs-4             13.4ms ± 1%  12.5ms ± 1%   -7.00%  (p=0.000 n=10+10)
SortFuncStructs_Stable-4      31.1ms ± 1%  30.7ms ± 1%   -1.25%  (p=0.000 n=10+10)

For golang/go#52789
PeterRK added a commit to PeterRK/go-exp that referenced this issue May 17, 2022
Dual-pivot quicksort has smaller recursion depth.
Some pattern-defeating technique is added to make this variant.

And use double reversion instead of blockswap for symmerge.

Benchmark results on Xeon-8372C
name                     old time/op  new time/op  delta
SlicesSortInts           7.01ms ± 0%  5.84ms ± 0%  -16.64%  (p=0.000 n=10+10)
SlicesSortInts_Sorted    80.1µs ± 1%  69.1µs ± 0%  -13.76%  (p=0.000 n=8+10)
SlicesSortInts_Reversed   116µs ± 1%   108µs ± 0%   -6.72%  (p=0.000 n=10+10)
SlicesSortInts_Mixed     5.36ms ± 0%  4.31ms ± 0%  -19.53%  (p=0.000 n=10+9)
SlicesSortStrings        22.5ms ± 0%  20.9ms ± 1%   -7.12%  (p=0.000 n=10+10)
SortFuncStructs          13.4ms ± 1%  12.5ms ± 1%   -7.00%  (p=0.000 n=10+10)
SortFuncStructs_Stable   31.1ms ± 1%  30.7ms ± 1%   -1.25%  (p=0.000 n=10+10)

For golang/go#52789
PeterRK added a commit to PeterRK/go-exp that referenced this issue May 17, 2022
Dual-pivot quicksort has smaller recursion depth.
Some pattern-defeating technique is added to make this variant.

And use double reversion instead of blockswap for symmerge.

For golang/go#52789
@gopherbot
Copy link

Change https://go.dev/cl/406834 mentions this issue: slices: use dual-pivot quicksort

@PeterRK
Copy link
Author

PeterRK commented Jun 10, 2022

Benchmark result on Ampere Altra (ARM64)

name                       old time/op  new time/op  delta
SlicesSortInts-4           6.56ms ± 0%  5.50ms ± 0%  -16.16%  (p=0.000 n=10+8)
SlicesSortInts_Sorted-4     158µs ± 1%   107µs ± 2%  -32.16%  (p=0.000 n=10+10)
SlicesSortInts_Reversed-4   224µs ± 1%   174µs ± 1%  -22.26%  (p=0.000 n=10+10)
SlicesSortInts_Mixed-4     5.39ms ± 0%  4.30ms ± 0%  -20.16%  (p=0.000 n=9+10)
SlicesSortStrings-4        24.2ms ± 1%  23.9ms ± 2%     ~     (p=0.052 n=10+10)
SortFuncStructs-4          17.2ms ± 1%  16.0ms ± 2%   -7.22%  (p=0.000 n=10+10)

@PeterRK
Copy link
Author

PeterRK commented Jun 26, 2023

Give up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

5 participants