proposal: slices: add IndexPointer #66981

dsnet · 2024-04-23T02:09:13Z

Proposal Details

I propose the addition of:

// IndexPointer returns the index of the element in s that p points to.
// It returns -1 if p is not found in s.
func IndexPointer[S ~[]E, E any](s S, p *E) int

A use of this is to check whether some slice is a subslice of another slice, by checking:
len(subslice) > 0 && slices.IndexPointer(s, &subslice[0]) >= 0

Such an operation is useful:

As an optimization, where knowing that one slice is a direct subslice of another allows you to skip certain operations.
As a means of precondition error checking, where an operation will produce invalid results if one argument is a subslice of another. For example, AppendFoo(dst, src) will produce corrupt data if src is a subslice of dst and the operation writes more bytes to dst than it reads from src. Such bugs are difficult to track down, and ideally the API can return an error by detecting such situations (or just clone the src for a slower but correct implementation).
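
A minimal sketch of the precondition check described above. The names appendDoubled and errAliased are hypothetical stand-ins (appendDoubled plays the role of AppendFoo), and indexPointer is a local copy of the naive implementation, since slices.IndexPointer does not exist in the standard library. Checking against dst[:cap(dst)] rather than dst is an assumption of this sketch: append may write into the spare capacity of dst, so that is the region src must stay out of. The check only looks at the first element of src, so it is not a complete aliasing test.

```go
package main

import (
	"errors"
	"fmt"
)

// indexPointer is a local copy of the naive O(n) implementation from the
// proposal; slices.IndexPointer is not part of the standard library.
func indexPointer[S ~[]E, E any](s S, p *E) int {
	for i := range s {
		if p == &s[i] {
			return i
		}
	}
	return -1
}

// errAliased is a hypothetical error value for this sketch.
var errAliased = errors.New("src aliases the backing array of dst")

// appendDoubled is a hypothetical stand-in for AppendFoo: it appends every
// byte of src to dst twice, so it writes more bytes than it reads.
func appendDoubled(dst, src []byte) ([]byte, error) {
	// Assumption of this sketch: check against dst[:cap(dst)], because
	// append may write anywhere in the spare capacity of dst.
	if len(src) > 0 && indexPointer(dst[:cap(dst)], &src[0]) >= 0 {
		return dst, errAliased
	}
	for _, b := range src {
		dst = append(dst, b, b)
	}
	return dst, nil
}

func main() {
	buf := []byte("abcd")
	_, err := appendDoubled(buf[:0], buf) // src shares dst's backing array
	fmt.Println(err != nil)               // true

	out, err := appendDoubled(nil, []byte("ab")) // independent buffers
	fmt.Println(string(out), err)                // aabb <nil>
}
```

Cloning src instead of returning an error would be the slower but always-correct alternative mentioned above.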

An alternative API is:

func ContainsSubslice[S ~[]E, E any](s S, subslice S) bool

but:

This is easy to hold wrong since both s and subslice are of type S.
It has weird edge cases where s and subslice overlap, but neither is subslice of the other.
Returning a bool is strictly less flexible than returning an index.
The name does not make clear what "Contains" means. Is it comparing the values of the elements? The name IndexPointer is clearer about its meaning.

Implementation

The naive implementation is straightforward:

func IndexPointer[S ~[]E, E any](s S, p *E) int {
	for i := range s {
		if p == &s[i] {
			return i
		}
	}
	return -1
}

but it is slow, since it performs an O(n) search over the slice.

A more efficient implementation takes advantage of pointer arithmetic:

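// Requires import "unsafe".
func IndexPointer[S ~[]E, E any](s S, p *E) int {
	i := 0
	// For nil pointers and zero-size elements, fall through with i == 0.
	if p != nil && unsafe.Sizeof(*p) > 0 {
		pd := unsafe.Pointer(unsafe.SliceData(s))
		pe := unsafe.Pointer(p)
		// Candidate index: byte offset of p from the start of s, in elements.
		i = int((uintptr(pe) - uintptr(pd)) / unsafe.Sizeof(*p))
	}
	// The bounds check rejects pointers outside s; the equality check
	// rejects pointers that do not land exactly on an element.
	if uint(i) < uint(len(s)) && p == &s[i] {
		return i
	}
	return -1
}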

which can now compute the result in O(1).

Without this helper in the "slices" package, it is currently impossible in Go to determine in O(1) whether a given slice is a subslice of another slice without using "unsafe". Including the helper in "slices" lets callers avoid "unsafe" for this operation.


randall77 · 2024-04-23T02:25:41Z

FYI, there are already helpers in slices for this, overlaps and startIdx.
Maybe they are not exactly what is needed for this proposal, but they do exist. (startIdx is O(n), but only because it wasn't necessary to make it faster.) The comment in startIdx is particularly relevant - what if the pointer points in the slice, but is not exactly at the start of any element?

dsnet · 2024-04-23T02:42:39Z

The comment in startIdx is particularly relevant - what if the pointer points in the slice, but is not exactly at the start of any element?

With my proposed implementation, it returns -1, since there's a p == &s[i] check before returning the index. Having the optimized variant match the naive implementation seems the least surprising and also answers how each edge case should be handled.

Another edge case exists with slices of zero-size element types. Again, the optimized version matches the naive implementation in behavior.
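
The edge cases above can be exercised with a small program. This is only a sketch: naive and fast are local copies of the two implementations from the proposal, and elems is a hypothetical helper that views 8 bytes as four [2]byte elements so that a *[2]byte can point into the middle of an element. For the zero-size case the program only compares the two implementations with each other, because the Go spec leaves equality of pointers to distinct zero-size variables unspecified.

```go
package main

import (
	"fmt"
	"unsafe"
)

// naive is a copy of the O(n) implementation from the proposal.
func naive[S ~[]E, E any](s S, p *E) int {
	for i := range s {
		if p == &s[i] {
			return i
		}
	}
	return -1
}

// fast is a copy of the pointer-arithmetic implementation from the proposal.
func fast[S ~[]E, E any](s S, p *E) int {
	i := 0
	if p != nil && unsafe.Sizeof(*p) > 0 {
		pd := unsafe.Pointer(unsafe.SliceData(s))
		pe := unsafe.Pointer(p)
		i = int((uintptr(pe) - uintptr(pd)) / unsafe.Sizeof(*p))
	}
	if uint(i) < uint(len(s)) && p == &s[i] {
		return i
	}
	return -1
}

// elems (hypothetical helper) views raw as four [2]byte elements and returns
// one pointer that starts on an element boundary and one that starts in the
// middle of an element.
func elems(raw *[8]byte) (s [][2]byte, aligned, straddling *[2]byte) {
	s = (*[4][2]byte)(unsafe.Pointer(raw))[:]
	aligned = (*[2]byte)(raw[4:6])    // same address as &s[2]
	straddling = (*[2]byte)(raw[1:3]) // starts inside s[0]
	return s, aligned, straddling
}

func main() {
	var raw [8]byte
	s, aligned, straddling := elems(&raw)
	var outside [2]byte // not part of s at all
	var none *[2]byte   // nil pointer

	fmt.Println(naive(s, aligned), fast(s, aligned))       // 2 2
	fmt.Println(naive(s, straddling), fast(s, straddling)) // -1 -1
	fmt.Println(naive(s, &outside), fast(s, &outside))     // -1 -1
	fmt.Println(naive(s, none), fast(s, none))             // -1 -1

	// Zero-size elements: only check that both versions agree.
	z := make([]struct{}, 3)
	fmt.Println(naive(z, &z[2]) == fast(z, &z[2]))
}
```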

dsnet · 2024-04-23T02:45:47Z

As a concrete example of precondition error checking, for flate.AppendCompress (#54078), I wanted it to report an error if src is a subslice of dst. Passing overlapping buffers is tempting and will often be okay, since the output is usually smaller than the input. However, that is not guaranteed: compression can produce output that is larger than the input. When this occurs, the output is corrupted and it can be extremely difficult to figure out why. It is better to report an error up front.

Jorropo · 2024-04-23T03:29:31Z

I would rather see:

func IndexPointer[S ~[]E, E any](s S, p *E) (int, bool)

I don't like -1, it's not obvious from the signature how it fails or what I should check for.

dsnet · 2024-04-23T03:33:54Z

The existing Index and IndexFunc functions already set the precedent that -1 means "not found". I'm not sure if we should change that here.

Jorropo · 2024-04-23T05:18:13Z

Ah nvm, well forget what I said. thx

ianlancetaylor · 2024-04-23T17:29:50Z

A use of this is to check whether some slice is a subslice of another slice, by checking:
len(subslice) > 0 && slices.IndexPointer(s, &subslice[0]) >= 0

Technically subslice might extend past the end of s, so this isn't a test of whether some slice is a subslice of another slice, it's a test of a more complicated condition. You could fix that by adding
&& slices.IndexPointer(s, &subslice[len(subslice)-1]) >= 0
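
To illustrate the difference, here is a small sketch. The helper isSubslice is a hypothetical name, and indexPointer is a local copy of the naive implementation from the proposal, since slices.IndexPointer does not exist in the standard library. The slice overrun starts inside s but extends past its end, so the first-element check alone accepts it while the two-ended check rejects it.

```go
package main

import "fmt"

// indexPointer is a local copy of the naive implementation from the proposal.
func indexPointer[S ~[]E, E any](s S, p *E) int {
	for i := range s {
		if p == &s[i] {
			return i
		}
	}
	return -1
}

// isSubslice (hypothetical helper) reports whether a non-empty sub lies
// entirely within s, by checking both its first and its last element.
func isSubslice[S ~[]E, E any](s, sub S) bool {
	return len(sub) > 0 &&
		indexPointer(s, &sub[0]) >= 0 &&
		indexPointer(s, &sub[len(sub)-1]) >= 0
}

func main() {
	backing := []int{0, 1, 2, 3, 4, 5}
	s := backing[1:4]       // elements 1, 2, 3
	inside := backing[2:4]  // elements 2, 3
	overrun := backing[3:6] // elements 3, 4, 5: starts in s, ends past it

	// The first-element check alone accepts both slices.
	fmt.Println(indexPointer(s, &inside[0]) >= 0, indexPointer(s, &overrun[0]) >= 0) // true true

	// Checking the last element as well rejects the overrunning slice.
	fmt.Println(isSubslice(s, inside), isSubslice(s, overrun)) // true false
}
```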

What if we instead use Overlapping[E any](s1, s2 []E) bool? That could return true if s1 and s2 overlap--share any elements. The order of the arguments doesn't matter. You only get a bool result--is that enough for your purposes?

dsnet · 2024-04-23T17:54:36Z

For the use cases that I'm wrestling with, Overlapping would work. I like that it resolves the 1st, 2nd, and 4th issues that ContainsSubslice would have.

That said, I think IndexPointer is still a useful function as it serves as a lower-level primitive with well-defined semantics.
The fact that Overlapping could be implemented in terms of IndexPointer shows its flexibility:

func Overlapping[S ~[]E, E any](s1, s2 S) bool {
	return len(s1) > 0 && len(s2) > 0 && (
		IndexPointer(s1, &s2[0]) >= 0 ||
		IndexPointer(s2, &s1[0]) >= 0 ||
		IndexPointer(s1, &s2[len(s2)-1]) >= 0 ||
		IndexPointer(s2, &s1[len(s1)-1]) >= 0)
}

We could provide both IndexPointer and Overlapping, but I'll still take Overlapping over nothing.
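
For illustration, a runnable sketch of that Overlapping definition. Both overlapping and indexPointer are local, lowercase copies of the code in this thread, not standard library functions, and the slices a, b, and c are made-up test inputs carved from one backing array.

```go
package main

import "fmt"

// indexPointer is a local copy of the naive implementation from the proposal.
func indexPointer[S ~[]E, E any](s S, p *E) int {
	for i := range s {
		if p == &s[i] {
			return i
		}
	}
	return -1
}

// overlapping is a local copy of the Overlapping definition above: two
// non-empty slices overlap if either end of one lies inside the other.
func overlapping[S ~[]E, E any](s1, s2 S) bool {
	return len(s1) > 0 && len(s2) > 0 &&
		(indexPointer(s1, &s2[0]) >= 0 ||
			indexPointer(s2, &s1[0]) >= 0 ||
			indexPointer(s1, &s2[len(s2)-1]) >= 0 ||
			indexPointer(s2, &s1[len(s1)-1]) >= 0)
}

func main() {
	backing := make([]int, 10)
	a := backing[0:5]  // indices 0..4
	b := backing[3:8]  // indices 3..7
	c := backing[6:10] // indices 6..9

	fmt.Println(overlapping(a, b))            // true: both contain indices 3 and 4
	fmt.Println(overlapping(a, c))            // false: no shared elements
	fmt.Println(overlapping(b, c))            // true: both contain indices 6 and 7
	fmt.Println(overlapping(a, backing[2:2])) // false: an empty slice shares nothing
}
```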

earthboundkid · 2024-04-24T17:21:58Z

The real win would be to teach the compiler that seeing if slices.Overlapping(s1, s2) { return } means it can assume non-aliasing beyond that point. That would be huge for optimization.

dsnet added the Proposal label Apr 23, 2024

gopherbot added this to the Proposal milestone Apr 23, 2024

dsnet changed the title ~~proposal: slices: add IndexSubslice~~ proposal: slices: add IndexPointer Apr 23, 2024
