Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: slices: add IndexPointer #66981

Open
dsnet opened this issue Apr 23, 2024 · 9 comments
Open

proposal: slices: add IndexPointer #66981

dsnet opened this issue Apr 23, 2024 · 9 comments
Labels
Milestone

Comments

@dsnet
Copy link
Member

dsnet commented Apr 23, 2024

Proposal Details

I propose the addition of:

// IndexPointer returns the index of the element in s that p points to.
// It returns -1 if p is not found in s.
func IndexPointer[S ~[]E, E any](s S, p *E) int

A use of this is to check whether some slice is a subslice of another slice, by checking:
len(subslice) > 0 && slices.IndexPointer(s, &subslice[0]) >= 0

Such an operation is useful:

  • As an optimization, where knowing that one slice is a directly subslice of another allows you to skip certain operations
  • As a means of precondition error checking where an operation will produce invalid results if one argument is a subslice of another. For example, AppendFoo(dst, src) is will produce corrupt data if src is a subslice of dst and the operation writes more bytes to dst than it reads from src. Such bugs are difficult to track down and ideally the API can return an error by detecting such situations (or just clone the src for a slower but correct implementation).

An alternative API is:

func ContainsSubslice[S ~[]E, E any](s S, subslice S) bool

but:

  1. This is easy to hold wrong since both s and subslice are of type S.
  2. It has weird edge cases where s and subslice overlap, but neither is subslice of the other.
  3. Returning a bool is strictly less flexible than returning an index.
  4. The naming isn't clear what the definition of "Contains" means. Is it comparing the value of the elements? The naming of IndexPointer is more clear about it's meaning.

Implementation

The naive implementation is straight forward:

func IndexPointer[S ~[]E, E any](s S, p *E) int {
	for i := range s {
		if p == &s[i] {
			return i
		}
	}
	return -1
}

but is unfortunately non-performant as it performs an O(n) search over the slice.

A more efficient implementation takes advantage of pointer arithmetic:

func IndexPointer[S ~[]E, E any](s S, p *E) int {
	i := 0
	if p != nil && unsafe.Sizeof(*p) > 0 {
		pd := unsafe.Pointer(unsafe.SliceData(s))
		pe := unsafe.Pointer(p)
		i = int((uintptr(pe) - uintptr(pd)) / unsafe.Sizeof(*p))
	}
	if uint(i) < uint(len(s)) && p == &s[i] {
		return i
	}
	return -1
}

which can now compute the result in O(1).

Without the helper function in "slices", it is currently impossible in Go to identify whether a given slice is a sub-slice of another slice in O(1) without the use of "unsafe". By including this helper in the "slices" package, callers can avoid the use of "unsafe" to perform this operation.

@dsnet dsnet added the Proposal label Apr 23, 2024
@gopherbot gopherbot added this to the Proposal milestone Apr 23, 2024
@randall77
Copy link
Contributor

FYI, there are already helpers in slices for this, overlaps and startIdx.
Maybe they are not exactly what is needed for this proposal, but they do exist. ('startIdxis O(n), but only because it wasn't necessary to make it faster.) The comment instartIdx` is particularly relevant - what if the pointer points in the slice, but is not exactly at the start of any element?

@dsnet
Copy link
Member Author

dsnet commented Apr 23, 2024

The comment instartIdx` is particularly relevant - what if the pointer points in the slice, but is not exactly at the start of any element?

With my proposed implementation, it returns -1, since there's a p == &s[i] check before returning the index. Having the optimized variant match the naive implementation seems the least surprising and also answers how each edge case should be handled.

Another edge cases exists with slices of zero-length element types. Again, the optimized version matches the naive implementation in behavior.

@dsnet
Copy link
Member Author

dsnet commented Apr 23, 2024

As a concrete example of pre-condition error checking, for flate.AppendCompress (#54078), I wanted to have it reports an error if src is a subslice of dst. This is tempting to do and will often be okay since the output is usually smaller than the input. However, that is not guaranteed as it is possible that compression produces a larger output than the input. When this occurs, the output is corrupted and it can be extremely difficult to figure out why. It is better to report an error up front.

@dsnet dsnet changed the title proposal: slices: add IndexSubslice proposal: slices: add IndexPointer Apr 23, 2024
@Jorropo
Copy link
Member

Jorropo commented Apr 23, 2024

I would rather see:

func IndexPointer[S ~[]E, E any](s S, p *E) (int, bool)

I don't like -1, it's not obvious from the signature how it fails or what I should check for.

@dsnet
Copy link
Member Author

dsnet commented Apr 23, 2024

The existing Index and IndexFunc functions already set the precedence that -1 means "not found". I'm not sure if we should change that here.

@Jorropo
Copy link
Member

Jorropo commented Apr 23, 2024

Ah nvm, well forget what I said. thx

@ianlancetaylor
Copy link
Contributor

A use of this is to check whether some slice is a subslice of another slice, by checking:
len(subslice) > 0 && slices.IndexPointer(s, &subslice[0]) >= 0

Technically subslice might extend past the end of s, so this isn't a test of whether some slice is a subslice of another slice, it's a test of a more complicated condition. You could fix that by adding
&& slices.IndexPointer(s, &subslice[len(subslice)-1]) >= 0

What if we instead use Overlapping[E any](s1, s2 []E) bool? That could return true if s1 and s2 overlap--share any elements. The order of the arguments doesn't matter. You only get a bool result--is that enough for your purposes?

@dsnet
Copy link
Member Author

dsnet commented Apr 23, 2024

For the use cases that I'm wrestling with, Overlapping would work. I like that it resolves the 1st, 2nd, and 4th issues that ContainsSubslice would have.

That said, I think IndexPointer is still a useful function as it serves as a lower-level primitive with well-defined semantics.
The fact that Overlapping could be implemented in terms of IndexPointer shows its flexibility:

func Overlapping[S ~[]E, E any](s1, s2 S) bool {
	return len(s1) > 0 && len(s2) > 0 && (
		IndexPointer(s1, &s2[0]) >= 0 ||
		IndexPointer(s2, &s1[0]) >= 0 ||
		IndexPointer(s1, &s2[len(s2)-1]) >= 0 ||
		IndexPointer(s2, &s1[len(s1)-1]) >= 0)
}

We could provide both IndexPointer and Overlapping, but I'll still take Overlapping over nothing.

@earthboundkid
Copy link
Contributor

The real win would be to teach the compiler that seeing if slices.Overlapping(s1, s2) { return } means it can assume non-aliasing beyond that point. That would be huge for optimization.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Incoming
Development

No branches or pull requests

6 participants