Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spec: clarify how unsafe.Pointers may be used to determine offsets between addresses #12445

Open
kortschak opened this issue Sep 2, 2015 · 60 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@kortschak
Copy link
Contributor

For background see discussion at https://groups.google.com/d/topic/golang-nuts/Uet5p_3JhZs/discussion

Currently it is possible to determine the offset between two values' addresses by finding the difference between uintptrs converted from unsafe.Pointer values. This is the only way to be able to determine whether two slices overlap in memory since the advent of three index slicing and the only way to determine in less than linear time whether blocked sparse slices overlap (relevant for views in constructed 2D slices for e.g. BLAS operations). In the event that a moving GC is implemented this may no longer be safe; if a GC move occurs between taking the conversion to uintptr of address of the first and second values, the offset may be incorrect. Note that the GC move does not otherwise invalidate the offset between overlapping slices or falsely result in non-overlapping being construed as overlapping.

@minux
Copy link
Member

minux commented Sep 2, 2015 via email

@kortschak
Copy link
Contributor Author

unsafe.Offset(a, b) int would be more generalisable - I don't want to just be able to determine whether slices overlap, but also determine if strided blocks within the slice overlap. This requires an offset and a pair of lengths (and element size) - the latter which I can already get by other means.

@kortschak
Copy link
Contributor Author

To add a data point to this. We now implement strided block overlap detection in github.com/gonum/matrix/mat64, documentation here and detection implementation here. This is done essentially as described in the original discussion, both with unsafe for environments that allow that and via reflect for those that don't.

I don't see a constant time implementation that can be achieved without access to an offset call.

@rsc rsc added this to the Go1.6 milestone Oct 23, 2015
@rsc rsc changed the title cmd/compiler: clarify how unsafe.Pointers may be used to determine offsets between addresses cmd/compile: clarify how unsafe.Pointers may be used to determine offsets between addresses Nov 4, 2015
@mdempsky
Copy link
Member

Something like below seems like it should be safe even under a moving GC.

// Overlap returns whether there exist i and j such that
// a[i*stride] and b[j*stride] address the same variable.
func overlap(a, b []float64, stride uintptr) bool {
    an, bn := uintptr(len(a)), uintptr(len(b))
    if an == 0 || bn == 0 {
        // Degenerate case; empty slices can't overlap.
        return false
    }

    // Keep as unsafe.Pointer for safety under moving GC.
    ap, bp := unsafe.Pointer(&a[0]), unsafe.Pointer(&b[0])

    // If a and b are entirely disjoint, there can't be any overlap.
    if uintptr(bp) - uintptr(ap) >= 8 * an && uintptr(ap) - uintptr(bp) >= 8 * bn {
        return false
    }

    // There is *some* overlap, so ap and bp must point into the same variable,
    // which means we can assume a stable offset between them.
    offset := uintptr(bp) - uintptr(ap)

    // ap and bp must be aligned modulo the stride.
    return offset % (stride * 8) == 0
}

@kortschak
Copy link
Contributor Author

That does not do what the linked code does. We want to return whether there is an i,j and x,y such that a[i + jstride] and b[x + ystride] address the same variable when the indices are constrained such that they index a subset of the possible blocks that may be described with the given stride over the slices. However it does suggest a possible solution.

// offset returns the number of float64 values b[0] is after a[0].
func offset(a, b []float64) int {
    a0 := unsafe.Pointer(&a[0])
    b0 := unsafe.Pointer(&b[0])

    if a0 == b0 {
        return 0
    }
    if uintptr(a0) < uintptr(b0) {
        return int((uintptr(b0) - uintptr(a0)) / unsafe.Sizeof(float64(0)))
    }
    return -int((uintptr(a0) - uintptr(b0)) / unsafe.Sizeof(float64(0)))
}

and the family friendly version (this one is more tenuous since it depends on values passed back from a reflect call):

// offset returns the number of float64 values b[0] is after a[0].
func offset(a, b []float64) int {
    va0 := reflect.ValueOf(a).Index(0)
    vb0 := reflect.ValueOf(b).Index(0)

    if va0.UnsafeAddr() == vb0.UnsafeAddr() {
        return 0
    }
    if vb0.UnsafeAddr() < va0.UnsafeAddr() {
        return int((vb0.UnsafeAddr() - va0.UnsafeAddr()) / sizeOfFloat64)
    }
    return -int((va0.UnsafeAddr() - vb0.UnsafeAddr()) / sizeOfFloat64)
}

These are then used in the functions also linked above.

Note that it is OK for our use case for the offset to be used after a possible GC move since if a and b are part of the same allocation they move together and so the offset remains valid, and if they are not, an offset cannot become indicative of an overlap. The potential race in the sign condition is troubling though.

@RLH
Copy link
Contributor

RLH commented Nov 28, 2015

There are certainly whole classes of concurrent GC algorithms for which
this will not work. GC's using a variation of a Brooks' pointer, such as
proposed for the Red Hat LLVM based GC as well as Sapphire style
collectors. Currently Go does not use these techniques.

On Sat, Nov 28, 2015 at 4:17 AM, Dan Kortschak notifications@github.com
wrote:

That does not do what the linked code does. We want to return whether
there is an i,j and x,y such that a[i + j_stride] and b[x + y_stride]
address the same variable. However it does suggest a possible solution.

// offset returns the number of float64 values b[0] is after a[0].
func offset(a, b []float64) int {
a0 := unsafe.Pointer(&a[0])
b0 := unsafe.Pointer(&b[0])

if a0 == b0 {
    return 0
}
if uintptr(a0) < uintptr(b0) {
    return int((uintptr(b0) - uintptr(a0)) / unsafe.Sizeof(float64(0)))
}
return -int((uintptr(a0) - uintptr(b0)) / unsafe.Sizeof(float64(0)))

}

and the family friendly version (this on is more tenuous since it depends
on values passed back from a reflect call):

// offset returns the number of float64 values b[0] is after a[0].
func offset(a, b []float64) int {
va0 := reflect.ValueOf(a).Index(0)
vb0 := reflect.ValueOf(b).Index(0)

if va0.UnsafeAddr() == vb0.UnsafeAddr() {
    return 0
}
if va0.UnsafeAddr() < vb0.UnsafeAddr() {
    return int((vb0.UnsafeAddr() - va0.UnsafeAddr()) / sizeOfFloat64)
}
return -int((va0.UnsafeAddr() - vb0.UnsafeAddr()) / sizeOfFloat64)

}

These are then used in the functions also linked above.

Note that it is OK for our use case for the offset to be used after a
possible GC move since if a and b are part of the same allocation they move
together and so the offset remains valid, and if they are not, an offset
cannot become indicative of an overlap.


Reply to this email directly or view it on GitHub
#12445 (comment).

@kortschak
Copy link
Contributor Author

There are certainly whole classes of concurrent GC algorithms for which this will not work.

This is true if the GC algorithms are considered in isolation from the language's promises about behaviour. This issue aims to address that.

Currently there seems to be a de facto promise* see discussion in #7192 and #8994 that use of converted unsafe.Pointers is atomic wrt the GC if the use is in a single expression. This is how the unsafe.Pointer->uintptr happens in the code above, with the exception of the condition which troubles me.

* Not a promise.

Currently Go does not use these techniques.

This is understood; hence the comments in the linked code. The issue for us is that introduction of GC behaviour that breaks this code without a work around to obtain an offset will break gonum/matrix/mat64 behaviour. We use unsafe for the normal case and we don't expect Go1 coverage for that, but we also use the reflect equivalent for environments that disallow unsafe.

@mdempsky
Copy link
Member

@RLH Sorry, when you say "for which this will not work", what are you referring to by "this"? E.g., would the overlap function I wrote above (setting aside momentarily that it doesn't address @kortschak's actual use case) not be safe under some of those GC implementations?

The Go spec says "For struct s with field f: uintptr(unsafe.Pointer(&s)) + unsafe.Offsetof(s.f) == uintptr(unsafe.Pointer(&s.f))". I've generally extrapolated from this that within a single expression, pointers when converted to uintptr must represent a consistent view of memory, as otherwise the equality wouldn't necessarily hold. Also that other uintptr expressions like uintptr(unsafe.Pointer(&s.f)) - uintptr(unsafe.Pointer(&s)) == unsafe.Offsetof(s.f) and uintptr(unsafe.Pointer(&s.f)) >= uintptr(unsafe.Pointer(&s)) would be valid and guaranteed to evaluate to true.

If those aren't safe extrapolations, perhaps the spec should be revised to only guarantee unsafe.Pointer(uintptr(unsafe.Pointer(&s)) + unsafe.Offsetof(s.f)) == unsafe.Pointer(&s.f).

@kortschak
Copy link
Contributor Author

@mdempsky I think this is #8994.

@mdempsky
Copy link
Member

@kortschak They're certainly very related. But my impression is #8994 is about the rules for how you can safely convert unsafe.Pointer to uintptr and back, whereas this issue seems to be more about which arithmetic laws still hold after converting unsafe.Pointer to uintptr. E.g., the overlap/offset functions being discussed above don't care about convert uintptr back to unsafe.Pointer.

@kortschak
Copy link
Contributor Author

kortschak commented Nov 29, 2015 via email

@RLH
Copy link
Contributor

RLH commented Nov 29, 2015

Let me try again. Various concurrent moving collectors including those
using a Brooks' barrier as well as the Sapphire algorithm rely on the fact
that two different bit pattern can be used to represent the same (pointer
to an) object. One bit pattern may refer to the "old" version of the object
and another to the "new" version of the object. Depending on the algorithm
a read and/or write barrier is used to ensure that values returned from the
object are consistent. The literature explains how the algorithms
extinguish the "old" version by replacing them with "new" version and what
invariants the read and write barriers need to maintain to ensure object
identity, consistency, completeness, and termination.

This creates implementation issues around the implementation of == which
doesn't dereference the pointer and thus fall outside the barriers used
when the object is dereferenced. This == problem was first noted and solved
in the original Brooks paper as well as being noted, referenced, and solved
in the Sapphire paper. It is important to note that in at least the
Sapphire algorithm the implementation of == is the only place where reads
of (pointers to) objects require special consideration. Go's spec allows
for such an implementation.

The single example found in the unsafe part of the spec seems to extend the
semantics of == to include calls to unsafe.Pointer in the same ==
expression where the compiler can statically determine that the arguments
passed to unsafe.Pointer refer to the same object and eliminate one of the
calls using CSE. The unsafe.Offset call can be replaced with known static
offset to a field within a struct. I would be surprised if the compiler
doesn't already do this. I believe that this extension of == semantics is
tractable and object identity can be maintained.

The examples provided in this thread use unsafe.Pointer in different
statements and then expect the returned bits to be comparable. This is
unsafe though it works in current implementation.

In summary, the statement that "Something like below seems like it should
be safe even under a moving GC." is not true for at least two well known
concurrent copying collectors.

More to the point is that both #7192
#7192 and #8994
#8994 have resisted changing the spec.
What has changed since those decisions? Changing the stable spec is a very
high bar, particularly when the change will alter fundamental concepts like
object identity that GC algorithms have wrestled with for decades.

On Sat, Nov 28, 2015 at 9:32 PM, Dan Kortschak notifications@github.com
wrote:

Yes, that's true, though I think that once the rules for the other issue
are settled that defines what happens here.


Reply to this email directly or view it on GitHub
#12445 (comment).

@mdempsky
Copy link
Member

@RLH Thanks. I understand the idea that under a moving GC that a single "pointer value" (in the Go spec/language sense) might at a given time have multiple bit patterns at the machine implementation level, and so this complicates the implementation of Go pointer equality.

However, we seem to have different interpretations of the spec's uintptr(unsafe.Pointer(&s)) + unsafe.Offsetof(s.f) == uintptr(unsafe.Pointer(&s.f)) guarantee (which I'll emphasize again is currently written using integer equality, not pointer equality). It sounds like you interpret the guarantee is precisely worded and only holds when &s and &s.f are all locally visible to the compiler so it can be satisfied via CSE optimizations, whereas I've instead interpreted it generalizes to examples like this (even assuming no inlining/cross-package compiler optimizations):

package pkg

import "unsafe"

type S struct { E, F int }

func Func(p, q unsafe.Pointer) bool {
    return uintptr(p) + unsafe.Offsetof(S{}.F) == uintptr(q)
}

.

package main

import (
    "unsafe"
    "./pkg"
)

func main() {
    var s pkg.S
    if !pkg.Func(unsafe.Pointer(&s), unsafe.Pointer(&s.F)) {
        panic("spec violation")
    }
}

So at the very least I think it's worth deciding/clarifying which interpretation of the spec is intended.

@dr2chase
Copy link
Contributor

The problem with

uintptr(unsafe.Pointer(&s)) + unsafe.Offsetof(s.f) == uintptr(unsafe.Pointer(&s.f))

is that it can be viewed as making assumptions about atomicity with respect to garbage collection. The issue is not rules of integer equality, it is instead about what GC activity might occur between evaluating uintptr(unsafe.Pointer(&s)) and uintptr(unsafe.Pointer(&s.f)). If it is indeed a matter of exactly what the Go spec says, then one cost of some future collector might be the lock/barrier operations needed to obtain that atomicity -- be careful what you are asking for, because you just might get it.

I'd be far happier if the runtime had a variant of kortschak's offset(a,b) function that returned difference int, related bool. Arguing against this is that it might require some extra cost to determine that two pointers are unrelated, arguing for it is that it would keep unsafe stuff out of user's code and would be appropriately future-proof against changes in the garbage collector. Against the extra cost, you have that if the unrelated case is common, then you avoid doing all the calculations with lengths and sizes and proceed directly to code that does updates in any order whatsoever.

To someone who's worked on GC (and optimizers, and concurrency) it is extremely nervous-making to see people so happy to assign precise semantics to unsafe code (and generally, so happy to use unsafe code).

@RLH
Copy link
Contributor

RLH commented Nov 30, 2015

Technically I think we understand each other.

Short term expanding the semantics of unsafe will lead to more and
increasingly clever uses of unsafe and that will lead to an increase in
corner case bugs. Clever uses of unsafe coupled with an increasing
sophisticated runtime that depends on the type system being safe is the
road to pain.

Long term expanding the semantics of unsafe will lead to technical debt and
cut off possible solutions that moving objects might solve. Some of the
possible problems include fragmented memory and poor spatial locality
complicated by deeper caches and various as yet undefined NUMA
architectures. We lack visibility on how important these things will be in
the future and whether the advantages of allowing Go to grow in unsafe ways
today outweighs the accumulated technical debt.

I think we should spend some time and see if we can come up with solutions
that lead to less use of unsafe instead of more.

On Sun, Nov 29, 2015 at 2:57 PM, Matthew Dempsky notifications@github.com
wrote:

@RLH https://github.com/RLH Thanks. I understand the idea that under a
moving GC that a single "pointer value" (in the Go spec/language sense)
might at a given time have multiple bit patterns at the machine
implementation level, and so this complicates the implementation of Go
pointer equality.

However, we seem to have different interpretations of the spec's uintptr(unsafe.Pointer(&s))

  • unsafe.Offsetof(s.f) == uintptr(unsafe.Pointer(&s.f)) guarantee (which
    I'll emphasize again is currently written using integer equality, not
    pointer equality). It sounds like you interpret the guarantee is
    precisely worded and only holds when &s and &s.f are all locally visible to
    the compiler so it can be satisfied via CSE optimizations, whereas I've
    instead interpreted it generalizes to examples like this (even assuming no
    inlining/cross-package compiler optimizations):

package pkg

import "unsafe"

type S struct { E, F int }

func Func(p, q unsafe.Pointer) bool {
return uintptr(p) + unsafe.Offsetof(S{}.F) == uintptr(q)
}

.

package main

import (
"unsafe"
"./pkg"
)

func main() {
var s pkg.S
if !pkg.Func(unsafe.Pointer(&s), unsafe.Pointer(&s.F)) {
panic("spec violation")
}
}

So at the very least I think it's worth deciding/clarifying which
interpretation of the spec is intended.


Reply to this email directly or view it on GitHub
#12445 (comment).

@mdempsky
Copy link
Member

@dr2chase Arguably there are atomicity constraints even in expressions like unsafe.Pointer(uintptr(unsafe.Pointer(&s)) + 1), which is why I'd previously proposed adding a function like func Add(unsafe.Pointer, uintptr) unsafe.Pointer to the standard library / runtime (#7192). However, it was rejected, so I've been operating under the impression that the preferred approach is codifying safe unsafe.Pointer/uintptr conversion patterns.

If adding more primitive operations is on the table, I'd suggest something like:

// Base returns a pointer to the beginning of the outermost enclosing variable,
// and the offset of p into that variable.
// If p does not point into a variable, then it returns (p, 0).
func Base(p unsafe.Pointer) (base unsafe.Pointer, offset uintptr)

So for example, Base(unsafe.Pointer(&s.f)) would return (unsafe.Pointer(&s), unsafe.Offsetof(s.f)).

Then @kortschak's offset can be implemented as:

func offset(p, q []float64) (difference int, related bool) {
    pb, po := Base(unsafe.Pointer(&p[0]))
    qb, qo := Base(unsafe.Pointer(&q[0]))
    if pb != qb {  // GC-aware pointer equality
        return 0, false
    }
    if po < qo {
        return int((qo - po) / unsafe.Sizeof(float64(0)))
    }
    return -int((po - qo) / unsafe.Sizeof(float64(0)))
}

@ianlancetaylor
Copy link
Contributor

@mdempsky It's not clear to me that unsafe.Add was rejected. It just hasn't been accepted.

@dr2chase
Copy link
Contributor

Base looks like a nice abstraction. How much do you also want the corresponding inverse? That would be more unsafe than Base, since you could use wrong arithmetic to do arbitrary unchecked damage.

I think the preferred approach is don't use unsafe.

@mdempsky
Copy link
Member

@dr2chase By corresponding inverse, do you mean the Add function I mentioned? If so, in my unsafe.Pointer arithmetic proposal, I further suggested that Add(p, n) should only be guaranteed to be safe if the resulting pointer points into the same variable as p, which is checkable at runtime.

@randall77
Copy link
Contributor

I'm not a huge fan of Base because it implies that the runtime knows the answer and can compute it in a timely fashion. I think that's mostly true today but might not be in some possible GC implementations.

For instance, if p is a pointer into a global variable, we don't have an efficient way right now to determine, given p, the starting point of the containing global variable.

It also doesn't work on non-Go-managed memory, which will be confusing. If a function gets two slices and uses Base to determine aliasing, why should it work for slices whose backing store is allocated by Go and not for backing stores allocated by mmap?

@mdempsky
Copy link
Member

@randall77 Fair points!

So perhaps tying the wording to "Go variables" is wrong. Instead something like "indivisible block of memory" or whatever appropriately describes the granularity at which the GC may rearrange memory.

Also, for pointers to objects that will never be moved by the GC (e.g., global Go variables, or unmanaged memory), Base(p) would return (nil, uintptr(p)) (instead of (p, 0)).

That would allow offset (as written above) to work for slices whose backing store is allocated by mmap, because their base pointers would be equal (both nil).

Downside is it would also work for slices of even separate mmap allocations (or of an mmap allocation and a Go global variable), but at least since those objects aren't going to move around in memory anyway, I don't think it could cause problems? (Though I'll continue thinking about that.) Also, I don't see how Go could be expected to reliably distinguish those cases anyway.

@randall77
Copy link
Contributor

Even for Go-allocated objects, I'm not sure we want to commit to Base. Imagine the following scenario. We allocate objects from a region using a bump pointer and record size&type info in a side buffer. So object start info is encoded in the side buffer, but not in any easily seekable way. Presumably GC would need to find object starts, but maybe it only does so in bulk - i.e. a whole region at a time. To answer an individual Base query would require a scan of the side buffer.

I'm not saying a GC like that would ever work, but I'm concerned that by adopting Base we'd be preventing any such implementation.

I'm with Minux that if we're trying to determine overlap, ask the runtime directly. I think this handles the common case, effectively the stride==1 case. unsafe.Overlap(p unsafe.Pointer, plen uintptr, q unsafe.Pointer, qlen uintptr) bool? The nice thing about this function is that the API is GC-agnostic (the implementation would have to understand the GC, of course).

Maybe we handle stride>1 queries as the stride==1 query above together with the guarantee that uintptr(p)-uintptr(q) is always correct if p and q are in the same allocation? I'm less sure about whether this makes sense and/or is possible given a moving GC.

@kortschak
Copy link
Contributor Author

I'm with Minux that if we're trying to determine overlap, ask the runtime directly. I think this handles the common case, effectively the stride==1 case. unsafe.Overlap(p unsafe.Pointer, plen uintptr, q unsafe.Pointer, qlen uintptr) bool? The nice thing about this function is that the API is GC-agnostic (the implementation would have to understand the GC, of course).

This does not handle the block matrix use that we have.

Maybe we handle stride>1 queries as the stride==1 query above together with the guarantee that uintptr(p)-uintptr(q) is always correct if p and q are in the same allocation? I'm less sure about whether this makes sense and/or is possible given a moving GC.

Nor does this AFAICS.

@randall77
Copy link
Contributor

@kortschak, could you be more specific about what you're trying to do then? I'm afraid I don't understand what you mean in the second half of your description:

We want to return whether there is an i,j and x,y such that a[i + j_stride] and b[x + y_stride] address the
same variable when the indices are constrained such that they index a subset of the possible blocks
that may be described with the given stride over the slices.

@kortschak
Copy link
Contributor Author

@randall77, probably the best thing to do is look at the docs linked above and the implementation of the checkOverlap methods in the code linked above. However, in brief we need to be able to detect overlap in views of a 2-d matrix. An overlap is when two matrices share an element in the backing store - this is more complex that just a trivial overlap as described in minux's first proposal and also more complex than a stride>1 overlap, since a description of a matrix is struct { data []float64; rows, cols, stride int }, not just struct { data []float64; stride int }. To assess the overlap of two blocks when the stride-only bool-returning function, you would require (AFAICS) O(rows) operations for a row major storage.

@randall77
Copy link
Contributor

I think you could do it with:

func checkOverlap(a, b matrix) bool {
    if !unsafe.Overlap(&a.data[0], a.rows*a.stride, &b.data[0], b.rows*b.stride) { return false }
    off := uintptr(&a.data[0])-uintptr(&b.data[0])
    ...do the same thing that your checkOverlap does, starting at the stride comparison...
}

@kortschak
Copy link
Contributor Author

I don't see how that helps. The issue is that the test for a block overlap requires that arithmetic be performed on an offset, specifically the test in rectanglesOverlap. You are right that it simplifies these two conditions, but that is a minor component of the test.

@randall77
Copy link
Contributor

But my code is computing an offset you can use. Isn't it exactly what your offset call returns? (Modulo size of the base type.) Just pass off/unsafe.Sizeof(float64(0)) to rectanglesOverlap.

@ianlancetaylor ianlancetaylor added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Sep 16, 2019
@gitspeaks
Copy link

Thanks Ian. I will refrain from doing this so to avoid the “I told you so” moment :)

However, please consider guaranteeing this for this type of usage. I’ve coded a free list over a single byte slice consisting of byte-records where the “next pointers” (record offsets) are encoded within the records themselves and records are removed from the list as reslices. Having a reliable way to do pointer subtraction between a reslice and the initial byte slice would have allowed me to stay with an Api for returning back records (reslices) which accepts just the reslice itself without having to maintain an additional “record offset” context adjacent (or encoded within) each record (reslice).

@mdempsky
Copy link
Member

@kortschak Re-reviewing this thread, in your earlier comment, you mention "We want to return whether there is an i,j and x,y such that a[i + j*stride] and b[x + y*stride] address the same variable when the indices are constrained such that they index a subset of the possible blocks that may be described with the given stride over the slices."

Is this still the main/only motivating case within the Go BLAS libraries, or are there others to consider?

Also, can you link us to the code you're currently using to do this? (I imagine it's unchanged, but the code you linked here says it's been archived, so I wanted to double check.)

The other half of the issue is how the reflect equivalent can be ensured to work since a move may happen between the uinptr conversion in reflect and the subtraction on return from the call to reflect's UnsafeAddr.

I think this concern is obsolete? App Engine has supported package unsafe for a while now, and cmd/compile subsequently removed the -safe flag.

@kortschak
Copy link
Contributor Author

Thanks @mdempsky. The code now lives here and here.

Is this still the main/only motivating case within the Go BLAS libraries, or are there others to consider?

This is still the concern.

I think this concern is obsolete? App Engine has supported package unsafe for a while now, and cmd/compile subsequently removed the -safe flag.

This is good to know. I think we will conitinue to provide the safe option, but separate that from the appengine case. Is this documented somewhere?

@mdempsky
Copy link
Member

Is this documented somewhere?

Do you mean App Engine support for package unsafe?

If so, surprisingly, it doesn't seem to be documented anywhere officially. You can certainly find mention of this, like here: https://news.ycombinator.com/item?id=18230938 ("You can even import 'unsafe'!"). But even the App Engine KB still suggests "unsafe" is a reason a program might not work on App Engine (though perhaps "work" is meant more broadly than "compile").

It's been supported by App Engine since the Go runtime switched to using gVisor, which was the Go 1.11 runtime. According to Long-term support for legacy runtimes, it sounds like projects using older Go runtimes are going to be migrated to Go 1.11, so they'll have access to package "unsafe" too.

@kortschak
Copy link
Contributor Author

kortschak commented Aug 15, 2020

Yes, that's what I was referring to. It's not that surprising. My experience with appengine has been pretty much generally that the documentation is not so good. The reason I ask, is because we test with the "appengine" tag on the basis that it's a synonym of "safe". If we differentiate the two, it would be good to know if using the "appengine" tag is essentially the same as no tag for the purposes of our code.

@rsc
Copy link
Contributor

rsc commented Aug 27, 2020

There is no longer any available way to deploy on App Engine that rejects use of unsafe (Go 1.11 runtime is the minimum one available now), and the -u compiler option is gone. You don't need to worry about unsafe not being available in standard binaries. (Not sure about wasm?)

@kortschak
Copy link
Contributor Author

Thanks. We will go through and remove the appengine tag and just leave safe and noasm for those that want them.

@gitspeaks
Copy link

gitspeaks commented Aug 28, 2020

@ianlancetaylor

I don't think the language guarantees that that expression will always work. I can imagine unlikely failures in an implementation with a moving garbage collector.

In practice with today's implementations that expression will work. (Of course in the general case in order to get a slice index you will need to adjust by the size of the element type.)

I’m confused.

Given @rsc recent comment #41049 (comment), in particular pointing out a similar usage in aliasing.go I feel compelled to ask why should I consider the sample above #12445 (comment) to be “unsafe” when a similar calculation (which I figure depends on the same guarantees) is considered “safe” by the go team ?
At the very least, could the team guarantee to vet and document a workaround in case such expressions break in future versions of go ? (as would probably be required for the code in aliasing.go)

@rsc
Copy link
Contributor

rsc commented Aug 28, 2020

Ian said above that it's not, strictly speaking, guaranteed to work on all possible Go implementations.
I said on 41049 that, realistically, it's not going to break on any current Go implementation any time soon.
Those statements can both be true (and are both true).

@ianlancetaylor
Copy link
Contributor

I don't think there is any contradiction here. The standard library sometimes does unsafe things. If the implementations change, then the standard library changes.

The code in crypto/internal/subtle isn't considered to be safe. It's just considered to work with the current implementation. And that is what I said about your code as well. The difference is that if we change the implementation we will fix crypto/internal/subtle, but we won't fix your code.

@kortschak
Copy link
Contributor Author

@rsc one follow up on the appengine environment. Will it allow non-stdlib assembly code, or should we protect that?

@mdempsky
Copy link
Member

mdempsky commented Aug 28, 2020

@kortschak As I understand it, Go 1.11+ on App Engine is just running inside gVisor, which emulates a typical Linux VM. So I would expect assembly, cgo, etc. to all function normally like with non-App-Engine builds.

Evidently App Engine's Go 1.11 runtime doesn't even set the appengine build tag anymore: golang/oauth2#334 (comment)

@kortschak
Copy link
Contributor Author

Great. Thanks.

@gitspeaks
Copy link

@ianlancetaylor

The code in crypto/internal/subtle isn't considered to be safe. It's just considered to work with the current implementation. And that is what I said about your code as well. The difference is that if we change the implementation we will fix crypto/internal/subtle, but we won't fix your code.

That doesn't sound fair to me because it strongly suggests preferential treatment to package authors on the Go team in the sense they are allowed to benefit through deviating from the language spec and be assured that if things break, the code will get fixed one way or another where I as a Go user not only in inferior position to know what unspecified /prohibited language usage "actually works" with the current implementation, but have no way to be informed when such "unsafe" usage is expected to stop working, let alone taking on the risk of not being able to find a workaround in such case. In short, what's good for package authors on the Go team should be good for all external package authors which in this instance implies that you really should fix these types of expressions if they break in the future.

@gitspeaks
Copy link

gitspeaks commented Aug 28, 2020

@rsc

Ian said above that it's not, strictly speaking, guaranteed to work on all possible Go implementations.
I said on 41049 that, realistically, it's not going to break on any current Go implementation any time soon.
Those statements can both be true (and are both true).

My intent was not to challenge the truthiness of your statements but rather point out an "illegal" example I did not expect to see in the Go library asking how you would deal with such similar expressions in user code if they ever break.

@ianlancetaylor
Copy link
Contributor

it strongly suggests preferential treatment to package authors on the Go team

It is preferential treatment for the Go standard library. That is not the same as authors on the Go team. Many people not on the Go team have made changes to the Go standard library. Most code written by people on the Go team is not in the Go standard library.

In any case I think the whole point of this open issue is to find some way to make this safe for any Go program.

@gitspeaks
Copy link

Perhaps a similar approach to how this issue is handled for syscalls is sufficient?

func Fix(f func(a []uintptr), a ...uintptr) // (*)

buf := make([]byte, 8)
buf2 := buf[4:]

var offset uintptr

// the compiler recognizes this pattern so the referenced objects are retained and not moved until the call complete 
unsafe.Fix(func(a []uintptr) {
	offset = a[0] – a[1]
}, uintptr(unsafe.Pointer(&buf2[0]), uintptr(unsafe.Pointer(&buf[0]))

(*) actually, it would be nice to have f func(a ...uintptr), but it seems Go doesn't support passing on variadic arguments

@mdempsky
Copy link
Member

mdempsky commented Aug 31, 2020

unsafe.Fix? If the only extension needed here is for determining slice overlap, I think an unsafe.Fix operation is overkill. But if there are other use cases, it would be worthwhile to list them.

(Also, Go does support passing on variadic arguments. See the last two paragraphs of "Passing arguments to ... parameters".)

Pointer subtraction. I think we should define uintptr(q) - uintptr(p) to compute the difference between the addresses pointed to by q and p. (And maybe unsafe.Sub(q, p), in the style of #40481.) In particular, we should guarantee to compute that difference atomically with regard to any moving GC (herein GC-atomic).

That would guarantee that if p and q point into the same variable and x := uintptr(q) - uintptr(p), then q == unsafe.Pointer(uintptr(p) + x) and p == unsafe.Pointer(uintptr(q) - x).

It would also guarantee that if p points to an n-byte variable, and q does not point into that variable, then uintptr(q) - uintptr(p) >= uintptr(n). (Note: There would be no guarantee in this case that q == unsafe.Pointer(uintptr(p) + (uintptr(q) - uintptr(p))), because this would violate rule #3's "the result must continue to point into the original allocated object" requirement.)

This would make gonum's code correct as-is. The crypto/internal/subtle code though would need revision to something like:

func AnyOverlap(x, y []byte) bool {
	if len(x) == 0 || len(y) == 0 {
		return false
	}
	diff := uintptr(unsafe.Pointer(&x[0])) - uintptr(unsafe.Pointer(&y[0]))
	return diff < uintptr(len(y)) || -diff < uintptr(len(x))
}

Pointer comparison? It might be tempting to codify additional operations beyond just subtraction; for example, uintptr(p) <= uintptr(q), as used in the original AnyOverlap code. After all, once we guarantee one primitive operation on two pointers to be GC-atomic, I don't think it's substantially more difficult for implementations to guarantee more.

However, guaranteeing uintptr(p) <= uintptr(q) to be GC-atomic would not be sufficient by itself to ratify the original AnyOverlap code. That code relies on the entire uintptr(x0) <= uintptr(ylast) && uintptr(y0) <= uintptr(xlast) expression to be GC-atomic. Otherwise, if x and y are slices of distinct arrays, it's possible x is before y in memory for the first comparison, but then moved after y for the second comparison, which would cause the function to incorrectly return true.

unsafe.Overlap? It was also previously suggested to add an unsafe.Overlap function. This would address crypto/internal/subtle's need, but gonum relies on knowing the actual overlapping interval. So gonum would still require that we define pointer subtraction. (Or unsafe.Overlap could return the offset too, but that just sounds like pointer subtraction with extra steps.)

It was also suggested earlier to allow pointer subtraction, but only for pointers into the same variable. However, that still requires pointer subtraction to be GC-atomic, at least for pointers into the same variable. Moreover, it would require users to perform two separate GC-atomic operations (test for overlap, then subtraction), whereas gonum's code currently only requires one.

@gitspeaks
Copy link

unsafe.Fix? If the only extension needed here is for determining slice overlap, I think an unsafe.Fix operation is overkill. But if there are other use cases, it would be worthwhile to list them.

Here is another case: #12445 (comment)

(Also, Go does support passing on variadic arguments. See the last two paragraphs of "Passing arguments to ... parameters".)

Why does the following error on playground ?:

https://play.golang.org/p/AIIBrJr86QS

package main

import (
	"fmt"
)

func main() {
	f1(1, 2)
}

func f1(a ...int) {
	f2(a)
}
func f2(b ...int) {
	fmt.Printf("%d, %d", b[0], b[1])
}

./prog.go:12:4: cannot use a (type []int) as type int in argument to f2

Pointer subtraction?
Pointer comparison?
unsafe.Overlap?

I think unsafe.Fix provides a general solution so you don't need to "solve" for each particular usage.

@kortschak
Copy link
Contributor Author

https://play.golang.org/p/9CStA2kerRX for how to pass on slices to variadic parameters.

The Fix API looks like the kind of thing that you would not see in the unsafe package; it's unnecessarily complicated.

@gitspeaks
Copy link

gitspeaks commented Aug 31, 2020

https://play.golang.org/p/9CStA2kerRX for how to pass on slices to variadic parameters.

Good to know, but this requires a mental leap thinking about the caller variadic parameter as a slice when passed
to the callee, which to me seems unnecessary when the callee itself is defined to accept a variadic argument of the same type.

The Fix API looks like the kind of thing that you would not see in the unsafe package; it's unnecessarily complicated.

Why? you can explicitly control which objects are locked in memory and these are clearly apparent from the argument list,
then you are free to do whatever you want with them in scope, perhaps involving other non-fix objects which effectively gives you a way to express an atomic transaction with respect to the memory locked objects and you don't need to learn and remember a different api for each programing case.

@gitspeaks
Copy link

Additionally, we can have higher order api’s like AnyOverlap built on top of Fix having solutions both for the general case and one-liners for specific common cases .

@mdempsky
Copy link
Member

mdempsky commented Aug 31, 2020

Here is another case: #12445 (comment)

In that comment you said: "Having a reliable way to do pointer subtraction between a reslice and the initial byte slice would have allowed me to stay with an Api for returning back records (reslices) which accepts just the reslice itself without having to maintain an additional “record offset” context adjacent (or encoded within) each record (reslice)."

I don't follow what you mean by the rest of that sentence, but it sounds to me like the same slice overlap/offset issue, which I addressed in my previous comment (with pointer subtraction even). If you don't think that's the case, it would be helpful to know what code you're currently writing instead, and what code you would like to write (e.g., using unsafe.Fix).

Notably, your unsafe.Fix example above is simply pointer subtraction too.

Why does the following error on playground ?:

This is getting off-topic. Please see https://github.com/golang/go/wiki/Questions.

Why? you can explicitly control which objects are locked in memory and these are clearly apparent from the argument list,
then you are free to do whatever you want with them in scope, perhaps involving other non-fix objects which effectively gives you a way to express an atomic transaction with respect to the memory locked objects and you don't need to learn and remember a different api for each programing case.

Because pinning memory is undesirable. It constrains the runtime, and we prefer to avoid that if possible. We pin memory for cgo and syscalls out of necessity.

Here there's not yet any evident need for pinning memory. Pointer subtraction seems adequate for the use cases enumerated so far.

@gitspeaks
Copy link

gitspeaks commented Aug 31, 2020

@mdempsky

You wrote:

unsafe.Fix? If the only extension needed here is for determining slice overlap, I think an unsafe.Fix operation is overkill. But if there are other use cases, it would be worthwhile to list them.

I read this as an ask for additional programing scenarios which would benefit from fixing objects in memory, thus my free list example.

Notably, your unsafe.Fix example above is simply pointer subtraction too.

Yes, and it's a key requirement for the free-list implementation I attempted to describe above. If I can determine the reslice (e.g node) position in the free list (e.g source slice) simply by subtracting it's pointer value from the base pointer value of the free list slice then I have it's "address" in the free-list. I can then use it to replace the current address of the next first element in the list.
The "address" being replaced is written in the header part of the node represented by the "returned" reslice

Why? you can explicitly control which objects are locked in memory and these are clearly apparent from the argument list,
then you are free to do whatever you want with them in scope, perhaps involving other non-fix objects which effectively gives you a way to express an atomic transaction with respect to the memory locked objects and you don't need to learn and remember a different api for each programing case.

Because pinning memory is undesirable. It constrains the runtime, and we prefer to avoid that if possible. We pin memory for cgo and syscalls out of necessity.

Here there's not yet any evident need for pinning memory. Pointer subtraction seems adequate for the use cases enumerated so far.

(1) My comment was in response to the argument made that the Fix API is "unnecessarily complicated".
(2) Of course pinning memory is undesirable! the whole point of Fix is to provide a general abstraction for cases in which locking objects in memory is a necessity. I don't see how this abstraction in itself encourages use or misuse more than any other api in the unsafe package which deals with native memory. I don't see anything in the api that hints "use me".

Here there's not yet any evident need for pinning memory. Pointer subtraction seems adequate for the use cases enumerated so far.

The Fix abstraction was introduced using a simple example in order to illustrate a general approach which addresses more complex cases like the one you described under Pointer comparison? where multiple distinct objects need to be locked atomically.

@randall77
Copy link
Contributor

I need some sort of overlap detection for #60138.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

9 participants