Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reflect: add a function for treating C memory as a Go slice #13656

Closed
bcmills opened this issue Dec 17, 2015 · 44 comments
Closed

reflect: add a function for treating C memory as a Go slice #13656

bcmills opened this issue Dec 17, 2015 · 44 comments
Labels
early-in-cycle A change that should be done early in the 3 month dev cycle. FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. v2 A language change or incompatible library change
Milestone

Comments

@bcmills
Copy link
Contributor

bcmills commented Dec 17, 2015

Now that cgo supports exporting Go functions to C, it would be useful for Go functions to be able to read C memory without making unnecessary copies. In particular, we would like to be able to read (and write) a C char* using functions that operate on a Go []byte (e.g. bytes.Reader or io.Reader.Read).

reflect.NewAt and reflect.ArrayOf get us partway there: we can treat the pointer to C memory as a Go pointer to an array of the correct underlying length, then use reflect.Value.Slice to convert that to a slice. Unfortunately, reflect.ArrayOf strands a small amount of memory for a unique reflect.Type for every distinct length used: http://play.golang.org/p/lc3QnMnKP_

We could consider using reflect.SliceHeader instead, but it "cannot be used safely or portably" (https://golang.org/pkg/reflect/#SliceHeader).

So it seems that we don't have an efficient solution for this conversion.

The simplest solution seems like it would be to add a function to the reflect package to construct the slice directly. Following the example of reflect.MakeSlice and reflect.NewAt, it could be something like:

    // SliceAt returns a slice value for the specified slice type, length, and capacity,
    // using p as the address of the first element and leaving its contents unchanged.
    func SliceAt(typ Type, len, cap int, p unsafe.Pointer) Value
@ianlancetaylor ianlancetaylor added this to the Go1.7 milestone Dec 17, 2015
@minux
Copy link
Member

minux commented Dec 17, 2015 via email

@bcmills
Copy link
Contributor Author

bcmills commented Dec 17, 2015

We already have C.GoBytes that converts a unsafe.Pointer
and a length to []byte, why do we need another?

  1. My understanding was that C.GoBytes makes a copy. Is that not the case?
    (If correct, that implies less efficiency - particularly for large arrays - and no mutative APIs such as io.Reader.Read.)

  2. It doesn't work for element types other than byte. (For example, you could envision a C API that wants to use an array of wchar_t or uint32_t.)

After converting the pointer to a byte slice, it's very easy to convert to []T for arbitrary T.

Not without copying the underlying array. Or am I missing something?

@minux
Copy link
Member

minux commented Dec 17, 2015 via email

@bcmills
Copy link
Contributor Author

bcmills commented Dec 18, 2015

That isn't portable: on 64-bit platforms it only expresses sizes up to 2GB. (And you can't work around it by increasing the constant because then it doesn't work on 32-bit platforms.)

It also seems likely to interact poorly with analysis tools: it temporarily produces a pointer to an "array" with elements in invalid memory.

@minux
Copy link
Member

minux commented Dec 18, 2015 via email

@bcmills
Copy link
Contributor Author

bcmills commented Dec 18, 2015

  1. Expecting CGo users in general to come up with those constants on their own is fairly user-hostile, when we could just provide a function that does it correctly.

  2. Where in the spec does it say that pointers to arrays with arbitrary lengths are valid? gc is not the spec, and gc is also not the only Go implementation. (One could easily envision a compiler that, say, sanity-checks that all pointers to arrays point to addresses that are mapped in the program's address space.)

@bcmills
Copy link
Contributor Author

bcmills commented Dec 18, 2015

Also, those constants aren't even correct on amd64:

$ cat test/foo.go

package main

import (
        "fmt"
        "reflect"
        "unsafe"
)

func main() {
        s := []byte("Hello, ⚛")
        n := len(s)
        p := reflect.ValueOf(s).Pointer()

        s2 := (*[1<<(^uint(0)>>27&63)/unsafe.Sizeof([]byte{}) - 1]byte)(p)[:n:n]
        fmt.Println(string(s2))
}
$ go run test/foo.go
# command-line-arguments
test/foo.go:14: type [384307168202282324]byte larger than address space
test/foo.go:14: cannot convert p (type uintptr) to type *[384307168202282324]byte

That comes from here:
https://tip.golang.org/src/cmd/compile/internal/gc/align.go?h=address+space

To me, the fact that even you - someone deeply familiar with the Go language and its implementations - couldn't get those constants right strongly suggests that we should not expect more casual cgo users to do so.

@randall77
Copy link
Contributor

I agree that we should probably find a solution to this problem. We want to make cgo easier & safer.

Using SliceHeader is ok, that "cannot be used safely" warning is aimed at people trying to store Go pointers in its Data field. Storing C pointers in the Data field should be ok. The rest of the warning is basically the same as the one for unsafe.Pointer.
It's still tricky, though, you'd have to do:

h := reflect.SliceHeader{uintptr(p), n, n}
s := *(*[]T)(unsafe.Pointer(&h))

(In an ideal world, SliceHeader.Data should be an unsafe.Pointer, not a uintptr. Too late now.)

I'm hesitant to put the fix in reflect, it is too heavyweight. Maybe just something in package unsafe?

// MakeSlice makes a slice out of ptr/len/cap and stores it to s.
func MakeSlice(s, ptr unsafe.Pointer, len, cap int)

@bcmills
Copy link
Contributor Author

bcmills commented Dec 18, 2015

Using SliceHeader is ok, that "cannot be used safely" warning is aimed at people trying to store Go pointers in its Data field. Storing C pointers in the Data field should be ok.

If that's the case, then just a doc fix might be sufficient. (I suggested a similar approach in some code I was reviewing and @ianlancetaylor said not to do it.)

I'm hesitant to put the fix in reflect, it is too heavyweight. Maybe just something in package unsafe?

Maybe? But unsafe doesn't otherwise have anything to do with built-in data structures like slices, and it's not entirely clear to me how you'd get from that to a value of slice type. Would you convert "s" to a pointer-to-slice, or to some other type (e.g. the slice itself)? If it's a pointer-to-slice, is the slice header allocated on the Go heap?

reflect.Value.Interface() at least makes the answers to those questions consistent, which is why it seems like a more natural fit to me - and SliceAt isn't that far off from the existing reflect.MakeSlice and reflect.NewAt.

@bcmills
Copy link
Contributor Author

bcmills commented Dec 18, 2015

In particular, this snippet of existing reflect code would be equivalent to SliceAt modulo the memory leakage for the reflect.ArrayOf call:

    func SliceAt(typ reflect.Type, len, cap int, p unsafe.Pointer) reflect.Value {
      return reflect.NewAt(reflect.ArrayOf(len, typ.Elem()), p).Elem().Slice(0, len)
    }

@randall77
Copy link
Contributor

You'd do something like this:

var s []T
unsafe.MakeSlice(unsafe.Pointer(&s), ptr, len, cap)
...use s...

Doing it this way gets around the polymorphic return type you would need if MakeSlice returned a slice, or the allocation you would need if MakeSlice returned an unsafe.Pointer to a slice.

I agree that this is functionality that unsafe hasn't had historically. But I think it would be the right place for this operation. On the other hand, reflect.NewAt has a similar feel, and maybe both it and MakeSlice should be together.

@ianlancetaylor
Copy link
Contributor

I don't see how to use reflect.SliceHeader safely unless we exempt it from the Go 1 guarantee or promise that we will never change the representation of slices. Since in the past we've discussed changing the representation of slices, it seems unwise to commit to the current representation for all time and for all implementations. Same for reflect.StringHeader, of course.

@randall77
Copy link
Contributor

To use reflect.SliceHeader you need to use unsafe, and unsafe is exempt from the Go 1 guarantee.

@oec
Copy link

oec commented Dec 21, 2015

FWIW, this is not just a cgo related issue, it already occurs in some of the syscal-wrappers. For example, an implementation of syscall.Mmap needs to do basically the same construction. It would be helpful if there is an API - even under unsafe - available that hides the implementation details and memory-layout for slices away. (Assuming the API doesn't need to change or can be augmented with an refined version).

@rsc rsc modified the milestones: Go1.8, Go1.7 May 18, 2016
@quentinmit quentinmit added the NeedsFix The path to resolution is known, but the work has not been done. label Oct 10, 2016
@rsc rsc added NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. and removed NeedsFix The path to resolution is known, but the work has not been done. labels Oct 13, 2016
@rsc
Copy link
Contributor

rsc commented Oct 13, 2016

This doesn't seem exactly what reflect is for. And it's not specific to cgo either: this comes up in system calls like mmap. I could see possibly adding something like unsafe.Slice(p, n, m) with the right types (p is pointer to T, result is []T). That would be a language change.

I guess we could add reflect.UnsafeSlice(t, p, n, m) to return an interface{} or a Value holding a slice of type t with pointer p, len n, cap m. It's not really reflect but maybe it's fine. I guess it's more reflect than Swapper, although that's not a great argument.

Personally, I don't mind writing *(*[1<<30]T)(unsafe.Pointer(p))[:n:m].

@rsc
Copy link
Contributor

rsc commented Oct 13, 2016

I think it's too late to do this for Go 1.8 at least.

@rsc rsc modified the milestones: Go1.9, Go1.8, Go1.9Early Oct 13, 2016
@bradfitz
Copy link
Contributor

@bcmills, are you still interested in this?

@bcmills
Copy link
Contributor Author

bcmills commented Mar 17, 2017

Yes. (But I'm not sure whether I'll have the bandwidth to implement it for 1.9.)

@rsc
Copy link
Contributor

rsc commented May 22, 2017

It's #19367 that I'm thinking of. That's on hold for Go 2, so probably this should be too. (That would address this exact need.)

@rsc rsc added the v2 A language change or incompatible library change label May 22, 2017
@bcmills
Copy link
Contributor Author

bcmills commented May 22, 2017

FWIW, I have implemented a library locally using reflect and based on the comments above.
Oddly, it appears that reflect does not enforce the same limits on array sizes as the compiler.

My solution for reflect would be roughly:

// SliceAt returns a view of the memory at p as a slice of elem.
// The elem parameter is the element type of the slice, not the complete slice type.
func SliceAt(elem Type, p unsafe.Pointer, n int) Value {
	if p == nil && n == 0 {
		return Zero(SliceOf(elem))
	}
	return NewAt(bigArrayOf(elem), p).Elem().Slice3(0, n, n)
}

// bigArrayOf returns the type of a maximally-sized array of t.
//
// This works around the memory-stranding issue described in
// https://golang.org/issue/13656 by producing only one array type per element
// type (instead of one array type per length).
func bigArrayOf(t Type) Type {
	n := ^uintptr(0) / uintptr(t.Size())
	const maxInt = uintptr(^uint(0) >> 1)
	if n > maxInt {
		n = maxInt
	}
	return ArrayOf(int(n), t)
}

[Edit: based on #19367 (comment), I would recommend a somewhat different approach, available as github.com/bcmills/unsafeslice.SetAt.]

@jimmyfrasche
Copy link
Member

The problem is having to write (*[C]T)(unsafe.Pointer(v))[:n:n], for some C, T, v, and n.

A simpler way to write that entire expression would be ideal.

If that can't be fixed before Go2, perhaps an easier problem could be.

A lot of people much smarter than I am have written the constant C many different ways in this thread. It's the most magic part of the expression and the part that I am the least confident I am writing (well, copy-and-pasting) correctly.

If cgo (or runtime or somewhere appropriate) exported a constant with the correct value of C that is the largest possible C for the current platform that would, at least, make it easier to write portable, correct code.

I would much rather a solution to the whole problem be found and am only suggesting this if there is no better solution that can be implemented before Go2.

@ianlancetaylor
Copy link
Contributor

@jimmyfrasche It's an interesting idea but the maximum value for C depends on the size of T.

@jimmyfrasche
Copy link
Member

@ianlancetaylor if it's the maximum for the platform assuming a size of 1 you could use it to compute the correct value for arbitrary T. Still not pretty and far from ideal, but its use could be documented and explained on the wiki (which currently just has 1 << 30 without explanation now https://github.com/golang/go/wiki/cgo#turning-c-arrays-into-go-slices )

@bradfitz bradfitz added early-in-cycle A change that should be done early in the 3 month dev cycle. and removed early-in-cycle A change that should be done early in the 3 month dev cycle. labels Jun 14, 2017
@bradfitz bradfitz modified the milestones: Go1.10Early, Go1.10 Jun 14, 2017
@rsc
Copy link
Contributor

rsc commented Jun 16, 2017

Going to close this since it seems clear that anything we do in reflect is going to be awful. Better to wait for #19367.

@gobwas
Copy link

gobwas commented Aug 26, 2019

Hi @rsc @ianlancetaylor, I am also curious about the specification of the GC behavior when it does not interacts with slices/strings which Data field points to the C-allocated memory.

I have only found this paragraph from the cgo wiki.

This question relates this comment by @bcmills :

Where in the spec does it say that pointers to arrays with arbitrary lengths are valid? gc is not the spec, and gc is also not the only Go implementation. (One could easily envision a compiler that, say, sanity-checks that all pointers to arrays point to addresses that are mapped in the program's address space.)

Thanks!

@ianlancetaylor
Copy link
Contributor

@gobwas The issue tracker is not the right place for this kind of discussion, especially not on a closed issue. Please use a forum, probably golang-nuts. See https://golang.org/wiki/Questions. Thanks.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
early-in-cycle A change that should be done early in the 3 month dev cycle. FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. v2 A language change or incompatible library change
Projects
None yet
Development

No branches or pull requests