proposal: simd/m128: single instruction multiple data API #60149

Open
qiulaidongfeng opened this issue May 12, 2023 · 10 comments
@qiulaidongfeng
Contributor

qiulaidongfeng commented May 12, 2023

Background

Issue #53171, which proposed adding advanced APIs that use SIMD instructions, was discussed and then closed for lack of a clear plan.

See also #53171 (comment)

Proposal

Inspired by C++ intrinsic functions, which use specific types (such as __m128) to represent data that can be used in single instruction multiple data (SIMD) operations, a simd/m128 package can be added to provide a 128-bit SIMD API.

The proposed API is as follows:

// Package m128 provides a 128-bit single instruction multiple data (SIMD) API.
package m128

// M128 represents n values of type T; the total size of M128 is 128 bits.
// n=unsafe.Sizeof(M128)/unsafe.Sizeof(T)
type M128[T Num] struct { /* 128 bits; fields are not specified in this proposal */ }

// Num is the set of element types that SIMD operations can work on.
type Num interface {
	int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | float32 | float64
}

// Set sets the value of type T stored at byte offset offset in M128.
// The value of offset must be a multiple of unsafe.Sizeof(T) and less than unsafe.Sizeof(M128).
func (mem *M128[T]) Set(offset int8, value T)

// Get returns the value of type T stored at byte offset offset in M128.
// The value of offset must be a multiple of unsafe.Sizeof(T) and less than unsafe.Sizeof(M128).
func (mem *M128[T]) Get(offset int8) T

// Store copies all data from src into the memory at unsafe.Add(unsafe.Pointer(mem), offset).
// The value of offset must be a multiple of unsafe.Sizeof(T) and less than unsafe.Sizeof(M128).
// offset+len(src)*unsafe.Sizeof(T) must be less than unsafe.Sizeof(M128).
func (mem *M128[T]) Store(offset int8, src []T)

// Load copies the M128 data starting at offset into dst.
// The value of offset must be a multiple of unsafe.Sizeof(T) and less than unsafe.Sizeof(M128).
// offset+len(dst)*unsafe.Sizeof(T) must be less than unsafe.Sizeof(M128).
func (mem *M128[T]) Load(offset int8, dst []T)

// Add treats a and b as n values of type T and performs SIMD addition.
// n=unsafe.Sizeof(M128)/unsafe.Sizeof(T)
func Add[T Num](a, b M128[T]) M128[T]

// Sub treats a and b as n values of type T and performs SIMD subtraction.
// n=unsafe.Sizeof(M128)/unsafe.Sizeof(T)
func Sub[T Num](a, b M128[T]) M128[T]

// Mul treats a and b as n values of type T and performs SIMD multiplication.
// n=unsafe.Sizeof(M128)/unsafe.Sizeof(T)
func Mul[T Num](a, b M128[T]) M128[T]

// Div treats a and b as n values of type T and performs SIMD division.
// n=unsafe.Sizeof(M128)/unsafe.Sizeof(T)
func Div[T Num](a, b M128[T]) M128[T]

According to #53171 (comment) and #53171 (comment), the compiler arranges an appropriate implementation for the above APIs.

For ease of use, the SIMD operations are functions rather than methods.
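To make the intended semantics concrete, here is a minimal scalar sketch of Set, Get and Add. It is only a model, not the proposed implementation: the `words` field, the `lanes` and `index` helpers, and the choice to panic on a bad offset are all assumptions made for illustration, and a real version would be lowered to SIMD instructions by the compiler.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Num mirrors the constraint from the proposal.
type Num interface {
	int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | float32 | float64
}

// M128 is a hypothetical scalar stand-in for the proposed type.
// Two uint64 words give 128 bits with alignment suitable for every T in Num.
type M128[T Num] struct{ words [2]uint64 }

// lanes (hypothetical helper) views the 16 bytes as n = 16/unsafe.Sizeof(T) values.
func (mem *M128[T]) lanes() []T {
	var zero T
	n := int(unsafe.Sizeof(mem.words) / unsafe.Sizeof(zero))
	return unsafe.Slice((*T)(unsafe.Pointer(&mem.words)), n)
}

// index (hypothetical helper) converts a byte offset to a lane index.
// Panicking on an invalid offset is an assumption; the proposal does not say.
func (mem *M128[T]) index(offset int8) int {
	var zero T
	size := int8(unsafe.Sizeof(zero))
	if offset < 0 || offset >= int8(unsafe.Sizeof(mem.words)) || offset%size != 0 {
		panic("m128: invalid offset")
	}
	return int(offset / size)
}

// Set stores value in the lane at byte offset offset.
func (mem *M128[T]) Set(offset int8, value T) { mem.lanes()[mem.index(offset)] = value }

// Get returns the lane at byte offset offset.
func (mem *M128[T]) Get(offset int8) T { return mem.lanes()[mem.index(offset)] }

// Add adds lane by lane in plain Go.
func Add[T Num](a, b M128[T]) M128[T] {
	var out M128[T]
	x, y, z := a.lanes(), b.lanes(), out.lanes()
	for i := range z {
		z[i] = x[i] + y[i]
	}
	return out
}

func main() {
	var a, b M128[int32]
	for i := int8(0); i < 4; i++ {
		a.Set(4*i, int32(i))    // lanes 0, 1, 2, 3
		b.Set(4*i, int32(10*i)) // lanes 0, 10, 20, 30
	}
	sum := Add(a, b)
	fmt.Println(sum.Get(0), sum.Get(4), sum.Get(8), sum.Get(12)) // 0 11 22 33
}
```

Note that the offsets passed to Set and Get are byte offsets, so for int32 the four lanes live at 0, 4, 8 and 12.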

@gopherbot gopherbot added this to the Proposal milestone May 12, 2023
@seankhliao seankhliao changed the title proposal: New package simd/m128, adding single instruction multiple data API proposal: simd/m128: single instruction multiple data API May 12, 2023
@merykitty

Some notes:

  • A vector is a value, not unlike an int, so it is illogical to change it: you can change a variable containing a value, but the value itself is immutable. As a result, the Set method should be Set(vec M128[T], offset int, value T) M128[T].

  • In addition, the most basic operations on a vector are loading and storing it, so there should be Store(dst []T, offset int, vec M128[T]) and Load(src []T, offset int) M128[T].
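The value-semantics point can be illustrated with a small scalar sketch. It uses the suggested Set signature, but the backing `words` field, the `lanes` helper and the `Get` function are hypothetical, and treating `offset` as an element index (rather than a byte offset) is an assumption.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Num mirrors the constraint from the proposal.
type Num interface {
	int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | float32 | float64
}

// M128 is a hypothetical scalar stand-in: 128 bits of storage.
type M128[T Num] struct{ words [2]uint64 }

// lanes (hypothetical helper) views the storage as 16/unsafe.Sizeof(T) values.
func lanes[T Num](vec *M128[T]) []T {
	var zero T
	n := int(unsafe.Sizeof(vec.words) / unsafe.Sizeof(zero))
	return unsafe.Slice((*T)(unsafe.Pointer(&vec.words)), n)
}

// Set returns a copy of vec with one element replaced.
// vec is passed by value, so the caller's vector is never modified.
// Assumption: offset counts elements, not bytes.
func Set[T Num](vec M128[T], offset int, value T) M128[T] {
	lanes(&vec)[offset] = value
	return vec
}

// Get (hypothetical) returns the element at index offset.
func Get[T Num](vec M128[T], offset int) T {
	return lanes(&vec)[offset]
}

func main() {
	var x M128[int32]
	y := Set(x, 1, 42)
	fmt.Println(Get(x, 1), Get(y, 1)) // 0 42
}
```

This mirrors `x = y ^ mask` on integers: the operand is left alone and the result is a new value.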

@ianlancetaylor
Contributor

// The value of offset must be a multiple of 8 and less than unsafe.Sizeof(M128)

I'm not sure I understand this. You say that the total length of M128 is always 128 bits, so we know that unsafe.Sizeof(M128) is 16. So if offset has to be a multiple of 8 then it can only be 0 or 8. Is that what you mean?

@qiulaidongfeng
Contributor Author

// The value of offset must be a multiple of 8 and less than unsafe.Sizeof(M128)

I'm not sure I understand this. You say that the total length of M128 is always 128 bits, so we know that unsafe.Sizeof(M128) is 16. So if offset has to be a multiple of 8 then it can only be 0 or 8. Is that what you mean?

This was a flaw in the proposal; I have revised the statement to the following:

The value of offset must be a multiple of unsafe.Sizeof(T) and less than unsafe.Sizeof(M128)
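Under the revised rule, the set of legal offsets depends on the element type. The sketch below enumerates them; `validOffsets` is a hypothetical helper written for this illustration, and the constant 16 stands for unsafe.Sizeof(M128).

```go
package main

import (
	"fmt"
	"unsafe"
)

// Num mirrors the constraint from the proposal.
type Num interface {
	int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | float32 | float64
}

// validOffsets (hypothetical) lists every byte offset that is a multiple of
// unsafe.Sizeof(T) and less than 16, the size in bytes of a 128-bit vector.
func validOffsets[T Num]() []int8 {
	var zero T
	size := int8(unsafe.Sizeof(zero))
	var offsets []int8
	for off := int8(0); off < 16; off += size {
		offsets = append(offsets, off)
	}
	return offsets
}

func main() {
	fmt.Println(validOffsets[int16]())   // [0 2 4 6 8 10 12 14]
	fmt.Println(validOffsets[int32]())   // [0 4 8 12]
	fmt.Println(validOffsets[float64]()) // [0 8]
}
```

So only 8-byte element types are limited to offsets 0 and 8; narrower types get one offset per lane.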

@qiulaidongfeng
Contributor Author

qiulaidongfeng commented May 13, 2023

Some notes:

  • A vector is a value, not unlike an int, so it is illogical to change it: you can change a variable containing a value, but the value itself is immutable. As a result, the Set method should be Set(vec M128[T], offset int, value T) M128[T].
  • In addition, the most basic operations on a vector are loading and storing it, so there should be Store(dst []T, offset int, vec M128[T]) and Load(src []T, offset int) M128[T].

1. M128 represents n values of type T, and the total size of M128 is 128 bits.

func (mem *M128[T]) Set(offset int8, value T)

This changes just one of those n values of type T.

2. Regarding the API for loading and storing, taking Store as an example:

Store(dst []T, offset int, vec M128[T])

can be written as:

// Copy each element of dst into vec, one lane at a time.
var zero T
for i := 0; i < len(dst); i++ {
	vec.Set(int8(offset)+int8(i)*int8(unsafe.Sizeof(zero)), dst[i])
}

So it can be regarded as syntactic sugar for calling the Set or Get method repeatedly.

Rethinking

To avoid needing the unsafe package when calling Set repeatedly, the proposal now adds:

// Store sets M128 data to dst starting from offset.
// The value of offset must be a multiple of unsafe.Sizeof(T) and less than unsafe.Sizeof(M128).
// offset+len(dst)*unsafe.Sizeof(T) must be less than unsafe.Sizeof(M128).
func (mem *M128[T]) Store(dst []T, offset int)

@merykitty, I don't quite understand the semantics of the Load(src []T, offset int) M128[T] you wrote. If it uses the data in src to produce an M128[T], that is what Store does.

@merykitty

merykitty commented May 13, 2023

1. In the vector world, to change an element you take a vector and return a new vector with one element changed. This is similar to how you flip a bit in an int: x = y ^ mask leaves y unchanged and returns a new int with some bits flipped, and this new int is stored into the variable x. That's why the function I propose is func Set(vec M128[T], offset int, value T) M128[T]. The signature of a corresponding C++ intrinsic is __m256i _mm256_insert_epi32(__m256i a, __int32 i, const int index), and this is also how the machine works: vpinsrd dst, src, val, idx puts into dst a vector similar to src with the element at idx changed to val.

2. How do you use Store to load a vector when you need a vector to invoke Store? Please provide a simple example, such as adding two slices into a third, using this API.
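For reference, the slice-addition example might look like the following under the function-style Load and Store signatures suggested earlier in this thread. This is a scalar sketch only: the `words` field, the `lanes` helper and `AddSlices` are hypothetical, `offset` is assumed to be an element index into the slice, and a and b are assumed to be at least as long as dst.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Num mirrors the constraint from the proposal.
type Num interface {
	int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | float32 | float64
}

// M128 is a hypothetical scalar stand-in: 128 bits of storage.
type M128[T Num] struct{ words [2]uint64 }

// lanes (hypothetical helper) views the storage as 16/unsafe.Sizeof(T) values.
func lanes[T Num](vec *M128[T]) []T {
	var zero T
	n := int(unsafe.Sizeof(vec.words) / unsafe.Sizeof(zero))
	return unsafe.Slice((*T)(unsafe.Pointer(&vec.words)), n)
}

// Load builds a vector from one vector's worth of elements of src,
// starting at element index offset. Slice bounds are checked by the runtime.
func Load[T Num](src []T, offset int) M128[T] {
	var vec M128[T]
	l := lanes(&vec)
	copy(l, src[offset:offset+len(l)])
	return vec
}

// Store writes every lane of vec to dst, starting at element index offset.
func Store[T Num](dst []T, offset int, vec M128[T]) {
	l := lanes(&vec)
	copy(dst[offset:offset+len(l)], l)
}

// Add adds lane by lane in plain Go.
func Add[T Num](a, b M128[T]) M128[T] {
	var out M128[T]
	x, y, z := lanes(&a), lanes(&b), lanes(&out)
	for i := range z {
		z[i] = x[i] + y[i]
	}
	return out
}

// AddSlices (hypothetical) computes dst[i] = a[i] + b[i], one vector at a
// time, then finishes the leftover elements with a scalar loop.
func AddSlices[T Num](dst, a, b []T) {
	var zero T
	n := 16 / int(unsafe.Sizeof(zero)) // lanes per vector
	i := 0
	for ; i+n <= len(dst); i += n {
		Store(dst, i, Add(Load(a, i), Load(b, i)))
	}
	for ; i < len(dst); i++ {
		dst[i] = a[i] + b[i]
	}
}

func main() {
	a := []float32{1, 2, 3, 4, 5, 6}
	b := []float32{10, 20, 30, 40, 50, 60}
	dst := make([]float32, len(a))
	AddSlices(dst, a, b)
	fmt.Println(dst) // [11 22 33 44 55 66]
}
```

With float32, the first four elements go through one vector Add and the last two through the scalar tail.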

@qiulaidongfeng
Contributor Author

2. How do you use Store to load a vector when you need a vector to invoke Store? Please provide a simple example, such as adding two slices into a third, using this API.

It seems I misunderstood you. I thought Load read a []T out of an M128, but what you meant seems to be loading a []T into an M128.

@qiulaidongfeng
Contributor Author

1. In the vector world, to change an element you take a vector and return a new vector with one element changed. This is similar to how you flip a bit in an int: x = y ^ mask leaves y unchanged and returns a new int with some bits flipped, and this new int is stored into the variable x. That's why the function I propose is func Set(vec M128[T], offset int, value T) M128[T]. The signature of a corresponding C++ intrinsic is __m256i _mm256_insert_epi32(__m256i a, __int32 i, const int index), and this is also how the machine works: vpinsrd dst, src, val, idx puts into dst a vector similar to src with the element at idx changed to val.

M128[T] is just 128 bits of contiguous memory that can be used for SIMD operations, holding n values of type T.

func (mem *M128[T]) Set(offset int8, value T) is equivalent to the following code:

var zero T
size := int8(unsafe.Sizeof(zero))
// Reject offsets outside the 16-byte vector or not a multiple of unsafe.Sizeof(T).
if offset < 0 || offset >= int8(unsafe.Sizeof(*mem)) || offset%size != 0 {
	panic("m128: invalid offset") // or runtime.throw
}
ptr := unsafe.Pointer(mem)
ptr = unsafe.Add(ptr, offset)
*(*T)(ptr) = value

@qiulaidongfeng
Contributor Author

After consideration, the proposal adds two APIs for loading and storing an M128[T] using a []T:

// Store copies all data from src into the memory at unsafe.Add(unsafe.Pointer(mem), offset).
// The value of offset must be a multiple of unsafe.Sizeof(T) and less than unsafe.Sizeof(M128).
// offset+len(src)*unsafe.Sizeof(T) must be less than unsafe.Sizeof(M128).
func (mem *M128[T]) Store(offset int8, src []T)

// Load copies the M128 data starting at offset into dst.
// The value of offset must be a multiple of unsafe.Sizeof(T) and less than unsafe.Sizeof(M128).
// offset+len(dst)*unsafe.Sizeof(T) must be less than unsafe.Sizeof(M128).
func (mem *M128[T]) Load(offset int8, dst []T)
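A scalar sketch of these two methods, following the direction given in the comments above (Store copies from the slice into the vector, Load copies from the vector into the slice). The `words` field and the `window` helper are hypothetical. One assumption differs from the comments as written: the sketch lets offset+len*unsafe.Sizeof(T) equal unsafe.Sizeof(M128), since a strict "less than" would make it impossible to fill the whole vector.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Num mirrors the constraint from the proposal.
type Num interface {
	int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | float32 | float64
}

// M128 is a hypothetical scalar stand-in: 128 bits of storage.
type M128[T Num] struct{ words [2]uint64 }

// window (hypothetical helper) returns count lanes starting at byte offset
// offset, panicking if the range is misaligned or leaves the vector.
// Assumption: a range ending exactly at byte 16 is allowed.
func (mem *M128[T]) window(offset int8, count int) []T {
	var zero T
	size := int(unsafe.Sizeof(zero))
	off := int(offset)
	if off < 0 || off >= 16 || off%size != 0 || off+count*size > 16 {
		panic("m128: range out of bounds")
	}
	all := unsafe.Slice((*T)(unsafe.Pointer(&mem.words)), 16/size)
	return all[off/size : off/size+count]
}

// Store copies all of src into the vector, starting at byte offset offset.
func (mem *M128[T]) Store(offset int8, src []T) {
	copy(mem.window(offset, len(src)), src)
}

// Load fills dst with vector data, starting at byte offset offset.
func (mem *M128[T]) Load(offset int8, dst []T) {
	copy(dst, mem.window(offset, len(dst)))
}

func main() {
	var v M128[int16]
	v.Store(4, []int16{7, 8, 9}) // lanes 2, 3, 4 become 7, 8, 9

	out := make([]int16, 2)
	v.Load(6, out) // reads lanes 3 and 4
	fmt.Println(out) // [8 9]
}
```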

@cristaloleg

Should this issue be added to the review minutes?
