CUDASIMDTypes.jl
CUDASIMDTypes.BFloat16x2 — Type
struct BFloat16x2
A SIMD type holding 2 BFloat16 in a combined 32-bit value.
CUDASIMDTypes.Float16x2 — Type
struct Float16x2
A SIMD type holding 2 Float16 in a combined 32-bit value.
CUDASIMDTypes.Int16x2 — Type
struct Int16x2
A SIMD type holding 2 16-bit integers in a combined 32-bit value.
CUDASIMDTypes.Int2x16 — Type
struct Int2x16
A SIMD type holding 16 2-bit integers in a combined 32-bit value.
CUDASIMDTypes.Int2x4 — Type
struct Int2x4
A SIMD type holding 4 2-bit integers in a combined 8-bit value.
CUDASIMDTypes.Int4x2 — Type
struct Int4x2
A SIMD type holding 2 4-bit integers in a combined 8-bit value.
CUDASIMDTypes.Int4x8 — Type
struct Int4x8
A SIMD type holding 8 4-bit integers in a combined 32-bit value.
CUDASIMDTypes.Int8x4 — Type
struct Int8x4
A SIMD type holding 4 8-bit integers in a combined 32-bit value.
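To make "4 8-bit integers in a combined 32-bit value" concrete, here is a minimal plain-Julia sketch of such a packing. The helpers `pack_int8x4` and `unpack_int8x4` are hypothetical and not part of CUDASIMDTypes; the lane order (lane 1 in the least-significant byte) is an assumption, and the actual layout used by `Int8x4` may differ.

```julia
# Hypothetical helpers, not part of CUDASIMDTypes.
# Assumption: lane 1 occupies the least-significant byte of the word.
function pack_int8x4(x1::Int8, x2::Int8, x3::Int8, x4::Int8)
    word = UInt32(0)
    for (i, x) in enumerate((x1, x2, x3, x4))
        # reinterpret keeps the two's-complement bit pattern of the lane
        word |= UInt32(reinterpret(UInt8, x)) << (8 * (i - 1))
    end
    return word
end

# Split the word back into its four signed 8-bit lanes.
function unpack_int8x4(word::UInt32)
    return ntuple(i -> reinterpret(Int8, UInt8((word >> (8 * (i - 1))) & 0xff)), 4)
end

packed = pack_int8x4(Int8(1), Int8(-1), Int8(2), Int8(-128))
lanes = unpack_int8x4(packed)
```

The same shift-and-mask idea would apply to the 2-bit, 4-bit, and 16-bit variants with a different lane width.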
CUDASIMDTypes.bitifelse — Method
bitifelse(cond, x, y)
Bitwise version of ifelse.
For each bit of the output, the respective bit in cond determines whether the respective bit of x or of y is selected.
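The per-bit selection described above can be sketched in plain Julia as follows. `bitifelse_ref` is a hypothetical reference function, not the package's implementation, and it assumes that a set bit in `cond` selects the bit from `x` (mirroring `ifelse`, where a true condition selects the first value).

```julia
# Hypothetical reference model, not the CUDASIMDTypes implementation.
# Assumption: a 1 bit in cond picks the bit from x, a 0 bit picks it from y.
bitifelse_ref(cond::T, x::T, y::T) where {T<:Unsigned} = (cond & x) | (~cond & y)

# The low 16 bits come from x, the high 16 bits from y.
result = bitifelse_ref(0x0000ffff, 0x12345678, 0x9abcdef0)
```

Under that assumption, `result` is `0x9abc5678`.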
CUDASIMDTypes.cvt_pack_s16 — Method
d = cvt_pack_s16(a::Int32, b::Int32)
d::UInt32
d[1] = sat(b)
d[2] = sat(a)
CUDASIMDTypes.cvt_pack_s8 — Method
d = cvt_pack_s8(a::Int32, b::Int32, c::UInt32)
d::UInt32
d[1] = sat(b)
d[2] = sat(a)
d[3] = c[1]
d[4] = c[2]
CUDASIMDTypes.cvt_pack_s8 — Method
d = cvt_pack_s8(a::Int32, b::Int32)
d::UInt32
d[1] = sat(b)
d[2] = sat(a)
d[3] = 0
d[4] = 0
CUDASIMDTypes.dp4a — Method
d = dp4a(a::UInt32, b::UInt32, c::Int32)
d::Int32
d = a[1] * b[1] + a[2] * b[2] + a[3] * b[3] + a[4] * b[4] + c
CUDASIMDTypes.lop3 — Method
lop3(a, b, c, lut)
Arbitrary logical operation on 3 inputs.
Call the PTX lop3 instruction. This computes a bitwise logical operation on the inputs a, b, and c.
See make_lop3_lut for creating the look-up table lut.
CUDASIMDTypes.make_lop3_lut — Method
make_lop3_lut(f)
Create a look-up table for lop3.
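As an illustration of how a look-up table can drive an arbitrary three-input bitwise operation, here is a plain-Julia sketch. `lop3_ref` and `make_lut_ref` are hypothetical reference functions, not the package's code. The sketch assumes the PTX convention, in which the bits of a, b, and c at each position form a 3-bit index into an 8-bit table and the table is obtained by evaluating f on the constants 0xf0, 0xcc, and 0xaa. Whether make_lop3_lut uses exactly this construction is an assumption.

```julia
# Hypothetical reference model of a LUT-driven three-input bitwise operation.
function lop3_ref(a::UInt32, b::UInt32, c::UInt32, lut::UInt8)
    d = UInt32(0)
    for i in 0:31
        # The bits of a, b, c at position i form an index 0..7 into the table.
        idx = (((a >> i) & 0x1) << 2) | (((b >> i) & 0x1) << 1) | ((c >> i) & 0x1)
        d |= ((UInt32(lut) >> idx) & 0x1) << i
    end
    return d
end

# Assumed construction (PTX convention): evaluate f on the three "truth table"
# constants so that bit idx of the result is f applied to the bits of idx.
make_lut_ref(f) = f(0xf0, 0xcc, 0xaa)

lut = make_lut_ref((a, b, c) -> (a & b) | c)
a, b, c = 0x0f0f3355, 0x00ff5a5a, 0x12345678
d = lop3_ref(a, b, c, lut)
```

With this convention, `d` equals `(a & b) | c` computed directly, and any other three-input bitwise function would work the same way by passing a different f.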
CUDASIMDTypes.prmt — Method
prmt(a, b, op)
Permute bytes from a pair of inputs.
Call the PTX prmt instruction. This picks four arbitrary bytes from the input values a and b.
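The byte selection can be modelled in plain Julia as sketched below. `prmt_ref` is a hypothetical reference function, not the package's implementation. It assumes the default mode of the PTX instruction: the eight source bytes are numbered 0 to 3 (from a) and 4 to 7 (from b), each of the four low nibbles of op selects one source byte for the corresponding output byte, and the top bit of a nibble requests replication of the selected byte's sign bit instead of a copy. The `UInt32` argument types are also an assumption.

```julia
# Hypothetical reference model of the default-mode byte permutation.
function prmt_ref(a::UInt32, b::UInt32, op::UInt32)
    bytes = (UInt64(b) << 32) | UInt64(a)    # bytes 0-3 from a, bytes 4-7 from b
    d = UInt32(0)
    for i in 0:3
        sel = (op >> (4 * i)) & 0xf           # selector nibble for output byte i
        byte = UInt32((bytes >> (8 * (sel & 0x7))) & 0xff)
        if (sel & 0x8) != 0
            # Sign replication: fill the output byte with the selected byte's sign bit.
            byte = (byte & 0x80) != 0 ? UInt32(0xff) : UInt32(0x00)
        end
        d |= byte << (8 * i)
    end
    return d
end

a = 0x33221100
b = 0x77665544
reversed = prmt_ref(a, b, 0x00000123)   # reverse the bytes of a
mixed = prmt_ref(a, b, 0x00005140)      # interleave bytes of a and b
```

Under these assumptions, `reversed` is `0x00112233` and `mixed` is `0x55114400`.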