| Instruction | Description |
| --- | --- |
| MPSADBW | Compute eight offset sums of absolute differences, each over four byte pairs (i.e., \|x0−y0\|+\|x1−y1\|+\|x2−y2\|+\|x3−y3\|, \|x0−y1\|+\|x1−y2\|+\|x2−y3\|+\|x3−y4\|, ..., \|x0−y7\|+\|x1−y8\|+\|x2−y9\|+\|x3−y10\|); this operation is important for some HD codecs, and allows an 8×8 block difference to be computed in fewer than seven cycles.[9] One bit of a three-bit immediate operand indicates whether y0..y10 or y4..y14 should be used from the destination operand; the other two bits indicate whether x0..x3, x4..x7, x8..x11 or x12..x15 should be used from the source. |
| PHMINPOSUW | Sets the bottom unsigned 16-bit word of the destination to the smallest unsigned 16-bit word in the source, and the next-from-bottom word to the index of that word in the source. |
| PMULDQ | Packed 32-bit signed "long" multiplication: two (the 1st and 3rd) of the four packed integers are multiplied, giving two packed 64-bit results. |
| PMULLD | Packed 32-bit signed "low" multiplication: four packed pairs of integers are multiplied, giving four packed 32-bit results. |
| DPPS, DPPD | Dot product for AOS (Array of Structs) data. This takes an immediate operand consisting of four (or two for DPPD) bits to select which of the entries in the input to multiply and accumulate, and another four (or two for DPPD) bits to select whether to put 0 or the dot product in the appropriate field of the output. |
| BLENDPS, BLENDPD, BLENDVPS, BLENDVPD, PBLENDVB, PBLENDW | Conditional copying of elements from one location to another, based (for the non-V forms) on the bits in an immediate operand, and (for the V forms) on the bits in register XMM0. |
| PMINSB, PMAXSB, PMINUW, PMAXUW, PMINUD, PMAXUD, PMINSD, PMAXSD | Packed minimum/maximum for different integer operand types. |
| ROUNDPS, ROUNDSS, ROUNDPD, ROUNDSD | Round values in a floating-point register to integers, using one of four rounding modes specified by an immediate operand. |
| INSERTPS, PINSRB, PINSRD/PINSRQ, EXTRACTPS, PEXTRB, PEXTRD/PEXTRQ | The INSERTPS and PINSR instructions read 8, 16 or 32 bits from an x86 register or memory location and insert them into a field in the destination register given by an immediate operand. EXTRACTPS and PEXTR read a field from the source register and store it in an x86 register or memory location. For example, PEXTRD eax, xmm0, 1; EXTRACTPS [addr+4*eax], xmm1, 1 stores field 1 (counting from zero) of xmm1 at the address addr+4*eax, where eax holds field 1 of xmm0. |
| PMOVSXBW, PMOVZXBW, PMOVSXBD, PMOVZXBD, PMOVSXBQ, PMOVZXBQ, PMOVSXWD, PMOVZXWD, PMOVSXWQ, PMOVZXWQ, PMOVSXDQ, PMOVZXDQ | Packed sign/zero extension to wider types. |
| PTEST | This is similar to the TEST instruction, in that it sets the Z flag according to the result of an AND between its operands: ZF is set if DEST AND SRC is equal to 0. Additionally, it sets the C flag if (NOT DEST) AND SRC equals zero. This is equivalent to setting the Z flag if none of the bits masked by SRC are set, and the C flag if all of the bits masked by SRC are set. |
| PCMPEQQ | Quadword (64-bit) compare for equality. |
| PACKUSDW | Convert signed DWORDs into unsigned WORDs with saturation. |
| MOVNTDQA | Efficient read from write-combining memory area into SSE register; this is useful for retrieving results from peripherals attached to the memory bus. |
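
The MPSADBW entry above can be illustrated with a scalar model. The following is a minimal Python sketch, not the instruction itself: the function name `mpsadbw` and the byte-string representation of the registers are assumptions made for illustration, and the placement of the selector bits (bit 2 for the destination offset, bits 1:0 for the source block) follows the usual reading of Intel's documentation rather than anything stated in the table.

```python
def mpsadbw(dest, src, imm):
    """Scalar model of MPSADBW on two 16-byte operands (illustrative only).

    dest and src are bytes objects of length 16, byte 0 being the lowest byte.
    Returns the eight 16-bit sums, lowest word first.
    """
    assert len(dest) == 16 and len(src) == 16
    # Assumed bit layout: bit 2 picks y0..y10 or y4..y14 from the destination.
    y_off = 4 * ((imm >> 2) & 1)
    # Assumed bit layout: bits 1:0 pick x0..x3, x4..x7, x8..x11 or x12..x15.
    x_off = 4 * (imm & 3)
    x = src[x_off:x_off + 4]
    y = dest[y_off:y_off + 11]
    # Eight sums, each over four byte pairs, sliding one byte along y.
    return [sum(abs(x[i] - y[k + i]) for i in range(4)) for k in range(8)]
```

Each of the eight results compares the same four source bytes against a window of the destination that slides by one byte, which is why eleven destination bytes are read.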
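
The PHMINPOSUW entry above can be modeled in a similar way. This is a hedged Python sketch under stated assumptions: the name `phminposuw` and the list-of-words representation are hypothetical, and two details go beyond the table text and reflect the usual reading of Intel's documentation: ties resolve to the lowest index, and the remaining words of the destination are zeroed.

```python
def phminposuw(src):
    """Scalar model of PHMINPOSUW (illustrative only).

    src is a list of eight unsigned 16-bit words, src[0] being the bottom word.
    Returns the eight words of the destination, bottom word first.
    """
    assert len(src) == 8 and all(0 <= w <= 0xFFFF for w in src)
    smallest = min(src)
    index = src.index(smallest)  # assumption: the lowest index wins on ties
    # Word 0 holds the minimum, word 1 its index; the rest are assumed zero.
    return [smallest, index, 0, 0, 0, 0, 0, 0]
```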
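
For the DPPS entry above, the role of the two four-bit selectors can be sketched as follows. This is an illustrative Python model only: the name `dpps` is hypothetical, Python floats are double precision rather than the single precision the instruction works on, and the assignment of the high four immediate bits to input selection and the low four to output selection is taken from the usual reading of Intel's documentation, not from the table.

```python
def dpps(a, b, imm):
    """Scalar model of DPPS on two 4-element vectors (illustrative only)."""
    assert len(a) == 4 and len(b) == 4
    in_mask = (imm >> 4) & 0xF   # assumed: which products enter the sum
    out_mask = imm & 0xF         # assumed: which output fields get the result
    dot = sum(a[i] * b[i] for i in range(4) if (in_mask >> i) & 1)
    # Unselected output fields are set to 0.
    return [dot if (out_mask >> i) & 1 else 0.0 for i in range(4)]
```

With an immediate of 0x71, for instance, only the first three element pairs are multiplied and the sum lands in the lowest output field, which suits a three-component vector stored in a four-element structure.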
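
The flag rules in the PTEST entry above can be checked with a small model. The sketch below is a minimal illustration, assuming the 128-bit operands are represented as non-negative Python integers; the name `ptest` and the `MASK128` constant are hypothetical helpers, not part of any library.

```python
MASK128 = (1 << 128) - 1  # keeps NOT DEST within 128 bits

def ptest(dest, src):
    """Return (ZF, CF) as described for PTEST (illustrative only)."""
    assert 0 <= dest <= MASK128 and 0 <= src <= MASK128
    zf = (dest & src) == 0              # none of the bits masked by SRC are set
    cf = (~dest & src & MASK128) == 0   # all of the bits masked by SRC are set
    return zf, cf
```

When SRC is zero both conditions hold trivially, so both flags are set.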