Thursday, May 21, 2009

SSE2 Vs SSSE3

As you may already know, I have been busy learning Intel SIMD: MMX, SSE2, SSSE3 and so on. I am enjoying SIMD and playing with all these MMX and SSE versions. While working with SSSE3 after SSE2 and SSE3, I wondered: what is the advantage of SSSE3 over SSE2? Some people have even asked me why there is no dramatic change in performance after adding SSSE3. I knew the answer but thought I would do some more R&D on it.

And here is my view....

I will start with a brief intro to the SSE versions. Since I work in the video field, I will talk about integer operations only, as they are my primary concern for now.

SSE2 instructions are an extension of the SIMD model introduced with MMX technology and the SSE extensions. The key benefits of SSE2 are that the MMX integer instructions can now work on the eight 128-bit XMM registers (XMM0-XMM7; sixteen in 64-bit mode) as well as on the MMX registers (mm0-mm7), and that the floating-point instructions now support 64-bit (double-precision) values. So there was a huge change between MMX and SSE2. Because of the XMM registers, instead of playing with 8 bytes at a time we can play with 16 bytes simultaneously. In the best case that doubles the throughput of MMX assembly, and one instruction handles 16 bytes where byte-wise C code handles one. SSE2 also brings to the XMM registers some instructions that the original MMX set is missing, like pavgb/w (first added for MMX registers by SSE), pshufd (the 128-bit relative of SSE's pshufw) and the 128-bit integer moves movdqa/movdqu (movapd/movupd are their double-precision counterparts), while MMX instructions such as paddsb/w now process twice as much data. All of these are very helpful in video compression.
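To make the "16 bytes simultaneously" point concrete, here is a minimal sketch of averaging two rows of 16 pixels with pavgb through its SSE2 intrinsic `_mm_avg_epu8`. The function names `avg16_scalar` and `avg16_simd` are my own for illustration, and the scalar version doubles as a fallback when the compiler is not targeting SSE2.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Scalar reference: rounded average of each byte pair, (a + b + 1) >> 1. */
static void avg16_scalar(uint8_t *dst, const uint8_t *a, const uint8_t *b)
{
    for (int i = 0; i < 16; i++)
        dst[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
}

/* Same result for 16 pixels, using one PAVGB on XMM registers when SSE2 is available. */
static void avg16_simd(uint8_t *dst, const uint8_t *a, const uint8_t *b)
{
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a);  /* unaligned 16-byte load */
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)dst, _mm_avg_epu8(va, vb));
#else
    avg16_scalar(dst, a, b);  /* no SSE2: fall back to the scalar loop */
#endif
}
```

The SIMD path replaces the 16 loop iterations with two loads, one average and one store, which is where the gain over plain C comes from.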

SSSE3 (Supplemental Streaming SIMD Extensions 3), meanwhile, is an extension of SSE3, or I should say a revision of SSE3. In SSE3 (13 new instructions) the most notable change is the capability to work horizontally within a register, as opposed to the more or less strictly vertical operation of all previous SSE instructions: instructions were added to add and subtract the multiple values stored within a single register. But note that those are floating-point only, not integer, which is why I am talking about SSSE3.

SSSE3 contains 16 new discrete instructions over SSE3. Each can act on 64-bit MMX or 128-bit XMM registers, so Intel's manuals count 32 new instructions. The instructions are PSIGNB/W/D, PABSB/W/D, PALIGNR, PSHUFB, PMULHRSW, PMADDUBSW, PHSUBW/D, PHSUBSW, PHADDW/D and PHADDSW. So if you look at these, the registers they operate on are the same as in SSE2; there are no new registers.
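The difference between vertical and horizontal operation is easiest to see in plain C. The sketch below is only a scalar model of what PADDW (vertical) and PHADDW (horizontal) compute on eight 16-bit lanes. The names `paddw_model` and `phaddw_model` are hypothetical, and sums wrap around at 16 bits like the non-saturating instructions do.

```c
#include <stdint.h>

/* Vertical add (PADDW): lane i of the result is a[i] + b[i]. */
static void paddw_model(int16_t dst[8], const int16_t a[8], const int16_t b[8])
{
    for (int i = 0; i < 8; i++)
        dst[i] = (int16_t)(uint16_t)((uint16_t)a[i] + (uint16_t)b[i]);
}

/* Horizontal add (PHADDW): adjacent pairs inside each source are summed.
 * The low four result lanes come from a, the high four from b. */
static void phaddw_model(int16_t dst[8], const int16_t a[8], const int16_t b[8])
{
    int16_t tmp[8];  /* temporary so that dst may alias a or b */
    for (int i = 0; i < 4; i++) {
        tmp[i]     = (int16_t)(uint16_t)((uint16_t)a[2 * i] + (uint16_t)a[2 * i + 1]);
        tmp[i + 4] = (int16_t)(uint16_t)((uint16_t)b[2 * i] + (uint16_t)b[2 * i + 1]);
    }
    for (int i = 0; i < 8; i++)
        dst[i] = tmp[i];
}
```

With a = {1,...,8}, the vertical add never mixes a[0] with a[1], while the horizontal add produces a[0]+a[1], a[2]+a[3] and so on in the low half of the result.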

With SSSE3, the main advantage I see for integer operations in video compression is horizontal processing: we can add or subtract the data within a register instead of adding or subtracting it with another register. So I feel SSSE3 only removes some overhead and saves some cycles through horizontal operations, and only if your video code has that kind of module, like SAD or SSD. There are many places where the transition from MMX to SSE2 gives a huge improvement in performance, but the transition from SSE2 to SSSE3 may not give even a noticeable change, and there will be lots of functions where SSSE3 is not required over SSE2 at all. To work vertically (between two registers) we sometimes have to manipulate the data first, by padding with zeros or by shuffling data between registers, and only then process it (addition, multiplication and so on). That shuffling and padding is overhead that SSSE3 can avoid.
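As a rough illustration of that overhead, here is a sketch of the last step of a SAD/SSD style routine: summing the eight 16-bit lanes of one register into a single value. The function names `hsum8_scalar` and `hsum8_simd` are made up for this example. The SSE2 path needs a byte shift before every add, while the SSSE3 path uses three PHADDW (`_mm_hadd_epi16`) and no shifts. The sum wraps at 16 bits in all paths.

```c
#include <stdint.h>
#if defined(__SSSE3__)
#include <tmmintrin.h>   /* SSSE3 intrinsics (pulls in the SSE2 header too) */
#elif defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Scalar reference: wrapping 16-bit sum of eight int16 values. */
static int16_t hsum8_scalar(const int16_t *p)
{
    uint16_t s = 0;
    for (int i = 0; i < 8; i++)
        s = (uint16_t)(s + (uint16_t)p[i]);
    return (int16_t)s;
}

static int16_t hsum8_simd(const int16_t *p)
{
#if defined(__SSSE3__)
    __m128i v = _mm_loadu_si128((const __m128i *)p);
    v = _mm_hadd_epi16(v, v);   /* lanes 0-3: sums of adjacent pairs */
    v = _mm_hadd_epi16(v, v);   /* lanes 0-1: sums of four values    */
    v = _mm_hadd_epi16(v, v);   /* lane 0: sum of all eight values   */
    return (int16_t)_mm_extract_epi16(v, 0);
#elif defined(__SSE2__)
    __m128i v = _mm_loadu_si128((const __m128i *)p);
    v = _mm_add_epi16(v, _mm_srli_si128(v, 8));  /* lanes 0-3: v[i] + v[i+4] */
    v = _mm_add_epi16(v, _mm_srli_si128(v, 4));  /* lanes 0-1: four values   */
    v = _mm_add_epi16(v, _mm_srli_si128(v, 2));  /* lane 0: all eight values */
    return (int16_t)_mm_extract_epi16(v, 0);
#else
    return hsum8_scalar(p);     /* no SIMD available: scalar fallback */
#endif
}
```

With GCC or Clang, building with `-mssse3` selects the SSSE3 path. Note that fewer instructions does not automatically mean fewer cycles, so the two paths are worth measuring on the target CPU before deciding.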

I guess we should not expect each new generation of SIMD to magically double the performance of the code the way the move from MMX to SSE/SSE2 did. The function module (DCT, SAD, etc.) and the way data is fetched for those functions matter a lot in deciding which SIMD to use, SSE2 or SSSE3, for better performance. So before converting any code from SSE2 to SSSE3, stop for a moment, take a close look at the module, and then choose between SSE2 and SSSE3.

Enjoy SIMD optimization.
