Hi Everybody!!!!

This is the last week of year 2007. The whole world is celebrating X-mas, everyone is so excited to welcome the new year. People are too busy in shopping and trying to bag more n more free offers by paying festival price like purchasing a pack of 4 soaps (3+1 free) of each 75 gms. in 75/- only instead of buying 3 soaps of each 100 gms. in 60/- or buy 2 get 1 free offer in cloths; point to be mentioned here that offer is NOT valid for new stocks, just be aware of * sign with attached sentence "*Condition Apply".

If you are thinking that people are busy in shopping and arranging the new year parties , so I will just say you are having the bull's eye not the omelet. I am smelling again the fragrance of New Year Resolutions in the air. So the very first question comes in my mind ..."why do we do new year resolution?" And the natural answer comes "To make our life better". Some persons speak loudly about their resolution and some think deeply in their heart but dont want to share the secret. May be because of fear of failure but anyway that's sure that all are thinking about it.

If you see mathematically the number of new year resolution is directly proportional to the homo sapiens population, dont think much over this word, you also belongs to this category... :) In resolutions people generally want to change/adapt particular habit like some want to quit smoking or drinking, some resolve to work hard or to earn that much money, some resolve to learn new art or instruments and some make a time table for them and resolve to follow it whole year/life etc. etc. If I remember my new year resolutions, it were almost the same I mentioned before. But in my case the very interesting thing is every year I remember my new year resolution just for two weeks that the last week of ending year and first week of new year. After that everything is same as previous. Please dont think that this happens with me only; around 90% of new year resolutions go like that, please dont ask me about the remaining percents... :0

Actually we should think each day as a new year as a new life. And each day we should think that "how can I make it better". Plan your day; every morning, try to execute in accordingly.Just make each day is your day, the year will come automatically and same with the life.So my new year resolution is "There will be No New Year Resolution, just enjoy the life naturally".

Wish you Happy New year.

bbye and think naturally.

## Friday, December 28, 2007

### No New Year Resolution...

## Thursday, December 27, 2007

### Optimized Huffman Coding

Introduction :

Huffman coding is an entropy encoding algorithm used for lossless data compression. The term refers to the use of a variable length code table for encoding a source symbol (such as a character in a file) where the variable -length code table has been derived in a particular way based on the estimated probability of occurrence for each possible value of the source symbol. It was developed by David A. Huffman. Huffman coding uses a specific method for choosing the representation for each symbol, resulting in a pre Huffman coding uses a specific method for choosing the representation for each symbol, resulting in a prefix-free code (that is, the bit string representing some particular symbol is never a prefix of the bit string representing any other symbol) that expresses the most common characters using shorter strings of bits than are used for less common source symbols [1].

There are various ways to decode the Huffman code either by arithmetic operation or look up table. Each method has various advantages over to one like arithmetic operation takes less memory space, less memory reference but arithmetic processor must be capable to handle that mathematical operation efficiently while the look up table method is fast but require more space in memory and need many memory references.Here optimized ways of decoding the Huffman code by look up table method are explained.

The basic properties of Huffman code is prefix free code and the more frequent symbols has less codelength as compared the other symbols which is used less frequently. In this manner Huffman code optimized the memory requirement but this advantage become problem while decoding by look up table method. As the symbols having codelength less than the bits read to be decoded the symbols have multiple entries in the table that we have to take care while decoding. The example is taken for Huffman decoding is shown in table1 [2].

Table 1 Symbols and codewords

Char(Symbols) Codes(in Binary)

A ----> 010

B ----> 0000

C ----> 0001

D ----> 011

E ----> 10

F ----> 0010

G ----> 0011

H ----> 11

While decoding by the table look up method there will be 16 entries of symbols. The table itself will tell you what symbol you decoded and how many bits you used. The table will be as follows:-

index 0000 would contain B,4

index 0001 C,4

...

index 1000 to 1011 would all contain E,2

index 1100 to 1111 would all contain H,2

Table 2 Table representation of codewords

---------------------------------------------------------

| 0000(B,4) | 0001(C,4) | 0010(F,4) | 0011(G,4)|

| 0100(A,3) | 0101(A,3) | 0110(D,3) | 0111(D,3)|

| 1000(E,2) | 1001(E,2) | 1010(E,2) | 1011(E,2) |

| 1100(H,2) | 1101(H,2) | 1110(H,2) | 1111(H,2) |

----------------------------------------------------------

And the input bitstream to be decoded will be 010 0000 0001 011 10 0010 0011 11 0000 10,so the output symbols should be ‘ABCDEFGHBE’. There are various other practical ways of huffman coding implementation just refer the links given below. But I wish to discuss only the efficient way of implementation.The various ways of Huffman decoding are as follows:-

1)Single level Table Decoding

When we carefully observe the code we find that we can decode the codes by single table.

Table 4.5 Single level table

Table Index Symbol Codelength

0 ----> B ----> 4

1 ----> C ----> 4

2 ----> F ----> 4

3 ----> G ----> 4

4 ----> A ----> 3

5 ----> A ----> 3

6 ----> D ----> 3

7 ----> D ----> 3

8 ----> E ----> 2

9 ----> E ----> 2

10 ----> E ----> 2

11 ----> E ----> 2

12 ----> H ----> 2

13 ----> H ----> 2

14 ----> H ----> 2

15 ----> H ----> 2

Here we have made a table having two values the symbol and codelength of that symbol with each table index. Now the algorithm is as follows:

1.Each time we will read the fixed number of bits from the input bitstream; that is equal to maximum codelength of all the codewords and the bit pointer is at location 0.

2.Now the bits read will be used as the index for the table, which will directly give the output symbol for that code and corresponding codelength.

3.Now the bit pointer will be updated as bit pointer + = codelength.

For the given input bitstream the steps will be

1.As per table 1 the maximum codelength will be 4, so each time we will read 4 bits. Say for start it will be 0100. The bit pointer is at location 0.

2.So by using it as table index the output symbol will be ‘A’ and the corresponding codelength will be 3.

3.Now the new bit pointer location will be bit pointer +=3.

4.Now the next 4 bits will be 0000 and process will continue so on till the end of bitstream.

The basic drawback of this method is the table size as large memory space is required. It is suitable only for small codelength codewords not for the big codelength like CAVLC codes of H.264/AVC.

2)Efficient Automated Merged Table:

This is the efficient/optimized way of implementation.Actually before this , I tried various other algorithms but finally come to this implementation, and implemented in my project work on that huffman decoding was taking around 30% of overall computational complexity.

Successive number of bits to read can be automated i.e. bits to be read is decided by the algorithm and the table is arranged accordingly. For decoding we need the value of how many maximum numbers of bits to be read as Max Bit Read. The table contents the symbols and the corresponding codelengths but with some modifications. Now the algorithm will be as follows:

1.Firstly read the n number of bits is equal to Max Bit Read so Bit Read = Max Bit Read.

2.Use the value as table index to get symbol and the corresponding codelength. If the codelength is not equal to -1(Flag) means the symbol is valid, and the bit pointer is modified accordingly.

3.If the codelength is equal to -1 then more bits are required to be read. The corresponding symbol will be considered as table offset value for table index. And next number of bits to be read is calculated as Bit Read = (Bit Read+1) >>1, now the Bit Read number of bits will be read. And the value is used as table index as table index = table offset + value, to decode the codelength for corresponding table index.

4.If the codelength is not equal to -1 then we will precede as step 2 otherwise step 3 until we get proper codelength of particular symbol.

The example for this method for our input bitstream will be explained here. For that we can use table 3 with the Max Bit Read equal to 2.

Table 3 Automated Merged table for Max Bit Read =2

Table Index Symbol Codelength

0 ----> 4 ----> -1

1 ----> 6 ----> -1

2 ----> E ----> 2

3 ----> H ----> 2

4 ----> 8 ----> -1

5 ---->10 ----> -1

6 ----> A ----> 3

7 ----> D ----> 3

8 ----> B ----> 4

9 ----> C ----> 4

10 ----> F ----> 4

11 ----> G ----> 4

1.In the very first read the Max Bit Rate = Bit Read number of bits. So here Max Bit Read is 2, we will read first two bits that are 1 (01).

2.Use the value as table index; we get the symbol as 6 and codelength -1. As the codelength is -1 hence we have to read some more bits to decode and 6 will be table offset.

3.Calculate the next Bit Read as Bit Read = (Bit Read+1)>>1 = (2+1)>>1 =1.Now next 1 bit will be read and the value is 0 (0).

4.Now again calculate the table index as table index = table offset + value =6+0 =6.

5.Again table is read with new table index (6) and corresponding codelength is read that is 3 i.e. the codelength and corresponding symbol ‘A’ are valid now no need to read more bits. Modify the bit pointer accordingly.

6.Now read next 2 (Max Bit Read) bits to decode the next codeword that are 0 (00).

7.When we use this as table index; we get the symbol as 4 and the codelength equal to -1 hence we need to read more bits to decode the next codeword.

8.Calculate the next Bit Read = (2+1)>>1 = 1. Hence the next 1 bit value is 0 (0).

9.Now again calculate the table index as table index = 4+0 = 4.

10.Table is read with table index the corresponding codelength is again -1 and the symbol is 8. So we need more bits to read and now table offset is equal to 8.

11.Calculate the next Bit Read = (1+1)>>1 =1. Hence the next 1 bit value is 0(0).

12.Again calculate the table index as table index = 8+0 =8.

13.Use table index, we get codelength is 4 and the symbol is ‘B’ which is now valid symbol. Modify the bit pointer accordingly.

14.Now follow from the step 6 and so on until the end of bitstream.

This method is most efficient method is and independent of Max Bit Read i.e. this method is applicable for our example of Max Bit Read = 1, 2, 3, 4.

Note:In the all table look up methods we have to take care while decoding the last bits whenever Max Bit Read is greater than the last remaining bits to read. So insert 0’s towards LSB and then read the bitstream.

Conclusion:

The algorithm discussed here is the generalized efficient/optimized huffman decoding as it takes less memory area as compared to single level table decoding method (just apply the algo for max codelength of 16/24/32 bits, you will get huge memory saving) with keeping the accessing the code property of single level table. In below I am putting the quantitative analysis for the memory and complexity.

Table 4 Performance Analysis

1)Single level table 2^N (Memory consumption) , 2^N search for Worst case.

2)Automated merged table 2k + A.2k/2+ B. 2k/4 +… (Memory consumption), 2^k for Good case and 2^(k+k/2+k/4…) for Worst case(search complexity).

Where N is codelength in bits and k + m + n +… = N. And A, B,…….= Unidentified codes in the previous search.

Useful Links:

1:Huffman coding http://en.wikipedia.org/wiki/Huffman_code

2:Practical Huffman coding http://www.compressconsult.com/huffman/

Enjoy this techie discussion.....

Wait for new post in this catogory.

Till then bbye n think naturally.

Huffman coding is an entropy encoding algorithm used for lossless data compression. The term refers to the use of a variable length code table for encoding a source symbol (such as a character in a file) where the variable -length code table has been derived in a particular way based on the estimated probability of occurrence for each possible value of the source symbol. It was developed by David A. Huffman. Huffman coding uses a specific method for choosing the representation for each symbol, resulting in a pre Huffman coding uses a specific method for choosing the representation for each symbol, resulting in a prefix-free code (that is, the bit string representing some particular symbol is never a prefix of the bit string representing any other symbol) that expresses the most common characters using shorter strings of bits than are used for less common source symbols [1].

There are various ways to decode the Huffman code either by arithmetic operation or look up table. Each method has various advantages over to one like arithmetic operation takes less memory space, less memory reference but arithmetic processor must be capable to handle that mathematical operation efficiently while the look up table method is fast but require more space in memory and need many memory references.Here optimized ways of decoding the Huffman code by look up table method are explained.

The basic properties of Huffman code is prefix free code and the more frequent symbols has less codelength as compared the other symbols which is used less frequently. In this manner Huffman code optimized the memory requirement but this advantage become problem while decoding by look up table method. As the symbols having codelength less than the bits read to be decoded the symbols have multiple entries in the table that we have to take care while decoding. The example is taken for Huffman decoding is shown in table1 [2].

Table 1 Symbols and codewords

Char(Symbols) Codes(in Binary)

A ----> 010

B ----> 0000

C ----> 0001

D ----> 011

E ----> 10

F ----> 0010

G ----> 0011

H ----> 11

While decoding by the table look up method there will be 16 entries of symbols. The table itself will tell you what symbol you decoded and how many bits you used. The table will be as follows:-

index 0000 would contain B,4

index 0001 C,4

...

index 1000 to 1011 would all contain E,2

index 1100 to 1111 would all contain H,2

Table 2 Table representation of codewords

---------------------------------------------------------

| 0000(B,4) | 0001(C,4) | 0010(F,4) | 0011(G,4)|

| 0100(A,3) | 0101(A,3) | 0110(D,3) | 0111(D,3)|

| 1000(E,2) | 1001(E,2) | 1010(E,2) | 1011(E,2) |

| 1100(H,2) | 1101(H,2) | 1110(H,2) | 1111(H,2) |

----------------------------------------------------------

And the input bitstream to be decoded will be 010 0000 0001 011 10 0010 0011 11 0000 10,so the output symbols should be ‘ABCDEFGHBE’. There are various other practical ways of huffman coding implementation just refer the links given below. But I wish to discuss only the efficient way of implementation.The various ways of Huffman decoding are as follows:-

1)Single level Table Decoding

When we carefully observe the code we find that we can decode the codes by single table.

Table 4.5 Single level table

Table Index Symbol Codelength

0 ----> B ----> 4

1 ----> C ----> 4

2 ----> F ----> 4

3 ----> G ----> 4

4 ----> A ----> 3

5 ----> A ----> 3

6 ----> D ----> 3

7 ----> D ----> 3

8 ----> E ----> 2

9 ----> E ----> 2

10 ----> E ----> 2

11 ----> E ----> 2

12 ----> H ----> 2

13 ----> H ----> 2

14 ----> H ----> 2

15 ----> H ----> 2

Here we have made a table having two values the symbol and codelength of that symbol with each table index. Now the algorithm is as follows:

1.Each time we will read the fixed number of bits from the input bitstream; that is equal to maximum codelength of all the codewords and the bit pointer is at location 0.

2.Now the bits read will be used as the index for the table, which will directly give the output symbol for that code and corresponding codelength.

3.Now the bit pointer will be updated as bit pointer + = codelength.

For the given input bitstream the steps will be

1.As per table 1 the maximum codelength will be 4, so each time we will read 4 bits. Say for start it will be 0100. The bit pointer is at location 0.

2.So by using it as table index the output symbol will be ‘A’ and the corresponding codelength will be 3.

3.Now the new bit pointer location will be bit pointer +=3.

4.Now the next 4 bits will be 0000 and process will continue so on till the end of bitstream.

The basic drawback of this method is the table size as large memory space is required. It is suitable only for small codelength codewords not for the big codelength like CAVLC codes of H.264/AVC.

2)Efficient Automated Merged Table:

This is the efficient/optimized way of implementation.Actually before this , I tried various other algorithms but finally come to this implementation, and implemented in my project work on that huffman decoding was taking around 30% of overall computational complexity.

Successive number of bits to read can be automated i.e. bits to be read is decided by the algorithm and the table is arranged accordingly. For decoding we need the value of how many maximum numbers of bits to be read as Max Bit Read. The table contents the symbols and the corresponding codelengths but with some modifications. Now the algorithm will be as follows:

1.Firstly read the n number of bits is equal to Max Bit Read so Bit Read = Max Bit Read.

2.Use the value as table index to get symbol and the corresponding codelength. If the codelength is not equal to -1(Flag) means the symbol is valid, and the bit pointer is modified accordingly.

3.If the codelength is equal to -1 then more bits are required to be read. The corresponding symbol will be considered as table offset value for table index. And next number of bits to be read is calculated as Bit Read = (Bit Read+1) >>1, now the Bit Read number of bits will be read. And the value is used as table index as table index = table offset + value, to decode the codelength for corresponding table index.

4.If the codelength is not equal to -1 then we will precede as step 2 otherwise step 3 until we get proper codelength of particular symbol.

The example for this method for our input bitstream will be explained here. For that we can use table 3 with the Max Bit Read equal to 2.

Table 3 Automated Merged table for Max Bit Read =2

Table Index Symbol Codelength

0 ----> 4 ----> -1

1 ----> 6 ----> -1

2 ----> E ----> 2

3 ----> H ----> 2

4 ----> 8 ----> -1

5 ---->10 ----> -1

6 ----> A ----> 3

7 ----> D ----> 3

8 ----> B ----> 4

9 ----> C ----> 4

10 ----> F ----> 4

11 ----> G ----> 4

1.In the very first read the Max Bit Rate = Bit Read number of bits. So here Max Bit Read is 2, we will read first two bits that are 1 (01).

2.Use the value as table index; we get the symbol as 6 and codelength -1. As the codelength is -1 hence we have to read some more bits to decode and 6 will be table offset.

3.Calculate the next Bit Read as Bit Read = (Bit Read+1)>>1 = (2+1)>>1 =1.Now next 1 bit will be read and the value is 0 (0).

4.Now again calculate the table index as table index = table offset + value =6+0 =6.

5.Again table is read with new table index (6) and corresponding codelength is read that is 3 i.e. the codelength and corresponding symbol ‘A’ are valid now no need to read more bits. Modify the bit pointer accordingly.

6.Now read next 2 (Max Bit Read) bits to decode the next codeword that are 0 (00).

7.When we use this as table index; we get the symbol as 4 and the codelength equal to -1 hence we need to read more bits to decode the next codeword.

8.Calculate the next Bit Read = (2+1)>>1 = 1. Hence the next 1 bit value is 0 (0).

9.Now again calculate the table index as table index = 4+0 = 4.

10.Table is read with table index the corresponding codelength is again -1 and the symbol is 8. So we need more bits to read and now table offset is equal to 8.

11.Calculate the next Bit Read = (1+1)>>1 =1. Hence the next 1 bit value is 0(0).

12.Again calculate the table index as table index = 8+0 =8.

13.Use table index, we get codelength is 4 and the symbol is ‘B’ which is now valid symbol. Modify the bit pointer accordingly.

14.Now follow from the step 6 and so on until the end of bitstream.

This method is most efficient method is and independent of Max Bit Read i.e. this method is applicable for our example of Max Bit Read = 1, 2, 3, 4.

Note:In the all table look up methods we have to take care while decoding the last bits whenever Max Bit Read is greater than the last remaining bits to read. So insert 0’s towards LSB and then read the bitstream.

Conclusion:

The algorithm discussed here is the generalized efficient/optimized huffman decoding as it takes less memory area as compared to single level table decoding method (just apply the algo for max codelength of 16/24/32 bits, you will get huge memory saving) with keeping the accessing the code property of single level table. In below I am putting the quantitative analysis for the memory and complexity.

Table 4 Performance Analysis

1)Single level table 2^N (Memory consumption) , 2^N search for Worst case.

2)Automated merged table 2k + A.2k/2+ B. 2k/4 +… (Memory consumption), 2^k for Good case and 2^(k+k/2+k/4…) for Worst case(search complexity).

Where N is codelength in bits and k + m + n +… = N. And A, B,…….= Unidentified codes in the previous search.

Useful Links:

1:Huffman coding http://en.wikipedia.org/wiki/Huffman_code

2:Practical Huffman coding http://www.compressconsult.com/huffman/

Enjoy this techie discussion.....

Wait for new post in this catogory.

Till then bbye n think naturally.

## Friday, December 21, 2007

### Efficient FFT Implementation...

Hi....

If you are a DSP expert or working any of the field related to DSP, you would have come across this word definitely, if NOT you r lucky!!!!

As this is very important module and it consume lots of cycles (many millions) of your processor depends upon the N-point you are calculating.Although the FFT (Fast Fourier Transform) is quite fast (O(N log N) operations) compared to DFT (O(N^2) operations).

I am not in mood to describe all the basics of FFT, that you can read from any DSP books or just by googling you will get good tutorials over it.For beginners in the last of this post there are some good links, just go through it.

After going through the basics of N-point FFT, radix-4 and radix-2 FFT and Decimation in time(DIT)and Decimation in frequency(DIF).Now you are looking for actual efficient implementation.

So here is explanations for n point efficient radix-2 DIT (in-place)implementation...

----------------------------------------------------------------------------------

The inner loop of butterflies is like this : [Code taken from [4]]

for (k=j; k < n; k=k+n2)

{

t1 = c*x[k+n1] - s*y[k+n1];

t2 = s*x[k+n1] + c*y[k+n1];

x[k+n1] = x[k] - t1;

y[k+n1] = y[k] - t2;

x[k] = x[k] + t1;

y[k] = y[k] + t2;

x[]= Real part of point

y[]= Imaginary part of point

c = Cos[] value

s = Sin[] value

1) The code should not be very hard coded like specifically made for 2048 points FFT only.There will be NO reusability.Actually I seen some professional code very hard coded.

2) In FFT the twiddle factor plays the very important role i.e. the Wn. And as this is nothing but cos A -j SinA.So it's range will be between 0 to 1. And some other twiddle factor properties are:

W(superscript, subscript)

W(0,n) = 1. W(n/4, n) = -j. W(r+n/2, n) = -W(r, n).

So we can remove/reduce multiplications by using this properties. For W(0, n), W(n/4, n), W(n/8, n) and W(3n/8, n) these points will require just max two multiplication (one for 't1' and one for 't2') , NOT the four multiplication.

3) Then come to loops.In N-point FFT the no.of butterflies stages will be log N.Now we have three for loops.

1: no. of stages

2: no. of butterflies group(1024 times, 512 times..... )

3: butterflies values.(x[0]) and x[1]

We can unroll the loop upto 3 stages ...upto where we can use W(3n/8, n) property.This is the way(NO condition check is required...No if-else):

Note: Suppose we have 2048 points.

Stage 1: It has 1024 butterflies and Need no multiplication.As W(0, n) = 1. And by using MACRO we can unfold this thing upto 16.

for(i =0; i< (n/2); i+=16)

{

FFTUNROLL1();

FFTUNROLL1();

.... (16 times)

}

Now first stage loop will go just for 64 times NOT the 1024 times.

Stage 2: It has 512 group of double butterflies. We can make one macro that incorporate both butterflies.And the twiddle factor will be here W(0,n) and (W n/4, n) only.

And then unfold it like stage 1...

for(i =0; i< (n/4); i+=16)

now loop goes for only 32 times only NOT the 512 times.

stage 3: It has 256 group of butterflies with 4 butterflies.But it uses the twiddle factors..

W(0,n) , W(n/8, n), W(n/4, n) and W(3n/8, n), group them in macro.

Then unroll the loop like previous one

for(i =0; i< (n/8); i+=16)

... loop will go for 16 times only.

Now from the 4th stage means........we can continue our normal FFT code.

Now from 4 to 11 stage u can again apply the twiddle factor property as these points will come again...but here the additional cost will be of if-else situation for twiddle factor of W(0,n) , W(n/8, n), W(n/4, n) and W(3n/8, n) and normal FFT (with 4 multiplication), otherwise just do the Normal FFT.

Now i m ending my explanation..... I think it's already HUGE ;)

Try to implement accordingly or with better idea...... we can save huge amount of cycles.If anybody need more clarification , let me know.And if my explanation really helpful for you and if you thinking to implement it ...just inform me once, just to make me more happy n more confident... ;)

Just one thing this is NOT the end...... there may be more options.

I came upto this point.This I extracted from the basics only.

---------------------------------------------------------------------

Some useful links:

1.http://www.dspguide.com/ch12.htm

2.http://en.wikipedia.org/wiki/Fast_Fourier_transform

3.http://etoile.berkeley.edu/~jrg/ngst/fft/fft.html

4.cnx.org/content/m12016/latest/

5.http://www.cmlab.csie.ntu.edu.tw/cml/dsp/training/coding/transform/fft.html

Bbye n think naturally.

If you are a DSP expert or working any of the field related to DSP, you would have come across this word definitely, if NOT you r lucky!!!!

As this is very important module and it consume lots of cycles (many millions) of your processor depends upon the N-point you are calculating.Although the FFT (Fast Fourier Transform) is quite fast (O(N log N) operations) compared to DFT (O(N^2) operations).

I am not in mood to describe all the basics of FFT, that you can read from any DSP books or just by googling you will get good tutorials over it.For beginners in the last of this post there are some good links, just go through it.

After going through the basics of N-point FFT, radix-4 and radix-2 FFT and Decimation in time(DIT)and Decimation in frequency(DIF).Now you are looking for actual efficient implementation.

So here is explanations for n point efficient radix-2 DIT (in-place)implementation...

----------------------------------------------------------------------------------

The inner loop of butterflies is like this : [Code taken from [4]]

for (k=j; k < n; k=k+n2)

{

t1 = c*x[k+n1] - s*y[k+n1];

t2 = s*x[k+n1] + c*y[k+n1];

x[k+n1] = x[k] - t1;

y[k+n1] = y[k] - t2;

x[k] = x[k] + t1;

y[k] = y[k] + t2;

x[]= Real part of point

y[]= Imaginary part of point

c = Cos[] value

s = Sin[] value

1) The code should not be very hard coded like specifically made for 2048 points FFT only.There will be NO reusability.Actually I seen some professional code very hard coded.

2) In FFT the twiddle factor plays the very important role i.e. the Wn. And as this is nothing but cos A -j SinA.So it's range will be between 0 to 1. And some other twiddle factor properties are:

W(superscript, subscript)

W(0,n) = 1. W(n/4, n) = -j. W(r+n/2, n) = -W(r, n).

So we can remove/reduce multiplications by using this properties. For W(0, n), W(n/4, n), W(n/8, n) and W(3n/8, n) these points will require just max two multiplication (one for 't1' and one for 't2') , NOT the four multiplication.

3) Then come to loops.In N-point FFT the no.of butterflies stages will be log N.Now we have three for loops.

1: no. of stages

2: no. of butterflies group(1024 times, 512 times..... )

3: butterflies values.(x[0]) and x[1]

We can unroll the loop upto 3 stages ...upto where we can use W(3n/8, n) property.This is the way(NO condition check is required...No if-else):

Note: Suppose we have 2048 points.

Stage 1: It has 1024 butterflies and Need no multiplication.As W(0, n) = 1. And by using MACRO we can unfold this thing upto 16.

for(i =0; i< (n/2); i+=16)

{

FFTUNROLL1();

FFTUNROLL1();

.... (16 times)

}

Now first stage loop will go just for 64 times NOT the 1024 times.

Stage 2: It has 512 group of double butterflies. We can make one macro that incorporate both butterflies.And the twiddle factor will be here W(0,n) and (W n/4, n) only.

And then unfold it like stage 1...

for(i =0; i< (n/4); i+=16)

now loop goes for only 32 times only NOT the 512 times.

stage 3: It has 256 group of butterflies with 4 butterflies.But it uses the twiddle factors..

W(0,n) , W(n/8, n), W(n/4, n) and W(3n/8, n), group them in macro.

Then unroll the loop like previous one

for(i =0; i< (n/8); i+=16)

... loop will go for 16 times only.

Now from the 4th stage means........we can continue our normal FFT code.

Now from 4 to 11 stage u can again apply the twiddle factor property as these points will come again...but here the additional cost will be of if-else situation for twiddle factor of W(0,n) , W(n/8, n), W(n/4, n) and W(3n/8, n) and normal FFT (with 4 multiplication), otherwise just do the Normal FFT.

Now i m ending my explanation..... I think it's already HUGE ;)

Try to implement accordingly or with better idea...... we can save huge amount of cycles.If anybody need more clarification , let me know.And if my explanation really helpful for you and if you thinking to implement it ...just inform me once, just to make me more happy n more confident... ;)

Just one thing this is NOT the end...... there may be more options.

I came upto this point.This I extracted from the basics only.

---------------------------------------------------------------------

Some useful links:

1.http://www.dspguide.com/ch12.htm

2.http://en.wikipedia.org/wiki/Fast_Fourier_transform

3.http://etoile.berkeley.edu/~jrg/ngst/fft/fft.html

4.cnx.org/content/m12016/latest/

5.http://www.cmlab.csie.ntu.edu.tw/cml/dsp/training/coding/transform/fft.html

Bbye n think naturally.

## Wednesday, December 19, 2007

### This is the Start....

Hi everyone......

Today I am writing my first blog. Now I am also entering in this world, become or wanna be a member of intellectual world , although I have no relation with this word.

After lot n lots of thinking to start this blog... I gone through the very first natural question in my mind...

Why I should start blog???

But after calculating all the losses n profits I came to the point that I should start it.

As the blog gives me full freedom to convey my thought to others;

It ill make me alert to get more n more and new knowledge so that I can post on it on my blog ..... you can say it's kind of motivation.

And it says me to keep my eyes open even at the time of sleeping to my surroundings , try to grab the things, think over it, post it for the discussion to all....

As I think naturally blog is the place of exchanging the information/knowledge ....

Very soon I 'll post the next thread...till then

bbye n think naturally

Today I am writing my first blog. Now I am also entering in this world, become or wanna be a member of intellectual world , although I have no relation with this word.

After lot n lots of thinking to start this blog... I gone through the very first natural question in my mind...

Why I should start blog???

But after calculating all the losses n profits I came to the point that I should start it.

As the blog gives me full freedom to convey my thought to others;

It ill make me alert to get more n more and new knowledge so that I can post on it on my blog ..... you can say it's kind of motivation.

And it says me to keep my eyes open even at the time of sleeping to my surroundings , try to grab the things, think over it, post it for the discussion to all....

As I think naturally blog is the place of exchanging the information/knowledge ....

Very soon I 'll post the next thread...till then

bbye n think naturally

Subscribe to:
Posts (Atom)