Friday, December 21, 2007

Efficient FFT Implementation...

Hi....
If you are a DSP expert or working any of the field related to DSP, you would have come across this word definitely, if NOT you r lucky!!!!

As this is very important module and it consume lots of cycles (many millions) of your processor depends upon the N-point you are calculating.Although the FFT (Fast Fourier Transform) is quite fast (O(N log N) operations) compared to DFT (O(N^2) operations).

I am not in mood to describe all the basics of FFT, that you can read from any DSP books or just by googling you will get good tutorials over it.For beginners in the last of this post there are some good links, just go through it.

After going through the basics of N-point FFT, radix-4 and radix-2 FFT and Decimation in time(DIT)and Decimation in frequency(DIF).Now you are looking for actual efficient implementation.

So here is explanations for n point efficient radix-2 DIT (in-place)implementation...
----------------------------------------------------------------------------------
The inner loop of butterflies is like this : [Code taken from [4]]

for (k=j; k < n; k=k+n2)
{
t1 = c*x[k+n1] - s*y[k+n1];
t2 = s*x[k+n1] + c*y[k+n1];
x[k+n1] = x[k] - t1;
y[k+n1] = y[k] - t2;
x[k] = x[k] + t1;
y[k] = y[k] + t2;

x[]= Real part of point
y[]= Imaginary part of point
c = Cos[] value
s = Sin[] value

1) The code should not be very hard coded like specifically made for 2048 points FFT only.There will be NO reusability.Actually I seen some professional code very hard coded.

2) In FFT the twiddle factor plays the very important role i.e. the Wn. And as this is nothing but cos A -j SinA.So it's range will be between 0 to 1. And some other twiddle factor properties are:

W(superscript, subscript)
W(0,n) = 1. W(n/4, n) = -j. W(r+n/2, n) = -W(r, n).

So we can remove/reduce multiplications by using this properties. For W(0, n), W(n/4, n), W(n/8, n) and W(3n/8, n) these points will require just max two multiplication (one for 't1' and one for 't2') , NOT the four multiplication.

3) Then come to loops.In N-point FFT the no.of butterflies stages will be log N.Now we have three for loops.

1: no. of stages
2: no. of butterflies group(1024 times, 512 times..... )
3: butterflies values.(x[0]) and x[1]

We can unroll the loop upto 3 stages ...upto where we can use W(3n/8, n) property.This is the way(NO condition check is required...No if-else):

Note: Suppose we have 2048 points.

Stage 1: It has 1024 butterflies and Need no multiplication.As W(0, n) = 1. And by using MACRO we can unfold this thing upto 16.
for(i =0; i< (n/2); i+=16)
{
FFTUNROLL1();
FFTUNROLL1();
.... (16 times)
}

Now first stage loop will go just for 64 times NOT the 1024 times.

Stage 2: It has 512 group of double butterflies. We can make one macro that incorporate both butterflies.And the twiddle factor will be here W(0,n) and (W n/4, n) only.
And then unfold it like stage 1...

for(i =0; i< (n/4); i+=16)

now loop goes for only 32 times only NOT the 512 times.

stage 3: It has 256 group of butterflies with 4 butterflies.But it uses the twiddle factors..
W(0,n) , W(n/8, n), W(n/4, n) and W(3n/8, n), group them in macro.
Then unroll the loop like previous one

for(i =0; i< (n/8); i+=16)

... loop will go for 16 times only.


Now from the 4th stage means........we can continue our normal FFT code.

Now from 4 to 11 stage u can again apply the twiddle factor property as these points will come again...but here the additional cost will be of if-else situation for twiddle factor of W(0,n) , W(n/8, n), W(n/4, n) and W(3n/8, n) and normal FFT (with 4 multiplication), otherwise just do the Normal FFT.


Now i m ending my explanation..... I think it's already HUGE ;)
Try to implement accordingly or with better idea...... we can save huge amount of cycles.If anybody need more clarification , let me know.And if my explanation really helpful for you and if you thinking to implement it ...just inform me once, just to make me more happy n more confident... ;)

Just one thing this is NOT the end...... there may be more options.
I came upto this point.This I extracted from the basics only.
---------------------------------------------------------------------
Some useful links:

1.http://www.dspguide.com/ch12.htm
2.http://en.wikipedia.org/wiki/Fast_Fourier_transform
3.http://etoile.berkeley.edu/~jrg/ngst/fft/fft.html
4.cnx.org/content/m12016/latest/
5.http://www.cmlab.csie.ntu.edu.tw/cml/dsp/training/coding/transform/fft.html

Bbye n think naturally.

No comments:

Post a Comment