Virtual Components for the Converging World
Amphion continues to expand its family of application-specific cores.
See http://www.amphion.com for a current list of products.
CS2460
64-Point Pipelined FFT/IFFT
The CS2460 is an online programmable, pipelined architecture 64-Point FFT/IFFT core. This highly integrated
application specific core computes the FFT/IFFT based on a radix-4 decimation in frequency (DIF) algorithm. It
performs the computations concurrently in three highly pipelined cascaded stages, as illustrated in Figure 1. The
CS2460 is capable of processing continuous input data and contains all the necessary circuits to support this
continuous processing. It is available in both ASIC and FPGA versions that have been handcrafted by Amphion
for maximum performance while minimizing power consumption and silicon area.
Figure 1: Block Diagram of CS2460 Core
[Block diagram: Input Buffer, Stage 1 Radix-4 FFT, Stage 2 Radix-4 FFT, Stage 3 Radix-4 FFT, Output Re-ordering Buffer. Inputs: enable_in, Ich_in[11:0], Qch_in[11:0], QSC[1:0]. Outputs: enable_out, Ich_out[11:0], Qch_out[11:0], DBSO, BSY, QOV.]
FEATURES
On-line programmable FFT/IFFT core
12-bit complex input and output in two's complement format
12-bit twiddle factors generated inside the core
15-bit fixed-point internal arithmetic operation
Programmable shift down control
Radix-4 based architecture
Both input and output in normal order
No external memory required
Optimized for both ASIC and FPGA technologies with the same functionality
Fully synchronous design
APPLICATIONS
OFDM modulation for WLAN IEEE 802.11a and HiperLAN2
Image processing
Atmospheric imaging
Spectral representation
FAST FOURIER TRANSFORM
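FFT (Fast Fourier Transform) and IFFT (Inverse Fast Fourier
Transform) are algorithms for computing the $2^P$-point discrete
Fourier transform and its inverse, as defined below:
FFT:
$$Y(k) = \sum_{n=0}^{N-1} X(n)\, W_N^{nk}, \quad k = 0, 1, 2, \ldots, N-1 \qquad [1]$$
IFFT:
$$Y(k) = \frac{1}{N} \sum_{n=0}^{N-1} X(n)\, W_N^{-nk}, \quad k = 0, 1, 2, \ldots, N-1 \qquad [2]$$
where $N = 2^P$ and $W_N = e^{-j2\pi/N}$.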
The computational complexity of the FFT and IFFT is
proportional to $N \log_R N$, where R is the radix on which
the FFT/IFFT is based. The higher the radix, the fewer
multiplications are required; however, more data must be
accessed simultaneously, which makes the circuits more
complicated. The radix-4 algorithm offers a balance
between computational and circuit complexity and is often
used to construct higher-radix FFT computation units
when designing high-performance FFT/IFFT hardware.
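The definitions [1] and [2] and the N log_R N estimate can be checked numerically with a minimal Python sketch. It assumes the conventional sign for the twiddle factor, W_N = exp(-j2π/N). The function names (`twiddle`, `dft`, `idft`, `stage_count`) are illustrative and are not part of the CS2460 deliverables.

```python
import cmath


def twiddle(N, e):
    """Return W_N**e, assuming W_N = exp(-j*2*pi/N)."""
    return cmath.exp(-2j * cmath.pi * e / N)


def dft(x):
    """Direct evaluation of equation [1] (O(N^2), for reference only)."""
    N = len(x)
    return [sum(x[n] * twiddle(N, n * k) for n in range(N)) for k in range(N)]


def idft(y):
    """Direct evaluation of equation [2], including the 1/N factor."""
    N = len(y)
    return [sum(y[n] * twiddle(N, -n * k) for n in range(N)) / N
            for k in range(N)]


def stage_count(N, R):
    """Number of radix-R stages, log_R(N), assuming N is a power of R."""
    stages = 0
    while N > 1:
        N //= R
        stages += 1
    return stages


# Round trip on an arbitrary 64-point complex test vector.
x = [complex(n % 5, -(n % 3)) for n in range(64)]
round_trip_error = max(abs(a - b) for a, b in zip(x, idft(dft(x))))

# N * log_R(N) for N = 64: radix 2 -> 384, radix 4 -> 192, radix 8 -> 128.
work = {R: 64 * stage_count(64, R) for R in (2, 4, 8)}
```

For N = 64 the radix-4 figure is half the radix-2 figure, which matches the trade-off described above.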
CS2460 SYMBOL AND PIN DESCRIPTION
Table 1 describes input and output ports (shown graphically
in Figure 2) of the CS2460 64-point FFT/IFFT core. Unless
otherwise stated, all signals are active high and bit(0) is the
least significant bit.
Figure 2: CS2460 Symbol
[Symbol: CS2460 64-Point FFT Function block. Inputs: Ich_in[11:0], Qch_in[11:0], enable_in, QSC[1:0], RST, CLK. Outputs: Ich_out[11:0], Qch_out[11:0], enable_out, QOV, BSY, DBSO.]
Table 1: CS2460 - 64-Point FFT/IFFT Interface Signal Definitions

| Name | I/O | Description |
|---|---|---|
| Ich_in[11:0] | I | 12-bit real input data. Ich_in enters the function in natural order. |
| Qch_in[11:0] | I | 12-bit imaginary input data. Qch_in enters the function in natural order. |
| enable_in | I | Marks the valid section of the input data. When a 64-point block is input, enable_in should remain HIGH for 64 clock cycles. |
| QSC[1:0] | I | Output shifting control. Determines which bits of the internal 15-bit real and imaginary values are selected to provide the 12-bit outputs: 00: [14:3]; 01: [13:2]; 10: [12:1]; 11: [11:0]. |
| RST | I | Asynchronous global reset signal, active HIGH. |
| CLK | I | System clock signal, rising edge active. |
| Ich_out[11:0] | O | 12-bit transformed real data, output in normal order. Valid output data is indicated by enable_out. |
| Qch_out[11:0] | O | 12-bit transformed imaginary data, output in normal order. Valid output data is indicated by enable_out. |
| enable_out | O | Marks the valid section of the output data. When a 64-point block is output, enable_out remains HIGH for 64 clock cycles: it goes HIGH when the first complex data comes out and remains HIGH until the last one is output. |
FUNCTIONAL DESCRIPTION
CS2460 performs decimation in frequency (DIF), radix-4,
forward or inverse Fast Fourier Transforms on complex data.
Data is loaded into its workspace RAM in normal sequential
(natural) order. The transformed data is output from the core
in normal order.
DATA FORMATS
Input and output data is represented by fixed-point real and
imaginary components in two's complement format. For IFFT,
the real (Ich_in) and imaginary (Qch_in) components are
exchanged. The input component wordlength is 12 bits and
the output 12 bits. Internal words stored between
computation stages are 15 bits including the input to the final
output scaling stage from which the required 12 bits are
selected under the control of QSC. The twiddle factors (sine and
cosine values), which are generated internally by the function,
have a wordlength of 12 bits.
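The exchange of real and imaginary components for the IFFT rests on a standard identity: swapping the components of each sample before and after a forward transform, then dividing by N, yields the inverse transform. The sketch below illustrates that identity in floating point. It is an assumption that the swap applies to both input and output, since the datasheet only states that the components are exchanged, and the helper names are hypothetical.

```python
import cmath


def dft(x):
    """Reference forward DFT with W_N = exp(-j*2*pi/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]


def swap(z):
    """Exchange the real (I) and imaginary (Q) components of one sample."""
    return complex(z.imag, z.real)


def idft_via_swap(y):
    """Inverse DFT built from a forward DFT plus I/Q exchange.

    Assumption: the exchange is applied on both the input and the output
    side, and the 1/N factor is applied explicitly here. In the core, the
    scaling is handled by the fixed-point shifts instead.
    """
    N = len(y)
    return [swap(v) / N for v in dft([swap(v) for v in y])]


# Round trip: forward transform, then inverse via the swap identity.
x = [complex((5 * n) % 7 - 3, (3 * n) % 5 - 2) for n in range(64)]
recovered = idft_via_swap(dft(x))
swap_error = max(abs(a - b) for a, b in zip(x, recovered))
```

The identity holds because exchanging the components of z equals j times the complex conjugate of z.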
FFT COMPUTATION
The FFT computation for one data block is scheduled to
complete in three highly pipelined stages. In each stage 64
radix-4 operations are performed. The first stage computation
starts just after the last data of the 64-point block is loaded.
The successive data block can follow the preceding one
immediately. Figure 3 shows the flowchart for FFT
computations in CS2460.
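The three-stage structure can be illustrated with a floating-point model of a 64-point radix-4 DIF transform. This is a sketch of the textbook algorithm, not the core's implementation: it ignores the 15-bit fixed-point arithmetic, rounding and pipelining, and all function names are illustrative. After three DIF stages the results sit in base-4 digit-reversed positions, so a final re-ordering step returns them in normal order.

```python
import cmath


def radix4_dif_stage(data, M):
    """Apply one radix-4 DIF stage to every M-point sub-block of data."""
    out = list(data)
    q = M // 4
    for base in range(0, len(data), M):
        for n in range(q):
            a0, a1, a2, a3 = (data[base + n + p * q] for p in range(4))
            # Radix-4 butterfly (4-point DFT of a0..a3).
            b = (a0 + a1 + a2 + a3,
                 a0 - 1j * a1 - a2 + 1j * a3,
                 a0 - a1 + a2 - a3,
                 a0 + 1j * a1 - a2 - 1j * a3)
            # Twiddle multiplication by W_M^(n*r); equal to 1 when M == 4.
            for r in range(4):
                w = cmath.exp(-2j * cmath.pi * n * r / M)
                out[base + n + r * q] = b[r] * w
    return out


def digit_reverse_base4(k, digits=3):
    """Reverse the base-4 digits of k (3 digits cover 0..63)."""
    r = 0
    for _ in range(digits):
        r = r * 4 + (k % 4)
        k //= 4
    return r


def fft64_radix4_dif(x):
    """64-point FFT: three radix-4 DIF stages, then output re-ordering."""
    if len(x) != 64:
        raise ValueError("expected a 64-point block")
    buf = list(x)
    for M in (64, 16, 4):          # stage 1, stage 2, stage 3
        buf = radix4_dif_stage(buf, M)
    return [buf[digit_reverse_base4(k)] for k in range(64)]
```

In this model only the first two stages apply non-trivial twiddle factors, because the stage with M = 4 uses W_4^0 = 1 throughout.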
Figure 3: FFT Execution Flowchart
DATA LOADING
Before the CS2460 operates, the circuit should be reset if the
input port status indicator BSY is HIGH. This is done by
asserting the input RST for the specified CPLD reset time.
Data is clocked in on the rising clock edge in normal order. It
is synchronized by the signal enable_in, which marks the valid
section of the input data. For example, when a 64-point block
is input, enable_in should remain HIGH for 64 clock cycles.
When loading starts, the indicator BSY is asserted to
acknowledge that loading is in progress and that computation will
start after loading. When the function is ready to accept
the next data block, BSY returns to LOW. Figure 4
illustrates the data loading operation in the CS2460.
Table 1: CS2460 - 64-Point FFT/IFFT Interface Signal Definitions (continued)

| Name | I/O | Description |
|---|---|---|
| QOV | O | Output overflow signal. Overflow may occur depending on the QSC selection. When overflow occurs the output is saturated, and QOV remains asserted for the remainder of that block. |
| DBSO | O | Output start of block signal. DBSO is asserted HIGH for the first two complex results output from the core. |
| BSY | O | Input port status indicator. An input data block can only start when BSY is LOW. It goes HIGH in the next clock cycle after loading is started and returns to LOW when the last data of the 64-point block is loaded. |
[Figure 3 flowchart: Input; decision "enable_in = 1?"; load data into input buffer, then start stage 1 radix-4 computation; stage 2 radix-4 computation; stage 3 radix-4 computation; output scaling and re-ordering; Output; decision "enable_in = 1?". Each decision has Yes and No branches.]
Figure 4: Data Loading Operation
COMPUTATION ACCURACY
The stage 1 and stage 2 radix-4 operations for 64-point
transform consist of a radix-4 butterfly followed by a twiddle
multiplication. Theoretically the result value may grow by a
factor of up to 5.242. In stage 3, the result value may grow by a
factor of 4 and the final result value may grow by a factor of
up to 81.4 (this occurs when the input data represents a
complex square wave). As the output is only 12 bits and fixed-
point arithmetic is employed in the radix-4 processor, it is
necessary to be able to scale the result to avoid overflow while
still obtaining a good dynamic range. Since the input and
output wordlengths are both 12 bits, a 2 bit unconditional
down shift is performed in stage 2 and the remainder of the
shifting is performed at the output under the control of input
QSC. The internal wordlength for data transferred between
stages is 15 bits, including the data passed to the output scaling stage.
Figure 5 shows the internal wordlength of CS2460 core.
To improve the computation accuracy a rounding technique is
employed. Therefore, when the intermediate value is derived
from the twiddle multiplication result, or the input to the
butterfly is scaled down, rounding is performed to ensure the
highest possible accuracy is achieved.
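The QSC bit selection and the saturation behaviour can be modelled in a few lines of Python. This is a behavioural sketch only: it assumes plain truncation when the low bits are dropped (the datasheet does not say whether the output stage rounds), and the names `scale_output` and `scale_block` are hypothetical.

```python
def scale_output(value15, qsc):
    """Select a 12-bit output from a 15-bit two's-complement value.

    QSC = 0 selects bits [14:3], 1 selects [13:2], 2 selects [12:1] and
    3 selects [11:0]. Returns (output, overflow). On overflow the output
    is saturated to the 12-bit range, as the datasheet states.
    """
    if not -(1 << 14) <= value15 < (1 << 14):
        raise ValueError("value does not fit in 15 bits")
    if qsc not in (0, 1, 2, 3):
        raise ValueError("QSC is a 2-bit control")
    shift = 3 - qsc
    shifted = value15 >> shift      # assumption: truncation, no rounding
    low, high = -(1 << 11), (1 << 11) - 1
    if shifted > high:
        return high, True
    if shifted < low:
        return low, True
    return shifted, False


def scale_block(values, qsc):
    """Scale one output block; QOV stays asserted once overflow occurs."""
    outputs, qov_trace, qov = [], [], False
    for v in values:
        y, overflow = scale_output(v, qsc)
        qov = qov or overflow
        outputs.append(y)
        qov_trace.append(qov)
    return outputs, qov_trace


# With QSC = 3 a mid-block value of 5000 saturates and latches QOV.
outputs, qov_trace = scale_block([100, 5000, 100], qsc=3)
```

With QSC = 00 the full 15-bit range maps into 12 bits, so this model never overflows in that setting. Smaller shifts keep more low-order precision at the cost of possible saturation.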
Figure 5: Internal Wordlengths
LATENCY
The CS2460 function has a fixed latency of 140 clock cycles, as
shown in Figure 6. The first transformed data of a 64-point
block is 140 clock cycles behind the first data of the
corresponding input block. A version of the core with no
reorder of the outputs is also available. This has a reduced
latency of 89 clock cycles. Signal enable_out gets asserted
when the first data of the output block appears on the output
port and remains asserted until the last data is output. The
signal DBSO is asserted for two clock cycles when the first
data of each output block appears.
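The output timing for a single block can be sketched as follows, using the 140-cycle latency, the 64-cycle enable_out window and the two-cycle DBSO pulse stated above. Cycle numbering relative to the first input sample and the function names are assumptions made for illustration.

```python
def output_signals(cycle, first_input_cycle=0, latency=140, block=64):
    """Return (enable_out, dbso) for one block at the given clock cycle.

    Assumption: cycles are counted from the clock edge that captures the
    first input sample, and only a single block is considered.
    """
    offset = cycle - first_input_cycle - latency
    enable_out = 0 <= offset < block   # HIGH for all 64 output samples
    dbso = 0 <= offset < 2             # HIGH for the first two results
    return enable_out, dbso


def latency_us(clock_mhz, cycles=140):
    """Latency in microseconds for a clock frequency given in MHz."""
    return cycles / clock_mhz


# At 50 MHz the first result appears 2.8 us after the first input sample.
latency_at_50mhz = latency_us(50)
```

Passing cycles=89 to latency_us gives the corresponding figure for the version of the core without output re-ordering.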
OUTPUT DATA ORDER
The transformed data is clocked out in normal order, as
illustrated in Figure 6.
[Figure 4 timing diagram: waveforms for CLK, RST, enable_in, Ich_in, Qch_in and BSY. Ich_in and Qch_in carry samples I0 ... I63 and Q0 ... Q63 of one block, followed immediately by I0, I1 and Q0, Q1 of the next block.]
[Figure 5 diagram: Data input (12 bits), Stage 1, Stage 2 (with 2-bit unconditional down shift), Stage 3, Scale (controlled by QSC, reporting QOV), Data output (12 bits). The three internal connections are 15 bits wide.]
Figure 6: Transformed Data Output Operation
TRANSFORM TIME
When clocked at 50 MHz, for example, with continuous input
data, the CS2460 64-point FFT/IFFT core achieves the
following transform time:
64-point transform time = 64 x 1/(50 x 10^6) s = 1.28 µs
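The same arithmetic generalizes to other clock frequencies. The short sketch below assumes continuous input, so that one 64-point block is consumed every 64 clock cycles. The function names are illustrative.

```python
def transform_time_us(clock_mhz, points=64):
    """Time to stream one block through the core, in microseconds."""
    return points / clock_mhz


def transforms_per_second(clock_mhz, points=64):
    """Sustained transform rate with continuous input data."""
    return clock_mhz * 1e6 / points


t_50 = transform_time_us(50)         # 1.28 us, as in the example above
rate_50 = transforms_per_second(50)  # 781250 transforms per second
t_60 = transform_time_us(60)         # about 1.07 us at a 60 MHz clock
```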
TIMING CHARACTERISTICS
Table 2 lists the timing characteristics of the CS2460 64-point
FFT/IFFT core, implemented on an EP20K600EBC652-1X
device under commercial temperature range operating
conditions.
AVAILABILITY AND IMPLEMENTATION INFORMATION
Amphion offers the CS2460 core in ASIC and programmable logic versions. Consult your local Amphion representative for
product specific performance information, current availability of individual products, and lead times on ASIC or different
programmable logic core porting.
The implementation information provided in Table 3 has been obtained for a stand-alone design on an Altera EP20K600EBC652-1X
device. It should be noted that if CS2460 is implemented on different Altera devices, the performance metrics and density might
vary accordingly.
* The implementation information on ASIC or Xilinx devices is available upon request.
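[Figure 6 timing diagram: waveforms for CLK, enable_in, Ich_in, Qch_in, enable_out, Ich_out, Qch_out, QOV and DBSO. Input blocks n, n+1, ... n+3 enter back to back on Ich_in and Qch_in. Output samples 0, 1, 2, ... 63 of each block appear 140 cycles after the corresponding input. DBSO is HIGH for 2 CLK periods at the start of each output block. QOV is asserted when overflow occurs.]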
Table 2: CS2460 Timing Characteristics

| Characteristic | Min | Max | Units |
|---|---|---|---|
| Clock Frequency | | 60 | MHz |
| Input setup time | 4.5 | | ns |
| Input hold time | 0 | | ns |
| Signal BSY and QOV output delay | | 8.8 | ns |
| Other output delay | | 10 | ns |
Table 3: Programmable Logic Cores

| Product ID | Silicon Vendor | Programmable Logic Product | Maximum Frequency (MHz) | Device Resources Used (Logic) | Device Resources Used (Memory) | Availability |
|---|---|---|---|---|---|---|
| CS2460 | Altera* | Apex20K600E-1 | 60 | 5078 LCs | 12 ESBs | Now |