April 10, 2001
© 2001 Integrated Device Technology, Inc.
DSC 9096
64-Bit RISC Microprocessor
Features
- True 64-bit microprocessor
  - 64-bit integer operations
  - 64-bit floating-point operations
  - 64-bit registers
  - 64-bit virtual address space
- High-performance microprocessor
  - 260 Dhrystone MIPS at 200MHz
  - 100 peak MFLOP/s at 200MHz
  - Two-way set associative caches
  - Simple 5-stage pipeline
- High level of integration
  - 64-bit, 200MHz integer CPU
  - 64-bit floating-point unit
  - 16KB instruction cache
  - 16KB data cache
  - Flexible MMU with large, fully associative TLB
- Low-power operation
  - 3.3V power supply, for the "RV" part
  - 5V power supply, for the "R" part
  - Dynamic power management
  - Standby mode reduces internal power
- Fully software and pin-compatible with the 40XX Processor Family
- Available in 179-pin PGA or 208-pin QFP
- Available at 80-200MHz, with mode bit dependent output clock frequencies
- 64GB physical address space
- Processor family for a wide variety of embedded applications
  - LAN switches
  - Routers
  - Color printers
Description
The IDT79R4700 64-bit RISC Microprocessor is both software and
pin-compatible with the R4XXX processor family. With 64-bit processing
capabilities, the R4700 provides more computational power and data
movement bandwidth than is delivered to typical embedded systems by
32-bit processors.
The R4700 is upwardly software compatible with the IDT79R3000™
microprocessor family, including the IDT RISController™ 79R3051™,
R3052™, R3041™, R3081™, as well as the R4640™, R4650™, RC64474/475™
and R5000™. An array of development tools facilitates rapid
development of R4700-based systems, allowing a variety of customers
access to the MIPS Open Architecture philosophy.
Block Diagram
[Block diagram. Labeled blocks: Phase Lock Loop, Clocks; System/Memory
Control; Coprocessor 0; Instruction Set A, Instruction Set B, Instruction
Tag A, Instruction Tag B, Instruction Select, Instruction Register; Data
Set A, Data Set B, Data Tag A, Data Tag B; Instruction TLB Virtual, ITLB
Physical, Data TLB Virtual, DTLB Physical, Joint TLB; Address Buffer,
Read Buffer, Write Buffer, Store Buffer; Integer Register File,
Integer/Address Adder, Shifter/Store Aligner, Logic Unit, Load Aligner,
Integer Control; Program Counter, PC Incrementer, Branch Adder;
floating-point Register File, Unpacker/Packer, Floating-point/Integer
Multiply, Floating-point Add/Sub/Cvt/Div/Sqrt, Integer Divide,
Floating-point Control. Labeled buses and signals: SysAD, IBus, DBus,
DVA, IVA, Tag, AuxTag.]
The IDT logo is a registered trademark and RC32134, RC32364, RC64145, RC64474, RC64475, RC4650, RC4640, RC4600, RC4700, RC3081, RC3052, RC3051, RC3041, RISController, and RISCore are trademarks of Integrated Device Technology, Inc.
This data sheet provides an overview of the R4700's CPU features
and architecture. A more detailed description of this processor is
provided in the IDT79R4700 RISC Processor Hardware User's Manual,
available from Integrated Device Technology (IDT). Information on
development support, application notes and complementary products
is available on the IDT Web site (www.idt.com) or through your local IDT
sales representative.
Note: Throughout this data sheet and any other IDT materials for this
device, the R4700 indicates a 5V part; RV4700 designates a reduced
voltage (3V) part; and the RC4700 reflects either.
Figure 1 The RC4700 CP0 Registers
Hardware Overview
The RC4700 processor family brings a high level of integration
designed for high-performance computing. The R4700's key elements
are briefly described below. A more detailed explanation of each
subsystem is available in the user's manual.
Pipeline
The RC4700 uses a simple 5-stage pipeline, similar to the pipeline
structure implemented in the IDT79R32364. This pipeline's simplicity
allows the RC4700 to be lower cost and lower power than super-scalar
or super-pipelined processors. The pipeline stages are shown in
Figure 3.
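As a rough illustration of how a 5-stage pipeline overlaps instructions, the following Python sketch computes the cycle in which each instruction occupies each stage. It assumes an idealized pipeline in which one instruction issues every cycle and no stalls or slips occur. The stage letters I, R, A, D and W follow the labels used in Figure 3; the function names are illustrative and do not come from this data sheet.

```python
# Idealized 5-stage pipeline model: one issue per cycle, no stalls (assumption).
STAGES = ["I", "R", "A", "D", "W"]  # stage letters as labeled in Figure 3


def stage_cycle(instr_index, stage):
    """Return the 0-based cycle in which instruction `instr_index` is in `stage`."""
    return instr_index + STAGES.index(stage)


def total_cycles(n_instructions):
    """Return the cycles needed to complete n instructions in the ideal case."""
    if n_instructions <= 0:
        return 0
    # The first instruction needs one cycle per stage; each later one adds a cycle.
    return n_instructions + len(STAGES) - 1
```

Under these assumptions a run of N instructions completes in N + 4 cycles, so sustained throughput approaches one instruction per cycle.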
Integer Execution Engine
The R4700 implements the MIPS-III Instruction Set architecture and
is upwardly compatible with applications that run on earlier generation
parts.
Implementation of the MIPS-III architecture results in 64-bit
operations, better code density, greater multi-processing support,
improved performance for commonly used code sequences in operating
system kernels and faster execution of floating-point intensive
applications. All resource dependencies are made transparent to the
programmer, ensuring transportability among implementations of the
MIPS instruction set architecture.

CP0 registers shown in Figure 1, with their register numbers:
Register    Number
Index       0
Random      1
EntryLo0    2
EntryLo1    3
Context     4
PageMask    5
Wired       6
BadVAddr    8
Count       9
EntryHi     10
Compare     11
Status      12
Cause       13
EPC         14
PRId        15
Config      16
LLAddr      17
XContext    20
ECC         26
CacheErr    27
TagLo       28
TagHi       29
ErrorEPC    30
The figure also shows the TLB, with entries numbered 0 to 47 and a
region marked "entries protected from TLBWR".
The MIPS integer unit implements a load/store architecture with
single cycle ALU operations (logical, shift, add, sub) and an
autonomous multiply/divide unit. Register resources include:
- 32 general-purpose orthogonal integer registers
- HI/LO result registers, for the integer multiply/divide unit
- Program counter
Also, the on-chip floating-point co-processor adds 32 floating-point
registers and a floating-point control/status register.
Register File
The R4700 has 32 general-purpose registers (shown in Figure 2).
These registers are used for scalar integer operations and address
calculation. The register file consists of two read ports and one write
port and is fully bypassed to minimize operation latency in the pipeline.
ALU
The RC4700 ALU consists of the integer adder and logic unit. The
adder performs address calculations in addition to arithmetic operations,
and the logic unit performs all logical and shift operations. Each of these
units is highly optimized and can perform an operation in a single
pipeline cycle.
Integer Multiply/Divide
To perform integer multiply and divide operations, the RC4700 uses
the floating-point unit. The results of the operation are placed in the HI
and LO registers. The values can then be transferred to the general
purpose register file using the MFHI/MFLO instructions. To prevent the
occurrence of an interlock or stall, a required number of processor
internal cycles must occur between an integer multiply or divide and a
subsequent MFHI or MFLO operation.

General Purpose Registers (bits 63..0): 0, r1, r2, ..., r29, r30, r31
Multiply/Divide Registers (bits 63..0): HI, LO
Program Counter (bits 63..0): PC
Figure 2 R4700 CPU Registers

Figure 3 RC4700 Pipeline Stages
I0  1I 2I 1R 2R 1A 2A 1D 2D 1W 2W
I1        1I 2I 1R 2R 1A 2A 1D 2D 1W 2W
I2              1I 2I 1R 2R 1A 2A 1D 2D 1W
I3                    1I 2I 1R 2R 1A 2A 1D
I4                          1I 2I 1R 2R 1A
(one cycle = two phases, for example 1I and 2I)
Key to Figure
Phase    Activity
1I-1R    Instruction cache access
2I       Instruction virtual-to-physical address translation in ITLB
2A-2D    Data cache access and load align
1D       Data virtual-to-physical address translation in DTLB
1D-2D    Virtual-to-physical address translation in JTLB
2R       Register file read
2R       Bypass calculation
2R       Instruction decode
2R       Branch address calculation
1A       Issue or slip decision
1A-2A    Integer add, logical, shift
1A       Data virtual address calculation
2A       Store align
1A       Branch decision
2W       Register file write
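To make the role of the HI and LO registers concrete, here is a minimal Python sketch of an unsigned 64-bit multiply and divide. The data sheet only states that results are placed in HI and LO; the split used here (upper half of the product in HI, lower half in LO; remainder in HI, quotient in LO) is the general MIPS convention and is an assumption as far as this document is concerned. The function names `dmultu` and `ddivu` are borrowed from the MIPS-III mnemonics purely as labels for the sketch.

```python
MASK64 = (1 << 64) - 1  # a 64-bit register holds values 0 .. 2**64 - 1


def dmultu(rs, rt):
    """Unsigned 64x64 multiply. Returns (HI, LO) halves of the 128-bit product."""
    product = (rs & MASK64) * (rt & MASK64)
    return (product >> 64) & MASK64, product & MASK64


def ddivu(rs, rt):
    """Unsigned 64-bit divide. Returns (HI, LO) = (remainder, quotient)."""
    rs &= MASK64
    rt &= MASK64
    if rt == 0:
        # The sketch refuses to divide by zero rather than model hardware behavior.
        raise ZeroDivisionError("divide by zero is not modeled in this sketch")
    return rs % rt, rs // rt
```

A program would then read each half separately, which is what the MFHI and MFLO instructions do for the real registers.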
Floating-Point Co-Processor
The RC4700 incorporates a complete floating-point co-processor on
chip and includes a floating-point register file and execution units. The
floating-point co-processor forms a "seamless" interface with the integer
unit, decoding and executing instructions in parallel with the integer unit.
Floating-Point Units
The RC4700 floating-point execution units support single and double
precision arithmetic, as specified in the IEEE Standard 754. The
execution unit is separated into a multiply unit and a combined
add/convert/divide/square root unit. Overlap of multiplies and
add/subtract is supported. The multiplier is partially pipelined,
allowing a new multiply to begin every four cycles.
The RC4700 maintains fully precise floating-point exceptions while
allowing both overlapped and pipelined operations. Precise exceptions
are extremely important in mission-critical environments and highly
desirable for debugging in any environment.
The floating-point unit's operation set includes floating-point add,
subtract, multiply, divide, square root, conversion between fixed-point
and floating-point format, conversion among floating-point formats and
floating-point compare. These operations comply with the IEEE
Standard 754.
Table 1 lists the latencies of some of the floating-point instructions in
internal processor cycles. Note that multiplies are pipelined so that a
new multiply can be initiated every four pipeline cycles.
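The effect of the partially pipelined multiplier can be estimated with a short calculation. The Python sketch below assumes a run of back-to-back multiplies with independent operands and no other stalls, uses the four-cycle issue interval stated above, and takes the MUL latencies (4 cycles single precision, 5 cycles double precision) from Table 1. The function name is illustrative.

```python
MUL_LATENCY = {"single": 4, "double": 5}  # cycles, MUL row of Table 1
MUL_ISSUE_INTERVAL = 4                     # a new multiply can start every 4 cycles


def multiply_sequence_cycles(count, precision="double"):
    """Estimate cycles for `count` independent back-to-back multiplies."""
    if count <= 0:
        return 0
    # The last multiply starts after (count - 1) issue intervals,
    # then needs its full latency to produce a result.
    return (count - 1) * MUL_ISSUE_INTERVAL + MUL_LATENCY[precision]
```

Under these assumptions, three double-precision multiplies finish in 2 x 4 + 5 = 13 cycles rather than the 15 cycles that three fully serialized multiplies would take.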
Floating-Point General Register File
The floating-point register file is made up of thirty-two 64-bit
registers. With the LDC1 and SDC1 instructions the floating-point unit
can take advantage of the 64-bit wide data cache and issue a
co-processor load or store doubleword instruction in every cycle.
The floating-point control register space contains two registers: one
for determining configuration and revision information for the
coprocessor and one for control and status information. These are
primarily involved with diagnostic software, exception handling, state
saving and restoring, and control of rounding modes.
Operation    32-bit    64-bit
MULT         6 - 9     7 - 10
DIV          42        74
System Control Co-processor (CP0)
The system control co-processor in the MIPS architecture is
responsible for the virtual memory sub-system, the exception control
system and the diagnostics capability of the processor. In the MIPS
architecture, the system control co-processor (and thus the kernel
software) is implementation dependent.
System Control Co-Processor Registers
The RC4700 incorporates all system control co-processor (CP0)
registers on-chip. These registers (shown in Figure 1) provide the
path through which the virtual memory system's page mapping is
examined and changed, exceptions are handled and operating modes are
controlled (kernel vs. user mode, interrupts enabled or disabled,
cache features). In addition, the RC4700 includes registers that
implement a real-time cycle counting facility, aid in cache diagnostic
testing and assist in data error detection.
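For software that needs to refer to these registers by number, the register numbers given in Figure 1 can be collected into a lookup table. The Python sketch below does that and, as an example of use, builds the 32-bit instruction word for a "move from CP0" instruction. The `MFC0` encoding used here (base word 0x40000000, general register in bits 20..16, CP0 register number in bits 15..11) is the standard MIPS encoding and is not taken from this data sheet; the function name is illustrative.

```python
# CP0 register numbers as listed in Figure 1.
CP0_REGISTERS = {
    "Index": 0, "Random": 1, "EntryLo0": 2, "EntryLo1": 3, "Context": 4,
    "PageMask": 5, "Wired": 6, "BadVAddr": 8, "Count": 9, "EntryHi": 10,
    "Compare": 11, "Status": 12, "Cause": 13, "EPC": 14, "PRId": 15,
    "Config": 16, "LLAddr": 17, "XContext": 20, "ECC": 26, "CacheErr": 27,
    "TagLo": 28, "TagHi": 29, "ErrorEPC": 30,
}


def encode_mfc0(rt, cp0_name):
    """Encode `mfc0 rt, <CP0 register>` using the standard MIPS field layout."""
    if not 0 <= rt <= 31:
        raise ValueError("general-purpose register number must be 0..31")
    rd = CP0_REGISTERS[cp0_name]  # raises KeyError for an unknown register name
    return 0x40000000 | (rt << 16) | (rd << 11)
```

For instance, `hex(encode_mfc0(26, "Cause"))` gives `0x401a6800`, the word for reading the Cause register into general register 26.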
Virtual-to-Physical Address Mapping
To establish a secure environment for user processing, the RC4700
provides the user, supervisor, and kernel modes of virtual addressing,
available to system software. Bits in a status register determine which
virtual addressing mode is used.
While in user mode, the RC4700 provides a single, uniform virtual
address space of 256GB (2GB for 32-bit address mode). When operating
in the kernel mode, four distinct virtual address spaces, totalling
1024GB (4GB in 32-bit address mode), are simultaneously available
and are differentiated by the high-order bits of the virtual address.
Operation     Single Precision    Double Precision
ADD           4                   4
SUB           4                   4
MUL           4                   5
DIV           32                  61
SQRT          31                  60
CMP           3                   3
FIX           4                   4
FLOAT         6                   6
ABS           1                   1
MOV           1                   1
NEG           1                   1
LWC1, LDC1    2                   2
SWC1, SDC1    1                   1
Table 1 RC4700 Instruction Latencies
The RC4700 processor also supports a supervisor mode in which the
virtual address space is 256.5GB (2.5GB in 32-bit address mode),
divided into three regions that are based on the high-order bits of the
virtual address. If the RC4700 is configured for 64-bit virtual addressing,
the virtual address space layout is an upwardly compatible extension of
the 32-bit virtual address space layout. Figure 4 shows the
address space layout for the 32-bit virtual address operation.
Memory Management Unit (MMU)
The memory management unit controls the virtual memory system
page mapping. It consists of an instruction address translation buffer
(the ITLB), a data address translation buffer (the DTLB), a Joint TLB (the
JTLB), and co-processor registers used for the virtual memory mapping
sub-system.
Instruction TLB (ITLB)
The RC4700 also incorporates a two-entry instruction TLB. Each
entry maps a 4KB page. The instruction TLB improves performance by
allowing instruction address translation to occur in parallel with data
address translation. When a miss occurs on an instruction address
translation, the least-recently used ITLB entry is filled from the JTLB.
The operation of the ITLB is invisible to the user.
Data TLB (DTLB)
The RC4700 also incorporates a four-entry data TLB. Each entry
maps a 4KB page. The data TLB improves performance by allowing
data address translation to occur in parallel with instruction address
translation. When a miss occurs on a data address translation, the DTLB
is filled from the JTLB. The DTLB refill is pseudo-LRU: the least recently
used entry of the least recently used half is filled. The operation of the
DTLB is invisible to the user.
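The pseudo-LRU refill policy can be sketched in a few lines of Python. The model below assumes the four entries are grouped as two halves of two entries each (slots 0-1 and 2-3) and that each hit or refill marks the other half, and the other entry within the same half, as least recently used. This grouping and the class and method names are assumptions made for illustration; the data sheet does not describe the hardware bookkeeping.

```python
class PseudoLruDtlb:
    """Toy model of a 4-entry DTLB with 'LRU entry of the LRU half' refill."""

    def __init__(self):
        self.entries = [None] * 4   # virtual page numbers held in slots 0..3
        self.lru_half = 0           # which half (0 or 1) was used least recently
        self.lru_in_half = [0, 0]   # least recently used position within each half

    def _touch(self, slot):
        half, pos = divmod(slot, 2)
        self.lru_half = 1 - half            # the other half becomes the LRU half
        self.lru_in_half[half] = 1 - pos    # the sibling becomes LRU in this half

    def lookup(self, vpn):
        """Return True on a hit and update the usage bits."""
        for slot, entry in enumerate(self.entries):
            if entry == vpn:
                self._touch(slot)
                return True
        return False

    def refill(self, vpn):
        """Replace the LRU entry of the LRU half (as if filled from the JTLB)."""
        slot = 2 * self.lru_half + self.lru_in_half[self.lru_half]
        self.entries[slot] = vpn
        self._touch(slot)
        return slot
```

Three state bits are enough for this scheme, which is why it is cheaper than tracking a full least-recently-used order over four entries.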
Joint TLB (JTLB)
For fast virtual-to-physical address decoding, the RC4700 uses a
large, fully associative TLB that maps 96 virtual pages to their
corresponding physical addresses. The TLB is organized as 48 pairs of
even-odd entries and maps a virtual address and address space
identifier into the large, 64GB physical address space.
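The even-odd pairing can be illustrated by splitting a virtual address into the fields a paired TLB entry works with. The Python sketch below assumes the usual MIPS arrangement, which this data sheet does not spell out: the address bit just above the page offset selects the even or odd page of a pair, and the bits above it identify the pair. The function name and the returned field names are illustrative.

```python
JTLB_PAIRS = 48               # from the text: 48 pairs of even-odd entries
JTLB_PAGES = 2 * JTLB_PAIRS   # 96 virtual pages in total


def split_virtual_address(vaddr, page_size=4096):
    """Return (pair_vpn, odd, offset) for a virtual address.

    pair_vpn identifies the even/odd page pair, odd is 0 or 1,
    and offset is the byte offset within the page.
    """
    if page_size <= 0 or page_size & (page_size - 1):
        raise ValueError("page size must be a power of two")
    offset_bits = page_size.bit_length() - 1
    offset = vaddr & (page_size - 1)
    odd = (vaddr >> offset_bits) & 1
    pair_vpn = vaddr >> (offset_bits + 1)
    return pair_vpn, odd, offset
```

With 4KB pages, addresses 0x00402000 and 0x00403ABC fall in the even and odd page of the same pair, so one paired entry can translate both.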
Two mechanisms are provided to assist in controlling the amount of
mapped space and the replacement characteristics of various memory
regions. First, the page size can be configured, on a per-entry basis, to
map a page size of 4KB to 16MB (each size four times the previous one).
A CP0 register is loaded with the page size of a mapping, and that size
is entered into the TLB when a new entry is written. Thus, operating
systems can provide special purpose maps; for example, a typical frame
buffer can be memory mapped using only one TLB entry.
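The available page sizes and the frame buffer example can be worked through numerically. The Python sketch below lists the seven sizes from 4KB to 16MB in steps of a factor of four and picks the smallest page that covers a region. It assumes the region is aligned so that it does not straddle a page boundary; the helper name and the frame buffer dimensions in the usage note are hypothetical.

```python
KB = 1024
MB = 1024 * KB

# 4KB, 16KB, 64KB, 256KB, 1MB, 4MB, 16MB: each size is 4x the previous one.
PAGE_SIZES = [4 * KB * 4 ** i for i in range(7)]


def smallest_page_covering(region_bytes):
    """Return the smallest page size that covers the region, or None if too large."""
    for size in PAGE_SIZES:
        if size >= region_bytes:
            return size
    return None
```

For example, a hypothetical 1024 x 768 frame buffer at one byte per pixel occupies 786,432 bytes and fits in a single 1MB page, so it needs one mapping instead of 192 separate 4KB pages.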
The second mechanism controls the replacement algorithm when a
TLB miss occurs. The RC4700 provides a random replacement algorithm
to select a TLB entry to be written with a new mapping; however,
the processor provides a mechanism whereby a system-specific number
of mappings can be locked into the TLB so that they are never randomly
replaced. This facilitates the design of real-time systems, by allowing
deterministic access to critical software.
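The interaction between random replacement and locked mappings can be modeled as follows. Figure 1 shows a Wired register and a TLB region marked "entries protected from TLBWR"; the Python sketch assumes that entries with an index below the Wired value are the locked ones and that a random replacement picks uniformly among the remaining indices up to 47. The hardware's actual selection mechanism is not described here, so Python's `random` module stands in for it and the function name is illustrative.

```python
import random

TLB_ENTRIES = 48  # entries are numbered 0..47 in Figure 1


def pick_replacement_index(wired, rng=random):
    """Pick a TLB index for random replacement, skipping locked entries.

    Entries 0 .. wired-1 are treated as locked (assumption based on Figure 1).
    """
    if not 0 <= wired < TLB_ENTRIES:
        raise ValueError("wired must leave at least one replaceable entry")
    return rng.randrange(wired, TLB_ENTRIES)
```

Setting `wired` to 8, for instance, would reserve entries 0 to 7 for mappings that must never miss, such as an interrupt handler, while entries 8 to 47 continue to be recycled.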
The joint TLB also contains information to control the cache
coherency protocol for each page. Specifically, each page has attribute
bits to determine whether the coherency algorithm is uncached,
non-coherent write-back, non-coherent write-through write-allocate or
non-coherent write-through no write-allocate. Non-coherent write-back
is typically used for both code and data on the RC4700; however,
hardware-based cache coherency is not supported.
Cache Memory
To keep the RC4700's high-performance pipeline full and operating
efficiently, the RC4700 incorporates on-chip instruction and data caches
that can be accessed in a single processor cycle. Each cache has its
own 64-bit data path and can be accessed in parallel.
Instruction Cache
The RC4700 incorporates a two-way set associative on-chip
instruction cache. This virtually indexed, physically tagged cache is
16KB in size and is protected with word parity.
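A short calculation shows why the index of such a cache is described as virtual. The Python sketch below assumes the smallest page size supported by the TLB (4KB) and computes how many of the address bits used to select a location within one cache way lie above the page offset, and are therefore not yet translated when the cache is indexed. The function name is illustrative.

```python
def untranslated_index_bits(cache_bytes, ways, page_bytes):
    """Count cache-addressing bits that lie above the page offset."""
    way_bytes = cache_bytes // ways                  # bytes addressed within one way
    way_address_bits = way_bytes.bit_length() - 1    # log2 for powers of two
    page_offset_bits = page_bytes.bit_length() - 1
    return max(0, way_address_bits - page_offset_bits)
```

For a 16KB two-way cache each way holds 8KB, which needs 13 address bits, while a 4KB page offset covers only 12, so one indexing bit comes from the virtual page number. The physical tag comparison then confirms that the selected line belongs to the translated address.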
Address range              Segment    Description                               Mapping
0xE0000000 - 0xFFFFFFFF    kseg3      Kernel virtual address space              Mapped, 0.5GB
0xC0000000 - 0xDFFFFFFF    sseg       Supervisor virtual address space          Mapped, 0.5GB
0xA0000000 - 0xBFFFFFFF    kseg1      Uncached kernel physical address space    Unmapped, 0.5GB
0x80000000 - 0x9FFFFFFF    kseg0      Cached kernel physical address space      Unmapped, 0.5GB
0x00000000 - 0x7FFFFFFF    useg       User virtual address space                Mapped, 2.0GB
Figure 4 Kernel Mode Virtual Addressing (32-bit Mode)
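The layout in Figure 4 translates directly into a small classifier. The Python sketch below returns the segment name for a 32-bit kernel-mode virtual address and whether that segment is mapped through the TLB. For the two unmapped segments it also derives a physical address by clearing the top three address bits, which is the standard MIPS convention for kseg0 and kseg1 and is an assumption as far as this data sheet is concerned. The function names are illustrative.

```python
# (segment name, first address, last address, mapped through the TLB?)
KERNEL_SEGMENTS_32 = [
    ("useg",  0x00000000, 0x7FFFFFFF, True),
    ("kseg0", 0x80000000, 0x9FFFFFFF, False),
    ("kseg1", 0xA0000000, 0xBFFFFFFF, False),
    ("sseg",  0xC0000000, 0xDFFFFFFF, True),
    ("kseg3", 0xE0000000, 0xFFFFFFFF, True),
]


def classify(vaddr):
    """Return (segment name, mapped?) for a 32-bit kernel-mode virtual address."""
    for name, first, last, mapped in KERNEL_SEGMENTS_32:
        if first <= vaddr <= last:
            return name, mapped
    raise ValueError("address is outside the 32-bit range")


def unmapped_physical(vaddr):
    """Physical address for a kseg0/kseg1 address (standard MIPS convention)."""
    name, mapped = classify(vaddr)
    if mapped:
        raise ValueError(name + " is translated through the TLB")
    return vaddr & 0x1FFFFFFF
```

For example, `classify(0xBFC00000)` returns `("kseg1", False)`, and `unmapped_physical(0xBFC00000)` returns 0x1FC00000.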