MIPS32™ M4K™ Processor Core Datasheet
January 8, 2003
MIPS32™ M4K™ Processor Core Datasheet, Revision 01.01
Document Number: MD00247
Copyright 2002-2003 MIPS Technologies Inc. All rights reserved.
The MIPS32™ M4K™ core from MIPS Technologies is a high-performance, low-power, 32-bit MIPS RISC core
designed for custom system-on-silicon applications. The core is designed for semiconductor manufacturing
companies, ASIC developers, and system OEMs who want to rapidly integrate their own custom logic and
peripherals with a high-performance RISC processor. It is highly portable across processes, and can be easily
integrated into full system-on-silicon designs, allowing developers to focus their attention on end-user products. The
M4K core is ideally positioned to support new products for emerging segments of the routing, network access,
network storage, residential gateway, and smart mobile device markets. It is especially well-suited for applications
requiring multiple cores, or even a single core, when high performance density is critical.
The M4K core implements the MIPS32 Release 2 Architecture with the MIPS16e™ ASE, and the 32-bit privileged
resource architecture. The Memory Management Unit (MMU) consists of a simple, Fixed Mapping Translation
(FMT) mechanism for applications that do not require the full capabilities of a Translation Lookaside Buffer
(TLB)-based MMU.
The synthesizable M4K core includes two different Multiply/Divide Unit (MDU) implementations, selectable at
build-time, allowing the implementor to trade off performance and area. The high-performance MDU option
implements single-cycle MAC instructions, which enable DSP algorithms to be performed efficiently. It allows
32-bit x 16-bit MAC instructions to be issued every cycle, while a 32-bit x 32-bit MAC instruction can be issued every
2 cycles. The area-efficient MDU option handles multiplies with a one-bit-per-clock iterative algorithm.
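The one-bit-per-clock approach can be pictured with a generic shift-and-add multiplier. The sketch below is an illustrative Python model, not the core's actual datapath (the datasheet does not say which iterative scheme is used). The function name `iterative_multiply` and the unsigned treatment of the operands are assumptions made for the example.

```python
MASK32 = 0xFFFFFFFF

def iterative_multiply(multiplicand, multiplier):
    """Unsigned 32x32 -> 64-bit multiply, examining one multiplier bit per step.

    Returns (product, steps). Each loop iteration stands in for one clock
    of a hypothetical one-bit-per-clock multiplier.
    """
    multiplicand &= MASK32
    multiplier &= MASK32
    product = 0
    steps = 0
    for bit in range(32):
        if (multiplier >> bit) & 1:
            # Add the multiplicand, shifted to the weight of this bit.
            product += multiplicand << bit
        steps += 1
    return product, steps

product, steps = iterative_multiply(6, 7)
print(product, steps)
```

The 32 loop steps mirror the one-bit-per-clock behavior described above. Signed multiplies and accumulation would need extra handling that is omitted here.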
The M4K core is cacheless; in lieu of caches, it includes a simple interface to SRAM-style devices. This interface
may be configured for independent instruction and data devices or combined into a unified interface. The SRAM
interface allows deterministic response, while still maintaining high performance.
An optional Enhanced JTAG (EJTAG) block allows for single-stepping of the processor as well as instruction and
data virtual address/value breakpoints. Additionally, real-time tracing of instruction program counter, data address,
and data values can be supported.
Figure 1 shows a block diagram of the M4K core. The core is divided into required and optional blocks as shown.
Figure 1 M4K Core Block Diagram
[Block diagram. Labeled blocks: Execution Core (RF/ALU/Shift), System Coprocessor, MDU, MMU (FMT), SRAM Interface
(dual or unified SRAM I/F to on-chip SRAM), Power Mgmt, EJTAG (TAP with off-chip debug I/F; Trace with off/on-chip
trace I/F), CP2 (on-chip Coprocessor 2), and UDI. A legend distinguishes Fixed/Required blocks from Optional blocks.]
1.1 Features
5-stage pipeline
32-bit Address and Data Paths
MIPS32-Compatible Instruction Set
  Multiply-Accumulate and Multiply-Subtract Instructions (MADD, MADDU, MSUB, MSUBU)
  Targeted Multiply Instruction (MUL)
  Zero/One Detect Instructions (CLZ, CLO)
  Wait Instruction (WAIT)
  Conditional Move Instructions (MOVZ, MOVN)
MIPS32 Enhanced Architecture (Release 2) Features
  Vectored interrupts and support for external interrupt controller
  Programmable exception vector base
  Atomic interrupt enable/disable
  GPR shadow registers (optionally, one or three additional shadows can be added to minimize latency for
  interrupt handlers)
  Bit field manipulation instructions
MIPS16e™ Code Compression
  16 bit encodings of 32 bit instructions to improve code density
  Special PC-relative instructions for efficient loading of addresses and constants
  SAVE & RESTORE macro instructions for setting up and tearing down stack frames within subroutines
  Improved support for handling 8 and 16 bit datatypes
Memory Management Unit
  Simple Fixed Mapping Translation (FMT) mechanism
Simple SRAM-Style Interface
  Cacheless operation enables deterministic response and reduces size
  32-bit address and data; input byte enables enable simple connection to narrower devices
  Single or multi-cycle latencies
  Configuration option for dual or unified instruction/data interfaces
  Redirection mechanism on dual I/D interfaces permits D-side references to be handled by I-side
  Transactions can be aborted
CorExtend™ User Defined Instruction Set Extensions (available in M4K Pro™ core)
  Allows user to define and add instructions to the core at build time
  Maintains full MIPS32 compatibility
  Supported by industry standard development tools
  Single or multi-cycle instructions
  Separately licensed; a core with this feature is known as the M4K Pro™ core
Multi-Core Support
  External lock indication enables multi-processor semaphores based on LL/SC instructions
  External sync indication allows memory ordering
  Reference design provided for cross-core debug triggers
Multiply/Divide Unit (high-performance configuration)
  Maximum issue rate of one 32x16 multiply per clock
  Maximum issue rate of one 32x32 multiply every other clock
  Early-in iterative divide. Minimum 11 and maximum 34 clock latency (dividend (rs) sign extension-dependent)
Multiply/Divide Unit (area-efficient configuration)
  32 clock latency on multiply
  34 clock latency on multiply-accumulate
  33-35 clock latency on divide (sign-dependent)
Coprocessor 2 interface
  32 bit interface to an external coprocessor
Power Control
  Minimum frequency: 0 MHz
  Power-down mode (triggered by WAIT instruction)
  Support for software-controlled clock divider
  Support for extensive use of local gated clocks
EJTAG Debug
  Support for single stepping
  Virtual instruction and data address/value breakpoints
  PC and data tracing
  TAP controller is chainable for multi-CPU debug
  Cross-CPU breakpoint support
Testability
  Full scan design achieves test coverage in excess of 99% (dependent on library and configuration options)
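The bit field manipulation instructions listed under the Release 2 features include EXT (extract) and INS (insert) in the MIPS32 Release 2 instruction set. As a rough illustration of what they compute, here is a Python model of their results. The function names, argument order, and range checks are chosen for the example and are not assembler syntax.

```python
MASK32 = 0xFFFFFFFF

def _check_field(pos, size):
    # The field must lie entirely inside a 32-bit register.
    if not (0 <= pos < 32 and 1 <= size <= 32 - pos):
        raise ValueError("bit field must fit in bits 31..0")

def ext(rs, pos, size):
    """Model of EXT: return bits [pos+size-1 : pos] of rs, zero-extended."""
    _check_field(pos, size)
    return ((rs & MASK32) >> pos) & ((1 << size) - 1)

def ins(rt, rs, pos, size):
    """Model of INS: copy the low `size` bits of rs into rt at bit `pos`."""
    _check_field(pos, size)
    field_mask = ((1 << size) - 1) << pos
    return ((rt & ~field_mask) | ((rs << pos) & field_mask)) & MASK32

print(hex(ext(0xABCD1234, 8, 8)))
print(hex(ins(0x00000000, 0xAB, 16, 8)))
```

Without these instructions, each operation would typically take a shift-and-mask sequence of several ALU instructions.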
1.2 Architecture Overview
The M4K core contains both required and optional blocks. Required blocks are the lightly shaded areas of the block
diagram in Figure 1 and must be implemented to remain MIPS-compliant. Optional blocks can be added to the M4K
core based on the needs of the implementation.
The required blocks are as follows:
Execution Unit
Multiply/Divide Unit (MDU)
System Control Coprocessor (CP0)
Memory Management Unit (MMU)
  Fixed Mapping Translation (FMT)
SRAM Interface
Power Management
Optional blocks include:
Coprocessor 2 interface
CorExtend™ User Defined Instruction (UDI) support
MIPS16e support
Enhanced JTAG (EJTAG) Controller
The section entitled "M4K Core Required Logic Blocks" discusses the required blocks. The section entitled
"M4K Core Optional Logic Blocks" discusses the optional blocks.
1.3 Pipeline Flow
The M4K core implements a 5-stage pipeline with performance similar to the R3000 pipeline. The pipeline allows the
processor to achieve high frequency while minimizing device complexity, reducing both cost and power consumption.
The M4K core pipeline consists of five stages:
Instruction (I Stage)
Execution (E Stage)
Memory (M Stage)
Align (A Stage)
Writeback (W Stage)
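To visualize how consecutive instructions overlap in these five stages, the following Python sketch prints an idealized occupancy chart. It assumes one instruction enters the pipeline per clock and that nothing stalls, which real code will not always achieve. The helper name `pipeline_diagram` is made up for this illustration.

```python
STAGES = ["I", "E", "M", "A", "W"]

def pipeline_diagram(n_instructions):
    """Return one text row per instruction for an ideal, stall-free pipeline.

    Column k of a row shows which stage that instruction occupies in clock k
    ('.' means the instruction is not in the pipeline during that clock).
    """
    total_clocks = n_instructions + len(STAGES) - 1
    rows = []
    for i in range(n_instructions):
        row = ["."] * total_clocks
        for offset, stage in enumerate(STAGES):
            row[i + offset] = stage  # instruction i enters the I stage in clock i
        rows.append(" ".join(row))
    return rows

for line in pipeline_diagram(3):
    print(line)
```

Under these assumptions, n instructions finish in n + 4 clocks: once the pipeline is full, one instruction completes per clock.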
The M4K core implements a bypass mechanism that allows the result of an operation to be forwarded directly to the
instruction that needs it without having to write the result to the register and then read it back.
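The idea behind bypassing can be sketched as an operand-read function that prefers a newer in-flight result over the stale register file contents. This is a conceptual Python model only: the `in_flight` list is an abstraction introduced for the example, and the datasheet text here does not specify which stages feed the bypass network.

```python
def read_operand(reg, regfile, in_flight):
    """Return the value an instruction should see for source register `reg`.

    regfile   : list of 32 register values (architectural state)
    in_flight : list of (dest_reg, value) pairs for results that have been
                computed but not yet written back, youngest producer first
    """
    if reg == 0:
        return 0  # MIPS register 0 always reads as zero
    for dest, value in in_flight:
        if dest == reg:
            return value  # forward the newest matching result
    return regfile[reg]  # no pending writer: the register file is current

regfile = [0] * 32
regfile[5] = 10
print(read_operand(5, regfile, [(5, 99)]))
print(read_operand(5, regfile, []))
```

Without forwarding, the consumer would have to wait until the producer reached the W stage and the register file had been updated.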
Figure 1-1 shows a timing diagram of the M4K core pipeline.
Figure 1-1 M4K Core Pipeline
[Timing diagram across the I, E, M, A, and W stages. Labels: I-SRAM access (I-A1, I-A2), I Dec, RegRd, ALU Op,
D-AC, D-SRAM access, Align, RegW, and Bypass paths, with separate rows for Mul-16x16/32x16, Mul-32x32, and Div,
each ending in Acc and RegW.]
I Stage: Instruction Fetch
During the Instruction fetch stage:
An instruction is fetched from instruction SRAM.
MIPS16e instructions are expanded into MIPS32-like instructions.
E Stage: Execution
During the Execution stage:
Operands are fetched from the register file.
The arithmetic logic unit (ALU) begins the arithmetic or logical operation for register-to-register instructions.
The ALU calculates the data virtual address for load and store instructions, and the MMU performs the fixed
virtual-to-physical address translation.
The ALU determines whether the branch condition is true and calculates the virtual branch target address for branch
instructions.
Instruction logic selects an instruction address.
All multiply and divide operations begin in this stage.
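For PC-relative conditional branches, the MIPS32 instruction set defines the target as the address of the instruction after the branch plus the sign-extended 16-bit offset shifted left by two. A minimal Python sketch of that calculation follows. `branch_target` is a hypothetical helper name, and the model describes the architectural result for 32-bit (non-MIPS16e) branches rather than how the M4K ALU is wired.

```python
MASK32 = 0xFFFFFFFF

def branch_target(branch_pc, offset16):
    """Target of a MIPS32 PC-relative branch.

    branch_pc : address of the branch instruction itself
    offset16  : raw 16-bit immediate field from the instruction word
    """
    offset = offset16 & 0xFFFF
    if offset & 0x8000:
        offset -= 0x10000  # sign-extend the 16-bit field
    # The offset counts instructions (words) relative to the delay slot.
    return (branch_pc + 4 + (offset << 2)) & MASK32

print(hex(branch_target(0x1000, 0x0003)))
print(hex(branch_target(0x1000, 0xFFFF)))
```

An immediate of 0xFFFF (that is, -1) makes the branch target itself, which is the encoding of a one-instruction spin loop.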
M Stage: Memory Fetch
During the Memory fetch stage:
The ALU operation completes.
The data SRAM access is performed for load and store instructions.
A 16x16 or 32x16 multiply calculation completes (high-performance MDU option).
A 32x32 multiply operation stalls the MDU pipeline for one clock in the M stage (high-performance MDU option).
A multiply operation stalls the MDU pipeline for 31 clocks in the M stage (area-efficient MDU option).
A multiply-accumulate operation stalls the MDU pipeline for 33 clocks in the M stage (area-efficient MDU option).
A divide operation stalls the MDU pipeline for a maximum of 34 clocks in the M stage. Early-in sign extension
detection on the dividend will skip 7, 15, or 23 stall clocks (only the divider in the high-performance MDU option
supports early-in detection).
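The stall figures above can be collected into a small lookup for quick estimates. The Python sketch below uses only the numbers quoted in this section. The table and function names are invented for the example, and reading "completes in the M stage" as zero stall clocks is an interpretation rather than a statement from the datasheet.

```python
# M-stage stall clocks for multiplies, as quoted in the text above.
FAST_MULTIPLY_STALLS = {"16x16": 0, "32x16": 0, "32x32": 1}
AREA_EFFICIENT_STALLS = {"multiply": 31, "multiply-accumulate": 33}

DIVIDE_MAX_STALLS = 34
# Stall clocks skipped by early-in detection; 0 means it did not trigger.
EARLY_IN_SKIPS = (0, 7, 15, 23)

def fast_divide_stalls(skipped_clocks=0):
    """Stall clocks for a divide on the high-performance MDU option."""
    if skipped_clocks not in EARLY_IN_SKIPS:
        raise ValueError("early-in detection skips 7, 15, or 23 clocks")
    return DIVIDE_MAX_STALLS - skipped_clocks

for skipped in EARLY_IN_SKIPS:
    print(skipped, fast_divide_stalls(skipped))
```

The best case, 34 - 23 = 11, matches the minimum of 11 clocks quoted for the high-performance divider in the feature list. Which dividend widths trigger each skip is not stated in this section, so the sketch takes the skip count as an input.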
A Stage: Align
During the Align stage:
Load data is aligned to its word boundary.
A 16x16 or 32x16 multiply operation performs the carry-propagate-add. The actual register writeback is performed
in the W stage.
A MUL operation makes the result available for writeback. The actual register writeback is performed in the W
stage.
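Load alignment can be pictured as selecting the addressed byte or halfword from the 32-bit word returned by the SRAM, moving it to the low end of the register, and then sign- or zero-extending it. The Python sketch below is a behavioral model under a stated assumption: it uses little-endian byte lane numbering, whereas a real core may be configured for either endianness. The function name and parameters are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def align_load(word, byte_offset, size_bytes, signed):
    """Extract a naturally aligned 1-, 2-, or 4-byte load result from `word`.

    byte_offset : low two address bits (0..3), little-endian lanes assumed
    signed      : True for sign extension (LB/LH style), False for zero
                  extension (LBU/LHU style)
    """
    if size_bytes not in (1, 2, 4):
        raise ValueError("size must be 1, 2, or 4 bytes")
    if not 0 <= byte_offset < 4 or byte_offset % size_bytes:
        raise ValueError("access must be naturally aligned within the word")
    bits = 8 * size_bytes
    value = ((word & MASK32) >> (8 * byte_offset)) & ((1 << bits) - 1)
    if signed and value >> (bits - 1):
        value -= 1 << bits  # replicate the sign bit into the upper bits
    return value & MASK32

print(hex(align_load(0x1234F680, 0, 1, signed=True)))
print(hex(align_load(0x1234F680, 2, 2, signed=False)))
```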
W Stage: Writeback
During the Writeback stage:
For register-to-register or load instructions, the instruction result is written back to the register file.