
CPU Functional Description
ARCHITECTURE
Our accumulator-style CPU uses a simple 8-bit instruction set and on-chip memory to achieve fast execution. Using an accumulator decreases the hardware complexity, which is important given our chip-size constraints. This architecture will tend to decrease code size but increase the memory traffic. We hope that the decreased access time of the on-chip memory will offset this drawback. We developed fixed-length instructions for the same reasons: simpler hardware and faster decoding. Additionally, this ensures memory alignment. Having on-chip memory will facilitate our I/O and decrease the size of our bus, but will make the job of the software/compiler writer more difficult (since only 10 instructions can be in memory at a time).
We tend to think of our instruction memory (IM) as a single-line instruction cache with compiler-controlled prefetching. A block of sequential memory is loaded into the "cache." Instructions are executed in order, or as indicated by the control stream, until a prefetch instruction is detected. At this point, a new block is loaded into the "cache." However, we lack the hardware to specify main memory addresses (or any main memory, for that matter), so it is the software's job to load the correct next block.
INSTRUCTION SET
The instruction set includes data accesses, ALU operations, control, and a few miscellaneous operations. Each 8-bit instruction has a 4-bit opcode and a 4-bit operand. Since one source register and the destination register are implicit in the accumulator architecture, we need only one operand in the instruction.
See the complete Instruction Set.
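As a rough illustration of the 8-bit format, the sketch below splits an instruction byte into its two 4-bit fields. It assumes the operand occupies the high 4 bits (as the addressing-mode descriptions state) and the opcode the low 4 bits. The function names `decode` and `encode` are hypothetical and not part of the design.

```python
def decode(instruction):
    """Split an 8-bit instruction into (opcode, operand).

    Assumption: operand in the high nibble, opcode in the low nibble.
    """
    if not 0 <= instruction <= 0xFF:
        raise ValueError("instruction must fit in 8 bits")
    operand = (instruction >> 4) & 0xF  # high 4 bits
    opcode = instruction & 0xF          # low 4 bits
    return opcode, operand


def encode(opcode, operand):
    """Pack a 4-bit opcode and a 4-bit operand into one instruction byte."""
    if not (0 <= opcode <= 0xF and 0 <= operand <= 0xF):
        raise ValueError("opcode and operand must each fit in 4 bits")
    return (operand << 4) | opcode
```

Because every instruction is exactly one byte, decoding needs no length field and no lookahead, which is the simpler-hardware argument made for fixed-length instructions.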
Addressing Modes
Our CPU uses 4 addressing modes:
- Immediate
- Direct
- Offset
- Indexed
Immediate
Some ALU operations (ADDI, AND, etc.) have operands specified in the high
4 bits of the instruction.
Direct
Other ALU operations (ADD, SUB) fetch operands from the memory location
specified by the high 4 bits of the instruction.
Offset
Taken branches add the offset specified by the instruction to the current
PC value, for use in IM addressing.
Indexed
The ADDX instruction fetches its operand from the DM location specified by
the IX (IndeXing Register). The IX is loaded at least one instruction
before the ADDX, and is not overwritten on every instruction like the TP.
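The four modes can be compared side by side in a small software model. This is only a sketch under stated assumptions: `resolve_operand` and its parameter names are hypothetical, DM is modeled as a 16-entry list (the most a 4-bit field can address), and the branch offset is treated as unsigned with no PC wraparound, since neither is specified here.

```python
def resolve_operand(mode, field, dm, ix, pc):
    """Return the value an instruction works with under each addressing mode.

    field -- the 4-bit operand field of the instruction
    dm    -- data memory, modeled as a list (assumed 16 entries)
    ix    -- current contents of the IX register
    pc    -- current program counter value
    """
    if mode == "immediate":
        return field        # operand is the field itself (e.g. ADDI)
    if mode == "direct":
        return dm[field]    # field names a memory location (e.g. ADD, SUB)
    if mode == "offset":
        return pc + field   # taken branch: new IM address (wraparound not modeled)
    if mode == "indexed":
        return dm[ix]       # location comes from IX, not the field (ADDX)
    raise ValueError("unknown addressing mode: " + mode)
```

For example, with `dm[3] = 7`, `dm[5] = 9`, `ix = 5`, and `pc = 2`, a field value of 3 resolves to 3 (immediate), 7 (direct), 5 (offset), and 9 (indexed).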
FUNCTIONAL UNITS and LATENCIES
ALU
Our main functional unit is a 4-bit ALU. It supports ADD, SUB, ShiftR, and
bit-wise logical operations. Because of the accumulator architecture, one
operand to the ALU is always the value in the ACC register, and the output
is always written to the ACC. The ALU uses a register to buffer the ACC
input. This buffer must be loaded with the current ACC value at least one
cycle before the output is stored.
All ALU operations have a 1-cycle latency.
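A behavioral sketch of the 4-bit ALU follows. It is not the hardware description: the function name `alu` and the operation strings are hypothetical, ShiftR is assumed to shift by one position, and carries or borrows out of the top bit are assumed to be discarded. Only AND is shown for the bit-wise operations.

```python
MASK = 0xF  # results are truncated to 4 bits


def alu(op, acc, operand=0):
    """Compute the next ACC value from the buffered ACC and one operand."""
    if op == "ADD":
        result = acc + operand
    elif op == "SUB":
        result = acc - operand
    elif op == "SHIFTR":
        result = acc >> 1       # assumed: shift right by one bit
    elif op == "AND":
        result = acc & operand
    else:
        raise ValueError("unsupported ALU operation: " + op)
    return result & MASK        # assumed: carry/borrow is dropped
```

In this model `acc` plays the role of the buffer register: it holds the ACC value captured before the operation, so the result can be written back to the ACC without disturbing the input.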
Memory
Another important block in our CPU is the memory. Both
Instruction Memory (IM) and Data Memory (DM) have registers to specify the address being
accessed. This value is passed to a decoder which, in combination with a
READ or WRITE signal, asserts the output to control the relevant location.
IM uses input and output buffers in order to store the entire instruction
byte given 4-bit inputs and bus lines.
DM accesses have a 1-cycle latency, while IM accesses take 2 cycles because
of the 4-bit bus.
Program Counter
The PC is a simplified ALU, with just an adder and a buffer register, that is used to index the IM. It uses a mux to choose between incrementing by 1 and adding an operand in the case of branch instructions.
The PC has a single-cycle latency to increment.
Controller
We have a 3-stage Instruction Load (load opcode, load data, memory write)
and a 4-stage Execution (instruction fetch, decode, operand fetch, and
execute). Loading has a 4-cycle latency and execution averages 6 cycles.
The controller is implemented as a PLA.
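The latencies above allow a back-of-the-envelope cycle estimate for one block of instructions. The sketch assumes that the 4-cycle load latency applies per instruction, that loading and execution do not overlap, and that 6 cycles is used as the per-instruction execution average; `estimate_cycles` is a hypothetical helper, not part of the design.

```python
LOAD_CYCLES = 4      # load latency (assumed to be per instruction)
AVG_EXEC_CYCLES = 6  # average execution latency per instruction


def estimate_cycles(loaded, executed=None):
    """Estimate total cycles to load and run one block of instructions.

    loaded   -- number of instructions written into IM
    executed -- number of instructions actually executed; defaults to
                `loaded`, i.e. straight-line code with no loops
    """
    if executed is None:
        executed = loaded
    return loaded * LOAD_CYCLES + executed * AVG_EXEC_CYCLES
```

Under these assumptions a full block of 10 straight-line instructions costs 10 x 4 + 10 x 6 = 100 cycles, while a loop that executes 25 instructions from the same block costs 40 + 150 = 190 cycles, so the load cost is amortized when code loops within a block.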

This is a Group Project for Elec422.
Members: Rebecca Ma | Jill Nelson | Deborah Watt