CPU Functional Description

ARCHITECTURE

Our accumulator-style CPU uses a simple 8-bit instruction set and on-chip memory to achieve fast execution. Using an accumulator decreases the hardware complexity, which is important given our chip-size constraints. This architecture will tend to decrease code size but increase the memory traffic. We hope that the decreased access time of the on-chip memory will offset this drawback. We developed fixed-length instructions for the same reasons: simpler hardware and faster decoding. Additionally, this ensures memory alignment. Having on-chip memory will facilitate our I/O and decrease the size of our bus, but will make the job of the software/compiler writer more difficult (since only 10 instructions can be in memory at a time).

We tend to think of our IM as a single-line instruction cache with compiler-controlled prefetching. A block of sequential memory is loaded to the "cache." Instructions are executed in order or as indicated by the control stream until a prefetch instruction is detected. At this point, a new block is loaded into the "cache." However, we lack the hardware to specify main memory addresses (or any main memory, for that matter), so it is the software's job to load the correct next block.
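The block-at-a-time behaviour described above can be sketched in a few lines of Python. This is only an illustrative model, not our hardware: the `PREFETCH` mnemonic, the string instructions, and the `run` helper are hypothetical names, branches are left out, and the list of blocks stands in for the software that supplies the correct next block.

```python
IM_CAPACITY = 10          # from the text: only 10 instructions fit in IM at a time
PREFETCH = "PREFETCH"     # hypothetical mnemonic for the prefetch instruction


def run(blocks):
    """Execute blocks in the order the software supplies them.

    Each block is loaded whole into the single-line "cache"; execution
    proceeds in order until a prefetch instruction is seen, at which
    point the next block replaces the current one.
    """
    trace = []
    for block in blocks:
        if len(block) > IM_CAPACITY:
            raise ValueError("block does not fit in instruction memory")
        im = list(block)              # load the block into IM
        for instr in im:
            if instr == PREFETCH:     # stop and let the next block load
                break
            trace.append(instr)       # stand-in for executing the instruction
    return trace
```

Anything placed after the prefetch instruction in a block is never reached in this model, which is why the compiler has to choose block boundaries carefully.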

INSTRUCTION SET

The instruction set includes data accesses, ALU operations, control, and a few miscellaneous operations. Each 8-bit instruction has a 4-bit opcode and a 4-bit operand. Since one source register and the destination register are both implicit in the accumulator architecture, we need only one operand in the instruction.
See the complete Instruction Set.
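As a rough illustration of the 4-bit opcode / 4-bit operand split, the Python sketch below packs and unpacks an instruction byte. It assumes the operand sits in the high nibble and the opcode in the low nibble, following the addressing-mode descriptions; the function names and the opcode values used in the test are hypothetical and are not taken from the instruction set table.

```python
OPERAND_SHIFT = 4   # assumption: operand occupies the high 4 bits
NIBBLE_MASK = 0xF


def encode(opcode, operand):
    """Pack a 4-bit opcode and a 4-bit operand into one instruction byte."""
    if not (0 <= opcode <= NIBBLE_MASK and 0 <= operand <= NIBBLE_MASK):
        raise ValueError("opcode and operand must each fit in 4 bits")
    return (operand << OPERAND_SHIFT) | opcode


def decode(byte):
    """Split an instruction byte into (opcode, operand)."""
    opcode = byte & NIBBLE_MASK
    operand = (byte >> OPERAND_SHIFT) & NIBBLE_MASK
    return opcode, operand
```

Because every instruction is exactly one byte, decoding needs nothing more than a mask and a shift, which is the "faster decoding" benefit of fixed-length instructions.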

Addressing Modes

Our CPU uses 4 addressing modes:

Immediate

Some ALU operations (ADDI, AND, etc.) take their operand directly from the high 4 bits of the instruction.

Direct

Other ALU operations (ADD, SUB) fetch their operand from the memory location specified by the high 4 bits of the instruction.

Offset

Taken branches add the offset specified by the instruction to the current PC value, for use in IM addressing.

Indexed

The ADDX instruction fetches its operand from the DM location specified by the IX (IndeXing Register). The IX is loaded at least one instruction before the ADDX, and is not overwritten on every instruction like the TP.
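The three modes that produce an ALU operand can be summarised in a short Python sketch. It is a simplified model under assumptions: DM is taken to be 16 locations addressed by a 4-bit value, and the mode names and `fetch_operand` helper are hypothetical. Offset mode is left out because it modifies the PC rather than producing an operand.

```python
def fetch_operand(mode, field, dm, ix):
    """Return the ALU operand for one instruction.

    mode  -- "immediate", "direct", or "indexed" (hypothetical labels)
    field -- the 4-bit operand field of the instruction
    dm    -- data memory, modelled as a list of 4-bit values (assumed 16 entries)
    ix    -- current contents of the IX register
    """
    if mode == "immediate":
        return field & 0xF        # operand is the instruction field itself
    if mode == "direct":
        return dm[field & 0xF]    # field is a DM address
    if mode == "indexed":
        return dm[ix & 0xF]       # IX holds the DM address; field is unused
    raise ValueError("unknown addressing mode: " + mode)
```

For example, with `dm[5] == 10`, an immediate instruction with field 5 yields 5, while a direct instruction with the same field yields 10.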

FUNCTIONAL UNITS and LATENCIES

ALU
Our main functional unit is a 4-bit ALU. It supports ADD, SUB, ShiftR, and bit-wise logical operations. Because this is an accumulator architecture, one operand to the ALU is always the value in the ACC register and the output is always written to the ACC. The ALU uses a register to buffer the ACC input. This buffer must be loaded with the current ACC value at least one cycle before the output is to be stored.
All ALU operations have a 1-cycle latency.
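A behavioural sketch of the ALU in Python may help make the 4-bit data path concrete. Several details here are assumptions rather than statements about the actual design: results wrap around modulo 16 with no carry flag, ShiftR is a logical shift by one position, and OR and XOR are included as representative logical operations alongside AND. The `alu` function and its string operation names are hypothetical.

```python
MASK4 = 0xF  # the ALU is 4 bits wide


def alu(op, acc, operand):
    """Return the new ACC value after applying op to (acc, operand)."""
    if op == "ADD":
        result = acc + operand
    elif op == "SUB":
        result = acc - operand
    elif op == "SHR":
        result = acc >> 1          # assumed: logical shift right by one
    elif op == "AND":
        result = acc & operand
    elif op == "OR":               # assumed to be among the logical ops
        result = acc | operand
    elif op == "XOR":              # assumed to be among the logical ops
        result = acc ^ operand
    else:
        raise ValueError("unsupported ALU operation: " + op)
    return result & MASK4          # assumed: wrap to 4 bits, no carry out
```

Under these assumptions, adding 9 to an ACC of 9 gives 2, since 18 does not fit in 4 bits.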

Memory
Another important block in our CPU is the memory. Both Instruction Memory (IM) and Data Memory (DM) have registers to specify the address being accessed. This value is passed to a decoder which, in combination with a READ or WRITE signal, asserts the output that controls the relevant location. IM uses input and output buffers in order to store the entire instruction byte given 4-bit inputs and bus lines.
DM accesses have a 1-cycle latency, while IM accesses take 2 cycles because of the 4-bit bus.
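The 2-cycle IM access follows from moving an 8-bit instruction over a 4-bit bus: one nibble per cycle. The Python sketch below illustrates the buffering idea only; the order in which the nibbles travel (low nibble first) and the helper names are assumptions.

```python
def split_byte(byte):
    """Return the two 4-bit bus transfers for one instruction byte.

    Assumption: the low nibble is sent first, then the high nibble.
    """
    return [byte & 0xF, (byte >> 4) & 0xF]


def join_nibbles(transfers):
    """Rebuild the instruction byte in the buffer from two bus transfers."""
    low, high = transfers
    return (high << 4) | low
```

Each element of the list returned by `split_byte` corresponds to one bus cycle, so a full instruction always costs two transfers, whereas a 4-bit DM value needs only one.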

Program Counter
The PC is a simplified ALU, with just an adder and a buffer register, that is used to index the IM. It uses a mux to choose between incrementing by 1 and adding an operand in the case of branch instructions.
The PC has a single-cycle latency to increment.
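The mux-plus-adder behaviour of the PC can be written as a one-line selection in Python. This is a sketch under assumptions: the offset is treated as an unsigned value, no wrap-around at the end of IM is modelled, and `next_pc` is a hypothetical name.

```python
def next_pc(pc, branch_taken, offset):
    """Compute the next PC value.

    The mux selects the branch offset when a branch is taken and the
    constant 1 otherwise; the adder then adds the selected value to the PC.
    """
    step = offset if branch_taken else 1   # the mux
    return pc + step                       # the adder
```

For instance, from PC 3 a taken branch with offset 5 lands on IM location 8, while any other instruction moves on to location 4.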

Controller
The controller sequences a 3-stage Instruction Load (load opcode, load data, memory write) and a 4-stage Execution (instruction fetch, decode, operand fetch, and execute). Loading has a 4-cycle latency and execution averages 6 cycles. The controller is implemented as a PLA.
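These two figures allow a back-of-the-envelope cycle estimate for one block of instructions. The Python sketch below assumes that loading and execution do not overlap and that the 6-cycle average applies uniformly; both are simplifying assumptions, and `block_cycles` is a hypothetical helper.

```python
LOAD_CYCLES = 4       # from the text: loading has a 4-cycle latency
AVG_EXEC_CYCLES = 6   # from the text: execution averages 6 cycles


def block_cycles(n_loaded, n_executed=None):
    """Estimate the cycles needed to load and run one block.

    n_loaded   -- number of instructions written into IM
    n_executed -- number of instructions actually executed; defaults to
                  n_loaded (straight-line code, each instruction run once)
    """
    if n_executed is None:
        n_executed = n_loaded
    return n_loaded * LOAD_CYCLES + n_executed * AVG_EXEC_CYCLES
```

Under these assumptions a full 10-instruction block of straight-line code costs about 100 cycles, while a loop that executes 25 instructions from the same block costs about 190, which shows how loops amortise the load time.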


This is a Group Project for Elec422.
Members: Rebecca Ma | Jill Nelson | Deborah Watt
