We propose to design and implement an eight-bit ALU/calculator. The calculator will support addition, subtraction, shift and rotate operations, and Boolean operations (AND, OR, and NOT). A four-bit opcode will select the function the ALU performs on its eight-bit two's-complement inputs.
After receiving the restart and start signals, the ALU expects a 4-bit opcode, which the control unit uses to determine which operation to perform. As soon as the opcode is received, the first 8-bit value is loaded into the A input register. If the operation requires a second value, it is then loaded into the B input register. The control unit then enables the appropriate functional unit inside the ALU and routes the input data to that unit. Once the operation is complete, the control logic places the result on the output bus, where it is stored in the 8-bit output register.
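The control sequence above can be sketched as a small next-state function. This is a behavioral model under stated assumptions, not the actual PLA: the state names are hypothetical, and we assume that NOT, the shifts, and the rotates are the one-operand opcodes that skip the B load.

```python
# Hypothetical state names; the report describes the sequence, not these labels.
IDLE, LOAD_OPCODE, LOAD_A, LOAD_B, EXECUTE, STORE = (
    "IDLE", "LOAD_OPCODE", "LOAD_A", "LOAD_B", "EXECUTE", "STORE")

# Assumption: not (0110), rtl/rtr (1000/1001), lsl/lsr (1010/1011) use only A.
ONE_OPERAND = {0b0110, 0b1000, 0b1001, 0b1010, 0b1011}


def next_state(state: str, restart: bool, start: bool, opcode: int) -> str:
    """Next-state function for one pass through the control sequence."""
    if restart:
        return IDLE                      # restart always returns to idle
    if state == IDLE:
        return LOAD_OPCODE if start else IDLE
    if state == LOAD_OPCODE:
        return LOAD_A                    # first operand follows the opcode
    if state == LOAD_A:
        return EXECUTE if opcode in ONE_OPERAND else LOAD_B
    if state == LOAD_B:
        return EXECUTE
    if state == EXECUTE:
        return STORE                     # result driven onto the output bus
    return IDLE                          # STORE: output register has latched


def trace(opcode: int) -> list[str]:
    """States visited for one operation, with start held high from idle."""
    visited = []
    state = next_state(IDLE, restart=False, start=True, opcode=opcode)
    while state != IDLE:
        visited.append(state)
        state = next_state(state, restart=False, start=True, opcode=opcode)
    return visited
```

For example, `trace(0b0000)` (add) visits the B-load state, while `trace(0b0110)` (not) goes straight from loading A to execution.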
The ALU uses three basic functional units. Addition and subtraction are performed by a carry-lookahead adder, logic operations are performed bitwise in the standard way, and left and right logical shifts and rotates are handled by a single-position transmission-gate shifter.
Here is the opcode table for our ALU:
Operation | Opcode | Operation | Opcode |
---|---|---|---|
add | 0000 | rtl | 1000 |
sub | 0001 | rtr | 1001 |
unused | 0010 | lsl | 1010 |
unused | 0011 | lsr | 1011 |
and | 0100 | unused | 1100 |
or | 0101 | unused | 1101 |
not | 0110 | unused | 1110 |
unused | 0111 | unused | 1111 |
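As a cross-check of the table, the sketch below is a behavioral model of the eight-bit datapath. It is not the gate-level design, and several details are our assumptions rather than statements from the report: sub computes A - B as A + ~B + 1, NOT and the shift/rotate opcodes act on A alone and move exactly one position, and unused opcodes raise an error.

```python
MASK = 0xFF  # eight-bit datapath


def to_signed(x: int) -> int:
    """Interpret an 8-bit pattern as a two's-complement value."""
    x &= MASK
    return x - 0x100 if x & 0x80 else x


def alu(opcode: int, a: int, b: int = 0) -> int:
    """Return the 8-bit result pattern for one opcode from the table."""
    a &= MASK
    b &= MASK
    if opcode == 0b0000:   # add
        return (a + b) & MASK
    if opcode == 0b0001:   # sub: assumed A - B, computed as A + ~B + 1
        return (a + (~b & MASK) + 1) & MASK
    if opcode == 0b0100:   # and
        return a & b
    if opcode == 0b0101:   # or
        return a | b
    if opcode == 0b0110:   # not (assumed to act on A)
        return ~a & MASK
    if opcode == 0b1000:   # rtl: rotate left, MSB wraps into LSB
        return ((a << 1) | (a >> 7)) & MASK
    if opcode == 0b1001:   # rtr: rotate right, LSB wraps into MSB
        return ((a >> 1) | (a << 7)) & MASK
    if opcode == 0b1010:   # lsl: shift left, zero shifted in
        return (a << 1) & MASK
    if opcode == 0b1011:   # lsr: shift right, zero shifted in
        return a >> 1
    raise ValueError(f"unused opcode {opcode:04b}")
```

For example, `to_signed(alu(0b0001, 5, 7))` evaluates to -2, and `alu(0b0000, 127, 1)` yields `0x80`, which `to_signed` reads as -128 because the sum overflows the signed range.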
The work was divided up amongst the team members. Each person was responsible for one or two units, from choosing the architecture to designing the unit and testing it.
For the logic unit, there were several architectural options and even more layout options. The class text, Principles of CMOS VLSI Design, depicted a multiplexer-based logic unit that used four control lines to implement XOR, NOR, NAND, AND, and OR. However, this unit was very large and required extra space for its control lines. The NOT function was also awkward to implement: it required adding an array of multiplexers to the input latches so that NOT A could be computed as A NAND A. Although that design would have supported more logic functions, our space constraints led us to choose a simple AND/OR/NOT structure. The final unit was very compact and relatively fast.
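One common way to build a four-control-line logic unit like the one described above is a 4:1 multiplexer per bit whose select inputs are A and B and whose data inputs are the four control lines, so the control word is simply the truth table of the desired function. The sketch below models that formulation; the control encodings are derived from the truth tables and are our own, not copied from the text. It also shows why NOT needs extra input multiplexing: the B input must be driven by A so that the NAND pattern yields NOT A.

```python
# Bit index (a << 1) | b of each control word holds f(a, b).
# Hypothetical encodings derived from each function's truth table.
CTRL = {
    "AND":  0b1000,  # 1 only for a=1, b=1
    "OR":   0b1110,  # 0 only for a=0, b=0
    "XOR":  0b0110,  # 1 when exactly one input is 1
    "NAND": 0b0111,  # complement of AND
    "NOR":  0b0001,  # complement of OR
}


def logic_bit(ctrl: int, a: int, b: int) -> int:
    """One bit slice: A and B select which control line reaches the output."""
    return (ctrl >> ((a << 1) | b)) & 1


def logic_unit(ctrl: int, a: int, b: int, width: int = 8) -> int:
    """Apply the same control word at every bit position (bitwise operation)."""
    out = 0
    for i in range(width):
        out |= logic_bit(ctrl, (a >> i) & 1, (b >> i) & 1) << i
    return out


def logic_not(a: int, width: int = 8) -> int:
    """NOT A realized as A NAND A, i.e. with the B input driven by A."""
    return logic_unit(CTRL["NAND"], a, a, width)
```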
We decided very early that we would use a carry-lookahead adder. The text covered several adder architectures, and the Manchester carry chain appeared to be smaller, simpler, and faster than the others. The book recommended the Manchester chain as an improvement on the carry-lookahead implementation because it eliminates the intermediate levels of carry gates that carry-lookahead requires. Again, we were very concerned about space, so we wanted to keep the adder as small as possible.
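The behavior of a Manchester carry chain can be sketched at the logic level: each stage generates a carry, kills it, or passes the incoming carry through a switch when it propagates. This model captures only the logic, not the precharged dynamic circuit or its timing, and the function name is our own.

```python
def manchester_add(a: int, b: int, cin: int = 0, width: int = 8) -> tuple[int, int]:
    """Logic-level model of a Manchester carry chain.

    Per stage: generate g = a AND b, propagate p = a XOR b. When p is 1 the
    pass switch conducts and the incoming carry flows through; otherwise the
    stage either generates (g = 1) or kills (g = 0) the carry.
    Returns (sum, carry_out).
    """
    carry = cin
    total = 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        g = ai & bi
        p = ai ^ bi
        total |= (p ^ carry) << i    # sum bit uses the carry entering this stage
        if not p:                    # switch open: generate or kill
            carry = g
        # else: switch closed, carry passes through unchanged
    return total, carry
```

Subtraction can reuse the same chain by inverting B and setting the carry-in: `manchester_add(5, ~7 & 0xFF, cin=1)` returns `(0xFE, 0)`, the two's-complement pattern for -2.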
Because the shift unit performs four different functions, we had to multiplex signals to select the outputs. Again with our space problem in mind, we chose a single-position shift. The shift operations fill the vacated bit with a zero, and the rotate operations wrap the bit shifted out of one end back into the other. Our shift unit is a simple barrel shifter realized in transmission gates.
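In a single-position shifter, each output bit is a small multiplexer (transmission gates on the chip) choosing between its neighbor on one side, its neighbor on the other side, and the fill bit at the end of the word. The sketch below models that per-bit selection; the select names mirror the opcode mnemonics and stand in for one-hot select lines, which is our assumption about the control encoding.

```python
def shifter(a: int, sel: str, width: int = 8) -> int:
    """Single-position shifter modeled one output bit at a time.

    sel is 'rtl', 'rtr', 'lsl', or 'lsr'. Left moves take each bit from its
    lower neighbor; right moves take it from its upper neighbor. The end bit
    receives either the wrapped-around bit (rotate) or zero (shift).
    """
    bits = [(a >> i) & 1 for i in range(width)]   # bits[0] is the LSB
    out = [0] * width
    for i in range(width):
        if sel in ("rtl", "lsl"):
            if i > 0:
                out[i] = bits[i - 1]
            else:
                out[i] = bits[width - 1] if sel == "rtl" else 0
        elif sel in ("rtr", "lsr"):
            if i < width - 1:
                out[i] = bits[i + 1]
            else:
                out[i] = bits[0] if sel == "rtr" else 0
        else:
            raise ValueError(f"unknown select {sel!r}")
    return sum(bit << i for i, bit in enumerate(out))
```

Because every output bit has only a few fixed sources, the whole unit is one row of identical small multiplexers, which is one reason a single-position design stays compact.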
The registers we used were the simple two-clock latches discussed in class and in the book. Studying the layout in the text, however, allowed us to make our latches very space-efficient.
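Assuming "two-clock" refers to the usual two-phase, non-overlapping clocking scheme (phi1 and phi2), a register built from two level-sensitive latches behaves as sketched below. The class names are hypothetical, and the model ignores charge storage and timing.

```python
class Latch:
    """Level-sensitive latch: transparent while its clock is high, holds otherwise."""

    def __init__(self) -> None:
        self.q = 0

    def step(self, clk: int, d: int) -> int:
        if clk:
            self.q = d
        return self.q


class TwoPhaseRegister:
    """Master latch clocked by phi1 feeding a slave latch clocked by phi2."""

    def __init__(self) -> None:
        self.master = Latch()
        self.slave = Latch()

    def step(self, phi1: int, phi2: int, d: int) -> int:
        if phi1 and phi2:
            raise ValueError("phi1 and phi2 must not overlap")
        m = self.master.step(phi1, d)
        return self.slave.step(phi2, m)
```

Driving the register through phi1, a gap, phi2, and another gap captures the input during phi1 and presents it on the output during phi2; input changes after phi1 falls do not disturb the stored value.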
The multiplier was the most impressive portion of our design: it was fully functional and experimentally compatible with the PLA in terms of timing and control signals. However, the multiplier was not included in our final chip design because several design errors accumulated over the design process and made it incompatible with the rest of our circuit. The root of our problems was the size of the unit: we simply ran out of space to place it.
Our first mistake was an inefficient approach to counting eight cycles. Initially, we planned to construct a three-bit counter to indicate when eight cycles, and therefore the loop, were complete. As we progressed, we concluded that a three-bit counter would be unreasonably large and that adding eight states to our PLA would be more space-efficient. Although that decision was correct, we did not consider the eight shift registers in series needed on the output until the "point of no return" had been crossed.
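The multiplier algorithm itself is documented separately, so the sketch below rests on an assumption: an unsigned shift-and-add multiplier that consumes one multiplier bit per cycle. Under that assumption it shows why exactly eight cycles must be counted, and how stepping through eight explicit states (the loop index here, standing in for PLA states) removes the need for a separate counter register. In this formulation the low half of the product shifts one place every cycle, which is where a chain of output shift registers comes from. The function name is hypothetical.

```python
def shift_add_multiply(a: int, b: int) -> int:
    """Unsigned 8x8 shift-and-add multiply, one multiplier bit per state.

    The loop index plays the role of eight PLA states S0..S7; no counter
    register is kept. Returns the 16-bit product.
    """
    a &= 0xFF
    b &= 0xFF
    acc = 0      # high half of the running product
    low = b      # low half: starts as the multiplier, fills with product bits
    for _state in range(8):          # S0 .. S7
        if low & 1:                  # current multiplier bit is 1
            acc += a                 # may briefly need a ninth bit (carry out)
        # shift the combined {acc, low} right by one position
        low = ((acc & 1) << 7) | (low >> 1)
        acc >>= 1
    return (acc << 8) | low
```

For example, `shift_add_multiply(13, 11)` returns 143 after eight passes.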
Second, we felt that a separate adder and multiplier would be the best choice for our design. Separate units would allow for speed optimization and also, we thought, reduce routing concerns. In retrospect, our concern with speed was foolish: we ended up sacrificing functionality for speed. The routing concern stemmed from an incorrect approach to the problem: we were considering using the adder to multiply instead of using the multiplier to add.
Also, we felt that we did not need to minimize area aggressively; that is, we felt it was appropriate to sacrifice space for other design goals. At first, we approached the space issue assuming we would need approximately 1/3 of the chip for routing. Since our functional units took up under 60% of the silicon, we were confident that further space reduction was not warranted. However, the 1/3 assumption was not valid for our circuit: we needed over 50% for routing. This was not the result of poor layout but of a lack of understanding. We simply were not prepared for the fact that the sixteen-bit data input bus to the functional units and the eight-bit output bus, no matter how cleverly laid out on an 1800 × 1800 µm chip, would require half the circuit for routing. We should have built the functional units around these two primary buses, not vice versa, and we would have done so had we appreciated the proportional size of the buses.
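The area figures above can be checked with a short worked calculation, using only the die size and percentages stated in the report:

```latex
\begin{aligned}
A_{\text{chip}} &= 1800\,\mu\text{m} \times 1800\,\mu\text{m} = 3.24\ \text{mm}^2 \\
A_{\text{route,planned}} &= \tfrac{1}{3}\,A_{\text{chip}} = 1.08\ \text{mm}^2
  \;\Rightarrow\; A_{\text{units,max}} = 2.16\ \text{mm}^2 \ (66.7\%) \\
A_{\text{units}} &< 0.60\,A_{\text{chip}} = 1.944\ \text{mm}^2 \quad \text{(fits the plan)} \\
A_{\text{route,actual}} &> 0.50\,A_{\text{chip}} = 1.62\ \text{mm}^2
  \;\Rightarrow\; A_{\text{units,max}} < 1.62\ \text{mm}^2 \ (50\%)
\end{aligned}
```

Any unit area between 50% and 60% of the die (1.62 to 1.944 mm²) satisfies the original one-third routing plan but cannot coexist with the routing actually required. If the units occupied close to the 60% mark, the shortfall would be about 0.32 mm², roughly 10% of the die.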
The multiplier functionality is described fully in the attached document. It was not included here because it was not part of our final design. Additional documents are devoted to explanations of the state sequence, the PLA, and timing. They are included in the appropriate subsections.