Arithmetic Block Timing



Overview

As with the functional simulation, timing analysis was performed on each of the sub-units individually as well as on the whole arithmetic block. In each case, a crystal command file was used to exercise the unit under test in all possible ways. This involved finding the appropriate settings for a given operation, causing a single input bit to toggle at time zero, observing the longest critical path, and repeating the process for all possible toggles of all input bits. The "critical -s" command was used within crystal to generate a SPICE file implementing the longest path. For each SPICE file, ".MODEL" cards were extracted from .spcin files generated by pspicetool.
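The exhaustive procedure amounts to enumerating every single-bit toggle for one set of operation settings, then running each case on its own. The Python sketch below only builds that case list. It assumes the nine inputs listed in the Carry Lookahead Adder table (cin, a0 to a3, b0 to b3), and the helper name toggle_cases is hypothetical. It does not generate crystal commands or apply the operation-specific settings.

```python
from itertools import product

# Assumed input list: the carry lookahead adder inputs named in that
# section's summary table. Operation settings are not modeled here.
ADDER_INPUTS = ["cin"] + [f"a{i}" for i in range(4)] + [f"b{i}" for i in range(4)]


def toggle_cases(inputs):
    """Return every (input node, direction) pair for a single-bit toggle at t = 0."""
    return list(product(inputs, ("high", "low")))


cases = toggle_cases(ADDER_INPUTS)
print(len(cases))  # 9 inputs x 2 directions
for node, direction in cases[:3]:
    print(node, direction)
```

Nine inputs with two directions each give 18 cases, which matches the 18 rows in the carry lookahead adder table.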

Input Bit Selector

The crystal command file is separated into four sections. The results are summarized below.

Summary of the results

Input node  Toggle  Output node  Result  Delay (ns)

ASEL, ain to aout
ASEL        low     aout         low     1.83
ASEL        high    aout         high    2.98
ain         low     aout         low     1.83
ain         high    aout         high    3.37

Addition: bin to bout
bin         low     bout         low     5.91
bin         high    bout         high    11.16

Subtraction: bin to bout
bin         low     bout         high    15.77
bin         high    bout         low     10.66

Multiplication: -MRESET, fsum to bout
-MRESET     low     bout         low     6.23
-MRESET     high    bout         high    8.55
fsum        low     bout         low     6.23
fsum        high    bout         high    8.94

Feedback: sumin to fsum
sumin       low     fsum         low     5.57
sumin       high    fsum         high    8.35

crystal determined that bin switching low during a subtraction operation created the longest path: bout switched high after 15.77 ns. crystal was used to generate a SPICE input file, which in turn was used to generate the following graph:

[Graph: SPICE simulation of the longest path. Red: v(11) = bin. Blue: v(5) = bout.]

(The above graph is also available in .ps format.) bout appears to be asymptotically approaching 4 V. If this is true, it achieves 90% of its final value after about 15 ns. We are not sure why bout does not approach a full 5 V.
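Reading "90% of its final value" off a graph is a threshold-crossing measurement. The Python sketch below shows one way to make that measurement on sampled waveform data. The function name, the single-pole waveform, and the 5 ns time constant are illustrative assumptions, not the SPICE output. The final value is also an input, so if bout does not really settle at 4 V, the result changes. For a single-pole response, the 90% point falls at tau * ln(10), about 2.3 tau, and the example uses this as a check.

```python
import math


def time_to_fraction(times, volts, v_final, fraction=0.9):
    """First time a rising waveform reaches fraction * v_final.

    Linearly interpolates between samples; returns None if the
    threshold is never reached.
    """
    target = fraction * v_final
    if volts[0] >= target:
        return times[0]
    for i in range(1, len(times)):
        v0, v1 = volts[i - 1], volts[i]
        if v0 < target <= v1:
            t0, t1 = times[i - 1], times[i]
            return t0 + (target - v0) * (t1 - t0) / (v1 - v0)
    return None


# Synthetic single-pole (RC) response, NOT the SPICE data:
# v(t) = V_FINAL * (1 - exp(-t / TAU)), with an assumed TAU of 5 ns.
V_FINAL = 4.0  # apparent asymptote of bout, in volts
TAU = 5.0      # assumed time constant, in ns
times = [k * 0.01 for k in range(3001)]  # 0 to 30 ns in 0.01 ns steps
volts = [V_FINAL * (1 - math.exp(-t / TAU)) for t in times]

t90 = time_to_fraction(times, volts, V_FINAL)
print(f"90% of {V_FINAL} V is {0.9 * V_FINAL:.1f} V, reached at {t90:.2f} ns")
print(f"analytic single-pole value: {TAU * math.log(10):.2f} ns")
```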


Carry Lookahead Adder

The crystal command file for the carry lookahead adder toggled each input bit of the adder in turn. For each toggle, it looked for the longest path to any of the sum output bits or to the two high-order carry-out bits, which are used in the calculation of overflow.

Summary of the results

Input node  Toggle  Output node  Result  Delay (ns)
cin         high    c3           high    16.41
cin         low     s3           high    18.35
a0          high    s3           high    29.61
a0          low     s3           low     29.35
a1          high    s3           high    30.25
a1          low     s3           low     29.92
a2          high    s3           high    29.05
a2          low     s3           low     30.09
a3          high    c3           high    23.17
a3          low     c3           high    24.31
b0          high    s3           high    29.22
b0          low     s3           low     29.35
b1          high    s3           high    29.85
b1          low     s3           low     29.92
b2          high    s3           high    28.66
b2          low     s3           low     30.09
b3          high    c3           high    23.17
b3          low     c3           high    24.31

crystal determined that if a1 is driven high, in the worst case, s3 will be driven high 30.25 ns later. The crystal-generated SPICE input file generated the following graph:

[Graph: SPICE simulation of the longest path. Red: v(41) = a1. Blue: v(5) = s3.]

(The above graph is also available in .ps format.) s3 reaches 90% of its final value at about 25 ns.
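Picking the worst case out of the table is a simple reduction. The Python sketch below runs that reduction over the rows copied from the carry lookahead adder table. The helper name worst_per_output is hypothetical. Grouping the rows by output node shows that the longest path into the carry-out c3 (24.31 ns) is about 6 ns shorter than the longest path into s3.

```python
# (input node, toggle, output node, result, delay in ns),
# copied from the carry lookahead adder summary table.
ROWS = [
    ("cin", "high", "c3", "high", 16.41),
    ("cin", "low", "s3", "high", 18.35),
    ("a0", "high", "s3", "high", 29.61),
    ("a0", "low", "s3", "low", 29.35),
    ("a1", "high", "s3", "high", 30.25),
    ("a1", "low", "s3", "low", 29.92),
    ("a2", "high", "s3", "high", 29.05),
    ("a2", "low", "s3", "low", 30.09),
    ("a3", "high", "c3", "high", 23.17),
    ("a3", "low", "c3", "high", 24.31),
    ("b0", "high", "s3", "high", 29.22),
    ("b0", "low", "s3", "low", 29.35),
    ("b1", "high", "s3", "high", 29.85),
    ("b1", "low", "s3", "low", 29.92),
    ("b2", "high", "s3", "high", 28.66),
    ("b2", "low", "s3", "low", 30.09),
    ("b3", "high", "c3", "high", 23.17),
    ("b3", "low", "c3", "high", 24.31),
]


def worst_per_output(rows):
    """Map each output node to the first row with the longest delay reaching it."""
    worst = {}
    for row in rows:
        out = row[2]
        if out not in worst or row[4] > worst[out][4]:
            worst[out] = row
    return worst


per_output = worst_per_output(ROWS)
overall = max(ROWS, key=lambda r: r[4])
for out, row in sorted(per_output.items()):
    print(out, row)
print("overall:", overall)
```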


Gating/Feedback

A crystal command file observed the paths from the input to both the latched and the gated outputs. For the latched output, ph2 is held high. For the gated output, ADDOUT is held high.

Summary of the results

Input node  Toggle  Output node  Result  Delay (ns)

Latched output: sumin to ssum
sumin       low     ssum         low     5.11
sumin       high    ssum         high    7.89

Gated output: sumin to sumout
sumin       low     sumout       low     5.11
sumin       high    sumout       high    7.89

crystal determined that the longest paths through the latch and the gate are both 7.89 ns, and both occur when sumin is driven high. The crystal-generated SPICE input file generated the following graph:

[Graph: SPICE simulation of the longest path. Red: v(11) = sumin. Blue: v(5) = ssum.]

(The above graph is also available in .ps format.) ssum achieves 90% of its final value after about 7 ns.


Arithmetic Block

A crystal command file was used to investigate the longest path through the entire block. To determine the maximum clock speed, we must find the longest path from the outputs of the input latches to the outputs of the output latches. In all cases, the input latches for the feedback values and MRESET are turned off (ph1 is held low), and crystal drives the rest of the circuit from the outputs of these latches. The a and b latches are external to the adder. CO and OVERFLOW are latched inside the adder, but the sum outputs are not; they are latched in the separate output register. For this reason, a new magic file containing the adder and the output register was used for the simulation. crystal found the longest path to any of the latched output bits (out[15:8], CO, or OVERFLOW) when each input bit was toggled.

For addition and subtraction, OP4 was asserted. For multiplication, OP4 and OP0 were deasserted. In all cases, ADDOUT and LATCHOUT were asserted to enable the output gates and latches.

Summary of the results

Input node  Toggle  Output node  Result  Delay (ns)

Addition or subtraction
a0          low     out15        high    95.72
a0          high    out15        high    101.37
a1          low     out15        high    96.56
a1          high    out15        high    102.01
a2          low     out15        high    98.04
a2          high    out15        high    102.27
a3          low     out15        high    97.00
a3          high    out15        high    100.85
a4          low     out15        high    65.88
a4          high    out15        high    71.53
a5          low     out15        high    66.72
a5          high    out15        high    72.16
a6          low     out15        high    66.94
a6          high    out15        high    71.17
a7          low     OVERFLOW     high    57.97
a7          high    OVERFLOW     high    61.83
b0          low     out15        high    120.74
b0          high    out15        high    114.89
b1          low     out15        high    121.38
b1          high    out15        high    115.53
b2          low     out15        high    121.64
b2          high    out15        high    115.79
b3          low     out15        high    120.22
b3          high    out15        high    114.37
b4          low     out15        high    90.90
b4          high    out15        high    85.05
b5          low     out15        high    91.53
b5          high    out15        high    85.68
b6          low     out15        high    90.54
b6          high    out15        high    84.69
b7          low     OVERFLOW     high    81.19
b7          high    OVERFLOW     high    75.34

Multiplication
f0          low     out15        high    97.84
f0          high    out15        high    80.20
f1          low     out15        high    100.49
f1          high    out15        high    109.03
f2          low     out15        high    103.41
f2          high    out15        high    111.08
f3          low     out15        high    102.37
f3          high    out15        high    109.67
f4          low     out15        high    71.25
f4          high    out15        high    80.34
f5          low     out15        high    72.09
f5          high    out15        high    80.98
f6          low     out15        high    72.31
f6          high    out15        high    79.98
f7          low     OVERFLOW     high    63.35
f7          high    OVERFLOW     high    70.64
b0          low     out15        high    105.71
b0          high    out15        high    108.45
-MRESET     low     out15        high    103.41
-MRESET     high    out15        high    110.69

crystal determined that the longest path occurs when b2 is driven low: out15 is driven high 121.64 ns later. The crystal-generated SPICE input file generated the following graph:

[Graph: SPICE simulation of the longest path. Red: v(71) = b2. Blue: v(5) = out15.]

(The above graph is also available in .ps format.) out15 achieves 90% of its final value after about 70 ns.
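The same worst-case reduction can be applied separately to each operation in the arithmetic block table. The Python sketch below copies the delays from that table, and the helper name longest_path is hypothetical. It shows that the longest multiplication path (f2 driven high, 111.08 ns) is about 10.6 ns shorter than the worst addition/subtraction path. The addition/subtraction case therefore sets the clock period.

```python
# Delays in ns, copied from the arithmetic block summary table.
# Each entry: input node -> (output node, delay toggled low, delay toggled high).
ADD_SUB = {
    "a0": ("out15", 95.72, 101.37), "a1": ("out15", 96.56, 102.01),
    "a2": ("out15", 98.04, 102.27), "a3": ("out15", 97.00, 100.85),
    "a4": ("out15", 65.88, 71.53), "a5": ("out15", 66.72, 72.16),
    "a6": ("out15", 66.94, 71.17), "a7": ("OVERFLOW", 57.97, 61.83),
    "b0": ("out15", 120.74, 114.89), "b1": ("out15", 121.38, 115.53),
    "b2": ("out15", 121.64, 115.79), "b3": ("out15", 120.22, 114.37),
    "b4": ("out15", 90.90, 85.05), "b5": ("out15", 91.53, 85.68),
    "b6": ("out15", 90.54, 84.69), "b7": ("OVERFLOW", 81.19, 75.34),
}
MULT = {
    "f0": ("out15", 97.84, 80.20), "f1": ("out15", 100.49, 109.03),
    "f2": ("out15", 103.41, 111.08), "f3": ("out15", 102.37, 109.67),
    "f4": ("out15", 71.25, 80.34), "f5": ("out15", 72.09, 80.98),
    "f6": ("out15", 72.31, 79.98), "f7": ("OVERFLOW", 63.35, 70.64),
    "b0": ("out15", 105.71, 108.45), "-MRESET": ("out15", 103.41, 110.69),
}


def longest_path(table):
    """Return (delay, input node, toggle, output node) for the slowest case."""
    cases = []
    for node, (out, t_low, t_high) in table.items():
        cases.append((t_low, node, "low", out))
        cases.append((t_high, node, "high", out))
    return max(cases)  # tuples compare by delay first


for name, table in (("addition/subtraction", ADD_SUB), ("multiplication", MULT)):
    print(name, longest_path(table))
```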


Summary and Analysis

crystal found the longest path through the entire adder (including output latches) to be 121.64 ns. This represents the minimum time from ph1 fall to ph2 fall. If the time from ph2 fall to ph1 fall is the same, the total period is 243.28 ns for each clock. The maximum clock speed is the inverse of this value: 4.11 MHz. If the time from ph2 fall to ph1 fall is reduced to the minimum value of 8.35 ns (the sumin to fsum feedback delay found in the analysis of the input bit selector), the total period is 129.99 ns, and the maximum clock speed is 7.69 MHz.

In the SPICE simulation of the longest path through the entire adder, out15 reached 90% of its final value after about 70 ns. If the two phases are of equal length, the total period is 140 ns and the maximum clock speed is 7.14 MHz. If the second phase is shortened to about 10 ns to allow the feedback value to latch, the total clock period is 80 ns and the maximum clock speed is 12.5 MHz.
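The period arithmetic in the two paragraphs above reduces to one formula. The period is the phase-1 interval plus the phase-2 interval, and the maximum clock frequency is the inverse of the period. The Python sketch below reproduces the four figures under the same simple model. The helper name two_phase_clock is hypothetical, and like the text, the model ignores clock skew and any extra setup margin.

```python
def two_phase_clock(phase1_ns, phase2_ns):
    """Return (period in ns, max frequency in MHz) for a two-phase clock
    whose period is the sum of the two phase intervals."""
    period_ns = phase1_ns + phase2_ns
    return period_ns, 1000.0 / period_ns  # 1 / (period in ns) gives GHz; x1000 for MHz


# (phase 1 interval, phase 2 interval) in ns, taken from the text.
scenarios = {
    "crystal, equal phases": (121.64, 121.64),
    "crystal, short second phase": (121.64, 8.35),
    "SPICE, equal phases": (70.0, 70.0),
    "SPICE, short second phase": (70.0, 10.0),
}
for name, (p1, p2) in scenarios.items():
    period, mhz = two_phase_clock(p1, p2)
    print(f"{name:28s} period {period:7.2f} ns  max clock {mhz:5.2f} MHz")
```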

If the adder is the limiting factor of the chip, it should easily be capable of running at 4 MHz, and possibly as high as 7 MHz, with equal-phase clocks. If we shorten the second phase, it should be able to achieve speeds of 7 to 12 MHz.


Regrets

I did not realize that crystal could be given a single long list of delay statements and would remember the longest critical path. As a result, my crystal command files consist of many sets of setup commands, each followed by a single delay command. This also produced a large quantity of data to filter, as the tables above show. The whole process could have been simplified immensely and the quantity of data reduced.