GSAT Chip: Testing Report

Tim Danner, Reuven Lax, and Andy MacKay

Contents

1  OmniLab Setup

2  Test Vectors

3  Functional Testing
    3.1  Register File
    3.2  Whole Chip

4  Speed Testing

5  Possible Improvements


A photograph of the die taken with a light microscope.

1  OmniLab Setup

The chip was connected through a breadboard to an Orion Instruments OmniLab E 9250 stimulus/logic analyzer. Inputs to the chip were also wired through to the analyzer. The analyzer was driven from a PC using the OmniLab 3.09 software.

The design of the system called for the expression to be stored in an external non-volatile RAM chip. For testing purposes, we omitted that chip and instead simulated it through OmniLab.

2  Test Vectors

We began with a basic functional test of the register file block, checking that each cell could store a zero, a one, and a value different from the cells around it (an alternating pattern of bits). Next, we ran the comprehensive test developed last year for simulation. This test runs for several hundred cycles and verifies that the chip correctly solves a problem with 15 variables and 10 clauses. It was our litmus test for verifying the functionality of the design via irsim.

3  Functional Testing

3.1  Register File

The first testing phase, functionality of the registers, almost passed. The register file is structured as 16 columns of 8 bits. The first 15 columns on all 5 chips passed the tests, but the last column behaved strangely, and identically on all 5 chips. The alternating-bits and all-ones tests pass, but in the all-zeros test the last column appears to be stuck high. This cannot be explained as a simple stuck-at-one fault in the last column, because that column reproduces the alternating pattern correctly.
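The reasoning about the stuck-at-one fault can be checked with a small model. The following Python sketch is not part of our test setup. It assumes the alternating pattern is a checkerboard (so the last column contains zeros), and the function names and the fault model are hypothetical, introduced only for illustration.

```python
COLUMNS, BITS = 16, 8  # register file geometry: 16 columns of 8 bits


def all_zeros():
    return [[0] * BITS for _ in range(COLUMNS)]


def all_ones():
    return [[1] * BITS for _ in range(COLUMNS)]


def alternating():
    # Assumption: a checkerboard, so every cell differs from its neighbours.
    return [[(col + bit) % 2 for bit in range(BITS)] for col in range(COLUMNS)]


def stuck_at_one(pattern, column):
    """Hypothetical fault model: every bit of one column always reads back 1."""
    return [[1] * BITS if c == column else list(bits)
            for c, bits in enumerate(pattern)]


def passes_with_fault(make_pattern, column=COLUMNS - 1):
    """True if the pattern would read back unchanged despite the modelled fault."""
    written = make_pattern()
    return stuck_at_one(written, column) == written


for make in (all_zeros, all_ones, alternating):
    verdict = "passes" if passes_with_fault(make) else "fails"
    print(f"{make.__name__}: {verdict} under a stuck-at-one last column")
```

Under this model the alternating test would fail along with the all-zeros test. On the real chips only the all-zeros test fails, so a plain stuck-at-one fault does not fit the observations.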


This picture illustrates the fault.


Another screenshot showing correct behavior for the alternating bits case.

We ran various simulations of the final CIF file using irsim, but the simulated results always matched our expectations. We were unable to explain the anomalous behavior of the last column of the register file.

Fortunately, the workaround for this problem is easy -- simply don't use the last column of the register file. This reduces the chip from a 128-variable solver to a 120-variable solver, but doesn't otherwise interfere with functionality.

3.2  Whole Chip

The first thing we noticed in running the big test is that the 12-bit counter used to address the external memory did not count correctly.


The problem begins immediately to the left of marker number 2.

The pins ADDR[0:11] count normally until they reach 15. Then the higher-order bits begin counting rapidly, one state per clock cycle, instead of the 4 cycles per state that they should take. This comes from the structure of the 12-bit counter: three 4-bit meg-generated counters strung together. The increment logic on the middle counter is wrong.


The AND gate above ``#1'' should include the increment signal in its inputs to produce ``#2'', the increment signal for the second sub-counter.

The middle counter should increment on the same cycle that the lower-order counter wraps to zero. This happens when all 4 state bits of the lower-order counter are high and the 12-bit counter's overall increment input is asserted. While we remembered this in other parts of the chip, the middle sub-counter of the 12-bit memory addressing counter is missing that second condition: it increments whenever the lower-order 4-bit counter is in state 15, without regard to the 12-bit counter's global increment signal. If the counter increments on every clock cycle (as it does in the test vectors we used while developing this module), the problem does not surface. In normal operation, however, the counter is expected to hold a state for several cycles. As built, it cannot wait in states where the 4 low-order bits are all high.
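The fault can be reproduced in a small behavioral model. The Python sketch below is not derived from the layout. It assumes that the high sub-counter chains off the middle one's increment signal, and that in normal operation the global increment is asserted once every 4 cycles (suggested by the "4 cycles per state" figure above). The names step, value, and run are hypothetical.

```python
def step(state, inc, buggy):
    """Advance a 12-bit counter built from three 4-bit sub-counters by one clock."""
    low, mid, high = state
    low_full = (low == 0xF)
    # Increment signal for the middle sub-counter.
    # Buggy version: ignores the global increment signal.
    inc_mid = low_full if buggy else (low_full and inc)
    # Assumption: the high sub-counter chains off the middle one's increment.
    inc_high = inc_mid and mid == 0xF
    if inc:
        low = (low + 1) & 0xF
    if inc_mid:
        mid = (mid + 1) & 0xF
    if inc_high:
        high = (high + 1) & 0xF
    return (low, mid, high)


def value(state):
    """Combine the three sub-counter states into one 12-bit address."""
    low, mid, high = state
    return (high << 8) | (mid << 4) | low


def run(cycles, period, buggy):
    """Clock the counter, asserting the increment once every `period` cycles."""
    state = (0, 0, 0)
    trace = []
    for t in range(cycles):
        inc = (t % period == period - 1)
        state = step(state, inc, buggy)
        trace.append(value(state))
    return trace


# Incrementing every cycle hides the fault; holding states exposes it.
print("every cycle, traces equal:", run(4096, 1, True) == run(4096, 1, False))
print("1-in-4, correct:", run(64, 4, False)[-1], "buggy:", run(64, 4, True)[-1])
```

With the increment asserted on every cycle the two versions produce identical traces, which matches how the fault escaped our module-level test vectors. With the increment asserted one cycle in four, the buggy model runs ahead as soon as the low 4 bits reach 15.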

The second major problem we found is that the logic for detecting whether the expression is satisfied is faulty. After completing the first pass through evaluating the expression, the DONE signal is asserted and the chip behaves as if no further work were necessary to satisfy the expression. This logic consists of several components:

None of these items are visible from the pins, so we could find no way to determine by testing which of these is failing. Careful visual inspection in magic turned up no obvious flaws. Behavior was consistent across all 5 chips, so it is not likely to be a fabrication flaw.

4  Speed Testing

Given that the chip isn't usable, its exact maximum operating frequency is of less interest. It appears to behave the same at all speeds up to about 2.5 MHz, but we could not test above that because the clock signals themselves began to misbehave.


Notice the very strange patterns of clock A and clock B near marker number 1.

In the presence of this odd pattern of almost-overlapping, out-of-order clock pulses, the PLA controller behaves erratically.

5  Possible Improvements

More thorough testing would have found the problem with the 12-bit counter. We tested its ability to count through its whole state space and deal properly with overflow, but we did not test its ability to sit in the various intermediate states.

If we had the whole thing to do over, we would have chosen a less complicated project. Everything was made more difficult by the fact that a basic test run requires hundreds of clock cycles, and getting the timing right was very tricky.

Assuming everything had worked (except perhaps for the last column of the register file -- that wasn't a show-stopper), the next step would have been to load a difficult expression into the non-volatile RAM and allow the chip to interface with it directly, rather than simulating it with the testing equipment.

Last modified: Tues, May 2, 2000, 2:34 am