Run-length test


The general outline of this test is simple. The sequence produced by random() is broken up into maximum-length subsequences of monotonically increasing values. The length of each such subsequence is determined, and a count is kept of each possible length.

When testing short bit-fields from the outputs of random(), the maximum-length subsequences are also short (for three bits, a monotonically increasing subsequence is at most length 8). We chose to implement this test on short bit-fields to simplify the computation of the expected counts, by throwing away repeated values. For example, if we are testing 3-bit fields and random() produces the subsequence (after extracting the selected field)

1 3 1 1 4 2

the test will throw away the second and third occurrences of 1 and regard this as a run of length 3, since the 2 terminates the run at the previous value 4. Based on this approach, we throw away the value after the end of a run to eliminate correlation between successive runs, UNLESS the last value of a run was the maximum possible value, in which case we do not need to throw away the next value. The tests for short bit-fields used enough outputs (only counting the outputs NOT thrown away) to insure that the smallest expected count for any run length was 20.

Using the entire 31-bit quantity returned by random() creates two problems. First, while most subsequences characterized this way will be quite short, some subsequences can be quite long. To deal with this, we group all subsequences over a given length together, and choose the total number of values generated to be large enough to ensure that the expected count in this "overflow" group will be much larger than 5.

Second, because the last value in one subsequence is always greater than the first value in the next subsequence by definition, the subsequences are correlated. We eliminate this correlation by throwing away the value after the end of each subsequence, similar to what was done for short-bit fields where the run did not end in the maximum value. This allows the next subsequence to start with any value, regardless of the value on which the previous subsequence ended. The penalty we pay for this solution is that we through away one value for each subsequence, require that we generate about 58% more values than we actually use. However, we feel that the simplicity that this approach allows more than compensates for the cost.

In testing the complete output of random(), we used subsequence lengths of 1 to 8, plus the overflow group. We also used 1,000,000 values actually retained (not including the values thrown away at the end of each subsequence). This resulted in an expected count in the overflow group of about 12.8, of which only about 1.44 would be length 9 subsequences.

Return to the testing home page.