How random is random()?


random() is the random number generator that comes as part of the standard C library on Unix. According to the man pages, random() is a non-linear, additive feedback generator that returns successive pseudo-random numbers in the range (0, (2**31)-1). It belongs to a class of what is sometimes referred to as additive congruential random number generators. random() uses a state table of size 8, 32, 64, 128, or 256 bytes, selected by the programmer, to remove correlation between successive values. The man pages further state that "All the bits generated by random() are usable. For example, random()&01 will produce a random binary value."

This last statement should be taken with some reservations. Standard statistical tests (based on a single-tailed chi-square test) show that the low-order bits of the numbers returned by random() are correlated, especially when the state size is 8 or 32 (the default) bytes. This is not too surprising for 8 bytes, since in this case random() defaults to a simple linear congruential random number generation method.

This page contains links to figures showing the results of six tests applied to sequences of values produced by random(). In all cases, random() was seeded "randomly" for each test by calling gettimeofday() and using the returned microseconds field as the seed. The tests were:

In all cases, we examined fields of one to a small number of bits, starting at all relevant bit positions. For more information on the nature of each test, the fields tested, and the results, see the associated links.

In addition, we tested the full 31 bits returned by random() using all six tests. The results of these tests are given in a summary table.

A brief explanation of the testing procedure: Each "test" mentioned above was actually 1000 trials, each started with a random and independent seed. All of the trials were based on a chi-square test. In each trial, we computed the critical value of the chi-square statistic for a 95% probability (i.e., the value of a chi-square-distributed random variable such that 95% of the probability mass lies to the left). For each of the 1000 trials, we recorded a success when the chi-square statistic was less than or equal to the critical value. The results show the fraction of successful trials out of 1000. Since a 95% critical value was chosen, a random number generator that produces independent, uniformly distributed integer random values in the range (0,(2**31)-1) should have about a 0.95 rate of success.

The answer to the title question, "How random is random()?", is, "It depends." No pseudo-random number generator can truly be random; what is important is that the sequence of values it produces "looks" random() for the application using the sequence. If you plan to use the entire 31-bit value generated by random, for instance, in generating other, non-uniform random values via inversion, the summary table indicates that random() is good, except for a state size of 32 bytes, although the results for the interval test are not particularly confidence-enhancing. On the other hand, both the permutation and interval tests seem to indicate that neither 8 nor 32 bytes is enough for the state array, if you are extracting small fields from the 31-bit number and particularly if those fields are in the least-significant end of the number.

One final comment: despite what the man pages say, random()&01 is not a particularly "random" value. In virtually every test that examines single-bit fields, no matter what size the state array, the results for random()&01 are poor to bad.

What these results do not (and can not) show is how the lack of randomness in random()-produced sequences will affect your application. They can, however, serve as a guide in avoiding certain uses of the outputs of random() that may not be appropriate.