Introduction
The functions for computations of basic statistics generally have relatively simple input parameters. The data are input in either a one- or two-dimensional array. As usual, when a two-dimensional array is used, the rows contain observations and the columns represent variables. Most of the functions in this chapter allow for missing values. Missing value codes can be set by using function MACHINE.
Several functions in this chapter perform statistical tests. These functions generally return a “p-value” for the test, often as the return value for the C function. The p-value is between 0 and 1 and is the probability of observing data that would yield a test statistic as extreme or more extreme under the assumption of the null hypothesis. Hence, a small p-value is evidence for the rejection of the null hypothesis.
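As a minimal illustration of how a p-value is read, the sketch below computes a two-sided p-value for a test statistic that is assumed to follow a standard normal distribution under the null hypothesis. The function name two_sided_p_value is hypothetical and is not one of the functions in this chapter.

```python
import math


def two_sided_p_value(z):
    """Probability, under a standard normal null distribution, of a
    statistic at least as extreme (in absolute value) as z."""
    # P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for Z ~ N(0, 1)
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    for z in (0.5, 1.96, 3.0):
        print(f"z = {z:4.2f}  p-value = {two_sided_p_value(z):.4f}")
```

A statistic near 0 gives a p-value near 1 (no evidence against the null hypothesis), while a statistic far from 0 gives a small p-value.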
Overview of Random Number Generation
Chapter 11: Basic Statistics and Random Number Generation describes functions for the generation of random numbers and of random samples and permutations. These functions are useful for applications in Monte Carlo or simulation studies. Before any of the random-number generators is used, the generator must be initialized with a seed, or starting value. The user can select the seed by calling the RANDOMOPT procedure with the Set keyword. If the user does not select a seed, one is generated using the system clock. A seed needs to be selected only once in a program, unless two or more separate streams of random numbers are maintained. There are other utility functions in this chapter for selecting the form of the basic generator, restarting simulations, and maintaining separate simulation streams.
In the following discussions, the phrases “random numbers,” “random deviates,” “deviates,” and “variates” are used interchangeably. The phrase “pseudorandom” is sometimes used to emphasize that the numbers generated are really not “random” since they result from a deterministic process. The usefulness of pseudorandom numbers is derived from the similarity, in a statistical sense, of samples of the pseudorandom numbers to samples of observations from the specified distributions. In short, while the pseudorandom numbers are completely deterministic and repeatable, they simulate the realizations of independent and identically distributed random variables.
Basic Uniform Generator
The random-number generators in this chapter use a multiplicative congruential method. The form of the generator is as follows:
$x_i = c\,x_{i-1} \bmod (2^{31} - 1)$
Each $x_i$ is then scaled into the unit interval (0,1). If the multiplier, $c$, is a primitive root modulo $2^{31} - 1$ (which is a prime), then the generator has a maximal period of $2^{31} - 2$. However, there are several other considerations. See Knuth (1981) for a general discussion. The possible values for $c$ in the generators are 16807, 397204094, and 950706376. The selection is made by using the Gen_Option keyword with the RANDOMOPT procedure. The choice of 16807 results in the fastest execution time, but other evidence suggests that the performance of 950706376 is best among these three choices (Fishman and Moore 1982). If no selection is made explicitly, the functions use the multiplier 16807, which has been in use for some time (Lewis et al. 1969).
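The recurrence above can be sketched in a few lines of Python. This is an illustration of the multiplicative congruential method only, not the library's implementation: the names lcg_stream and to_unit_interval are hypothetical, and dividing by the modulus is an assumed scaling, since the text does not state exactly how each value is mapped into (0,1).

```python
MODULUS = 2**31 - 1  # 2147483647, a prime
MULTIPLIERS = (16807, 397204094, 950706376)  # choices listed in the text


def lcg_stream(seed, multiplier=16807):
    """Yield successive states x_i = c * x_{i-1} mod (2^31 - 1).

    The seed is assumed to be nonzero modulo 2^31 - 1; otherwise the
    state would remain at 0.
    """
    x = seed
    while True:
        x = (multiplier * x) % MODULUS
        yield x


def to_unit_interval(x):
    # Assumed scaling: states lie in 1 .. 2^31 - 2, so the result
    # is strictly between 0 and 1.
    return x / MODULUS


if __name__ == "__main__":
    stream = lcg_stream(seed=1)
    for _ in range(3):
        x = next(stream)
        print(x, to_unit_interval(x))
```

With a seed of 1 and the multiplier 16807, the first two states are 16807 and 282475249. Python integers do not overflow, so the product can be formed directly; an implementation in fixed-width arithmetic would need a wider intermediate type or a decomposition of the product.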
The default action of the RANDOM function is the generation of uniform (0,1) numbers. This function is portable in the sense that, given the same seed, it produces the same sequence in all computer/compiler environments.
Shuffled Generators
The user also can select a shuffled version of these generators using the Gen_Option keyword with the RANDOMOPT procedure. The shuffled generators use a scheme due to Learmonth and Lewis (1973). In this scheme, a table is filled with the first 128 uniform (0,1) numbers resulting from the simple multiplicative congruential generator. Then, for each $x_i$ from the simple generator, the low-order bits of $x_i$ are used to select a random integer, $j$, from 1 to 128. The $j$th entry in the table is then delivered as the random number, and $x_i$, after being scaled into the unit interval, is inserted into the $j$th position in the table. This scheme is similar to that of Bays and Durham (1976), and their analysis is applicable to this scheme as well.
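The shuffling scheme described above can be sketched as follows. This is a hedged illustration rather than the library's code: the name shuffled_uniforms is hypothetical, the table index is taken from the lowest 7 bits of the state (the text says only "the low-order bits"), the index is 0-based where the text counts from 1, and division by the modulus is an assumed scaling.

```python
MODULUS = 2**31 - 1
TABLE_SIZE = 128  # table length given in the text


def shuffled_uniforms(seed, multiplier=16807):
    """Yield uniform (0,1) deviates from a table-shuffled generator."""
    x = seed

    def step():
        # One step of the simple multiplicative congruential generator.
        nonlocal x
        x = (multiplier * x) % MODULUS
        return x

    # Fill the table with the first 128 scaled values.
    table = [step() / MODULUS for _ in range(TABLE_SIZE)]

    while True:
        xi = step()
        j = xi & (TABLE_SIZE - 1)  # assumed: lowest 7 bits, 0-based index
        out = table[j]             # deliver the stored entry
        table[j] = xi / MODULUS    # replace it with the scaled new value
        yield out


if __name__ == "__main__":
    gen = shuffled_uniforms(seed=12345)
    print([round(next(gen), 6) for _ in range(5)])
```

The delivered values are the same numbers the simple generator produces, but they are released in a different order, which is the purpose of the shuffle.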
Setting the Seed
The seed of the generator can be set by calling the RANDOMOPT procedure with the Set keyword and retrieved with the Get keyword. Prior to invoking any generator in this section, the user can call RANDOMOPT to initialize the seed, which is an integer variable with a value between 1 and 2147483647. If the seed is not initialized by RANDOMOPT, a random seed is obtained from the system clock. Once the seed is initialized, it need not be set again.
If the user wants to restart a simulation, RANDOMOPT can be used to obtain the final seed value of one run to be used as the starting value in a subsequent run.
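The restart idea can be sketched with a small stand-in generator. The class SeededGenerator and its methods get_seed and set_seed are hypothetical names used only for illustration; they mimic the roles of the Get and Set keywords but are not the RANDOMOPT interface. The sketch assumes the unshuffled generator, where the current state is the only information needed to resume.

```python
MODULUS = 2**31 - 1


class SeededGenerator:
    """Stand-in multiplicative congruential generator with a readable seed."""

    def __init__(self, seed, multiplier=16807):
        self.state = seed
        self.multiplier = multiplier

    def get_seed(self):
        # Current state; saving it allows a later run to resume here.
        return self.state

    def set_seed(self, seed):
        self.state = seed

    def uniform(self):
        self.state = (self.multiplier * self.state) % MODULUS
        return self.state / MODULUS  # assumed scaling into (0,1)


if __name__ == "__main__":
    # One uninterrupted run of 10 deviates.
    full = SeededGenerator(seed=123457)
    uninterrupted = [full.uniform() for _ in range(10)]

    # The same run split in two, restarted from the saved final seed.
    first = SeededGenerator(seed=123457)
    part_one = [first.uniform() for _ in range(5)]
    saved_seed = first.get_seed()

    second = SeededGenerator(seed=1)
    second.set_seed(saved_seed)
    part_two = [second.uniform() for _ in range(5)]

    print(part_one + part_two == uninterrupted)
```

Because the sequence is fully determined by the current state, the two-part run reproduces the uninterrupted run exactly.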