
Algorithms For Randomness in The Behavioral Sciences

Computer algorithm for random in behavioral sciences

Uploaded by

Rafael Almeida
Copyright
© Attribution Non-Commercial (BY-NC)

Behavior Research Methods, Instruments, & Computers 1991, 23 (1) 45-60

COMPUTER TECHNOLOGY

Algorithms for randomness in the behavioral sciences: A tutorial

MARC BRYSBAERT
University of Leuven, Leuven, Belgium

Simulations and experiments frequently demand the generation of random numbers that have specific distributions. This article describes which distributions should be used for the most common problems and gives algorithms to generate the numbers. It is also shown that a commonly used permutation algorithm (Nilsson, 1978) is deficient.

Numerous studies in the behavioral sciences make use of randomness: Subjects must be randomized over conditions, stimuli are to be presented in an unpredictable sequence, simulations involve an unsystematic component, or events must take place at random time intervals. Unfortunately, randomness is not an unambiguous concept. There are several types of randomness, each of which is appropriate only under well-specified conditions.

This article attempts to give an idea of the most common types of randomness and the situations in which they are to be applied. It is intended as a practical guide for researchers, with mathematical proofs and justifications omitted as much as possible. This may make the text unsatisfactory for mathematically oriented scientists, but it should make it readable for everyone who wants a short review and a brief answer to problems that can occur in the actual use and generation of random numbers. Emphasis has been placed on problems encountered in experimental psychology. Researchers interested in simulations will find a groundwork in the text, but they may additionally wish to consult more specialized texts such as Kennedy and Gentle (1980), Knuth (1981), or Ripley (1987).

The author wishes to thank G. d'Ydewalle, as well as the editor and three reviewers, for helpful comments on earlier drafts of the manuscript. Correspondence should be addressed to Marc Brysbaert, University of Leuven, B-3000 Leuven, Belgium.

Copyright 1991 Psychonomic Society, Inc.

GENERATING RANDOM NUMBERS

Although in a strict sense a sequence of random numbers can only be obtained from a truly random phenomenon, practical limitations have led to the almost exclusive use of pseudorandom number generators in science. These are mathematical functions that are essentially deterministic, but that mimic the properties of a sequence of independent uniformly distributed random variables (i.e., variables of which each value has the same probability of occurring). These sequences can further be translated into samples from other distributions (e.g., from the standard normal distribution; see below). First, we will consider the generation of uniformly distributed random numbers.

Uniform Distribution

By far the most successful pseudorandom number generators known today are special cases of the (linear) congruential method first proposed by Lehmer (Knuth, 1981; Ripley, 1987). A sequence of random numbers is generated with the use of the following equation:

$$X_{n+1} = (aX_n + c) \bmod m \qquad (1)$$

in which
  a, c = positive constants, a > 0, c >= 0,
  m    = the modulus, m > a, m > X_0, m > c,
  X_0  = the starting value or seed,
  X_n  = the nth value of the sequence, and
  mod  = the modulus operator (returns the remainder of the division of two integer operands, e.g., 10 mod 3 = 1, because 3*3 + 1 = 10; likewise, 9 mod 3 = 0, and 11 mod 3 = 2).

In most cases, the numbers generated by Equation 1 are further divided by the modulus m, in order to obtain a (uniform) distribution of real numbers between 0 and 1. However, not all values of a, c, and m yield a good random generator. There is a large literature about which values to use (see Nance & Overstreet, 1972; Sahai, 1979; Sowey, 1972, 1978, 1986, for bibliographies). Table 1 gives some of the best values that have been suggested hitherto, and references to where more information may be found.

Table 1
Some Good Random Number Generators Given in the Literature

m          a                 c     X0     Period     Source
2^59       13^13             0     odd    2^57       Ripley (1983)
2^48       44485709377909    0     odd    2^46       Ripley (1983)
2^[?]      5^[?]             odd          2^[?]      Knuth (1981)
2^32       69069             odd          2^32       Ripley (1983)
2^32       100485            1            2^32       Atkinson (1980)
2^32       1589013525        odd          2^32       Ripley (1983)
2^32       663608941         0     odd    2^30       Dudewicz and Ralley (1981)
2^31 - 1   630360016         0     <> 0   2^31 - 2   Dudewicz and Ralley (1981)
2^31 - 1   764261123         0     <> 0   2^31 - 2   Dudewicz and Ralley (1981)
2^31 - 1   950706376         0     <> 0   2^31 - 2   Fishman and Moore (1986)

As shown in Table 1, the modulus of the generators is quite large. This is because all congruential random number generators ultimately get into a loop, producing a sequence of numbers that is repeated endlessly. The length of the repeating sequence is called the period of the generator;
it is always less than or equal to the modulus m. A generator with modulus 8 thus yields a period of 8 different numbers at most. For instance, a generator with a = c = 5 and m = 8 always gives the sequence 1, 2, 7, 0, 5, 6, 3, 4, 1, 2, 7, ..., irrespective of the starting value (the only difference that the starting value makes is where in the period the process is started). The importance of the values of a and c can be illustrated if we take, for instance, a = c = 2, m = 9. This leads not only to a period shorter than the maximum period (i.e., 9), but also to a period that depends on the starting value, as can be seen below:

  X0 = 0:  0, 2, 6, 5, 3, 8, 0, ...
  X0 = 1:  1, 4, 1, 4, ...
  X0 = 2:  2, 6, 5, 3, 8, 0, 2, ...
  X0 = 3:  3, 8, 0, 2, 6, 5, 3, ...
  X0 = 4:  4, 1, 4, 1, ...
  X0 = 5:  5, 3, 8, 0, 2, 6, 5, ...
  X0 = 6:  6, 5, 3, 8, 0, 2, 6, ...
  X0 = 7:  7, 7, 7, ...
  X0 = 8:  8, 0, 2, 6, 5, 3, 8, ...

All the generators in Table 1 produce satisfactory sequences of random numbers if the requirements with respect to the starting value are met (in some cases this value must be odd or different from zero). One problem, however, seriously limits their use in everyday scientific life. Because round-off errors must not occur, the algorithms are difficult to implement on the 16-bit microprocessors frequently used in psychological laboratories. This problem for a long time seemed unsolvable, because of the need for a sufficiently large modulus, but in 1982 Wichmann and Hill presented a rather simple solution. They showed that adding several congruential generators and taking the fractional part led to a new congruential generator with a much larger modulus and much better statistical properties. More specifically, they used the following three multiplicative (i.e., c = 0) generators:

  generator 1: a = 171, m = 30269,
  generator 2: a = 172, m = 30307,
  generator 3: a = 170, m = 30323.

The composite generator is equivalent to a simple multiplicative congruential generator with a = 16555425264690 and m = 27817185604309 (Zeisel, 1986), and it has an estimated period length of 6.95 * 10^12 (Wichmann & Hill, 1984). In addition, Wichmann and Hill (1982) provided an implementation of the algorithm that requires arithmetic only up to 30323 and therefore is easy to implement on a 16-bit microprocessor (the maximum value of a 16-bit integer is 32767). Wichmann and Hill's (1982) implementation is the following:

Algorithm 1
Wichmann and Hill's (1982) Random Number Generator

  1. Define 3 starting values (one for each subgenerator)
       set seed1 such that 0 < seed1 < 30269
       set seed2 such that 0 < seed2 < 30307
       set seed3 such that 0 < seed3 < 30323
  2. Calculate random number and redefine 3 starting values
       set seed1 = 171 * (seed1 mod 177) - 2 * (seed1 div 177)
       set seed2 = 172 * (seed2 mod 176) - 35 * (seed2 div 176)
       set seed3 = 170 * (seed3 mod 178) - 63 * (seed3 div 178)
       if seed1 < 0 then set seed1 = seed1 + 30269
       if seed2 < 0 then set seed2 = seed2 + 30307
       if seed3 < 0 then set seed3 = seed3 + 30323
       set random number = fractional part of (seed1/30269 + seed2/30307 + seed3/30323)
  3. Optional, to check for rounding-off errors (McLeod, 1985)
       if random number <= 0 then set random number = 1E-30
       if random number >= 1 then set random number = 0.9999999999
  4. Return random number

The generator needs three seeds to start (see Part 1 of the algorithm). All seeds must be larger than 0 and smaller than their modulus. They only need to be defined before the first random number is calculated, because they are updated every time a new number is produced. Taking the same seeds leads to the same sequence of random numbers, which may be appropriate in simulations to test the effect of a small variation in one of the parameters, but which usually is not necessary. Random seeds can be obtained by using either the time-of-day clock information or the random number generator of the machine (i.e., the routines RANDOMIZE and RND in Microsoft BASIC). However, the best way to guarantee that two subsequent sequences are independent is to take the last values of the first series as the seeds of the second series. Random seeds involve the (small) risk that the generator starts somewhere in the sequence of the previous series and thus produces two related strings of random numbers. In the worst case, all numbers in the second series match the sequence of numbers in the first series.

McLeod (1985) pointed to the possibility that round-off errors in some systems may yield random numbers equal to 0 or 1. In the remainder of this article, random numbers are always assumed to be larger than 0 and smaller than 1, and therefore it may be advisable to include Part 3 of the algorithm. Otherwise, problems may arise. For instance, in the randomization procedure (see below), there will be a range error if the random number equals 1, and in many of the algorithms for standard normal distributions, taking the logarithm will be impossible if the random number equals 0.

Wichmann and Hill's (1982) random number generator has been tested several times (Wichmann & Hill, 1982; MacLaren, 1989; see also below), and it produces a very satisfactory output. Therefore, its use is strongly recommended. An additional advantage is that it can easily be reproduced in different laboratories, because the algorithm yields the same sequences of numbers on different systems (at least if the starting values are known).

The only disadvantage of the algorithm is that it is rather slow. On our system (an IBM AT 286 clone running at 8.9 MHz according to the Landmark CPU speed test), with Turbo Pascal 4.0 software (Borland), it takes 1.38 msec to generate one random number. This is 9.2 times slower than the built-in random number generator (0.15 msec per number). However, although things may have improved for recent versions of languages, one should be skeptical about the performance of the built-in generators (see, e.g., Afflerbach, 1985, on Commodore and Apple; Aldridge, 1987, on the Apple II; Edgell, 1979, on the DECsystem-10; Lordahl, 1988, on IBM; Modianos, Scott, & Cornwell, 1987, on several PCs; Strube, 1983, on the Commodore VIC-20). Therefore, if true randomness is essential, researchers should at least do some empirical tests on the appropriateness of their system (see below) if Wichmann and Hill's (1982) generator is not to be used. Researchers should also check to see that the built-in algorithm is reseeded every time a new sequence of numbers is desired. Microsoft BASIC and Turbo Pascal, for instance, always return the same sequence if they are not reseeded with the RANDOMIZE statement.

Of course, the choice of random number generators to a large extent depends on what is investigated. For some applications, the most important requirement is that all values have the same probability of occurring, a requirement that most built-in generators meet. For instance, if one wants to estimate the probability that x^2 < y (x and y being uniformly distributed random variables between 0 and 1), a built-in generator will create approximately the same results as will Wichmann and Hill's (1982) generator. The only difference will be that the latter takes more time (for those who are interested, the exact value of P(x^2 < y) = 2/3; 100,000 trials with Algorithm 1 gave an estimate of 0.6647, and 100,000 trials with the IBM built-in generator yielded 0.6641). If, however, the independence of the numbers in a sequence is predominant, most built-in generators will fail (see, e.g., Lordahl, 1988).

Since many processes do not have a uniform (rectangular) distribution, the random numbers generated by Algorithm 1 are only occasionally useful without transformation. Additional algorithms are needed to transform the numbers into samples from other distributions. This article only deals with two of these distributions, the standard normal and the standard exponential. The normal distribution is considered because of its importance in many simulations, the exponential distribution because we need it for randomness in time. Algorithms for other distributions (Student's t, F, chi-square) can be found in Kennedy and Gentle (1980) or Ripley (1987), or in preprogrammed statistical simulation packages such as DATASIM (Bradley, 1988; Bradley, Senko, & Stewart, 1990). It should also be noted that it is possible to convert uniformly distributed random numbers into random numbers from any continuous distribution by using the simple fact that the cumulative distribution function (cdf) of such a distribution, evaluated at a random draw from it, is uniformly distributed between 0 and 1. All that is necessary is to generate a uniformly distributed random variable (which denotes a point on the cdf) and take the inverse of the cdf of the distribution one wishes to sample from. Examples are given below for the standard normal and the standard exponential distribution, but the rule can be extended to any distribution.

Normal Distributions

There are numerous ways to convert a uniform distribution of numbers between 0 and 1 to a standard normal distribution (mean equal to 0 and variance equal to 1). Five of them will be discussed here. They are chosen because they are reasonably fast and accurate, and they require but a small amount of memory.

As indicated above, a first way to convert a random number generated by Algorithm 1 into a standard normal deviate is to consider each number as a value of the cdf of the standard normal. We all know that a z value of -1.96 corresponds to a cdf value of 0.025 and a z value of +1.96 to a cdf value of 0.975, because we have all used it to calculate (two-tailed) statistical significance. Thus, what we need is an algorithm that converts cdf numbers such as 0.025 or 0.975 into their corresponding z values of -1.96 and +1.96. Brophy (1985) compares several of these algorithms, one of which (Hill and Davis's) will be used in Algorithm 2. This algorithm has been chosen because it is quite accurate (maximum absolute error of 0.00035) and relatively fast (see below). Other algorithms may be preferred if either speed or accuracy is to predominate (see Brophy, 1985, for these algorithms). In the following algorithms, Z denotes a random variate from the standard normal distribution.
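Before the normal-deviate algorithms are presented, the congruential recurrence of Equation 1 can be made concrete. The following Python sketch iterates X[n+1] = (a*X[n] + c) mod m from a seed until a value repeats, and applies it to the two small example generators discussed earlier (m = 8 and m = 9). The function name `lcg_cycle` is purely illustrative and does not come from the article.

```python
def lcg_cycle(a, c, m, x0):
    """Iterate X[n+1] = (a*X[n] + c) mod m, starting from the seed x0.

    Returns the values visited, in order, up to (not including) the first
    value that was already seen. For the parameters used below the map is
    one-to-one, so the returned list is exactly one full cycle.
    """
    visited = []
    seen = set()
    x = x0
    while x not in seen:
        seen.add(x)
        visited.append(x)
        x = (a * x + c) % m
    return visited


# a = c = 5, m = 8: one cycle of the maximum length 8, whatever the seed.
full_period = lcg_cycle(5, 5, 8, 1)

# a = c = 2, m = 9: the cycle length depends on the seed.
periods_mod9 = {seed: len(lcg_cycle(2, 2, 9, seed)) for seed in range(9)}

print(full_period)
print(periods_mod9)
```

With a = c = 2 and m = 9 the nine possible seeds fall into three separate cycles, which is why a poor choice of constants makes the period depend on the starting value.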

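Because all transformation algorithms in this article draw their uniform numbers from Algorithm 1, a minimal Python sketch of Wichmann and Hill's (1982) generator may be useful at this point. It follows the four parts of Algorithm 1 as closely as possible; the class name `WichmannHill` and its attribute names are assumptions made for illustration. Python integers are not limited to 16 bits, so the mod/div decomposition is kept only to mirror the published steps.

```python
class WichmannHill:
    """Sketch of Algorithm 1: three small multiplicative generators combined."""

    MODULI = (30269, 30307, 30323)

    def __init__(self, seed1, seed2, seed3):
        # Part 1: each seed must lie strictly between 0 and its modulus.
        for seed, modulus in zip((seed1, seed2, seed3), self.MODULI):
            if not 0 < seed < modulus:
                raise ValueError("each seed must satisfy 0 < seed < modulus")
        self.seed1, self.seed2, self.seed3 = seed1, seed2, seed3

    def random(self):
        # Part 2: advance each subgenerator without exceeding 16-bit range.
        self.seed1 = 171 * (self.seed1 % 177) - 2 * (self.seed1 // 177)
        self.seed2 = 172 * (self.seed2 % 176) - 35 * (self.seed2 // 176)
        self.seed3 = 170 * (self.seed3 % 178) - 63 * (self.seed3 // 178)
        if self.seed1 < 0:
            self.seed1 += 30269
        if self.seed2 < 0:
            self.seed2 += 30307
        if self.seed3 < 0:
            self.seed3 += 30323
        # Fractional part of the sum of the three scaled seeds.
        u = (self.seed1 / 30269 + self.seed2 / 30307 + self.seed3 / 30323) % 1.0
        # Part 3 (optional): guard against round-off to exactly 0 or 1.
        if u <= 0.0:
            u = 1e-30
        if u >= 1.0:
            u = 0.9999999999
        return u


generator = WichmannHill(1, 2, 3)
draws = [generator.random() for _ in range(1000)]
print(draws[:3])
```

Reusing the same three seeds reproduces the same sequence, which is the property that makes results comparable across laboratories.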
Algorithm 2
Standard Normal via Inverse Function

  generate U1 = random number
  if U1 > 0.5 then set U2 = 1 - U1 else set U2 = U1
  if U2 < 1E-20 then set Z = 10
  else
    set A = sqrt(-2*ln(U2))
    set Z = A - ((7.45551*A + 450.636)*A + 1271.059) /
                (((A + 110.4212)*A + 750.365)*A + 500.756)
  if U1 > 0.5 then set Z = -Z
  return Z

A second method for the normal distribution is due to Box and Muller (1958). The underlying rationale is rather simple (see, e.g., Ripley, 1987, p. 54), but, for the sake of brevity, it will not be explained here. The algorithm is the following:

Algorithm 3
Standard Normal According to Box-Muller

  generate U1 = random number, set A = 2*pi*U1
  generate U2 = random number
  set B = -ln(U2), C = sqrt(2B)
  return Z1 = C*cos(A), Z2 = C*sin(A)

Algorithm 3 produces two independent standard normal deviates, Z1 and Z2, at least if the random number generator is good. If the generator is not good, the (Z1, Z2) pairs are likely to be situated on a limited number of circles or radii (see the cautionary tale in Ripley, 1987, pp. 55-59). We have plotted several hundreds of thousands of these (Z1, Z2) pairs based on Algorithm 1 to check whether they are indeed dispersed throughout the whole plane. Algorithm 1 passed the test very well.

A third method to generate standard normal deviates is Marsaglia's (1962) polar method, a modification of the Box-Muller algorithm. It avoids evaluation of sines and cosines.

Algorithm 4
Standard Normal via the Polar Method

  repeat
    generate U1 = random number, set V1 = 2*U1 - 1
    generate U2 = random number, set V2 = 2*U2 - 1
  until W = V1^2 + V2^2 < 1
  set A = sqrt(-2*ln(W)/W)
  return Z1 = A*V1, Z2 = A*V2

Marsaglia and Bray (1964) published a modification of the polar method that is slightly more complicated but faster. Speed is gained by introducing simple auxiliary functions that can be used most of the time, and by restricting the time-consuming polar algorithm to filling in the gaps between the theoretical distribution and the approximation.

Algorithm 5
Standard Normal According to Marsaglia-Bray

  generate U = random number
  if U < 0.8638 then
    generate U1, U2, U3 = random numbers
    set Z = 2(U1 + U2 + U3) - 3
  else if U < 0.9745 then
    generate U1, U2 = random numbers
    set Z = 1.5(U1 + U2 - 1)
  else if U < 0.9973002039 then
    repeat
      generate U1 = random number
      set V = 6*U1 - 3
      generate U2 = random number
    until 0.358*U2 <= g(V) *
    set Z = V
  else
    repeat
      repeat
        generate U1, U2 = random numbers
        set V1 = 2*U1 - 1, V2 = 2*U2 - 1
      until W = V1^2 + V2^2 < 1
      set A = sqrt((9 - 2*ln(W))/W)
      set B = A*V1, set C = A*V2
    until |B| > 3 or |C| > 3
    if |B| > 3 then Z = B else Z = C
  endif
  return Z

  * g(v) = a*e^(-v^2/2) - 2b(3 - v^2) - c(1.5 - |v|),     |v| < 1
    g(v) = a*e^(-v^2/2) - b(3 - |v|)^2 - c(1.5 - |v|),    1 <= |v| < 1.5
    g(v) = a*e^(-v^2/2) - b(3 - |v|)^2,                   1.5 <= |v| < 3
    a = 17.49731196, b = 2.36785163, c = 2.15787544

A final algorithm we will include is the ratio of uniforms (Best, 1979; Knuth, 1981, pp. 125-127; Ripley, 1987, p. 82).

Algorithm 6
Standard Normal via Ratio of Uniforms

  1. generate U1, U2 = random numbers
     set V = 0.8578*(2*U2 - 1)
     set Z = V/U1
     set A = 0.25*Z^2
  2. if A < 1 - U1 then go to 3
     if A > 0.259/U1 + 0.35 then go to 1
     if A > -ln(U1) then go to 1
  3. return Z
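To illustrate how two of these transformations look in practice, the sketch below implements the Box-Muller method and the polar method in Python and checks the sample mean and variance of the output. It is a minimal sketch under stated assumptions: Python's built-in `random.Random` stands in for Algorithm 1, and the helper names (`uniform_open`, `box_muller`, `polar`, `mean_and_variance`) are illustrative, not taken from the article.

```python
import math
import random


def uniform_open(rng):
    """Uniform number strictly between 0 and 1 (rng.random() may return 0.0)."""
    u = rng.random()
    while u <= 0.0:
        u = rng.random()
    return u


def box_muller(rng):
    """Return two standard normal deviates from two uniform numbers."""
    a = 2.0 * math.pi * uniform_open(rng)
    b = -math.log(uniform_open(rng))
    c = math.sqrt(2.0 * b)
    return c * math.cos(a), c * math.sin(a)


def polar(rng):
    """Return two standard normal deviates without sines or cosines."""
    while True:
        v1 = 2.0 * uniform_open(rng) - 1.0
        v2 = 2.0 * uniform_open(rng) - 1.0
        w = v1 * v1 + v2 * v2
        if 0.0 < w < 1.0:  # keep only points inside the unit circle
            break
    a = math.sqrt(-2.0 * math.log(w) / w)
    return a * v1, a * v2


def mean_and_variance(values):
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return mean, variance


rng = random.Random(1991)  # arbitrary seed, chosen only for reproducibility
box_muller_sample = [z for _ in range(20000) for z in box_muller(rng)]
polar_sample = [z for _ in range(20000) for z in polar(rng)]

print(mean_and_variance(box_muller_sample))
print(mean_and_variance(polar_sample))
```

Both samples should show a mean near 0 and a variance near 1. A check of the tails, as reported in the article, would need a comparison against the normal cdf as well.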

An algorithm is good if it is reasonably fast and accurate in the tails, and if it returns deviates with a cdf value close to the expected value. Table 2 gives these results for the five algorithms presented above. The first three columns give the tail probabilities at the low end of the distribution, the next three at the high end. The seventh column tabulates the maximal absolute difference between the obtained and the expected cdf, and the eighth column returns the average time needed for the generation of one deviate. All algorithms used random numbers generated with the use of Algorithm 1. Because the generation of these random numbers is relatively slow, the average number of uniform deviates needed for the calculation of a normal deviate will be a considerable factor in the speed of an algorithm; this value is therefore given in the next to the last column of Table 2. Another important aspect of the speed is the average number of time-consuming operations (logarithms, exponentials, square roots, sines and cosines) that need to be evaluated. This figure is presented in the last column of Table 2. All estimates are based on 100,000 trials.

Table 2
Performance of Algorithms 2-6 for the Generation of Normal Deviates; Estimates Based on 100,000 Trials

                     Lower tail                Upper tail                Maximum          Mean time/   Random numbers   Time-consuming
Algorithm            .00050  .00500  .02500    .02500  .00500  .00050    |cdf_o - cdf_t|* deviate†     per deviate      functions per deviate
Inverse              .00051  .00490  .02532    .02437  .00498  .00055    .00256           9.37         1.00             2.00
Box-Muller           .00042  .00500  .02563    .02519  .00521  .00069    .00325           6.22‡        2.00             3.00
Polar                .00039  .00504  .02533    .02481  .00485  .00047    .00351           5.65‡        2.54             2.00
Marsaglia-Bray       .00049  .00509  .02583    .02561  .00511  .00045    .00229           6.51         3.93             0.06
Ratio of uniforms    .00053  .00516  .02535    .02551  .00540  .00050    .00372           7.25         2.73             0.52

*Maximum absolute difference between obtained and expected cdf. Normal cumulative distribution function calculated using Equation 26.2.17 of Zelen and Severo (1964), which has an error < 7.5 * 10^-8.
†Time in milliseconds. Estimated with an IBM AT 286 clone running at 8.9 MHz according to the Landmark CPU Speed Test. Turbo Pascal 4.0 (Borland) software.
‡Algorithm returns two standard normal deviates.

The accuracy in the tails and the maximum absolute deviation of observed and expected cdf values are good and comparable for all five algorithms. Only the time needed to generate a standard normal deviate differs; it ranges from 5.65 msec for the polar method to 9.37 msec for the inverse cdf method (for the system and the language used). Because the Marsaglia-Bray method requires the most random numbers (i.e., 3.9), the results are more in favor of it if random number generation is fast. One way of speeding it up might be to generate the random number in the first step with the built-in generator. The major requirement of this number is that it be uniformly distributed, as is the case for most built-in generators (e.g., in our system, estimates based on 1,000,000 trials yielded an error smaller than 1.4 * 10^[?] between the observed probabilities and the probabilities required for the Marsaglia-Bray algorithm). The time needed for the generation of one standard normal deviate then drops from 6.51 to 5.34 msec.

Exponential Distributions

Just like normal deviates, exponential deviates can be generated by using the inverse cdf. The cdf of an exponential distribution is $F(x) = 1 - e^{-\lambda x}$, and the inverse is $F^{-1}(U) = -\ln(1 - U)/\lambda$. If $\lambda = 1$, we have the standard exponential. U is a uniformly distributed random number between 0 and 1, so that, for programming purposes, it makes no difference whether we take 1 - U or U. This gives us the following algorithm (E denotes a random variate from the standard exponential distribution):

Algorithm 7
Standard Exponential via Inverse Function

  generate U = uniformly distributed random number
  set E = -ln(U)
  return E

A second way to generate exponential deviates is to split the range of E up into intervals. More specifically, the exponential distribution is considered as the compound of a geometric distribution and a new exponential distribution, restricted to the interval between 0 and 1, with pdf e^(-x)/(1 - e^(-1)). The following algorithm is due to von Neumann (1951). Its advantage is that it avoids the explicit use of the logarithm function (which is rather time-consuming).

Algorithm 8
Standard Exponential According to von Neumann

     let I = 0
  1. generate U1 = random number
     set A = U1
  2. generate U2 = random number
     if U1 <= U2 then return E = I + A
  3. generate U3 = random number
     if U3 <= U2 then set U1 = U3, go to 2
  4. set I = I + 1
     go to 1

The last algorithm that we will discuss for generating exponential deviates makes use of the ratio-of-uniforms method. The fourth, the fifth, and the sixth steps are optional pretests to avoid calculation of the logarithm in step seven. More information on the algorithm is to be found in Ripley (1987, pp. 69-71; note, however, the mistake in the algorithm outline on p. 71).

Algorithm 9
Standard Exponential via Ratio of Uniforms

  1. generate U1, U2 = random numbers
     set V = 2/e * U2
     set E = V/U1
     if E/2 <= (1 + ln(a)) - a*U1 then go to 2
     if E/2 > b1/U1 - (1 + ln(b1)) then go to 1
     if E/2 > b2/U1 - (1 + ln(b2)) then go to 1
     if E/2 > -ln(U1) then go to 1
  2. return E

  a = 1.6487, b1 = 0.105, b2 = 0.773
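The inverse-function method and von Neumann's comparison method can be sketched side by side in Python. This is a hedged illustration, not the article's implementation: Python's `random.Random` is assumed as the uniform source, the function names are hypothetical, and the von Neumann routine is written with an explicit run-length counter (accept when the initial decreasing run of uniform numbers has odd length) instead of numbered go-to steps.

```python
import math
import random


def exponential_inverse(rng):
    """Standard exponential deviate via the inverse cdf, E = -ln(U)."""
    # 1 - rng.random() lies in (0, 1], so the logarithm is always defined.
    return -math.log(1.0 - rng.random())


def exponential_von_neumann(rng):
    """Standard exponential deviate without evaluating a logarithm."""
    whole_part = 0  # number of rejected runs so far
    while True:
        first = rng.random()  # candidate fractional part
        last = first          # most recent member of the decreasing run
        run_length = 1
        while True:
            candidate = rng.random()
            if last <= candidate:  # the decreasing run is broken
                break
            last = candidate
            run_length += 1
        if run_length % 2 == 1:    # odd run length: accept
            return whole_part + first
        whole_part += 1            # even run length: reject and start over


def mean_and_variance(values):
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return mean, variance


rng = random.Random(1951)  # arbitrary seed, chosen only for reproducibility
inverse_sample = [exponential_inverse(rng) for _ in range(40000)]
von_neumann_sample = [exponential_von_neumann(rng) for _ in range(40000)]

print(mean_and_variance(inverse_sample))
print(mean_and_variance(von_neumann_sample))
```

A standard exponential distribution has mean 1 and variance 1, so both samples should come out close to those values.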

Accuracy in the tails, maximum absolute deviation between observed and expected cdf values, and speed of the three algorithms are listed in Table 3. Again, accuracies are similar, but this time the inverse cdf method (Algorithm 7) is fastest, at least if uniformly distributed random numbers are generated with the use of Algorithm 1. The average number of random numbers needed to generate an exponential deviate shows that von Neumann's (1951) method will be superior if random number generation is faster (e.g., with the built-in generator, von Neumann's algorithm only takes 1.32 msec, against 4.35 for the inverse, and 5.06 for the ratio of uniforms).

Table 3
Performance of Algorithms 7-9 for the Generation of Exponential Deviates; Estimates Based on 100,000 Trials

                     Lower tail                Upper tail                Maximum          Mean time/   Random numbers   Time-consuming
Algorithm            .00050  .00500  .02500    .02500  .00500  .00050    |cdf_o - cdf_t|* deviate†     per deviate      functions per deviate
Inverse              .00046  .00484  .02450    .02418  .00483  .00055    .00353           5.80         1.00             1.00
Neumann              .00057  .00470  .02483    .02518  .00461  .00052    .00263           6.59         4.29             0.00
Ratio of uniforms    .00049  .00521  .02501    .02404  .00456  .00046    .00272           8.71         2.95             0.23

*Maximum absolute difference between obtained and expected cdf.
†Time in milliseconds. Estimated with an IBM AT 286 clone running at 8.9 MHz according to the Landmark CPU Speed Test. Turbo Pascal 4.0 (Borland) software.

TESTING RANDOM NUMBER GENERATORS

Empirical tests of random number generators can be divided into two broad categories. The first category consists of tests to examine whether the distribution of generated numbers corresponds to the theoretical distribution. This can be done with either a chi-square goodness-of-fit test or the Kolmogorov-Smirnov (K-S) test. The latter test is more powerful when continuous functions are involved, but care should be taken with respect to which source is consulted. Many textbooks in the behavioral sciences do not provide a correct description of the K-S test (Kraner, Mohanty, & Lyons, 1980). The obtained value of the fit statistic (chi-square as well as K-S) should be close to the expected value and not to zero, since small values indicate that the sequence fits the distribution too well. For instance, if we want to check whether Wichmann and Hill's (1982) random number generator produces an equal number of digits between 0 and 9 when multiplied by 10 and truncated, we should find a chi-square value around 9, that is, the number of degrees of freedom of the frequency test (100 chi-square tests based on 1,000 numbers each yielded an average value of 8.87, which is indeed close to the expected value). Similarly, a frequency test of the numbers 0-99 after multiplication by 100 and truncation should have an expected value of 99 (100 chi-square tests based on 1,000 numbers generated by Algorithm 1 yielded an average value of 97.90).

The test can be made more precise, because not only should the average value of the fit statistic be close to the expected value, but also the distribution of obtained values should coincide with the theoretical distribution (see, e.g., Dudewicz & Ralley, 1981). Thus, the distribution of 100 chi-square values based on the frequency test of the digits 0-9 (see above) should correspond to a chi-square distribution with 9 degrees of freedom. This again can be examined with the use of a K-S test or a chi-square goodness-of-fit test.

The second category consists of empirical tests to investigate whether the numbers in a sequence are well spread. It is not enough for Algorithm 1 to generate a uniform distribution of numbers. The numbers of the sequence must also be independent, and this is where most generators fail (see above). There is virtually an infinite number of tests that can be conceived to measure this property quantitatively. Chi-square and correlation tests can be used to test whether subsequent numbers are unrelated, although attention should be paid to the fact that, for some tests, the data are not independent and therefore a modified version is required (see Knuth, 1981, and Ripley, 1987, for more information). Independence can also be examined by making predictions from the (assumed) independence and checking whether the data do indeed conform to these predictions. For instance, if digits between 0 and 9 are generated, we can tabulate the frequency of the interval lengths between two identical digits and compare the obtained frequencies with the expected ones. This test is known as the gap test. We can also look at the monotone increasing and decreasing subsequences and see whether their frequencies conform to the expected probabilities (a test known as the runs test). Or we can consider the lengths of sequences needed to collect all digits (coupon collector's test), look at the number of matching digits in subsequences of four (the poker test), or calculate the frequency of the middle digit's being the maximum in a chain of three (the maximum test), and so on.

More information about such empirical tests can be found in Kennedy and Gentle (1980), Knuth (1981), and Ripley (1987), mentioned in the introduction, or in Gruenberger and Jaffray (1965; but see below). The tests are not discussed at full length here, because all generators of Table 1 and Algorithm 1 are known to pass them successfully. For the same reason, theoretical tests that can be applied to random number generators (Atkinson, 1980; Knuth, 1981; MacLaren, 1989; Ripley, 1987) are omitted in this article.

THE USE OF RANDOM NUMBERS

To know how to generate random numbers is important, but it is only a first step. Once we have the source of randomness, we need to apply it correctly. In the remainder of the article, three selected topics will be discussed: randomization of stimuli and subjects, random sampling, and randomness in time. These topics have been chosen because of their importance in experimental psychology, and because they tend to be neglected in more general texts dealing with simulations.
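Before these topics are taken up, the frequency test described for the digits 0-9 can be made concrete with a short Python sketch. It computes the chi-square statistic for 1,000 uniform numbers, repeats this 100 times, and averages the results, which should land near the 9 degrees of freedom. Python's built-in `random.Random` is assumed here as the generator under test, and the function name `chi_square_frequency` is illustrative only.

```python
import random


def chi_square_frequency(numbers, categories=10):
    """Chi-square statistic of the frequency test.

    Each number u in (0, 1) is multiplied by `categories` and truncated,
    and the observed counts are compared with equal expected counts.
    """
    counts = [0] * categories
    for u in numbers:
        digit = min(int(u * categories), categories - 1)  # guard against u == 1
        counts[digit] += 1
    expected = len(numbers) / categories
    return sum((count - expected) ** 2 / expected for count in counts)


rng = random.Random(12345)  # arbitrary seed, chosen only for reproducibility
statistics = [
    chi_square_frequency([rng.random() for _ in range(1000)])
    for _ in range(100)
]
mean_statistic = sum(statistics) / len(statistics)

print(mean_statistic)  # expected to be close to 9, the degrees of freedom
```

As noted in the text, an average far above 9 signals a nonuniform generator, whereas an average close to zero would mean that the numbers fit the uniform distribution suspiciously well.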

Randomization of Stimuli and Subjects

Strings of independent random numbers are not very useful for randomizing subjects or stimuli, because they may lead to serious imbalances. Suppose an experimenter needs a string of 100 binary digits (zero and one) in order to determine whether a subject is included in Condition A or Condition B. Generating such a string with a random number generator involves a risk of at least 0.27 that more than 55 of the 100 subjects are included in one condition, and fewer than 45 in the other. To avoid these imbalances in randomization, sampling without replacement is a better technique. This is achieved by first listing all alternatives and then making a random permutation of the list.

Before presenting a good permutation algorithm, however, we would like to give a cautionary tale. For years we used an algorithm (Nilsson, 1978) in our laboratory that at first sight seemed very sound and that we actually were going to defend in this article. Yet the algorithm failed on the first and most basic test that was applied to it in our analysis for the present article. Nilsson's (1978) algorithm is the following. First, all alternatives are listed in an array with N elements (e.g., for the example above, an array of 50 0s and 50 1s would be created). Then, a random permutation of the array is made by applying the following algorithm:

  set I = 0
  repeat
    set I = I + 1
    set U = integer random number ranging from 1 to N
    set A = X[I] (i.e., the Ith element from the array)
    set X[I] = X[U]
    set X[U] = A
  until I = N

Each element I of the array can be exchanged with all possible alternatives, which leads to a total number of imaginable rearrangements equal to N^N (e.g., an array of 10 digits can be rearranged in 10^10 different ways). At the time, this sounded very convincing to us, and we applied the algorithm without further testing. However, when for testing purposes we listed the 5! = 120 possible orderings of 5 digits and made 12,000 simulations to check whether the 5^5 = 3,125 possible rearrangements were equally divided over the orderings, we always obtained chi-square values around 700, even though values around 119 were expected (see above). This indicated that something was wrong. If we looked further at the first digit of the permuted array, we saw that the probability was 0.194 that this digit was the first of the original array, 0.245 that it was the second, 0.214 that it was the third, 0.183 that it was the fourth, and 0.164 that it was the fifth (the expected value each time was 0.200). That is, the probability (based on the 12,000 simulations) of the first digit's being 1 was good, but then the probability decreased monotonically from the first digit's being 2 to the first digit's being 5. The inequality increased as the number of elements in the string was augmented. For instance, the probability that the second element of a 100-item array was the first in the permuted array amounted to 0.014 (100,000 simulations; expected value, 0.010), whereas the probability of the 100th element's being first was only 0.007 (again 0.010 expected). Remember that

Table 4
Distribution of Data from a 10-Item Array after Permutation with Nilsson's Algorithm (100,000 Simulations); Entries Are Conditional Probabilities of Items Being in Final (Column) Position, Given Initial (Row) Position

Position in        Position of Item after Permutation
Original Array     1     2     3     4     5     6     7     8     9     10
 1                .100  .099  .099  .101  .100  .098  .101  .101  .102  .100
 2                .130  .095  .094  .095  .097  .096  .098  .098  .099  .099
 3                .120  .124  .090  .091  .090  .095  .094  .098  .099  .099
 4                .111  .116  .121  .087  .090  .091  .093  .094  .096  .099
 5                .103  .109  .112  .119  .085  .087  .092  .095  .097  .101
 6                .099  .102  .107  .111  .118  .087  .089  .091  .096  .101
 7                .092  .096  .102  .106  .112  .122  .086  .090  .096  .098
 8                .087  .091  .096  .102  .107  .113  .119  .090  .094  .100
 9                .081  .086  .091  .096  .103  .107  .116  .124  .096  .100
10                .077  .082  .087  .093  .097  .105  .111  .120  .127  .101

52

BRYSBAERT of a different type). Those corrections should be avoided, unless experimental tests ofmodels require such nonrandom constraints. A second common mistake with respect to permutation is the idea that it suffices to make just one random permutation of a stimulus series and to present that permuted series to all subjects. The major aim of randomization is to preclude sequence effects, and because this is largely done by averaging out influences, every systematization may involve a bias. With the ubiquitous use of microcomputers, it is not difficult to generate a new permutation for each subject and/or experimental session. Random Sampling Two procedures for drawing a random sample from a population can be distinguished, depending on the need to preserve the order of the subjects/stimuli. If the order is of no importance, Algorithm 10 can be used. For instance, if 10 stimuli must be drawn from a population of 100, the algorithm is applied from I = 100 till I = 91, and the last 10 items of the array are used as the sample. If, on the other hand, the order of the stimuli is critical, either a sorting algorithm(see, e.g., Dreger, 1989; Dwyer & Critchfield, 1978; Ellis, 1985; Knuth, 1973; Press, Flannery, Teukolsky, & Vetterling, 1986) must be added to Algorithm 10, or another algorithm must be used. Bissell (1986) proposed the following procedure: Algorithm 11 Random Sampling with Order Preservation 1. set population size = N, sample size = n setA = N n, N = N, A = A 2. generate U = random number set B = 1 3. set B = B * A/N if B U then select item NN+l set N = N 1 if N > 0 then go to 2 else stop else set N = N 1 set A = A 1 if N > 0 then go to 3 else stop

an array of 100 elements could be rearranged in 100 different ways. Table 4 gives the distribution of a 10-item array after permutation (the data are based on 100,000 simulations). The differences between observed and expected probability are largest in the lower left corner. Nilssons (1978) algorithm learns two things. First, that something looks complicated and/or seems appropriate does not make it random; and second, not all procedures published in establishedjournals and/or books have been well tested (though we admit that mistakes are sometimes very difficult to trace both by authors and by reviewers). If we then look for a good permutation algorithm, we have to return to the basic process we want to simulate. What is needed is random sampling without replacement. This can be compared with a bowl that contains N elements, from which one element after another is picked out and processed. Each item has a probability of 1/N to be picked out first. If it is not picked out the first time, it has a chance of 1/(N 1) to be picked out second, a chance of 1/(N2) to be picked out third, and so on, until all items are removed from the bowl. The following algorithm does just this. It draws an element! with chance l/(NI+ 1) from the array and places it at the end. Note that the number of possible rearrangements is smaller than that for Nilssons algorithm (N! instead of NN) yet, it produces much better results. Algorithm 10 Permutation set I = N + 1 repeat set I = I 1 generate U = integer random number from 1 to I set A = X[I] set X[I} = XEUI set X[U] = A until I = 2

Twenty simulations in which 12,000 permutations of a five-item array were generated yielded a mean chisquare value of 116.16 for the differences between the observed and expected frequencies over the 120 possible orderings (see above). This is close to the expected value of 119. Algorithm 10 was first proposed by Moses and Oakford (1963) and Green (1963). There are two more things to be said about randomization. First, Algorithm 10 produces truly random sequences of items. There is no need to correct it by adding constraints, as is sometimes seen in the literature. For instance, it is not necessaryto alter the sequences with more than three stimuli belonging to the same condition, in order to make the sequence more random. Actually, these corrections are usually mistakes due to human failure to produce randomness without special training. Their net result is more often an increase of information rather than a decrease (e.g., excluding all sequences with more than three subsequent stimuli of the same condition informs the subject about the fact that if three stimuli of the same type have been presented, the fourth will surely be one
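The contrast between Nilsson's (1978) procedure and Algorithm 10 can also be checked without simulation, by enumerating every possible sequence of random integers for a five-item array. The sketch below is only an illustration in Python: the function names and the use of 0-based indices are assumptions of this example, not part of the original procedures.

```python
from collections import Counter
from itertools import product


def nilsson_permute(items, picks):
    """Nilsson's procedure: position I is swapped with any of the N positions."""
    x = list(items)
    for i, u in enumerate(picks):          # picks[i] is the 0-based U for step I = i + 1
        x[i], x[u] = x[u], x[i]
    return tuple(x)


def algorithm10_permute(items, picks):
    """Algorithm 10: position I (N down to 2) is swapped with a position from 1 to I."""
    x = list(items)
    i = len(x) - 1                         # 0-based index of position I = N
    for u in picks:                        # picks hold the 0-based U for I = N, ..., 2
        x[i], x[u] = x[u], x[i]
        i -= 1
    return tuple(x)


def count_orderings(n):
    """Count how often each ordering arises over all equally likely pick sequences."""
    items = tuple(range(1, n + 1))
    nilsson = Counter(nilsson_permute(items, p)
                      for p in product(range(n), repeat=n))
    alg10 = Counter(algorithm10_permute(items, p)
                    for p in product(*(range(i) for i in range(n, 1, -1))))
    return nilsson, alg10


nilsson, alg10 = count_orderings(5)
print(sum(nilsson.values()), len(nilsson), min(nilsson.values()), max(nilsson.values()))
print(sum(alg10.values()), len(alg10), set(alg10.values()))

# Exact probability that each original item ends up in the first position (Nilsson).
first = Counter()
for ordering, count in nilsson.items():
    first[ordering[0]] += count
print({item: round(c / 3125, 3) for item, c in sorted(first.items())})
```

Because 3,125 is not a multiple of 120, the 3,125 equally likely pick sequences of Nilsson's procedure cannot be spread evenly over the 120 orderings, whereas the 120 pick sequences of Algorithm 10 map onto the 120 orderings one to one. The last line prints exact first-position probabilities that can be set beside the simulated values reported in the text.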

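The steps of Algorithm 11 can be sketched in Python as follows. This is a hedged illustration rather than Bissell's (1986) own code: the function name `ordered_sample`, the variable names, and the use of Python's `random.Random` as the uniform generator are assumptions of this example.

```python
import random


def ordered_sample(items, n, rng):
    """Draw n of the items without replacement, keeping their original order."""
    total = len(items)
    if not 0 <= n <= total:
        raise ValueError("sample size must lie between 0 and the population size")
    remaining = total          # N' in Algorithm 11: items not yet examined
    to_reject = total - n      # A' in Algorithm 11: items that still must be skipped
    sample = []
    while remaining > 0:
        u = rng.random()       # step 2: a fresh uniform deviate in [0, 1)
        b = 1.0
        while remaining > 0:   # step 3: B is the chance that all items so far are skipped
            b *= to_reject / remaining
            if b <= u:
                sample.append(items[total - remaining])   # item N - N' + 1 (1-based)
                remaining -= 1
                break          # back to step 2
            remaining -= 1     # skip this item and stay in step 3
            to_reject -= 1
    return sample


rng = random.Random(1991)      # arbitrary seed, chosen only for reproducibility
print(ordered_sample(list(range(1, 11)), 5, rng))
```

Each call returns exactly n items in ascending population order, because a skip is impossible once no rejections remain (B becomes 0) and a selection is impossible once the sample is complete (B stays at 1).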
To test Algorithm 11, all possible samples of 5 elements drawn from a population of 10 elements were listed. This yielded a total of 10!/(5!5!) = 252 samples. Twenty replications of 25,200 sample drawings were completed, which gave an average chi-square value for the difference between the observed and the expected frequencies equal to 248.93, very close to the expected value of 251.

Randomness in Time: The Exponential and Geometric Distribution

The generation of exponential deviates has been included in this paper because the exponential distribution is the only one that yields true randomness in time. Suppose, for instance, that an experimenter wants to control eye fixations in a visual word recognition task. The experimenter does so by flashing a digit instead of a word at the fixation location from time to time. Subjects have to identify the tachistoscopically presented digit, and if too many errors are made, the session is called invalid. Luce (1986, pp. 13-15) argues that in such a case it would be a bad strategy to use a random variable with a uniform distribution (say, a time interval varying from 0 to 9 stimuli, with each value having the same probability), because such a procedure changes the amount of information between different values of the variable. Immediately after presentation of a digit, chances are 1/10 that a new digit will be presented. However, if no digit has been presented on the 1st trial, chances become 1/9 that it will be shown on the 2nd trial. Similarly, if no digit was shown in the first two trials, chances are 1/8 that it will appear on the 3rd trial, and so on. Finally, after 9 trials without a digit, the probability of a digit on the 10th trial reaches 1, which is a complete lack of randomness.

Thus, what is needed is a procedure that will keep the probability of presenting a digit constant at each trial. If chances are 1/10 that a digit is presented immediately after another digit, the probability that a digit is presented on Trial 2 if no digit has been presented on Trial 1 must also be 1/10. Or to put it differently, the probability that a digit is presented on Trial 2 must equal 9/10 * 1/10 = 0.09 (i.e., the probability that no digit has been presented on Trial 1 times the probability that a digit is presented on Trial 2). Similarly, the probability of a digit on Trial 3 is 0.9 * 0.9 * 0.1 = 0.081, and so on. More formally, the probability that a digit is presented on Trial i equals $(1 - p)^{i-1} p$, which is the geometric distribution, the discrete equivalent of the exponential function. An algorithm for sampling from a geometric distribution is the following:

Algorithm 12
Geometric Distribution

    generate E = random standard exponential deviate
    set A = -ln(1 - p)
    return G = integer part of E/A        G = 0, 1, 2, ...
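Algorithm 12 can be sketched in a few lines of Python. The function name `geometric_deviate` and the use of `random.Random` with the inverse-cdf method for the exponential deviate are assumptions of this illustration; any of the exponential generators described in the text could be substituted.

```python
import math
import random


def geometric_deviate(p, rng):
    """Return G = 0, 1, 2, ... with P(G = k) = (1 - p)**k * p, for 0 < p < 1."""
    e = -math.log(1.0 - rng.random())   # standard exponential deviate; 1 - U lies in (0, 1]
    a = -math.log(1.0 - p)
    return int(e / a)                   # integer part of E/A


rng = random.Random(2024)               # arbitrary seed, chosen only for reproducibility
gaps = [geometric_deviate(0.1, rng) for _ in range(100000)]
print(sum(gaps) / len(gaps))            # should lie near (1 - p)/p = 9
print(gaps.count(0) / len(gaps))        # should lie near p = 0.1
```

The method works because the probability that E exceeds kA equals exp(-kA) = $(1 - p)^k$, so the integer part of E/A is at least k with exactly the probability that k successive trials pass without a digit.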
The exponential and/or geometric distribution should be utilized whenever randomness in time is required (e.g., also for random foreperiods in reaction time studies; see Luce, 1986, pp. 54-55). Exponential distributions can be used for simulations as well. For instance, Strube (1983; see also Gruenberger & Jaffray, 1965) proposed the following test to check the usefulness of a random number generator. Integers between 0 and 9 are generated and the average interval between repetitions of the digits in the series is examined. The distribution of these intervals is geometric with probability density function = $0.9^{x} \times 0.1$ (for an interval of length $x$), expected value = 9 = $(1 - p)/p$, and variance = 90 = $(1 - p)/p^2$ (Luce, 1986, p. 41). This test is known in the literature as the gap test (see above). However, whereas most authors (see, e.g., Knuth, 1981) verify the usefulness of a generator by comparing the observed and the expected frequencies of the different gaps with the use of a chi-square test, Strube (1983) proposed to compare the average gap with the expected value and to compute a t test. Furthermore, he calculated the variance of the gaps and compared it with the expected value of 90 via a chi-square test. However, both the t test of means and the chi-square test of variances assume normality of data (Hays, 1988, pp. 292-293 and 327-331). To check whether the geometric distribution of the raw data distorted the test statistics, 10,000 means and variances of 100 geometric deviates were calculated and compared with the expected $t(99)$ and $\chi^2(99)$ distributions. More specifically, the probabilities at the .005 and .025 tails were evaluated. For the t test, this gave lower tail values of .016 and .047, respectively, and upper tail values of .001 and .011. That is, the t test was too conservative at the upper part and too liberal at the lower part. The chi-square test, which was even worse, gave rather enhanced values of .097 and .173 at the low end, and .085 and .137 at the high end. Therefore, Strube's (1983) gap test (see also Gruenberger & Jaffray, 1965) should not be used to test the usefulness of a random number generator, unless better tests than t and chi-square are available (an alternative might be to run a number of simulations and estimate the critical values).

CONCLUSION

Algorithms have been described to generate random numbers with a uniform, a normal, and an exponential (geometric) distribution. The utility of these numbers was illustrated with procedures for randomization, random sampling, and randomness in time. Other uses are simulations, numerical approximations of compound mathematical equations, and the creation of nonrandom sequences in which various forms of autocorrelation are present (for these procedures, see Malmi, 1986, and Box & Jenkins, 1976, pp. 46-84).

AVAILABILITY

In addition to the algorithm descriptions in the text, Appendix B provides Turbo Pascal listings of all procedures discussed. However, it is our experience that a gap exists between the availability of algorithms and their actual implementation.
Small mistakes are easily made, so that one is obliged to rerun some elementary tests in order to check the correctness of the implementation. Therefore, Appendix A displays the first few numbers generated by the algorithms when all seeds are equal to 1. In this way, everyone can check the correctness of their implementation. Turbo Pascal and BASIC implementations can also be obtained by sending a formatted disk in a returnable box to the author. For administrative costs, $10 must be included.
REFERENCES

AFFLERBACH, L. (1985). The pseudo-random number generators in Commodore and Apple microcomputers. Statistische Hefte, 26, 321-333.
ALDRIDGE, J. W. (1987). Cautions regarding random number generation on the Apple II. Behavior Research Methods, Instruments, & Computers, 19, 397-399.
ATKINSON, A. C. (1980). Tests of pseudo-random numbers. Applied Statistics, 29, 164-171.
BEST, D. J. (1979). Some easily programmed pseudo-random normal generators. Australian Computer Journal, 11, 60-62.
BISSELL, A. F. (1986). Ordered random selection without replacement. Applied Statistics, 35, 73-75.
BOX, G. E. P., & JENKINS, G. M. (1976). Time series analysis: Forecasting and control. San Francisco: Holden-Day.
BOX, G. E. P., & MULLER, M. E. (1958). A note on the generation of random normal deviates. Annals of Mathematical Statistics, 29, 610-611.
BRADLEY, D. R. (1988). DATASIM. Lewiston, ME: Desktop Press.
BRADLEY, D. R., SENKO, M. W., & STEWART, F. A. (1990). Statistical simulation on microcomputers. Behavior Research Methods, Instruments, & Computers, 22, 236-246.
BROPHY, A. L. (1985). Approximation of the inverse normal distribution function. Behavior Research Methods, Instruments, & Computers, 17, 415-417.
DREGER, R. M. (1989). A BASIC program for the Shell-Metzner sort algorithm. Educational & Psychological Measurement, 49, 619-622.
DUDEWICZ, E. J., & RALLEY, T. G. (1981). The handbook of random number generation and testing with TESTRAND computer code. Columbus, OH: American Sciences Press.
DWYER, T., & CRITCHFIELD, M. (1978). BASIC and the personal computer. Reading, MA: Addison-Wesley.
EDGELL, S. E. (1979). A statistical check of the DECsystem-10 FORTRAN pseudorandom number generator. Behavior Research Methods & Instrumentation, 11, 529-530.
ELLIS, J. K. (1985). Distribution counting as a method for sorting test scores. Behavior Research Methods, Instruments, & Computers, 17, 419-420.
FISHMAN, G. S., & MOORE, L. R., III (1986). An exhaustive analysis of multiplicative congruential random number generators with modulus $2^{31} - 1$. SIAM Journal on Scientific & Statistical Computing, 7, 24-45.
GREEN, B. F. (1963). Digital computers in research: An introduction for behavioral and social scientists. New York: McGraw-Hill.
GRUENBERGER, F., & JAFFRAY, G. (1965). Problems for computer solution. New York: Wiley.
HAYS, W. L. (1988). Statistics. New York: Holt, Rinehart and Winston.
KENNEDY, W. J., & GENTLE, J. E. (1980). Statistical computing. New York: Marcel Dekker.
KNUTH, D. E. (1973). The art of computer programming: Vol. 3. Sorting and searching. Reading, MA: Addison-Wesley.
KNUTH, D. E. (1981). The art of computer programming: Vol. 2. Seminumerical algorithms. Reading, MA: Addison-Wesley.
KRANER, H. C., MOHANTY, S. G., & LYONS, J. C. (1980). Critical values of the Kolmogorov-Smirnov one-sample test. Psychological Bulletin, 88, 498-501.
LORDAHL, D. S. (1988). Repairing the Microsoft BASIC RND function. Behavior Research Methods, Instruments, & Computers, 20, 221-223.
LUCE, R. D. (1986). Response times. New York: Oxford University Press.
MACLAREN, N. M. (1989). The generation of multiple independent sequences of pseudorandom numbers. Applied Statistics, 38, 351-359.
MCLEOD, A. I. (1985). A remark on Algorithm AS183: An efficient and portable pseudo-random number generator. Applied Statistics, 34, 198-200.
MALMI, R. A. (1986). Intuitive covariation estimation. Memory & Cognition, 14, 501-508.
MARSAGLIA, G. (1962). Random variables and computers. In J. Kozesnik (Ed.), Information theory, statistical decision functions, random processes: Transactions of the Third Prague Conference (pp. 499-510). Prague: Czechoslovak Academy of Sciences.
MARSAGLIA, G., & BRAY, T. A. (1964). A convenient method for generating normal variables. SIAM Review, 6, 260-264.
MODIANOS, D. T., SCOTT, R. C., & CORNWELL, L. W. (1987). Testing intrinsic random-number generators. Byte, 12, 175-178.
MOSES, L. E., & OAKFORD, R. V. (1963). Tables of random permutations. Stanford, CA: Stanford University Press.
NANCE, R. E., & OVERSTREET, C. L. (1972). A bibliography on random number generators. Computer Review, 13, 495-508.
NILSSON, T. H. (1978). Randomization without replacement using replacement without losing your place. Behavior Research Methods & Instrumentation, 10, 419.
PRESS, W. H., FLANNERY, B. P., TEUKOLSKY, S. A., & VETTERLING, W. T. (1986). Numerical recipes: The art of scientific computing. Cambridge, U.K.: Cambridge University Press.
RIPLEY, B. D. (1983). Computer generation of random variables: A tutorial. International Statistical Review, 51, 301-319.
RIPLEY, B. D. (1987). Stochastic simulation. New York: Wiley.
SAHAI, H. (1979). A supplement to Sowey's bibliography on random number generation and related topics. Journal of Statistical Computation & Simulation, 10, 31-52.
SOWEY, E. R. (1972). A chronological and classified bibliography on random number generation and testing. International Statistical Review, 40, 355-371.
SOWEY, E. R. (1978). A second classified bibliography on random number generation and testing. International Statistical Review, 46, 355-371.
SOWEY, E. R. (1986). A third classified bibliography on random number generation and testing. Journal of the Royal Statistical Society, 149A, 83-107.
STRUBE, M. J. (1983). Tests of randomness for pseudorandom number generators. Behavior Research Methods & Instrumentation, 15, 536-537.
VON NEUMANN, J. (1951). Various techniques in connection with random digits. NBS Applied Mathematics Series, 12, 36-38.
WICHMANN, B. A., & HILL, J. D. (1982). Algorithm AS183: An efficient and portable pseudo random number generator. Applied Statistics, 31, 188-190.
WICHMANN, B. A., & HILL, J. D. (1984). An efficient and portable pseudo random number generator: Correction. Applied Statistics, 33, 123.
ZEISEL, H. (1986). A remark on Algorithm AS183: An efficient and portable pseudo-random number generator. Applied Statistics, 35, 89.
ZELEN, M., & SEVERO, N. C. (1964). Probability functions. In M. Abramowitz & I. A. Stegun (Eds.), Handbook of mathematical functions (pp. 925-995). New York: Dover.

NOTES

1. The div operation stands for integer division (i.e., 14 div 5 = 2). The div and mod operations are available in most software packages.
2. Note that for many algorithms the number of generated random numbers (and the time required) varies as a function of the algorithm, because random numbers are sampled (and discarded) until some criterion is met.


APPENDIX A
Numbers Generated by the Different Algorithms

seed1 = 1
seed2 = 1
seed3 = 1

first 10 numbers of Wichmann and Hill's random number generator (Algorithm 1):
0.01693090620 0.89525391124 0.11149102121 0.93952679641 0.12822985510
0.17800399298 0.29982708249 0.34971840637 0.05928746025 0.82197931465

first 10 numbers of standard normal, inverse cdf (Algorithm 2):
2.12205889020 1.25512190220 1.21877656770 1.55109245260 1.13489054900
0.92295174709 0.52458866900 0.38572616881 1.56106514540 0.92288764201

first 10 numbers of standard normal, Box-Muller (Algorithm 3):
0.46776157925 0.27003245504 1.28682417770 0.44644375106 0.58321777179
1.40685839470 0.71746985100 0.71278233544 1.07699514850 0.28908727769

first 10 numbers of standard normal, polar method (Algorithm 4):
0.19407337327 1.33042159440 2.19755506130 0.59082236112 0.68175817609
1.13620439410 0.87865940120 0.50754615265 0.17307865854 0.53106697446

first 10 numbers of standard normal, Marsaglia-Bray (Algorithm 5):
0.89254345772 0.99555832695 1.34490103630 0.82905654588 0.72689870961
1.01316404230 0.32030371023 0.51709027840 0.12444994842 0.22350462413

first 10 numbers of standard normal, ratio of uniforms (Algorithm 6):
0.85990598276 0.66165288210 0.03200237951 1.68554875660 0.03422323645
0.46775744684 0.58781477852 0.97552442825 0.31896217480 0.46142694379

first 10 numbers of standard exponential, inverse cdf (Algorithm 7):
4.07861455800 0.11064790124 2.19381121860 0.06237893855 2.05393088250
1.72594929650 1.20454936220 1.05062700160 2.82535745820 0.19604004890

first 10 numbers of standard exponential, von Neumann (Algorithm 8):
0.01693090620 0.11149102121 0.12822985510 0.29982708249 0.05928746025
1.48791600110 0.51884426837 6.80740561510 1.09824758910 0.73856139688

first 10 numbers of standard exponential, ratio of uniforms (Algorithm 9):
1.02135355940 0.85818939924 0.13362312147 0.55416121890 0.44833448843
0.08964587217 0.75126375694 0.82019400019 0.66731253059 1.20040721920

randomization array of 10 stimuli (Algorithm 10):
before randomization: 1 2 3 4 5 6 7 8 9 10
after randomization:  3 5 4 2 6 8 7 10 9 1

random sample from population (Algorithm 11):
population: 1 2 3 4 5 6 7 8 9 10
sample:     5 6 8 9 10
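The first entries of this appendix can be reproduced independently of the Turbo Pascal listings. The Python sketch below is an illustration only: it assumes that every algorithm is restarted with all three seeds equal to 1, and the names `wichmann_hill` and `permute` are chosen for this example.

```python
def wichmann_hill(s1=1, s2=1, s3=1):
    """Yield uniform deviates from the three-seed generator of Algorithm 1."""
    while True:
        s1 = (171 * s1) % 30269
        s2 = (172 * s2) % 30307
        s3 = (170 * s3) % 30323
        # fractional part of the sum of the three scaled seeds
        yield (s1 / 30269 + s2 / 30307 + s3 / 30323) % 1.0


def permute(items, uniform):
    """Algorithm 10, drawing U = trunc(u * I) + 1 as in the Appendix B routine."""
    x = list(items)
    for i in range(len(x), 1, -1):        # I = N, N - 1, ..., 2 (1-based)
        u = int(next(uniform) * i)        # 0-based index of position U
        x[i - 1], x[u] = x[u], x[i - 1]
    return x


generator = wichmann_hill()
first_two = [next(generator), next(generator)]
shuffled = permute(range(1, 11), wichmann_hill())
print(first_two)   # Appendix A lists 0.01693090620 and 0.89525391124
print(shuffled)
```

Comparing the printed values with the tabulated ones is the kind of elementary check of an implementation that this appendix is meant to support.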

APPENDIX B
Listing

(* TURBO PASCAL ROUTINES FOR RANDOM NUMBER GENERATION *)

VAR seed1, seed2, seed3    : integer;
    seed1l, seed2l, seed3l : longint;      {optional, only if rndnumb1 used}

(*********************************************************************
   Wichmann & Hill's random number generator.

   Produces random numbers strictly larger than 0 and strictly smaller
   than 1. Uniform distribution. Needs three seeds to start. Integer
   arithmetic up to 5212632 required. If danger of values 0.0 and 1.0
   due to rounding (McLeod, 1985), add corrections.
*********************************************************************)
FUNCTION rndnumb1(xp1, xp2, xp3: longint): real;    {xp = seed for generator}
{VAR rp : real;}                                     {optional, rounding errors}
BEGIN
  seed1l := (171*xp1) mod 30269;
  seed2l := (172*xp2) mod 30307;
  seed3l := (170*xp3) mod 30323;
  rndnumb1 := frac(seed1l/30269 + seed2l/30307 + seed3l/30323);   {take fractional part sum}
  { optional, rounding errors:
    rp := frac(seed1l/30269 + seed2l/30307 + seed3l/30323);
    IF rp <= 0.0 THEN rp := 0.0000000001;
    IF rp >= 1.0 THEN rp := 0.9999999999;
    rndnumb1 := rp; }
END;

(*********************************************************************
   Wichmann & Hill's random number generator.

   Better implementation of Wichmann & Hill's random number generator.
   Produces the same numbers, but requires only arithmetic up to 30323
   and usually is faster. See also rounding error checking in rndnumb1.
*********************************************************************)
FUNCTION rndnumb2(xp1, xp2, xp3: integer): real;    {xp = seed for generator}
BEGIN
  seed1 := 171 * (xp1 mod 177) -  2*(xp1 div 177);
  seed2 := 172 * (xp2 mod 176) - 35*(xp2 div 176);
  seed3 := 170 * (xp3 mod 178) - 63*(xp3 div 178);
  IF seed1 < 0 THEN seed1 := seed1 + 30269;
  IF seed2 < 0 THEN seed2 := seed2 + 30307;
  IF seed3 < 0 THEN seed3 := seed3 + 30323;
  rndnumb2 := frac(seed1/30269 + seed2/30307 + seed3/30323);
END;

(*********************************************************************
   Standard normal distribution, inverse cdf

   Function to produce random numbers with a standard normal
   distribution using the inverse cdf. For information about the
   values used, see Brophy (1985).
*********************************************************************)
FUNCTION random_stand_norm1: real;                  {inverse F, Hill & Davis}
VAR tp1, tp2, tp3, tp4 : real;
BEGIN
  tp1 := rndnumb2(seed1, seed2, seed3);
  IF tp1 > 0.5 THEN tp2 := 1 - tp1 ELSE tp2 := tp1;
  IF tp2 < 1E-20 THEN tp4 := 10
  ELSE
    BEGIN
      tp3 := sqrt(-2*ln(tp2));
      tp4 := tp3 - ((7.45551*tp3 + 450.636)*tp3 + 1271.059) /
                   (((tp3 + 110.4212)*tp3 + 750.365)*tp3 + 500.756);
    END;
  IF tp1 > 0.5 THEN tp4 := -tp4;
  random_stand_norm1 := tp4;
END;

(*********************************************************************
   Standard normal distribution, Box-Muller

   Function to produce random numbers with a standard normal
   distribution using the Box-Muller algorithm. For information about
   the values used, see Ripley (1987).
*********************************************************************)
FUNCTION random_stand_norm2: real;                  {Box-Muller}
VAR tp1, tp2, tp3, tp4, tp5 : real;
BEGIN
  tp1 := rndnumb2(seed1, seed2, seed3);
  tp2 := 2*pi*tp1;
  tp3 := rndnumb2(seed1, seed2, seed3);
  tp4 := sqrt(2*(-ln(tp3)));
  tp5 := tp4 * cos(tp2);
  random_stand_norm2 := tp5;
END;

(*********************************************************************
   Standard normal distribution, polar method

   Function to produce random numbers with a standard normal
   distribution using the polar method. For information about the
   values used, see Ripley (1987).
*********************************************************************)
FUNCTION random_stand_norm3: real;                  {polar}
VAR tp1, tp2, tp3, tp4, tp5 : real;
BEGIN
  REPEAT
    tp2 := 2*rndnumb2(seed1, seed2, seed3) - 1.0;
    tp3 := 2*rndnumb2(seed1, seed2, seed3) - 1.0;
    tp4 := sqr(tp2) + sqr(tp3);
  UNTIL tp4 < 1.0;
  random_stand_norm3 := sqrt(-2.0*ln(tp4)/tp4) * tp2;
END;

(*********************************************************************
   Standard normal distribution, Marsaglia-Bray

   Function to produce random numbers with a standard normal
   distribution using the Marsaglia-Bray algorithm. For information
   about the values used, see Ripley (1987).
*********************************************************************)
FUNCTION random_stand_norm4: real;                  {Marsaglia-Bray}
VAR tp1, tp2, tp3, tp4, tp5, tp6, tp7, tp8, tp9 : real;
BEGIN
  tp1 := rndnumb2(seed1, seed2, seed3);
  IF tp1 < 0.8638 THEN
    BEGIN
      tp2 := rndnumb2(seed1, seed2, seed3);
      tp3 := rndnumb2(seed1, seed2, seed3);
      tp4 := rndnumb2(seed1, seed2, seed3);
      tp5 := 2*(tp2 + tp3 + tp4) - 3;
    END
  ELSE IF tp1 < 0.9745 THEN
    BEGIN
      tp2 := rndnumb2(seed1, seed2, seed3);
      tp3 := rndnumb2(seed1, seed2, seed3);
      tp5 := 1.5*(tp2 + tp3 - 1);
    END
  ELSE IF tp1 < 0.9973002039 THEN
    BEGIN
      REPEAT
        tp2 := rndnumb2(seed1, seed2, seed3);
        tp3 := 6*tp2 - 3;
        IF abs(tp3) < 1.0 THEN
          tp6 := 17.49731196*exp(-sqr(tp3)/2) - 4.73570326*(3.0 - sqr(tp3))
                 - 2.15787544*(1.5 - abs(tp3))
        ELSE IF abs(tp3) < 1.5 THEN
          tp6 := 17.49731196*exp(-sqr(tp3)/2) - 2.36785163*sqr(3.0 - abs(tp3))
                 - 2.15787544*(1.5 - abs(tp3))
        ELSE IF abs(tp3) < 3.0 THEN
          tp6 := 17.49731196*exp(-sqr(tp3)/2) - 2.36785163*sqr(3.0 - abs(tp3))
        ELSE writeln(tp3:10:4);
        tp4 := rndnumb2(seed1, seed2, seed3);
      UNTIL 0.358*tp4 <= tp6;
      tp5 := tp3;
    END
  ELSE
    BEGIN
      REPEAT
        REPEAT
          tp2 := rndnumb2(seed1, seed2, seed3);
          tp3 := rndnumb2(seed1, seed2, seed3);
          tp4 := sqr(2*tp2 - 1) + sqr(2*tp3 - 1);
        UNTIL tp4 < 1;
        tp6 := sqrt((9 - 2*ln(tp4))/tp4);
        tp7 := tp6*(2*tp2 - 1);
        tp8 := tp6*(2*tp3 - 1);
      UNTIL ((abs(tp7) > 3.0) OR (abs(tp8) > 3.0));
      IF abs(tp7) > 3.0 THEN tp5 := tp7 ELSE tp5 := tp8;
    END;
  random_stand_norm4 := tp5;
END;

(*********************************************************************
   Standard normal distribution, ratio-of-uniforms

   Function to produce random numbers with a standard normal
   distribution using the ratio-of-uniforms method. For information
   about the values used, see Ripley (1987).
*********************************************************************)
FUNCTION random_stand_norm5: real;                  {ratio-of-uniforms}
VAR tp1, tp2, tp3, tp4 : real;
LABEL stop;
BEGIN
  REPEAT
    tp1 := rndnumb2(seed1, seed2, seed3);
    tp2 := 0.8578*(2*rndnumb2(seed1, seed2, seed3) - 1);
    tp3 := tp2/tp1;
    tp4 := 0.25*sqr(tp3);
    IF tp4 < 1 - tp1 THEN GOTO stop;                {optional, to speed up the generation}
  UNTIL ((tp4 <= 0.259/tp1 + 0.35) AND (tp4 <= -ln(tp1)));
  stop: random_stand_norm5 := tp3;
END;

(*********************************************************************
   Exponential distribution, inverse cdf

   Function to generate random numbers with a standard exponential
   distribution. Inverse cdf; for more information, see Ripley (1987).
*********************************************************************)
FUNCTION random_exp1: real;                         {inverse cdf}
BEGIN
  random_exp1 := -ln(rndnumb2(seed1, seed2, seed3));
END;
(*********************************************************************
   Exponential distribution, von Neumann

   Function to generate random numbers with a standard exponential
   distribution. von Neumann; for more information, see Ripley (1987).
*********************************************************************)
FUNCTION random_exp2: real;                         {von Neumann}
VAR ap1, ap2, ap3, ap4, ap5 : real;
LABEL stop, opnieuw, nog;
BEGIN
  ap1 := 0.0;
  opnieuw: ap2 := rndnumb2(seed1, seed2, seed3);
           ap5 := ap2;
  nog:     ap3 := rndnumb2(seed1, seed2, seed3);
           IF ap2 < ap3 THEN GOTO stop;
           ap2 := rndnumb2(seed1, seed2, seed3);
           IF ap2 < ap3 THEN GOTO nog;
           ap1 := ap1 + 1.0;
           GOTO opnieuw;
  stop:    random_exp2 := ap1 + ap5;
END;

(*********************************************************************
   Exponential distribution, ratio-of-uniforms

   Function to generate random numbers with a standard exponential
   distribution. Ratio-of-uniforms; for more information, see
   Ripley (1987).
*********************************************************************)
FUNCTION random_exp3: real;                         {ratio-of-uniforms}
VAR ap1, ap2, ap3, ap4, ap5, ap6, ap7, ap8, ap9, ap10 : real;
LABEL stop, opnieuw;
BEGIN
  ap4 := 2/2.7182818285;
  ap5 := 1.6487;       ap8  := 1.49998709858;
  ap6 := 0.105;        ap9  := 1.25379492880;
  ap7 := 0.773;        ap10 := 0.74252376960;
  opnieuw: ap1 := rndnumb2(seed1, seed2, seed3);
           ap2 := rndnumb2(seed1, seed2, seed3)*ap4;
           ap3 := ap2/ap1;
           IF ap3/2 <= ap8 - ap5*ap1 THEN GOTO stop;
           IF ap3/2 > ap6/ap1 + ap9 THEN GOTO opnieuw;
           IF ap3/2 > ap7/ap1 - ap10 THEN GOTO opnieuw;
           IF ap3/2 > -ln(ap1) THEN GOTO opnieuw;
  stop:    random_exp3 := ap3;
END;

(*********************************************************************
   Geometric distribution

   Function to generate random numbers with a geometric distribution.
*********************************************************************)
FUNCTION random_geom(p: real): integer;
VAR tp1 : real;
BEGIN
  tp1 := ln(1 - p);
  random_geom := trunc(ln(rndnumb2(seed1, seed2, seed3))/tp1);
END;

TYPE stim_array = array[1..100] of integer;

(*********************************************************************
   Permutation routine

   Procedure to make a random permutation of an array with Np stimuli.
*********************************************************************)
PROCEDURE permute(var data: stim_array; Np: integer);    {Np = number of Si in array}
VAR ip, rndp, datap : integer;
BEGIN
  FOR ip := Np DOWNTO 2 DO
    BEGIN
      rndp := trunc(rndnumb2(seed1, seed2, seed3)*int(ip)) + 1;   {important that rndnumb2 < 1}
      datap := data[ip];
      data[ip] := data[rndp];
      data[rndp] := datap;
    END;
END;

(*********************************************************************
   Procedure to take a sample without replacement

   Procedure that takes a random sample of Np2 elements from a
   population of Np1 elements. Rank-order preservation. Sample is
   placed at the beginning of the data array.
*********************************************************************)
PROCEDURE sample_without_replacement(var data: stim_array; Np1, Np2: integer);
VAR ip, Np1b, Np2b, counter1, counter2 : integer;
    pb, rndp : real;
    data1, data2 : stim_array;
LABEL stop;
BEGIN
  counter1 := 0;
  counter2 := 0;
  Np1b := Np1;
  Np2b := Np1 - Np2;
  REPEAT
    pb := Np2b/Np1b;
    rndp := rndnumb2(seed1, seed2, seed3);
    WHILE pb > rndp DO
      BEGIN
        counter1 := counter1 + 1;
        data2[counter1] := data[Np1 - Np1b + 1];
        Np1b := Np1b - 1;
        Np2b := Np2b - 1;
        IF Np1b > 0 THEN pb := pb*Np2b/Np1b ELSE GOTO stop;
      END;
    counter2 := counter2 + 1;
    data1[counter2] := data[Np1 - Np1b + 1];
    Np1b := Np1b - 1;
  UNTIL Np1b = 0;
  stop:
  FOR ip := 1 TO Np2 DO data[ip] := data1[ip];
  FOR ip := Np2 + 1 TO Np1 DO data[ip] := data2[ip - Np2];
END;

(Manuscript received December 4, 1989; revision accepted for publication November 2, 1990.)
