The first task of simulation is to produce pseudo-random numbers from a specified distribution. We first investigate how to generate pseudo-random numbers from U[0,1], then discuss how to extend this.
This is a way of generating a sequence of "random" numbers between 0 and 1, but using an algorithmic procedure rather than trying to produce truly random numbers.
The basic equation requires parameters m (the base), a (the multiplier), c (the additive constant). The formula used is
| xn | º | axn-1 + c (mod m) |
| un | = | xn / m |
Here {xi} is a sequence of integers in the range from 0 to m - 1, {ui} a sequence of real numbers between 0 and 1. The sequence repeats after a finite number of steps (the period cannot be greater than m and may be less). It is important to choose m to be large.
The value of a is also important: some values of a work well, others less well, and the choice depends on m.
The value of c is not important. It is normally acceptable to choose c = 0.
Exercise: Use c = 0, m = 100, a = 7 or 13 and starting value assigned in the lecture to generate 10 pseudo-random numbers.
Any generator needs to be tested. Things to test might include
The first two are dealt with by c2 tests, the third by methods from Time Series.
A good general method is the inverse distribution function method.
If we want to simulate from a distribution with cumulative distribution function F, we can simulate U from Uniform[0,1] and return X = F-1(U).
Check this works: P(X £ x) = P(F-1(U) £ x) = P(U £ F(x)) = F(x).
Example: exponential distribution.
For some distributions there are specific transformations. For example, the polar method converts a pair of independent U[0,1] random variables into a pair of indpendent N(0,1) variables:
X = sin(2pU)Ö(-2 log V), Y = cos(2pU)Ö(-2 log V).
When a r.v. can be represented as a transformation of another, or as the sum of number of copies of another, this can be used to aid its simulation.
Example 1: if Y has density f(y) = lyl-1 for 0 < x < 1, then X = - log Y is exponentially distributed.
Example 2: to simulate from chi-squared(n), simulate a set of n standard Normal variables and sum their squares.
This is used for simulating from difficult densities when all else has failed.
Suppose f is the density to simulate from. First find another density h such that
Then the technique is:
The value which is accepted has density f.
On average we need to simulate C pairs of values (y, u) in order to get one accepted, so it is best to make C as small as possible.
Example: to simulate from the density f(x) = cxe-x (0<x<1), where c-1 = 1-e-1. The distribution function F is not readily invertible, but f(x) £ ce-1 for all 0<x<1, so we can take h as the uniform density, with C = ce-1.
The inverse distribution function method can be seen as picking a point
(x, y) at random
from under the graph of y = f(x) and returning its
x-coordinate.
The acceptance-rejection method picks a point (x, y) at random from under the graph of
y = Ch(x) and then either returns its
x-coordinate if it lies under the graph of
y = f(x), or rejects it if not.
Although in general one has to be careful when inverting discontinuous functions such as the distribution function of a discrete random variable, this method works fine.
Example: simulate an observation from a distribution with probability function p(0) = 0.4, p(1) = 0.25, p(2) = 0.2, p(0) = 0.15.
We calculate the distribution function, simulate a single U[0,1] random variable U and use a simple rule to generate X.
| x | p(x) | F(x) | Rule |
|---|---|---|---|
| 0 | 0.40 | 0.40 | IF U<0.40 THEN X=0 |
| 1 | 0.25 | 0.65 | ELSE IF U<0.65 THEN X=1 |
| 2 | 0.20 | 0.85 | ELSE IF U<0.85 THEN X=2 |
| 3 | 0.15 | 1.00 | ELSE X=3 |
For specific distributions there may be simpler methods of generating pseudo-random variables, but these depend on the properties of the distribution concerned.
Example 1: to simulate a Binomial(3, 0.6) r.v. we can simulate 3 U[0,1] r.v.s and count how many of them are less than 0.6. This has the advantage that it is simple to program.
Example 2: to simulate a Poisson(5) r.v. we can simulate a Poisson process and stop it at time 5. This involves simulating a sequence of expo(1) variables, stopping when the sum exceeds 5.
Ease of programming is the only advantage to using distribution-specific methods. They are generally much less efficient than the inverse distribution function method if many simulated values are required.