The Markov property is about independence, specifically about the future development of a process being independent of anything that has happened in the past, given its present value.
A random process X possesses the Markov property, and is called a Markov Chain, if
P(X_{n+1} = j | X_{n} = i, H_{n}) |
For the simple random walk,
P(X_{n+1} = i+1 | X_{n} = i, X_{n-r} = k) |
is just equal to
P(J_{n+1} = 1 | X_{n} = i, X_{n-r} = k) = p, |
since J_{n+1} is independent of anything that happened previously.
The Markov property also holds for SRW with barriers. But there are many more examples.
Example: Weather. Sunny or Cloudy. Today's weather affects tomorrow's, but yesterday's is already irrelevant to tomorrow's forecast.
The probability P(X_{n+1} = 1 | X_{n} = i) is called a transition probability and we use the notation p_{ij}. In most of the examples we consider, the p_{ij} do not depend on n; such processes are termed time-homogeneous.
Any random process has a fixed set of values which it can take. There could be finitely many or infinitely many; they could be numerical or descriptive (e.g. Sunny). The collection of possible values is called the state space, S, and the possible values are termed states.
The transition probability p_{ij} is defined for all i and j in S, though may be 0 in many cases.
We can assemble the p_{ij} into a matrix P, called the transition matrix of X.
Example: the transition matrix of a SSRW with a reflecting boundary at 0 and an absorbing one at 4 is
0 | 1 | 0 | 0 | 0 |
½ | 0 | ½ | 0 | 0 |
0 | ½ | 0 | ½ | 0 |
0 | 0 | ½ | 0 | ½ |
0 | 0 | 0 | 0 | 1 |
The elements of a transition matrix must lie between 0 and 1. In addition, since each transition must take the chain somewhere, the row sums must be equal to 1.
Knowing today's weather, the transition probabilities give information about tomorrow's. But what of the day after tomorrow?
If today is sunny:
We can say, then, that
P(X_{2} = S | X_{0} = S) = å_{j=S,C} p_{Sj} p_{jS}. |
In general, the Law of Total Probability shows that the same holds for more than two states:
P(X_{2} = j | X_{0} = i) = å_{k} p_{ik} p_{kj}. |
This is the (i,j)th entry of the matrix P^{2}.
P(X_{2} = j | X_{0} = i) is a two-step transition probability; we denote it by p_{ij}^{(2)}. We see that the two-step transition matrix P^{(2)} is equal to P^{2}.
The Chapman-Kolmogorov equations state that
P^{(m+n)} = P^{(m)} P^{(n)}, |
from which we deduce that
P(X_{n} = j | X_{0} = i) = (P^{n})_{ij}. |
In principle we only need to find the powers of P to evaluate the distribution of X_{n} for all n.
Example: in the weather model suppose p_{SS} = 0.6 and p_{CC} = 0.7. Then the 1-step, 2-step and 3-step transition matrices are
0.6 | 0.4 | , | 0.48 | 0.52 | and | 0.444 | 0.556 | |
0.3 | 0.4 | 0.39 | 0.61 | 0.417 | 0.583 |
Within each column the values are getting closer.
Given a k ´ k square matrix P there exist k pairs (l,v), where l is a number and v a vector, such that
Pv = lv. |
We term l an eigenvalue, v an eigenvector of P.
Any vector can be expressed as a linear combination of eigenvectors of P.
The eigenvalues of a transition matrix may be real or complex, but will all lie in or on the unit circle.
Find the eigenvalues of P by solving the equation
det (P - lI) = 0, |
where I is the k ´ k identity matrix.
Then, for each solution l, solve linear equations to calculate the corresponding v.
Notice that the eigenvectors v are only unique up to a scalar multiple, since cv is also an eigenvector for any scalar c.
Eigenvectors represent directions rather than points. When an eigenvector of P is subjected to the linear transformation which P represents, the resulting vector is in the same direction as the original.
Construct a matrix L which has the eigenvalues of P down the major diagonal and zeroes everywhere else. (Such matrices are called diagonal matrices.) Construct also a matrix V by writing the eigenvectors in the columns, in the same order as the corresponding eigenvalues in L. Then
PV = VL and VLV^{-1} = PVV^{-1} = P |
Therefore
P^{n} = VL^{n}V^{-1}. |
L^{n} is diagonal, with entries l_{j}^{n}.
A transition matrix always has one eigenvalue equal to 1, with corresponding eigenvector (1, 1, . . . 1)^{T}. Usually (see later for conditions) all the other eigenvalues will lie strictly inside the unit circle.
If |l_{j}| < 1 then |l_{j}|^{n} ® 0 as n ® ¥. For most transition matrices the limit of L^{n} is a matrix with a single 1 on the diagonal, making the limit of P^{n} easy to find.
Using the Law of Total Probability,
P(X_{n+1} = j) = å_{i} P(X_{n+1} = j | X_{n} = i) P(X_{n} = i). |
Assuming we know that P(X_{n} = j) converges to some limit or other (call it p_{j}) as n ® ¥, we have
p_{j} = å_{i} p_{i} p_{ij} |
In matrix notation,
p^{T} = p^{T} P |
This is usually easy to solve. The solution p is called the equilibrium probability vector of X.
Example: for the weather model with p_{SS} = 0.6, p_{CC} = 0.7, we have
p_{S} = p_{S}p_{SS} + p_{C}p_{CS} = 0.6p_{S} + 0.3p_{C} |
p_{C} = p_{C}p_{CC} + p_{S}p_{SC} = 0.4p_{S} + 0.7p_{C} |
As may be seen, the solution is
p_{S} = 3/7, p_{C} = 4/7. |
Note that the vector p is always scaled so that å_{i} p_{i} = 1, as it is a probability vector.
Exercise: A rat runs around a maze as shown. Each time it changes room it picks an exit from its current room entirely at random, independently of where it has been before. Let X_{n} be the room it occupies after the nth room change. Find the transition matrix of X and the limit of P(X_{n} = A).
Example: take a simple random walk with absorbing barriers at 10 and 0. The walk is sooner or later absorbed, so P(X_{n} = i) ® 0 unless i is one of the absorbing states.
If X_{0} is close to 10, it is more likely that absorption takes place at 10 than at 0, whereas if X_{0} is close to 0 the reverse is true. The limiting value of P(X_{n} = 0) therefore depends on the value of X_{0}, even for large n.
A Markov chain is called irreducible if, starting from any one of the states, it is possible to get to any other state (not necessarily in one jump). If there are any absorbing states, the chain is not irreducible. (The reverse is not true.)
Consider a simple random walk with reflecting barriers above and below. If the walk starts in state 0 it will always be in an even state at even times and an odd state at odd times. In other words, P(X_{n} = j) will not converge; the effect of the starting point does not die away as n®¥.
A state i has period d if, given that X_{0}=i, we can only have X_{n}=i when n is a multiple of d. In the simple random walk example all states have period 2. We call i periodic if it has some period > 1.
If X is irreducible, either all states are periodic or none are.
The transition matrices of periodic Markov chains have eigenvalues on the unit circle.
Example: a gambler plays roulette, staking £1 each turn. Each turn either the £1 is lost or £35 is won. X_{n} is the amount of money the gambler has after the nth spin. Then any state (amount of money) is periodic with period 36, as can be checked by considering X_{n} (mod 36).
Suppose X is a Markov chain with only finitely many possible states. If
then X is ergodic. We have:
Theorem (Ergodic Theorem) If the discrete-time Markov chain X is ergodic, with transition matrix P, then there is exactly one probability vector p which satisfies
p^{T} P = p^{T} |
P(X_{n} = j | X_{0} = i) ® p_{j}. |
(The proof is too advanced for this course.)
The fact that p^{T}P = p^{T} means that p is a stationary distribution for X: if X_{0} is not fixed but random, and P(X_{0} = i) = p_{i} for each i, then P(X_{1} = i) = p_{i} for all i, and similarly for X_{2}, X_{3}, etc.
If we can find any probability vector p which satisfies the equation, we know it is the right answer. For example, if we can find p such that
p_{i} p_{ij} = p_{j} p_{ji} |
Example: frog on lilypads.
The states of any Markov chain can be grouped together. We say that any two states i and j belong to the same communicating class if it is possible, starting from i, to get to j and, starting from j, to get back to i.
A communicating class is classified as recurrent if, having entered the class, the Markov chain can never leave it, transient otherwise. (Note: this definition is only valid when there are finitely many states. If there are infinitely many the situation can be more complicated.)
Each absorbing state forms a c.c. on its own.
When the state space is finite the chain must eventually enter a recurrent c.c., which it can then never leave, so we can apply the Ergodic Theorem to obtain limiting probabilities.
Example: children's playground.
We need to decide in advance whether the situation being modelled is suited to a Markov model and whether the model should be time-homogeneous.
Examples
But when new entrants keep the age profile stable we can aggregate individuals so that age-dependent rates are smoothed out.
Once we accept that a MC model is appropriate, it is easy to fit. Having observed x_{t}, t=1, 2, ..., n we define n_{i} as the number of times the chain was in state i, n_{ij} as the observed number of transitions from i to j, then use n_{ij}/n_{i} as the estimate of p_{ij}.
Testing goodness of fit:
Both are hard to test objectively.
Quite easy in Excel using lookup tables. See second lab sheet.