Markov Chains in discrete time

4.1 The Markov Property

The Markov property is about independence, specifically about the future development of a process being independent of anything that has happened in the past, given its present value.

A random process X possesses the Markov property, and is called a Markov Chain, if

P(Xn+1 = j | Xn = i, Hn)
depends only on i and j, not on any past values.

For the simple random walk,

P(Xn+1 = i+1 | Xn = i, Xn-r = k)

is just equal to

P(Jn+1 = 1 | Xn = i, Xn-r = k) = p,

since Jn+1 is independent of anything that happened previously.

The Markov property also holds for SRW with barriers. But there are many more examples.

Example: Weather. Sunny or Cloudy. Today's weather affects tomorrow's, but yesterday's is already irrelevant to tomorrow's forecast.

The probability P(Xn+1 = 1 | Xn = i) is called a transition probability and we use the notation pij. In most of the examples we consider, the pij do not depend on n; such processes are termed time-homogeneous.

4.2 The transition matrix

Any random process has a fixed set of values which it can take. There could be finitely many or infinitely many; they could be numerical or descriptive (e.g. Sunny). The collection of possible values is called the state space, S, and the possible values are termed states.

The transition probability pij is defined for all i and j in S, though may be 0 in many cases.

We can assemble the pij into a matrix P, called the transition matrix of X.

Example: the transition matrix of a SSRW with a reflecting boundary at 0 and an absorbing one at 4 is

½ ½ 
½ ½ 
½ ½ 

The elements of a transition matrix must lie between 0 and 1. In addition, since each transition must take the chain somewhere, the row sums must be equal to 1.

4.3 Multi-step transitions

Knowing today's weather, the transition probabilities give information about tomorrow's. But what of the day after tomorrow?

If today is sunny:

We can say, then, that

P(X2 = S | X0 = S) = åj=S,C pSj pjS.

In general, the Law of Total Probability shows that the same holds for more than two states:

P(X2 = j | X0 = i) = åk pik pkj.

This is the (i,j)th entry of the matrix P2.

P(X2 = j | X0 = i) is a two-step transition probability; we denote it by pij(2). We see that the two-step transition matrix P(2) is equal to P2.

The Chapman-Kolmogorov equations state that

P(m+n) = P(m) P(n),

from which we deduce that

P(Xn = j | X0 = i) = (Pn)ij.

In principle we only need to find the powers of P to evaluate the distribution of Xn for all n.

Example: in the weather model suppose pSS = 0.6 and pCC = 0.7. Then the 1-step, 2-step and 3-step transition matrices are

0.60.4,  0.480.52 and 0.4440.556
0.30.4 0.390.61 0.4170.583

Within each column the values are getting closer.

4.4 Matrix algebra

Given a k ´ k square matrix P there exist k pairs (l,v), where l is a number and v a vector, such that

Pv = lv.

We term l an eigenvalue, v an eigenvector of P.

Any vector can be expressed as a linear combination of eigenvectors of P.

The eigenvalues of a transition matrix may be real or complex, but will all lie in or on the unit circle.

Find the eigenvalues of P by solving the equation

det (P - lI) = 0,

where I is the k ´ k identity matrix.

Then, for each solution l, solve linear equations to calculate the corresponding v.

Notice that the eigenvectors v are only unique up to a scalar multiple, since cv is also an eigenvector for any scalar c.

Eigenvectors represent directions rather than points. When an eigenvector of P is subjected to the linear transformation which P represents, the resulting vector is in the same direction as the original.

Construct a matrix L which has the eigenvalues of P down the major diagonal and zeroes everywhere else. (Such matrices are called diagonal matrices.) Construct also a matrix V by writing the eigenvectors in the columns, in the same order as the corresponding eigenvalues in L. Then

PV = VL and VLV-1 = PVV-1 = P


Pn = VLnV-1.

Ln is diagonal, with entries ljn.

A transition matrix always has one eigenvalue equal to 1, with corresponding eigenvector (1, 1, . . . 1)T. Usually (see later for conditions) all the other eigenvalues will lie strictly inside the unit circle.

If |lj| < 1 then |lj|n ® 0 as n ® ¥. For most transition matrices the limit of Ln is a matrix with a single 1 on the diagonal, making the limit of Pn easy to find.

4.5 Equilibrium

Luckily we only need the matrix algebra (a) to show that a limit exists, (b) to find P(Xn = j | X0 = i) for all n. If all we need to do is find the limiting probabilities, there is a quicker way.

Using the Law of Total Probability,

P(Xn+1 = j) = åi P(Xn+1 = j | Xn = i) P(Xn = i).

Assuming we know that P(Xn = j) converges to some limit or other (call it pj) as n ® ¥, we have

pj = åi pi pij

In matrix notation,

pT = pT P

This is usually easy to solve. The solution p is called the equilibrium probability vector of X.

Example: for the weather model with pSS = 0.6, pCC = 0.7, we have

pS = pSpSS + pCpCS = 0.6pS + 0.3pC
pC = pCpCC + pSpSC = 0.4pS + 0.7pC

As may be seen, the solution is

pS = 3/7, pC = 4/7.

Note that the vector p is always scaled so that åi pi = 1, as it is a probability vector.

Picture of maze Exercise: A rat runs around a maze as shown. Each time it changes room it picks an exit from its current room entirely at random, independently of where it has been before. Let Xn be the room it occupies after the nth room change. Find the transition matrix of X and the limit of P(Xn = A).

4.6 Irreducibility

Example: take a simple random walk with absorbing barriers at 10 and 0. The walk is sooner or later absorbed, so P(Xn = i) ® 0 unless i is one of the absorbing states.

If X0 is close to 10, it is more likely that absorption takes place at 10 than at 0, whereas if X0 is close to 0 the reverse is true. The limiting value of P(Xn = 0) therefore depends on the value of X0, even for large n.

A Markov chain is called irreducible if, starting from any one of the states, it is possible to get to any other state (not necessarily in one jump). If there are any absorbing states, the chain is not irreducible. (The reverse is not true.)

4.7 Periodicity

Consider a simple random walk with reflecting barriers above and below. If the walk starts in state 0 it will always be in an even state at even times and an odd state at odd times. In other words, P(Xn = j) will not converge; the effect of the starting point does not die away as n®¥.

A state i has period d if, given that X0=i, we can only have Xn=i when n is a multiple of d. In the simple random walk example all states have period 2. We call i periodic if it has some period > 1.

If X is irreducible, either all states are periodic or none are.

The transition matrices of periodic Markov chains have eigenvalues on the unit circle.

Example: a gambler plays roulette, staking £1 each turn. Each turn either the £1 is lost or £35 is won. Xn is the amount of money the gambler has after the nth spin. Then any state (amount of money) is periodic with period 36, as can be checked by considering Xn (mod 36).

4.8 Ergodicity

Suppose X is a Markov chain with only finitely many possible states. If

then X is ergodic. We have:

Theorem (Ergodic Theorem) If the discrete-time Markov chain X is ergodic, with transition matrix P, then there is exactly one probability vector p which satisfies

pT P = pT
In addition, for each i and j,
P(Xn = j | X0 = i) ® pj.

(The proof is too advanced for this course.)

The fact that pTP = pT means that p is a stationary distribution for X: if X0 is not fixed but random, and P(X0 = i) = pi for each i, then P(X1 = i) = pi for all i, and similarly for X2, X3, etc.

If we can find any probability vector p which satisfies the equation, we know it is the right answer. For example, if we can find p such that

pi pij = pj pji
for all i and j, then p is the required solution. But note that the reverse is not true: there are many cases in which the equilibrium distribution p does not satisfy these equations.

Example: frog on lilypads.

4.9 Reducible Markov Chains

The states of any Markov chain can be grouped together. We say that any two states i and j belong to the same communicating class if it is possible, starting from i, to get to j and, starting from j, to get back to i.

A communicating class is classified as recurrent if, having entered the class, the Markov chain can never leave it, transient otherwise. (Note: this definition is only valid when there are finitely many states. If there are infinitely many the situation can be more complicated.)

Each absorbing state forms a c.c. on its own.

When the state space is finite the chain must eventually enter a recurrent c.c., which it can then never leave, so we can apply the Ergodic Theorem to obtain limiting probabilities.

Example: children's playground.

4.10 Modelling with Markov Chains

We need to decide in advance whether the situation being modelled is suited to a Markov model and whether the model should be time-homogeneous.


But when new entrants keep the age profile stable we can aggregate individuals so that age-dependent rates are smoothed out.

Once we accept that a MC model is appropriate, it is easy to fit. Having observed xt, t=1, 2, ..., n we define ni as the number of times the chain was in state i, nij as the observed number of transitions from i to j, then use nij/ni as the estimate of pij.

Testing goodness of fit:

Both are hard to test objectively.

4.11 Simulating Markov Chains

Quite easy in Excel using lookup tables. See second lab sheet.

Previous Index Next
This page is maintained by Russell Gerrard: