In the Prisoners’ Dilemma we saw that the Nash equilibrium occurs when both players confess, even though they would be better off if they both denied.

One can ask if there is any way that cooperation in such games can be encouraged, short of coercion by some third party (such as the state).

Similarly, when two parties take part in a trade, is there any rational reason for them to trust each other (independent of some ethical belief system that they may have)?

One possible resolution of these problems is to appeal to the fact that in real life we often interact repeatedly with the same people, and so we may believe that our current behaviour will be punished (or rewarded) in the future.

For this reason we will now consider repeated games.

Consider first the case where there is a single state, in which a particularly simple single-decision game is played.

This game is called the stage game, and after it has been played, the players find themselves in the same situation and the stage game is repeated.

When each stage is considered separately, the players should use a Nash equilibrium for that stage game.

But when the game is considered as a whole, there are more possible strategies, as players may condition their response on the past behaviour of the other player, so as to potentially reward or punish them.

For the purposes of this section we will consider the following variant of the Prisoners’ Dilemma game as our stage game:

          A           B
  A   \((3,3)\)   \((0,5)\)
  B   \((5,0)\)   \((1,1)\)

In this game the pure Nash equilibrium is \((B,B)\), although each player would get a greater payoff from the strategy profile \((A,A)\).

We begin by considering the case where this game is repeated a finite number of times.

When giving a player’s strategy we will represent it as a sequence of \(A\)s and \(B\)s in the order that the games are played. So the strategy \(AAB\) means “first play \(A\), then \(A\), then \(B\) in the third stage game”.

First suppose that the stage game is played exactly twice. As before, we proceed by backward induction.

In the final stage we are playing a single version of the original game, so the best response is to play \(B\) regardless of the opponent’s strategy, as \((B,B)\) is the Nash equilibrium. Therefore each player will gain a payoff of \(1\) from the final stage of the game.

Now consider the first stage of the game. As the strategies have been fixed for the final stage, we can add the payoffs from that stage to each payoff in the first stage to create a payoff table for the whole repeated game.

The full pure strategy set for each player is \(\{AA,AB,BA,BB\}\), but as we are only interested in subgame perfect Nash equilibria, the final-stage action must be \(B\), so we only need to consider the following subset of the payoff table.

           AB          BB
  AB   \((4,4)\)   \((1,6)\)
  BB   \((6,1)\)   \((2,2)\)

The Nash equilibrium in this game is \((BB,BB)\) and so the subgame perfect Nash equilibrium for the repeated game is for both players to play \(B\) in each stage.
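
As a quick consistency check, here is a minimal Python sketch (not part of the notes) that reproduces this reduced table by adding the fixed final-stage payoff to each first-stage outcome:

```python
# A minimal sketch: reduce the twice-repeated game by backward induction.
# Both players play B in the final stage (the stage-game Nash equilibrium),
# so each first-stage outcome is augmented by the fixed final-stage payoff.

stage_payoffs = {
    ("A", "A"): (3, 3), ("A", "B"): (0, 5),
    ("B", "A"): (5, 0), ("B", "B"): (1, 1),
}

final = stage_payoffs[("B", "B")]  # (1, 1) from the fixed final stage

for (a1, a2), (p1, p2) in stage_payoffs.items():
    # The repeated-game strategy "xB" plays x in stage 1, then B in stage 2.
    print(f"({a1}B, {a2}B): ({p1 + final[0]}, {p2 + final[1]})")
```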

The same argument using backward induction can be repeated for any finite number of repetitions, so it appears that finitely repeated games have not created any incentive for cooperation.

Players can always play \(B\) in the final stage without penalty from the other player. They therefore cannot induce cooperation by promising to cooperate in a later stage, as there is no penalty for breaking such a promise (in fact, they would gain an advantage by doing so), and both players know this.

This seems rather disappointing; is there any way around it?

The problem occurs because there is a final stage game, and backward induction then propagates this reasoning back through the earlier stages. So we will next consider the case where a stage game is repeated infinitely many times.

This may seem unrealistic, but it can reasonably be used to model a situation where a game will be repeated an unknown number of times, and so it does have potential applications.

After one stage of the game is played, there are still infinitely many stages left, and so all of the subgames of the original game look the same (and are the same as the original game). For this reason we introduce

Definition 5.10

A stationary strategy is one in which the rule for choosing an action is the same at every stage.

Note that although the rule is the same at every stage, that does not mean that the action is always the same.

Example 5.11

Examples of stationary strategies include

  • \(s_A\): Play \(A\) at every stage.
  • \(s_B\): Play \(B\) at every stage.
  • \(s_G\): Play \(A\) if the other player has never played \(B\) and play \(B\) otherwise.

Strategy \(s_G\) is sometimes called the grim strategy, and says that you should cooperate with the other player in playing \(A\) until they defect, and then you should always defect.
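
To make this concrete, here is a minimal Python sketch of the three strategies, each written as a rule mapping the opponent's history of past actions to the next action (the function names simply mirror the notation above):

```python
# Stationary strategies as rules from the opponent's history to an action.
# The rule itself never changes from stage to stage, but the action
# chosen by s_G can change once the opponent defects.

def s_A(opponent_history):
    return "A"  # always cooperate

def s_B(opponent_history):
    return "B"  # always defect

def s_G(opponent_history):
    # Grim: cooperate until the opponent has ever played B, then defect forever.
    return "B" if "B" in opponent_history else "A"
```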

We would like to define the payoff from an infinite game to be the sum of the payoffs of all of the stages. But clearly if both players choose strategy \(s_A\) (or \(s_B\)) this sum will be infinite!

So this naive method will not allow us to compare payoffs.

One solution is to discount future payoffs by a factor of \(\delta\), where \(0<\delta<1\).

This can be interpreted as a player valuing payoffs less the further they are in the future (perhaps because of inflation, or because they are less and less certain that the total game will last that long).

Then the total payoff for player \(i\) will be \[\sum_{j=0}^{\infty} \delta^j(\text{Payoff for player $i$ in the $j$th stage game}).\]

Example 5.12

If both players always cooperate, i.e. both play \(s_A\), then the payoff for each will be \[\sum_{j=0}^{\infty}3\delta^j=\frac{3}{1-\delta},\] while if player 1 plays \(s_A\) and player 2 plays \(s_B\) then the payoff for player 1 will be \(0\), while the payoff for player 2 will be \[\sum_{j=0}^{\infty}5\delta^j=\frac{5}{1-\delta}.\]
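
As a rough numerical sanity check of these closed forms (the value \(\delta=0.9\) is an arbitrary choice), one can truncate the infinite series at a large \(N\):

```python
# A rough numerical check of Example 5.12: truncate the infinite discounted
# sum at N stages and compare with the closed-form geometric series value.
def discounted_total(stage_payoff, delta, N=1000):
    return sum(delta**j * stage_payoff for j in range(N))

delta = 0.9
print(discounted_total(3, delta), 3 / (1 - delta))  # both play s_A: ~30.0
print(discounted_total(5, delta), 5 / (1 - delta))  # player 2 with s_B: ~50.0
```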

We call a repeated game with such a discount factor \(\delta\) a discounted repeated game.

Now that we can compare payoffs for the infinite iterated Prisoners’ Dilemma, we can start to look for new Nash equilibria.

Example 5.13

Consider the grim strategy \(s_G\) introduced in Example 5.11, and assume that both players are restricted to the pure strategy set \(\{s_A, s_B, s_G\}\). Show that \((s_G,s_G)\) is a Nash equilibrium if \(\delta>\frac{1}{2}\).

Example completed by hand in the lecture
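
For reference, a sketch of the argument: if both players use \(s_G\) then neither ever plays \(B\), so each receives \(3\) at every stage, giving a total payoff of \(\frac{3}{1-\delta}\). Deviating to \(s_A\) changes nothing, since \(s_G\) then also plays \(A\) forever. Deviating to \(s_B\) earns \(5\) in the first stage, but thereafter the grim player defects forever, so the deviator receives \(1\) in every later stage, for a total of \[5+\sum_{j=1}^{\infty}\delta^j=5+\frac{\delta}{1-\delta}.\] The deviation is unprofitable exactly when \[\frac{3}{1-\delta}\geq 5+\frac{\delta}{1-\delta},\] which rearranges to \(3\geq 5(1-\delta)+\delta\), i.e. \(\delta\geq\frac{1}{2}\); for \(\delta>\frac{1}{2}\) the inequality is strict, so \((s_G,s_G)\) is a Nash equilibrium.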

The restriction of the allowable strategies in the last example may seem rather unnatural. We will see a way to relax this condition shortly.