Prisoner's Dilemma

Prisoner's Dilemma is a game model in game theory, first formulated by A. W. Tucker in 1950. In its original form, it involves two prisoners who have been arrested on suspicion of a crime (say, a big burglary) and separated. The police don't have enough evidence for a conviction, so they encourage each prisoner to turn state's evidence against the other. This results in the following possibilities (assume that both prisoners are aware of these possibilities):

- If one prisoner confesses and the other stays silent, the confessor goes free and the silent prisoner receives an eight-year sentence.
- If both prisoners confess, each receives a five-year sentence.
- If both prisoners stay silent, the police can convict them only of a lesser charge, and each receives a much lighter sentence.

Now, the dilemma is this: The first prisoner, when deciding what to do, might think: "Well, if the other guy stays silent, then my best bet is to confess, since I'll go free. If he confesses, then my best bet is also to confess, since I'll get five years instead of eight. Since my best bet is to confess no matter what the other guy does, I'll confess." The second prisoner has the same thoughts, and so both confess and receive five-year sentences. However, if both had kept silent, they would both have done a lot better; yet without knowing what the other will do, the only rational choice is to confess so as not to be double-crossed. Even allowing the two prisoners to talk with each other beforehand wouldn't solve the problem; even if they agreed to cooperate, they would then be faced with the dilemma of whether to keep their promise or break it.

People soon realized that Prisoner's Dilemma can be applied to many real-life scenarios other than deciding whether or not to talk to the police. For example:

- Two countries involved in a dispute can each choose to negotiate (cooperate) or to escalate (defect).
- Two competing stores can each keep their prices up (cooperate) or cut prices to undercut the other (defect).
- A buyer and a seller can each deal honestly (cooperate) or try to cheat the other (defect).

To analyze the game, it helps to generalize it. In the general game there are two players, each of whom has two choices: To "cooperate" or to "defect." Each player moves simultaneously and so has no knowledge of the opponent's move. These choices can be represented in a payoff matrix, as shown below. Note that, in the matrix below, (x, y) means that the first player's payoff is x and the second player's payoff is y.
                           Second Player
                        Cooperate    Defect
First Player  Cooperate    (3, 3)    (0, 5)
              Defect       (5, 0)    (1, 1)

If both players cooperate, they both get 3 points. If one cooperates and one defects, the player defecting gets 5 points and the one cooperating gets 0. If both defect, they each get 1 point. Note that defecting dominates cooperating. No matter what your opponent does, it's always better to defect. However, again, if both players cooperate, they do better than if both defect.
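
To see this concretely, here is a minimal Python sketch (the dictionary layout and names are just illustrative) that encodes the payoff matrix above and checks that defecting always pays more than cooperating against either opponent move, even though mutual cooperation beats mutual defection:

    # Each entry gives the first player's payoff for (my move, opponent's move).
    PAYOFF = {
        ("C", "C"): 3,  # both cooperate
        ("C", "D"): 0,  # I cooperate, opponent defects
        ("D", "C"): 5,  # I defect, opponent cooperates
        ("D", "D"): 1,  # both defect
    }

    # Defecting dominates cooperating: whatever the opponent plays,
    # defecting earns strictly more than cooperating.
    for opp_move in ("C", "D"):
        assert PAYOFF[("D", opp_move)] > PAYOFF[("C", opp_move)]

    # Yet both players do better under mutual cooperation than under mutual defection.
    assert PAYOFF[("C", "C")] > PAYOFF[("D", "D")]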

We could further generalize the game as follows:
                           Second Player
                        Cooperate    Defect
First Player  Cooperate    (R, R)    (S, T)
              Defect       (T, S)    (P, P)

Where:

- R is the "reward" payoff each player receives when both cooperate,
- T is the "temptation" payoff received by a player who defects while the other cooperates,
- S is the "sucker" payoff received by a player who cooperates while the other defects, and
- P is the "punishment" payoff each player receives when both defect.

A game is called a Prisoner's Dilemma when T > R > P > S (i.e. the temptation payoff is largest, followed by the reward for mutual cooperation, followed by the punishment for mutual defection, followed by the sucker payoff) and when the total number of points scored by mutually cooperating is greater than the total number of points scored when one player defects and the other cooperates; in other words, 2R > T + S, or equivalently R > ½(T + S).
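
Both conditions are easy to check mechanically. Here is a small Python sketch (the function name is just for illustration) that tests whether a set of payoffs forms a Prisoner's Dilemma, using the values T = 5, R = 3, P = 1, S = 0 from the matrix above:

    def is_prisoners_dilemma(T, R, P, S):
        """True if the payoffs satisfy T > R > P > S and 2R > T + S."""
        return T > R > P > S and 2 * R > T + S

    print(is_prisoners_dilemma(T=5, R=3, P=1, S=0))  # True: the standard values above
    print(is_prisoners_dilemma(T=7, R=3, P=1, S=0))  # False: 2R = 6 is not greater than T + S = 7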

Iterated Prisoner's Dilemma

What if the two players play more than once? This would be closer to the real-life applications of the game: the two countries will have other disputes in the future, the stores can change their prices again if they aren't happy with their sales or profits, and the buyer and the seller may want to transact other business in the future. In a one-move game it's always best to defect, but in an iterated game players may have an incentive to cooperate in the hope that their opponent will see the benefit of cooperating in the long run. This scenario was explored in depth in the 1980s by Robert Axelrod. There are many different strategies that a player could adopt; here are some simple ones (a code sketch of these strategies follows the list):

All C: Cooperate on every move.
All D: Defect on every move.
Retaliation: Cooperate as long as your opponent cooperates, but once your opponent defects, defect from then on.
Tit for Tat: Cooperate on the first move. After that, do whatever the opponent did on the previous move.
Random: Select whether to cooperate or defect at random.
Alternate: Cooperate on even-numbered moves; defect on odd-numbered moves.
Fraction: Cooperate on your first move. After that, cooperate if your opponent has cooperated at least 50% of the time, and defect otherwise.
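
Here is one way the strategies above could be written in Python. It is only a sketch: each strategy is a function taking the two players' histories of moves so far (lists of "C" and "D") and returning this turn's move, an interface chosen here for illustration rather than anything from Axelrod's tournaments.

    import random

    def all_c(my_history, opp_history):
        """Cooperate on every move."""
        return "C"

    def all_d(my_history, opp_history):
        """Defect on every move."""
        return "D"

    def retaliation(my_history, opp_history):
        """Cooperate until the opponent defects once; defect from then on."""
        return "D" if "D" in opp_history else "C"

    def tit_for_tat(my_history, opp_history):
        """Cooperate first, then copy the opponent's previous move."""
        return "C" if not opp_history else opp_history[-1]

    def random_move(my_history, opp_history):
        """Cooperate or defect at random."""
        return random.choice(["C", "D"])

    def alternate(my_history, opp_history):
        """Cooperate on even-numbered moves, defect on odd-numbered moves (moves numbered from 1)."""
        return "C" if (len(my_history) + 1) % 2 == 0 else "D"

    def fraction(my_history, opp_history):
        """Cooperate first; then cooperate if the opponent has cooperated at least 50% of the time."""
        if not opp_history:
            return "C"
        return "C" if opp_history.count("C") / len(opp_history) >= 0.5 else "D"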

If your goal is to maximize your score, which strategy should you pick? It's pretty clear that there is no single strategy that does better than every other strategy. If you always defect, no opponent will ever outscore you; however, playing a cooperative strategy may earn you more points in the long run, even if some individual opponents score more against you than you score against them.

Axelrod held two Prisoner's Dilemma tournaments, with the entrants being computer programs that implemented strategies such as the ones above. Each program played each other program in a five-game match of 200 moves per game. The scores for each match were added together, with the winner being the entrant with the highest total score. In the first tournament there were 15 entrants. The winner was Tit for Tat, a four-line BASIC program submitted by Anatol Rapoport, formerly a professor at the University of Toronto who had previously written about the iterated Prisoner's Dilemma. The second tournament featured 62 entrants. Even though entrants knew the results of the first tournament and were gunning for Tit for Tat, Tit for Tat won again.
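
A much-reduced sketch of such a round-robin might look like the following Python. It reuses the payoff values from the matrix above, defines only three of the strategies for brevity, and plays a single 200-move game per ordered pairing rather than Axelrod's five-game matches; the names and scoring loop are illustrative, not Axelrod's actual setup.

    # Payoffs from the matrix above: (first player's score, second player's score).
    PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
              ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

    def tit_for_tat(my_hist, opp_hist):
        return "C" if not opp_hist else opp_hist[-1]

    def all_d(my_hist, opp_hist):
        return "D"

    def all_c(my_hist, opp_hist):
        return "C"

    def play_game(strat_a, strat_b, moves=200):
        """Play one game and return the two players' scores."""
        hist_a, hist_b = [], []
        score_a = score_b = 0
        for _ in range(moves):
            move_a = strat_a(hist_a, hist_b)
            move_b = strat_b(hist_b, hist_a)
            pay_a, pay_b = PAYOFF[(move_a, move_b)]
            score_a += pay_a
            score_b += pay_b
            hist_a.append(move_a)
            hist_b.append(move_b)
        return score_a, score_b

    # Round-robin: each strategy plays every strategy (including itself);
    # only the first player's score is credited, so each entrant is scored once per opponent.
    strategies = {"Tit for Tat": tit_for_tat, "All D": all_d, "All C": all_c}
    totals = {name: 0 for name in strategies}
    for name_a, strat_a in strategies.items():
        for name_b, strat_b in strategies.items():
            score_a, _ = play_game(strat_a, strat_b)
            totals[name_a] += score_a

    for name, total in sorted(totals.items(), key=lambda item: -item[1]):
        print(name, total)

With only these three entrants, All D actually edges out Tit for Tat (1404 points to 1399), which underlines that Tit for Tat's success in Axelrod's tournaments depended on the mix of strategies it faced.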

While it has not been proven that Tit for Tat is the "best" possible strategy, it performs well against a wide range of strategies. Even though Tit for Tat can never score more against a particular opponent than that opponent scores against it, it does well against "nice" strategies (strategies that are never the first to defect) and cannot be exploited too badly by "nasty" strategies. It is also forgiving: if a defecting opponent returns to cooperation, Tit for Tat does too, allowing both players to return to the higher mutual-cooperation payoff.

Sources used (see bibliography page for titles corresponding to numbers): 52, 53.