This is another guest post by Dan, who has been very productive over the last week.
We’ve all heard that if you had an infinite number of monkeys plugging away on an infinite number of typewriters for an infinite amount of time then eventually those monkeys would produce all the works of Shakespeare. In his “paper” Mathis sets his sights on disproving this statement, and fails miserably. This is fundamentally a problem of statistics, yet nowhere does Mathis ever use a single statistical argument.
Mathis asserts that since it is possible for a monkey to type an infinite string of “S”s, there is a non-zero probability that the monkey will never produce a work of Shakespeare. It is true that such an outcome is possible, but it occurs with probability zero. Mathis quibbles over the statement of the monkey theorem as given in layman’s terms; had he taken the time to find the precise formal statement of the theorem, he would see that his argument is flawed. There are a few ways this theorem can be stated; I will use a formulation that does not appeal to the notion of infinitely many monkeys. To make this formulation precise, a few simplifying assumptions are made that in no way change the problem, and the process is clearly defined.
Instead of all Shakespeare’s works, let us consider only Macbeth. Let us assume that reproducing Macbeth requires n keystrokes, and that the typewriter being used has m keys. I stress that not knowing the exact numbers in no way changes the results being presented.
The experiment will be conducted as follows. One monkey will randomly press one key at a time on the typewriter. Once the monkey has made n keystrokes we will examine the document produced. If the document matches Macbeth, we will call it a success. If it is not a success, the monkey will be given another try. The experiment is complete once the monkey produces a success.
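The experiment above is easy to simulate. Here is a minimal sketch in Python; the target string, key set, and function names are my own illustrative choices, and the toy numbers are tiny so the simulation actually finishes:

```python
import random

def monkey_trial(target, keys, rng):
    """One try: the monkey makes len(target) random keystrokes."""
    typed = "".join(rng.choice(keys) for _ in range(len(target)))
    return typed == target

def tries_until_success(target, keys, seed=0):
    """Repeat tries until the document matches the target; return the count."""
    rng = random.Random(seed)
    tries = 1
    while not monkey_trial(target, keys, rng):
        tries += 1
    return tries

# A toy instance: a 2-character "play" on a 3-key typewriter (m = 3, n = 2),
# so each try succeeds with probability (1/3)^2 = 1/9.
print(tries_until_success("ab", "abc"))
```

The same code would in principle work with the real Macbeth and a 50-key typewriter; it would just never finish in any reasonable time, which is exactly the practical point made below.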
Statistical Analysis & Predictions
We are interested in predicting how many tries the monkey will need until he reproduces Macbeth. Let P(S) be the probability the monkey succeeds on a given try. In order to produce Macbeth the monkey must press n characters in the perfect order. Since there are m possible keys and only one of them is right for the jth character, the probability the monkey presses the right key for the jth character is 1/m. Since he must press n keys in total, and by assumption the keystrokes are statistically independent, the probability of success is given by:

P(S) = (1/m)^n = 1/m^n
Let X be the random variable representing the number of trials until the monkey succeeds. Then P(X = k) is the probability that the monkey fails the first (k - 1) tries and succeeds on the kth try. The probability of failure on a given try is one minus the probability of success. Since each try is statistically independent, we multiply the probabilities of the individual events to find the probability of the compound event. Thus we have the following:

P(X = k) = [1 - P(S)]^(k - 1) P(S)
This is an example of what is known as a geometric random variable. What we are interested in is the expected value of X. If you are unfamiliar with the language of statistics, the expected value of X is the number of trials the monkey needs to succeed on average, were we to repeat the experiment over and over.
Expected Value of a Discrete Random Variable:
If X is a discrete random variable with probability function P(X), then the expected value of X is given by:

E(X) = Σ k P(X = k), summed over all k
Fortunately since X is a geometric random variable there is a simple formula that gives the expected value of X.
Expected Value of a Geometric Random Variable:
If X is a geometric random variable with probability P(S) of success on any given trial, then the expected value of X is given by:

E(X) = 1/P(S)
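The shortcut formula can be checked numerically against the definition of expected value: summing k P(X = k) for a geometric random variable should recover 1/P(S). The value P(S) = 1/9 below is just an illustrative choice, and the sum is truncated at a point where the remaining terms are negligible:

```python
# Check that the definition-based sum k * P(X = k) matches 1/P(S)
# for a geometric random variable, using an illustrative P(S) = 1/9.
p = 1 / 9
expected = sum(k * (1 - p) ** (k - 1) * p for k in range(1, 100_000))
print(expected, 1 / p)  # the truncated sum is very close to 9
```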
Now in our case we know P(S), and upon substituting the known value we have:

E(X) = 1/P(S) = 1/(1/m^n) = m^n
Now m^n is admittedly a very large number in this situation, but it is a finite number. This means that if we were to repeat this experiment over and over, then on average the monkey would need m^n tries before he successfully typed Macbeth. Note that if we assume Macbeth has 100 000 characters and the typewriter has 50 keys, this gives us 50^(100 000) tries on average. This is a staggeringly large number and shows that the undertaking is not practical, but practical or not, the result is valid.
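The prediction E(X) = m^n can also be checked by simulation with toy numbers (Macbeth itself is hopeless, for the reason just given). The parameters below, m = 4 keys and n = 3 keystrokes, are my own illustrative choices, chosen so m^n = 64 and the simulation finishes quickly:

```python
import random

def tries_until_success(n, m, rng):
    """Number of tries until all n random keystrokes on an m-key typewriter
    land on the single correct key; each try succeeds with probability (1/m)^n."""
    tries = 1
    while not all(rng.randrange(m) == 0 for _ in range(n)):
        tries += 1
    return tries

# Toy numbers: m = 4 keys, n = 3 keystrokes, so theory predicts
# an average of m^n = 64 tries.
rng = random.Random(42)
samples = [tries_until_success(3, 4, rng) for _ in range(5000)]
print(sum(samples) / len(samples))  # should be close to 64
```

With 5000 repetitions the sample mean lands close to 64, as the formula predicts; the scatter around it is just the (large) variance of a geometric random variable.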
A Second Analysis
Above we calculated the probability of a success on a given try to be P(S). Now instead of calculating the expected number of tries before success, we can look at it a different way and ask what is the probability the monkey eventually succeeds?
What exactly does it mean for the monkey to eventually succeed? I know this sounds like a pedantic question, but it is useful in properly formulating the problem. It means that on some attempt, say the jth one, he finally reproduces Macbeth, having failed on all previous attempts. So to find the probability that he eventually succeeds, we add the probabilities of success on each try: the probability he succeeds on the first try, plus the second try, plus the third try, and so on. To succeed on the jth try means he fails on the first (j - 1) tries and then finally gets it right on the jth. Since each attempt is assumed statistically independent, we simply multiply the probability of failing (j - 1) times by the probability of succeeding on the jth try.
So now we add these probabilities up.
P(eventual success) = P(success on 1st) + P(success on 2nd) + P(success on 3rd) + …

P(eventual success) = P(S) + [1 - P(S)] P(S) + [1 - P(S)]^2 P(S) + …
This is a geometric series with first term P(S) and ratio [1 – P(S)]. Since the common ratio [1 – P(S)] is less than one, we can apply a simple formula to calculate this sum.
Sum of Geometric Series:

a + ar + ar^2 + ar^3 + … = a/(1 - r), provided |r| < 1
We now apply this formula to our series:

P(eventual success) = P(S)/(1 - [1 - P(S)]) = P(S)/P(S) = 1
This tells us that as we give the monkey more and more tries, as the number of tries grows without bound, the probability of a success goes to 1. In the language of statistics this means the success occurs with probability 1, or the event occurs almost surely. Simply put, given enough tries the monkey will almost surely reproduce Macbeth.
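The convergence is easy to see numerically. The probability of at least one success within the first k tries is 1 - [1 - P(S)]^k, which climbs toward 1 as k grows; the value P(S) = 1/9 below is again just an illustrative choice:

```python
# Probability of at least one success within k tries: 1 - (1 - P(S))^k.
# With an illustrative P(S) = 1/9, watch it climb toward 1 as k grows.
p = 1 / 9
for k in (1, 10, 100, 1000):
    print(k, 1 - (1 - p) ** k)
```

With the real Macbeth numbers, P(S) = 1/50^(100 000), the climb is unimaginably slow, but the limit is still 1.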
Mathis has accomplished a few things with this “paper”. First, he has provided a great example illustrating the need for theorems to be stated in precise mathematical language. Everyday language is far too cumbersome to properly attack a problem, and the advantages of equations are clear. When a mathematical problem is formulated in everyday language there is a possibility of it being interpreted incorrectly, as Mathis did when he confused the word “must” with the notion of “with probability 1”. Subtle as the difference is, it completely changes the analysis and final conclusion. Equations give us a concrete representation with very precise meaning: a clear way to model the problem and move forward solving it. Second, he has shown that no amount of folksy wisdom or “common sense” can ever be a suitable replacement for true mathematical or statistical analysis of a problem. There is a very good reason why mathematical statements are made precise; among other things, it helps minimize the possibility of what you think should be true interfering with what really is true.
Mathis has clearly demonstrated his lack of understanding of the most basic concepts of probability. I once again suggest he spend the money and buy a decent introductory statistics text, An Introduction to Mathematical Statistics and Its Applications (3rd Edition) by Larson and Marx might do. It is obvious that Mathis has a deep interest in mathematics and physics, so what I don’t understand is why he refuses to learn any of it properly.