Best Laid Plans

http://commons.wikimedia.org/wiki/File:Leonid_Pasternak_001.jpg

Suppose you hire two proofreaders to go through the same manuscript independently. The first reports A mistakes, the second reports B mistakes, and C mistakes are reported by both. How can you estimate how many errors remain undiscovered?

Let M be the total number of mistakes in the manuscript. Then the number undiscovered by the two proofreaders is M – (A + B – C). Let p and q be the probabilities that the first and second proofreaders, respectively, notice any given mistake. Then A ≈ pM and B ≈ qM. And because they work independently, the chance that they both find a given mistake is pq, so C ≈ pqM.

But now

\displaystyle M = \frac{pM \times qM}{pqM} \approx \frac{AB}{C},

and the number of misprints that remain unnoticed is just

\displaystyle M - (A + B - C) \approx \frac{AB}{C} - (A + B - C) = \frac{(A-C)(B-C)}{C}.
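The two estimates above can be sketched in a few lines of Python. The function names and the sample counts (30, 24, and 20) are illustrative assumptions, not figures from Pólya's note, and the sketch simply refuses to estimate when the two lists share no mistakes, since the formula then divides by zero.

```python
def estimate_total(a: int, b: int, c: int) -> float:
    """Estimate the total number of mistakes, M ≈ AB / C.

    a, b: mistakes reported by the first and second proofreader.
    c:    mistakes reported by both.
    """
    if c <= 0:
        raise ValueError("no estimate without at least one shared mistake")
    if c > min(a, b):
        raise ValueError("the overlap cannot exceed either reader's count")
    return a * b / c


def estimate_undiscovered(a: int, b: int, c: int) -> float:
    """Estimate the mistakes missed by both readers, (A - C)(B - C) / C."""
    if c <= 0:
        raise ValueError("no estimate without at least one shared mistake")
    if c > min(a, b):
        raise ValueError("the overlap cannot exceed either reader's count")
    return (a - c) * (b - c) / c


# Hypothetical counts: 30 and 24 mistakes reported, 20 of them in common.
total = estimate_total(30, 24, 20)
missed = estimate_undiscovered(30, 24, 20)
print(total, missed)
```

With these hypothetical counts the readers have found 30 + 24 – 20 = 34 distinct mistakes, and the estimated total is 30 × 24 / 20 = 36, which leaves about 2 still undiscovered.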

This means that as long as the proofreaders work independently, you can estimate the number of errors they’ve overlooked without even knowing how skillful they are. If they find a large number of mistakes in common but relatively few independently, then the manuscript is probably relatively clean. But if they generate large independent lists of errors with few in common, there are probably many mistakes remaining to be found (which matches our intuition).
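One way to see that the estimate does not depend on knowing p and q is a small simulation. This is only a sketch under assumed values: the manuscript size, the detection probabilities, and the function name simulate are all made up for illustration, and each mistake is assumed to be caught independently by each reader.

```python
import random


def simulate(total: int, p: float, q: float, seed: int = 0):
    """Simulate two independent proofreaders on `total` mistakes.

    Returns (actual_missed, estimated_missed). The estimate uses only
    the counts A, B, C, never p or q.
    """
    rng = random.Random(seed)
    a = b = c = 0
    for _ in range(total):
        first = rng.random() < p   # first reader notices this mistake
        second = rng.random() < q  # second reader notices it, independently
        a += first
        b += second
        c += first and second
    actual_missed = total - (a + b - c)
    # Raises ZeroDivisionError if the readers share no mistakes (c == 0).
    estimated_missed = (a - c) * (b - c) / c
    return actual_missed, estimated_missed


# Assumed scenarios: two careful readers, then two careless ones.
for p, q in [(0.9, 0.8), (0.5, 0.4)]:
    actual, estimated = simulate(10_000, p, q, seed=1)
    print(f"p={p}, q={q}: actually missed {actual}, estimated {estimated:.1f}")
```

In both scenarios the estimate should land near the true count of missed mistakes, though it is noisier when the readers are less skillful and the overlap C is smaller.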

(George Pólya, “Probabilities in Proofreading,” American Mathematical Monthly 83:1 [January 1976], 42.)