PREDICTIVE POWER OF THE POLYGRAPH: CAN THE "LIE DETECTOR" REALLY DETECT LIARS?
by Allan S. Brett, Michael Phillips, John F. Beary III
THE LANCET, MARCH 8, 1986
Departments of Medicine, New England Deaconess Hospital and Harvard Medical School, Boston, Massachusetts; Chicago Medical School, Chicago, Illinois; and Georgetown University School of Medicine, Washington, DC, USA
Expanded use of the polygraph as a detector of lies has been proposed in the United States and the United Kingdom. The positive predictive value of the polygraph (ie, the proportion of positive test results that are true positives) was assessed, on the evidence of the best published data for the sensitivity and specificity of the device. In many screening or investigative situations, the predictive value would be poor; most of the positive results would be false positives. Consequently, truthful persons incriminated as liars by the polygraph would outnumber actual liars with a positive result on the test.
THE polygraph is a psychophysiological recording device employed to detect lies. In the United States it has been used by law enforcement agencies in investigations of criminal suspects, and by Government and private industry in the screening of employees for criminal activity. During the interrogation of a suspected liar, the device records physiological variables that are under autonomic control (heart rate, blood pressure, respiration rate, and galvanic skin response). Proponents of the polygraph assert that a trained examiner can detect a characteristic pattern of responses in this record when the subject is telling a lie. [1]
The polygraph has been a source of controversy in both the United States, where the device has been used extensively for many years, and the United Kingdom, where increased use has lately been proposed. Critics have expressed two major concerns. First, lie detection by the polygraph is based on the unproven assumption that the act of telling a lie is accompanied by a specific and reproducible set of physiological responses. Second, the alleged accuracy of the method is in doubt, since several reviewers of the experimental work have found serious flaws in research design and widely disparate published results of polygraph performance.
Can the polygraph really detect liars? If so, how well does it perform? Physicians practising clinical medicine frequently ask similar questions about diagnostic tests -- eg, how accurately does exercise treadmill testing identify patients with coronary artery disease? We know that a positive exercise stress test in a 25-year-old symptomless woman does not imply the same probability of coronary artery disease as a similar result in a 60-year-old man with exertional chest pain. By the same token, the meaning of a positive polygraph result is dependent upon the population being studied. In both situations, the person interpreting the test result must accept a degree of uncertainty. The findings may reflect false positive or false negative results, and the proportion of these erroneous results may vary among different populations. However, if one knows the performance characteristics of the test (ie, sensitivity and specificity), and the estimated prevalence of the abnormality in the population, one can calculate the probability that a test result confirms or excludes that abnormality. Vecchio  defined this concept as the predictive value of a diagnostic test -- ie, the probability that a person with a positive test result actually has the disease or that a person with a negative test result does not have the disease.
We discuss here an application of these methods to determine the predictive value of polygraph testing. Specifically, we calculated the probability that a person is lying when the test result is positive, or that a person is truthful when the test result is negative. This study was undertaken because in existing publications on polygraph interpretation the concept of predictive value is either ignored or alluded to only briefly.
The analysis consisted of two components. First, we conducted a literature review of empirical studies of polygraph performance to determine sensitivity and specificity of the test. Second, we used Bayes' formula to derive predictive values of the polygraph in several plausible clinical settings.
After a computerised Medline search and a general review of English language publications concerning the polygraph, four basic criteria were used to select studies most likely to have valid and generalisable results (since there was considerable variation in case selection and methodology):
1. Polygraph data were obtained from field investigations of suspected criminals.
2. Truth or falsehood was explicitly stated to have been established by subsequent confession of the guilty party. This criterion ensured a consistent standard of truth against which to compare polygraph interpretations.
3. There was no discernible bias in the selection of records (aside from satisfaction of the first two criteria) or in the assignment of files to evaluators.
4. Evaluators based their interpretations solely on polygraph data.
We identified two studies fulfilling these criteria. In one study, Horvath assigned to ten evaluators the records of 56 suspected criminals, half of whom were verified as guilty and half as innocent by subsequent confessions. These evaluators achieved an average sensitivity of 77% (ie, the probability that a liar has a positive test) and specificity of 51% (ie, the probability that a truth-teller has a negative test). The other study was performed by Kleinmuntz and Szucko. They assigned to six evaluators the polygraph records of 100 suspected thieves, half of whom were subsequently verified by confession as guilty and half as innocent. The evaluators achieved an average sensitivity of 76% (range 64-82%) and a specificity of 63% (range 50-82%). Because the results of these studies are remarkably similar, we used figures of 76% sensitivity and 63% specificity as "average test performance" in subsequent calculations.
To allay criticism that the above studies are not representative of the published work, we also reviewed studies not conforming to our four criteria. For example, we identified four studies performed by examiners using records in their own private polygraph firm. Problematic areas in these studies included non-random selection of files and lack of explicit statements as to how truth or falsehood was independently verified. These examiners achieved an average sensitivity of 87% and specificity of 88%. Although these studies are methodologically flawed, we will use these figures as "extreme test performance" in subsequent calculations, since they are among the highest published figures for accuracy of the polygraph.
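The positive predictive value [PV(pos)] was calculated as follows:
$$\mathrm{PV(pos)} = \frac{\text{offenders with a positive test}}{\text{all persons with a positive test}} = \frac{pa}{pa + (1-p)(1-b)} \qquad (1)$$
where p = prevalence of offenders; a = sensitivity of the test; b = specificity of the test. The negative predictive value [PV(neg)] is analogously defined as true negatives divided by total negatives:
$$\mathrm{PV(neg)} = \frac{(1-p)b}{(1-p)b + p(1-a)} \qquad (2)$$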
The sensitivity and specificity values for "average test performance" were inserted into equation 1 to construct a curve of the PV(pos) as a function of the prevalence of offenders in the population (an offender being defined as a person who commits a crime and subsequently denies it). We also calculated positive and negative predictive values at several specific prevalences of offenders, using figures for both "average" and "extreme" test performance.
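The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' own procedure: the function names `positive_predictive_value` and `negative_predictive_value` are hypothetical, and the formulas are simply Bayes' formula written with p (prevalence), a (sensitivity) and b (specificity) as defined in the text.

```python
def positive_predictive_value(p, a, b):
    """True positives divided by all positives.

    p: prevalence of offenders, a: sensitivity, b: specificity,
    each given as a fraction between 0 and 1.
    """
    true_pos = p * a
    false_pos = (1 - p) * (1 - b)
    return true_pos / (true_pos + false_pos)


def negative_predictive_value(p, a, b):
    """True negatives divided by all negatives (same arguments as above)."""
    true_neg = (1 - p) * b
    false_neg = p * (1 - a)
    return true_neg / (true_neg + false_neg)


# "Average test performance" from the text: sensitivity 76%, specificity 63%.
AVERAGE = (0.76, 0.63)

ppv = positive_predictive_value(0.05, *AVERAGE)
npv = negative_predictive_value(0.05, *AVERAGE)
print(f"PV(pos) at 5% prevalence: {ppv:.1%}")
print(f"PV(neg) at 5% prevalence: {npv:.1%}")
```

Passing the "extreme test performance" pair (0.87, 0.88) instead of `AVERAGE` gives the corresponding values for that case.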
The plot of the positive predictive value of the polygraph (for average test performance) as a function of the prevalence of lying offenders in the population is shown in the figure. The increment in the y-axis between the dashed and solid lines represents the marginal gain contributed by the polygraph over the pre-test probability. For example, when the prevalence of offenders in the population is 5%, the positive predictive value is about 10%. That is, of every 10 positive tests generated in such a population, only 1 is a true positive.
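[Figure: positive predictive value of the polygraph (average test performance) plotted against the prevalence of lying offenders in the population.]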
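The shape of such a curve, and the marginal gain over the pre-test probability, can be tabulated with a short sketch. It assumes the "average test performance" figures from the text (76% sensitivity, 63% specificity); the helper name `ppv_curve` and the 5-point prevalence grid are arbitrary choices made for illustration.

```python
SENSITIVITY, SPECIFICITY = 0.76, 0.63  # "average test performance"


def positive_predictive_value(p, a, b):
    """Bayes' formula: true positives over all positives."""
    return (p * a) / (p * a + (1 - p) * (1 - b))


def ppv_curve(prevalences, a=SENSITIVITY, b=SPECIFICITY):
    """Return (prevalence, PV(pos), gain) rows, where gain = PV(pos) - prevalence."""
    rows = []
    for p in prevalences:
        ppv = positive_predictive_value(p, a, b)
        rows.append((p, ppv, ppv - p))
    return rows


# Prevalences from 5% to 95% in steps of 5 percentage points.
grid = [i / 100 for i in range(5, 100, 5)]
for p, ppv, gain in ppv_curve(grid):
    print(f"prevalence {p:4.0%}  PV(pos) {ppv:5.1%}  gain {gain:+5.1%}")
```

The gain column corresponds to the vertical distance between the pre-test probability and the post-test probability at each prevalence.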
Table I shows the positive predictive value at various prevalences of liars. When the polygraph is used to screen prospective employees for previous theft, a low prevalence of offenders (eg, 5%) might be expected. Whether one uses the average or extreme test performance, the positive predictive value is low: only 10-28% of persons with positive results will actually be liars. Conversely, 72-90% of these persons will be falsely accused of lying. The same figures would apply to a criminal investigation in which only 1 of 20 suspects is likely to be the offender. For criminal investigations in which the pre-test probability of guilt is intermediate (eg, 50%) the average test performance yields a positive predictive value of only 67%. Thus the incremental gain in certainty after the test is only 17%, and 33% of positive results are still false positives. The extreme test performance yields a positive predictive value of 88%, but even here the number of persons incorrectly labelled as liars (12% of positives) is not trivial.
Table I also shows the negative predictive value. When the prevalence of liars is low (as in the employment screening example), a negative result merely corroborates the known pre-test assumption that nearly all subjects are truthful. However, when the prevalence of liars is high (eg, 90%), the negative predictive value is only 23% for average test performance. This means that fully 77% of negative test results are generated by lying subjects.
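The figures quoted in the two paragraphs above can be recomputed with the following sketch. It uses only the sensitivities, specificities and prevalences named in the text (5%, 50% and 90%); it is not a reproduction of the original table, and the names `PERFORMANCE` and `predictive_values` are hypothetical.

```python
PERFORMANCE = {
    # label: (sensitivity, specificity)
    "average": (0.76, 0.63),
    "extreme": (0.87, 0.88),
}
PREVALENCES = [0.05, 0.50, 0.90]  # prevalences of liars mentioned in the text


def predictive_values(p, a, b):
    """Return (PV(pos), PV(neg)) for prevalence p, sensitivity a, specificity b."""
    ppv = (p * a) / (p * a + (1 - p) * (1 - b))
    npv = ((1 - p) * b) / ((1 - p) * b + p * (1 - a))
    return ppv, npv


for label, (a, b) in PERFORMANCE.items():
    for p in PREVALENCES:
        ppv, npv = predictive_values(p, a, b)
        print(f"{label:8s} prevalence {p:4.0%}  PV(pos) {ppv:4.0%}  PV(neg) {npv:4.0%}")
```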
The predictive values can be more simply understood by constructing a 2 x 2 contingency table (see table II). Consider the screening of 1000 prospective employees, of whom 5% have committed previous offences. If we assume that all offenders will lie about their crimes, the number detected by the polygraph will be (number of liars x sensitivity of the test) = 50 x 0.76 = 38. Similarly, if we multiply the number of non-offenders (950) by the specificity of the test (0.63), the polygraph will indicate that 599 are telling the truth. The remaining 351 non-offenders will be read as liars, owing to false positive results. Hence polygraph testing of a random sample of 1000 subjects will yield a total of 389 (38+351) positive test results, of which 38 are true positives and 351 are false positives. Thus, the predictive value of a positive test is 10% (38/389 x 100). The same result is obtained by substitution of the appropriate values for the variables in equation 1.
We have shown that the concept of predictive value should be applied to the polygraph in the same way that predictive value is applied to any diagnostic test. Published figures for the sensitivity and specificity of a test may be misleading when background prevalence of the disease (or, in this case, liars or criminals) is not considered. When the prevalence of a condition in a population is low, large numbers of false positive results "drown out" true positives, and the positive predictive value is poor. One might expect such a situation when employers use the polygraph to screen large numbers of prospective employees. When the prevalence is high, the polygraph result adds little certainty to the estimated probability of lying, while the negative predictive value becomes poor. When the background prevalence of offenders or likelihood of lying is unknown, the polygraph result is essentially uninterpretable.
We recognise that our selection criteria excluded studies that deemed the polygraph to be both more and less accurate than those we cited. For example, specificities in field studies have ranged from 12.5% to 94.1%. In addition, several investigations, including that of Kleinmuntz, have shown poor inter-observer agreement in the interpretation of records. Such wide variability in performance should raise questions regarding the validity of the technique. Lykken has argued persuasively that most studies with sensitivities and specificities in the 90% range have serious methodological flaws. It is therefore likely that our figures for "average test performance" are in reality representative of the most accurate capabilities of the polygraph.
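The arithmetic of the screening example can be retraced with a small sketch. The function name `contingency_table` is hypothetical, and rates are passed as whole percentages so that the counts are computed in exact integer arithmetic with halves rounded up (950 x 0.63 = 598.5, which the text rounds to 599).

```python
def contingency_table(n, prevalence_pct, sensitivity_pct, specificity_pct):
    """Build the four cells of a 2 x 2 table from whole-number percentages.

    Counts are rounded half up using integer arithmetic, so 598.5 becomes 599.
    """
    liars = (n * prevalence_pct + 50) // 100
    truthful = n - liars
    true_pos = (liars * sensitivity_pct + 50) // 100      # liars read as liars
    false_neg = liars - true_pos                           # liars read as truthful
    true_neg = (truthful * specificity_pct + 50) // 100    # truthful read as truthful
    false_pos = truthful - true_neg                        # truthful read as liars
    return {
        "true_pos": true_pos,
        "false_neg": false_neg,
        "true_neg": true_neg,
        "false_pos": false_pos,
    }


# 1000 prospective employees, 5% offenders, average test performance.
table = contingency_table(1000, 5, 76, 63)
positives = table["true_pos"] + table["false_pos"]
print(table)
print(f"PV(pos) = {table['true_pos']}/{positives} = {table['true_pos'] / positives:.0%}")
```

Integer percentages are used here only to keep the rounding unambiguous; Python's built-in `round` rounds 598.5 to the nearest even integer (598), which would not match the count given in the text.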
One possible criticism of this analysis is that we have inappropriately applied results from studies of criminal investigations to the screening situation. However, no field studies of polygraph accuracy as a screening device have been published. Such investigations would be difficult, if not impossible, to perform because of a lack of independent criteria for truthfulness. Thus one is faced with two plausible alternatives -- (a) abandon the polygraph in screening since no data exist to confirm its accuracy in that specific setting, or (b) apply the available data to that setting. When the latter is done, positive predictive values are extremely poor.
The implications of our calculations are disturbing. Polygraph testing in several settings will generate large numbers of false positive results, thus incriminating many truthful persons. In some circumstances truthful persons diagnosed as liars will outnumber actual liars by a wide margin. Furthermore, the idea of hoping to prove one's innocence by taking a polygraph test is misguided, since the false positive rate among truthful persons may be 37% (ie, 1-specificity) or higher. Supporters of polygraph use might reply that the polygraph should not be the sole arbiter of guilt or innocence, but that results should rather be integrated with other information about a case. We feel that this position is unrealistic; the lure of investing a seemingly "objective" test with excessive confidence seems inescapable.
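The statement that truthful persons labelled as liars can outnumber actual liars follows from the definitions: false positives exceed true positives whenever (1-p)(1-b) > pa, that is, whenever the prevalence p is below (1-b)/(a + 1 - b). The sketch below computes that break-even prevalence. The threshold is a derived illustration rather than a figure reported in the text, and the function names are hypothetical.

```python
def break_even_prevalence(a, b):
    """Prevalence at which false positives equal true positives.

    a: sensitivity, b: specificity. Below this prevalence, truthful
    persons with a positive test outnumber liars with a positive test.
    """
    false_pos_rate = 1 - b
    return false_pos_rate / (a + false_pos_rate)


def false_positives_per_true_positive(p, a, b):
    """Expected number of falsely accused persons for each detected liar."""
    return ((1 - p) * (1 - b)) / (p * a)


for label, (a, b) in {"average": (0.76, 0.63), "extreme": (0.87, 0.88)}.items():
    threshold = break_even_prevalence(a, b)
    ratio = false_positives_per_true_positive(0.05, a, b)
    print(f"{label}: break-even prevalence {threshold:.1%}, "
          f"{ratio:.1f} false positives per true positive at 5% prevalence")
```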
Our findings are not surprising. There is no rational scientific basis for any machine to detect liars consistently, since there is no known consistent physiological response unique to the cognitive state of lying.  Public policy makers should therefore ponder the very weak scientific foundation upon which the polygraph rests as they make decisions affecting its use in society.
We thank Dr Keith I. Marton for his invaluable assistance.
Correspondence should be addressed to A. S. B., Department of Medicine, New England Deaconess Hospital, Pilgrim Road, Boston, MA 02215, USA.
1. Garwood M, Ansley N. The accuracy and utility of polygraph testing. Washington, DC: Department of Defense, 1983.
2. Keller B. Pentagon will increase its use of polygraph tests. New York Times, Jan 4, 1985.
3. Editorial. A machine to measure patriotism. Washington Post, Jan 6, 1985.
4. Moore T. Lie detector company has twisted facts, say experts. Sunday Times, May 27, 1984.
5. Report of the Security Commission. London: HM Stationery Office, 1983.
6. Saxe L, Dougherty D, Cross T. Scientific validity of polygraph testing: a research review and evaluation. Technical memorandum OTA-TM-H-15. US Congress, Office of Technology Assessment, 1983.
7. Lykken DT. A tremor in the blood: uses and abuses of the lie detector. New York: McGraw-Hill, 1981.
8. Vecchio TJ. Predictive value of a single diagnostic test in unselected populations. N Engl J Med 1966; 274: 1171-73.
9. Horvath F. The effect of selected variables on interpretation of polygraph records. J Appl Psychol 1977; 62: 127-36.
10. Kleinmuntz B, Szucko J. A field study of the fallibility of polygraphic lie detection. Nature 1984; 308: 449-50.
11. Horvath FS, Reid JE. The reliability of polygraph examiner diagnosis of truth and deception. J Crim Law Criminol Police Sci 1971; 62: 276-81.
12. Hunter FL, Ash P. The accuracy and consistency of polygraph examiners' diagnoses. J Police Sci Admin 1973; 1: 370-75.
13. Slowik SM, Buckley JP. Relative accuracy of polygraph examiner diagnosis of respiration, blood pressure and GSR recordings. J Police Sci Admin 1975; 3: 305-09.
14. Wicklander DE, Hunter FL. The influence of auxiliary sources of information in polygraph diagnoses. J Police Sci Admin 1975; 3: 405-09.