Lichten and Simon (2007) argued for the use of a Total Quotient (TQ) that combines existing IQ and adaptive functioning scores into a single index. They proposed that the TQ index would improve the accuracy of mental retardation diagnoses in court cases, particularly in capital cases, and, hence, would make courts' decisions fairer and more consistent. The TQ combines scores from standardized intelligence tests and adaptive behavior scales and is said to exhibit greater divergent validity than either separate measure, as indicated by the modest correlations between the sets of scores that represent the two domains. Lichten and Simon argued that the current diagnostic standard (i.e., a score of at least two SDs below the mean on both standardized intelligence and adaptive behavior measures) reduces the prevalence rate of mental retardation by a factor of at least 10, depending on the exact correlations between the two instruments. A second benefit of using the TQ, according to the authors, is that it preserves the accepted prevalence rate for mental retardation (approximately 2.5% of the population), which is established through the application of the characteristics of the normal distribution.
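How the dual-cutoff prevalence depends on the correlation between the two instruments can be illustrated numerically. The following is a minimal sketch, not the calculation reported by Lichten and Simon: it assumes that IQ and adaptive behavior scores are standardized and jointly follow a bivariate normal distribution with correlation r. The function name `joint_prevalence` and the example values of r are hypothetical choices made for illustration only.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standardized scores: mean 0, SD 1


def joint_prevalence(r, cutoff=-2.0, steps=4000, lower=-10.0):
    """Share of a bivariate normal population at or below `cutoff` on BOTH scores.

    Assumption (not from the source): both scores are standard normal with
    correlation r. Integrates pdf(x) * P(Y <= cutoff | X = x) with Simpson's rule.
    """
    if abs(r) >= 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if steps % 2:
        raise ValueError("steps must be even for Simpson's rule")
    scale = sqrt(1.0 - r * r)  # SD of Y given X = x

    def integrand(x):
        # Given X = x, Y is normal with mean r*x and SD sqrt(1 - r^2).
        return STD_NORMAL.pdf(x) * STD_NORMAL.cdf((cutoff - r * x) / scale)

    h = (cutoff - lower) / steps
    total = integrand(lower) + integrand(cutoff)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(lower + i * h)
    return total * h / 3.0


single = STD_NORMAL.cdf(-2.0)  # share at or below -2 SD on one measure
for r in (0.0, 0.3, 0.6):  # hypothetical correlations
    both = joint_prevalence(r)
    print(f"r={r:.1f}  both={both:.5f}  single={single:.5f}  ratio={single / both:.1f}")
```

Under these assumptions, the share falling two SDs below the mean on a single measure is about 2.3%, close to the approximately 2.5% cited above. With uncorrelated measures the joint share is the square of that value, and it rises toward the single-measure share as r approaches 1, which is why the size of the reduction depends on the correlation between the instruments.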
The argument that Lichten and Simon (2007) advanced in advocating for TQ is based on interrelated claims pertaining to the reliability and validity of TQ and its component scores as well as how using TQ impacts the prevalence rate for mental retardation. In evaluating the usefulness of TQ for making consequential decisions, how each element in the authors' argument contributes to the total case should be considered. First, does the TQ index possess greater divergent validity than either an IQ or an adaptive behavior score or a profile of scores that incorporates both dimensions? The authors based their argument on the fact that the TQ index is composed of all the scales that assess or measure the various dimensions of the two constructs, IQ and adaptive behavior.
It is accurate to say that TQ reflects a broader range of the domains of human functioning than either IQ or adaptive behavior considered separately. However, each separate measure is composed of multiple subscale scores that describe functioning in a variety of areas within each domain. When the subscales of each measure are summed to compute a total score, and these two total scores are summed to compute a global TQ index, the result is a substantial loss of information about the individual. In other words, vital descriptive data pertaining to individual differences across each of the IQ and adaptive functioning subscales are sacrificed. Rather than a profile of scores that describes an individual's standing in relation to a standardization sample in each relevant area of functioning, we now have only the single TQ index. The profile of scale scores is particularly meaningful for various clinical or legal decisions, such as determining an individual's service needs or whether a person is sufficiently responsible to stand trial for a crime. As the authors noted, contemporary standards for the use of test information (American Educational Research Association et al., 1999) caution against using a single score to make treatment or educational decisions. Nearly two decades ago, Anastasi (1989) noted this common misuse of testing data and called it the “hazard of the single score” (p. 611). The TQ is a single score, and it does not convey the same richness of information as is expressed in a total profile of scores from the two instruments. Hence, the claim of greater divergent validity for the TQ may be accurate in a very narrow sense, but it is untrue in the larger sense.
Lichten and Simon (2007) made their claim for greater divergent validity of TQ by contrasting it with IQ or adaptive behavior, not by contrasting TQ with the profile of IQ and adaptive behavior scores. This is an artificial contrast that is at odds with contemporary standards and practice. It is well established by the guidelines of relevant professional organizations (e.g., the American Association on Intellectual and Developmental Disabilities [AAIDD]) and by federal and state laws (e.g., Rehabilitation Act, Office of Special Education and Rehabilitative Services regulations) that formal diagnoses of mental retardation for eligibility purposes must include assessments of cognitive functioning (IQ) and adaptive behavior functioning. The relevant question, therefore, is whether the TQ index is diagnostically superior to using the IQ and adaptive functioning scores without summing them. The authors did not address this question directly. In fact, based on the previous discussion point, the answer is certainly no.
Lichten and Simon (2007) argued for the use of TQ for diagnostic purposes rather than a profile of IQ and adaptive scores when they stated that TQ retains the accepted prevalence rate for mental retardation, whereas the prevalence rate is reduced by at least a factor of 10 when the current diagnostic criteria (i.e., a score of at least two SDs below the mean on each separate measure) are applied. Here, too, their assertions may be accurate within the narrow bounds of their own assumptions, but these do not accurately reflect contemporary guidelines. Luckasson et al. (2002) did not endorse the use of a total adaptive behavior score for this purpose. Rather, “significant limitations are operationally defined as performance that is at least two SDs below the mean of either (a) one of the following three types of adaptive behavior: conceptual, social, and practical skills” (p. 14). In other words, summed scores for each of the three domains of adaptive functioning form the appropriate basis for diagnosis of mental retardation, not the total adaptive behavior score that is computed by summing the three domain scores. It can be deduced that the prevalence rate for the diagnosis of mental retardation will be substantially higher when the AAIDD criterion is applied than when the Lichten and Simon (2007) criterion is applied. Nonetheless, they have raised a significant concern related to the actual impact of the accepted diagnostic criteria on the prevalence rate of mental retardation. It is an empirical question that should be answered through investigation.
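The direction of this difference can be illustrated with a small simulation, offered here only as a sketch under stated assumptions and not as a substitute for the empirical investigation called for above. It assumes three standardized, normally distributed adaptive behavior domain scores with a common pairwise correlation (the value 0.5 is hypothetical), treats the total adaptive score as the standardized sum of the three domains, and considers only the adaptive behavior criterion, not the IQ criterion. The function name `simulate_adaptive_criterion` is likewise hypothetical, and the data are simulated, not observed.

```python
import random
from math import sqrt


def simulate_adaptive_criterion(n=100_000, domain_r=0.5, cutoff=-2.0, seed=1):
    """Compare two ways of operationalizing 'significant limitations'.

    Returns (share with at least one domain <= cutoff,
             share with standardized total <= cutoff).
    Assumption: a one-factor model gives every pair of domains correlation domain_r.
    """
    if not 0.0 <= domain_r < 1.0:
        raise ValueError("domain_r must be in [0, 1)")
    rng = random.Random(seed)
    shared = sqrt(domain_r)        # weight of the factor common to all domains
    unique = sqrt(1.0 - domain_r)  # weight of each domain's unique part
    # Variance of a sum of three unit-variance scores with pairwise correlation r.
    total_sd = sqrt(3.0 + 6.0 * domain_r)

    any_domain = 0
    total_score = 0
    for _ in range(n):
        g = rng.gauss(0.0, 1.0)
        domains = [shared * g + unique * rng.gauss(0.0, 1.0) for _ in range(3)]
        if min(domains) <= cutoff:             # one-of-three-domains criterion
            any_domain += 1
        if sum(domains) / total_sd <= cutoff:  # total-score criterion
            total_score += 1
    return any_domain / n, total_score / n


any_rate, total_rate = simulate_adaptive_criterion()
print(f"at least one domain <= -2 SD: {any_rate:.4f}")
print(f"total score <= -2 SD:         {total_rate:.4f}")
```

Because a person has three chances to fall below the cutoff under the domain-based criterion, the simulated share meeting it exceeds the share meeting the total-score criterion; how large the gap is in practice depends on the actual correlations among the domains.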
It is well accepted by experts in psychological testing that institutional decisions (e.g., courts, schools, service agencies) that have important social consequences in the lives of individuals should be based on a comprehensive assessment of all relevant domains of functioning. The concept of validity in testing incorporates this principle (Messick, 1980). Validity pertains to both the empirical evidence and social consequences of using particular assessments for specific purposes (American Educational Research Association et al., 1999). There is no more consequential decision than a court case that involves life and death consequences for the defendant.
Appropriate and valid social uses of test information depend on the quality and accuracy of the interpretations that are made based on test scores, and these inferences are most often about the behavior of the individual. For this reason, it is vital that there is substantial empirical evidence that test scores that form the basis for decision-making are related to the actual behaviors of the individual. Otherwise, the test lacks validity for that specific use (American Educational Research Association et al., 1999). As noted previously, summing the various subscales of IQ and adaptive behavior scales to compute a TQ score results in a substantial loss of descriptive information about the individual's functioning. The TQ is a highly abstract indicator that lacks the necessary specificity to accurately describe the cognitive, behavioral, or social skill status of the individual. In terms of social consequences of test data, the use of a profile of scores is more defensible, precisely because these scale scores are more closely linked to the actual behaviors of the person in the various functional domains, and more accurate inferences about present and future behavior can be made on this basis.
Lichten and Simon (2007) implied that the use of a TQ index will result in more accurate decisions by various entities, a need they illustrated by quoting Sandra Scarr in Elliot (1987): “two reputable judges can reach opposite decisions [on use of IQs to make special education assignments] from essentially the same evidence” (p. v). Presumably, decisions will be more accurate if they are based on assessment data that reflect greater precision in measurement. However, in the context of test validation, precision, consistency, and accuracy are not synonymous terms. Precision and consistency of measurement refer to the reliability of a test or assessment. Accuracy, on the other hand, refers to the validity of decisions that are made based on test data. This key distinction is reflected in the truism that reliability of a measure is a necessary but insufficient condition for its validity. The accuracy or inaccuracy of a decision can only be determined by reference to well-accepted criteria of the construct in question, in this case, the competence of the individual. The use of TQ as a diagnostic criterion in capital cases may lead to more consistent court judgments, in that decisions in separate cases would be based on the application of the same TQ benchmark. However, because TQ is a global measure that is composed of summed scores that assess two distinct constructs, two individuals could have the same TQ score, but these would be composed of different combinations of IQ and adaptive behavior scores. In these cases, comparable TQs do not have a consistent meaning across individuals and could not be interpreted in the same fashion. Use of TQ as a diagnostic benchmark would likely create greater ambiguity in classification because individuals with other disabilities (e.g., cerebral palsy, traumatic brain injury) who currently do not meet all relevant criteria for mental retardation could qualify using the TQ benchmark.
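The point that identical composites can conceal different profiles can be made concrete with a short sketch. It assumes, purely for illustration, that a composite is formed by averaging an IQ score and an adaptive behavior score that are both on a standard-score scale (mean 100, SD 15, so that two SDs below the mean is 70); the formula actually proposed by Lichten and Simon may differ. The three profiles and the names `Profile`, `composite`, and `meets_dual_cutoff` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    label: str
    iq: float        # standard score, mean 100, SD 15 (assumed scale)
    adaptive: float  # standard score, mean 100, SD 15 (assumed scale)


def composite(p: Profile) -> float:
    """Hypothetical composite: the simple average of the two standard scores."""
    return (p.iq + p.adaptive) / 2


def meets_dual_cutoff(p: Profile, cutoff: float = 70.0) -> bool:
    """Dual criterion: at or below the cutoff on BOTH measures."""
    return p.iq <= cutoff and p.adaptive <= cutoff


# Three invented individuals with different profiles.
profiles = [
    Profile("A", iq=60, adaptive=80),
    Profile("B", iq=80, adaptive=60),
    Profile("C", iq=70, adaptive=70),
]

for p in profiles:
    print(p.label, composite(p), meets_dual_cutoff(p))
```

All three invented profiles yield the same composite, yet only profile C meets the cutoff on both measures. Under a composite benchmark, A and B would be classified together with C even though their patterns of functioning differ.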
Finally, the primary issue that is being adjudicated in death penalty cases regarding people who are or may be diagnosed with mental retardation is the competence of the individual. In the legal context, IQ and adaptive behavior assessment data are used as proxy indicators of competence to determine when individuals should be considered responsible for their actions, and this evaluation by the court involves making inferences about real-world behaviors from test scores. However, competence is more than the sum of individual aptitudes or behaviors as measured by these tests. Competence is also a function of the level and types of supports available to individuals in their performance of social roles. In the American Association on Mental Retardation (now the American Association on Intellectual and Developmental Disabilities) manual, Luckasson et al. (2002) stated that “judicious application of supports can improve the functional capabilities of individuals with mental retardation” (p. 145). Functional capacity of an individual depends on the social context and reflects the life-long interaction of the individual with his or her environments. Adaptive behavior (competence) is not purely a characteristic of the individual. The perspective of the American Association on Mental Retardation (now the American Association on Intellectual and Developmental Disabilities) is consistent with other contemporary models of disability that have been adopted by the World Health Organization (2001) and the National Institute on Disability and Rehabilitation Research (Seelman, 1998) that also emphasize the importance of context in evaluating function and participation in the lives of individuals. Hence, level of support and other indicators of environmental context should be taken into account to achieve a more accurate picture of an individual's functional status.
In contrast to these current perspectives, it seems that Lichten and Simon's (2007) case in advocating for the use of the TQ further reifies the test score as an individual characteristic, and this can only lead to less accurate decisions, whether made by a school district, a community agency, or a court because of what is left out of consideration.
If indeed defining mental retardation can be a matter of life and death, it is the responsibility of experts in assessment to help ensure that these decisions are based on the most reliable and valid data available. Evaluating the validity of tests and measurements for specific purposes is fundamentally akin to establishing the validity of any other scientific proposition. It involves marshalling the best available theoretical and empirical evidence to make a reasoned, defensible argument for a proposed interpretation (American Educational Research Association et al., 1999). Lichten and Simon (2007) advocate the use of TQ to reduce the level of ambiguity in making consequential selection decisions. However, within the adversarial, judicial context in which these decisions are made, the various sources of imprecision in test scores and ambiguity in how to interpret them make it difficult to base the most critical judgments on test data only.
Moreover, few guidelines have been proposed on how to study the consequences of measurement for selection decisions (American Educational Research Association et al., 1999). Test validation is always a work in progress. The consequential decisions that are made on the basis of test scores, including life and death decisions, are judgments on which experts, judges, and juries will disagree. This reliance on judgment does not satisfy our desire for consistency, accuracy, and humaneness in making the most consequential decisions. In this I agree wholeheartedly with the authors. However, we should also be mindful of the possibility that, in the attempt to eliminate subjectivity and enhance consistency by mandating a firm diagnostic benchmark such as TQ, the freedom of judges and juries to act from a humane motivation in certain cases where it may be warranted could also be sacrificed. In the final analysis, I believe that the use of TQ in these cases would result in more rigid and less humane decisions overall because these decisions will be based on a criterion that is less precise and more ambiguous than nonexperts may appreciate.
Lichten and Simon (2007) performed a valuable service to the field in prompting discussion about the use of test data in making various consequential decisions. Innovative approaches to using test data to improve human judgments should be welcomed, and research programs to evaluate the psychometric characteristics and social consequences of using test data for these purposes should be supported. However, the proposed use of TQ as a diagnostic benchmark in extraordinary circumstances such as capital court cases will require an extraordinarily high level of research evidence that supports these uses.
Author: James Bellini, PhD (Jlbellin@syr.edu), Associate Professor, School of Education, 264 Huntington Hall, Counseling & Human Services, Syracuse University, Center on Human Policy, 805 S. Crouse Ave., Syracuse, NY 13244-2280