To Match or Not To Match? Methodological Issues in Autism-Related Research

Jarrold, C., & Brock, J. (2004). To match or not to match? Methodological issues in autism-related research. Journal of Autism and Developmental Disorders, 34, 81-86.

Studies of autism typically adopt a factorial matched-groups design aimed at eliminating nonspecific factors such as mental retardation as explanations of performance on experimental tasks. This paper reviews the issues involved in designing such studies and interpreting their results and suggests that the best approach to matching may be to equate performance on carefully designed control tasks. However, we also argue that the interpretation of such studies is often complicated by the fact that associations between background measures and experimental task performance are not clear. Consequently, we also advocate the use of regression techniques that allow the researcher to determine the factors that relate to task performance and to assess the extent of group differences on the task of interest having taken these factors into account.


Cronbach (1957) argued that there are only two classes of methods of determining the factors that underpin psychological task performance. The analysis of individual differences aims to map out the associations among measures of performance across tasks and domains. The alternative approach—factorial comparison of experimental manipulations—focuses on dissociations, as experimental designs attempt to determine whether performance in a given group in a given condition differs reliably from that seen in other groups or conditions (Baddeley & Gathercole, 1999).

In autism research, the relative scarcity of the condition and the influence of theoretical approaches that emphasize the likely importance of discrete domain-specific deficits (e.g., Baron-Cohen, 1995; Frith, Morton, & Leslie, 1991) have arguably discouraged an individual differences approach. Instead, factorial matched designs are more commonly employed in order to determine whether the observed level of performance of individuals with autism is above or below that expected given their general level of intellectual functioning. The inferences that can be drawn from such designs necessarily depend on the nature of the comparisons made and the control conditions and comparison groups employed. This paper begins by considering the issues involved in the selection of appropriate comparison groups for experimental research with individuals with autism. However, matched designs are not without their problems, and the general limitations of this type of approach, as well as the specific concerns it raises in the context of autism research, will also be considered. Finally, we present and discuss the relative merits of alternatives to strict matching designs that potentially avoid some of these concerns and limitations.

Matching Choices

As noted above, the aim in matching is to rule out “noncentral” explanations of group differences. Perhaps the most commonly employed matching measures in autism research are fairly general ability measures, such as verbal or nonverbal mental age, aimed at controlling for the fact that poor performance among individuals with autism may be a general consequence of mental retardation rather than being specific to the condition. However, the difficulty with employing “intelligence” measures is that IQ profiles are not flat in autism or indeed in many other developmental disorders: individuals with autism show peaks and troughs of performance on the various subtests of most if not all intelligence test batteries (see Joseph, Tager-Flusberg, & Lord, 2002). Consequently, while matching groups for full-scale IQ or a global mental age measure may ensure that comparison participants have the same “average” ability as a group of individuals with autism, there is a very real danger that these groups will not actually be matched for any single ability assessed by the IQ test (Hobson, 1991; cf. Klein & Mervis, 1999).

Comparisons of verbal and nonverbal IQ scores within individuals with autism also tend to show that individuals have higher nonverbal than verbal abilities (Joseph et al., 2002). As a result, if one matches participants with and without autism for verbal ability, the individuals with autism are likely to have superior nonverbal skills, which clearly could affect performance on a task that depends to some extent on nonverbal abilities. Conversely, if one matches for nonverbal abilities the groups are unlikely to be equated for verbal skills (Hobson, 1991; Ozonoff, Pennington, & Rogers, 1990). The only way around this, in terms of strict matching designs, is to include a separate comparison group for each potential matching measure. Given that there is considerable variability in subskills even within the domains of verbal and nonverbal ability in autism (Happé, 1994; Jarrold, Boucher, & Russell, 1997; Joseph et al., 2002; Kjelgaard & Tager-Flusberg, 2001), this soon becomes impractical.
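The trade-off described above can be made concrete with a minimal sketch. All values below are hypothetical mental ages (in months) invented for illustration, and the helper names `match_on` and `mean_gap` are assumptions rather than part of any established procedure. The sketch pairs each individual with autism with the closest available comparison individual on one measure and then reports the resulting group difference on the other measure.

```python
from statistics import mean

# Hypothetical (verbal, nonverbal) mental ages in months.
# The autism profiles are uneven: nonverbal exceeds verbal ability.
autism = [(60, 84), (66, 90), (72, 96), (78, 102)]
# The comparison pool has relatively flat profiles.
comparison_pool = [(60, 62), (66, 64), (72, 75), (78, 80),
                   (84, 86), (90, 88), (96, 97), (102, 100)]

def match_on(index, targets, pool):
    """Pair each target with the closest unused pool member on one measure."""
    available = list(pool)
    matched = []
    for person in targets:
        best = min(available, key=lambda c: abs(c[index] - person[index]))
        available.remove(best)
        matched.append(best)
    return matched

def mean_gap(index, group_a, group_b):
    """Difference in group means (a minus b) on the chosen measure."""
    return mean(p[index] for p in group_a) - mean(p[index] for p in group_b)

VERBAL, NONVERBAL = 0, 1
verbal_matched = match_on(VERBAL, autism, comparison_pool)
nonverbal_matched = match_on(NONVERBAL, autism, comparison_pool)

print("Matched on verbal: nonverbal gap =",
      mean_gap(NONVERBAL, autism, verbal_matched))
print("Matched on nonverbal: verbal gap =",
      mean_gap(VERBAL, autism, nonverbal_matched))
```

With profiles of this kind, equating the groups on either measure leaves a large difference on the other, which is the reason a separate comparison group would be needed for each matching measure.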

A further concern with using ability or mental age measures to control for developmental delay in autism is that they are relatively indirect ways of assessing “noncentral” correlates of task performance. The assumption, of course, is that mental age will predict ability to understand task instructions, use appropriate strategies, inhibit inappropriate responses, and so forth, and broadly speaking this is probably the case. However, with careful task design one can ensure that groups are equated for these abilities. The most informative experimental designs are those that build in control conditions, resulting in versions of the task that are closely matched in most respects but differ in whether they require the target ability or not. If individuals with autism perform as well as comparison participants on the basic version of the task but are impaired when the target ability is required, then this is potentially strong evidence of a specific deficit in this area. Of course, care needs to be taken to ensure that the absence of a group effect on the control task does not simply reflect a relative lack of sensitivity in this measure, which might arise as a result of ceiling or floor effects (Bishop, 1997; Strauss, 2001).

Many matched design studies have tested for this kind of interaction. For example, Fein, Lucci, Braverman, and Waterhouse (1992) compared the performance of individuals with autism on social and nonsocial versions of a picture matching procedure which were equated for difficulty among typically developing individuals, finding evidence for poorer performance on the social relative to the nonsocial version. However, there appears to be some reluctance to match groups at the outset of a study for their ability to perform the “noncentral” component processes of a task, a strategy that potentially does away with the need for mental age matching altogether (see Phillips, Jarrold, Baddeley, Grant, & Karmiloff-Smith, 2003).

The population from which comparison individuals should be drawn might appear to be an additional decision that the autism researcher faces. In fact, this choice arguably reduces to the issue of which variable to match for; in other words, whether or not groups are matched on chronological age and degree of learning disability or IQ. Typically developing individuals are commonly employed as a comparison group—their level of performance providing a benchmark for what one would expect among individuals with autism, everything else being equal (Prior, 1979). However, unless the individuals with autism are in the normal IQ range, matching such groups on some measure of ability will inevitably mean that they are mismatched on chronological age. Consequently, any underlying deficit may be masked by the fact that individuals with autism can use their greater “experience” to compensate for poor performance (cf. Bishop, 1997). This can be particularly problematic when considering so-called “crystallized” abilities such as vocabulary knowledge (cf. Chapman, 1997). Furthermore, such groups will necessarily be mismatched on degree of learning difficulty or, in other words, IQ. One would expect that on the majority of tasks, level of intellectual ability (mental age) rather than rate of intellectual development (IQ) will determine experimental task performance. However, evidence suggests that IQ differences can relate to success on experimental tasks even among individuals of equivalent mental age (Spitz, 1982; Weiss, Weisz, & Bromfield, 1986), particularly on tasks that tap fluid rather than crystallized intelligence (Anderson, 1992, 2001).

Of course, one way to avoid these difficulties is to limit research to those individuals with autism who do not suffer from generalized intellectual delay. These individuals can be equated to typically developing individuals for level of ability to provide, in theory at least, an informative test of autism-specific deficits. However, this means that one severely reduces the potential pool of participants. Lower functioning individuals with autism may show a different pattern of performance across tasks (e.g., Ropar & Mitchell, 2001; Turner, 1997) and, consequently, assessing only higher functioning participants limits the likely generalizability of any results (Charman, 1994). Alternatively, one could employ a learning disabled comparison group matched for both age and ability, and therefore also matched on degree of learning disability or IQ. The difficulty here is that there is arguably no such thing as a normative group of individuals with learning disabilities (Burack, Iarocci, Bowler, & Mottron, 2002). Many syndromes such as Down syndrome or Williams syndrome are associated with their own unique cognitive profiles (e.g., Klein & Mervis, 1999), so it is difficult to interpret differences between individuals with these syndromes and those with autism. Ideally, one might select a comparison group from those individuals with “mental retardation” whose low IQ simply represents the bottom end of the normal distribution of IQ in the general population (cf. Burack, 1990). Unfortunately, the absence of a specific diagnosis cannot be taken as a guarantee that an individual forms part of this population, rather than having an as yet unidentified disorder.

Alternatives To The Matching Approach

The above discussion outlines some of the practical difficulties associated with the design and implementation of matched-group studies. Perhaps more important, the same issues are also pertinent when attempting to evaluate critically and interpret their results. It is often the case that groups are matched for different measures in different studies, with different patterns of impairment among participants with autism being found as a result. Similarly, in studies with multiple comparison groups, individuals with autism may show impairments (or perhaps perform relatively well) relative to one comparison group but not another. It is tempting to assume that the absence of a group difference on an experimental task reflects a causal relationship between performance on that task and the variable on which the groups are matched. However, because the aim of a matching design is to test for dissociations in patterns of performance rather than to explicitly test the degree of association between measures, such an assertion cannot be verified.

These difficulties are illustrated in much of our own work examining verbal short-term memory performance in individuals with specific learning disabilities including autism (e.g., Brock, McCormack, & Boucher, 2003; Jarrold, Baddeley, & Phillips, 2002; Russell, Jarrold, & Henry, 1996). Certain theoretical accounts (e.g., Baddeley, Gathercole, & Papagno, 1998) would predict a direct causal link from poor verbal short-term memory to delayed vocabulary acquisition. However, it is also likely that relatively poor vocabulary knowledge contributes to poor verbal short-term memory performance (Hulme & Roodenrys, 1995). In our studies, we matched groups for vocabulary level to ensure that any observed short-term memory deficit was not simply a result of generally poorer verbal abilities in a particular group. Consequently, when group differences in verbal short-term memory are found, this provides strong evidence of a specific impairment. However, when group differences are not observed, such a result is difficult to interpret because of the possibility that matching for vocabulary level has indirectly “matched away” group differences in verbal short-term memory ability (cf. Bishop, 1997).

As a result, in our current work we are increasingly adopting approaches that, in theory at least, allow us to determine the associations between target and matching variables. Thus, rather than explicitly matching individuals from different groups on any one particular measure, we instead test a relatively large comparison or “normative” sample in addition to our clinical group. By taking multiple background measures from all individuals, it is then possible to account statistically for variance in experimental task performance associated with variables such as age, mental age, IQ, or performance on some other task.

One such approach is to perform analysis of covariance. Covariance is often employed in matching designs to control for factors that have not been matched for. However, the logic of this approach can be extended to the situation where groups are not explicitly matched on any measure (D. V. M. Bishop, personal communication; cf. Jarrold, Baddeley, & Hewes, 1999; Thomas et al., 2001), potentially allowing the assessment of samples that are more representative of the target population. A disadvantage of this approach is that it assumes certain statistical properties of the data. The results of analysis of covariance are considerably less interpretable if groups differ substantially on the covariate in question (Evans & Anastasio, 1968; Huitema, 1980; Miller & Chapman, 2001). In these cases, observed means may be adjusted for spurious reasons, as a relationship between level of covariate and performance may be driven by the aggregation of data points into “group” clusters, even if there is no such relation at the level of individuals within each group (cf. Robinson, 1950). Similarly, analysis of covariance requires that all groups show the same pattern of relationship between covariate and performance (homogeneity of regression slopes, or planes in the case of multiple covariates). These assumptions are not always easily met, particularly in designs with relatively small samples.
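The aggregation problem noted above can be illustrated with a small sketch. The data are hypothetical and were chosen so that the covariate is unrelated to performance within each group, while the groups differ on both the covariate and the task. The helper `ols_slope` is an illustrative name, not a library function. Comparing the within-group slopes with the slope obtained from the pooled data is one simple, informal way of seeing both the clustering artifact and whether the homogeneity of regression slopes assumption looks plausible; it is not a substitute for a formal test.

```python
def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical covariate (e.g., an ability score) and task performance.
# Within each group, performance does not change with the covariate.
group_a_cov, group_a_perf = [1, 2, 3], [10, 12, 10]
group_b_cov, group_b_perf = [7, 8, 9], [20, 22, 20]

slope_a = ols_slope(group_a_cov, group_a_perf)
slope_b = ols_slope(group_b_cov, group_b_perf)

# Ignoring group membership: the two clusters alone create a slope.
pooled_slope = ols_slope(group_a_cov + group_b_cov,
                         group_a_perf + group_b_perf)

print("within-group slopes:", slope_a, slope_b)
print("pooled slope:", pooled_slope)
```

Here the pooled relation between covariate and performance is driven entirely by the separation of the two groups, so any adjustment of group means based on it would be spurious. The groups also do not overlap at all on the covariate, which is the situation in which covariance adjustment is least interpretable.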

Clearly, the issue of sample size is a particular concern in autism research, and recruiting a reasonably large homogeneous group of individuals with autism is time consuming. However, recruiting comparison individuals, particularly typically developing individuals, is usually less onerous. Consequently, a second approach that is related to the use of covariance, but which is less stringent in its assumptions, is to assess a relatively large comparison group in order to map out a normative relationship between background measures and task performance. One can then regress performance against age or ability in this normative sample and then, for each individual in the clinical group, determine the discrepancy between the observed and expected levels of task performance for their level of age or ability. The normative variation in performance (the standard error in the estimate of regression parameters for the normative group) can then be used to standardize the performance of any given individual in a clinical group for their level of age or ability. If this approach is applied to a range of tasks within the same target and normative populations, then standardized values can be directly and meaningfully compared to determine whether individuals’ relative level of impairment varies across these tasks.
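A minimal sketch of this regression-based standardization follows. The normative and clinical values are hypothetical, the function names are illustrative, and a single predictor with a linear relation is assumed. As a simplification, the discrepancy is scaled by the standard error of estimate (the residual standard deviation) of the normative regression; a fuller treatment would also reflect the uncertainty in the estimated regression parameters.

```python
from math import sqrt

def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def standard_error_of_estimate(x, y, intercept, slope):
    """Residual standard deviation: sqrt(SS_residual / (n - 2))."""
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return sqrt(ss_res / (len(x) - 2))

def standardized_discrepancy(x_value, observed, intercept, slope, see):
    """Observed minus expected score, in units of normative variation."""
    expected = intercept + slope * x_value
    return (observed - expected) / see

# Hypothetical normative sample: mental age (months) and task score.
norm_age = [60, 66, 72, 78, 84, 90, 96, 102]
norm_score = [10, 12, 13, 16, 17, 19, 22, 23]

intercept, slope = fit_line(norm_age, norm_score)
see = standard_error_of_estimate(norm_age, norm_score, intercept, slope)

# Hypothetical clinical individuals: (mental age, task score).
clinical = [(72, 9), (84, 17), (96, 15)]
z_scores = [standardized_discrepancy(age, score, intercept, slope, see)
            for age, score in clinical]

for (age, score), z in zip(clinical, z_scores):
    print(f"mental age {age}, score {score}: z = {z:.2f}")
```

Because each value is expressed relative to normative variation at the individual's own level of age or ability, values obtained in the same way for different tasks can be placed side by side.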

Care does, however, need to be exercised when fitting a normative regression between age or ability on the one hand and task performance on the other. Floor effects in younger and less able individuals, and ceiling effects among older, more able individuals, are likely in populations of a large developmental range. Consequently, the relation between age or ability and performance may not be linear across the whole developmental range. As a result, one cannot confidently extrapolate regressions of this form beyond the range of the normative data, and this approach should only be employed to standardize the performance of individuals in a clinical sample who fall within this range. Furthermore, it may even be inappropriate to model such normative data with linear regression techniques, which instead might best be limited to the central section of the developmental function. Alternatively, nonlinear regression approaches may be used to capture the full range of developmental change on a task (cf. Happé, 1995).
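The two precautions above can be sketched as simple checks made before any standardization. The values are hypothetical, the task is assumed to be scored from 0 to 20, and the function names are illustrative. The first check keeps only those clinical individuals whose age or ability falls inside the range of the normative sample; the second reports how many normative scores sit at the floor or ceiling of the task, as a rough warning that a linear fit across the whole range may be inappropriate.

```python
def within_normative_range(value, normative_values):
    """True if value lies inside the observed normative range."""
    return min(normative_values) <= value <= max(normative_values)

def proportion_at(scores, limit):
    """Proportion of scores exactly at a floor or ceiling value."""
    return sum(1 for s in scores if s == limit) / len(scores)

# Hypothetical normative sample: mental age (months) and score (0-20).
norm_age = [48, 60, 72, 84, 96, 108]
norm_score = [0, 0, 6, 12, 20, 20]

# Hypothetical clinical mental ages; two fall outside the normative range.
clinical_ages = [40, 66, 90, 120]

eligible = [age for age in clinical_ages
            if within_normative_range(age, norm_age)]
floor_rate = proportion_at(norm_score, 0)
ceiling_rate = proportion_at(norm_score, 20)

print("eligible for standardization:", eligible)
print("normative scores at floor:", floor_rate)
print("normative scores at ceiling:", ceiling_rate)
```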

These approaches allow one to account for potentially confounding factors, without requiring individuals to be matched on these variables. Indeed, it is relatively straightforward to take multiple background measures and then simultaneously account for differences on any that relate to task success. In contrast, matching groups on more than one criterion is often extremely difficult and, even if possible, will involve such a degree of selectivity that the generalizability of the findings will be reduced considerably.

However, as with matched group studies, “background measures” may not be the most direct indices of those noncentral factors that influence performance on the task of interest. The most informative matched designs are those that build in a control condition to assess groups’ ability to perform the noncentral aspects of the central task, and individual differences in performance on a control task can be used equally effectively. With the covariance approach, this amounts to determining whether group differences in the experimental task can be accounted for in terms of performance on the control task. Similarly, regression-based standardization can be used to determine whether the scores on the experimental task are significantly different from those predicted by performance on the control task.

Conclusions


Matching designs have the potential to provide a powerful test of the null hypothesis that groups, equated for a certain level of performance on one measure, do not differ reliably on another. This test is particularly informative when the variable used to equate groups directly controls for the “noncentral” constraints on experimental task performance. One of the difficulties with this approach, however, is knowing which of a range of potential matching variables best provides this kind of control. One approach we advocate is to match groups for performance on a task that is explicitly designed to share as many noncentral features of the key experimental task as possible and which is equally sensitive to variation in ability. However, researchers commonly adopt a more indirect form of matching, equating groups for age or mental age. This is problematic in autism research and in research with atypical groups in general because the unevenness of individuals’ cognitive profiles means that this necessarily leads to group differences on other aspects of ability. Even in the ideal situation where groups are closely matched on a number of measures, matched-design studies are fundamentally limited insofar as they provide little or no information about the associations between experimental and control variables.

These problems are circumvented somewhat by alternative approaches that make use of individual differences in performance and the association of background measures and task variables, such as the covariance and regression-based standardization techniques outlined above. These allow the researcher to map out the relationships among experimental task performance and multiple background measures, and subsequently to interpret the patterns of strengths and weaknesses of individuals with autism while simultaneously accounting for a number of potentially confounding factors. Few studies of autism have investigated and made use of individual differences information in this way. Nevertheless, as researchers become increasingly interested in the links between various domains of cognitive function in autism (cf. Jarrold, Butler, Cottington, & Jimenez, 2000), studies that take such an approach are likely to provide an important complement and counterpoint to matched-groups designs in future research with individuals with autism and related conditions.

Acknowledgments


Jon Brock is supported by a Charles J. Epstein Research Award from the National Down Syndrome Society of the United States.

References


Anderson, M. (1992). Intelligence and development: A cognitive theory. Oxford: Blackwell.

Anderson, M. (2001). Conceptions of intelligence. Journal of Child Psychology and Psychiatry, 42, 287–298.

Baddeley, A., & Gathercole, S. (1999). Individual differences in learning and memory: Psychometrics and the single case. In P. L. Ackerman, P. C. Kyllonen, & R. D. Roberts (Eds.), Learning and individual differences (pp. 31–50). Washington, DC: American Psychological Association.

Baddeley, A., Gathercole, S., & Papagno, C. (1998). The phonological loop as a language learning device. Psychological Review, 105, 158–173.

Baron-Cohen, S. (1995). Mindblindness: An essay on autism and theory of mind. Cambridge, MA: MIT Press/Bradford Books.

Bishop, D. V. M. (1997). Cognitive neuropsychology and developmental disorders: Uncomfortable bedfellows. Quarterly Journal of Experimental Psychology, 50A, 899–923.

Brock, J., McCormack, T., & Boucher, J. (2003). The relationship between phonological short-term memory and vocabulary knowledge in Williams syndrome.

Burack, J. A. (1990). Differentiating mental retardation: The two-group approach and beyond. In R. M. Hodapp, J. A. Burack, & E. Zigler (Eds.), Issues in the developmental approach to mental retardation (pp. 27–48). Cambridge: Cambridge University Press.

Burack, J. A., Iarocci, G., Bowler, D., & Mottron, L. (2002). Benefits and pitfalls in the merging of disciplines: The example of developmental psychopathology and the study of persons with autism. Development and Psychopathology, 14, 225–237.

Chapman, R. S. (1997). Language development in children and adolescents with Down syndrome. Mental Retardation and Developmental Disabilities Research Reviews, 3, 307–312.

Charman, T. (1994). Brief report: An analysis of subject characteristics in research reported in the Journal of Autism and Developmental Disorders 1982–1991. Journal of Autism and Developmental Disorders, 24, 209–213.

Cronbach, L. J. (1957). The two disciplines of scientific psychology. American Psychologist, 12, 671–684.

Evans, S. H., & Anastasio, E. J. (1968). Misuse of analysis of covariance when treatment effect and covariate are confounded. Psychological Bulletin, 69, 225–234.

Fein, D., Lucci, D., Braverman, M., & Waterhouse, L. (1992). Comprehension of affect in context in children with pervasive developmental disorders. Journal of Child Psychology and Psychiatry, 33, 1157–1167.

Frith, U., Morton, J. A., & Leslie, A. M. (1991). The cognitive basis of a biological disorder: Autism. Trends in Neurosciences, 14, 433–438.

Happé, F. G. E. (1994). Wechsler I.Q. profile and theory of mind in autism: A research note. Journal of Child Psychology and Psychiatry, 35, 1461–1471.

Happé, F. G. E. (1995). The role of age and verbal ability in the theory of mind task performance of subjects with autism. Child Development, 66, 843–855.

Hobson, R. P. (1991). Methodological issues for experiments on autistic individuals’ perception and understanding of emotion. Journal of Child Psychology and Psychiatry, 32, 1135–1158.

Huitema, B. E. (1980). The analysis of covariance and alternatives. New York: Wiley.

Hulme, C., & Roodenrys, S. (1995). Practitioner review: Verbal working memory development and its disorders. Journal of Child Psychology and Psychiatry, 36, 373–398.

Jarrold, C., Boucher, J., & Russell, J. (1997). Language profiles in children with autism: Theoretical and methodological implications. Autism, 1, 57–76.

Jarrold, C., Baddeley, A. D., & Hewes, A. K. (1999). Genetically dissociated components of working memory: Evidence from Down’s and Williams syndrome. Neuropsychologia, 37, 637–651.

Jarrold, C., Baddeley, A., & Phillips, C. E. (2002). Verbal short-term memory in Down syndrome: A problem of memory, audition, or speech? Journal of Speech, Language, and Hearing Research, 45, 531–544.

Jarrold, C., Butler, D. W., Cottington, E. M., & Jimenez, F. (2000). Linking theory of mind and central coherence bias in autism and in the general population. Developmental Psychology, 36, 126–138.

Joseph, R. M., Tager-Flusberg, H., & Lord, C. (2002). Cognitive profiles and social-communicative functioning in children with autism spectrum disorder. Journal of Child Psychology and Psychiatry, 43, 807–821.

Kjelgaard, M. M., & Tager-Flusberg, H. (2001). An investigation of language impairment in autism: Implications for genetic subgroups. Language and Cognitive Processes, 16, 287–308.

Klein, B. P., & Mervis, C. B. (1999). Cognitive strengths and weaknesses of 9- and 10-year-olds with Williams syndrome or Down syndrome. Developmental Neuropsychology, 16, 177–196.

Miller, G. A., & Chapman, J. R. (2001). Misunderstanding analysis of covariance. Journal of Abnormal Psychology, 110, 40–48.

Ozonoff, S., Pennington, B. F., & Rogers, S. J. (1990). Are there emotion perception deficits in young autistic children? Journal of Child Psychology and Psychiatry, 31, 343–361.

Phillips, C. E., Jarrold, C., Baddeley, A., Grant, J., & Karmiloff-Smith, A. (2003). Comprehension of spatial language terms in Williams syndrome: Evidence for an interaction between domains of strength and weakness. Cortex.

Prior, M. R. (1979). Cognitive abilities and disabilities in infantile autism: A review. Journal of Abnormal Child Psychology, 7, 357–380.

Robinson, W. S. (1950). Ecological correlations and the behavior of individuals. American Sociological Review, 15, 351–357.

Ropar, D., & Mitchell, P. (2001). Susceptibility to illusions and performance on visuo-spatial tasks in individuals with autism. Journal of Child Psychology and Psychiatry, 42, 539–549.

Russell, J., Jarrold, C., & Henry, L. (1996). Working memory in children with autism and with moderate learning difficulties. Journal of Child Psychology and Psychiatry, 37, 673–686.

Spitz, H. H. (1982). Intellectual extremes, mental age, and the nature of human intelligence. Merrill-Palmer Quarterly, 28, 167–192.

Strauss, M. E. (2001). Demonstrating specific cognitive deficits: A psychometric perspective. Journal of Abnormal Psychology, 110, 6–14.

Thomas, M. S. C., Grant, J., Barham, Z., Gsödl, M. K., Laing, E., Lakusta, L., et al. (2001). Past tense formation in Williams syndrome. Language and Cognitive Processes, 16, 143–176.

Turner, M. (1997). Towards an executive dysfunction account of repetitive behavior in autism. In J. Russell (Ed.), Autism as an executive disorder (pp. 57–100). Oxford: Oxford University Press.

Weiss, B., Weisz, J. R., & Bromfield, R. (1986). Performance of retarded and nonretarded persons on information-processing tasks: Further tests of the similar structure hypothesis. Psychological Bulletin, 100, 157–175.