Joint attention difficulties in autistic adults: An interactive eye-tracking study

Caruana, N., Stieglitz Ham, H., Brock, J., Woolgar, A., Kloth, N., Palermo, R., & McArthur, G. (in press). Joint attention difficulties in autistic adults: An interactive eye-tracking study. Autism.

Preprint PDF downloadable here. See Nathan Caruana’s website for more studies using interactive eye-tracking to investigate joint attention and its neural correlates.

Joint attention – the ability to coordinate attention with a social partner – is critical for social communication, learning, and the regulation of interpersonal relationships. Infants and young children with autism demonstrate impairments in both initiating and responding to joint attention bids in naturalistic settings (e.g., Hobson & Hobson, 2007; Mundy, Sigman, & Kasari, 1994). However, little is known about joint attention abilities in adults with autism. Here, we tested 17 autistic adults and 17 age- and nonverbal IQ-matched controls using an interactive eye-tracking paradigm in which participants initiated and responded to joint attention bids with an on-screen avatar. Compared to control participants, autistic adults completed fewer trials successfully. They were also slower to respond to joint attention bids in the first block of testing but performed as well as controls in the second block. There were no group differences in responding to spatial cues on a non-social task with similar attention and oculomotor demands. These experimental results were mirrored in the subjective reports given by participants, with some commenting that they initially found it challenging to communicate using eye gaze, but were able to develop strategies that allowed them to achieve joint attention. Our study indicates that, for many autistic individuals, subtle difficulties using eye gaze information persist well into adulthood.

Joint attention is the ability to achieve a common focus of attention with another person during a social interaction, and is an important precursor to the development of language and social learning (Adamson, Bakeman, Deckner, & Romski, 2009; Baron-Cohen, 1995; Charman, 2003; Mundy, Sigman, & Kasari, 1990; Murray et al., 2008; Tomasello, 1995). In a joint attention episode, a person initiates a joint attention bid by intentionally guiding another person’s attention towards an object or event. Joint attention is achieved when the other person responds by following the instigator’s communicative bid (Bruinsma, Koegel, & Koegel, 2004) and usually involves mutual awareness of the shared experience (Emery, 2000).

Eye gaze is typically the first communicative modality that humans develop and use to experience joint attention with others, which is often accompanied in later life by language and pointing gestures (see Pfeiffer, Vogeley & Schilbach, 2013 for a review). In typical development, infants begin to use eye gaze to respond to and initiate joint attention bids at approximately six months (Bakeman & Adamson, 1984; Scaife & Bruner, 1975) and twelve months of age respectively (Bates, Benigni, Bretherton, Camaioni, & Volterra, 1979). However, in autistic children, responding to joint attention may not begin to emerge until cognitive development is equivalent to that of 30-36 months of typical development (Mundy et al., 1994), and impairments in initiating joint attention often persist well into adolescence (e.g. Charman, 2003; Hobson & Hobson, 2007; Mundy, Sigman, Ungerer, & Sherman, 1986; Sigman & Ruskin, 1999). Difficulties in initiating or responding to joint attention are reliable predictors of social communication and social interaction (Lord et al., 2000; Stone, Ousley, & Littleford, 1997) as well as significant predictors of future expressive language development in children on the autism spectrum (Charman, 2003; Dawson et al., 2004; Mundy et al., 1990).

To date, joint attention impairments in autism have mostly been investigated in observational studies of very young children in natural and semi-structured social interactions (Charman et al., 1997; Dawson et al., 2004; Loveland & Landry, 1986; Mundy et al., 1990; Osterling & Dawson, 1994; Osterling, Dawson, & Munson, 2002; Wong & Kasari, 2012). However, observational paradigms often lack sensitivity to joint attention difficulties that may affect older children and adults. Nor are they amenable to experimental manipulation that might provide insight into the cognitive or neural mechanisms that underlie joint attention impairments.

A separate group of studies have employed variations on the Posner cueing task to investigate the extent to which individuals reflexively orient their attention to gaze cues. These tasks require participants to respond to a target that is preceded by a gaze cue directing them towards the target or in the opposite direction (e.g., Friesen & Kingstone, 1998). The main outcome measure is the time taken to detect the target’s location. Such tasks provide a sensitive, standardised experimental manipulation of the mechanisms underlying the reflexive aspects of gaze processing. However, they fail to capture the truly interactive and intentional nature of joint attention. This may partly explain why studies using Posner cueing paradigms have failed to provide consistent evidence for impairments in gaze processing in autistic children and adults (see Leekam, 2015; Nation & Penny, 2008 for review).

The aim of the current study was to use a paradigm that minimised the limitations of observational and Posner cueing paradigms to better understand the joint attention behaviours of autistic adults. To this end, we employed a new interactive eye-tracking paradigm developed by Caruana, Brock and Woolgar (2015) in which participants played a cooperative game with an animated virtual character (avatar) whom they believed to be controlled by another person (cf. Bayliss et al., 2013; Courgeon, Rautureau, Martin, & Grynszpan, 2014; Kim & Mundy, 2012; Schilbach et al., 2010). The goal of the game was to catch a burglar who was hiding inside one of six houses displayed on a screen. Each trial began with a search phase in which both the participant and the avatar searched their allotted houses. Whoever found the burglar had to make their partner aware of its location by establishing eye contact and then gazing at the location of the burglar. This procedure created a social condition that (1) elicited intentional response-to-joint-attention (RJA) and initiating-joint-attention (IJA) behaviours, (2) informed participants of their social role (i.e., responder or initiator) throughout the course of each trial without overt instruction, and (3) required participants to use eye contact as a cue to identify joint attention opportunities. Performance on RJA and IJA trials was compared with performance in non-social conditions that presented the same task demands but did not require any social interaction (RJAc and IJAc).

Using a number of performance metrics, we investigated whether autistic participants performed the responding and initiating tasks as well as control (i.e., non-autistic) participants, and whether any group differences were specific to the social context or were also observed in the non-social control conditions. We also contrasted performance in the first versus the second block of testing to investigate the ability of participants in both groups to learn and adapt to the new task. This analysis aimed to determine whether autistic individuals were able to overcome any initial difficulties they may have had performing the task.


Ethical Statement

This study was approved by the Human Research Ethics Committee at Macquarie University (MQ; reference number: 5201200021) and ratified by the University of Western Australia (UWA). Participants received payment or course credit for their time and provided written consent before participating.


Eighteen autistic adults were tested at the University of Western Australia (UWA; Perth, Australia). All adults reported that they had been formally diagnosed with Autism or Asperger syndrome by a clinical psychologist in line with DSM-IV criteria (American Psychiatric Association, 2000). As such, they would automatically qualify for a diagnosis of Autism Spectrum Disorder under DSM-5 (American Psychiatric Association, 2013). Participants also completed the Ritvo Autism Asperger Diagnostic Scale-Revised (RAADS-R; Ritvo et al., 2011), a self-report diagnostic measure that we used to provide a uniform diagnostic assessment. All but one participant (score = 60) scored above the diagnostic threshold of 65 on the RAADS-R. This participant was included in the final analyses because their score was close to threshold, and the pattern of effects did not change when they were excluded.

Nonverbal IQ was assessed using the Matrices subtest of the Kaufman Brief Intelligence Test – Second Edition (KBIT-2; Kaufman & Kaufman, 2004). One participant was excluded because their nonverbal IQ score was below 85. This resulted in a final sample of 17 autistic adults (6 females). Their performance was compared to 17 control participants (6 females) with typical development who were tested at Macquarie University (MQ; Sydney, Australia). The two groups were matched on gender, age and nonverbal IQ. No control participant scored above threshold on the RAADS-R. Relative to the control group, autistic participants scored higher on the Autism Quotient (AQ; Baron-Cohen, 2003) and lower on the Empathising Quotient (EQ; Wheelwright et al., 2006), with no significant group difference on the Systemising Quotient (SQ; Wheelwright et al., 2006). Demographic and questionnaire data for each group are shown in Table 1. All participants had normal vision, and reported no known history of acquired neurological impairment or injury.

Table 1. Participant details

                Autism group        Control group       Statistics
                M         SD        M         SD        t               p
Age             26.47     11.86     26.43     14.53     t(32) = 0.01    .993
Nonverbal IQ    105.94    13.45     105.70    12.46     t(32) = 0.05    .958
RAADS-R         126.44    25.47     53.06     25.06     t(31) = 9.87    < .001
AQ              27.81     11.08     9.24      6.31      t(31) = 5.96    < .001
EQ              27.31     10.93     55.35     10.69     t(31) = -7.45   < .001
SQ              66.44     25.47     53.05     25.06     t(31) = 1.52    .139

Note. Nonverbal IQ scores were based on the standard score obtained using the KBIT-2 Matrices subtest (Kaufman & Kaufman, 2004). Total raw scores are reported for the Ritvo Autism Asperger Diagnostic Scale-Revised (RAADS-R; Ritvo et al., 2011), Autism Quotient (AQ; Baron-Cohen, 2003), Empathising Quotient (EQ), and Systemising Quotient (SQ; Wheelwright et al., 2006).
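As a rough check on how the t statistics in Table 1 relate to the reported means and standard deviations, the sketch below recomputes one of them from summary values. It assumes, which the text does not state, that these comparisons are standard pooled-variance independent-samples t-tests with 17 participants per group; the function name is hypothetical.

```python
import math

def pooled_t_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Student's t (equal variances assumed) and its df from group summaries.

    Assumption: Table 1 uses a pooled-variance test; the paper does not say so.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / standard_error, df

# Age row of Table 1: autism group vs control group, n = 17 each.
t_age, df_age = pooled_t_from_summary(26.47, 11.86, 17, 26.43, 14.53, 17)
print(round(t_age, 2), df_age)
```

Under these assumptions the Age row gives t of about 0.01 on 32 degrees of freedom, in line with the tabled value. Rows reported with 31 degrees of freedom presumably have one missing questionnaire score, so they cannot be reproduced without knowing which group it belongs to.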

Joint Attention Task

Social conditions (RJA and IJA). In the social conditions, participants played a cooperative game with an avatar that they believed represented the gaze behaviour of another person named “Alan” who was in a nearby eye-tracking laboratory. Alan was represented by a face, generated using FaceGen (Singular Inversions, 2008), that subtended 6.5 degrees of visual angle in the centre of the screen. His eyes could be directed either at the participant or towards one of the six houses that were presented on the screen (see Figures 1 and 2). The houses, which each subtended 4 degrees of visual angle, were arranged in two horizontal rows above and below the avatar. Participants were told that Alan could control the avatar’s gaze using live-infrared eye-tracking over a high-speed network. In reality, a gaze-contingent algorithm used the online recordings of the participant’s eye movements to program the avatar’s responsive behaviour (see Caruana et al., 2015, for a description of this algorithm and a video depicting example trials from each condition).

Figure 1. Experimental display showing the central avatar (“Alan”) and the six houses in which the burglar could be hiding. Gaze areas of interest (GAOIs), represented by blue rectangles, were not visible to participants.
Figure 2. Schematic representation of trial sequence by condition. The eye symbol represents the fixation required by the participant and was not visible to the participant. Analysis periods for each eye-tracking analysis are indicated in red cells.

Search phase. Each trial began with a search phase. During this period, the participant and the avatar (i.e., Alan) were required to search for the burglar. The participant searched houses with blue doors (e.g., bottom row in Figure 1) while the avatar searched houses with red doors (e.g., top row in Figure 1). Each time the participant fixated upon a blue door, it opened to reveal either an empty house or the burglar (Figure 2, first column). However, from the participant’s perspective, Alan’s doors remained closed as he completed his search. Participants were able to search their houses in any order they chose. On some trials, one or two blue doors were already open at the start of the trial, revealing an empty house. This ensured a different pattern of gaze behaviour on each trial made by the participant and provided a context for non-repetitive patterns of gaze behaviour made by the avatar.

RJA. On RJA trials, the participant opened all the blue doors to find them empty (Figure 2, row 1) and could thus infer that the burglar was hiding in one of Alan’s houses. Once the participant fixated back on the avatar’s face, he searched 0-2 more houses before establishing mutual gaze with the participant. This ensured that, for a brief interval, participants were required to monitor the avatar’s gaze behaviour to determine whether Alan was ready to initiate a joint attention bid. We randomised the location of the house that the avatar searched last to ensure that it was not predictive of the burglar’s location. Provided that the participant was still looking at the avatar when the avatar returned eye contact, the avatar then directed his gaze to the burglar’s location. The participant was required to follow the avatar’s gaze and fixate on the same location. We refer to this eye movement as a “responding saccade”.

IJA. On IJA trials, the participant found the burglar behind one of the blue doors (Figure 2, row 3). The relevant blue door “closed” to conceal the burglar once the participant fixated back on the avatar’s face. Again, the avatar searched 0-2 more houses before making eye contact with the participant. Once eye contact was established, participants were required to initiate joint attention by fixating on the blue door that concealed the burglar. We refer to this eye movement as an “initiating saccade”. The avatar’s gaze was programmed so that it always followed the participant’s initiating saccade even if the participant fixated on the incorrect house. The avatar only responded to a participant’s initiating saccade after eye contact was established.
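The responsive behaviour described for RJA and IJA trials can be summarised as a small decision rule. The following is only an illustrative sketch, not the gaze-contingent algorithm of Caruana et al. (2015): the function name, the string labels for houses and the use of `None` to mean "no gaze shift yet" are all hypothetical.

```python
from typing import Optional

# Hypothetical labels for the six house areas of interest.
HOUSES = {"house1", "house2", "house3", "house4", "house5", "house6"}

def avatar_gaze_target(trial_type: str,
                       eye_contact_established: bool,
                       participant_fixation: str,
                       burglar_house: str) -> Optional[str]:
    """Return the house the avatar should look at next, or None for no shift.

    RJA: after eye contact, the avatar cues the burglar's true location.
    IJA: after eye contact, the avatar follows the participant's gaze to
         whichever house they fixate, even if it is the wrong one.
    """
    if not eye_contact_established:
        # The avatar never responds before eye contact has been made.
        return None
    if trial_type == "RJA":
        return burglar_house
    if trial_type == "IJA":
        return participant_fixation if participant_fixation in HOUSES else None
    raise ValueError(f"unknown trial type: {trial_type}")
```

For example, on an IJA trial where the participant looks at `house2` after eye contact while the burglar is in `house5`, this sketch returns `house2`, mirroring the rule that the avatar follows an incorrect initiating saccade.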

Feedback. For “correct” RJA and IJA trials, the burglar appeared behind bars to indicate that the participant and Alan had succeeded in achieving joint attention to capture the burglar (e.g., Figure 2, seventh column). On “incorrect” trials, the burglar appeared in red at its true location to provide negative feedback. This occurred if participants (1) spent more than three seconds fixated on the background (i.e., away from the houses or the avatar stimulus), (2) took longer than three seconds to execute a responding or initiating saccade after being guided (RJA trials) or establishing eye contact (IJA trials), or (3) made a responding or initiating saccade to an incorrect location. If participants took longer than three seconds to begin searching their houses at the beginning of the trial, red text reading “Failed Search” appeared on the screen and the trial was terminated.
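The feedback rules above amount to a simple classification of each trial. A minimal sketch follows, assuming hypothetical per-trial summary values (search onset, longest background fixation, saccade latency and saccade target); none of these variable names come from the original task code.

```python
from typing import Optional

TIMEOUT_MS = 3000  # the three-second limits described in the text

def classify_trial(search_onset_ms: float,
                   longest_background_fixation_ms: float,
                   saccade_latency_ms: Optional[float],
                   saccade_target: Optional[str],
                   burglar_location: str) -> str:
    """Label a trial 'failed_search', 'incorrect' or 'correct'."""
    if search_onset_ms > TIMEOUT_MS:
        # Participant took too long to start searching their houses.
        return "failed_search"
    if longest_background_fixation_ms > TIMEOUT_MS:
        # Rule (1): too long spent looking away from houses and avatar.
        return "incorrect"
    if saccade_latency_ms is None or saccade_latency_ms > TIMEOUT_MS:
        # Rule (2): responding or initiating saccade missing or too slow.
        return "incorrect"
    if saccade_target != burglar_location:
        # Rule (3): saccade made to the wrong location.
        return "incorrect"
    return "correct"
```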

Non-social conditions (RJAc and IJAc). We developed two non-social conditions to control for the non-social task demands involved when responding to (RJAc; Figure 2, second row) and initiating (IJAc; Figure 2, fourth row) joint attention bids in the social conditions (i.e. to control for task complexity, attentional load, and number of eye movements required). The only differences between the non-social and social conditions were that (1) the avatar’s eyes remained closed throughout, (2) a grey fixation point was presented over the avatar’s nose until the participant completed their search and fixated upon it, (3) the fixation point turned green when fixated (analogous to the avatar making eye contact), and (4) on RJAc trials, a green arrow subtending three degrees of visual angle cued the burglar’s location (analogous to the gaze cue on RJA trials).


To match the testing environments at the two sites (UWA and MQ) as closely as possible, we ensured that the testing rooms were similar in size and had no windows, and that the experimenter was positioned behind the participant during testing. The same experimenter (NC) conducted every testing session at both sites, and all participants were given the same instructions (see Supplementary Material 1). Stimuli at both testing sites were presented at the same visual angle, and eye movements were recorded using identical eye-tracking systems and recording parameters (described below).

Eye-tracking. Eye-movements were recorded with a sampling rate of 500Hz from the right eye using a desktop-mounted EyeLink 1000 Remote Eye-Tracking System (SR Research Ltd., Ontario, Canada). A chinrest was used to stabilise head movements and standardise viewing distance. We conducted a 9-point eye-tracking calibration and validation at the beginning of each block. Seven gaze-related areas of interest (GAOI) were used by our gaze-contingent algorithm (depicted by blue rectangles in Figure 1). A GAOI covered each of the six houses and the avatar. Eye movements were monitored online and recalibration was conducted on trials where the participant made at least two consecutive fixations on the borders or outside the GAOIs. The trials requiring recalibration were excluded from all analyses. On average this accounted for 0.87% of trials from the autism group (SD = 1.14) and 0.05% of trials from the control group (SD = 0.22).
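To illustrate how fixations might be mapped onto the seven GAOIs and how the recalibration rule could be checked, here is a sketch under simplifying assumptions: GAOIs are axis-aligned rectangles in screen pixels, a fixation exactly on a border counts as a miss, and all names and coordinates are hypothetical rather than taken from the experiment.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class Rect:
    """A rectangular gaze area of interest in (hypothetical) pixel units."""
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        # Strict inequalities: points on the border are treated as misses.
        return self.left < x < self.right and self.top < y < self.bottom

def locate(x: float, y: float, gaois: List[Rect]) -> Optional[str]:
    """Name of the GAOI containing the fixation, or None if outside all."""
    for rect in gaois:
        if rect.contains(x, y):
            return rect.name
    return None

def needs_recalibration(fixations: List[Tuple[float, float]],
                        gaois: List[Rect]) -> bool:
    """True if at least two consecutive fixations miss every GAOI."""
    consecutive_misses = 0
    for x, y in fixations:
        if locate(x, y, gaois) is None:
            consecutive_misses += 1
            if consecutive_misses >= 2:
                return True
        else:
            consecutive_misses = 0
    return False
```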

Joint attention task. The task was presented using Experiment Builder 1.10.165 (SR Research, 2004). At the beginning of the experiment, a scripted set of instructions was read aloud to the participant, and a series of cue cards were used to provide a schematic representation of the interactive eye-tracking interface (see Supplementary Materials 1). Participants then completed two blocks of trials (Block 1 and Block 2). Each block comprised 27 trials from each condition (i.e., RJA, RJAc, IJA, IJAc). Social (RJA, IJA) and non-social (RJAc, IJAc) trials were presented in clusters of six trials throughout each block. Each cluster began with a 1000 millisecond (ms) cue presented in the centre of the screen that read “Together” for a social cluster and “Alone” for a non-social cluster of trials. The randomisation of trial order within and across clusters was constrained to ensure that, within each block, conditions were matched on (1) the frequency that the burglar appeared in each location, (2) the number and location of houses that required searching on each trial, and (3) the number of gaze shifts made by the avatar before establishing eye contact.
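The trial counts implied by this design can be worked out directly. The cluster split in the last line assumes that every cluster contains exactly six trials and is either wholly social or wholly non-social, as the description suggests.

```latex
\begin{align*}
\text{trials per block} &= 27 \times 4 = 108,\\
\text{trials per participant} &= 108 \times 2 = 216,\\
\text{clusters per block} &= 108 / 6 = 18 \quad (9 \text{ social},\ 9 \text{ non-social}).
\end{align*}
```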

There were four trial-order protocols that could be completed on each block. Two required the participant to search the upper row of houses (upper blocks), and two required the participant to search the lower row of houses (lower blocks). For each pair of protocols, one began with a social cluster of trials (RJA, IJA) and the other began with a non-social cluster of trials (RJAc, IJAc). Each participant completed only one upper and one lower protocol. Protocol and cluster order were counterbalanced across participants, and matched between the autism and control groups. Participants were not offered any opportunity for practice so that learning effects between blocks could be examined.
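One way to picture the counterbalancing is to enumerate the block sequences it allows. The sketch below is an assumption-laden illustration: it treats a protocol as a (row, first cluster type) pair and cycles participants through the valid two-block sequences, which may differ from how assignment was actually carried out.

```python
from itertools import product

ROWS = ("upper", "lower")
FIRST_CLUSTER = ("social", "non-social")

# Four trial-order protocols: row searched x type of the opening cluster.
PROTOCOLS = list(product(ROWS, FIRST_CLUSTER))

def valid_sequences():
    """All (Block 1, Block 2) pairs using one upper and one lower protocol."""
    return [(b1, b2) for b1, b2 in product(PROTOCOLS, repeat=2)
            if b1[0] != b2[0]]

def assign(participant_index: int):
    """Hypothetical rotation of participants through the valid sequences."""
    sequences = valid_sequences()
    return sequences[participant_index % len(sequences)]

print(len(PROTOCOLS), len(valid_sequences()))
```

With four protocols and the one-upper, one-lower constraint there are eight possible sequences, so applying the same rotation within each group of 17 would keep the two groups matched on protocol and cluster order.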

Subjective ratings. At the end of testing, we interviewed participants about their subjective experience of the task to determine whether they were convinced that they were interacting with a real person. During the interview, participants rated their subjective experience of the task across a number of dimensions (see Supplementary Material 2 for procedural details and results).


Responding to joint attention (RJA, RJAc). We measured accuracy as the proportion of trials (excluding trials that required eye-tracking recalibration or where the participant failed to complete their search) where the participant succeeded in catching the burglar. For correct trials, we also measured saccadic reaction time, which was the latency (in ms) between the presentation of the orienting cue (gaze for RJA, arrow for RJAc) and the onset of the responding saccade that resulted in a fixation at the correct burglar location (see Figure 2, Analysis Period A).
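The two responding measures can be expressed compactly. This is a minimal sketch assuming each trial is a dictionary with hypothetical keys (`recalibrated`, `failed_search`, `correct`) and that cue and saccade onsets are timestamps in milliseconds.

```python
from typing import Dict, List, Optional

def accuracy(trials: List[Dict[str, bool]]) -> Optional[float]:
    """Proportion correct among trials that were neither recalibrated
    nor terminated as a failed search."""
    valid = [t for t in trials
             if not t["recalibrated"] and not t["failed_search"]]
    if not valid:
        return None  # no analysable trials
    return sum(t["correct"] for t in valid) / len(valid)

def saccadic_reaction_time(cue_onset_ms: float,
                           saccade_onset_ms: float) -> float:
    """Latency from the orienting cue (gaze or arrow) to the onset of the
    responding saccade, computed for correct trials only."""
    return saccade_onset_ms - cue_onset_ms
```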

Initiating joint attention (IJA, IJAc). In addition to trial accuracy, we derived two measures of participants’ use of eye contact. Target dwell time was the total amount of time (in ms) between finding the burglar and saccading back to the avatar’s face (see Figure 2, Analysis Period B). It is analogous to the saccadic reaction time (RJA, RJAc) insofar as it represents the time between the participant learning of the burglar’s location and making the next appropriate saccade. Premature initiating saccades was the proportion of trials on which participants made a saccade from the avatar to the burglar location before he had established eye contact (IJA) or the fixation point had turned green (IJAc; see Figure 2, Analysis Period C).
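The two eye-contact measures can be sketched in the same way. The field names below are hypothetical, and a trial is counted as premature if any saccade from the central stimulus to the burglar location begins before eye contact (IJA) or before the fixation point turns green (IJAc).

```python
from typing import Dict, List, Optional

def target_dwell_time(burglar_found_ms: float,
                      return_saccade_ms: float) -> float:
    """Time between finding the burglar and saccading back to the centre."""
    return return_saccade_ms - burglar_found_ms

def is_premature(saccade_onsets_ms: List[float], ready_ms: float) -> bool:
    """True if any initiating saccade started before the 'ready' event
    (eye contact on IJA trials, green fixation point on IJAc trials)."""
    return any(onset < ready_ms for onset in saccade_onsets_ms)

def premature_proportion(trials: List[Dict]) -> Optional[float]:
    """Proportion of trials containing at least one premature saccade."""
    if not trials:
        return None
    flags = [is_premature(t["saccade_onsets_ms"], t["ready_ms"])
             for t in trials]
    return sum(flags) / len(flags)
```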

Statistical Analyses

Joint attention task. For each measure, we conducted an analysis of variance (ANOVA) using the ezANOVA (ez) package in R (Lawrence, 2013), reporting the generalised eta squared (η²G) measure of effect size. Group (autism versus control) was the between-subjects factor, and condition (social versus non-social) and block (Block 1 versus Block 2) were within-subjects factors. Significant interactions were followed up with ANOVAs and Welch’s two sample unequal variances t-tests as appropriate (Welch, 1947). Full details for these analyses, including syntax and data screening, can be found in Supplementary Material 3. For reaction time measures (i.e. saccadic reaction time and target dwell time) we report analyses of the mean reaction time, having excluded trials with dwell times less than 150 ms or more than 3000 ms (as trials timed out after 3000 ms in the RJA condition). We also re-analysed all reaction time data taking the median of the untrimmed data (see Supplementary Material 3). This did not change the pattern of effects for any of the analyses.
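The trimming rule and the Welch follow-up tests can be illustrated in a few lines. This sketch is written in Python for illustration only (the reported analyses were run in R), and the function names are hypothetical. The degrees of freedom use the Welch-Satterthwaite approximation, which is why they are generally not whole numbers.

```python
import math
from statistics import mean, variance
from typing import List, Optional, Tuple

def trimmed_mean_rt(rts_ms: List[float],
                    low: float = 150.0,
                    high: float = 3000.0) -> Optional[float]:
    """Mean reaction time after dropping trials below 150 ms or above 3000 ms."""
    kept = [rt for rt in rts_ms if low <= rt <= high]
    return mean(kept) if kept else None

def welch_t(a: List[float], b: List[float]) -> Tuple[float, float]:
    """Welch's unequal-variances t statistic and approximate df."""
    va = variance(a) / len(a)  # squared standard error, group a
    vb = variance(b) / len(b)  # squared standard error, group b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

With 17 participants per group, the Welch degrees of freedom fall between 16 and 32, moving towards the lower bound as the group variances become more unequal.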


For each analysis, we report the main effects of condition and group to determine whether there were behavioural differences between the social and non-social conditions, and between autistic and control participants, respectively. In addition, we report the group*condition and group*condition*block interaction effects since we were primarily interested in exploring whether differences between autistic and control participants were specific to the social conditions, and whether these changed with increased exposure to the task (see Supplementary Material 3 for a full summary of the ANOVA output).

Responding to Joint Attention

Accuracy data are shown in Figure 3A. Participants made significantly more errors on RJA trials than RJAc trials [F(1,32) = 7.04, p = .012, η²G = 0.06]. Autistic adults made significantly more errors than control participants [F(1,32) = 9.04, p = .005, η²G = 0.13]. Importantly, we found a significant group*condition interaction [F(1,32) = 5.60, p = .024, η²G = 0.04]. Post hoc tests revealed that autistic adults made significantly more errors than the control group on RJA trials [t(16.97) = -3.08, p = .007], but not on RJAc trials [t(26.49) = -1.58, p = .127]. There was no significant group*condition*block effect [F(1,32) = 0.50, p = .485, η²G < 0.01].

Figure 3. Box plots displaying (A) average proportion of correct trials, and (B) average saccadic reaction times in RJA and RJAc conditions, separated by block and group. Data points represent individual participant means.

Saccadic reaction time data are presented in Figure 3B. Participants were significantly slower to respond on RJA trials than RJAc trials [F(1,32) = 73.65, p < .005, η²G = 0.33]. The main effect of group [F(1,32) = 3.96, p = .055, η²G = 0.07] and the group*condition interaction [F(1,32) = 3.57, p = .068, η²G = 0.02] failed to reach significance. However, there was a significant group*condition*block interaction [F(1,32) = 4.65, p = .039, η²G = 0.13] indicating different group*condition effects in the two blocks. For Block 1, there was a significant group*condition interaction [F(1,32) = 4.96, p = .033, η²G = 0.05] with the autistic participants being significantly slower than controls to respond on RJA trials [t(19.41) = 2.36, p = .029], but not on RJAc trials [t(20.45) = 1.25, p = .226]. For Block 2, there was no significant group by condition interaction [F(1,32) = 0.37, p = .546, η²G < 0.01] and no significant group effect for either RJA [t(28.86) = 1.12, p = .272] or RJAc [t(19.65) = 0.87, p = .392].

Initiating Joint Attention

Accuracy data for IJA and IJAc are shown in Figure 4A. There was no significant effect of condition [F(1,32) = 2.61, p = .116, η²G = 0.02]. Autistic adults made significantly more errors than controls [F(1,32) = 6.38, p = .017, η²G = 0.07]. We tested the effect of group in both conditions separately and found that autistic adults made more errors than controls on both IJA [t(30.58) = 2.45, p = .020] and IJAc trials [t(16.65) = 2.33, p = .033]. However, as depicted in Figure 4A, the majority of autistic participants performed at ceiling, and so these differences are largely driven by three individuals in the sample. There was no group*condition interaction [F(1,32) = 3.54, p = .069, η²G = 0.02] and no group*condition*block interaction [F(1,32) = 0.88, p = .355, η²G < 0.01].

Figure 4.  Box plots displaying (A) average proportion of correct trials in the IJA and IJAc conditions, (B) average dwell times on the burglar before establishing eye contact (IJA) or looking back at the fixation point (IJAc), and (C) average proportion of trials containing a saccade from the central stimulus to the burglar before the avatar made eye contact (IJA) or the fixation point turned green (IJAc), separated by block and group. Data points represent individual participant means.

Target dwell time data are presented in Figure 4B. Participants spent significantly more time fixated on the burglar before establishing eye contact on IJA trials relative to analogous eye movements on IJAc trials [F(1,32) = 10.68, p = .003, η²G = 0.04]. There was no main effect of group [F(1,32) = 2.06, p = .161, η²G = 0.05], and no group*condition [F(1,32) = 2.79, p = .104, η²G = 0.01] or group*condition*block interactions [F(1,32) = 1.99, p = .168, η²G < 0.03].

Data for prematurely initiated saccades are presented in Figure 4C. Participants made significantly more premature attempts at initiating joint attention on IJA trials relative to analogous eye movements on IJAc trials [F(1,32) = 20.76, p < .005, η²G = 0.14]. There was no significant main effect of group [F(1,32) = 1.02, p = .321, η²G = 0.02], no group*condition interaction [F(1,32) = 0.52, p = .478, η²G < 0.01] and no group*condition*block interaction [F(1,32) = 0.64, p = .429, η²G < 0.01].


Difficulty establishing joint attention is a cardinal feature of autism (American Psychiatric Association, 2013). However, little is known about joint attention abilities in older children or adults, most likely due to a lack of sensitive and ecologically valid experimental paradigms. In the current study, we addressed this issue using a novel interactive eye-tracking paradigm and provide the first evidence that joint attention impairments also affect autistic adults.

Responding to Joint Attention Bids

Compared to controls, autistic adults were less accurate at responding to the joint attention bid of an avatar. They also responded more slowly during the first block of testing. However, the autistic participants showed a significant improvement in response speed and, by the second block, were indistinguishable from control participants. Importantly, these group differences were specific to the social (RJA) condition: Autistic and non-autistic individuals did not differ in their responses to arrow cues in the non-social (RJAc) condition. Thus, the reduced and delayed ability to respond to joint attention exhibited by autistic participants cannot be explained by differences in oculomotor control, attention orienting, or executive function demands, which were equivalent in the RJA and RJAc conditions. Instead, the interaction between group and condition indicates that the difficulties of participants in the autism group were specific to the condition involving eye gaze cues.

One possible explanation for the difference between groups may be that autistic individuals have different sensitivities to the low-level perceptual properties that unavoidably differ between gaze cues and non-social arrow cues. However, existing empirical studies of autistic children and adults show little evidence of specific difficulties in processing eye gaze as compared to non-social attention cues (see Leekam, 2015; Nation & Penny, 2008). Importantly, there is a key difference between our RJA condition and conventional gaze-following tasks. In our study, the virtual partner made multiple eye-movements during each trial and participants had to differentiate eye-movements that were preceded by eye contact and thus signalled the intent to initiate joint attention, from eye-movements that were merely a continuation of their partner’s search (cf. Caruana, McArthur, Woolgar & Brock, 2016). This “intention monitoring” component is an important feature of real-life gaze-based interactions (Cary, 1978) but is absent from more conventional measures of gaze processing in which a single unambiguous gaze shift is presented on each trial. Poor performance in our RJA condition may, therefore, reflect difficulties determining the social relevance of different eye-gaze cues rather than a deficit in eye gaze processing per se.

Following this interpretation, our findings are consistent with the idea that joint attention impairments in autism reflect a difficulty in evaluating the meaning of particular gaze cues (i.e., what they tell us about the perspectives and intentions of others) rather than an inability to effectively discriminate and orient to gaze cues (Baron-Cohen, 1995; Senju & Johnson, 2009). They are also consistent with evidence that autistic individuals are less effective in using eye contact to understand the goals and actions of others (Phillips, Baron-Cohen, & Rutter, 1992) or to assess the relevance of an upcoming gaze shift (Böckler, Timmermans, Sebanz, Vogeley, & Schilbach, 2014).

Initiating Joint Attention Bids

On average, autistic participants made more errors than control participants in the initiating conditions. However, in contrast to the responding conditions, group differences were not specific to the social version of the task, but were evident for both IJA and IJAc. This finding indicates a difficulty with one or more of the task components that were common to both conditions, such as oculomotor control, attentional demands, or the requirement to remember the burglar’s location (recall that the burglar disappeared once the participant made a saccade back to the avatar). That said, it is important to note that the majority of participants in both groups performed at or close to ceiling in terms of successful trial completion (see Figure 4A). Our accuracy measure may, therefore, have lacked sensitivity to detect subtle group differences. However, we also considered two eye-tracking measures of how participants were completing the task – the length of time between finding the burglar and saccading back to the avatar; and the number of premature saccades. Again, there were no significant group differences, despite much greater individual variation.

These findings are at odds with previous studies of joint attention which suggest that IJA difficulties, unlike RJA difficulties, tend to persist into later development (Mundy et al., 1994). It has been suggested that IJA impairments in autism may be related to a reduced motivation to engage in social interactions (Chevallier, Kohls, Troiani, Brodkin, & Schultz, 2012). This idea is consistent with neuroimaging studies which associate the achievement of joint attention, following IJA behaviour, with activation in the ventral striatum, a region associated with social reward processing (Schilbach et al., 2010). It is possible that IJA difficulties were not observed in the current study because IJA behaviours were externally motivated by the goals defined by the task, rather than the participant’s intrinsic motivation to share a social experience with another person.

There were some interesting differences between the IJA and IJAc conditions that were common to both groups of participants. First, having found the burglar during the search phase, participants took longer to saccade back towards the avatar’s eyes in the IJA condition than the central fixation point in the IJAc condition. Second, they were more likely to make a premature guiding saccade to capture the burglar in the IJA condition than they were to make analogous saccades in the IJAc condition. Both findings were unexpected and may reflect the fact that participants expect a certain degree of flexibility from a human partner that they know not to expect from a computer. That is, participants may have expected Alan to follow their guiding gaze even when they did not wait to intentionally establish eye contact. Future studies could test this explanation by investigating whether participants are faster to establish eye contact, and make fewer premature initiating saccades, when they believe a virtual character is controlled by a computer rather than a human.

Furthermore, the fact that most participants attempted to initiate joint attention before establishing eye contact raises questions about the phenomenology of optimal joint attention behaviour, and how it ought to be measured and assessed. Specifically, it calls into question whether establishing eye contact should be considered a mandatory aspect of initiating joint attention, or simply an adaptive behaviour that may facilitate the achievement of joint attention under certain conditions. Naturalistic studies of genuine face-to-face interactions are needed to better characterise the role of eye contact during successful joint attention experiences between adults with typical development. This work will also inform the design and future implementation of joint attention paradigms.

Subjective Experiences

At the end of testing, we interviewed participants about their subjective experience of the task. Only two participants, both in the autism group, claimed to have suspected that Alan was not real. However, prior to being told that Alan was computer-controlled, neither participant had given any indication that the deception had been unsuccessful. For instance, when asked whether they preferred completing the task with Alan or on their own, one participant commented “Together…more accurate because you can see the other person’s perspective.”

The comments made by autistic individuals during the debriefing session also provide some intriguing insights into the difficulties they faced while completing the task. Six autistic adults explicitly stated that they found it challenging to complete a task that required them to establish and use eye contact. Different individuals commented that (1) “The eyes were harder to figure out”; (2) “Alone [i.e. non-social condition] was easier to complete because you didn’t have to catch his eye to tell him where to go”; (3) “When they [eyes] were closed I didn’t have to worry about him and what he wants. Didn’t have to have the patience to wait for him”; and (4) “I felt a bit anxious during the together task [i.e. social condition]. The alone task [i.e. non-social condition] was easier because it was clear what the dot and arrow meant […] I don’t normally look at peoples’ eyes […] In the game I had to look at the eyes […] Then I thought, ‘Why are eyes harder than arrows?’ So I decided to treat the eyes like arrows.” None of the control participants reported difficulties processing the eyes.

These comments demonstrate an awareness on the part of many autistic adults that establishing eye contact and using gaze as a communicative technique was challenging for them. This is consistent with a larger body of literature suggesting that autistic individuals find it difficult to use eye contact to understand others and regulate social interactions (e.g., Pelphrey, Shultz, Hudac, & Vander Wyk, 2011; Senju & Johnson, 2009). Specifically, this difficulty in establishing eye contact may also explain why some autistic adults demonstrated markedly more premature saccades and took longer to establish eye contact with the avatar on IJA trials. For example, one participant spent up to 30 seconds fixating on the burglar before establishing eye contact on IJA trials (median 3 seconds). Another spent up to 6 seconds fixating on the burglar (median 2 seconds), and made premature saccades on 81% of IJA trials. Whilst this was not representative of the entire autism group, this reluctance to establish eye contact could hinder the achievement of joint attention during the fast-paced social interactions of real life. Further investigation is needed to elucidate the factors that contribute to the interindividual variation in joint attention behaviour and experiences for autistic individuals. For instance, one focus for future work could be to investigate the relationship between individual differences in social anxiety and joint attention behaviour (Kuusikko et al., 2008).

Some autistic participants also indicated that, while they preferred to complete the task alone rather than with Alan, they also preferred the virtual interaction over real-life face-to-face interactions. They indicated that the computer interface provided a less anxiety-provoking social interaction: “I don’t like dealing with people so this was better. Feels like you’re socialising, but not. Feels more relaxed.” Another autistic participant preferred real-life interactions, but only if eye contact could be avoided: “[Virtual interface] makes it more comfortable . . . I am an ‘audio’ person. I like to ask things if they’re not clear. So I would prefer real life. Not face-to-face, but side-by-side.” Others noted that the virtual interface allowed them to focus on specific aspects of their social interaction without being overwhelmed by multiple cues: (1) “Easier to segment the task and interaction … only focus on one thing”; (2) “I can interact but don’t have too many things to think about”.


To our knowledge, this is the first study designed to use an ecologically-valid, objective, quantified, and experimentally-controlled measure to test the ability to respond to and initiate joint attention bids in autism. Our data indicate that autistic adults experience significant difficulties in responding to joint attention bids. Some autistic individuals also experienced difficulties in initiating joint attention but this was not consistent across our entire sample. These findings encourage further work investigating the individual characteristics that may account for the heterogeneity of joint attention abilities in autism. In particular, there is a need for further studies that apply these paradigms across larger samples, with individuals across the autism spectrum, and at different stages of development. Ideally, future studies would obtain additional detailed measures of individuals’ social functioning in daily life situations in order to determine which (if any) aspects of daily social functioning are associated with joint attention impairments. Virtual interaction paradigms could also be used in neuroimaging studies to investigate the neural correlates of atypical joint attention (cf. Caruana et al., 2015).

This study also highlights the potential for interactive computer-based tasks to guide the training of social information processing and communication skills among individuals with autism. Preliminary findings using virtual reality video games, albeit from a third person perspective, have already revealed promising gains in social cognitive skills in autistic children (Didehbani, Allen, Kandalaft, Krawczyk, & Chapman, 2016). Tasks like ours, which support real-time social interaction, may be used to identify the precise aspects of face-to-face social interactions that autistic people find difficult, and provide strategies that are likely to make social communication more effective and possibly less stressful. The minimal “social” environment of a computer interface may allow individuals to become gradually accustomed to one aspect of social communication at a time (e.g., eye gaze) whilst other aspects of the social interaction and environment are controlled. This could provide a less perceptually-overwhelming context for autistic individuals to develop their skills in social information processing and communication. This approach has the potential to make social interactions more pleasant and less intimidating for autistic individuals, whilst improving opportunities for social learning, language acquisition, and fostering the development of social relationships (Adamson et al., 2009; Baron-Cohen, 1995; Charman, 2003; Mundy et al., 1990; Murray et al., 2008; Tomasello, 1995).


Adamson, L. B., Bakeman, R., Deckner, D. F., & Romski, M. A. (2009). Joint engagement and the emergence of language in children with autism and Down syndrome. Journal of Autism & Developmental Disorders, 39(1), 84-96. doi: 10.1007/s10803-008-0601-7

American Psychiatric Association. (2000). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC

American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders (5th ed.). Washington, DC

Bakeman, R., & Adamson, L. B. (1984). Coordinating attention to people and objects in mother-infant and peer-infant interactions. Child Development, 55(4), 1278. doi: 10.1111/1467-8624.ep7302943

Baron-Cohen, S. (1995). Mindblindness. Cambridge, MA: MIT Press.

Baron-Cohen, S. (2008). Autism, hypersystemizing, and truth. Quarterly Journal of Experimental Psychology, 61(1), 64-75. doi: 10.1080/17470210701508749

Bates, E., Benigni, L., Bretherton, I., Camaioni, L., & Volterra, V. (1979). The emergence of symbols: Cognition and communication in infancy. New York: Academic Press.

Bayliss, A. P., Murphy, E., Naughtin, C. K., Kritikos, A., Schilbach, L., & Becker, S. I. (2013). Gaze leading: Initiating simulated joint attention influences eye movements and choice behavior. Journal of Experimental Psychology: General, 142(1), 76-92.

Böckler, A., Timmermans, B., Sebanz, N., Vogeley, K., & Schilbach, L. (2014). Effects of observing eye contact on gaze following in high-functioning autism. Journal of Autism & Developmental Disorders, 44(7), 1651-1658. doi: 10.1007/s10803-014-2038-5

Brock, J. (2011). Commentary: Complementary approaches to the developmental cognitive neuroscience of autism – reflections on Pelphrey et al. (2011). Journal of Child Psychology and Psychiatry, 52(6), 645-646. doi: 10.1111/j.1469-7610.2011.02414.x

Bruinsma, Y., Koegel, R. L., & Koegel, L. K. (2004). Joint attention and children with autism: A review of the literature. Mental Retardation & Developmental Disabilities Research Reviews, 10(3), 169-175. doi: 10.1002/mrdd.20036

Caruana, N., Brock, J., & Woolgar, A. (2015). A frontotemporoparietal network common to initiating and responding to joint attention bids. NeuroImage, 108, 34-46. doi: 10.1016/j.neuroimage.2014.12.041

Caruana, N., McArthur, G., Woolgar, A., & Brock, J. (2016). Detecting communicative intent in a computerised test of joint attention. PeerJ Preprints, 4, e2410v1.

Cary, M. S. (1978). The role of gaze in the initiation of conversation. Social Psychology, 41(3), 269-271. doi: 10.2307/3033565

Charman, T. (2003). Why is joint attention a pivotal skill in autism? Philosophical Transactions of the Royal Society of London B: Biological Sciences, 358, 315-324. doi: 10.1098/rstb.2002.1199

Charman, T., Swettenham, J., Baron-Cohen, S., Cox, A., Baird, G., & Drew, A. (1997). Infants with autism: An investigation of empathy, pretend play, joint attention, and imitation. Developmental Psychology, 33(5), 781-789. doi: 10.1037/0012-1649.33.5.781

Chawarska, K., Klin, A., & Volkmar, F. (2003). Automatic attention cueing through eye movement in 2-year-old children with autism. Child Development, 74(4), 1108-1122.

Chevallier, C., Kohls, G., Troiani, V., Brodkin, E. S., & Schultz, R. T. (2012). The social motivation theory of autism. Trends in Cognitive Sciences, 16, 231-239.

Courgeon, M., Rautureau, G., Martin, J.-C., & Grynszpan, O. (2014). Joint attention simulation using eye-tracking and virtual humans. IEEE Transactions on Affective Computing, 5(3), 238-250.

Dawson, G., Toth, K., Abbott, R., Osterling, J., Munson, J. A., Estes, A., & Liaw, J. (2004). Early social attention impairments in autism: Social orienting, joint attention, and attention to distress. Developmental Psychology, 40(2), 271-283.

Didehbani, N., Allen, T., Kandalaft, M., Krawczyk, D., & Chapman, S. (2016). Virtual reality social cognition training for children with high functioning autism. Computers in Human Behavior, 62, 703-711.

Friesen, C., & Kingstone, A. (1998). The eyes have it! Reflexive orienting is triggered by nonpredictive gaze. Psychonomic Bulletin & Review, 5(3), 490-495. doi: 10.3758/bf03208827

Frischen, A., Bayliss, A. P., & Tipper, S. P. (2007). Gaze cueing of attention: visual attention, social cognition, and individual differences. Psychological Bulletin, 133(4), 694-724. doi: 10.1037/0033-2909.133.4.694

Hart, S. G. (2006). NASA-Task Load Index (NASA-TLX); 20 Years Later. Paper presented at the Human Factors and Ergonomics Society 50th Annual Meeting, Santa Monica: HFES.

Hobson, J., & Hobson, P. (2007). Identification: The missing link between joint attention and imitation? Development and Psychopathology, 19(02), 411-431. doi: 10.1017/S0954579407070204

Johnson, M. H., Griffin, R., Csibra, G., Halit, H., Farroni, T., De Haan, M., Tucker, L. A., Baron-Cohen, S., & Richards, J. (2005). The emergence of the social brain network: Evidence from typical and atypical development. Development and Psychopathology, 17(03), 599-619. doi: 10.1017/S0954579405050297

Kaufman, A. S., & Kaufman, N. L. (2004). Kaufman Brief Intelligence Test, Second Edition. Bloomington, MN: Pearson, Inc.

Kim, K., & Mundy, P. (2012). Joint attention, social-cognition, and recognition memory in adults. Frontiers in Human Neuroscience, 6,

Kuusikko, S., Pollock-Wurman, R., Jussila, K. et al. (2008). Journal of Autism and Developmental Disorders, 38, 1697. doi:10.1007/s10803-008-0555-9

Kylliainen, A., & Hietanen, J. K. (2004). Attention orienting by another’s gaze direction in children with autism. Journal of Child Psychology and Psychiatry, 45(3), 435-444. doi: 10.1111/j.1469-7610.2004.00235.x

Lawrence, M. A. (2013). ez: Easy analysis and visualization of factorial experiments. R package version 4.2-2.

Leekam, S. (2015). Social cognitive impairment and autism: what are we trying to explain? Philosophical Transactions of the Royal Society of London B: Biological Sciences, 371(1686).

Lord, C., Risi, S., Lambrecht, L., Cook, E. H., Leventhal, B. L., DiLavore, P. C., Pickles, A., & Rutter, M. (2000). The autism diagnostic observation schedule-generic: A standard measure of social and communication deficits associated with the spectrum of autism. Journal of Autism and Developmental Disorders, 30(3), 205-223. doi: 10.1023/A:1005592401947

Loveland, K. A., & Landry, S. H. (1986). Joint attention and language in autism and developmental language delay. Journal of Autism & Developmental Disorders, 16(3), 335-349. doi: 10.1007/BF01531663

Mundy, P., Sigman, M., & Kasari, C. (1990). A longitudinal study of joint attention and language development in autistic children. Journal of Autism and Developmental Disorders, 20(1), 115-128. doi: 10.1007/bf02206861

Mundy, P., Sigman, M., & Kasari, C. (1994). Joint attention, developmental level, and symptom presentation in autism. Development and Psychopathology, 6(03), 389-401. doi: 10.1017/S0954579400006003

Mundy, P., Sigman, M., Ungerer, J., & Sherman, T. (1986). Defining the social deficits of autism: The contribution of non-verbal communication measures. Journal of Child Psychology and Psychiatry, 27(5), 657-669. doi: 10.1111/j.1469-7610.1986.tb00190.x

Murray, D. S., Creaghead, N. A., Manning-Courtney, P., Shear, P. K., Bean, J., & Prendeville, J. A. (2008). The relationship between joint attention and language in children with autism spectrum disorders. Focus on Autism & Other Developmental Disabilities, 23(1), 5-14. doi: 10.1177/1088357607311443

Nation, K., & Penny, S. (2008). Sensitivity to eye gaze in autism: Is it normal? Is it automatic? Is it social? Development and Psychopathology, 20(01), 79-97. doi: 10.1017/S0954579408000047

Okada, T., Sato, W., Murai, T., Kubota, Y., & Toichi, M. (2003). Eye gaze triggers visuospatial attentional shift in individuals with autism. Psychologia, 46(4), 246-254. doi: 10.2117/psysoc.2003.246

Osterling, J., & Dawson, G. (1994). Early recognition of children with autism: A study of first birthday home videotapes. Journal of Autism and Developmental Disorders, 24(3), 247-257. doi: 10.1007/bf02172225

Osterling, J., Dawson, G., & Munson, J. A. (2002). Early recognition of 1-year-old infants with autism spectrum disorder versus mental retardation. Development and Psychopathology, 14(02), 239-251. doi: 10.1017/S0954579402002031

Pelphrey, K. A., Shultz, S., Hudac, C. M., & Vander Wyk, B. C. (2011). Research Review: Constraining heterogeneity: the social brain and its development in autism spectrum disorder. Journal of Child Psychology and Psychiatry, 52(6), 631-644. doi: 10.1111/j.1469-7610.2010.02349.x

Pfeiffer, U. J., Vogeley, K., & Schilbach, L. (2013). From gaze cueing to dual eyetracking: Novel approaches to investigate the neural correlates of gaze in social interaction. Neuroscience & Biobehavioral Reviews, 37(10), 2516-2528. doi: 10.1016/j.neubiorev.2013.07.017

Phillips, W., Baron-Cohen, S., & Rutter, M. (1992). The role of eye contact in goal detection: Evidence from normal infants and children with autism or mental handicap. Development and Psychopathology, 4(03), 375-383. doi: 10.1017/S0954579400000845

Posner, M. I. (1980). Orienting of attention. Quarterly Journal of Experimental Psychology, 32(1), 3-25.

Ristic, J., Mottron, L., Friesen, C. K., Iarocci, G., Burack, J. A., & Kingstone, A. (2005). Eyes are special but not for everyone: The case of autism. Cognitive Brain Research, 24(3), 715-718. doi: 10.1016/j.cogbrainres.2005.02.007

Ritvo, R. A., et al. (2011). The Ritvo Autism Asperger Diagnostic Scale-Revised (RAADS-R): A scale to assist the diagnosis of autism spectrum disorder in adults: An international validation study. Journal of Autism and Developmental Disorders, 41, 1076-1089.

Scaife, M., & Bruner, J. S. (1975). The capacity for joint visual attention in the infant. Nature, 253(5489), 265-266. doi:10.1038/253265a0

Schilbach, L., Wilms, M., Eickhoff, S. B., Romanzetti, S., Tepest, R., Bente, G., … Vogeley, K. (2010). Minds made for sharing: Initiating joint attention recruits reward-related neurocircuitry. Journal of Cognitive Neuroscience, 22(12), 2702-2715.

Senju, A., & Johnson, M. H. (2009). The eye contact effect: Mechanisms and development. Trends in Cognitive Sciences, 13(3), 127-134.

Sigman, M., & Ruskin, E. (1999). Continuity and change in the social competence of children with autism, Down syndrome and developmental delays. Monographs of the Society for Research in Child Development, 64(1), 1-114. doi: 10.1111/1540-5834.00002

SR Research. (2004). Experiment Builder (Version 1.10.165). Ontario.

Stone, W. L., Ousley, O. Y., & Littleford, C. D. (1997). Motor imitation in young children with autism: what’s the object? Journal of Abnormal Child Psychology, 25, 475-485. doi: 10.1023/A:1022685731726

Svensson, E. (2001). Guidelines to statistical evaluation of data from rating scales and questionnaires. Journal of Rehabilitation Medicine, 33(1), 47-48.

Tomasello, M. (1995). Joint attention as social cognition. In C. Moore & P. J. Dunham (Eds.), Joint attention: Its origins and role in development. Hillsdale: Lawrence Erlbaum Associates.

Vlamings, P. H. J. M., Stauder, J. E. A., van Son, I. A. M., & Mottron, L. (2005). Atypical visual orienting to gaze- and arrow-cues in adults with high functioning autism. Journal of Autism and Developmental Disorders, 35(3), 267-277. doi: 10.1007/s10803-005-3289-y

Welch, B. L. (1947). The generalization of ‘Student’s’ problem when several different population variances are involved. Biometrika, 34, 28-35.

Wheelwright, S., Baron-Cohen, S., Goldenfeld, N., Delaney, J., Fine, D., Smith, R., Weil, L., & Wakabayashi, A. (2006). Predicting Autism Spectrum Quotient (AQ) from the Systemizing Quotient-Revised (SQ-R) and Empathy Quotient (EQ). Brain Research, 1079(1), 47-56.

Wong, C., & Kasari, C. (2012). Play and joint attention of children with autism in the preschool special education classroom. Journal of Autism and Developmental Disorders, 42(10), 2152-2161. doi: 10.1007/s10803-012-1467-2