The hardest butter to button: immediate context effects in spoken word identification

Brock, J. & Nation, K. (2014). The hardest butter to button: immediate effects of sentence context in spoken word recognition. Quarterly Journal of Experimental Psychology, 67, 114-123.

Abstract

According to some theories, the context in which a spoken word is heard has no impact on the earliest stages of word identification. This view has been challenged by recent studies indicating an interactive effect of context and acoustic similarity on language-mediated eye-movements (e.g., Dahan & Tanenhaus, 2004). However, an alternative explanation for these results is that participants looked less at acoustically similar objects in constraining contexts simply because they were looking more at other objects that were cued by the context. The current study addressed this concern whilst providing a much finer-grained analysis of the temporal evolution of context effects. 32 adults listened to sentences while viewing a computer display showing four objects. As expected, shortly after the onset of a target word (e.g., “button”) in a neutral context, participants saccaded preferentially towards a cohort competitor of the word (e.g., butter). This effect was significantly reduced when the preceding verb made the competitor an unlikely referent (e.g., “Sam fastened the button”), even though there were no other contextually congruent objects in the display. Moreover, the time-course of these two effects was identical to within approximately 30 milliseconds, indicating that certain forms of contextual information can have a near-immediate effect on word identification.

 

Common to most models of spoken word recognition is the supposition that multiple lexical entries are activated in parallel according to their perceived match to the unfolding acoustic input (Gaskell & Marslen-Wilson, 2002; Luce & Pisoni, 1998; McClelland & Elman, 1986; Norris, 1994). For example, Marslen-Wilson and Welsh (1978) proposed that, at the onset of a single spoken word, a cohort of potential lexical candidates is initially activated and is then progressively whittled down as more acoustic information becomes available. Subsequent models also allow for partial activation of words that share acoustic features later on, even if they do not match at onset (see Dahan & Magnuson, 2006).

These models are based primarily on data from studies in which words have been presented out of context, such that listeners have no prior expectations about the word identity, other than the word’s frequency of occurrence. However, in everyday listening situations (as opposed to psycholinguistic experiments), words are seldom heard in complete isolation and there are various sources of contextual information that could, at least in principle, place constraints upon lexical activation. While there is little doubt that context can affect the identification of words, it remains unclear precisely when in the process of speech perception such context effects take place. A long-standing view is that lexical candidates are initially accessed entirely on the basis of bottom-up sensory information, with contextual information only playing a role during the subsequent selection of a single best-fitting candidate (Marslen-Wilson, 1984; Norris, 1986) or integration of the word meaning into the higher-level sentence representation (Forster, 1976; Tanenhaus, Carlson, & Seidenberg, 1985). According to such models, hearing the sentence “Joe fastened the button” will momentarily activate the lexical representation of “butter”, even though butter is clearly an incongruent completion to the sentence. In contrast, other researchers have argued in favour of earlier effects of context (e.g., Lucas, 1999; Dahan & Tanenhaus, 2004), although, as discussed below, current evidence for immediate effects is problematic.

Until fairly recently, most of the evidence relating to context effects on word identification came from the cross-modal semantic priming paradigm. In one such experiment, Onifer and Swinney (1981) presented spoken sentences including a homophone – a semantically ambiguous word such as “organ” – followed by a written target word or nonword to which the subject made a lexical decision response. Reaction times for associates of one meaning of the homophone (e.g., KIDNEY) were reduced, even when the context was biased towards the alternative meaning of the homophone. The authors argued, therefore, that context effects were delayed, such that both (all) meanings of the homophone were initially accessed. Nevertheless, their data showed a numerical effect of context, with response times being relatively slower for associates of the contextually inappropriate meaning. This trend was confirmed by Lucas (1999) in a meta-analysis of 17 similar priming studies. Lucas interpreted this as evidence for an early effect of context. However, the problem for all such priming studies using homophone stimuli is that behavioural responses are made some time after the offset of the homophone (even when the target word is presented prior to homophone offset). Thus, it is difficult if not impossible to determine the point in processing at which the effect of context kicks in.

To address this issue of timing, Zwitserlood (1989) presented Dutch-speaking participants with auditory sentences that ended in an incomplete fragment of a prime word. Lexical decision times to cohort competitors of the prime word were significantly reduced. For example, the initial fragment of “kapitein” [captain] primed the word “geld” [money] – an associate of the cohort competitor “kapitaal” [capital]. This was true, even when the context was biased towards the “kapitein” interpretation, leading Zwitserlood to conclude that context does not constrain the initial activation of cohort competitors. However, the contextual constraints were not particularly strong – disambiguating information was presented in the sentence prior to that carrying the prime and the cohort competitor often remained a plausible if less likely completion to the sentence. Moreover, Janse and Quené (2004) criticized this study for employing an incomplete within-subjects design, meaning that the reported effects were confounded with between-subjects differences in effect size. Indeed, simulations indicated an extremely high (62%) probability of showing a priming effect when none was present in the data.

A number of more recent studies have explored similar issues by measuring event-related potentials (ERPs) to contextually congruent and incongruent words (van den Brink, Brown, & Hagoort, 2006; Van Petten, Coulson, Rubin, Plante, & Parks, 1999). Of particular note, Hagoort and colleagues (Hagoort & Brown, 2000; van den Brink, Brown, & Hagoort, 2001) identified an early negative component of the ERP, peaking at around 250 milliseconds after the onset of the word. This component was significantly reduced if the word was congruent with the preceding sentence context or was a cohort competitor of a congruent word. These findings were interpreted as evidence for an early but not immediate effect of context on lexical processing (cf. Marslen-Wilson, 1984). However, early ERP components are typically much shorter in duration than later components, making them highly susceptible to stimulus variation that results in temporal variability in the brain response (Penolazzi, Hauk, & Pulvermüller, 2007). Thus, one cannot rule out the possibility of even earlier effects on brain responses.

Further evidence comes from studies using the so-called ‘visual world’ paradigm, in which participants’ eye-movements are recorded as they listen to spoken sentences (Cooper, 1974; Tanenhaus, Spivey-Knowlton, Eberhard, & Sedivy, 1995). Two such studies have investigated the effects of sentence context on homophone processing, providing results broadly comparable to the corresponding semantic priming studies described earlier. Huettig and Altmann (2007) found that, on hearing the homophone “pen” in a sentence biased towards the ‘animal enclosure’ interpretation, participants looked more at a writing pen (or a visually similar object such as a needle) than at an unrelated object such as a bicycle. However, the contextual constraints were not particularly strong and both meanings remained plausible. Chen and Boland (2008) reported similar findings but noted that looks towards the object corresponding to the inappropriate homophone were modulated by context. While these authors interpreted their findings in terms of context effects on lexical access, it is important to note that the context effect did not achieve statistical significance until some 600 milliseconds after the onset of the homophone.

Other studies have looked at eye-movements directed at cohort competitors of target words. In a study with Dutch-speaking participants, Dahan and Tanenhaus (2004) reported that, on hearing the word “kanon” [cannon] in a neutral context, participants looked more at a cohort competitor, camel [“kameel”], than at other phonologically unrelated objects in the display. However, biasing sentence contexts abolished this cohort effect. For instance, if the word “kanon” was preceded by the verb “roest” [“rust”] for which camel is an implausible subject, looks towards the camel were significantly reduced and were not in fact significantly different to looks towards a phonologically unrelated object. Similar findings have since been reported by Barr (2008) using English stimuli, by Weber and Crocker (2012) in German, and by Revill, Tanenhaus, and Aslin (2008) using an artificial language. In each case, a constraining verb reduced fixations on incompatible objects (in the grammatical sense). However, in all four of these studies, the reduction in cohort competitor fixations in the biased condition was accompanied by an increase in fixations on the target object, which was a thematic fit for the verb. This represents a potentially serious confound because participants can only physically look at one object at a time. The effect of context on target fixations was apparent well before the onset of the cohort effect and, in some cases, even before the onset of the target word, suggesting that the effect is not mediated by lexical entries of the corresponding objects (Altmann & Kamide, 1999). Early effects of context on competitor fixations may simply be a by-product of the (even earlier) increase in target fixations and do not, therefore, constitute evidence for early effects on lexical processing.

Finally, Magnuson, Tanenhaus, and Aslin (2008) investigated the effects of form-class constraints on cohort competitor effects. Participants were trained to associate novel words with different textures and shapes, treated as adjectives and nouns respectively. When pragmatic considerations led participants to expect texture information, they looked less at objects whose shape was a cohort competitor of the texture word (and vice versa). Unlike in the studies using verb-mediated context, participants were not able to anticipate which specific object would be referred to. Nevertheless, the pragmatic context primed them to attend to one perceptual dimension of the stimuli (shape or texture) and it could be argued that this effect on visual attention was responsible for the context effects. Moreover, as the authors acknowledged, it is as yet unclear whether their findings generalize to natural language.

A further limitation of all the above-mentioned eye-tracking studies is that, while the authors have routinely claimed evidence for immediate effects of context on eye-movements, the analyses conducted have not actually allowed the testing of such a claim. Dahan and Tanenhaus (2004) considered the probability of gazing at the cohort competitor averaged across a window between 200 and 500 milliseconds after the onset of the target word. Even allowing for a 200 millisecond lag between the cognitive event and the corresponding eye-movement (Matin et al., 1993; but see Altmann, 2011), this still means that context effects could have arisen at any point during the acoustic lifetime of the word. Similarly, Magnuson et al. (2008) and Weber and Crocker (2012) considered gaze probability averaged across a 200-700 millisecond window. Chen and Boland (2008) conducted analyses at 100 millisecond intervals but, as already noted, did not find a significant context effect prior to 600 milliseconds. Barr (2008), in contrast, adopted a curve-fitting approach which considered the temporal evolution of gaze likelihood, but assumed a priori that the context effect began at 200 milliseconds.

In sum, despite strong claims in the literature, there is still a lack of compelling evidence that context effects are immediate. As such, accounts that assume an initial period of purely bottom-up lexical processing are yet to be convincingly refuted. Here, we report an eye-tracking study that, we argue, provides such evidence. Our design was based on the study by Dahan and Tanenhaus (2004) described earlier. However, rather than presenting targets and competitors together in the same display, on each trial we presented either a target, a competitor, or an unrelated object – in each case alongside three unrelated distracters. Thus any effect of context on competitor fixations could not be explained away in terms of confounding effects on target gaze.

This design also allowed us to take full advantage of the exquisite temporal precision afforded by eye-tracking technology and investigate the time-course of context effects on saccades towards the competitor. Specifically, we asked, for each moment in time following the onset of the target word, whether or not there was evidence for a cohort effect or context effect in the eye-movement record (see McMurray et al., 2008 for a similar approach). This allowed precise investigation of the time course of different linguistic effects, whilst making no assumptions about either the shape of the curve or the lag between the linguistic event and the corresponding eye-movement. If context effects are immediate then the time-course of these two effects should be near identical. If, however, there is an initial period of context-free lexical access then there should be an appreciable lag between the cohort effect and the context effect and participants should show a cohort effect (however brief), even when sentence context is biased against the competitor.

Method

Participants

Participants were 32 undergraduate students from Macquarie University, aged 18 to 23 years, all with normal or corrected-to-normal vision. Informed consent was obtained prior to the commencement of testing and participants were rewarded with course credits.

Stimuli

Sentences were recorded by a female native English speaker using natural intonation but with small pauses between words. One token of each word was used in the experiment, with sentences constructed by concatenating the individual words. Thus, intonation patterns were relatively consistent across sentences and the critical word tokens were identical across conditions. Each sentence described an agent acting upon an object and was composed of four words – a gender ambiguous name (e.g., Sam); a past tense verb; the definite article; and then the object noun, which was the target word (see Brock et al., 2008 for a full list of stimuli). In constrained sentences, the verb was chosen such that it was strongly associated with the object of the sentence (e.g., “Joe fastened the button”). In neutral sentences, the verb was always “chose” (e.g., “Sam chose the button”). Agents were pseudo-randomly chosen from four gender-neutral names.

Figure 1. Example stimulus display with interest areas overlain. The scan path shows fixations and saccades for a representative subject in the 1500 milliseconds immediately following the onset of the target word “button” in the neutral sentence “Joe chose the button”.

Visual stimuli were photo-quality pictures on a black background, taken primarily from the Hemera Photo-Objects Collection. Stimulus displays each contained four objects, located in the centre of each quadrant of the screen (see Figure 1). One of these was the critical object – which was either the target object mentioned in the sentence, a cohort competitor of the target, or an unrelated object. The remaining three were distracters, semantically unrelated to the verb and phonologically unrelated to the critical object.

Design

Condition            Trials   Example sentence               Critical object
Target neutral       16       “Joe chose the button”         Button
Competitor neutral   16       “Sam chose the button”         Butter
Unrelated neutral    16       “Ashley chose the button”      Lettuce
Competitor biased    16       “Alex fastened the button”     Butter
Target biased        8        “Sam played the violin”        Violin

Table 1: Conditions and example stimuli

A fully within-subjects design was employed (see Table 1). The three conditions involving the competitor and unrelated objects allowed evaluation of the cohort effect and the effect of sentence context. The target neutral condition provided an index of when there was sufficient acoustic information to allow reliable identification of the target word. In each of these four conditions, the location of the critical object was fully counterbalanced and the same subset of 16 objects played the role of critical object across the 16 trials, thus controlling for the intrinsic salience of the different objects (Kamide, Altmann, & Haywood, 2003). A fifth condition, Target biased, was included to ensure that the probability of the target being present was the same across neutral and biased contexts. Objects used as targets in these filler trials were not critical objects in the experimental trials.

Procedure

Eye-movements were recorded using an EyeLink 1000 tower-mounted eye-tracking system sampling at 500Hz, which required that the head was maintained in a fixed position on a chin rest, with a viewing distance of 40 cm. Stimuli were presented using Experiment Builder software.

At the start of the experiment, the visual stimuli were presented, one-by-one in a random order. Participants were required to name each object. If they did not provide the required name, they were informed by the experimenter of the correct response and the trial was repeated at the end of the sequence.

Next, four practice trials were completed, each of which involved four stimuli that were not critical items in the main experiment. Participants viewed the displays for 4 seconds and then heard a spoken sentence. Their instructions were to use the mouse to click on any object that was mentioned in the sentence as quickly as possible. If none of the objects were mentioned, they were to simply wait for the next trial. Trials concluded 1 second after the mouse click or, if there was no mouse response, 4 seconds after the onset of the final word of the sentence.

Test trials were completed in three blocks of 24 trials. Prior to each block, the eye-tracker was calibrated using a standard nine-point calibration routine. Each participant received the same experimental items but in a different fully randomized order.

Analysis

Areas of interest were rectangles, 25% of the screen height and width, centred on the critical object in the display (see Figure 1). Interest area reports were generated using DataViewer software for an interest period beginning at the onset of the target word and ending an arbitrary 1000 milliseconds later (we anticipated that effects of interest would occur much earlier than this).

Conventionally, language-mediated eye-movement data is analysed by conducting a by-subjects t-test or ANOVA on the probability of fixation on a particular type of object averaged across a pre-specified time window. However, this is problematic because it falsely assumes that the proportion data has a normal distribution and that consecutive observations are independent (Barr, 2008). Moreover, interpretation of the gaze probability measure is complicated by the fact that it conflates the effect of context on saccades landing after the onset of the target word with context effects apparent at target word onset, and with saccades away from the object in that period. A common solution to this problem is to exclude trials on which the participant is already gazing at the target at onset. However, this approach can itself introduce biases because one is preferentially excluding those trials on which there might be a saccade away from the target (Barr, Gann, & Pierce, 2011).

Our analyses, therefore, focused on saccades towards the critical object as a function of time since the onset of the critical word (see Altmann & Kamide, 1999 for a related approach). For each 10 millisecond segment of each trial, we coded as a binary variable whether or not the subject had looked at the critical object at any point in time since the onset of the target word. Trials on which the participant was already fixating on the critical object at target word onset were excluded from further analyses. Crucially, because we were interested only in saccades towards the critical object, this exclusion approach does not bias the results as it does for gaze plots (cf. Barr et al., 2011). Averaging across subjects and items produced a cumulative fixation probability plot (Figure 3a).
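The coding scheme described above can be illustrated with a minimal Python sketch. This is not the authors' analysis code. The function names, the representation of each trial as a list of (start, end) fixation intervals on the critical interest area (in milliseconds relative to target-word onset), and the convention of labelling each 10 millisecond segment by its end time are all assumptions made for illustration. The sketch also pools trials directly rather than averaging within subjects and items first.

```python
from typing import List, Optional, Tuple

BIN_MS = 10      # segment width stated in the text
EPOCH_MS = 1000  # interest period stated in the text

Fixation = Tuple[int, int]  # (start_ms, end_ms) on the critical object (hypothetical format)


def cumulative_look_series(fixations: List[Fixation],
                           bin_ms: int = BIN_MS,
                           epoch_ms: int = EPOCH_MS) -> Optional[List[int]]:
    """Return one 0/1 value per segment: has the critical object been
    looked at since target-word onset? Returns None for excluded trials."""
    # Exclude trials where a fixation on the critical object spans onset (t = 0).
    if any(start <= 0 < end for start, end in fixations):
        return None
    # Time of the first arrival on the critical object after onset, if any.
    arrivals = [start for start, _ in fixations if start > 0]
    first = min(arrivals) if arrivals else None
    # Once the object has been looked at, the value stays at 1 (cumulative).
    return [1 if first is not None and first <= bin_end else 0
            for bin_end in range(bin_ms, epoch_ms + 1, bin_ms)]


def cumulative_probability(trials: List[List[Fixation]]) -> List[float]:
    """Average the binary series across all non-excluded trials."""
    kept = [s for s in map(cumulative_look_series, trials) if s is not None]
    if not kept:
        return []
    return [sum(s[i] for s in kept) / len(kept) for i in range(len(kept[0]))]


if __name__ == "__main__":
    example_trials = [
        [(250, 400)],             # first look at 250 ms
        [(-50, 120)],             # already fixating at onset: excluded
        [],                       # never looks at the critical object
        [(20, 30), (500, 600)],   # first look at 20 ms
    ]
    curve = cumulative_probability(example_trials)
    print(len(curve), curve[24], curve[-1])
```

Because the series is cumulative, looks away from the critical object after the first arrival do not lower the curve, which is what separates this measure from a conventional gaze probability plot.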

These data were then subjected to mixed effects analyses at each time point using the lme4 and languageR packages in R (v 1.34). This was essentially a model-building exercise. The initial model included condition as a fixed effect and subject as a random effect. Subsequently, it was found that adding the location of the critical object (fixed effect) and the identity of the target picture (random effect) significantly improved the fit of the model to the data at the majority of time points. These factors were, therefore, included in the final model. The identity of the target word (random effect) did not enhance the model so was omitted. More complex models involving interactions between the various factors were also considered but did not provide a significantly better fit to the data at any time point.

Results

Figure 2. Gaze probability for critical objects

To allow comparison with previous studies, Figure 2 shows the eye-tracking data plotted conventionally in terms of gaze probability as a function of time since the onset of the target word. Qualitatively, the results are very similar to those reported by Dahan and Tanenhaus (2004) and Barr (2008), despite the fact that the target was absent from the display in the critical conditions.

Figure 3. A: Cumulative probability of saccading towards the critical object. B: Z-statistic for the three effects of interest. Cohort: Competitor neutral – Unrelated neutral; Context: Competitor neutral – Competitor constraining; Context-free: Competitor constraining – Unrelated neutral. Dotted horizontal line indicates the estimated threshold for significance (alpha = .05).

Figure 3a shows the cumulative probability of saccading towards the critical object. As expected, in neutral sentences, hearing the target word resulted in increased saccades towards the cohort competitor compared with unrelated objects. Importantly, this effect was markedly reduced in the constraining context.

Figure 3b shows the z-statistics for the mixed effects analyses comparing these three conditions at each time point. The cohort effect first achieved statistical significance (α = .05) at 310 milliseconds. The context effect closely followed: it was marginally significant (p < .10) at 310 milliseconds, became significant at 340 milliseconds, z = 2.17, p ≈ .030, and remained significant thereafter.

More importantly, the above results entailed that there was no appreciable lag between the cohort effect and the context effect. Indeed, if there had been such a lag then one would expect a significant difference between fixating a contextually incongruent competitor and fixating an unrelated distracter. However, this difference remained non-significant throughout the epoch, peaking at 350 milliseconds, z = 1.07, p ≈ .285.
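As a check on the reported statistics, the p-values can be recovered from the z-statistics under the assumption that they are two-tailed and refer to the standard normal distribution. The sketch below uses only the Python standard library; the function name is illustrative and is not taken from the original analysis.

```python
import math


def two_tailed_p(z: float) -> float:
    """Two-tailed p-value for a z-statistic under the standard normal.

    Uses the identity p = erfc(|z| / sqrt(2)).
    """
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # z-values reported in the text for the context and context-free effects
    for label, z in [("context effect at 340 ms", 2.17),
                     ("context-free effect at 350 ms", 1.07)]:
        print(f"{label}: z = {z:.2f}, p = {two_tailed_p(z):.3f}")
```

Under this assumption, both values agree with those reported in the text to three decimal places.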

Discussion

The current study provides the most compelling evidence to date for immediate effects of sentence context on word identification. When participants heard a target word in a neutral context, they tended to look towards objects that shared the same onset, indicating that these competitors were considered as potential referents. However, this effect was significantly reduced when the same words were heard in a constraining sentence context that made the competitor objects unlikely referents.

Our results are consistent with those of a number of previous eye-tracking studies, which have been taken as evidence for immediate context effects (Barr, 2008; Dahan & Tanenhaus, 2004; Weber & Crocker, 2012). However, none of these previous studies actually determined when the effect of context became apparent, with authors either averaging across an extended time window or modelling gaze probability with onset time assumed rather than tested. In contrast, we were able to track the emergence of linguistic influences on eye-movements across time. The cohort effect became statistically significant 310 milliseconds after the onset of the target word. Allowing for a 100-200 millisecond lag between the initiation and completion of a saccade (Matin et al., 1993), this accords well with the suggestion that the time to access the mental lexicon corresponds to the first 100 to 200 milliseconds of a word (Marslen-Wilson, 1984; Salasoo & Pisoni, 1985). Throughout the epoch, the effect of context on saccades towards the cohort competitor closely tracked the cohort effect, becoming statistically significant a mere 30 milliseconds later and remaining significant thereafter.

A potential criticism of our analytical approach is that, by conducting analyses at 100 different time points, we were capitalizing on chance. However, it is important to note that we were not ‘fishing’ for a context effect anywhere in the epoch. Indeed, of the 70 time points at which there was a significant cohort effect (and thus a reduction in the cohort effect was possible), the context effect was significant in 67 and marginally significant in the remaining three. In contrast, at no point was there even a remotely significant tendency to fixate on contextually inappropriate competitors above baseline levels, as would have been expected if access was initially context-free.
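The counts in the preceding paragraph follow from the design of the analysis, as the short sketch below illustrates. It assumes that time points are labelled by the end of each 10 millisecond segment (10, 20, ..., 1000) and that each effect, once significant, stayed significant until the end of the epoch. The text states this explicitly for the context effect, and the reported count of 70 implies it for the cohort effect. The variable names are illustrative only.

```python
BIN_MS = 10             # segment width from the Analysis section
EPOCH_MS = 1000         # interest period from the Analysis section
COHORT_ONSET_MS = 310   # first significant cohort effect (Results)
CONTEXT_ONSET_MS = 340  # first significant context effect (Results)

# Assumed labelling: each time point is the end of its 10 ms segment.
time_points = list(range(BIN_MS, EPOCH_MS + 1, BIN_MS))

# Time points with a significant cohort effect (assumed to persist to epoch end).
cohort_sig = [t for t in time_points if t >= COHORT_ONSET_MS]

# Of those, the points where the context effect was significant or only marginal.
context_sig = [t for t in cohort_sig if t >= CONTEXT_ONSET_MS]
context_marginal = [t for t in cohort_sig if t < CONTEXT_ONSET_MS]

if __name__ == "__main__":
    print("time points analysed:", len(time_points))
    print("significant cohort effect:", len(cohort_sig))
    print("significant context effect:", len(context_sig))
    print("marginal context effect at:", context_marginal)
    print("lag between onsets (ms):", CONTEXT_ONSET_MS - COHORT_ONSET_MS)
```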

A further important innovation concerned the removal of the target object from the stimulus display for critical trials. In previous studies, the target and competitor were always both on-screen. Thus, effects of sentence context on gaze at the cohort competitor were impossible to disentangle from context effects on target-directed gaze. Our assumption was that, by removing the target from the display, gaze on the critical object at target onset would be equated across conditions. In fact, there was a small initial reduction in gaze at the competitor in constraining contexts. Although the competitor and distracters were all inconsistent with the constraining verb, it is possible that the distracters were on average somewhat more likely completions to the constraining sentences than the competitors. This bias at onset slightly exaggerated the context effect in the gaze plot (Figure 2). Crucially, however, our analyses were based on saccades towards the competitor after onset, excluding trials on which the relevant object was fixated at onset. As the cumulative fixation plot (Figure 3a) shows, the three critical conditions were almost perfectly equated throughout the “baseline” period before the onset of eye-movements triggered by the target word.

Taken in isolation, the current results could be construed as support for a sequential model of speech comprehension, whereby the set of possible words is progressively reduced as more input is received. On this view, preceding context eliminates semantically implausible words in much the same way that phonologically incompatible words are progressively weeded out in Marslen-Wilson and Welsh’s (1978) cohort model. However, other recent eye-tracking findings indicate that context effects can be over-ridden to some extent by later-occurring articulatory or lexical cues. For example, Dahan and Tanenhaus (2004) reported that cohort effects re-emerged if coarticulation cues favoured the (contextually inappropriate) cohort competitor over the target. Similarly, Weber and Crocker (2012) reported that cohort effects were not abolished if the semantically inconsistent cohort competitor was of higher frequency than the semantically preferred target.

Together with the current findings, such results point towards a more dynamic view of speech perception whereby the probability attached to a lexical candidate at any moment is a joint function of its lexical frequency, its compatibility with preceding contextual information, and the acoustic match up to that point in time (Dahan, 2010). However, the precise mechanisms involved remain underdetermined. Our findings addressed the enduring question of when context effects become apparent – but this is orthogonal to the question of how different sources of information are combined (Twilley & Dixon, 2000). Immediate context effects could reflect direct influences on bottom-up processing, as in the TRACE model of speech perception (cf. McClelland & Elman, 1986), but could equally arise at any point up to the decision that guides ocular or manual responses, with no direct interaction between top-down and bottom-up processes (Norris & McQueen, 2008; Twilley & Dixon, 2000). Adjudicating between these alternative accounts is beyond the scope of the current study. Nonetheless, our results place an important constraint on future model development – semantic context can and, at least in some circumstances, does have an immediate effect on spoken word recognition. Candidate models that do not allow for this possibility can, we suggest, finally be eliminated.

References

Allopenna, P. D., Magnuson, J. S., & Tanenhaus, M. K. (1998). Tracking the time course of spoken word recognition using eye movements: evidence for continuous mapping models. Journal of Memory and Language, 38, 419-439.

Altmann, G. T. M. (2011). Language can mediate eye movement control within 100 milliseconds, regardless of whether there is anything to move the eyes to. Acta Psychologica, 137, 190-200.

Altmann, G. T. M., & Kamide, Y. (1999). Incremental interpretation at verbs: restricting the domain of subsequent reference. Cognition, 73, 247-264.

Barr, D. J. (2008). Pragmatic expectations and linguistic evidence: Listeners anticipate but do not integrate common ground. Cognition, 109, 18-40.

Barr, D. J., Gann, T. M., & Pierce, R. S. (2011). Anticipatory baseline effects and information integration in visual world studies. Acta Psychologica, 137, 201-207.

Brock, J., Norbury, C. F., Einav, S., & Nation, K. (2008). Do individuals with autism process words in context? Evidence from language-mediated eye-movements. Cognition, 108, 896-904.

Chen, L., & Boland, J. E. (2008). Dominance and context effects on activation of alternative homophone meanings. Memory & Cognition, 36, 1306-1323.

Cooper, R. M. (1974). The control of eye fixation by the meaning of spoken language: a new methodology for the real-time investigation of speech perception, memory, and language processing. Cognitive Psychology, 6, 84-107.

Dahan, D. (2010). The time course of interpretation in speech comprehension. Current Directions in Psychological Science, 19, 121-126.

Dahan, D., & Magnuson, J. S. (2006). Spoken word recognition. In M. J. Traxler & M. A. Gernsbacher (Eds.), Handbook of psycholinguistics (2nd ed., pp. 249–284). Amsterdam: Academic Press.

Dahan, D., & Tanenhaus, M. K. (2004). Continuous mapping from sound to meaning in spoken-language comprehension: immediate effects of verb-based thematic constraints. Journal of Experimental Psychology: Learning Memory and Cognition, 30, 498-513.

Forster, K. I. (1976). Accessing the mental lexicon. In R. J. Wales & E. Walker (Eds.), New approaches to language mechanisms. Amsterdam: North Holland.

Gaskell, M. G., & Marslen-Wilson, W. D. (2002). Representation and competition in the perception of spoken words. Cognitive Psychology, 45, 220-266.

Hagoort, P., & Brown, C. M. (2000). ERP effects of listening to speech: semantic ERP effects. Neuropsychologia, 38, 1518-1530.

Huettig, F., & Altmann, G. T. M. (2007). Visual-shape competition during language-mediated attention is based on lexical input and not modulated by contextual appropriateness. Visual Cognition, 15, 985-1018.

Janse, E., & Quené, H. (2004). On measuring multiple lexical activation using the cross-modal semantic priming technique. In H. Quené & V. J. van Heuven (Eds.), On speech and language: Studies for Sieb G. Nooteboom (pp. 105-114). Utrecht: LOT.

Kamide, Y., Altmann, G. T. M., & Haywood, S. (2003). The time-course of prediction in incremental sentence processing: Evidence from anticipatory eye movements. Journal of Memory and Language, 49, 133-159.

Lucas, M. (1999). Context effects in lexical access: a meta-analysis. Memory & Cognition, 27, 385-398.

Luce, P. A., & Pisoni, D. B. (1998). Recognizing spoken words: The neighborhood activation model. Ear and Hearing, 19, 1-36.

Magnuson, J. S., Tanenhaus, M. K., & Aslin, R. N. (2008). Immediate effects of form-class constraints on spoken word recognition. Cognition, 108, 866-873.

Marslen-Wilson, W. (1984). Function and process in spoken word recognition. In H. Bouma & D. G. Bouwhuis (Eds.), Attention and performance X: Control of language processes (pp. 125-150). Hillsdale, NJ: Lawrence Erlbaum Associates.

Marslen-Wilson, W., & Welsh, A. (1978). Processing interactions and lexical access during word recognition in continuous speech. Cognitive Psychology, 10, 29-63.

Matin, E., Shao, K. C., & Boff, K. R. (1993). Saccadic overhead: information-processing time with and without saccades. Perception and Psychophysics, 53, 372-380.

McClelland, J. L., & Elman, J. L. (1986). The TRACE model of speech perception. Cognitive Psychology, 18, 1-86.

McMurray, B., Clayards, M. A., Tanenhaus, M. K., & Aslin, R. N. (2008). Tracking the time course of phonetic cue integration during spoken word recognition. Psychonomic Bulletin and Review, 15, 1064-1071.

Norris, D. (1986). Word recognition: context effects without priming. Cognition, 22, 93-136.

Norris, D. (1994). Shortlist: a connectionist model of continuous speech recognition. Cognition, 52, 189-234.

Norris, D., & McQueen, J. M. (2008). Shortlist B: a Bayesian model of continuous speech recognition. Psychological Review, 115, 357-395.

Onifer, W., & Swinney, D. (1981). Accessing lexical ambiguities during sentence comprehension: Effects of frequency of meaning and contextual bias. Memory & Cognition, 9, 225-236.

Penolazzi, B., Hauk, O., & Pulvermüller, F. (2007). Early semantic context integration and lexical access as revealed by event-related brain potentials. Biological Psychology, 74, 374-388.

Revill, K. P., Tanenhaus, M. K., & Aslin, R. N. (2008). Context and spoken word recognition in a novel lexicon. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34, 1207-1223.

Salasoo, A., & Pisoni, D. B. (1985). Interaction of knowledge sources in spoken word identification. Journal of Memory and Language, 24, 210-231.

Seidenberg, M., Tanenhaus, M., & Leiman, J. (1982). Automatic access of the meanings of ambiguous words in context: Some limitations of knowledge-based processing. Cognitive Psychology, 14, 489-537.

Tanenhaus, M. K., Carlson, G. N., & Seidenberg, M. S. (1985). Do listeners compute linguistic representations? In D. R. Dowty, L. Karttunen & A. M. Zwicky (Eds.), Natural language parsing: Psycholinguistic, theoretical and computational perspectives. Cambridge: Cambridge University Press.

Tanenhaus, M. K., Spivey-Knowlton, M. J., Eberhard, K. M., & Sedivy, J. C. (1995). Integration of visual and linguistic information in spoken language comprehension. Science, 268, 1632-1634.

Twilley, L. C., & Dixon, P. (2000). Meaning resolution processes for words: a parallel independent model. Psychonomic Bulletin and Review, 7, 49-82.

van den Brink, D., Brown, C. M., & Hagoort, P. (2001). Electrophysiological evidence for early contextual influences during spoken-word recognition: N200 versus N400 effects. Journal of Cognitive Neuroscience, 13, 967-985.

van den Brink, D., Brown, C. M., & Hagoort, P. (2006). The cascaded nature of lexical selection and integration in auditory sentence processing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32, 364-372.

Van Petten, C., Coulson, S., Rubin, S., Plante, E., & Parks, M. (1999). Time course of word identification and semantic integration in spoken language. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25, 394-417.

Weber, A., & Crocker, M. W. (2012). On the nature of semantic constraints on lexical access. Journal of Psycholinguistic Research, 41, 195-214.

Zwitserlood, P. (1989). The locus of the effects of sentential-semantic context in spoken-word processing. Cognition, 32, 25-64.

Acknowledgements

The study was supported by Australian Research Council Discovery Project DP098466 and a Macquarie University Research Development Grant. It was based on an earlier unpublished study, conducted at Oxford University, funded by the Medical Research Council, and presented at the 2007 Experimental Psychology Society meeting in London. We thank Lucy Cragg for assistance creating the stimuli, Samantha Bzishvili for data collection, and Sachiko Kinoshita for helpful comments on an earlier draft of this paper.