In conversations, interlocutors concurrently perform two related processes: speech comprehension and speech planning. We investigated effects of speech planning on comprehension using EEG. Dutch speakers listened to sentences that ended with expected or unexpected target words. In addition, a picture was presented two seconds after target onset (Experiment 1) or 50 ms before target onset (Experiment 2). Participants’ task was to name the picture or to stay quiet, depending on the picture category. In Experiment 1, we found a strong N400 effect in response to unexpected compared to expected target words. Importantly, this N400 effect was reduced in Experiment 2 compared to Experiment 1. Unexpectedly, the N400 effect in Experiment 2 was not smaller in the naming than in the categorization condition. This indicates that conceptual preparation or the decision whether to speak (processes that took place in both task conditions of Experiment 2), rather than processes specific to word planning, interferes with comprehension.
1. Introduction
In conversation, interlocutors take turns, rapidly switch between the roles of speaker and listener, and often talk in overlap (e.g., Corps et al., 2019, 2022; Stivers et al., 2009). This implies that speech planning and comprehension processes must occur simultaneously, and indeed a substantial body of laboratory work has shown that speakers can combine these processes (Bögels et al., 2015, 2018; Boiteau et al., 2014; Sjerps & Meyer, 2015). Laboratory work has also shown that speech planning is hampered by concurrent speech input (e.g., He et al., 2021). This is not surprising, given that both speech planning and comprehension require processing capacity (e.g., Cook & Meyer, 2008; Ferreira & Pashler, 2002; Jongman et al., 2015; Kristensen et al., 2013; Mädebach et al., 2011; Moisala et al., 2015), and are highly similar cognitive processes with shared representations (e.g., Glaser & Düngelhoff, 1984; Kittredge & Dell, 2016; Mitterer & Ernestus, 2008; Schriefers et al., 1990). Additionally, activating the production system has been found to enhance semantic prediction in comprehension (Dell & Chang, 2014; Gastaldon et al., 2023; Hintz et al., 2016; Lelonkiewicz et al., 2021; Pickering & Gambi, 2018; Pickering & Garrod, 2013), showing close links between the production and comprehension systems and the importance of studying their mutual interference. Less is currently known about the way comprehension is affected by speech planning. This question is addressed in the current paper. Specifically, we presented participants with spoken sentences ending in an expected or an unexpected word while EEG was recorded. We asked whether the classic N400 effect for comprehending unexpected compared to expected words would be reduced when participants planned a picture name while listening to the sentence. Our aim was to better understand how interlocutors combine speech planning and comprehension. Simultaneously, we explored which, if any, components of the EEG signal could be used as indicators of the onset of speech planning during speech processing.
Some studies investigating concurrent speech comprehension and planning have focused on comprehension tasks that did not require semantic processing, such as syllable identification (Fairs, 2019; Fargier & Laganaro, 2016; Paucke et al., 2015). They confirmed that these simple comprehension tasks can run in parallel with word planning. However, they also showed that the comprehension tasks interfered more with word planning than with non-linguistic tasks, such as tone identification. Evidence for effects of concurrent speech planning on the semantic processing of sentences comes from EEG studies carried out by Bögels and colleagues (2015, 2018) and Gerakaki (2020). Both examined how the N400 component of the EEG was modified in the presence, compared to the absence, of concurrent word planning.
The N400 is a negative deflection in the EEG signal peaking around 400 ms post-stimulus onset. It is most pronounced at centro-parietal sites. In general, theories agree that the N400 represents some form of semantic processing (e.g., Kutas & Hillyard, 1984; Van Berkum et al., 1999). It has been linked to semantic prediction in comprehension (Grisoni et al., 2021; Rabovsky et al., 2018; Rabovsky & McRae, 2014). Importantly, the N400 has a larger amplitude in response to semantically unexpected stimuli than to expected stimuli, that is, stimuli that have been pre-activated by the context (for review see Kutas & Federmeier, 2011). The difference between the N400 waveforms elicited by unexpected and expected stimuli constitutes an N400 effect. The N400 effect in comprehension decreases if the listeners’ attention is diverted by performing another task, as listeners then activate less semantic information (e.g., Batterink et al., 2010; Giesbrecht et al., 2007; Hohlfeld et al., 2004, 2015; Hubbard & Federmeier, 2021; Lien et al., 2008; Vachon & Jolicoeur, 2012). We will refer to this pattern as an N400 effect reduction. An N400 effect reduction can be interpreted as an online index of listening quality, and can be used to assess the trade-off between comprehension and word planning.
To investigate EEG signatures of speech planning during comprehension, Bögels and colleagues (2018) asked participants to respond to auditory questions about objects on their screen. The trial structure allowed participants to plan their responses while comprehending the questions. On each trial, participants chose to name one of the two displayed objects (e.g., banana and pineapple) depending on the question. The questions contained words that were expected or unexpected in the context of the pictures; these words were presented either in the middle of the sentence (e.g., Welk object wordt als fruit/gezond gezien en is krom? [Which object is considered to be a type of fruit/healthy and is curved?]) or at its end (e.g., Welk object is krom en wordt als fruit/gezond gezien? [Which object is curved and is considered to be a type of fruit/healthy?]). Additionally, critical words (e.g., krom [curved]) appeared early or late in the sentence and indicated which object should be named. The position of the critical word in the sentence allowed for either early or late word planning. This allowed the authors to investigate whether the N400 effect, elicited by the expected and unexpected words, was reduced when simultaneous word planning was taking place. They did not find a smaller N400 effect in the early compared to the late planning condition. However, they found that participants with shorter naming latencies had smaller N400 effects in the early planning condition than participants with longer naming latencies. This indicates that simultaneous word planning (which was more likely to occur in the fast responders) interfered with semantic processing in comprehension. It is possible that for some of the fast responders the attention-demanding processes of word planning were finished before the onset of the expected and unexpected words, leading to small N400 effect reductions. Consequently, N400 effect reductions should be investigated when word planning begins right before the comprehension of the target words.
Gerakaki (2020) utilized a design in which participants began word planning during comprehension of expected (e.g., “cookie”) or unexpected words (e.g., “mouse”) presented in sentence contexts (e.g., “With tea we always eat a …”). In the plan condition, participants were asked to name pictures of real objects. In the no-plan condition, participants were presented with a picture of a nonsense drawing and were asked to stay quiet. The N400 effect was smaller in the plan compared to the no-plan condition, indicating attenuated or modulated semantic processing in comprehension during simultaneous word planning. Thus, the N400 effect reduction shows promise as an index of word planning. However, this study had two limitations. Firstly, the pictures were only meaningful in the word planning trials. Hence, the N400 effect may have decreased merely due to processing of another meaningful stimulus and not because word planning was taking place. This would mean that other tasks encouraging concurrent processing of pictures, such as a categorization task or simply passive viewing of pictures, could lead to the same N400 effect reduction. Secondly, the nonsense drawing was the same on all no-plan trials, whereas the object pictures varied across trials, introducing greater visual variability in plan than in no-plan trials. To overcome these limitations, we used a similar design but varied the pictures in both conditions.
Before turning to our study, it should be noted that Bögels and colleagues (2015, 2018) pointed to other potential ERP signatures of speech planning. In these studies, the early speech-planning condition was associated with a late positive potential (LPP), which they thought reflected word planning. However, the LPP was also found (although to a lesser degree) in a control experiment where participants did not provide any overt responses and were asked to remember the sentences (Bögels et al., 2015). The authors argued that the LPP in the control experiment was likely driven by attentional control. However, it was not explained why the pattern of the LPP in the control experiment closely resembled the pattern seen in the main experiment. Finding the LPP in the control experiment could indicate that it does not truly reflect word planning, but rather general semantic processing that takes place in both word planning and comprehension. In addition, Jongman and colleagues (2019) found that the LPP had different distributions of neural sources when the comprehension stimuli were presented in the auditory or visual modality, which should not be the case if the LPP reflected word planning. Jongman and colleagues suggested that the LPP reflected a difference in attention toward the sentence end. Hence, a new way of capturing speech planning during comprehension needs to be established.
1.1. Current Study
The current study consisted of two experiments, where we examined how word planning interfered with semantic processing during comprehension, as indexed by an N400. In parallel, this study assessed whether an N400 effect reduction can be used as an index of word planning during comprehension.
In the first experiment (see Figure 1A), participants listened to sentences containing sentence-final target words that either were contextually expected (e.g., “She bought a stroller for the baby”) or unexpected (e.g., “She bought a book for the baby”). Neither the expected nor the unexpected sentence endings were semantically anomalous. After the target word, a picture appeared on the screen. The pictures belonged to one of two categories: fruits and vegetables. Participants were asked to name pictures that belonged to one of these categories, and to remain silent when they saw pictures of the other category. Hence, in the naming condition participants engaged in both categorization and naming, while in the categorization condition the task did not require participants to perform any processes beyond picture categorization. However, we cannot exclude that in the categorization condition participants engaged in word production processes beyond conceptualization even though it was not required of them (Bloem & La Heij, 2003; Levelt, 1993; Strijkers et al., 2011; Zwitserlood et al., 2018). We will return to this point in the General Discussion. This experiment served as a pilot to gain two critical pieces of information needed before running Experiment 2. Firstly, we expected to see a more negative N400 component in the unexpected as compared to the expected condition (Kutas & Federmeier, 2011). We were interested in the size of this N400 effect to determine the sample size needed to investigate the interaction between expectancy and task (i.e., naming or categorization). Secondly, we examined the interval after picture onset to gauge the difference between the naming and categorization conditions, in the absence of comprehension effects. Based on Gerakaki’s (2020) results, we expected a larger positivity in the categorization condition between 100 ms and 300 ms post picture onset, and a larger positivity in the naming condition between 400 ms and 600 ms.
In the second experiment, participants listened to the same sentences but the pictures of fruits and vegetables now appeared during the sentences, namely 50 ms before the target word (see Figure 1B). In the naming condition, participants were asked to name pictures of one category, while in the categorization condition, participants saw pictures of the other category, but did not have to perform any explicit task. Instead they had to refrain from naming the picture. Thus, in this experiment, participants always had to process the sentence context, categorize the picture, and decide whether or not to start word planning. Considering that in the categorization condition participants had to engage in semantic processing, we could use a design with similar attentional demands in all conditions. We expected to see a larger N400 component in the unexpected as compared to the expected condition, as well as a larger positivity in the naming as compared to categorization condition. Crucially, we also hypothesized that there would be an interaction between expectancy and task conditions, as we expected a reduced or delayed N400 effect in the naming compared to the categorization condition. This interaction would indicate that semantic processing in comprehension is reduced due to simultaneous word planning and not just due to semantic processing of the picture.
2. Experiment 1
2.1. Method
EEG data, experimental scripts, and analysis scripts are available on https://osf.io/vfu57/. The present study was approved by the Ethics Committee of the Social Sciences department of the Radboud University Nijmegen (ECSW-2019-019).
2.1.1. Participants
Twenty-five right-handed, native Dutch-speaking participants took part in the experiment for financial compensation. One participant did not complete the experiment because of technical issues. The remaining 24 participants had a mean age of 24.29 years (range: 18 - 38) and six were male.
2.1.2. Materials and design
Two hundred and forty Dutch sentences with expected endings were taken from previous studies (see Klaus et al., 2020; Piai, Roelofs, & Maris, 2014; Piai et al., 2015; Poulton & Nieuwland, 2021; Roos & Piai, 2020). In these sentences, the context predicted the target word (e.g., “Ze kochten een wieg voor de baby” [She bought a stroller for the baby]). Matching sentences with unexpected endings (e.g., “Ze kochten een boek voor de baby” [She bought a book for the baby]) were created by replacing strongly related context words with neutral words. Note that the last two pre-target words were kept the same across conditions. The sentences consisted of seven to 11 words and were unrelated to the theme of food depicted in the target pictures (fruits and vegetables). To avoid word repetition effects, we split the items into two lists, A and B. Both lists contained all targets; half were contextually expected and half unexpected (contextually expected targets in list A were matched with neutral contexts in list B and vice versa). Thus, each target was included in both the expected and the unexpected condition, but participants saw each target only once.
To examine the expectedness of the target words, we conducted a cloze pretest in which 10 participants read all sentences with the last word missing. They were asked to write down the word that most likely completed the sentence. The cloze probabilities for the expected and unexpected words were computed for each item. The cloze probability was used to judge the quality of items, as it reliably correlates with the amplitude of the N400 component (Kutas & Federmeier, 2011). The cloze probability was significantly higher for the sentences with expected endings (M = 0.90, SD = 0.11, range: 0.6 – 1) than for sentences with unexpected endings (M = 0.04, SD = 0.07, range: 0 – 0.3, t(239) = 31.52, p < 0.001, d = 2.03), and the cloze probability was comparable between list A (expected: M = 0.90, SD = 0.11; unexpected: M = 0.04, SD = 0.07) and list B (expected: M = 0.90, SD = 0.11; unexpected: M = 0.04, SD = 0.07). All sentences were pre-recorded by a female speaker using Audacity(R) (version 3.0.0).
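For illustration, the by-item cloze comparison can be reproduced with a paired t-test over items. The sketch below is a minimal example in R, assuming a hypothetical data frame cloze_wide with one row per item and columns expected and unexpected holding the cloze probabilities; the effect size uses the paired-differences definition of Cohen's d.

```r
# Minimal sketch of the by-item cloze comparison (hypothetical data frame 'cloze_wide'
# with one row per item and columns 'expected' and 'unexpected').
d_vals <- cloze_wide$expected - cloze_wide$unexpected

t.test(d_vals)              # paired t-test on item-wise differences, df = n_items - 1
mean(d_vals) / sd(d_vals)   # Cohen's d for the paired differences
```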
The target pictures were greyscale drawings depicting six fruits and six vegetables from the MultiPic database (Duñabeitia et al., 2018; for the complete list see Supplementary Materials). The mean frequency of the picture names was 3.17 per million words (range: 1.55 – 5.33) for the fruits and 3.22 per million words (range: 1.33 – 6.11) for the vegetables (Decuyper et al., 2021). Each picture was displayed 10 times in the expected and 10 times in the unexpected condition.
We used a two-by-two design with the variables target expectedness (expected versus unexpected) and task (naming versus categorization), with 60 trials per condition. We pseudorandomized the items in list A with MIX (Van Casteren & Davis, 2006), creating six unique lists. The same comprehension condition (i.e., expected or unexpected) and task condition (i.e., naming or categorization) appeared on maximally four consecutive trials. Each picture was shown 20 times, with at least three pictures intervening between repetitions. The same randomization was used for list B, meaning that targets appeared in the same order, creating six comparable lists. Each randomized list was used for two participants; one participant named fruits and the other named vegetables. This between-participant variable had only a small effect on RTs: vegetables were named about 50 ms faster (M = 758, SD = 224) than fruits (M = 809, SD = 298). However, word planning could be initiated at the same point in time for both groups of participants. Hence, this between-participant variable was not considered in further analyses.
2.1.3. EEG acquisition
EEG was recorded from 32 electrodes using an actiCap system (Brain Products, Germany) arranged according to the 10-20 system. Twenty-five electrodes were positioned on the scalp. Six electrodes were used to capture eye movements and speech artefacts: two above and below the left eye, two on the left and right temples, and two above and below the right side of the lips on the orbicularis oris muscle. The left mastoid served as the online reference. The data was recorded at a 500 Hz sampling rate with a band-pass filter of 0.016 to 150 Hz. Impedances of all electrodes were kept below 25 kΩ.
2.1.4. Procedure
The participants were first fitted with an EEG cap. During the capping, they were familiarized with the picture names by looking at each picture and reading its associated name on a printed sheet. At the beginning of the experiment, participants were instructed to listen carefully to all sentences and pay attention to the pictures. They were asked to name either the fruits or the vegetables as fast as possible after the onset of an exclamation mark. All stimuli were presented with Presentation software (Neurobehavioral Systems). Each trial started with a fixation cross presented simultaneously with a beep sound (200 ms). Subsequently, the expected or unexpected sentence was played. The pictures appeared in the center of the screen 2000 ms after the onset of the target words and remained on the screen for 2500 ms. Afterwards, an exclamation mark appeared and participants named either the fruits or the vegetables (or remained quiet). At the end of every trial, a blinking interval was presented with a duration jittered between 1200 and 1500 ms. Trials were separated into 10 blocks, each lasting about 4.5 minutes, and participants could take a break after each block.
2.1.5. EEG preprocessing
The EEG data was preprocessed using BrainVision Analyzer (Version 2.2.0). Channels with excessive noise were removed (no more than three channels were excluded per participant) and subsequently interpolated using neighbouring channels. We filtered the data with a band-pass filter between 0.1 and 30 Hz and a notch filter at 50 Hz, using a 24 dB/octave roll-off. All channels were re-referenced to an average mastoid reference. The data was segmented from 200 ms before target word onset to 600 ms post picture onset, and baseline correction was performed using the first 200 ms of the segment. Subsequently, we rejected trials with non-physiological artefacts (e.g., steep jumps in amplitude). ICA was performed to correct for blinks, eye movements, and steady muscle activity. The remaining physiological artefacts were removed using a semiautomatic procedure with a threshold of ±80 μV. A minimum of 40 trials was included per participant and condition.
2.1.6. Analysis
Naming latencies were determined automatically with a threshold procedure in Presentation. They were analysed with a linear mixed-effects model fitted with the lmerTest package (version 3.4; including the lme4 package, Bates et al., 2015), with random intercepts for participant, item, and picture category.
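A minimal sketch of this naming-latency model in R is given below; the data frame rt_data and its column names are hypothetical placeholders for the trial-level data described above.

```r
library(lmerTest)  # loads lme4 and adds Satterthwaite-based p-values

# Hypothetical data frame 'rt_data': one row per naming trial, with the naming
# latency in ms (RT), expectancy, and identifiers for subject, target item,
# and picture category.
rt_model <- lmer(RT ~ expectancy + (1 | subject) + (1 | target) + (1 | picture_category),
                 data = rt_data)
summary(rt_model)
```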
For the EEG analysis, we specified three regions of interest (ROIs) for which we computed Bayesian linear mixed-effects models using the brms package (Bürkner, 2017) in R (version 3.4.2). To examine the N400 effect, we averaged voltages from 300 to 600 ms post target word onset over the centro-parietal electrodes (i.e., C3, Cz, C4, CP5, CP1, CP2, CP6, P7, P3, Pz, P4, P8) for every trial. We specified two further ROIs to examine the ERP differences between naming and categorization trials, one from 100 to 300 ms and one from 400 to 600 ms, both over parieto-occipital electrodes (i.e., CP5, CP1, CP2, CP6, P7, P3, Pz, P4, P8, Oz), as the effect was prominent in this region (Gerakaki, 2020). We computed three Bayesian linear mixed-effects models in which the voltages from the three ROIs served as dependent variables. For the N400 effect, we used expectancy as the independent variable, with random intercepts and random slopes for subjects and target words (voltage ~ expectancy + (expectancy | subj) + (expectancy | target)). For both naming-versus-categorization ROIs, we used task condition as the independent variable, with random intercepts and random slopes for subjects and pictures (voltage ~ task + (task | subj) + (task | picture)). We used the default priors for the main effects, intercept, standard deviations of group-level effects, and the residual error in all three models (Bürkner, 2017). For each main effect, we report the beta estimate and its 95% credible interval. The credible interval indicates a 95% probability that the population estimate lies within the interval, given the observed data. Thus, when comparing two conditions, a credible interval that does not include 0 indicates a 95% probability that the population means differ from one another, given the observed data.
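A minimal brms sketch of these models is shown below, assuming hypothetical trial-level data frames (roi_n400, roi_task_early) that hold the mean ROI voltage per trial together with the condition and grouping variables; the sampler settings (chains, iterations) are illustrative and not taken from the paper.

```r
library(brms)

# N400 window (300-600 ms post target word onset), centro-parietal ROI.
m_n400 <- brm(voltage ~ expectancy + (expectancy | subj) + (expectancy | target),
              data = roi_n400, family = gaussian(),
              chains = 4, iter = 4000, cores = 4)

# Task effect (naming vs. categorization), parieto-occipital ROI; the same
# structure is fitted separately for the 100-300 ms and 400-600 ms windows.
m_task_early <- brm(voltage ~ task + (task | subj) + (task | picture),
                    data = roi_task_early, family = gaussian(),
                    chains = 4, iter = 4000, cores = 4)

fixef(m_n400)  # posterior means and 95% credible intervals for the fixed effects
```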
2.2. Results and Discussion
2.2.1. Accuracy and naming latencies
Accuracy was close to perfect: 99.38% of the trials had a correct response. Trials with incorrect responses were excluded from all subsequent analyses. Naming latencies were examined for the naming condition; no RTs were recorded for the categorization condition. The naming latencies were comparable following expected (M = 783, SD = 261) and unexpected targets (M = 782, SD = 267). The linear mixed-effects model did not show a significant effect of expectancy on naming latencies (β = -0.01, S.E. = 7.05, t = -10⁻³, p = 0.999). The expectancy of the targets did not affect the naming latencies because the comprehension and production tasks did not overlap: participants could name the pictures only after the exclamation mark, which appeared 2500 ms after picture onset (i.e., 4500 ms after the onset of the target word).
2.2.2. The N400 effect
Both the expected and unexpected conditions showed a negative potential at around 500 ms after target word onset in the centro-parietal electrodes (Figure 2; for a figure with all electrodes see Supplementary Materials). As expected, this peak (i.e., the N400) had a more negative amplitude in the unexpected compared to the expected condition in the time interval from 300 to 600 ms (β = -1.45, CrI [-1.97 -0.93]).
2.2.3. Planning effect
There was a strong difference between the naming and categorization conditions starting at approximately 300 ms and peaking 550 ms after the presentation of the picture (Figure 3; for a figure showing all electrodes see Supplementary Materials). There was a larger positive potential in the naming as compared to the categorization condition in the time interval from 400 to 600 ms after picture onset (β = 8.87, CrI [8.71 9.04]). There was only a very small difference between the conditions from 100 to 300 ms after picture onset (β = 0.13, CrI [-0.46 0.73]). Thus, there is no evidence that the conditions differed from one another in this time interval.
In sum, we found a large N400 effect, indicating that participants engaged in semantic processing of the target words. We also found a more pronounced positive potential in the naming compared to the categorization condition. There are two main possibilities regarding what this potential could reflect: firstly, the LPP (Bögels et al., 2015, 2018; Jongman et al., 2019), or secondly, a more general go/no-go effect (e.g., Rodriguez-Fornells et al., 2002; Schmitt et al., 2000). Our results cannot clarify which process drives the positivity effect in our experiment.
3. Experiment 2
Subsequently, we investigated whether the N400 effect was reduced if simultaneous word planning was taking place during comprehension of the expected and unexpected words. Based on the results of Experiment 1, we planned to analyse a time-window between 300 ms and 600 ms post target word onset, where we expected the strongest N400 effect.
3.1. Method
In Experiment 2, most of the materials and analyses were the same as in Experiment 1; thus, only the differences are described below.
3.1.1. Power analysis
To determine the minimum number of participants required, we performed a power analysis with the simr package (Green & MacLeod, 2016). The power analysis was based on a linear mixed-effects model fitted with the lmerTest package (version 3.4; including the lme4 package, Bates et al., 2015). We used the ROI from 300 to 600 ms for the centro-parietal electrodes as the dependent variable. The size of the N400 effect was estimated using the following model: voltage ~ expectancy + (expectancy | subj) + (expectancy | target). The data for the categorization condition was simulated using the fixed effects from this model, with the value for the intercept set at -0.472 and the value for the effect of expectancy set at -1.451. The data for the naming condition was simulated based on anticipating an effect of half this size, with the same intercept and the value for the effect of expectancy set at -0.725. Subsequently, the data for the naming and categorization conditions were combined, and a model with the interaction was run. Due to convergence issues, we removed the random slopes from the combined model (voltage ~ expectancy * task + (1 | subj) + (1 | target)). For this model, there was a significant effect of expectancy (β = -1.29, S.E. = 0.17, t = -7.56, p < 0.001), task (β = -0.39, S.E. = 0.17, t = -2.27, p = 0.02), and their interaction (β = 1.06, S.E. = 0.34, t = 3.09, p = 0.002). For the power simulation, the model was extended to 200 subjects. The power analysis with 1000 simulations showed that, at an alpha level of 0.05, including 24 subjects would give us more than 80% power to detect the interaction between expectancy and task.
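The sketch below illustrates the final steps of such a procedure with the simr package. The data frame sim_data is a hypothetical placeholder for the combined simulated voltages described above, and targeting the interaction via a model comparison (fcompare) is one possible way to specify the test; the exact simr calls used by the authors are not reported here.

```r
library(simr)
library(lme4)

# Hypothetical combined data frame 'sim_data' with simulated ROI voltages,
# expectancy, task, and identifiers for subject and target.
m_int <- lmer(voltage ~ expectancy * task + (1 | subj) + (1 | target), data = sim_data)

# Extend the model to 200 simulated subjects, then estimate power for the
# interaction at several sample sizes via comparison against a model
# containing only the main effects.
m_ext <- extend(m_int, along = "subj", n = 200)
pc <- powerCurve(m_ext, test = fcompare(~ expectancy + task), along = "subj",
                 breaks = c(12, 16, 20, 24, 28), nsim = 1000)
print(pc)
```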
3.1.2. Participants
Twenty-five right-handed, native Dutch-speaking participants took part in the experiment for financial compensation. The data of one participant was excluded from the analysis because of technical difficulties. The remaining 24 participants had a mean age of 21.83 years (range: 18 - 35); eight were male, 15 were female, and one reported another gender.
3.1.3. Materials and design
The same experimental lists, recordings, and pictures were used as in Experiment 1 (see section 2.1.2), except that the pictures of the lemon and the orange were slightly modified to make the distinction easier for participants: the lemon was made pointier and the orange was made larger than the lemon. The naming latencies were again longer for fruit (M = 849, SD = 307) than for vegetable pictures (M = 777, SD = 304). This between-participant factor was again not included in follow-up analyses, as it was counterbalanced across the conditions of interest.
3.1.4. Procedure
EEG recording and picture familiarization were done in the same way as in Experiment 1 (see section 2.1.3). Each trial started with a fixation cross (200 ms); subsequently, the expected or unexpected sentence was played. The picture appeared in the centre of the screen 50 ms before the onset of the target word (to ensure that word planning was initiated before comprehension was concluded) and remained on the screen for 3000 ms, which was 500 ms longer than in Experiment 1, to ensure a sufficiently long time window for analysis without speech preparation artefacts. Afterwards, an exclamation mark appeared and participants were asked to name the pictures of one category (fruits or vegetables, depending on the group they were assigned to) and remain silent when a picture of the other category appeared. After each trial, the same blinking interval was presented as in Experiment 1. Trials were separated into 10 blocks, each lasting about four minutes, and participants could take a break after each block.
3.1.5. EEG preprocessing
The same preprocessing pipeline was used as in Experiment 1. Channels with excessive noise were removed (no more than three channels were excluded per participant). A minimum of 40 trials was included per participant and condition (i.e., expected target word with naming, expected target word with categorization, unexpected target word with naming, and unexpected target word with categorization).
3.1.6. EEG analysis
We specified one region of interest (ROI), which was the same as the centro-parietal ROI in Experiment 1, for which we computed a Bayesian linear mixed-effects model (for more details see section 2.1.6). We used voltage as the dependent variable, and expectancy and task as independent variables. The model included the interaction term and random intercepts as well as random slopes for subjects and target words (voltage ~ expectancy * task + (expectancy * task | subj) + (expectancy * task | target)). We used default priors for all parameters except the interaction between expectancy and task. Following Dienes and colleagues (2014), for the interaction we used a prior centred on zero with an SD of 2, as 2 μV was the approximate effect size previously reported for a similar interaction (Gerakaki, 2020). Hence, there is a 95% prior probability that the population parameter lies between -4 and 4 μV. For the interaction, we calculated the Bayes factor in support of the null hypothesis (BFnull) in addition to the parameter estimate and credible interval. Our Bayes factors correspond to the ratio between the posterior density and the prior density at zero. Values higher than one indicate a proportional increase in confidence in the null hypothesis, such that BFnull = 3 indicates a three-fold increase of the evidence for the null hypothesis.
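A sketch of this model and of the Savage-Dickey computation in brms is given below. The data frame roi_e2 is hypothetical, the coefficient name for the interaction depends on the chosen contrast coding (the sketch assumes deviation coding), and hypothesis() with sampled priors is one standard way of obtaining the posterior-to-prior density ratio at zero.

```r
library(brms)

# Hypothetical trial-level data frame 'roi_e2' with ROI voltage, expectancy, task,
# and identifiers for subject and target word. The coefficient name below assumes
# deviation-coded factors; with other codings the interaction term is named differently.
priors <- set_prior("normal(0, 2)", class = "b", coef = "expectancy1:task1")

m_e2 <- brm(voltage ~ expectancy * task +
              (expectancy * task | subj) + (expectancy * task | target),
            data = roi_e2, prior = priors, sample_prior = "yes",
            chains = 4, iter = 4000, cores = 4)

# Savage-Dickey density ratio for the point null (interaction = 0): with sampled
# priors, the evidence ratio returned by hypothesis() corresponds to BF_null.
hypothesis(m_e2, "expectancy1:task1 = 0")

# Sensitivity analysis: refit with narrower (SD = 1) and wider (SD = 4) priors
# on the interaction and recompute the Bayes factor.
```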
3.2. Results and Discussion
Accuracy was again close to perfect: 99.68% of the trials had a correct response. Trials with incorrect responses were excluded from all subsequent analyses. Naming latencies were examined for the naming condition. Naming latencies were shorter following expected (M = 808, SD = 306) than following unexpected targets (M = 820, SD = 309; β = 16.78, S.E. = 7.21, t = 2.33, p = 0.02). The expectancy of the targets affected the naming latencies because of the overlap between the comprehension and production tasks. This indicates that lower comprehension demands, as in the expected condition, were associated with faster speech production. However, the difference between the naming latencies in the two conditions was only 12 ms; thus, this effect is likely not important for the interpretation of the long-lasting N400 component. Interestingly, the naming latencies were around 30 ms slower than in Experiment 1, showing that the overlap between comprehension and production delayed speech production in general.
3.2.1. The N400 during word planning
In Experiment 2, we found effects of expectancy and of task in the same direction as in Experiment 1 (Figure 4; for a figure with all electrodes see Supplementary Materials). The unexpected compared to the expected condition had a more negative amplitude from 300 to 600 ms post target word onset (β = -0.66, CrI [-1.13 -0.19]), consistent with an N400 effect. There was again a strong difference between the naming and categorization conditions, starting at approximately 300 ms after the presentation of the picture. There was a larger positive potential in the naming as compared to the categorization condition from 350 to 650 ms post picture onset (β = 4.60, CrI [3.54 5.64]).
The amplitude of the N400 effect did not increase or decrease in the naming as compared to the categorization condition, yielding evidence for the null hypothesis (β = 0.13, CrI [-0.46 0.73], BFnull = 4.83). Note that a BFnull larger than 3 provides moderate evidence against the interaction between expectancy and task (Lee & Wagenmakers, 2013). We also performed a sensitivity analysis (Nicenboim & Vasishth, 2016) to assess the robustness of this null result for the interaction, given the chosen priors. Using a narrower prior (SD = 1) yielded only anecdotal evidence for the null (BFnull = 2.47), whereas using a wider prior (SD = 4) confirmed moderate evidence for the null (BFnull = 9.16).
3.2.2. Correlation between general naming latencies and the N400 effect size
Following Bögels and colleagues (2018), we computed the correlation between naming latencies and the size of the N400 effect. For every participant, we averaged the naming latencies and computed the N400 effect size in the naming condition. We computed a Bayesian correlation in brms using a Gaussian likelihood and default priors. We did not find enough evidence to reject the null (r = 0.31, CrI [-0.08 0.63]). Note, however, that there is a trend in the expected direction, as participants with shorter naming latencies tended to show a smaller N400 effect.
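One way to compute such a Bayesian correlation in brms is via an intercept-only multivariate Gaussian model, whose residual correlation corresponds to the Pearson correlation; the sketch below assumes a hypothetical per-participant data frame subj_means and illustrative sampler settings.

```r
library(brms)

# Hypothetical data frame 'subj_means': one row per participant with the mean
# naming latency and the N400 effect size in the naming condition.
m_cor <- brm(bf(mvbind(naming_rt, n400_effect) ~ 1) + set_rescor(TRUE),
             data = subj_means, chains = 4, iter = 4000, cores = 4)

summary(m_cor)  # the 'rescor' parameter is the correlation with its 95% credible interval
```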
3.2.3. N400 effect comparison between experiments
Subsequently, we compared the N400 effect in Experiment 1 and Experiment 2. We ran an additional Bayesian linear mixed effects model based on the combined data of both experiments, for the centro-parietal electrodes, and for the time window from 300 to 600 ms post target onset. We used expectancy and experiment as independent variables, and we included random intercepts and random slopes for subjects and target words (voltage ~ expectancy * experiment + (expectancy | subj) + (expectancy | target)). We used the default priors for all parameters.
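A sketch of this combined model, assuming the trial-level data of both experiments is stacked in a hypothetical data frame roi_both with an experiment column, follows.

```r
library(brms)

# Hypothetical data frame 'roi_both': trial-level ROI voltages (300-600 ms,
# centro-parietal) from both experiments, with expectancy, experiment, subj, target.
m_cross <- brm(voltage ~ expectancy * experiment +
                 (expectancy | subj) + (expectancy | target),
               data = roi_both, family = gaussian(),
               chains = 4, iter = 4000, cores = 4)

fixef(m_cross)  # the interaction term indexes the difference in the N400 effect
```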
This analysis showed that the unexpected compared to the expected condition had a more negative amplitude (β = -1.06, CrI [-1.40 -0.73]). Experiment 2 compared to Experiment 1 had a more positive amplitude (β = 9.72, CrI [8.09 11.35]). Most crucially, the analysis showed that the effect of expectancy was larger in Experiment 1 than in Experiment 2 (β = -0.78, CrI [-1.42 -0.13]).
4. General Discussion
The goal of the present study was to examine whether word planning interfered with the semantic processing of spoken sentences, as indexed by the N400 effect. In Experiment 1, participants processed sentences that ended with either an expected or an unexpected word, after which they engaged in word planning. Thus, we could observe the EEG signatures of both processes independently of each other. By contrast, in Experiment 2, word planning took place during semantic processing of the expected and unexpected words. Thus, we could observe the impact of word planning on comprehension.
The study yielded three main results. First, an N400 effect was seen in both experiments (i.e., with and without concurrent word planning). This means that participants engaged in semantic processing of the target when there was no explicit comprehension task, and even in a dual-task scenario. Our results suggest that semantic processes as reflected in N400 activity are not completely suppressed by word planning.
Second, the N400 effect was reduced in Experiment 2 compared to Experiment 1. Thus, the secondary task elicited by the presentation of the picture interfered with the semantic processing of the sentences. Our results are in line with other studies that found modulations of an N400 effect in dual tasks (Hohlfeld et al., 2004, 2015; Hubbard & Federmeier, 2021; Lien et al., 2008; Vachon & Jolicoeur, 2012).
Third, in Experiment 2 we found moderate evidence against the interaction between word planning and expectancy. In other words, the N400 effect was independent of whether or not participants prepared to name the picture. Recall that in both the naming and the categorization condition, participants had to process the pictures enough to categorize them as fruits versus vegetables and, based on this information, decide whether or not to name the picture. Apparently, these processes interfered with sentence processing, but the actual preparation for naming did not lead to further measurable interference.
4.1. A Secondary Task Reduces the N400 Effect
The N400 effect was reduced in Experiment 2 compared to Experiment 1. In the secondary task elicited by the presentation of the picture, participants had to recognize and categorize the pictures as belonging to the fruit or vegetable category. Subsequently, they had to initiate the corresponding task set by preparing to name the picture or preparing to withhold a response. Thus, the secondary task interfered with the semantic processing of the sentences. However, several sub-processes relevant for the secondary task, such as conceptual processing, the decision whether or not to speak, or linguistic processes evoked by the picture, could be responsible for the observed interference. We discuss these options below.
4.1.1. The secondary task might divert attention from semantic processing
In both the naming and the categorization condition, participants had to process the pictures sufficiently on a conceptual level to determine the semantic category (fruit vs. vegetable) and then decide whether or not to name the picture. Thus, in both conditions participants had to decide whether to speak, in order to name the picture, or to stay silent. It is difficult to determine which process was more likely responsible for the N400 effect reduction, since both processes require capacity. Activation of conceptual information requires attention (Mädebach et al., 2011). Categorization in itself may require less attentional involvement, especially categorization of less complex visual scenes comprising only one object (Evans & Treisman, 2005; Walker et al., 2008). However, the decision to speak or to stay silent based on the categorization likely increased the attentional demands of the categorization process further. Thus, the processing capacity requirements do not help us disambiguate which of these processes was responsible for the reduction of the N400 effect. The results of Bögels and colleagues (2018) might indicate that conceptual preparation alone is sufficient to reduce the N400 effect under specific conversational constraints, as there were some N400 effect reductions when participants had to speak on all trials and therefore no trial-by-trial decision was needed.
A follow-up experiment should test which of these processes actually drives the N400 effect reductions. This could be done by varying the difficulty of the decision process. For example, modifying the relative frequency of naming versus categorization trials within blocks could modify the difficulty of the decision whether to speak. The speech blocks could consist of 90% naming and 10% categorization trials, and the no-speech blocks could consist of 90% categorization and 10% naming trials. The decision whether to speak would be easier in the speech blocks and more difficult in the no-speech blocks. Similar to our design, all conditions would require conceptual processing of the pictures to the same extent. If we were to see a reduced N400 effect in both conditions (as in Experiment 2), it is likely that conceptual preparation is driving the N400 effect reductions. On the other hand, if we were to see a larger N400 effect (as in Experiment 1), it is likely that the speech versus no-speech decision is driving the effect.
4.1.2. Linguistic processing evoked by the picture might interfere with semantic processing of the sentences
Activation of the production system has been found to enhance semantic prediction in comprehension (i.e., prediction-by-production; Dell & Chang, 2014; Gastaldon et al., 2023; Hintz et al., 2016; Lelonkiewicz et al., 2021; Pickering & Gambi, 2018; Pickering & Garrod, 2013). The N400 has been linked to semantic prediction as well (Grisoni et al., 2021; Rabovsky et al., 2018; Rabovsky & McRae, 2014), meaning that in our study the amplitude of the N400 effect could have been affected by whether participants’ prediction-by-production system was hindered or facilitated. Firstly, the functioning of the system could have been hindered if it was preoccupied with linguistic processing evoked by the picture. This would imply that, in this experiment, pictures evoked linguistic processing not only in the naming but also in the categorization task. Thus, in Experiment 2, due to the task overlap, participants might not have been able to use the prediction-by-production system for the comprehension task, leading to a smaller N400 effect in Experiment 2. This explanation is in line with Martin and colleagues (2018), who found that the N400 effect, elicited by expected and unexpected gender-marked determiners in Spanish, was smaller when participants’ production system was occupied. This means that taxing the production system hinders prediction in comprehension.
Secondly, the prediction-by-production system could have been facilitated in both conditions through the presence of the naming task, similar to the study by Hintz and colleagues (2016). This study showed that people tended to read expected sentence endings faster than unexpected endings, but only if the reading task was intermixed with a naming task that activated the production system. Thus, activation of the production system was associated with enhanced semantic prediction. We cannot evaluate this option, as we did not have a block of trials in which participants never named the pictures.
4.1.3. The role of the late positivity
In both experiments, we found a large positive potential in the naming compared to the categorization condition. There are two possibilities for what this positive potential could reflect. Firstly, it is consistent with the LPP (i.e., a sustained positivity) that was found during word planning (Bögels et al., 2015, 2018; Jongman et al., 2019). The LPP is currently thought to reflect differences in attention to the sentence end when word planning, compared to no planning, is taking place (Jongman et al., 2019). Secondly, similar positivities have also been found in go/no-go tasks (e.g., Rodriguez-Fornells et al., 2002; Schmitt et al., 2000). Thus, the positive potential may reflect the go/no-go nature of this experiment, as participants prepared to name a picture in the naming condition and withheld any response in the categorization condition.
Note that this large positive potential overlaps with the N400 latency. Hence, the N400 effect modulations that we observed in Experiment 2 relative to Experiment 1 could have occurred because the positive potential occluded the N400 effect. This would mean that semantic processing in comprehension took place in the same manner in Experiments 1 and 2, while the N400 effect was reduced due to component overlap. We deem this explanation unlikely, considering that, in Figure 4, the expected and unexpected conditions seem to differ in the naming condition during the time window where the positive potential is most pronounced. If the N400 effect reductions were indeed driven by component overlap, we would expect to see the smallest N400 effect when the positive potential is largest (i.e., in the naming condition around 500 ms after picture onset). Thus, we believe that the N400 effect reductions in Experiment 2 were not entirely driven by component overlap.
In sum, we observed that the requirement to conceptually process, categorize, and possibly retrieve linguistic information about the picture affects predictive processing. However, these processes do not completely eliminate semantic prediction processing, as reflected by the N400. This points to a certain resilience of semantic prediction, perhaps due to the fact that speech processing often happens in dual-task contexts.
4.2. Naming as Compared to Categorization Does Not Modulate the N400 Effect
The delayed naming task did not interfere more with semantic processing of the sentence than the categorization task did. There are three main ways of explaining this. Firstly, the picture names might have been activated in the delayed naming condition but not in the categorization condition. This would mean that the activation of linguistic information in the naming task did not add interference beyond that already present in the categorization task. Secondly, there might have been no activation of linguistic information in the analyzed time window. Thirdly, linguistic information might have been activated in both the naming and the categorization condition. Below, we discuss these three options.
4.2.1. Explanation 1: Naming might interfere with semantic processing to the same extent as categorization
We assumed that participants activated the picture names in the naming condition but not in the categorization condition. If this was the case, the results would indicate that the activation of linguistic information does not interfere with semantic processing of the sentence. This is in line with studies that showed that even though speech planning interfered with comprehension, it did not completely eliminate semantic processing (Bögels et al., 2018; Martin et al., 2018). Thus, compared to the strong interference effect from conceptual and decision processes, which are present in both categorization and naming conditions, the specific linguistic interference effect may be weak. This points to some resilience of the comprehension system against additional interference from linguistic processes. This resilience might support interlocutors’ fast turn taking abilities in conversations (Levinson & Torreira, 2015).
The absence of additional interference from the naming task was surprising in light of Gerakaki’s (2020) results. In contrast to her results, we did not find a difference between the naming and categorization condition in Experiment 2. In our view, the most likely account of this difference is that the control tasks, where no speech planning was required, differed between the two studies. In Gerakaki’s study, the speech versus no speech decision was easy, as participants could base this decision on superficial features of the pictures (i.e., existing object or a scribble). In our study, participants had to categorize the pictures in both conditions to perform the speech versus no speech decision. This means that in Gerakaki’s study, extensive conceptual processing of the objects had to be performed only in the plan condition, whereas in our study it was required in both conditions. In other words, Gerakaki’s conditions were not matched for conceptual processing and the effect of planning on the N400 may have had a conceptual origin. Our study indicates that there is no additional interference of word planning on semantic processing of the sentence, as compared to conceptual processing.
Note also that in Gerakaki’s (2020) study, the unexpected target words were semantically anomalous (e.g., With tea we always eat a mouse), whereas in our study they were merely unexpected (e.g., She bought a book for the baby). The semantically anomalous endings could have directed attention to comprehension, leading to a larger N400 effect (Frank et al., 2013). Moreover, the N400 elicited by stimuli without semantic anomalies and the N400 elicited by stimuli with semantic anomalies could be modulated differently in dual tasks. It could be that all tasks that require any level of attentional involvement (e.g., categorization, naming) reduce the N400 elicited by stimuli without semantic anomalies. Alternatively, it could be that only tasks requiring high attentional involvement (e.g., naming) reduce the N400 elicited by stimuli with semantic anomalies, while tasks with lower attentional involvement (e.g., categorization) do not. Thus, future research could assess whether finding the interaction between expectancy and task depends both on the stimuli driving the N400 effect (stimuli with or without semantic anomalies) and on the attentional demands of the naming and categorization tasks.
The absence of the interaction may appear inconsistent with earlier studies that showed crosstalk between production and comprehension: speech planning was affected more by processing words or syllables than by processing non-linguistic or not-understood stimuli (Fairs, 2019; Fargier & Laganaro, 2016; He et al., 2021; Paucke et al., 2015). Note, however, that these studies concerned the effects of comprehension on production, whereas we investigated the reverse effect, of production on comprehension. There are at least two possible interpretations of our findings. Firstly, processes of lexical access for production may require little additional capacity when combined with conceptual processing. This interpretation is in line with studies that found that lemma selection does not require processing capacity (Ayora et al., 2011; Dell’Acqua et al., 2007). However, this is a controversial claim, as more recent studies found the opposite pattern, namely that lemma selection does require central attention (Ayora et al., 2009; Ferreira & Pashler, 2002; Kleinman, 2013; Piai, Roelofs, & Schriefers, 2014; Schnur & Martin, 2012). Secondly, semantic processing of the sentence might take place at a different hierarchical level than the decision and lexical processes relevant in the picture-naming task. This could especially be the case if the N400 reflects higher-order integrative processes (e.g., Van Berkum et al., 1999) that have little overlap with the processes required in the simple picture-naming task. Hence, we might not observe any crosstalk effects because of the limited overlap between the tasks.
4.2.2. Explanation 2: Participants did not prepare for naming before the response signal
Another possible reason for not finding a difference between the naming and categorization conditions in Experiment 2 is that participants purposefully delayed word planning, or at least some stages of it. Delayed naming has not been used frequently in dual-tasking studies, so it is not clear how participants strategize to perform the task. In picture naming, activation rapidly spreads from the conceptual and lexical level evoked by the picture to the word form and phonological level and then to articulatory commands (Caramazza, 1997; Dell & O’Seaghdha, 1992; Levelt et al., 1999). We assumed that in a delayed naming task the process is arrested once the level of word form and phonological retrieval has been reached, and that only the preparation and execution of the motor commands remain to be done after the response signal (Kawamoto et al., 2008). In immediate naming, word form and phonological retrieval can be reached within 500 ms (Indefrey & Levelt, 2004), so the chosen time window should have been sufficiently long to capture this stage. However, we cannot tell from our data how far this process had progressed in the critical time period we studied. In short, response preparation for naming and categorization in the critical window may have been more similar than we had originally envisioned.
4.2.3. Explanation 3: The categorization task also activated lexical processes
It is unclear whether the presentation of pictures automatically activates all word planning stages, including word form and phonological retrieval, especially when participants are in a speaking mode. In this study, in the naming condition, participants had to go through all stages of word planning to successfully perform the task. In the categorization condition, it is unclear which word planning processes participants engaged in, as it is still debated whether objects that speakers do not plan to name aloud activate word planning processes. In the classical proposal by Levelt (1993), lexical-semantic information is only activated during speech planning. In agreement with this, several studies found that in picture naming and translation tasks, related pictures induced facilitation effects, arising at a pre-lexical, conceptual level, while related words induced interference effects, arising during lexical access (Bloem & La Heij, 2003; La Heij et al., 2003). This implies that pictures only activate conceptual information. In contrast, another priming study indicated the opposite, namely that even rapidly presented pictures do activate word form and phonological information (Zwitserlood et al., 2018). Strijkers and colleagues (2011) showed that word frequency EEG signatures, which indicate processing at the word form and phonological level, occurred even when participants categorized pictures without the intention to name them. Interestingly, the frequency signature in the categorization task emerged in a different time window than the frequency signature in the naming task. This indicates that the task itself has a strong impact on whether and how the word form and phonological information associated with a picture is activated.
Whether pictures automatically activate all word planning processes has not been studied in dual tasks. In our experiment, participants conceptually engaged with pictures while performing another task (i.e., comprehending the sentence). We assumed that they would engage in the least demanding version of the naming and categorization tasks to save resources for comprehension. Thus, we assumed that in the categorization condition, participants would stop the chain of word planning stages after the conceptual processing of the picture. However, we cannot exclude the possibility that participants engaged in the complete chain of word production processes, apart from preparing and executing the motor commands, even in the categorization condition. This might have led to equal N400 effect reductions in both conditions.
4.3. Central Attention Versus Task Switching
Based on the previous literature, it is still unclear whether N400 effect reductions in dual-task scenarios are driven by a central attention limitation (e.g., Batterink et al., 2010; Giesbrecht et al., 2007; Hohlfeld et al., 2004, 2015; Lien et al., 2008) or by task switching (Vachon & Jolicoeur, 2012). These accounts might have important implications for follow-up research in this field. According to the central attention bottleneck account, the N400 effect is smaller in Experiment 2 because the planning task occupies central attention that is necessary for semantic processing in comprehension. Alternatively, our results could be interpreted according to the task-switching account, if we presume that participants had to switch between the explicit task (naming or categorization) and passive comprehension. This account would predict that such a switch alone could drive the N400 reduction. Although our experiment did not use a true dual-task paradigm, the task-switching account might be important to consider because the content of the comprehension task and the explicit task were not related. It could be vital to investigate whether the N400 reductions persist even when the planning task is related to the comprehension task.
4.4. Conclusion
The present study was a partial replication of Gerakaki (2020). We showed an N400 effect elicited by auditory comprehension in a dual task. We showed decreases in the N400 effect when participants engaged in word planning or categorization simultaneously with comprehension. The N400 effect was not more reduced in the naming compared to the categorization condition. These results are difficult to interpret, as it is not clear how participants prepared the picture names in the delayed naming condition, or which word planning stages were activated in the categorization condition. In general, the N400 effect reduction seems to reflect conceptual and decision processes (taking place in both the naming and the categorization condition) rather than lexical processing. This points to a resilience of the comprehension system against interference from capacity-demanding decision and categorization processes. This resilience might support the swift responses observed in dialogue settings.
Contributions
All authors contributed to conception and design, analysis and interpretation of the data, drafting and revision of the article, as well as final approval. Acquisition of the data was done by Cecilia Husta.
Funding Information
The study was part of the first author’s PhD project, funded by the Max Planck Society, Munich, Germany.
Competing Interests
No authors have competing interests.
Data Accessibility Statement
EEG data, experimental scripts, and analysis scripts are available on https://osf.io/vfu57/.