In their natural environments, children usually see several novel objects while they hear the labels for these objects, making it difficult for them to know exactly which objects these words refer to. This referential ambiguity problem can be alleviated through selective attention and inhibitory control because a child who focuses on plausible referents while inhibiting irrelevant ones during object labelling has a higher chance of identifying the intended referent. The present study examined this hypothesis by testing 3.5- to 4.5-year-old children. In particular, we examined the links between children’s word learning (a cross-situational learning task), selective attention (flanker task) and inhibitory control (day-night Stroop task) while controlling for working memory (Corsi block task). We found that children learned the novel word-object associations and completed the cognitive control tasks successfully. However, we did not find any association between word learning and cognitive control or memory span. We argue that the lack of a significant association between cognitive control and word learning may be indicative of a more exploratory style of learning in young children.

Children are remarkably good at learning words; from as young as six months old, they show evidence of understanding words like Mommy and hand (Bergelson & Swingley, 2012; Bortfeld et al., 2005; Tincoff & Jusczyk, 1999, 2012). This is an astonishing achievement especially given the complexity of early visual environments. Specifically, a young learner may often hear an unfamiliar word in a complex visual environment without being aware of which object in that environment the word refers to (Clerkin et al., 2017; Medina et al., 2011). Such referential ambiguity is typically alleviated by external cues, e.g., social-pragmatic cues (Nurmsoo & Bloom, 2008) or lexical constraints (Markman & Wachtel, 1988; Mervis & Bertrand, 1994). However, children appear to be able to deduce the intended referent even in the absence of additional external cues (L. B. Smith & Yu, 2008). Two strategies have been proposed to account for children’s ability to resolve referential ambiguity in early word learning. The first suggests that children track the frequency with which particular objects co-occur with particular words in the input and associate repeatedly co-occurring words and objects with one another; a strategy classed under associative learning (Yu & Smith, 2007). An alternative strategy, hypothesis testing, suggests that, in situations of true referential ambiguity, children randomly map a word onto an object and subsequently refine this hypothesis, given evidence for or against this mapping (Medina et al., 2011).

Word learning in situations of referential ambiguity has typically been assessed using the so-called cross-situational word learning task. In this task, learners see two unfamiliar objects and hear two unfamiliar labels in each training trial, such that within a trial, the mapping between the labels and the two objects presented remains ambiguous. The intended word-object associations can only be inferred successfully across trials, either through associative learning or hypothesis testing. According to the dynamic associative model (McMurray et al., 2012), learning here is achieved by forming several possible word-object associations, i.e., associating both words with both objects, and pruning these associations with each subsequent encounter, e.g., by pruning one of the associations when the learner sees the object and does not hear the associated label. The intended association will eventually be learned, given that the correct label will be associated more frequently with the correct referent relative to other objects (Yu & Smith, 2007). In contrast, according to the propose-but-verify hypothesis (Trueswell et al., 2013), the learner randomly selects one of the objects as a referent for the label and tests this hypothesis in subsequent encounters with the label until the intended referent is found (Aravind et al., 2018; Berens et al., 2018).
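The associative account can be illustrated with a minimal sketch of plain co-occurrence counting. This is not the dynamic associative model itself (there is no pruning or competition), and the design is a hypothetical toy: six word-object pairs named `word0` to `word5` and `object0` to `object5`, with every pair of objects shown together exactly once. All names and the trial structure are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy lexicon: six labels, each with one intended referent.
PAIRS = {f"word{i}": f"object{i}" for i in range(6)}


def make_trials(pairs):
    """Build ambiguous trials: two labels and two objects, with no cue
    within the trial as to which label belongs to which object."""
    trials = []
    for w1, w2 in combinations(sorted(pairs), 2):
        trials.append(({w1, w2}, {pairs[w1], pairs[w2]}))
    return trials


def count_cooccurrences(trials):
    """Tally how often each label is heard while each object is in view."""
    counts = Counter()
    for words, objects in trials:
        for word in words:
            for obj in objects:
                counts[(word, obj)] += 1
    return counts


def best_referent(word, counts, objects):
    """Pick the object that co-occurred most often with the label."""
    return max(objects, key=lambda obj: counts[(word, obj)])


trials = make_trials(PAIRS)
counts = count_cooccurrences(trials)
all_objects = sorted(PAIRS.values())
learned = {word: best_referent(word, counts, all_objects) for word in PAIRS}
```

In this toy design each label is heard five times with its intended referent but only once with any other object, so the mapping is recoverable across trials even though every single trial is ambiguous.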

While these two suggestions differ in terms of how they propose referential ambiguity is resolved, both assume that the young child is capable of disambiguating the novel word-object associations in this task. Indeed, many studies reported successful learning in such word learning tasks, not just for young adults (Escudero et al., 2016a; Poepsel & Weiss, 2014; Vouloumanos, 2008; Yu & Smith, 2007) but also children (Fitneva & Christiansen, 2017; Outters et al., 2023; Suanda et al., 2014) and even infants (Escudero et al., 2016b; L. B. Smith & Yu, 2008; Vlach & Johnson, 2013).

Regardless of the strategy that a child uses, word learning is demanding, requiring the child to retrieve previously seen objects and revise her current knowledge of word-object co-occurrences based on her new observations, likely tapping working memory. Given constraints on working memory capacity, it may, therefore, be advantageous for the child to selectively focus on relevant sources of information and filter out irrelevant ones to avoid information overload. Consequently, word learning enlists not merely the strategies highlighted above, but considerable other demands in terms of attentional control (i.e., selectively focus on relevant information), inhibitory control (i.e., filter out irrelevant information) and working memory (i.e., remember newly formed word-object associations), especially when considering word learning in situations of referential ambiguity. Against this background, the current study examines the association between individual differences in early attentional control, inhibitory control, working memory and word learning success.

Indeed, as already suggested above, given the demands on young word learners with regard to aggregating information such as novel labels, objects, co-occurrence frequencies and other contextual cues (Romberg & Saffran, 2010; Zhang et al., 2019), memory constraints are likely to play an important role in word learning (K. Smith et al., 2011; Yurovsky & Frank, 2015). Working memory, which is a system that stores and processes information over a short period of time (Baddeley, 2003), allows new information to be integrated into existing knowledge, such that word meanings are refined appropriately and learned successfully (Oberauer et al., 2003). As such, the capacity of one’s working memory determines how much information one can store and encode, which is, in turn, responsible for the success of word learning (Chow et al., 2021; Gordon et al., 2022). This argument is in line with previous studies which found working memory capacity to be a good predictor of word learning (Jackson et al., 2021; Vlach & DeBrock, 2017, 2019) and that deficits in working memory may be one of the underlying causes of delayed language development in children with specific language impairment (Archibald, 2017; Archibald & Gathercole, 2006; Archibald & Harder Griebeling, 2016; Hansson et al., 2004; A. P. Hill et al., 2015). In fact, by introducing a memory buffer component to a word learning computational model, Soh and Yang (2021) were able to improve the performance of the model such that it captures a wider range of empirical findings than previous computational models. In short, working memory is likely to play an important role in early word learning.

One way of attenuating memory constraints during word learning is to attend selectively to potentially relevant information and disregard less relevant information (L. B. Smith et al., 2002). Yu et al. (2012) reported differences in participants’ looking behaviour in a cross-situational learning task based on subsequent learning performance. In particular, strong learners - but not weak learners - preferred to look at previously seen objects during training. Such selective attention may allow learners to identify the most frequently co-occurring objects as referents of novel labels. Similarly, Smith et al. (2002) demonstrated that previous experience with the importance of an object’s shape in determining word-object associations leads children to pay more attention to the shape of an object in subsequent disambiguation tasks. In other words, word learning experience allows children to learn to selectively attend to relevant information and to ignore irrelevant information. Although we cannot determine to what extent the influence between word learning and selective attention is bi-directional, such findings converge to highlight the importance of selective attention towards relevant information in successful word learning.

Indeed, there is robust evidence for an association between executive function (EF; an umbrella term that includes attentional and inhibitory control) and linguistic development, covering varied aspects of language learning such as language proficiency (Iluz-Cohen & Armon-Lotem, 2013), vocabulary size (Beisly et al., 2022; Weiland et al., 2014), reading skills (Blair & Razza, 2007; Nevo & Breznitz, 2013) and even second language learning (Li & Grant, 2015; Linck et al., 2013). Yet, the association between EF and novel word learning per se, i.e., the formation of novel word-object associations, is not as clear. In particular, while some studies report finding a positive association between EF (attentional control) and lexical acquisition (e.g., De Diego-Lazaro, 2019; Yoshida et al., 2011), others report finding no such effect (e.g., Kapa & Colombo, 2014; Slone et al., 2017; White et al., 2017).

The inconsistencies in previous research on EF and word learning have typically been explained by a variety of factors. First, employing different kinds of word-learning tasks (e.g., learning object properties, Slone et al., 2017; Yoshida et al., 2011; learning object names, De Diego-Lazaro, 2019) may cause discrepancies in findings, since each linguistic task presumably has distinct demands on EF. Thus, for instance, studies with young adults find a positive association between inhibitory control and word learning performance when the word learning task is challenging to the learner, e.g., when interference from other languages is introduced (Bartolotti et al., 2011). Moreover, Hill and Wagovich (2020) and Yang and Yim (2018) revealed that EF predicts word learning in children with weaker language skills but not in typically developing children, indicating that when children are cognitively challenged, those with a stronger EF outperform those with a weaker EF. Second, given differences in EF across childhood (Mezzacappa, 2004; Park et al., 2018; Sullivan et al., 2014), recruiting participants of different ages and language proficiency (e.g., young children, Yang & Yim, 2018; school-aged children, M. S. Hill & Wagovich, 2020; adults, Bartolotti et al., 2011) is likely to lead to increased inconsistencies in the findings. Perhaps most importantly, given the important role of working memory in word learning (Berens et al., 2018; Vlach & DeBrock, 2017), the impact of attentional and inhibitory control on word learning may be masked or over-amplified, if working memory constraints are not controlled for. Indeed, with adults, Zavaleta (2020) revealed a significant effect of working memory but only a marginal effect of attentional control on novel word learning, raising questions as to the potential associations between word learning, working memory and attentional control in young children.

Against this background, the current study examined the association between attentional control, inhibitory control, working memory and word learning success in young children using a cross-situational word learning task, the flanker task (attentional control), the day-night Stroop task (inhibitory control) and the Corsi block task (working memory). In particular, to address the first issue of differences in word learning tasks, we employed a well-established example of word learning in situations of referential ambiguity, namely the cross-situational learning task. Our design closely followed previous studies examining cross-situational learning, i.e., children were presented with two novel objects and two novel labels in each trial without any information as to the distinct word-object associations within a trial. Across trials, however, children could use the frequency with which particular objects appeared with particular labels to infer the distinct associations. Our findings will therefore be comparable to those of other studies examining word learning in situations of referential ambiguity. To address the issue of age differences in participants, we recruited 3.5- to 4.5-year-old children, an age range typically tested in EF tasks. Finally, to address the role of working memory in tasks examining the association between EF and word learning, we included a working memory task, with performance in this task being included as a control variable in our model (see Analysis plan). This is especially important given previous findings that working memory contributes to word learning in cross-situational learning tasks (Romberg & Saffran, 2010). Note that we controlled for working memory to ensure that any effects observed regarding the role of attentional control in word learning are not due to the influence of working memory. This does not imply that working memory is not a component of EF.

We employed the cross-situational learning task to test for word learning in the current study, given the requirements in this task to selectively attend to frequently co-occurring word-object associations while inhibiting less frequently co-occurring associations (cf. Yu et al., 2012). We hypothesised, therefore, that individual differences in attentional and inhibitory control were likely to be associated with word learning success in this attentionally demanding word learning task. The flanker task was used to measure selective attention (as in previous studies such as Kalamala et al., 2018, and Sanders et al., 2018), where participants have to selectively attend to the central arrow while ignoring the flanking arrows. The day-night Stroop task, a child-friendly version of the standard colour Stroop task, was used to measure inhibitory control, where children have to inhibit prepotent associations of sun-day and moon-night for successful performance in the task, i.e., say “night” to a picture of a sun and “day” to a picture of a moon (Diamond et al., 2002; Simpson & Riggs, 2005). The inclusion of both an attentional control task and an inhibitory control task is also in line with recent evidence (Unger & Sloutsky, 2023) that selective attention can be separated into focusing on the relevant information and filtering out irrelevant information. Hence, the inclusion of both tasks may allow us to better capture the relation between EF and word learning. The Corsi block task (also known as the frog matrix task in Morales et al., 2013 and Sorge et al., 2017) was carried out to measure working memory, where participants are presented with a number of squares that light up sequentially in a random order. Inclusion of a working memory task in the present study allows us to control for the impact of working memory on the association between word learning success and EF.
We note that the Corsi block task is typically associated with spatial working memory instead of verbal working memory. Nonetheless, since we considered any correctly remembered square as a correct response, we posit that the task provides a reliable measure of children’s memory span. Based on the studies discussed above, we predicted a positive association between children’s performance in the word learning task and their performance in the flanker and day-night Stroop while controlling for performance in the Corsi block task, such that children who have better attentional and inhibitory control are more successful in acquiring novel word-object associations, over and above any differences in working memory.


In order to determine the sample size needed to detect a significant positive correlation between attentional control and word learning, we ran a power analysis (see preregistration for details), which indicated that we needed 67 children to achieve a power of .80 to detect a correlation of .30. We recruited 108 monolingual German-speaking children between 41 and 53 months of age. These children were not born prematurely and did not have any hearing or sight impairment at the time of testing. The age range was chosen based on previous reports that advanced EF, specifically the ability to filter out irrelevant information, undergoes substantial development between 3 and 10 years of age (Aslan & Bäuml, 2010; Mezzacappa, 2004; Rueda et al., 2004, 2005; Unger & Sloutsky, 2023). Thus, the age range allowed us to capture greater variance in attentional control and would be unlikely to include ceiling effects. Of the 108 children tested, 26 provided incomplete data (i.e., the child provided data for either the cross-situational learning or the EF tasks, but not both) and were excluded from further analyses. Data from 11 additional children were excluded because we were unable to code the looking behaviour in the test trials of the cross-situational learning task. These exclusion criteria were pre-registered (see Data analysis for a summary of the proposed analysis laid out in our pre-registration). The final sample consisted of 71 children (31 boys, 40 girls; mean age: 45.7 months).
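The reported sample size can be roughly checked with the Fisher z approximation for correlations. The sketch below is not the preregistered computation: it assumes a one-sided test at alpha = .05 (an assumption suggested by the directional hypothesis, not stated in this section), and the function name `n_for_correlation` is hypothetical. Exact methods can differ from this approximation by about one participant.

```python
import math
from statistics import NormalDist


def n_for_correlation(r, power=0.80, alpha=0.05, one_sided=True):
    """Approximate sample size needed to detect a correlation r,
    using the Fisher z transformation (an approximation, not an
    exact bivariate-normal power computation)."""
    z = NormalDist()
    # Critical value; one_sided=True is an assumption of this sketch.
    z_alpha = z.inv_cdf(1 - alpha if one_sided else 1 - alpha / 2)
    z_power = z.inv_cdf(power)
    effect = math.atanh(r)  # Fisher z of the target correlation
    return ((z_alpha + z_power) / effect) ** 2 + 3


n_one_sided = n_for_correlation(0.30)
n_two_sided = n_for_correlation(0.30, one_sided=False)
print(round(n_one_sided, 1), round(n_two_sided, 1))
```

Under the one-sided assumption the approximation lands between 67 and 68, in line with the 67 children reported above; a two-sided test would require roughly 85 children.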


All tasks were conducted online. The EF tasks (i.e., flanker, day-night Stroop and Corsi block tasks) were created using PsychoPy (version 2021.1.4) and run on the online platform Pavlovia, while the cross-situational learning task was run on e-BabyLab (Lo et al., 2021), a platform that allows for unmoderated online eye-tracking tasks with young children. All auditory stimuli used in the present study were recorded by a female German native speaker in an enthusiastic, infant-directed manner. The stimuli can be found at

Cross-situational learning task

Six colourful novel objects (obtained from Frank et al., 2016; see Figure 1) were presented to children. Each object image had a resolution of 569 x 427 pixels. Unlike Frank et al. (2016), the novel objects in the present study were depicted on a white background to increase ease of coding of the looking time data. An additional 10 familiar toys (rabbits, teddy bears, toy ducks, toy planes and toy trucks) were used as attention catchers in between experimental trials.

Figure 1.
Novel Objects Used in the Cross-situational Learning Task

The novel labels used (i.e., eschu, gissel, maasche, peto, schufi, and sima) were bisyllabic non-words in keeping with German phonotactic rules.

Flanker task

Following Yoshida et al. (2011), drawings of a yellow fish with an arrow embedded on the body were presented to children (see Figure 2). The target and flanking fish were visually identical to each other and could point either to the left or the right. In the neutral condition, the target fish was presented on its own whereas in the congruent and incongruent conditions, two flanking fish were presented on each side of the target fish (in the same direction as the target fish in the congruent condition and in the opposite direction in the incongruent condition). Throughout the flanker task, the background was set to light blue (Hex colour code: #87CEFA) to resemble water (see Figure 2). Two arrows were always visible below the fish with which children could indicate the direction of the target fish. An image of a pointing hand appeared on the top centre of the screen whenever a response was required from the children.

Figure 2.
Screenshot of an Experimental Trial of the Flanker Task

Day-night Stroop task

An image of a green smiling alien was used to introduce the task to children. Images of a sun and a moon (see Figure 3) were presented on the two sides of the screen in all experimental trials of the task. In accordance with Diamond et al. (2002), the background of the moon image was black, whereas the background of the sun image was white. To contrast both images from the rest of the screen, the background was set as light grey (Hex colour code: #A9A9A9).

Figure 3.
Example of an Experimental Trial of the Day-Night Stroop Task

Corsi block task

A drawing of a scarlet macaw was presented as an introduction to the task. Nine white squares with black borders, arranged in a 3 x 3 matrix, were presented throughout the experimental trials (see left panel of Figure 4). Each square depicted a drawing of a familiar animal, obtained from various internet sources. These familiar animals allowed us to refer to specific squares in a child-friendly manner when giving instructions about the task, such that we described a particular square using the animal (e.g., tap on the cat) instead of the square’s position (e.g., tap on the top left square). The squares turned light orange (Hex colour code: #FFD966) when they were “lit up” (see right panel of Figure 4).

An image of a brown teddy bear was used as an attention getter, interspersed between the three EF tasks.

Figure 4.
Stimuli (3x3 Squares) Presented in the Corsi Block Task

Note. In the Corsi block task, to-be-remembered object locations were lit up in sequence. The left panel shows the squares at “baseline”, i.e., none of the squares is lit up. The right panel shows an example of an object location (here, the top-left square in light orange) which is lit up.



Cross-situational learning task

The design of our word learning task closely followed that of Smith and Yu (2008). The task always started with a calibration video, followed by a training phase and then a test phase. Attention catchers were presented in between experimental trials in both phases.

In the calibration video, a yellow fixation cross appeared in the middle of the screen, followed by the appearance of a rubber duck on the left of the screen and then the presentation of a ball on the right of the screen. This video was used to help coders identify children’s gaze to the left and the right of the screen. After calibration, children were given an introduction to the task, where children were told that they would see new toys that Anna and Benny had just received and that they should listen carefully to the names of these toys to help Anna and Benny find the toys. This introduction allowed us to provide a succinct yet child-friendly instruction to children as to what was expected of them in the task.

There were 30 trials in the training phase, where every training trial presented two novel objects (one object on either side of the screen) and two novel labels. Five hundred milliseconds after the onset of a training trial, the first novel label was played. The second novel label was played 500 ms after the offset of the first label. Each training trial lasted for four seconds. The order of the novel labels was counterbalanced, such that the first novel label referred to the object on the left and the right across an equal number of trials. Thus, there was no relation between the order of labels and the side of object presentation, creating ambiguity within each trial as to the referents of the presented labels. As each training trial presented two labels and two objects, by the end of the training phase, each object co-occurred with six different labels, five times with the intended label for this object and five times with the label for the other object on the screen. To fully counter-balance the novel object-label pairings across children, we had six training sets (instead of two, as in L. B. Smith & Yu, 2008), such that different children received different training sets and that each novel object was associated with a different novel label in each training set. These six training sets also allowed us to counter-balance for the side of object presentation (i.e., whether the object appeared on the left or right) and the order of label presentation (i.e., whether the first label referred to the left or right object). The order of the training trials was pseudo-randomised so that the same objects did not appear across two consecutive trials.

The test phase followed immediately after the training phase. The test phase comprised two blocks, each consisting of six trials (i.e., 12 test trials across test blocks). In each test trial, children were presented with two of the trained objects and the label for one of these objects. To reduce the duration of the experiment and given that children showed successful recognition early in the trials in previous studies, we reduced the duration of test trials to six seconds (relative to the eight seconds in Smith and Yu, 2008). The novel label was presented three times during the trial, with the label’s onset at 500 ms, 2500 ms and 4500 ms, respectively. The order of the test trials within each block was pseudo-randomised, such that the order of test trials was fixed in each set. In every test block, all novel objects appeared once as a target and once as a distractor. Each novel label was tested twice, once in each test block, such that the location of target presentation (i.e., left or right) was counter-balanced across test blocks.

Cognitive tasks

The order of flanker task and day-night Stroop task was counterbalanced across children, while the Corsi block was always presented as the final task. The decision to administer the Corsi block at the end of the experiment was made based on the length and increasing complexity of this task. Thus, we hoped to prevent children from becoming frustrated and eventually abandoning the study by presenting this task at the end.

Flanker task. We first trained children to identify the target fish, Fido, by asking them to tap on Fido, with the additional clue that Fido always appears in the middle. Next, children received a practice phase, which consisted of three different sets of practice trials to help children understand the requirements of the task. The first practice set aimed at familiarising children with the purpose of the task, that is, to identify the direction of the target fish. Children were first told that Fido swims either to the left or the right and that they should tap on the arrow that indicates the direction in which Fido is swimming. They were then presented with four randomised neutral trials, i.e., Fido was presented on his own in the middle, randomised for whether Fido swims to the left or right in these trials. The second practice set was aimed at familiarising children with the congruent and incongruent conditions, such that children were told that Fido always swims in the middle when he swims with his “friends” (i.e., the flanking fish). This second practice set consisted of eight randomised congruent and incongruent trials. In congruent trials, Fido swam in the same direction as the flanking fish and in incongruent trials, Fido swam in the opposite direction of the flanking fish. The third and final practice set followed after the second practice set, in which stimuli of all conditions (i.e., left and right pointing target fish in all three conditions, hence, six trials) were presented to children in a random order.

In every practice trial, a fixation cross appeared on the screen for 500 ms, followed by stimulus presentation which lasted for one minute, after which, if no response was made, the next trial started. Children received feedback after every response, such that they were praised when they responded correctly but were reminded of the instructions (i.e., “press the left arrow if Fido is swimming to the left” or “press the right arrow if Fido is swimming to the right”) if they made a mistake.

The test phase, divided into two blocks, began after the practice phase. Each test block consisted of 24 trials with eight trials in each condition, i.e., neutral, congruent and incongruent, in each block. In other words, children received a total of 48 test trials. The test trials were identical to the practice trials, except that in the test trials no feedback was given.

An attention getter was shown between the practice and test phases and between the two test blocks. A reminder of the task instructions was played during the presentation of the attention getter between two test blocks.

Day-night Stroop task. As this study was designed to be an unmoderated online experiment, the design of the present day-night Stroop task was modified slightly such that it mimicked the grass-snow task (Carlson & Moses, 2001), in which children need to point to the opposite image (e.g., snow) upon hearing the stimulus (e.g., green). This modification was made so that children could respond by tapping on an image instead of saying their answer aloud. We do not expect this modification to affect children’s performance given previous studies suggesting that performance in the day-night Stroop task and the grass-snow task is correlated (Montgomery & Koeltzow, 2010).

The task began with a validation phase, followed by a practice phase and finally a test phase. Across all trials, the stimuli (i.e., a sun and a moon, see Figure 3) remained visible on the screen. The position of the stimuli (i.e., left or right of the screen) was counter-balanced across children. Each trial lasted for a maximum of one minute, after which the next trial started even if the child did not make any response.

The validation phase was divided into two blocks. In the first validation block, i.e., the picture validation block, we tested whether children were able to identify the sun and moon images using two validation trials. In both trials, children saw a sun and a moon on the left and right of the screen (see Figure 3). Children were asked “Do you know which picture is the sun? Tap on the sun” in one trial and “Do you know which picture is the moon? Tap on the moon” in the other trial. In the second validation block, i.e., the association validation block, we examined whether children correctly associated the sun and the moon with day and night, respectively. There were two such validation trials, where children saw a sun and a moon and were asked “Which picture appears at night? Tap on the picture that appears at night” in one trial and “Which picture appears in the day? Tap on the picture that appears in the day” in the other trial. In both validation blocks, children were praised for correct answers and were corrected when wrong (e.g., “no, this picture is a moon” for the picture validation block or “no, the moon appears at night” for the association validation block).

After the validation phase, children were told that a friendly alien Lara lives in a funny world where the sun shines at night and the moon appears during the day. Thus, when Lara says night, children should tap on the sun, and when Lara says day, children should tap on the moon. Children then saw an example of correct performance, after which the practice phase began. In the practice phase, children received four practice trials where they saw a sun and a moon and heard either the word day or night. In all practice trials, children were praised when they responded correctly but received the instructions again (e.g., tap on the sun when Lara says night) if they selected the wrong image. The practice phase ended with an attention getter, after which the test phase started. In accordance with Ling, Wong and Diamond (2016), 16 test trials were presented in the following pseudorandomised order, that is, night, day, day, night, night, day, night, day, day, night, night, day, night, day, day, night. In all trials, children were supposed to tap the sun when they heard night and the moon when they heard day. No feedback was given in the test trials.
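The response rule and the fixed test order described above can be written out as a short sketch. The order and the word-to-image rule are taken from the text; scoring performance as the proportion of correct taps is an assumption made here for illustration (scoring is not described in this section), as are the names `TEST_ORDER`, `CORRECT_IMAGE` and `score_stroop`.

```python
from collections import Counter

# Fixed pseudorandomised order of the 16 test trials, as listed above.
TEST_ORDER = [
    "night", "day", "day", "night", "night", "day", "night", "day",
    "day", "night", "night", "day", "night", "day", "day", "night",
]

# Reversed mapping in Lara's world: tap the sun for "night", the moon for "day".
CORRECT_IMAGE = {"night": "sun", "day": "moon"}


def score_stroop(taps):
    """Return the proportion of test trials with a correct tap.

    taps: one entry per test trial, "sun", "moon", or None for no response.
    Proportion correct is an assumed scoring rule, used for illustration.
    """
    if len(taps) != len(TEST_ORDER):
        raise ValueError("expected one tap per test trial")
    correct = sum(
        tap == CORRECT_IMAGE[word] for word, tap in zip(TEST_ORDER, taps)
    )
    return correct / len(TEST_ORDER)


word_counts = Counter(TEST_ORDER)
perfect = [CORRECT_IMAGE[word] for word in TEST_ORDER]
prepotent = ["sun" if word == "day" else "moon" for word in TEST_ORDER]
```

The fixed order contains eight "night" and eight "day" trials, so a child who always follows the prepotent sun-day and moon-night association would score zero under this rule.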

Corsi block task. The Corsi block task included a practice phase and a test phase. Unlike typical Corsi block tasks, the present task only required children to remember which of the squares lit up. In other words, we did not examine whether children tapped the lighted squares in the order in which they lit up. This modification was made because we observed from pilot data that remembering the order of squares was very difficult for children around three years of age.

Children were first shown the matrix of animals and told that some of the animals would light up during the trial. Two animals then lit up sequentially for 2000 ms each. Next, children were reminded which animals had lit up and saw an image of a hand tapping on the images that had been lit up. This demonstration was followed by a guided practice trial, where two different animals lit up, after which children were prompted to tap on them. Children were praised for every correct response.

The practice phase began after the guided practice trial. There were two practice trials, with two different animals lighting up in each trial. Every trial started with a verbal alert “schau mal!” (look!), followed by the sequential lighting of squares. After the designated number of squares had lit up, children heard “du bist dran” (your turn), which signalled that they should now tap on the squares that had lit up. The number of taps allowed in each trial corresponded to the number of squares that had lit up. Every correct tap led to a “magical” sound effect, whereas wrong responses led to a sound effect typically associated with incorrect responses in electronic games. Once the allotted number of taps had been made, the next trial started.

The test phase consisted of three test blocks, each block containing three test trials. The number of squares that lit up in each test trial increased by block (i.e., level of difficulty, Sorge et al., 2017). Specifically, test trials in the first block presented two lit squares (i.e., easy), test trials in the second block presented four lit squares (i.e., intermediate) and test trials in the third block presented six lit squares (i.e., difficult)¹. Hence, with three trials per block, children saw a total of 36 lit squares (3 × [2 + 4 + 6]). The order of the test trials within each block was randomised and children always progressed from easy to difficult regardless of their performance. The test trials were identical to the practice trials except that no feedback was given regarding children’s responses. As the number of responses allowed for every trial corresponded to the number of lit squares, children were required to tap twice in easy trials, four times in intermediate trials and six times in difficult trials.

An attention getter appeared between the practice and test phase and also between each test block. During the presentation of the attention getter between test blocks, children were informed that they would be shown more trials and that more squares would be lit up.


Data from 58 of the 71 children were collected online, that is, in the child’s own home. Data from 13 children were collected in the lab because parents preferred in-person tasks over online studies when COVID-19 restrictions loosened. As this was originally designed to be an online study, links to the online tasks were sent to parents who consented to participating in the experiment at home. Two links were created, one for the cross-situational learning task and another for the EF tasks. The cross-situational learning task was always completed before the cognitive tasks, with a possible gap of a few days between the two.

We first contacted parents in our database by phone. Parents were informed of the aim of the study, i.e., to examine the relation between cognition and word learning. Parents were then told that their child would play an online game (i.e., the EF tasks) and watch a short video (i.e., the word learning task), that they could start the task whenever their child was ready, and that they should avoid helping their child with the game. Once parents agreed to participate in the study, they were sent an invitation email containing two links: one for the cross-situational learning task and one for the cognitive tasks. After the child had completed the tasks from both links (or two weeks after completing only one of the tasks), parents were sent a thank-you email.

In the cross-situational learning task, parents were first asked to provide informed consent for their child’s participation in the study. At the end of the cross-situational learning task, a thank you page was presented, informing parents that video recording had stopped. The whole task lasted about five minutes.

For the cognitive tasks, parents were informed that they needed a tablet to run the task. The experiment always started with an introduction to three characters, which were the main characters of each of the EF tasks, that is, a yellow fish called Fido (for the flanker task), a green alien called Lara (for the day-night Stroop task) and a scarlet macaw called Bobby (for the Corsi block task). After this introduction, children were presented with the three tasks (see Design for details).

After all three tasks were completed, children saw a teddy bear and a thank you message on the screen. They received a short story in a pdf format as a thank you if they participated online or were given a small book if they participated in person.

Data analyses

In the pre-registration of the current study, we listed a few exclusion criteria. First, when a child only completed one part of the study (either the word-learning part or the cognitive function part), the data would be considered incomplete and would be excluded from further analysis. Second, test trials of the cross-situational learning task would be removed from the analysis if the coders could code less than 20% of children’s looking behaviour in that trial. Finally, the data of the cross-situational learning task would be excluded from the analysis if fewer than two test trials were retained.

We included two other exclusion criteria which were not pre-registered. First, for the flanker task, performance in the first practice set (in which four randomised neutral trials were presented) was used to identify children who may not have understood what was required of them in the task. In particular, children who did not provide correct data for at least half of the trials in this practice block (i.e., two trials) were excluded from the analysis. Second, for the day-night Stroop task, responses in both validation blocks (i.e., the picture validation block and the association validation block) were used to identify children who may not have understood the task instructions. Specifically, children were excluded from the analysis if they did not respond accurately in at least half of the picture validation block (i.e., one trial) and at least half of the association validation block (i.e., one trial).

We also pre-registered how each variable would be measured. In particular, to measure word learning in the cross-situational learning task, we calculated the proportion of target looking (PTL), that is, the proportion of time that children spent looking at the target relative to both images on the screen (i.e., target and distractor) following the presentation of labels in the trial. Success in the task is indexed by increased looking to the target, i.e., PTL greater than chance (.50). To determine children’s looking behaviour in the test trials, trained coders played each video (corresponding to each test trial) frame-by-frame using ELAN (version 6.4) and identified video frames in which the child was gazing at either the left or right of the screen. As videos recorded using webcams are mirrored, when a child appeared to fixate the left side in the recorded video, coders coded the fixation as a right gaze, that is, the child was looking at the object presented on the right. Similarly, if the child appeared to fixate the right side, coders coded this as a look to the image on the left. This mirroring was confirmed in the calibration and attention getter videos, where children appeared to gaze to the right when the attention getter object was presented on the left. To ensure the reliability of the coding, the first author double-coded the data; inter-coder agreement was kappa = .94.
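The PTL computation, including the correction for mirrored webcam video, can be sketched as below. This is an illustrative Python sketch under stated assumptions, not the authors' coding pipeline: it assumes a constant frame rate (so that frame counts stand in for looking time), and the function names and the "left"/"right"/None frame labels are hypothetical.

```python
def unmirror(side):
    """Webcam video is mirrored, so a look coded on the left of the video
    is a look to the right of the screen, and vice versa."""
    return {"left": "right", "right": "left"}.get(side, side)


def proportion_target_looking(video_sides, target_side):
    """PTL for one test trial.

    `video_sides` is a hypothetical frame-by-frame coding of where the child
    appears to look in the recorded video ("left", "right", or None when the
    frame could not be coded). `target_side` is the screen side of the target.
    Returns (ptl, share_of_frames_coded); ptl is None if nothing was coded.
    """
    screen_sides = [unmirror(side) for side in video_sides]
    coded = [side for side in screen_sides if side in ("left", "right")]
    coded_share = len(coded) / len(screen_sides) if screen_sides else 0.0
    if not coded:
        return None, coded_share
    target_frames = sum(side == target_side for side in coded)
    return target_frames / len(coded), coded_share


# Illustrative frames, not study data: 6 apparent-left, 2 apparent-right, 2 uncodable.
frames = ["left"] * 6 + ["right"] * 2 + [None] * 2
ptl, coded_share = proportion_target_looking(frames, target_side="right")
print(ptl, coded_share)  # 0.75 0.8
```

The second return value corresponds to the pre-registered trial-level criterion: a trial would be dropped when less than 20% of the looking behaviour could be coded.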

For the flanker task, based on our pre-registration, we measured children’s response time (in milliseconds) for correct responses only, separately for incongruent and neutral trials. To measure attentional control, we calculated the difference in response times between incongruent and neutral trials (i.e., reaction time_incongruent − reaction time_neutral). A smaller difference between these two conditions is typically indicative of better attentional control. As an exploratory measure, we also measured attentional control using children’s accuracy in the task, i.e., we calculated the difference in the proportion of correct responses between the incongruent and neutral conditions. For the day-night Stroop task, we measured inhibitory control using an accuracy score, following Ling et al. (2016), whose day-night Stroop design we adopted in the present study (see Design for details). Specifically, we analysed the proportion of correct responses, i.e., we divided children’s score by the maximum possible score of 16. A higher proportion of correct responses in the day-night Stroop task is typically interpreted as better inhibitory control. For the Corsi block task, we measured the proportion of correct responses to index working memory, in line with Sorge et al. (2017), who aggregated the number of correctly recalled locations across levels of difficulty and then converted the scores into percentages. Specifically, children received a score of 1 if they tapped on a correct square and a score of 0 if they tapped on a wrong square. If children tapped on one of the correct squares repeatedly, only their first tap was counted as correct. Children’s scores were summed across test trials and levels of difficulty, then divided by the total number of lit squares, i.e., 36. As in the day-night Stroop task, a higher proportion of correct responses in the Corsi block task is typically interpreted as better working memory.
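The flanker difference score and the order-free Corsi score described above can be sketched as follows. This is a Python illustration rather than the authors' R analysis script; the trial representations (tuples of condition, correctness and reaction time; lists of lit and tapped squares) and all function names are assumptions made for the example.

```python
from statistics import mean


def flanker_rt_difference(trials):
    """Mean correct RT on incongruent trials minus mean correct RT on neutral trials.

    `trials` is a hypothetical list of (condition, correct, rt_ms) tuples.
    Returns None if either condition has no correct responses.
    """
    def mean_correct_rt(condition):
        rts = [rt for cond, correct, rt in trials if cond == condition and correct]
        return mean(rts) if rts else None

    incongruent = mean_correct_rt("incongruent")
    neutral = mean_correct_rt("neutral")
    if incongruent is None or neutral is None:
        return None
    return incongruent - neutral


def corsi_trial_score(lit, taps):
    """Order-free score: one point per lit square tapped; repeated taps count once."""
    return len(set(taps) & set(lit))


def corsi_proportion(trials):
    """Summed score over all trials divided by the total number of lit squares."""
    total_lit = sum(len(lit) for lit, _ in trials)
    return sum(corsi_trial_score(lit, taps) for lit, taps in trials) / total_lit


# Illustrative data, not study data.
demo_flanker = [
    ("neutral", True, 2000), ("neutral", True, 2200), ("neutral", False, 900),
    ("incongruent", True, 3000), ("incongruent", True, 3200),
]
demo_corsi = [
    (["cat", "dog"], ["cat", "cat"]),                       # repeat earns one point
    (["cow", "pig", "hen", "fox"], ["fox", "hen", "pig", "cow"]),  # order is ignored
]
print(flanker_rt_difference(demo_flanker))
print(corsi_proportion(demo_corsi))
```

In the study itself the Corsi denominator is 36, because every child saw the same nine test trials; the sketch derives the denominator from the trials it is given.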

Finally, we pre-registered two analyses, i.e., a correlation test and a mixed-effects model so that random effects could be taken into consideration in the analyses. For the correlation analyses, we expected children’s performance in EF tasks to correlate positively with their performance in cross-situational learning.

We also fitted a mixed-effects model in addition to the correlation tests in order to control for working memory as well as to take individual differences (i.e., the random intercepts and random slopes of the model) into account when testing for the effect of attentional control on word learning. Given that the power analysis targeted the correlational analysis, we report the correlational analysis as the main analysis. It remains uncertain as to whether the mixed-effects model is well-powered, given the number of random effects and slopes we include. Calculating power in such models requires simulation of the data, which we were lacking at the time of running the study. However, we note that the current study could now provide such data for simulation of the effects in order to determine the sample size required to achieve a power of (for instance) .80 for subsequent studies. For the mixed-effects model, we fitted a beta regression model in R (version 4.2.1; R Core Team, 2022) using the function glmmTMB of the package glmmTMB (version 1.1.4; Brooks et al., 2017), where the response variable was PTL in the cross-situational learning task. Since the beta model cannot accommodate values of 0 and 1, we transformed PTL before entering it as the dependent variable into the model as follows:

(ptl*(nrow(data)-1) + 0.5)/nrow(data)
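To see what this transformation does, the following Python sketch applies the same formula to a few illustrative proportions. It is not the authors' R code; the function name `squeeze` is hypothetical, and `n` here is simply the length of the input list, standing in for `nrow(data)`.

```python
def squeeze(ptl_values):
    """Map proportions from [0, 1] into the open interval (0, 1).

    Mirrors (ptl * (n - 1) + 0.5) / n, with n the number of observations.
    """
    n = len(ptl_values)
    return [(p * (n - 1) + 0.5) / n for p in ptl_values]


raw = [0.0, 0.25, 0.5, 1.0]  # illustrative values, not study data
print(squeeze(raw))  # [0.125, 0.3125, 0.5, 0.875]
```

Values of exactly 0 and 1 are pulled slightly towards .50 while .50 itself is unchanged, and the adjustment shrinks as the number of observations grows.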

The test predictor variables entered into the model were attentional control (i.e., the difference in response times between incongruent and neutral trials in the flanker task) and inhibitory control (i.e., the proportion of correct responses in the day-night Stroop task), whereas the control variable entered into the model was working memory (i.e., the proportion of correct responses in the Corsi block task). Participants and objects were entered as random intercepts, while the predictor variables and the control variable were included as random slopes within object. The full model was thus specified as:

full.model = glmmTMB(
  ptl ~ z.inhibition +                  # see Footnote 2
    z.attention + z.memory +
    (1|ID) + (1|object) +
    (0+z.inhibition|object) +           # see Footnote 3
    (0+z.attention|object) + (0+z.memory|object),
  family = beta_family(link = "logit"),
  data = data
)

To avoid running into the multiple testing problem (since we are interested in the effect of more than one predictor on word learning), we conducted a full-null model comparison. The null model⁴ lacked the predictor variables of interest (i.e., attentional control and inhibitory control) but was otherwise the same as the full model. The data is publicly available at while the analysis script can be found at

First, we examined whether children learned the novel word-object associations in the cross-situational learning task. A one-sample t-test revealed that children learned the novel word-object associations significantly above chance (chance = .50; M = .523; t(70) = 2.25, p = .028), as they showed increased fixations to the target relative to the distractor across the trial (see Figure 5).

Figure 5.
Proportion of Target Looking Behaviour (PTL) After Label Onset.

Note. PTL above .50 (indicated with the horizontal line) indexes those time points where children were fixating the target more than the distractor whereas PTL below .50 suggests distractor looking. The two vertical dotted lines at 2000 ms and 4000 ms mark the second and third label onset.

Next, we examined children’s performance in the EF tasks. For the flanker task, we examined whether there was an inhibitory effect, that is, whether children were slower to indicate the correct direction in which Fido was swimming in incongruent trials relative to neutral trials. To ensure that we only included data from children who had understood the task, children who provided correct responses in fewer than half of the trials in the neutral practice phase (i.e., fewer than two trials) were excluded from analysis⁵. One child was excluded using this criterion. Another child was excluded from the analysis because of missing data in the test phase. In the remaining sample of 69 participants, children were significantly slower to respond correctly in the incongruent condition relative to the neutral condition, t(68) = -3.72, p < .001. Children also made more errors in the incongruent condition than in the neutral condition, t(68) = 4.93, p < .001 (see Table 1 for the proportion of accuracy and mean reaction time).

Table 1.
Proportion of Accuracy and Mean Reaction Time (Standard Deviation in Parentheses) in all Three Conditions of the Flanker Task.
Condition     Proportion of accuracy   Mean reaction time (ms)
Neutral       .80 (.19)                2168.69 (1069.68)
Congruent     .81 (.19)                2528.99 (1657.01)
Incongruent   .69 (.22)                3006.00 (2351.09)

To examine children’s performance in the day-night Stroop task, we excluded children who likely did not understand the task given their performance in the validation phase, i.e., we excluded children who did not respond accurately in at least half the picture validation trials (i.e., one trial) and in at least half the association validation trials (i.e., one trial). Data from one child was excluded following this criterion. For the data from the remaining 70 children, we compared the mean proportion of correct responses (M = .61, SD = .26) against chance (i.e., .50) using a one-sample t-test and found that children’s performance was significantly above chance (t(69) = 3.50, p < .001).
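As a worked check of the comparison against chance, the one-sample t statistic can be recomputed from the summary statistics given above. This is a Python sketch, not the authors' analysis; the function name is hypothetical. Because the M and SD in the text are rounded to two decimals, the result is expected to approximate rather than exactly reproduce the reported t(69) = 3.50.

```python
from math import sqrt


def one_sample_t(mean, sd, n, mu0):
    """t statistic and degrees of freedom for a one-sample t-test from summary statistics."""
    standard_error = sd / sqrt(n)
    return (mean - mu0) / standard_error, n - 1


# Day-night Stroop: M = .61, SD = .26, n = 70, chance = .50 (values from the text).
t, df = one_sample_t(mean=0.61, sd=0.26, n=70, mu0=0.50)
print(round(t, 2), df)  # 3.54 69
```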

Finally, for the Corsi block task, we examined whether the proportion of lighted squares remembered correctly across all conditions (i.e., three difficulty levels) was significantly above chance. Two children with missing data were removed from the analysis. A one-sample t-test showed that children’s performance in the Corsi block task (M = .72, SD = .14) was significantly above chance, t(68) = 13.00, p < .001.

Our initial analysis examined the correlation matrix between word learning, attentional control and inhibitory control performance. As several data points were excluded either due to missing test data or failed validation tests, only 67 children were included in this correlational analysis. This analysis revealed no significant correlations between word learning and attentional control (r(65) = .14, p = .266 when using reaction time; r(65) = .06, p = .636 when using proportion of accuracy), and between word learning and inhibitory control (r(65) = -.22, p = .071). The full correlation matrix among EF tasks and the word learning task is provided in Table 2.

Table 2.
Descriptive Statistics and Correlations of the Word Learning and EF Tasks
Variable                 M        SD        1        2       3
1. PTL                   .52      .09
2. Attentional control   837.31   1870.29   .138
3. Inhibitory control    .61      .26       -.222ǂ   .244*
4. Working memory        .72      .14       -.019    .029    .048

Note. M and SD are used to represent mean and standard deviation, respectively. The correlation scores reported here are Pearson’s correlation (r). PTL refers to the proportional target looking in the word learning task. Attentional control refers to the difference in reaction time between the incongruent and neutral trials in the flanker task. Inhibitory control refers to the proportion of correct responses in the day-night Stroop task. Working memory refers to the proportion of correct responses in the Corsi block task. ǂ indicates p < .10. * indicates p < .05

Finally, we fitted a beta model to the dataset containing complete data from 67 children. We first tested model stability (i.e., whether there were any outliers which could potentially influence the model) and assessed over-dispersion (i.e., whether the observed responses deviated substantially from the expected distribution of the data). The full model was fairly stable (see last two columns of Table 3) and not over-dispersed (dispersion parameter: 0.759). We then compared the full and null regression models, which revealed a non-significant difference between the two models (χ²(2) = 3.39, p = .184), providing no evidence that the test predictors influenced word learning (see Table 3), in line with the results obtained from the correlation analyses. We ran another model with proportion of accuracy as the attention score and obtained similar results, χ²(2) = 0.22, p = .897.
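The p-values of the full-null comparison can be checked by hand. The comparison is a likelihood-ratio test whose statistic is referred to a chi-square distribution with two degrees of freedom, because the null model drops two predictors. For two degrees of freedom, the upper-tail probability has the closed form exp(−χ²/2). The Python sketch below uses this identity; the function name is hypothetical and the code is not part of the authors' analysis script.

```python
from math import exp


def lrt_p_value_df2(chi_square):
    """Upper-tail p-value of a chi-square statistic with 2 degrees of freedom.

    For df = 2 the chi-square survival function is exactly exp(-x / 2).
    """
    return exp(-chi_square / 2)


print(round(lrt_p_value_df2(3.39), 3))  # 0.184
```

Applied to the accuracy-based model (χ²(2) = 0.22), the same formula gives a p-value of about .90, close to the reported value; the small difference is expected because the test statistic is rounded in the text.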

Table 3.
Estimates, Standard Errors and p-values of Each Term in the Full Beta Model.
Term        Estimate   Standard error   p-value   Min      Max
Intercept   0.077      0.047            .102      0.052    0.096
Inhibition  -0.046     0.049            .349      -0.070   -0.024
Attention   0.091      0.049            .064      0.053    0.119
Memory      -0.009     0.061            .877      -0.045   0.040

Note. All predictor terms have been z-transformed. Minimum and maximum values of the estimates are obtained when levels of random effects are excluded one at a time.

Exploratory analyses

Based on the findings obtained in the main analyses, we conducted four other exploratory analyses, details of which are reported in Supplemental Materials. In a nutshell, in the first exploratory analysis, we fitted a beta model where we calculated PTL in the word-learning task only from the third label onset onwards, since children showed heightened PTL after the third label (see Figure 5). Similar to the main pre-registered model, none of the predictors had a significant effect on children’s word learning performance (see Text S1 and Table S1 for details). In the second exploratory analysis, we fitted the pre-registered main model again but with attentional control measured as the difference between congruent and incongruent trials (instead of the difference between incongruent and neutral trials). This new model led to the same conclusion as the pre-registered model, i.e., there was no significant difference between the full and null exploratory models (see Text S2 and Table S2 for details). In the third exploratory analysis, we examined whether the current findings were influenced by any confounding variables not accounted for in the main model. In particular, we fitted the pre-registered model with additional control variables of test block (first or second test block), test location (online or in-person) and test gap (number of days between completing the word-learning task and the EF tasks). None of these control variables significantly affected children’s performance in the word learning task (see Text S3 and Table S3 for details). Finally, in the fourth exploratory analysis, we plotted the three EF tasks against one another to examine whether any of these tasks could be combined for a psychometrically stronger measure of EF. As the tasks correlated weakly with each other (see Text S4 and Figure S2), we did not combine them for further analyses.

In the present study, we tested the association between attentional and inhibitory control and word learning by measuring four-year-old children’s performance in a cross-situational learning task and three EF tasks. Contrary to our predictions, we found no evidence that attentional control, inhibitory control or working memory influenced children’s success in novel word learning. In what follows, we discuss the implications of these findings in relation to past studies on novel word learning.

Our finding that four-year-old children are able to track co-occurrences between novel labels and novel objects in the cross-situational learning task is in line with previous studies such as Fitneva and Christiansen (2017), Outters et al. (2023), Suanda et al. (2014), and Vlach and DeBrock (2019). The cross-situational learning task in the current study was introduced to children by telling them that they would see new toys of Anna and Benny and that they should listen to the names of these toys carefully to help Anna and Benny look for the toys later. Hence, children received explicit instructions that their task was to learn labels of objects. Our findings add to the literature showing that children are apt word learners even in situations where they do not have any additional pragmatic or syntactic cues as to the intended referents of the novel labels presented. We chose the word learning task because of robust findings in the literature to date suggesting that, while children are able to learn words in this paradigm, performance is not at ceiling. Admittedly, one of the limitations of the current study is that the task is stripped of social cues and may not be as naturalistic as word learning in the wild. Nevertheless, recent work examining the mechanisms underlying cross-situational learning suggests, first, that simple associative learning mechanisms – that can be assumed to underlie word learning in the wild – capture the pattern of looking behaviour in the task, and, second, that the task captures individual differences in the extent to which children learn over time to inhibit incorrect referents and fixate intended referents (Yu & Smith, 2017). As social cues provide children with useful information during referential disambiguation, an interesting future direction is to administer a more naturalistic word learning task which includes social cues to examine the interplay between the social cues and EF during word learning.

With regard to the association between attentional and inhibitory control and word learning success, we hypothesised that children with better EF would show improved performance in the word learning task, capturing their enhanced ability to focus on relevant information, i.e., the intended objects, when presented with the novel labels. Indeed, such an association between EF and word learning is hinted at in Pruden et al.’s (2006) and Pomper and Saffran’s (2019) research, showing that the formation of novel word-object associations in both 10-month-old and four-year-old children is disrupted by the presence of highly salient distractors. In other words, children need to inhibit their attention to the distractors and focus on the intended targets during word learning in order to form the correct word-object associations. Furthermore, Yoshida et al. (2011) and Smith et al. (2002) showed that children who focused on relevant object features were more successful in mapping novel labels to these features. Against this background, we expected a positive association between attentional control, inhibitory control and word learning success.

However, we observed no evidence of such an association in the present study. We suggest that this null finding is unlikely to be due to overall poor performance in the task, given that children performed above chance in all three tasks tested. Nor were children at ceiling in any of the tasks presented, suggesting that the tasks captured natural variance in children’s EF and word learning. As the attrition rate of the current study was quite high, it is possible that the final sample, i.e., children who managed to complete all tasks, had better EF than those who did not complete all the tasks, leading to limited individual differences in the data. However, we observed a good spread across the whole range of scores of the EF tasks (see Figure S2); the null finding is therefore also unlikely to be due to a restricted range of individual differences in task performance.

Instead, we suggest that the lack of significant associations between EF and word learning in the task may speak to the literature on the randomness of children’s exploratory and attentional strategies in learning. In particular, a number of studies suggest that young children’s exploration of their perceptual space may be more random than adults’ exploratory behaviour. For instance, Schulz et al. (2019) and Meder et al. (2021) found that children tend to explore a visual space more than adults when sampling an unknown area for rewards. In terms of object categorisation, Deng and Sloutsky (2015) demonstrated that four-year-olds but not adults were able to remember irrelevant features of stimuli even though they were taught to focus on the relevant features. Such a random pattern of exploration may actually be beneficial for children. In a task where participants need to identify one (of several) monsters that give out the highest rewards, Sumner et al. (2019) showed that adults stopped exploring monsters that had already been explored once they identified the most rewarding monster. Children, on the other hand, continued exploring previously examined monsters even after discovering the highest reward region. In other words, under circumstances where there is some ambiguity as to the relevance of the information provided, sampling what is already known may actually confer benefits to a naïve learner, an argument which is also supported by Blanco and Sloutsky (2019, 2020) and Plebanek and Sloutsky (2017).

Against this background, our finding that differences in children’s attentional and inhibitory control appear to be unrelated to their word learning success may be taken to suggest that children’s attentional strategies may not be geared towards learning success. To put it differently, children may explore the world differently from adults. Such a pattern of exploration is adaptive especially for children who are uncertain about what information to focus on because it supports a pattern of broad information gathering. With regard to novel word learning, a word may have several meanings and an object may have more than one label. By keeping the associations between novel labels and novel objects open and flexible instead of inhibiting potentially irrelevant information, young children will be more able to correct mistakes and learn more information. However, in examining the pattern of eye-movements in the cross-situational learning task, Yu and Smith (2011) suggest that such random patterns of looking, with shorter looks and more switches between referents, are associated with impoverished learning. Thus, there may be a sweet spot between the exploratory behaviour that may be adaptive early in development and the attentional strategies that underlie successful word learning. This raises the question as to the mechanisms underlying successful word learning, with regard to the associative and hypothesis-based learning accounts briefly discussed in the Introduction. While our study was not designed to disentangle the two accounts, we note that Yu and Smith (2011) further suggest that children likely do not store all possible word-object associations in the task but only some of them. Thus, successful learning depends on children learning over time to inhibit fixations to incorrect referents and building on the few associations they have learned. Future research could, therefore, examine the association between the kinds of eye-movements displayed by children in such tasks, i.e., more structured looking behaviour or more random brief spurts, and their EF, with regard to potentially disentangling these accounts further. At present, the lack of any significant correlations between EF and word learning provides support neither for nor against either account.

Furthermore, given the constraints on word learning especially in situations of referential ambiguity, we predicted that working memory would be significantly correlated with children’s success in the cross-situational learning task. Indeed, a number of studies have demonstrated a relationship between working memory capacity and word learning success using behavioural data (Chow et al., 2021; Gordon et al., 2022; Vlach & DeBrock, 2017, 2019) and also computational modelling (Soh & Yang, 2021).

However, we did not observe any evidence of a significant association between working memory and word learning in the current study. One possible reason for our failure to find an effect is that the Corsi block task typically measures spatial working memory (i.e., in which location did the light appear), whereas remembering word-object co-occurrences is typically related to verbal working memory. However, given that we considered any correctly remembered square (even when the order of response was incorrect) as a correct response, we suggest that our task captures memory span rather than spatial working memory. Moreover, the squares have animals on them (see Figure 4) and children could have labelled these animals subconsciously while performing the task, suggesting that some form of verbal working memory is likely to be involved. Furthermore, we note that Oberauer et al. (2003) compared various components of working memory, i.e., verbal memory (sequence of semantically unrelated nouns), numerical memory (sequence of numbers) and spatial memory (location of dots), and found that performance in verbal and spatial tasks was highly correlated. In short, our choice of a working memory task is unlikely to be the cause of the non-significant findings.

Another possible reason for the lack of correlation between working memory and word learning is that the test phase followed immediately after the training phase in the cross-situational learning task. Since children were tested immediately after training, their memory of the newly formed word-object associations is likely to have been quite strong, masking potential associations between working memory and word learning. Given the fleeting nature of both child and adult word learning in such tasks (Friedrich & Friederici, 2011; Medina et al., 2011), it is possible that a longer delay between training and the retention test, or a more demanding word learning task (e.g., a higher number of word-object associations to be learned), may reveal stronger associations between working memory and word learning success.

There are several pragmatic interpretations of the non-significant relation between word learning and EF. One interpretation is that the word learning task was in the visual domain, i.e., it was an eye-tracking task that did not require any manual responses from children, whereas the EF tasks required manual responses. In other words, a word learning task that requires a more explicit response from children, e.g., selecting the correct option, may result in significant associations with EF, a hypothesis to be tested in the future. Measuring word learning using explicit behavioural responses will also allow us to test word learning effects using different response modes (e.g., touchscreen, pointing), thus providing a useful comparison with the current findings. Another pragmatic explanation of the null finding is that the children were overwhelmed by the task, as evidenced by the small magnitude of the word learning effect. However, given that even 12- to 14-month-olds have been able to complete a similar task successfully (L. B. Smith & Yu, 2008), and that Outters et al. (2023) found successful performance in children around the same age as those tested in the current study, this is an unlikely explanation.

It is also important to acknowledge several limitations of the study that could have contributed to the null results. For instance, the present study was conducted as an unmoderated online study, so children completed the tasks under different conditions, which could potentially mask the relation between EF and word learning. Additionally, excluding children who did not provide data for both the word learning and EF tasks could have contributed to the null effect, because children who did not complete the experiment may have weaker EF skills than those who completed the whole study. Furthermore, due to the task impurity problem (i.e., each EF task typically employs more than one EF component), it is considered best practice to include multiple tasks for each EF component. However, doing so would likely have led to an even higher attrition rate in our study, given that children were already required to complete three EF tasks in addition to the word learning task. Hence, we only employed one task per EF component, which could, to a certain extent, explain the null results reported.

Two questions remain. First, why do several previous studies report a positive association between word learning and EF? Second, if EF and novel word learning are not correlated, why is the relation between linguistic abilities and EF so robust? With regard to the first question, we speculate that the outcome obtained depends very much on the word learning task used. As discussed in the Introduction section, various word learning tasks have been employed, and it is likely that tasks that tap into more than mere word-object association learning, such as category learning and adjective learning, have a greater chance of revealing an association between EF and word learning. We had assumed that the demanding nature of the cross-situational learning task would have a similar effect, but it seems that this task does not cross the threshold required to tap into associations between EF and learning. Alternatively, selective attention, which the present study assumes to be a means of sampling information, is not always task-oriented but could also be affected by other factors such as object salience (Pomper & Saffran, 2019) or children’s interests (Ackermann et al., 2020; Mani & Ackermann, 2018). For instance, Pruden et al. (2006) revealed that 10-month-old infants disregard social cues in favour of perceptual cues, such that they associate novel labels with visually salient objects instead of the objects that the speaker gazed at. Recent studies similarly reveal that the sampling strategies of younger children (5 years of age) appear to be more random and exploratory, whereas the sampling strategies of older children (6-9 years of age) appear to be more systematically based on their knowledge or uncertainty (de Eccher & Mani, 2023). Further studies will be needed to verify this speculation.

With regard to the second question, linguistic outcomes such as vocabulary, reading and school readiness tap into more than just the formation of simple word-object associations. Importantly, cognitive control is also linked to temperament, theory of mind and other social abilities, which are helpful in acquiring these linguistic abilities. In short, while EF may not be directly related to forming novel word-object associations per se, having better cognitive control is likely to be beneficial to learning in general. At least with regard to word learning, therefore, we find little relationship with attentional and inhibitory control, suggesting that early word learning may be more random and exploratory than previously assumed. Future research may, therefore, need to consider the reality of such exploratory sampling, potentially by using eye-tracking data to examine the attentional strategies underlying learning in such tasks (cf. Yu et al., 2017) and how these strategies correlate with attentional and inhibitory control and with successful word learning.

Contributed to conception and design: MYS, NM, KH

Contributed to acquisition of data: MYS

Contributed to analysis and interpretation of data: MYS, NM

Drafted and/or revised the article: MYS, NM, KH

Approved the submitted version for publication: MYS, NM, KH

We are very thankful to Roger Mundry for his input with regard to model fitting, to the families who participated in the study and to our student assistants who helped in data collection and coding. We acknowledge support by the Open Access Publication Funds/transformative agreements of the Göttingen University.

This project was funded by RTG 2070 Understanding Social Relationships.

We have no known conflict of interest to disclose.

The data associated with this paper are publicly available at The analyses of the project were pre-registered at In addition, all study materials, analysis scripts as well as a PDF copy of the pre-registration are publicly available on the project page at


Going from two to six squares follows the design of Sorge et al. (2017).


As the beta model did not converge the first time we fitted it, we first z-transformed all our predictor variables, which were then renamed by adding the prefix "z." to their names.
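The z-transformation and renaming step described in this note can be illustrated with a minimal sketch in base R. The data frame and the raw column names (memory, inhibition, attention) are hypothetical stand-ins chosen to match the z.-prefixed names in the model formula; this is not the authors' actual script.

```r
# Hypothetical toy data; the raw column names are assumptions for illustration.
data <- data.frame(
  memory     = c(2, 3, 4, 5, 6),
  inhibition = c(0.50, 0.75, 0.60, 0.90, 0.80),
  attention  = c(0.40, 0.55, 0.70, 0.65, 0.85)
)

predictors <- c("memory", "inhibition", "attention")

for (p in predictors) {
  # scale() centres each variable on its mean and divides by its sample
  # standard deviation; as.vector() drops the matrix attributes it returns.
  data[[paste0("z.", p)]] <- as.vector(scale(data[[p]]))
}
```

After this step each z.-prefixed column has a mean of zero and a standard deviation of one, which puts the predictors on a common scale and can help optimisers converge.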


The beta model still did not converge after z-transforming the predictor variables, and because there were not many options in changing the controls, we chose in the next step to remove all correlations between the random slopes. This solved the convergence issue and is the model that we used in our analysis.


null.model = glmmTMB(ptl ~ z.memory + (1|ID) + (1|object) + (0+z.inhibition|object) + (0+z.attention|object) + (0+z.memory|object), family = beta_family(link = "logit"), data = data)


This exclusion criterion was not pre-registered.

Ackermann, L., Hepach, R., & Mani, N. (2020). Children learn words easier when they are interested in the category to which the word belongs. Developmental Science, 23(3), e12915.
Aravind, A., de Villiers, J., Pace, A., Valentine, H., Golinkoff, R., Hirsh-Pasek, K., Iglesias, A., & Wilson, M. S. (2018). Fast mapping word meanings across trials: Young children forget all but their first guess. Cognition, 177, 177–188.
Archibald, L. M. (2017). Working memory and language learning: A review. Child Language Teaching and Therapy, 33(1), 5–17.
Archibald, L. M., & Gathercole, S. E. (2006). Short-term and working memory in specific language impairment. International Journal of Language & Communication Disorders, 41(6), 675–693.
Archibald, L. M., & Harder Griebeling, K. (2016). Rethinking the connection between working memory and language impairment. International Journal of Language & Communication Disorders, 51(3), 252–264.
Aslan, A., & Bäuml, K.-H. T. (2010). Retrieval-induced forgetting in young children. Psychonomic Bulletin & Review, 17(5), 704–709.
Baddeley, A. (2003). Working memory and language: An overview. Journal of Communication Disorders, 36(3), 189–208.
Bartolotti, J., Marian, V., Schroeder, S. R., & Shook, A. (2011). Bilingualism and inhibitory control influence statistical learning of novel word forms. Frontiers in Psychology, 2.
Beisly, A., Kwon, K.-A., Jeon, S., & Lim, C. (2022). The moderating role of two learning related behaviours in preschool children’s academic outcomes: Learning behaviour and executive function. Early Child Development and Care, 192(1), 51–66.
Berens, S. C., Horst, J. S., & Bird, C. M. (2018). Cross-situational learning is supported by propose-but-verify hypothesis testing. Current Biology, 28(7), 1132–1136.
Bergelson, E., & Swingley, D. (2012). At 6–9 months, human infants know the meanings of many common nouns. Proceedings of the National Academy of Sciences, 109(9), 3253–3258.
Blair, C., & Razza, R. P. (2007). Relating effortful control, executive function, and false belief understanding to emerging math and literacy ability in kindergarten. Child Development, 78(2), 647–663.
Blanco, N. J., & Sloutsky, V. M. (2019). Adaptive flexibility in category learning? Young children exhibit smaller costs of selective attention than adults. Developmental Psychology, 55(10), 2060–2076.
Blanco, N. J., & Sloutsky, V. M. (2020). Attentional mechanisms drive systematic exploration in young children. Cognition, 202.
Bortfeld, H., Morgan, J. L., Golinkoff, R. M., & Rathbun, K. (2005). Mommy and me: Familiar names help launch babies into speech-stream segmentation. Psychological Science, 16(4), 298–304.
Brooks, M. E., Kristensen, K., van Benthem, K. J., Magnusson, A., Berg, C. W., Nielsen, A., Skaug, H. J., Mächler, M., & Bolker, B. M. (2017). glmmTMB balances speed and flexibility among packages for zero-inflated generalized linear mixed modeling. The R Journal, 9, 378–400.
Carlson, S. M., & Moses, L. J. (2001). Individual differences in inhibitory control and children’s theory of mind. Child Development, 72(4), 1032–1053.
Chow, J. C., Ekholm, E., & Bae, C. L. (2021). Relative contribution of verbal working memory and attention to child language. Assessment for Effective Intervention, 47(1), 3–13.
Clerkin, E. M., Hart, E., Rehg, J. M., Yu, C., & Smith, L. B. (2017). Real-world visual statistics and infants’ first-learned object names. Philosophical Transactions of the Royal Society B: Biological Sciences, 372(1711), 20160055.
De Diego-Lazaro, B. (2019). Novel-word learning in bilingual children with hearing loss [Doctoral dissertation, Arizona State University].
de Eccher, M., & Mani, N. (2023). Are children sensitive to gaps in their knowledge? Metacognition and active selection of information in word learning.
Deng, W. S., & Sloutsky, V. M. (2015). The development of categorization: Effects of classification and inference training on category representation. Developmental Psychology, 51(3), 392–405.
Diamond, A., Kirkham, N., & Amso, D. (2002). Conditions under which young children can hold two rules in mind and inhibit a prepotent response. Developmental Psychology, 38(3), 352–362.
Escudero, P., Mulak, K. E., & Vlach, H. A. (2016a). Cross-situational learning of minimal word pairs. Cognitive Science, 40(2), 455–465.
Escudero, P., Mulak, K. E., & Vlach, H. A. (2016b). Infants encode phonetic detail during cross-situational word learning. Frontiers in Psychology, 7.
Fitneva, S. A., & Christiansen, M. H. (2017). Developmental changes in cross-situational word learning: The inverse effect of initial accuracy. Cognitive Science, 41(S1), 141–161.
Frank, M. C., Sugarman, E., Horowitz, A. C., Lewis, M. L., & Yurovsky, D. (2016). Using tablets to collect data from young children. Journal of Cognition and Development, 17(1), 1–17.
Friedrich, M., & Friederici, A. D. (2011). Word learning in 6-month-olds: Fast encoding–weak retention. Journal of Cognitive Neuroscience, 23(11), 3228–3240.
Gordon, K. R., Lowry, S. L., Ohlmann, N. B., & Fitzpatrick, D. (2022). Word learning by preschool-age children: Differences in encoding, re-encoding, and consolidation across learners during slow mapping. Journal of Speech, Language, and Hearing Research, 65(5), 1956–1977.
Hansson, K., Forsberg, J., Löfqvist, A., Mäki-Torkko, E., & Sahlén, B. (2004). Working memory and novel word learning in children with hearing impairment and children with specific language impairment. International Journal of Language & Communication Disorders, 39(3), 401–422.
Hill, A. P., van Santen, J., Gorman, K., Langhorst, B. H., & Fombonne, E. (2015). Memory in language-impaired children with and without autism. Journal of Neurodevelopmental Disorders, 7(1), 1–13.
Hill, M. S., & Wagovich, S. A. (2020). Word learning from context in school-age children: Relations with language ability and executive function. Journal of Child Language, 47(5), 1006–1029.
Iluz-Cohen, P., & Armon-Lotem, S. (2013). Language proficiency and executive control in bilingual children. Bilingualism: Language and Cognition, 16(4), 884–899.
Jackson, E., Leitão, S., Claessen, M., & Boyes, M. (2021). Word learning and verbal working memory in children with developmental language disorder. Autism & Developmental Language Impairments, 6.
Kapa, L. L., & Colombo, J. (2014). Executive function predicts artificial language learning. Journal of Memory and Language, 76, 237–252.
Li, P., & Grant, A. (2015). Identifying the causal link: Two approaches toward understanding the relationship between bilingualism and cognitive control. Cortex, 73, 358–360.
Linck, J. A., Hughes, M. M., Campbell, S. G., Silbert, N. H., Tare, M., Jackson, S. R., Smith, B. K., Bunting, M. F., & Doughty, C. J. (2013). Hi-LAB: A new measure of aptitude for high-level language proficiency. Language Learning, 63(3), 530–566.
Ling, D. S., Wong, C. D., & Diamond, A. (2016). Do children need reminders on the day–night task, or simply some way to prevent them from responding too quickly? Cognitive Development, 37, 67–72.
Lo, C. H., Mani, N., Kartushina, N., Mayor, J., & Hermes, J. (2021). e-Babylab: An open-source browser-based tool for unmoderated online developmental studies. PsyArXiv.
Mani, N., & Ackermann, L. (2018). Why do children learn the words they do? Child Development Perspectives, 12(4), 253–257.
Markman, E. M., & Wachtel, G. F. (1988). Children’s use of mutual exclusivity to constrain the meanings of words. Cognitive Psychology, 20(2), 121–157.
McMurray, B., Horst, J. S., & Samuelson, L. K. (2012). Word learning emerges from the interaction of online referent selection and slow associative learning. Psychological Review, 119(4), 831–877.
Meder, B., Wu, C. M., Schulz, E., & Ruggeri, A. (2021). Development of directed and random exploration in children. Developmental Science, 24(4).
Medina, T. N., Snedeker, J., Trueswell, J. C., & Gleitman, L. R. (2011). How words can and cannot be learned by observation. Proceedings of the National Academy of Sciences, 108(22), 9014–9019.
Mervis, C. B., & Bertrand, J. (1994). Acquisition of the novel name-nameless category (N3C) principle. Child Development, 65(6), 1646–1662.
Mezzacappa, E. (2004). Alerting, orienting, and executive attention: Developmental properties and sociodemographic correlates in an epidemiological sample of young, urban children. Child Development, 75(5), 1373–1386.
Montgomery, D. E., & Koeltzow, T. E. (2010). A review of the day–night task: The Stroop paradigm and interference control in young children. Developmental Review, 30(3), 308–330.
Morales, J., Gómez-Ariza, C. J., & Bajo, M. T. (2013). Dual mechanisms of cognitive control in bilinguals and monolinguals. Journal of Cognitive Psychology, 25(5), 531–546.
Nevo, E., & Breznitz, Z. (2013). The development of working memory from kindergarten to first grade in children with different decoding skills. Journal of Experimental Child Psychology, 114(2), 217–228.
Nurmsoo, E., & Bloom, P. (2008). Preschoolers’ perspective taking in word learning: Do they blindly follow eye gaze? Psychological Science, 19(3), 211–215.
Oberauer, K., Süß, H.-M., Wilhelm, O., & Wittman, W. W. (2003). The multiple faces of working memory: Storage, processing, supervision, and coordination. Intelligence, 31(2), 167–193.
Outters, V., Hepach, R., Behne, T., & Mani, N. (2023). Children’s affective involvement in early word learning. Scientific Reports, 13(7351).
Park, J., Ellis Weismer, S., & Kaushanskaya, M. (2018). Changes in executive function over time in bilingual and monolingual school-aged children. Developmental Psychology, 54(10), 1842–1853.
Plebanek, D. J., & Sloutsky, V. M. (2017). Costs of selective attention: When children notice what adults miss. Psychological Science, 28(6), 723–732.
Poepsel, T. J., & Weiss, D. J. (2014). Context influences conscious appraisal of cross situational statistical learning. Frontiers in Psychology, 5.
Pomper, R., & Saffran, J. R. (2019). Familiar object salience affects novel word learning. Child Development, 90(2), e246–e262.
Pruden, S. M., Hirsh-Pasek, K., Golinkoff, R. M., & Hennon, E. A. (2006). The birth of words: Ten-month-olds learn words through perceptual salience. Child Development, 77(2), 266–280.
Romberg, A. R., & Saffran, J. R. (2010). Statistical learning and language acquisition. WIREs Cognitive Science, 1(6), 906–914.
Rueda, M. R., Fan, J., McCandliss, B. D., Halparin, J. D., Gruber, D. B., Lercari, L. P., & Posner, M. I. (2004). Development of attentional networks in childhood. Neuropsychologia, 42(8), 1029–1040.
Rueda, M. R., Rothbart, M. K., McCandliss, B. D., Saccomanno, L., & Posner, M. I. (2005). Training, maturation, and genetic influences on the development of executive attention. Proceedings of the National Academy of Sciences, 102(41), 14931–14936.
Sanders, L. M., Hortobágyi, T., Balasingham, M., Van der Zee, E. A., & van Heuvelen, M. J. (2018). Psychometric properties of a flanker task in a sample of patients with dementia: A pilot study. Dementia and Geriatric Cognitive Disorders Extra, 8(3), 382–392.
Schulz, E., Wu, C. M., Ruggeri, A., & Meder, B. (2019). Searching for rewards like a child means less generalization and more directed exploration. Psychological Science, 30(11), 1561–1572.
Simpson, A., & Riggs, K. J. (2005). Inhibitory and working memory demands of the day–night task in children. British Journal of Developmental Psychology, 23(3), 471–486.
Slone, L. K., Atakagi, N., & Sandhofer, C. M. (2017). Selection, memory, and inhibition processes in young children’s novel word learning [Poster presentation].
Smith, K., Smith, A. D., & Blythe, R. A. (2011). Cross-situational learning: An experimental study of word-learning mechanisms. Cognitive Science, 35(3), 480–498.
Smith, L. B., Jones, S. S., Landau, B., Gershkoff-Stowe, L., & Samuelson, L. (2002). Object name learning provides on-the-job training for attention. Psychological Science, 13(1), 13–19.
Smith, L. B., & Yu, C. (2008). Infants rapidly learn word-referent mappings via cross-situational statistics. Cognition, 106(3), 1558–1568.
Soh, C., & Yang, C. (2021). Memory constraints on cross situational word learning. Proceedings of the Annual Meeting of the Cognitive Science Society, 43.
Sorge, G. B., Toplak, M. E., & Bialystok, E. (2017). Interactions between levels of attention ability and levels of bilingualism in children’s executive functioning. Developmental Science, 20(1).
Suanda, S. H., Mugwanya, N., & Namy, L. L. (2014). Cross-situational statistical word learning in young children. Journal of Experimental Child Psychology, 126, 395–411.
Sullivan, M. D., Janus, M., Moreno, S., Astheimer, L., & Bialystok, E. (2014). Early stage second-language learning improves executive control: Evidence from ERP. Brain and Language, 139, 84–98.
Sumner, E., Li, A. X., Perfors, A., Hayes, B., Navarro, D., & Sarnecka, B. W. (2019). The Exploration Advantage: Children’s instinct to explore allows them to find information that adults miss. PsyArXiv.
Tincoff, R., & Jusczyk, P. W. (1999). Some beginnings of word comprehension in 6-month-olds. Psychological Science, 10(2), 172–175.
Tincoff, R., & Jusczyk, P. W. (2012). Six-month-olds comprehend words that refer to parts of the body. Infancy, 17(4), 432–444.
Trueswell, J. C., Medina, T. N., Hafri, A., & Gleitman, L. R. (2013). Propose but verify: Fast mapping meets cross-situational word learning. Cognitive Psychology, 66(1), 126–156.
Unger, L., & Sloutsky, V. M. (2023). Category learning is shaped by the multifaceted development of selective attention. Journal of Experimental Child Psychology, 226, 105549.
Vlach, H. A., & DeBrock, C. A. (2017). Remember dax? Relations between children’s cross-situational word learning, memory, and language abilities. Journal of Memory and Language, 93, 217–230.
Vlach, H. A., & DeBrock, C. A. (2019). Statistics learned are statistics forgotten: Children’s retention and retrieval of cross-situational word learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 45(4), 700–711.
Vlach, H. A., & Johnson, S. P. (2013). Memory constraints on infants’ cross-situational statistical learning. Cognition, 127(3), 375–382.
Vouloumanos, A. (2008). Fine-grained sensitivity to statistical information in adult word learning. Cognition, 107(2), 729–742.
Weiland, C., Barata, M. C., & Yoshikawa, H. (2014). The co-occurring development of executive function skills and receptive vocabulary in preschool-aged children: A look at the direction of the developmental pathways. Infant and Child Development, 23(1), 4–21.
White, L. J., Alexander, A., & Greenfield, D. B. (2017). The relationship between executive functioning and language: Examining vocabulary, syntax, and language learning in preschoolers attending Head Start. Journal of Experimental Child Psychology, 164, 16–31.
Yang, Y., & Yim, D. (2018). The role of executive function for vocabulary acquisition and word learning in preschool-age children with and without vocabulary delay. Communication Sciences & Disorders, 23(1), 43–59.
Yoshida, H., Tran, D. N., Benitez, V., & Kuwabara, M. (2011). Inhibition and adjective learning in bilingual and monolingual children. Frontiers in Psychology, 2.
Yu, C., & Smith, L. B. (2007). Rapid word learning under uncertainty via cross-situational statistics. Psychological Science, 18(5), 414–420.
Yu, C., & Smith, L. B. (2011). What you learn is what you see: Using eye movements to study infant cross-situational word learning. Developmental Science, 14(2), 165–180.
Yu, C., Zhong, Y., & Fricker, D. (2012). Selective attention in cross-situational statistical learning: Evidence from eye tracking. Frontiers in Psychology, 3.
Yurovsky, D., & Frank, M. C. (2015). An integrative account of constraints on cross-situational learning. Cognition, 145, 53–62.
Zavaleta, K. L. (2020). The role of executive control in language learning [Doctoral dissertation, University of Arizona].
Zhang, Y., Chen, C.-H., & Yu, C. (2019). Mechanisms of cross-situational learning: Behavioural and computational evidence. Advances in Child Development and Behavior, 56, 37–63.
This is an open access article distributed under the terms of the Creative Commons Attribution License (4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Supplementary data