The acquisition of emotion words is critical to children’s socio-emotional development. Previous studies report that children acquire emotion words gradually during ages 3–5 and beyond. The majority of this work, however, has used tasks that are demanding for young children (e.g., asking children to label emotion-related facial configurations) and has predominantly relied on facial configurations. Here we designed a child-friendly word-comprehension task incorporating both facial configurations and body language. In two preregistered online experiments, we asked two- to four-year-olds (N = 96) to connect emotion words (happy, sad, angry, and scared) to either facial configurations (Experiment 1) or combined facial and body cues (Experiment 2). We found relatively early competence in understanding emotion words, especially those of the same valence. All age groups, including 2-year-olds, successfully linked emotion words to corresponding facial configurations (Experiment 1). Experiment 2 replicated this pattern and further showed that children performed equally well (though not substantially better) when given additional body cues. Parental reports of children’s exposure to and use of masks during the COVID-19 pandemic did not correlate with children’s performance in either experiment. Even before children can produce emotion words in an adult-like manner, they possess at least a partial understanding of those words and can map them to emotion cues within valence domains.

The ability to communicate our own and others’ emotions is important for our social life and subjective well-being (Nook, 2021). Acquiring emotion words, however, may pose a difficult problem for young children. Unlike words that refer to concrete objects in the environment (e.g., “apple” and “ball”), emotion words refer to abstract, internal affective states. Although there are external, observable cues derived from these internal states, these cues are often transient, highly variable, and manifest in different forms, including facial movements, vocalizations, and body postures. These properties make learning the correspondence between emotion words and their referents relatively challenging. How do children acquire emotion words and what are the external cues that children map these words to?

Research looking at children’s production of emotion words has found a protracted developmental pattern (see Widen, 2013 for review). Although children begin to produce emotion words before their second birthday (Frank et al., 2016; Ridgeway et al., 1985), they do not use them in an adult-like manner. For instance, when being asked to label emotion-related facial configurations, children make systematic errors. While those at age 2 can differentiate between emotion words of opposite valences (e.g., “happy” vs. “angry”), they confuse same-valence emotion words such as “angry,” “sad,” “scared,” and “disgusted.” Only through ages 3–5 and beyond do they slowly and gradually distinguish between and correctly use emotion words that share the same valence (e.g., Maassarani et al., 2014; Widen & Russell, 2003, 2008). This suggests that children initially tend to overgeneralize emotion words based on their valence (positive or negative), and that it takes considerable time and development for children to differentiate between same-valence emotion words (Widen, 2013).

Studies investigating children’s comprehension of emotion words have found a similar pattern. Children have been asked to choose from several facial configurations in response to a question such as “Who is angry?” (a choice-from-array task; e.g., Bullock & Russell, 1984; Pons et al., 2004) or to place all facial configurations displaying a certain emotion (e.g., angry) in a box (a sorting task; e.g., Russell & Widen, 2002; Widen & Russell, 2008). The majority of these studies tested children aged 3 years and older, and found that children have more difficulty distinguishing between emotion words of the same valence than those with opposite valences (Bormann-Kischkel et al., 1990; Bullock & Russell, 1984, 1985; Russell & Widen, 2002; Widen & Russell, 2008).

Using simplified paradigms, a smaller number of emotion word comprehension studies included 2-year-olds to see if younger children can pass the tasks when task demands were reduced. For instance, Bullock & Russell (1985) used a two-alternative, choice-from-array task to test children’s comprehension of nine emotion words—“happy,” “excited,” “surprised,” “scared,” “mad,” “disgusted,” “sad,” “sleepy,” and “calm.” Although they reported that 2-year-olds performed above chance for all emotions except sleepy, performance was averaged across cross-valence (e.g., happy vs. scared) and within-valence (e.g., sad vs. scared) comparisons. Thus, results were inconclusive about whether 2-year-olds truly discriminated between same-valence emotions. Similarly, Price, Ogren & Sandhofer (2022) found that 2- and 3-year-olds performed above chance in sorting emotion-related facial configurations into three categories: happy, sad, and angry in Study 1 and surprised, disgusted, and afraid in Study 2. However, as the valence of the three categories in each study was mixed—one positive and two negative in Study 1 and one neutral and two negative in Study 2—it is unclear whether the children could succeed at same-valence emotion differentiation specifically. In addition, other studies that looked at same-valence contrasts found that 2-year-olds failed to distinguish between negative emotions (Russell & Widen, 2002; Widen & Russell, 2008). For instance, Russell and Widen (2002) asked children to include or exclude faces into a box labeled with a certain emotion. While two-year-olds (especially those between 24 and 30 months) excluded happy faces from an “angry” box, they put all negative faces, including sad, fear, disgust, and angry, in the box at equal rates, suggesting that they did not succeed at same-valence emotion differentiation.

Collectively, past work examining children’s production and comprehension of emotion words consistently suggests that the ability to discriminate between emotion words of the same valence undergoes a protracted development (Gates, 1923; Izard, 1971; Widen, 2013; Widen & Russell, 2008). Even at age two, children overgeneralize emotion words based on valence, and they slowly and gradually learn to produce and understand emotion words in an adult-like manner during ages 3–5 and beyond.

However, it is crucial to reexamine this late acquisition of emotion words to ensure that research has not underestimated children’s abilities. Accurately assessing the abilities of young children is a challenge in developmental psychology. Even when a child possesses a certain ability, an experimental task may fail to detect it if the child has difficulty following task instructions, maintaining interest, or being motivated to give the right answer. Furthermore, children may be unfamiliar with some experimental stimuli that adults find familiar, leading to inaccurate assessments of their abilities.

By using child-friendly, age-appropriate tasks and stimuli, researchers have been able to demonstrate much earlier abilities than previously believed in not only language but also cognitive and social domains (e.g., Bergelson & Swingley, 2012, 2013; Hamlin, 2013; Liu et al., 2017; Saffran et al., 1999; Schmidt & Sommerville, 2011; Wagner et al., 2018). For example, although most children do not begin to speak their first words until their first birthday, research has found that infants as young as 6 to 9 months old can understand a variety of common nouns, including “hand,” “hair,” “apple,” “bottle,” and “juice,” using infant-friendly tasks (Bergelson & Swingley, 2012). Similarly, while previous research suggested that children have a protracted development of learning words in abstract domains such as color (Wagner et al., 2013), number (Wynn, 1992), and time (Tillman & Barner, 2015), research using a more child-friendly, eye-tracking task found evidence of color word comprehension in toddlers as young as 18 to 33 months (Wagner et al., 2018; see also Forbes & Plunkett, 2018; Yurovsky et al., 2015).

Our study is distinct from previous emotion word acquisition research in two notable ways. First, we used a child-friendly task that has relatively low task demands. Although word-production tasks such as free labeling have been argued to be a fair measure of spontaneous recognition of emotion from emotional expressions (Widen & Russell, 2003), free-labeling tasks require children to find the relevant category of words, search for the most likely one, and then produce it. These task demands can easily overwhelm young children. We also avoided word-comprehension tasks that require children to learn and follow complex experimental instructions (e.g., sorting tasks; Widen & Russell, 2008) or process multiple emotion categories simultaneously (e.g., choosing from an array of emotional expressions; Bullock & Russell, 1984). Instead, following Bullock & Russell (1985), we used a two-alternative forced-choice (2AFC) task, in which children are asked to point to one of two emotion stimuli (scared vs. angry) upon hearing an emotion word (e.g., “Who is angry?”)1. This is a natural task for young children because parents often ask them pedagogical questions (Yu et al., 2017) and children use pointing to communicate with their parents before they can even speak (Tomasello et al., 2007). Different from Bullock & Russell (1985) however, instead of looking at children’s overall accuracy across both cross-valence and within-valence comparisons, we focus specifically on within-valence comparisons: We included three negative emotion words—“angry,” “sad,” and “scared”—and examined if children performed above chance within this valence domain. We also included one positive emotion word “happy” to replicate the general pattern that children perform better on cross-valence (e.g., happy vs. scared) than within-valence (e.g., sad vs. scared) comparisons (see Widen, 2013 for review).

Second, in contrast to some prior work, we incorporated not only facial configurations but also body language in our stimuli to determine if additional cues could enhance children’s performance. As emotions manifest not only on people’s faces but also in their voices and body language, it is possible that young children have acquired emotion words but linked them to cues other than facial configurations. Indeed, research on adults suggests that, especially for negative emotions like sadness, anger, and fear, body language provides stronger cues to emotional states than facial configurations do. For example, when adults observed incongruent facial configurations and body postures, their emotion judgments were biased toward the emotion expressed by the body (Aviezer et al., 2008; Meeren et al., 2005; Van den Stock et al., 2007). Research on body language with young children, however, has yielded mixed findings. Between ages 3 and 8, children show an increasing ability to map emotion words to body postures (in the absence of facial cues; Nelson & Russell, 2011a; Witkower et al., 2021). When both facial cues and body posture are available but convey conflicting emotion information, children aged 6 years and older, like adults, were less accurate at labeling the facial configurations (Mondloch, 2012; Mondloch et al., 2013; Nelson & Mondloch, 2017). However, compared to isolated facial and body cues, combined (congruent) cues did not significantly improve 3- to 5-year-olds’ emotion attribution (Nelson & Russell, 2011a, 2011b), suggesting that at least preschool-aged children might not be able to integrate multiple emotion cues. Given the limited number of studies, however, more work is needed to further understand children’s emotion cue integration, especially in the preschool ages.

In sum, the current study used a child-friendly task in two experiments to test emotion word comprehension in young children (2.0–4.9 years). We used child facial configurations in Experiment 1 and added body postures to those facial configurations in Experiment 2. On each trial, we presented two emotion stimuli side by side on a screen and asked children to point to one of them upon hearing an audio prompt (e.g., “Who is angry?”). We tested children as young as age two to see if we could find relatively early competence. We also included children up to 4.9 years old to provide a comparison with prior work.

As the study was conducted during the COVID-19 pandemic (March–August 2021), all children participated remotely from home with their parents. Recent work suggests that online testing can produce comparable results to in-person testing (Chuey et al., 2021, 2022; Scott et al., 2017). Therefore, we posted our study on Lookit, an online testing platform for developmental research (Scott et al., 2017). Families then participated in our study remotely without the presence of experimenters, known as un-moderated or asynchronous online testing. Additionally, as our study involves emotion perception, which may be affected by mask usage during the pandemic, we included a mask-use survey to determine if individual children’s exposure to and use of masks would correlate with their performance in our task.

2.1. Methods

2.1.1. Participants

Based on a small number of pilot participants (five 2-year-olds and three 3-year-olds), we found a large effect size in each age group (Cohen’s d=1.24 for 2-year-olds and d>10 for 3-year-olds). Both effect sizes are much larger than the commonly used benchmark for a large effect, Cohen’s d=0.80 (Cohen, 1988). We were therefore comfortable assuming a large effect (d of at least 0.8) in our study, under which we would need n=16 per age group to reach 85% power. Accordingly, we recruited 48 English-speaking children between ages 2.0 and 4.9 (MAge = 3.41, range: 2.06–4.85), 16 children per age group (2-year-olds: MAge = 2.44, range = 2.06–2.96; 3-year-olds: MAge = 3.41, range = 3.04–3.80; 4-year-olds: MAge = 4.39, range = 4.07–4.85).2 All children were recruited and tested on an online testing platform for developmental research, Lookit (Scott et al., 2017). Following our preregistered exclusion criteria, six additional children were tested but excluded from analysis due to (1) first language not English (n=1), (2) completing fewer than 1/3 of trials (n=3), (3) parental interference (n=1), or (4) online testing technical issues (n=1).
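The sample-size reasoning above can be illustrated with a small simulation. The sketch below is not the authors' power analysis: it assumes a two-sided one-sample t-test at alpha = .05 (the text does not name the test used), a true standardized effect of d = 0.8, and n = 16. The function name and its parameters are hypothetical.

```python
import math
import random
import statistics


def simulated_power(d=0.8, n=16, t_crit=2.131, n_sims=20000, seed=1):
    """Estimate the power of a two-sided one-sample t-test by simulation.

    t_crit is the two-sided .05 critical value for df = n - 1 = 15
    (an assumption tied to n = 16; change it if n changes).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Standardized scores: true mean d, SD 1; the null mean is 0.
        sample = [rng.gauss(d, 1.0) for _ in range(n)]
        mean = statistics.fmean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)
        if abs(mean / se) > t_crit:
            hits += 1
    return hits / n_sims


power = simulated_power()
print(f"Estimated power for d = 0.8, n = 16: {power:.2f}")
```

Under these assumptions the estimate should land close to the 85% figure reported above; a different test or alpha level would give a different value.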

2.1.2. Stimuli

We selected our face stimuli from the Child Affective Facial Expression (CAFE) dataset (LoBue & Thrasher, 2014, 2015). We followed 2 selection criteria: (1) the face was rated among the highest for displaying a target emotion, and (2) a range of gender and racial/ethnic groups were represented. We ultimately selected all facial expressions from the top ten expressions for each emotion category, with only two exceptions. The exceptions—one in the scared category and the other in the sad category—were included to increase the representation of male Asian faces, which were underrepresented among the top-rated expressions, but were still among the top expressions within the specific gender and racial/ethnic group. The final face set consists of six facial expressions in each emotion category, with an equal number of boys and girls and the inclusion of diverse racial/ethnic groups. We also used six non-emotion pictures (e.g., car, apple, and flower) for training purposes.

A native English speaker recorded audio stimuli. To add some variation to the audio, two sets of audio were recorded: (1)“Who’s [happy/sad/angry/scared]? Can you find the [happy/sad/angry/scared] person?” (2) “Where is the [happy/sad/angry/scared] person? Can you show me who is [happy/sad/angry/scared]?” Sentence frames were randomly selected for each test trial.

2.1.3. Procedure

Children participated in the study remotely with their parents or legal guardians on Lookit. There were 3 training trials followed by 24 test trials.

The goal of the training trials was to familiarize children and their parents with the response procedure. On each trial, two non-emotion pictures (e.g., a car and an apple) were presented side by side on a screen and an audio prompt asked the child to pick one of them: e.g., “Can you find the apple? Point to the apple to show me where it is!” The child’s parent or helper clicked on the picture that the child picked and pressed the next button to continue.

Then the test trials began. On each trial, participants saw two facial configurations presented side by side on a screen (e.g., a happy face and a sad face) and heard an audio prompt asking them to pick one of them (e.g., “Who is happy? Can you find the happy person?”). Like the training trials, the child pointed to one of the two facial configurations and their parent clicked on the one they picked and pressed the next button to continue. If the child did not know the answer, the parent could skip the trial by directly clicking the next button. On average, two-, three-, and four-year-olds skipped 4.0%, 0.3%, 0.0% trials, respectively. The skipped trials were coded as missing value and were omitted from analysis. Coding the skipped trials as incorrect does not change the pattern of results in the study. The selection of facial configurations and audio prompts on each trial was randomized with the constraint that they were sampled evenly across trials.

We made extensive efforts to eliminate parental interference. First, before the experiment, parents watched a training video. They were told that we aimed to understand how their children responded to the prompts, and we asked them to accurately represent their children’s choices. We also asked them to avoid responding to their children’s choices with phrases like “Are you sure?” or “That’s right,” which might reveal what the parents themselves thought about the answer; neutral phrases like “OK” or “alright” were encouraged instead. Second, throughout the study, a text box at the lower part of the screen reminded parents to report their children’s responses accurately. Third, a coder blind to condition screened all videos for signs of parental interference. The coder specifically looked for cases in which (1) the child changed their response after their parent questioned their first choice, or (2) the parent pointed to one side of the screen while repeating the prompt, which might bias children’s own pointing. Based on these criteria, only one parent interfered heavily during the experiment (i.e., interfered on over 1/3 of trials), and all data from that participant were excluded (preregistered exclusion criterion). For the remaining participants, parental interference was rare; the coder judged only 0.5% of trials to show potential parental interference, and these trials were excluded (also a preregistered exclusion criterion).

Finally, a coder blind to condition and parents’ clicks coded children’s pointing responses offline from videotapes. As our main focus was on the youngest age group, the coder coded all videotapes from 2-year-olds and 1/3 of videotapes (randomly selected) from 3- and 4-year-olds. The coder’s coding agreed with parents’ clicks on 98%, 94%, and 99% of trials for 2-, 3-, and 4-year-olds, respectively. For the small proportion of trials on which the coder and parents disagreed, we do not know whether the coder or the parents better represented children’s responses, because the coder could not always see the child’s pointing clearly in the videotapes (which were recorded through participants’ computer or laptop webcams). The high coder-parent agreement, however, suggests that parents’ clicks were generally trustworthy, and that the small number of discrepancies may mostly be due to unclear pointing by the child or noise in offline coding, rather than parents intentionally misrepresenting children’s responses. We also analyzed 2-year-olds’ behavior using both parents’ clicks and the coder’s coding, and the two generated similar patterns of results, confirming again that parents’ clicks were reliable. Because offline coding had more missing data than parents’ clicks (due to technical issues such as videos being cut off), we used parents’ clicks throughout the analysis to optimize reliability as well as statistical power.
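As an illustration of the agreement figures above, the following sketch computes simple percent agreement between parents' clicks and the offline coder's judgments. The data layout ("L"/"R" per trial, None for a skipped or uncodable trial), the function name, and the example values are hypothetical and not taken from the study's data.

```python
def percent_agreement(parent_clicks, coder_codes):
    """Proportion of trials on which parent and coder recorded the same side.

    Trials missing from either source (None) are dropped, mirroring the
    treatment of missing values described in the text.
    """
    pairs = [
        (p, c)
        for p, c in zip(parent_clicks, coder_codes)
        if p is not None and c is not None
    ]
    if not pairs:
        raise ValueError("no trials were coded by both sources")
    return sum(p == c for p, c in pairs) / len(pairs)


# Hypothetical six-trial example: four trials are coded by both sources,
# and the two sources match on three of them.
parent = ["L", "R", "L", "L", None, "R"]
coder = ["L", "R", "R", "L", "L", None]
agreement = percent_agreement(parent, coder)
print(f"Coder-parent agreement: {agreement:.0%}")
```

Percent agreement is used here only because it is the statistic reported in the text; it does not correct for chance agreement.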

At the end of the experiment, parents were asked to fill out a survey about mask use. The survey comprised three questions: “How frequently has your child seen you wear a mask in the past year?”, “How frequently has your child seen their peers wear a mask in the past year?”, “How frequently has your child been wearing a mask in the past year?”. Parents were provided with five options: “Never,” “Rarely,” “Sometimes,” “Often,” and “Always.”

2.2. Results and discussion

Following our preregistered analysis plan, we analyzed children’s choices with a logistic mixed-effects model (using the lme4 package in R; Bates et al., 2015). We included Trial Type (cross-valence vs. within-valence comparison, sum coded), Age (centered), and their interaction as fixed effects. We also preregistered that we would start with a maximal random effect structure such that random intercepts and random slopes of Trial Type, Emotion, and their interaction were all fit by subject, and these random effects would be pruned if the model failed to converge (Barr et al., 2013). The final random effect structure included the random intercept by subject only, so the final full model was: Choice ~ Trial Type * Age + (1 | subject). The intercept was significant (β=2.76, z=10.69, p<.001), indicating that children as a group performed above chance (M=0.86). There was an effect of Age (β=1.13, z=4.72, p<.001) and Trial Type (β=.45, z=3.54, p<.001) but no interaction between the two (β=.17, z=1.37, p=.170), suggesting that children’s performance improved with age and they performed better on cross-valence than within-valence trials (Figure 2).
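To help read the coefficients above, this sketch converts the reported log-odds into probabilities with the inverse-logit function. It does not refit the model. Two points are assumptions rather than statements from the text: that sum coding used +1 for cross-valence and -1 for within-valence trials, and that the coefficient signs are as printed.

```python
import math


def inv_logit(x):
    """Map a log-odds value onto the probability scale."""
    return 1.0 / (1.0 + math.exp(-x))


intercept = 2.76   # reported intercept (log-odds at mean age)
trial_type = 0.45  # reported Trial Type coefficient

# Predicted accuracy for a child at the mean age with a random intercept of 0.
p_typical = inv_logit(intercept)

# ASSUMPTION: +1 = cross-valence, -1 = within-valence under sum coding.
p_cross = inv_logit(intercept + trial_type)
p_within = inv_logit(intercept - trial_type)

print(f"typical child, averaged over trial types: {p_typical:.3f}")
print(f"cross-valence: {p_cross:.3f}, within-valence: {p_within:.3f}")
```

These conditional predictions describe a child with an average random intercept, so they need not equal the raw group mean of .86, which averages over all children and trials.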

Also following our preregistered analysis, we grouped children by age bin and examined if all age groups passed each trial type (cross- or within-valence). We found that all age groups performed significantly above chance on both trial types (two-year-olds: cross-valence, M=.86, t(15)=10.66, p<.001, 95% CI [.79, .93], within-valence, M=.64, t(15)=2.88, p=.011, 95% CI [.54, .74]; three-year-olds: cross-valence, M=.89, t(15)=8.47, p<.001, 95% CI [.79, .99], within-valence, M=.84, t(15)=7.84, p<.001, 95% CI [.75, .93]; four-year-olds: cross-valence, M=.99, t(15)=87.00, p<.001, 95% CI [.98, 1.01], within-valence, M=.96, t(15)=34.32, p<.001, 95% CI [.93, .99]; Figure 2B). These results indicate that all age groups, including 2-year-olds, were able to map emotion words to facial configurations both across and within valence domains.
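The relation between the reported means, t values, and confidence intervals can be checked with a short calculation. The sketch below assumes one-sample t-tests against a chance level of .5 (two response options) with df = 15, and works only from the summary statistics printed above, not from the raw data. The helper name is hypothetical.

```python
def ci_from_t(mean, t_value, chance=0.5, t_crit=2.131):
    """Recover a 95% CI from a mean and its one-sample t statistic.

    Since t = (mean - chance) / SE, the standard error is
    (mean - chance) / t. t_crit is the two-sided .05 critical
    value for df = 15.
    """
    se = (mean - chance) / t_value
    half_width = t_crit * se
    return mean - half_width, mean + half_width


# Two-year-olds, within-valence trials: M = .64, t(15) = 2.88.
lo, hi = ci_from_t(0.64, 2.88)
print(f"95% CI: [{lo:.2f}, {hi:.2f}]")
```

For this cell the reconstructed interval agrees with the reported [.54, .74] to two decimals; small mismatches in other cells would be expected because the inputs are themselves rounded.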

Exploratory analysis examined if children performed equally well on the three trials within each trial type. We first fit data from the three cross-valence trials (happy-sad, happy-angry, and happy-scared) to the following model: Choice ~ Trial * Age + (1 | subject). We found neither differences between the three trials (all ps>.214) nor any interaction with age (all ps>.510). We also fit data from the three within-valence trials (sad-angry, sad-scared, and angry-scared) to the same model. Similarly, we found neither differences between the three trials (all ps>.323) nor any interaction with age (all ps>.390). These results suggest that children’s above-chance performance was not driven by a particular trial. Instead, they were equally good at all trials of the same type.

In our exploratory analysis, we also examined the correlation between mask use during the pandemic and children’s performance. We fit a regression model with independent variables consisting of children’s exposure to parents wearing masks (Parent Mask), exposure to peers wearing masks (Peer Mask), and the frequency of children wearing masks themselves (Own Mask). Age was centered and included as a covariate. The dependent variable was each child’s mean accuracy across trials. The model specification was: Accuracy ~ Parent Mask + Peer Mask + Own Mask + Age. None of the three mask-related variables significantly predicted children’s performance after controlling for age (all ps>.505). This finding indicates that the level of exposure to and use of masks did not have a predictive relationship with children’s performance.

Taken together, while we replicated prior findings that children are better at cross-valence comparisons than within-valence comparisons (see Widen, 2013 for review), we also found a relatively early ability to map emotion words to facial configurations. With a child-friendly word-comprehension task, even 2-year-olds succeeded in making within-valence mappings of words and faces. Such performance was not correlated with the degree of individual children’s exposure to and use of masks. In the next experiment, we aim to replicate these findings and to test whether adding congruent body postures would further improve children’s performance.

3.1. Methods

3.1.1. Participants

As in Experiment 1, we recruited 48 English-speaking children between ages 2.0 and 4.9 (MAge = 3.53, range: 2.04–4.92) on the same online testing platform, Lookit.3 There were 16 children in each age group (2-year-olds: MAge = 2.50, range = 2.04–2.95; 3-year-olds: MAge = 3.52, range = 3.03–3.98; 4-year-olds: MAge = 4.56, range = 4.12–4.92). Following our preregistered exclusion criteria, 11 additional children were tested but excluded from analysis due to (1) first language not English (n=3), (2) having seen a sibling participate in the study (n=1), (3) parental interference (n=4), (4) online technical issue (n=2), or (5) performance outside 3 standard deviations of the mean (n=1; a 4-year-old).

3.1.2. Stimuli

We used the same face stimuli as in Experiment 1. Body stimuli were from online image sources (e.g., Google Images). The final body stimuli were images showing the upper body. There were six images in each emotion category. The six images were further divided into two subcategories: hands above the shoulder and hands below the shoulder. This differentiation ensured both the inclusion of some variation in naturalistic body language and that there were comparable variations across categories. We used Adobe Photoshop to combine the face and body stimuli (Figure 1B). We fine-tuned the skin color such that the colors of the face and arms were matched. We also grayscaled the color of clothes in all body images.

Figure 1.
Stimuli for Experiments 1 & 2.

We validated the combined face-body stimuli with a group of adults (N = 59) recruited from an online testing platform, Prolific. We used the same procedure as in LoBue & Thrasher (2015). Adult participants saw one face-body stimulus at a time and were asked to select whether the child was happy, sad, angry, scared, disgusted, surprised, or neutral. As a comparison, the same group of adults also validated the face-only stimuli. We presented the face-body stimuli in one block and the face-only stimuli in the other block. Order of blocks was randomized. Stimuli within each block were also randomized.

Adults’ judgments on the face-only stimuli mostly replicated LoBue & Thrasher (2015). They performed near ceiling on recognizing happy (M=.96, t(58)=59.39, p<.001), sad (M=.92, t(58)=38.06, p<.001), and angry faces (M=.91, t(58)=42.21, p<.001), but not scared faces (M=.62, t(58)=14.76, p<.001; one-sample t-tests, chance level = .14). Note that although we chose top faces in the scared category, those faces were not as easily recognized as those in other emotion categories, both in the original validation by LoBue & Thrasher (2015) and in our own validation. The emotion most easily confused with scared was surprise. As our study does not include surprise, the task for children might be easier than for adults.

Adults’ judgments on the face-body stimuli suggest that adding body postures significantly improved adults’ ability to recognize scared (M=.72; t(58)=2.97, p=.004) and angry stimuli (M=.95; t(58)=2.46, p=.017, paired-sample t-test). No significant difference was found in the near-ceiling happy category (M=.95; t(58)=.89, p=.376). We also found a slight drop in the sad category (M=.87; t(58)=2.30, p=.025). A closer look at the data suggests that the drop was mainly driven by one of the sad photos (i.e., the second one in the Sad column in Figure 1B); rating dropped from .85 to .59 (t(58)=3.40, p=.001). If we exclude this photo, adults’ performance was near ceiling (M=.93), similar to their performance when presented with the face-only sad stimuli (t(58)=.61, p=.546). Below we report results from all stimuli but excluding the sad stimulus that yielded lower ratings does not change the overall pattern of results.

3.1.3. Procedure

We used the same procedure and coding scheme as in Experiment 1. On average, two-, three-, and four-year-olds skipped 6.1%, 2.5%, and 0.3% trials, respectively. The skipped trials were coded as missing values and were omitted from analysis. Coding the skipped trials as incorrect responses does not change the pattern of results. As in Experiment 1, a coder screened all videotapes for parental interference; data from four participants were replaced due to parental interference on over 1/3 of trials (preregistered exclusion criterion). Additionally, a blind coder coded pointing behaviors of all 2-year-olds and 1/3 of 3- and 4-year-olds (randomly selected) offline from videotapes. The coder-parent agreement was 98%, 96% and 99% for 2-, 3- and 4-year-olds, respectively. We also used both the coder’s judgments and the parents’ clicks to analyze 2-year-olds’ behaviors, and they generated similar patterns of results, further ensuring the reliability of using parents’ clicks for analysis.

3.2. Results and discussion

As preregistered, we used the same mixed-effects model as in Experiment 1. The intercept was significant (β=3.00, z=11.12, p<.001), indicating above-chance performance as a group (M=.88). There was an effect of Age (β=1.24, z=4.92, p<.001) and Trial Type (β=.38, z=2.63, p=.009) but no interaction between the two (β=.06, z=.44, p=.663), suggesting that, in line with Experiment 1, children’s performance improved with age and they did better overall in cross-valence than within-valence trials (Figure 2).

Figure 2.
Proportion correct responses by age for both experiments, both overall (A) and by trial type (B). Points show individual participants’ means and lines show linear fits.

As a follow-up to our preregistered analysis and to compare with Experiment 1, we grouped children by age bin and looked at each trial type separately. Replicating Experiment 1, all age groups performed significantly above chance on both cross-valence and within-valence trials (two-year-olds: cross-valence, M=.84, t(15)=7.04, p<.001, within-valence, M=.69, t(15)=4.43, p<.001; three-year-olds: cross-valence, M=.95, t(15)=19.75, p<.001, within-valence, M=.91, t(15)=14.36, p<.001; four-year-olds: cross-valence, M=.95, t(15)=10.00, p<.001, within-valence, M=.94, t(15)=16.32, p<.001).

Also as in Experiment 1, we ran exploratory analysis examining if children performed equally well on the three trials within each trial type. We fit data from the three cross-valence trials (happy-sad, happy-angry, and happy-scared) to the same model as in Experiment 1: Choice ~ Trial * Age + (1 | subject). We found neither differences between the three trials (all ps>.714) nor any interaction with age (all ps>.725). We also fit data from the three within-valence trials (sad-angry, sad-scared, and angry-scared) using a model with the same specification. Similarly, we found neither significant differences between the three trials (all ps>.073) nor any interaction with age (all ps>.103). These results showed no major variations in children’s performance on the trials of the same trial type, suggesting that children’s above-chance performance was not driven by a particular emotion pairing.

We also examined the correlation between mask use and children’s performance as in Experiment 1. None of the mask-related variables significantly predicted children’s performance after controlling for age (all ps>.467). This finding is consistent with Experiment 1, suggesting that the level of exposure to and use of masks did not predict children’s performance.

In sum, Experiment 2 mirrored the overall pattern observed in Experiment 1. Children were more proficient at cross-valence than at within-valence comparisons. However, even 2-year-olds were capable of associating emotion words with combined facial and body cues within valence domains. This performance was not correlated with the extent of individual children’s exposure to and use of masks.

We next conducted an exploratory analysis comparing children’s performance across experiments to see if children benefited from body cues. We added Experiment (face vs. face + body) to our preregistered model. The full model did not converge and, after pruning (Barr et al., 2013), the final model was: Choice ~ Trial Type * Age + Experiment + (1 | subject). While there was still a robust effect of Age (β=1.19, z=6.83, p<.001) and Trial Type (β=.42, z=4.35, p<.001), there was no effect of Experiment (β=.10, z=.331, p=.741). This result, combined with the comparable levels of performance across experiments, suggests that, consistent with Nelson & Russell (2011a, 2011b), children were as good when presented with facial cues alone as when presented with multiple cues.
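Written out, and assuming a logistic link (the z statistics suggest a binomial mixed model, but the link function is not stated in this section), the pruned model corresponds to the expression below for child i on trial j. The symbols β0 through β4, u_i, and σ_u are notation introduced here for illustration; they are not the authors' own.

```latex
\operatorname{logit} \Pr(\mathrm{Choice}_{ij} = \text{correct})
  = \beta_0
  + \beta_1\,\mathrm{TrialType}_{ij}
  + \beta_2\,\mathrm{Age}_{i}
  + \beta_3\,(\mathrm{TrialType}_{ij} \times \mathrm{Age}_{i})
  + \beta_4\,\mathrm{Experiment}_{i}
  + u_i,
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2)
```

Here u_i is the by-subject random intercept, and the reported null effect of Experiment corresponds to β4.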

Given that body posture improved adults’ recognition of fear and anger (their recognition of the remaining two categories was near ceiling; see Methods: Stimuli), we ran another exploratory analysis focusing on the scared-angry comparison alone, to see if we would find a similar improvement in children. The final model was: Choice ~ Age + Experiment + (1 | subject). As with the overall pattern, children did not perform better on the scared-angry comparison in Experiment 2 than in Experiment 1 (β=.084, z=.26, p=.797). Thus, again, we did not find evidence of improved performance in young children when given additional body cues.

Given that there was no significant difference between experiments, we collapsed data across experiments to increase the statistical power of the analysis by age group. All age groups successfully mapped emotion words to emotion cues both across and within valence domains (two-year-olds: cross-valence, M=.85, t(31)=12.06, p<.001, 95% CI [.79, .91], within-valence, M=.66, t(31)=5.13, p<.001, 95% CI [.60, .73]; three-year-olds: cross-valence, M=.92, t(31)=16.25, p<.001, 95% CI [.87, .97], within-valence, M=.88, t(31)=14.23, p<.001, 95% CI [.82, .93]; four-year-olds: cross-valence, M=.97, t(31)=20.80, p<.001, 95% CI [.93, 1.02], within-valence, M=.95, t(31)=30.16, p<.001, 95% CI [.92, .98]). This confirms our main conclusion from both experiments that even 2-year-olds can make within-valence mappings of emotion words and emotion cues.
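As a rough consistency check, the reported confidence intervals can be recovered from the reported means and t statistics. The sketch below assumes a one-sample t-test against chance (.5) with 31 degrees of freedom; the function name is hypothetical, and the critical value is hard-coded rather than computed.

```python
# Two-sided 95% critical value of Student's t with 31 degrees of freedom.
T_CRIT_DF31 = 2.0395


def ci_from_report(m, t_stat, chance=0.5, t_crit=T_CRIT_DF31):
    """Recover an approximate 95% CI from a reported mean and t statistic."""
    # For a one-sample test, t = (m - chance) / SE, so SE = (m - chance) / t.
    se = (m - chance) / t_stat
    half_width = t_crit * se
    return m - half_width, m + half_width


# Two-year-olds, cross-valence trials: M = .85, t(31) = 12.06.
lower, upper = ci_from_report(0.85, 12.06)
print(round(lower, 2), round(upper, 2))  # prints "0.79 0.91"
```

Because the inputs are rounded to two decimals, recovered bounds for other cells can differ from the reported ones in the last digit. A t-based interval on proportions is also not bounded at 1, which is why an upper bound such as 1.02 can appear.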

In two preregistered online experiments, we looked at 2- to 4-year-olds’ ability to map emotion words to facial and body cues in a child-friendly word-comprehension task. While we found that children performed better on cross-valence than within-valence comparisons (i.e., a general pattern in line with prior work; see Widen, 2013 for review), we also found an early ability to understand emotion words and distinguish them from others of the same valence. Even 2-year-olds were able to differentiate negative emotion words including “angry,” “sad,” and “scared,” and map them to facial configurations (Experiment 1). We replicated this pattern in Experiment 2 and additionally found that adding body language did not further improve children’s performance, suggesting that a single cue is sufficient for young children to make those mappings. Children’s performance in both experiments was not correlated with the degree of their exposure to and use of masks. These results indicate that before children can produce emotion words in an adult-like manner, they have some understanding of common emotion words like “happy,” “angry,” “sad,” and “scared.”

Our result that children performed better on cross-valence than within-valence comparisons is broadly consistent with the theory that children initially organize emotions based on valence (i.e., positive vs. negative; Widen, 2013). Our study adds to this literature, however, in finding a relatively early ability to differentiate same-valence emotions. Across two experiments, our study provides robust evidence that within-valence emotion discrimination emerges as early as the third year. This result is aligned with more recent work demonstrating a fine-grained understanding of emotion early in development. For instance, even one-year-old infants are able to discriminate between a range of same-valence vocal expressions and search for their likely causes in the environment (Wu et al., 2017; see also Ruba et al., 2019; Walle et al., 2017). Our result also has implications for a proposal on emotional development: emotion words play a role in the acquisition of emotion concepts (Hoemann et al., 2019; Ogren & Sandhofer, 2022; Shablack et al., 2020). An earlier ability to comprehend emotion words suggests that the influence of emotion words may also take place much earlier in development, shaping young children’s construction of emotion concepts.

Why did children, especially 2-year-olds, perform better on within-valence mappings in our study than in most prior work (e.g., Bormann-Kischkel et al., 1990; Bullock & Russell, 1984; Widen & Russell, 2003, 2008)? There are several possibilities. First, while much prior work used a word production task (Widen, 2013), we removed the production demand by using a comprehension task. Given the extensive prior literature on the earlier emergence of comprehension than production (e.g., Bergelson & Swingley, 2012; Wagner et al., 2018), there is every reason to suspect that (perhaps partial) word meanings would be measurable earlier in comprehension. Second, rather than asking children to select from or categorize an array of facial configurations (Bullock & Russell, 1984; Widen & Russell, 2008), children in our study were asked to choose from two emotion stimuli at a time. This smaller array might be more manageable than other comprehension tasks given children’s limited attention span and working memory (Bullock & Russell, 1985). Last, children participated in our study remotely from home and were accompanied by their parents. While early findings have demonstrated comparable results between in-person and online testing (Chuey et al., 2021, 2022; Scott et al., 2017), it is plausible that online testing, especially in studies that do not require the presence of an experimenter, could result in better performance for specific age groups and tasks. In particular, 2-year-olds are often challenging to test with explicit measures (e.g., forced-choice tasks) in a laboratory setting. They may feel more at ease completing tasks at home with their parents present than in a laboratory with experimenters, and in turn, show better performance. It is beyond the scope of the current paper to pin down which possibility best explains children’s performance. However, these possibilities may not be mutually exclusive and may work synergistically in reducing task demands and consequently revealing young children’s competencies.

Adding body postures to facial configurations did not further improve 2- to 4-year-olds’ performance. One possibility is that our body-language stimuli did not accurately and specifically represent the target emotions. This explanation is unlikely, however, given that at least adults’ emotion attribution was improved by body cues in the angry and scared categories (and their performance on the happy and sad categories was near ceiling). Another possibility is that children at this age have not learned to map emotion words to body postures. That idea is inconsistent with other work showing a growing ability to recognize body postures over this age range (Nelson & Russell, 2011a). Another explanation is that, in line with Nelson & Russell (2011a, 2011b), preschool-aged children (3–5 years) do not benefit from the presentation of multiple emotion cues over the presentation of a facial configuration alone. For instance, unlike adults and older children, preschool-aged children may take a feature-based approach, rather than a holistic approach, to processing emotion cues, and thus fail to utilize and integrate multiple sources of emotion information. While future work is needed to explain the exact role of body postures, both of our experiments replicated each other, showing consistently that an understanding of same-valence emotion words emerges as early as the third year.

We did not find that the extent of children’s exposure to and use of masks in their daily lives predicted their performance in our task. Considering the children’s proficiency in matching emotion words to facial configurations in our study, our findings suggest that the experience of observing mask-wearing faces may not impact children’s ability to learn about emotions generally. It is possible that children can still perceive emotion even when a mask covers part of the face (Ruba & Pollak, 2020). Alternatively, it is possible that emotion perception is influenced by masks (Gil & Le Bigot, 2023; Gori et al., 2021; Kastendieck et al., 2021), but the presence of unmasked faces in children’s environment (e.g., when they are at home with unmasked parents) provides sufficient input for children to learn about emotions and emotion words. Although our study cannot differentiate between the two possibilities, both are consistent with the idea that children can effectively learn despite variability in input.

Our study poses important questions for future work. First, it raises the possibility that emotion word comprehension may be present even before age two. Using infant-friendly tasks such as eye tracking, prior work has found an understanding of common nouns in infants as young as 6 to 9 months old (Bergelson & Swingley, 2012) and an understanding of color words in toddlers between 18 and 33 months (Wagner et al., 2018), well before these children can pass other, more demanding tasks. Could emotion words be similar? Future work could test this possibility with infants using eye tracking paradigms. Second, open questions remain regarding the generalizability of our findings. More work is needed to examine if our findings are generalizable to children in other populations as well as to a larger number of emotion words. Last, it is important for future work to use more naturalistic stimuli that better resemble emotion cues in the real world.

The current study moves us one step closer to understanding children’s early emotion word knowledge. While it takes years for children to reach an adult level in producing emotion words, a systematic comprehension of those words can be measured during the third year, despite the fact that emotion cues are highly variable in the environment. Such powerful learning capacities may lay an important foundation for the later-developing, sophisticated abilities to communicate, understand, and regulate emotions.

Contributed to conception and design: YW, HMM, CMB, MCF. Contributed to acquisition of data: YW, HMM. Contributed to analysis and interpretation of data: YW, MCF. Drafted and/or revised the article: YW, MCF. Approved the submitted version for publication: YW, HM, CMB, MCF.

The authors received no financial support for this project.

There are no competing interests declared by authors.

All the stimuli, data, and analysis scripts can be found at


Note that while the 2AFC task is a commonly used, child-friendly task in developmental work (e.g., Tillman & Barner, 2015; Yurovsky et al., 2016; Saffran et al., 1997), the task (and other choice-from-array tasks) has evoked fierce debates in affective science on whether it provides strong evidence for the universal account of emotion recognition (Russell, 1994; Hoemann et al., 2019). Because only a limited set of preselected facial configurations is provided on each trial, the task has been criticized for potentially inflating effect sizes, resulting in the appearance of universality across cultures. The main focus of our study, however, is not on testing the universal account of emotion. Instead, we are interested in the earliest stage of emotion word acquisition. Consistent with the broader word learning literature (e.g., Bergelson & Swingley, 2012; Fernald et al., 1998), we take as a premise that children need to have at least some knowledge of emotion in order to show above-chance performance on the 2AFC task.


Our preregistration is available at


Preregistration available at

Aviezer, H., Hassin, R. R., Ryan, J., Grady, C., Susskind, J., Anderson, A., Moscovitch, M., & Bentin, S. (2008). Angry, Disgusted, or Afraid? Psychological Science, 19(7), 724–732.
Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255–278.
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1–48.
Bergelson, E., & Swingley, D. (2012). At 6–9 months, human infants know the meanings of many common nouns. Proceedings of the National Academy of Sciences, 109(9), 3253–3258.
Bergelson, E., & Swingley, D. (2013). The acquisition of abstract words by young infants. Cognition, 127(3), 391–397.
Bormann-Kischkel, C., Hildebrand-Pascher, S., & Stegbauer, G. (1990). The development of emotional concepts: A replication with a German sample. International Journal of Behavioral Development, 13(3), 355–372.
Bullock, M., & Russell, J. A. (1984). Preschool Children’s Interpretation of Facial Expressions of Emotion. International Journal of Behavioral Development, 7(2), 193–214.
Bullock, M., & Russell, J. A. (1985). Further Evidence on Preschoolers’ Interpretation of Facial Expressions. International Journal of Behavioral Development, 8(1), 15–38.
Chuey, A., Asaba, M., Bridgers, S., Carrillo, B., Dietz, G., Garcia, T., Leonard, J. A., Liu, S., Merrick, M., Radwan, S., Stegall, J., Velez, N., Woo, B., Wu, Y., Zhou, X. J., Frank, M. C., & Gweon, H. (2021). Moderated online data-collection for developmental research: Methods and replications. Frontiers in Psychology, 12, 4968.
Chuey, A., Boyce, V., Cao, A., & Frank, M. C. (2022). Conducting developmental research online vs. in-person: A meta-analysis.
Fernald, A., Pinto, J. P., Swingley, D., Weinberg, A., & McRoberts, G. W. (1998). Rapid gains in speed of verbal processing by infants in the 2nd year. Psychological Science, 9(3), 228–231.
Forbes, S. H., & Plunkett, K. (2018). Linguistic and cultural variation in early color word learning. Child Development, 91(1), 28–42.
Frank, M. C., Braginsky, M., Yurovsky, D., & Marchman, V. A. (2016). Wordbank: An open repository for developmental vocabulary data. Journal of Child Language, 44(3), 677–694.
Gates, G. S. (1923). An experimental study of the growth of social perception. Journal of Educational Psychology, 14(8), 449–461.
Gil, S., & Le Bigot, L. (2023). Emotional face recognition when a colored mask is worn: A cross-sectional study. Scientific Reports, 13(1), 174.
Gori, M., Schiatti, L., & Amadeo, M. B. (2021). Masking emotions: Face masks impair how we read emotions. Frontiers in Psychology, 12, 1541.
Hamlin, J. K. (2013). Moral judgment and action in preverbal infants and toddlers: Evidence for an innate moral core. Current Directions in Psychological Science, 22(3), 186–193.
Hoemann, K., Xu, F., & Barrett, L. F. (2019). Emotion words, emotion concepts, and emotional development in children: A constructionist hypothesis. Developmental Psychology, 55(9), 1830–1849.
Izard, C. E. (1971). The face of emotion. New York: Appleton-Century Crofts.
Kastendieck, T., Zillmer, S., & Hess, U. (2021). (Un)mask yourself! Effects of face masks on facial mimicry and emotion perception during the COVID-19 pandemic. Cognition and Emotion, 36(1), 59–69.
Liu, S., Ullman, T. D., Tenenbaum, J. B., & Spelke, E. S. (2017). Ten-month-old infants infer the value of goals from the costs of actions. Science, 358(6366), 1038–1041.
LoBue, V., & Thrasher, C. (2014). The Child Affective Facial Expression (CAFE) set [Data set]. Databrary.
LoBue, V., & Thrasher, C. (2015). The child affective facial expression (CAFE) set: Validity and reliability from untrained adults. Frontiers in Psychology, 5, 1532.
Maassarani, R., Gosselin, P., Montembeault, P., & Gagnon, M. (2014). French-speaking children’s freely produced labels for facial expressions. Frontiers in Psychology, 5, 555.
Meeren, H. K. M., Van Heijnsbergen, C. C. R. J., & De Gelder, B. (2005). Rapid perceptual integration of facial expression and emotional body language. Proceedings of the National Academy of Sciences, 102(45), 16518–16523.
Mondloch, C. J. (2012). Sad or fearful? The influence of body posture on adults’ and children’s perception of facial displays of emotion. Journal of Experimental Child Psychology, 111(2), 180–196.
Mondloch, C. J., Horner, M., & Mian, J. (2013). Wide eyes and drooping arms: Adult-like congruency effects emerge early in the development of sensitivity to emotional faces and body postures. Journal of Experimental Child Psychology, 114(2), 203–216.
Nelson, N. L., & Mondloch, C. J. (2017). Adults’ and children’s perception of facial expressions is influenced by body postures even for dynamic stimuli. Visual Cognition, 25(4–6), 563–574.
Nelson, N. L., & Russell, J. A. (2011a). Preschoolers’ use of dynamic facial, bodily, and vocal cues to emotion. Journal of Experimental Child Psychology, 110(1), 52–61.
Nelson, N. L., & Russell, J. A. (2011b). Putting motion in emotion: Do dynamic presentations increase preschooler’s recognition of emotion? Cognitive Development, 26(3), 248–259.
Nook, E. C. (2021). Emotion differentiation and youth mental health: Current understanding and open questions. Frontiers in Psychology, 12, 700298.
Ogren, M., & Sandhofer, C. M. (2022). Emotion words link faces to emotional scenarios in early childhood. Emotion, 22(1), 167–178.
Pons, F., Harris, P. L., & de Rosnay, M. (2004). Emotion comprehension between 3 and 11 years: Developmental periods and hierarchical organization. European Journal of Developmental Psychology, 1(2), 127–152.
Price, G. F., Ogren, M., & Sandhofer, C. M. (2022). Sorting out emotions: How labels influence emotion categorization. Developmental Psychology, 58(9), 1665–1675.
Ridgeway, D., Waters, E., & Kuczaj, S. A. (1985). Acquisition of emotion-descriptive language: Receptive and productive vocabulary norms for ages 18 months to 6 years. Developmental Psychology, 21(5), 901–908.
Ruba, A. L., Meltzoff, A. N., & Repacholi, B. M. (2019). How do you feel? Preverbal infants match negative emotions to events. Developmental Psychology, 55(6), 1138–1149.
Ruba, A. L., & Pollak, S. D. (2020). Children’s emotion inferences from masked faces: Implications for social interactions during COVID-19. PLoS ONE, 15(12), e0243708.
Russell, J. A., & Widen, S. C. (2002). A Label Superiority Effect in Children’s Categorization of Facial Expressions. Social Development, 11(1), 30–52.
Saffran, J. R., Johnson, E. K., Aslin, R. N., & Newport, E. L. (1999). Statistical learning of tone sequences by human infants and adults. Cognition, 70(1), 27–52.
Saffran, J. R., Newport, E. L., Aslin, R. N., Tunick, R. A., & Barrueco, S. (1997). Incidental language learning: Listening (and learning) out of the corner of your ear. Psychological Science, 8(2), 101–105.
Schmidt, M. F. H., & Sommerville, J. A. (2011). Fairness expectations and altruistic sharing in 15-month-old human infants. PLoS ONE, 6(10), e23223.
Scott, K., Chu, J., & Schulz, L. (2017). Lookit (part 2): Assessing the viability of online developmental research, results from three case studies. Open Mind, 1(1), 15–29.
Shablack, H., Stein, A. G., & Lindquist, K. A. (2020). Comment: A role of language in infant emotion concept acquisition. Emotion Review, 12(4), 251–253.
Tillman, K. A., & Barner, D. (2015). Learning the language of time: Children’s acquisition of duration words. Cognitive Psychology, 78, 57–77.
Tomasello, M., Carpenter, M., & Liszkowski, U. (2007). A new look at infant pointing. Child Development, 78(3), 705–722.
Van den Stock, J., Righart, R., & De Gelder, B. (2007). Body expressions influence recognition of emotions in the face and voice. Emotion, 7(3), 487–494.
Wagner, K., Dobkins, K., & Barner, D. (2013). Slow mapping: Color word learning as a gradual inductive process. Cognition, 127(3), 307–317.
Wagner, K., Jergens, J., & Barner, D. (2018). Partial color word comprehension precedes production. Language Learning and Development, 14(4), 241–261.
Walle, E. A., Reschke, P. J., Camras, L. A., & Campos, J. J. (2017). Infant differential behavioral responding to discrete emotions. Emotion, 17(7), 1078–1091.
Widen, S. C. (2013). Children’s interpretation of facial expressions: The long path from valence-based to specific discrete categories. Emotion Review, 5(1), 72–77.
Widen, S. C., & Russell, J. A. (2003). A closer look at preschoolers’ freely produced labels for facial expressions. Developmental Psychology, 39(1), 114–128.
Widen, S. C., & Russell, J. A. (2008). Children acquire emotion categories gradually. Cognitive Development, 23(2), 291–312.
Witkower, Z., Tracy, J. L., Pun, A., & Baron, A. S. (2021). Can children recognize bodily expressions of emotion? Journal of Nonverbal Behavior, 45(4), 505–518.
Wu, Y., Muentener, P., & Schulz, L. E. (2017). One- to four-year-olds connect diverse positive emotional vocalizations to their probable causes. Proceedings of the National Academy of Sciences, 114(45), 11896–11901.
Wynn, K. (1992). Children’s acquisition of the number words and the counting system. Cognitive Psychology, 24(2), 220–251.
Yu, Y., Bonawitz, E., & Shafto, P. (2017). Pedagogical questions in parent–child conversations. Child Development, 90(1), 147–161.
Yurovsky, D., Case, S., & Frank, M. C. (2016). Preschoolers flexibly adapt to linguistic input in a noisy channel. Psychological Science, 28(1), 132–140.
Yurovsky, D., Wagner, K., Barner, D., & Frank, M. C. (2015). Signatures of domain-general categorization mechanisms in color word learning. Proceedings of the 36th Annual Conference of the Cognitive Science Society, 2775–2780.
This is an open access article distributed under the terms of the Creative Commons Attribution License (4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
