Infant language development includes a complex social dynamic between adults and infants. Infant vocalization is a well-studied area of development; however, adult perception of infant vocalization is less well understood. The effectiveness of identifications made by adults may impact the social feedback loops that drive development. We collected data from a final sample of 460 undergraduate students who listened to brief (100-500 ms) audio clips of infant vocalizations. Participants were asked to identify the infants in the audio clips as male/female, English/non-English, and by approximate age. Participants were unable to determine the sex of the infant better than chance but performed better than chance for language and age, albeit with low accuracy. Exploratory follow-up analyses did not reveal an effect of caregiving experience, childcare experience, or participant gender on participants’ ability to correctly identify the infant’s age, sex, or language. These findings suggest that adult caregivers, regardless of experience, are able to perceive elements of infant vocalizations that may influence responsiveness to infant vocal development. However, performance is far from perfect.
Most research in infant language development focuses on the infant perspective. However, infants’ communicative experiences do not happen in a vacuum but rather in interactions with adult caregivers. Examining this process from the adult perspective may provide important insights into how caregivers’ perceptions influence these dynamic interactions. The ability of an adult to identify infant characteristics such as age, language, and sex based on their vocalizations could inform our understanding of the extent to which caregivers correctly perceive characteristics of the developing infant, and thereby whether infant characteristics may influence caregivers’ communicative behaviours. Variables such as the adult listener’s gender and infant caregiving experience may also predict their ability to communicate with infants and discriminate between infant characteristics.
Signal Detection in Infant and Caregiver Vocalizations
Although there is less research on caregiver perceptions of early speech-related vocalizations, there is a broad literature on the form and function of infant vocalizations more generally, including early cries and laughter, as well as on caregiver vocalizations. This work touches on questions of the universality of human vocalizations (both infant and adult), and the extent to which a particular communicative signal is detectable. For example, research has found that nonverbal vocal communicative phenomena such as laughter are quite consistent across cultures (Bryant et al., 2018) and that infants as young as 5 months old can evaluate the acoustic features of laughter (Vouloumanos & Bryant, 2019). In addition, Filippi and colleagues (2017) found that levels of arousal in vocalizations can be inferred across language groups and even across species; this biologically rooted detectability of arousal may mean that caregivers can intuitively detect arousal in infant vocalizations. Similarly, Soltis’ (2004) review of infant crying notes that there is evidence for a human universal of infant crying as a signal to caregivers, and for a graded relationship between level of need and acoustic features of crying, particularly during the experience of pain (e.g. Porter et al., 1986). However, while evidence for specific cries associated with particular meanings has been found (e.g. Wiesenfeld et al., 1981), the evidence has been mixed, with some studies reporting no such ability (e.g. Müller et al., 1974). From a converse perspective, more recent research suggests that newborn cries may already partly reflect properties of the ambient language (Mampe et al., 2009).
The issues of universality and signal detection have also been examined from the perspective of caregiver vocalizations, and some researchers have suggested there may be an evolutionary relationship between infant vocalizations and caregiver infant-directed speech (IDS, e.g. Sachs, 1977). The question of the universality of characteristics of IDS has long been of interest (e.g. Bryant & Barrett, 2007; Farran et al., 2016; Fernald, 1989; Kitamura et al., 2001; Soderstrom, 2007). Acoustic features such as median f0 and f0 variability are found to be the most predictive of infant-directedness across cultures, as measured by naive listeners (Moser et al., 2020). Other studies have pointed to evidence that IDS increases the salience of communicative functions (Fernald, 1989), and have found a relationship between IDS and the communication of emotions (e.g. Soderstrom et al., 2021; Trainor et al., 2000).
Importance of Caregiver Responding
The interactions between caregivers and infants have been shown to be integral to the development of prelinguistic vocalizations. The dynamic communication between infant and caregiver is highlighted in work by Goldstein and Schwade (2008), who found that infants restructured their vocalizations to fit the same phonological form as their caregivers when mothers responded contingently to babbling, but not when mothers’ responses were non-contingent. These results revealed that some aspects of vocal development benefit from contingent social responding from the infant’s caregiver. In later work, Elmlinger et al. (2019) found that caregivers’ contingent speech was less lexically diverse and contained shorter utterances than their non-contingent speech. The findings of Elmlinger et al. illustrate how infants influence the ambient language in their learning environments by spurring more easily learnable language from caregivers during social interactions.
Similarly, Warlaumont et al. (2014) examined the role of social feedback on the language development of children with and without autism by analyzing naturalistic vocal interactions between infants and their caregivers. They found that adults respond more readily to speech-related infant vocalizations than to non-speech-related vocalizations. Subsequent infant vocalizations, in turn, were influenced by whether the adult had responded to a previous vocalization. Thus, increased adult responsiveness can produce more infant speech-related vocalizations, and this increase in speech-related vocalizations consequently increases adult responsiveness, forming what Warlaumont and colleagues refer to as a “social feedback loop”. Such caregiver-infant interactions have long-term consequences: a wide literature supports the importance of caregiver responsiveness during the first years of life for early cognitive and language development as well as for later school readiness and academic performance (Tamis-LeMonda et al., 2001; Zauche et al., 2016).
As part of this social feedback loop, a caregiver’s ability to determine the language and maturity of the infant with whom they are interacting might govern their ability to respond appropriately to that child. However, research examining adults’ abilities to determine the characteristics of infants based on their vocalizations is sparse.
Caregivers’ Perception of Infant Vocal Maturity/Age
For a social feedback loop to drive infants toward vocal maturity, caregivers must be able to perceive differences in infant vocalizations. There are relatively few studies directly examining caregivers’ ability to determine the maturity or age of infants based on their vocalizations. One early study (Olney & Scholnick, 1976) found that adult listeners presented with infant vocalizations could determine the relative age of an infant with 88% accuracy. Oller et al. (2001) similarly found that parents could accurately recognize canonical babbling without training: 90% or more of untrained parents were intuitively aware of whether their infants had entered the canonical stage of vocal development, and this awareness was present in families of all socioeconomic, educational, and ethnic backgrounds. Similarly, Ramsdell-Hudock et al. (2019) found that when untrained listeners classified infant vocalizations, their responses largely overlapped with the published categorical descriptors that researchers and clinicians use. These studies support the idea that adult caregivers are able to determine the maturity of infant vocalizations.
More recently, Cychosz and colleagues (2021) created a “meta”-corpus of infant vocalizations to examine infant vocalizations cross-linguistically and cross-culturally. They asked citizen-scientist annotators to categorize short audio clips of infant vocalizations as (1) canonical, (2) non-canonical, (3) cry, (4) laugh, or (5) junk. The citizen scientists showed agreement on the categorizations of the infant vocalizations across the various sub-corpora, suggesting that adults are able to determine the maturity of infant vocalizations. This corpus forms the stimuli for the current study. In the current study, we frame the question somewhat differently. Rather than asking about vocal maturity directly, we examine naive, untrained listeners’ ability to guess the approximate age of the infant producing the vocalization, as a proxy measure for vocal maturity. This avoids the need to pre-train listeners on linguistic constructs such as canonical vs. non-canonical babbling. Of course, caregivers will presumably already know the age of their own child; however, implicit perception of vocal development may still influence how they respond to their infant’s vocalizations, as described in the literature above.
Language Discrimination of Infant Vocalization by Adults
A second way of probing adults’ perception of infant vocalizations is to examine whether they are sensitive to the extent to which a vocalization matches the language-specific characteristics of the ambient native language. Since babbling converges toward the native language during development (de Boysson-Bardies et al., 1989), an adult’s ability to discriminate the language in which an infant babbles provides another window into their ability to detect the infant’s vocal maturity. One early study (Atkinson et al., 1968) used 15-second babbling samples from 5- to 17-month-old American, Russian, and Chinese infants to test whether adult listeners could correctly identify the babbling of infants raised in different language communities as English or non-English. They also examined whether participants could judge if infants at a given age were from the same or different language communities. While the participants’ decisions were not entirely random, the results indicated that adults could not identify English vs. non-English, or same vs. different language communities. Olney and Scholnick (1976) similarly found that adults were unable to identify the language an infant was learning, in a comparison of two infants, one English-learning and one Chinese-learning. de Boysson-Bardies et al. (1984) used a more robust variety of languages and infant ages and found evidence in favour of language discrimination: thirty-two samples of babbling were obtained from Arabic, Chinese, and French infants at 6, 8, and 10 months of age. However, correct identification was not robust and appeared to be related to larger-scale prosodic characteristics of the speech rather than phonological characteristics such as canonical syllable form.
Adult Perception of Infant Sex in Vocalization
Caregivers may interact differently with infants depending on their perception of the infant’s sex (Mesman & Groeneveld, 2018). It has been suggested that these differences in interaction across infant sex are due to gender socialization, but they may also be due to perceived differences in the infants’ vocalizations. For example, Sung et al. (2013) found that mothers showed differences in the duration and frequency of their responses to male and female infants, an important factor in language development. The reason mothers exhibit different behaviour with male and female infants could be the vocalizations these infants produce, but findings to date are mixed on whether sex influences infant vocalization, particularly in a manner perceptible to caregivers. One recent study (Oller et al., 2020) found that male infants babble more than females; however, Cychosz et al. (2021) did not find evidence for an effect of child sex on canonical babbling ratio in their examination of the corpus we use here. While to our knowledge there have been no studies of the accuracy of caregiver perception of sex via infant vocalization, there is at least some reason to believe that sex-related differences could exist, as estrogen has been positively correlated with verbal development while testosterone has been negatively correlated (e.g. Hollier et al., 2013; Quast et al., 2016). However, such differences, if they exist, would likely be small.
Factors Affecting Adult Discrimination Ability
Adults’ discrimination of infant vocalization may vary according to factors like experience with caregiving. One early study (Gladding, 1979) found that trained adult listeners were significantly better at identifying infant cry signals than untrained adult listeners. In another study (Fernald, 1989), parents and college students with no experience with infants were tested on their ability to judge the communicative intent of caregiver vocalizations to infants (and to adults). While focused on adult, rather than infant, vocalizations, the results indicated no effects of participant sex or experience with children on adults’ judgments of the communicative intent of the speaker.
In a more recent study, Lindová et al. (2015) found that adults with children were significantly better than adults without children at identifying positive and negative preverbal infant vocalizations. Additionally, the researchers found that younger adults were more accurate than older adults at discriminating between the six infant vocalizations. However, listeners’ gender was not a significant factor in discriminating between the infant vocalizations. Using functional magnetic resonance imaging (fMRI), Parsons et al. (2017) found that mothers, who had reared a child (i.e., had more experience with an infant), showed different neural processing of infant cues than non-mothers, suggesting that greater infant experience leads to greater and differential neural reactivity. Similarly, Bouchet and colleagues (2020) found that experience (both prior to testing and within the experiment) led to better recognition of an infant from their cries, but neither sex nor parenthood status had an effect on performance.
Aim and Hypotheses
There is currently sparse research on adults’ ability to discriminate characteristics of infant vocalizations, and factors that might affect this ability. This study tested three preregistered, confirmatory hypotheses and two exploratory hypotheses.
1. Participants will be able to identify the infant’s sex significantly above chance (50%).
2. Participants will be able to identify whether the infant is acquiring English or another language significantly above chance (50%).
3. Participants will be able to identify the infant’s age range significantly above chance (33%).
4. (Exploratory) Participants with caregiving/childcare experience will be able to identify the infant’s sex, language, and/or age significantly better than other participants.
5. (Exploratory) Participants who identify as female will be able to identify the infant’s sex, language, and/or age significantly better than other participants.
The experiment was initially pilot tested internally for debugging purposes. Further, prior to collecting our experimental sample, data were collected from six participants to confirm that there were no technical issues and that our system to automatically grant participants credit was working correctly. These data were not used for any analysis.
Participants were recruited from the SONA subject pool at the University of Manitoba in Winnipeg, Canada. SONA is an online system that first-year undergraduate psychology students use in order to participate in research studies. Students received one credit toward their introductory psychology class upon completion of the study. A total of 626 students participated in this study. Data from 166 participants were excluded; see Analysis for details. Of the 460 remaining participants, 293 were female and 167 were male. The average length of reported time as a primary caregiver (within the last five years) was 1.15 months (SD = 6.49), the average length of reported time as a childcare worker or in any area with significant exposure to infants (within the last five years) was 5.04 months (SD = 12.34), and the average age of the participants was 19.82 years (SD = 3.78). The vast majority of participants reported no caregiving experience (the modal response was none). Further details can be found at https://osf.io/n9emr/. All of the data, including those from pilot testing, are accessible on our GitHub repository: https://github.com/melsod/OCSWinter2020.
We selected audio clips (100-500 ms) of infant vocalizations from the BabbleCor corpus (https://osf.io/rz4tx/; Cychosz et al., 2021). BabbleCor is a public cross-linguistic corpus of infant vocalizations in five different languages. We signed and submitted the Data Sharing Agreement (https://osf.io/m2vbd/) in order to access a private metadata file with information for each clip regarding the infant’s sex, age (in months), and native language, as well as a prior classification of each audio clip (canonical, non-canonical, laughing, crying, and junk) from BabbleCor’s citizen science classification project, based on a “majority vote” of 3 or more raters.
In selecting the clips used for our experiment, we excluded the Warlaumont corpus, in which infants’ native languages were English and/or Spanish, because the corpus did not report which language each infant was exposed to. We included all the other corpora but limited our sample to clips classified as canonical and non-canonical. Our objective was to study the development of language, so we excluded the non-verbal clips (laugh, cry, junk) and clips containing ambient noise.
We classified the infants in the corpus into three age categories (0-7 months, 8-18 months, 19-36 months), two language categories (English, non-English), and two sex categories (male, female). Age categories, rather than a continuous measure, were chosen to simplify the task for participants and for analysis. Given the corpus we were working with, we selected ages that roughly correspond to developmental periods relevant to language development, i.e., the early preverbal period, the emergence of canonical babbling and transition to first words, and the emergence of multiword utterances. Similarly, the dichotomous English/non-English distinction was selected because our participant sample was primarily English-speaking. Due to constraints imposed by the corpus, we could not obtain a single sample that would provide sufficient control for all three main hypotheses. That is, we were not able to subset the data in a way that ensured equal numbers of English and non-English infants while also controlling for equal numbers of male and female infants and equal numbers of infants in each age group. Therefore, we selected two separate samples: one for the sex hypothesis and one for the age and language hypotheses. We had six categories of infants for the language and age hypotheses: English or non-English crossed with the three age range groups (e.g., English and 0 to 7 months old, non-English and 19 to 36 months old, etc.). We also had two categories of infants for the sex hypothesis sample (non-English infants aged 8 to 18 months who were either male or female).
For the age and language audio samples, we randomly selected two infants from each category and 20 clips from each infant. In total, we had a first-pass selection of 240 clips for the age and language sample. We used the same process for sex, but with only two categories, selecting five female and five male infants. We controlled the sex sample by selecting non-English-learning infants who were 8 to 18 months old. In total, we had a first-pass selection of 200 clips for the sex sample. The code to select clips uses RStudio 1.2.5019 and R 3.6.1 (2019-07-05; RStudio Team, 2020) and can be found on our OSF page.
For the audio selection procedure, three researchers listened to the selected audio clips and judged whether any needed to be excluded. The researchers excluded audio clips that contained sounds other than infant vocalization, such as crying, laughing, parents’ voices, overlapping sounds, and other non-babbling sounds. Although some infants had reached the stage of producing words, the short duration of the clips (~500 ms) made identification based on recognition of lexical items unlikely. All researchers who judged the audio clips were blind to infant age, sex, and language, and to whether two clips belonged to the same infant. One researcher’s decisions were omitted because their criteria were judged too strict and would have led to the exclusion of about half of the database; we have included these judgments in our documentation. After excluding the recommended clips, we were left with fewer than ten clips for some of the selected infants. Another researcher then randomly selected and screened new clips for each of these infants until there were ten usable clips per infant (one clip added to the age and language sample, three clips added to the sex sample). Then, for infants with more than ten usable clips, we randomly chose ten audio clips from each infant’s remaining clips. The result was a final sample of 10 audio clips per infant.
Data collection took place in March 2020. Although we had planned to collect data in person in our lab, the entire process was moved online due to the COVID-19 pandemic. Participants connected to our study through the SONA system online portal. They were directed to the experiment’s URL: https://melsod.github.io/OCSWinter2020/. Participants were instructed to read through our informed consent form, which briefly explained the purposes of the experiment and reminded them of their rights. Then, written instructions asked the participants to sit in front of their computer while wearing headphones to answer the experimental questions. Participants were informed that the experiment should take no more than 30 minutes to complete. Participants then answered the questions using a mouse and keyboard. The mean time taken to complete the experiment was 12.69 minutes (1% of observations trimmed to remove extreme outliers). After consenting to participate in the study, participants were presented with instructions on how to complete the study. Within the instructions, there was an attention check in which participants had to type in a random word provided in the instructions. Then, participants completed a sound check in which they heard two words and were asked to press the keys corresponding to the first letter and the last letter of the two words, respectively. Participants who needed more than 5 attempts on either the attention check or the sound check were excluded from the data analysis.
Next, participants were presented with 10 brief (100-500 ms) audio clips of vocalizations from an individual infant. After listening to the series of 10 audio clips, participants responded to one of the following questions: What is the infant’s sex? (Male/Female); What language is spoken in the infant’s home? (English/Non-English); or How many months old is the infant? (0-7 months/8-18 months/19-36 months). Participants were required to listen to all of the audio clips at least once (but were not limited in the number of times they could listen) before responding to the question asked. Participants were asked to select a choice according to their best judgement, and their answers were collected and anonymized by the computer. After a response, participants were presented with another 10 audio clips from a different infant. Experimental questions were asked in a block format (e.g., all the sex trials were presented, followed by all of the language trials, followed by all of the age trials), and the order of the three experimental blocks (six unique order conditions) was randomized between participants.
After participants completed the three blocks, they were provided with nine general questions about themselves and their experience with infants (see Appendix).
Data were excluded from our analysis under the following conditions: (1) incomplete data; (2) obviously and unquestionably inaccurate/inattentive data (e.g., the participant selected the same response for all trials in a block); (3) participants who needed more than 5 attempts to respond appropriately to the attention check and/or sound check; (4) participants who selected the “Other” gender category, as we did not expect to have, nor did we obtain, enough participants for any meaningful analysis of that sub-sample (these participants remain available in the raw dataset); (5) participants who listed their country of residence as other than Canada/USA and/or who did not identify English as their first language; and (6) participants who had significant exposure to and/or conversational experience with any of these five languages: Spanish, Tsimane’, Yeli-Dnye, Tseltal Mayan, and Quechua.
We elected to use a relatively homogeneous sample (e.g., excluding participants who did not identify English as their first language) in order to clarify interpretation of our findings. Experience with English versus other languages could influence participants’ ability to discriminate English from other languages, and might also affect their assessment of infant age based on vocalizations of English-learning vs. other-language-learning infants. While one might not expect primary language exposure to affect the ability to discriminate infant sex, it is possible that subtle language-specific acoustic features could influence participants’ judgements in unanticipated ways. Moreover, this allowed for greater comparability of our findings across the three discrimination features (age, language, and sex).
Based on the above criteria, we implemented the following exclusions: (a) 87 participants reported significant exposure to or conversational experience with non-English languages in the corpus (Spanish, Tsimane’, Yêlí-Dnye, Tseltal Mayan, or Quechua); (b) five participants identified their gender as “Other”; (c) 20 participants did not identify English as their first language; and (d) three participants reported residing in a country other than Canada or the USA. Additionally, we used three exclusion criteria based on participant responses: (a) seven participants were excluded for giving the same response to every question in an entire experimental block (inattentive responding); (b) 14 participants failed to respond correctly to the attention check; and (c) 47 participants failed to respond correctly to the audio check. A response was required for every trial, and data were not saved until the end of the experiment, so there were no missing data.
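The exclusion criteria above amount to a conjunction of simple filters applied to each participant record. The sketch below is an illustrative Python version (our processing was done in R); the record structure and field names are invented for this sketch and do not reflect the study’s actual data format.

```python
# Languages whose exposure triggers exclusion (criterion 6)
EXCLUDED_LANGS = {"Spanish", "Tsimane'", "Yeli-Dnye", "Tseltal Mayan", "Quechua"}

def keep_participant(p):
    """Return True if a (hypothetical) participant record passes every
    exclusion criterion described in the text."""
    if p["gender"] == "Other":                          # gender sub-sample too small
        return False
    if p["country"] not in {"Canada", "USA"}:           # residence outside Canada/USA
        return False
    if p["first_language"] != "English":                # English not first language
        return False
    if EXCLUDED_LANGS & set(p["language_exposure"]):    # exposure to corpus languages
        return False
    if p["attention_tries"] > 5 or p["sound_tries"] > 5:  # failed checks
        return False
    if p["same_response_all_block"]:                    # inattentive responding
        return False
    return True

included = keep_participant({
    "gender": "Female", "country": "Canada", "first_language": "English",
    "language_exposure": ["French"], "attention_tries": 1, "sound_tries": 2,
    "same_response_all_block": False,
})
```

Ordering the filters does not matter for membership in the final sample, but applying them in a fixed order makes the exclusion counts reported above reproducible category by category.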
We measured the proportion of correct responses for each participant for each of the three questions about the infants’ sex, age, and language. Participants’ responses were coded in a binary fashion as correct or incorrect. We used separate one-sample proportion t-tests to analyze each of the sex, age, and language research questions individually. An alpha of .01 was used in all tests in lieu of a formal correction for multiple tests.
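For illustration, the statistic for a one-sample t-test on per-participant proportion-correct scores, together with a one-sample Cohen’s d, can be computed as below. Our analyses were run in R; this stdlib-only Python sketch uses made-up accuracy values and omits the p-value, which requires the t distribution (e.g., R’s pt or scipy.stats.t).

```python
import math
from statistics import mean, stdev

def one_sample_t(accuracies, chance):
    """t statistic, degrees of freedom, and Cohen's d for testing whether
    mean per-participant accuracy differs from a chance level."""
    n = len(accuracies)
    m = mean(accuracies)
    s = stdev(accuracies)                  # sample SD (n - 1 denominator)
    t = (m - chance) / (s / math.sqrt(n))  # one-sample t statistic
    d = (m - chance) / s                   # one-sample Cohen's d
    return t, n - 1, d

# Made-up proportion-correct scores for eight hypothetical participants
acc = [0.4, 0.5, 0.6, 0.5, 0.45, 0.55, 0.5, 0.35]
t, df, d = one_sample_t(acc, chance=0.5)
```

Note that d here divides the mean difference by the between-participant standard deviation, which is why a modest absolute deviation from chance can still yield a large standardized effect when participants are consistent.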
We also conducted exploratory analyses examining whether the gender, recent childcare experience, and/or caregiving experience of the participants had an effect on the proportion of correct responses for any analyses that showed significant above-chance responding.
All analyses were conducted with R and RStudio (R Core Team, 2020; RStudio Team, 2020). All data and analysis code can be found on our GitHub (https://github.com/melsod/OCSWinter2020) as well as the OSF project page (https://osf.io/2a6b4/) and our preregistrations (https://osf.io/2a6b4/registrations). Note that significant technical difficulties on the part of OSF prevented the formal registration from being processed in a timely fashion; however, we proceeded according to the preregistration as submitted in March 2020 and documented any discrepancies and additions.
As specified in the pre-registered analytic plan, a two-tailed, one-sample proportion t-test was used to determine whether participants were able to identify the infant’s sex significantly above chance (50%). Participants’ accuracy differed significantly from chance, t(459) = -2.93, p = .004, 95% CI [0.46, 0.49] (alpha = 0.01). However, the difference was in the direction opposite to our hypothesis, and the effect size was very small (d = -0.14): participants were slightly worse than chance at predicting the sex of the infants from the audio clips. See Figure 1 for a histogram of participants’ accuracy in determining the infant’s sex.
As specified in the pre-registered analytic plan, a two-tailed, one-sample proportion t-test was used to determine whether participants were able to identify, above chance (50%), whether the infant was acquiring English or a different language. Participants performed significantly better than chance, t(459) = 4.94, p < .001, 95% CI [0.52, 0.55] (alpha = 0.01), although this difference corresponds to a small effect size (d = 0.23). See Figure 2 for a histogram of participants’ accuracy in identifying the infant’s language.
As specified in the pre-registered analytic plan, a two-tailed, one-sample proportion t-test was used to determine whether participants were able to identify the infant’s age range (from three options) above chance (33%). Participants performed significantly better than chance, t(459) = 17.34, p < .001, 95% CI [0.44, 0.46] (alpha = 0.01), and this difference corresponds to a large effect size (d = 0.84). However, while the statistical effect is large, note that accuracy remained below 50%. See Figure 3 for a histogram of participants’ accuracy in determining the infant’s age.
In order to more fully examine the differences found in our first three hypotheses, we conducted planned but exploratory follow-up analyses for those hypotheses that were supported by the data. To this end, we conducted hierarchical within-subjects binary logistic regressions for both the age- and language-related data. Although not included in our pre-registration plan, at the request of a reviewer we also conducted a follow-up analysis on the infant sex-related data, even though the difference found was in the direction opposite to our hypothesis. These analyses involved sociodemographic data collected from participants during the experiment. Additional exploratory analyses can be found here: https://osf.io/zx4md/.
To examine whether participant gender, recent childcare experience, or caregiving experience had an effect on identifying infant age, a hierarchical regression analysis was conducted. We first created a baseline model with only an intercept as a random effect. Recent child interaction experience (recent childcare and caregiving experience, entered separately) was then added as the stage one model.
In the random intercept (baseline) model, the intercept was a significant predictor, β = -0.197, z = -7.20, p < .001. The Akaike information criterion (AIC) for the baseline model was 7602.9. The stage one model (the child interaction model) contained both recent childcare and caregiving experience as predictors. There was no significant effect (alpha = 0.01) of childcare experience, β = 0.005, z = 2.11, p = .035, nor of recent caregiving experience, β = -0.007, z = -1.57, p = .118. The AIC for the child interaction model was 7602.1. These results indicate that neither the childcare nor the caregiving experience of participants significantly affected their identification of infant age.
We then compared the baseline model with the child interaction model. The result, X2(2) = 4.84, p = .089, suggests that recent childcare and caregiving experience together did not significantly improve model fit (alpha = 0.01). However, the AIC value dropped slightly from 7602.9 to 7602.1, suggesting some minor improvement over the baseline model.
To examine more closely whether childcare experience had a significant effect, we fit an additional (unplanned) model that contained only recent childcare experience as a predictor. This model showed no effect of childcare experience, β = 0.003, z = 1.54, p = .123. We then compared this new model to the baseline model; the result, X2(1) = 2.37, p = .123, suggests that childcare experience did not significantly improve the predictive potential of the baseline model. Taken together, these results indicated that neither recent childcare nor caregiving experience had a significant effect on identifying infant age.
Next, we added gender as another predictor to the child interaction model, creating the gender model (stage two model). Results for this model showed no effect of childcare (β = 0.005, z = 2.17, p = .030), caregiving (β = -0.007, z = -1.60, p = .110), or gender (β = 0.031, z = 0.54, p = .587). The AIC of the gender model was 7603.8.
Finally, we compared the gender model with the child interaction model. The comparison was not significant, X2(1) = 0.29, p = .587, and the AIC increased from 7602.1 to 7603.8. These results indicated that adding gender did not significantly improve the predictive potential of the model. Therefore, we conclude that participant gender did not have a significant effect on identifying infant age.
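The nested-model comparisons used here (and in the analyses that follow) rest on standard likelihood-ratio logic: twice the difference in log-likelihoods between nested models is referred to a chi-square distribution with degrees of freedom equal to the number of added parameters, while AIC penalizes each added parameter by 2. A minimal sketch; the log-likelihood values and parameter counts below are illustrative assumptions chosen so that the likelihood-ratio statistic matches the baseline versus child interaction comparison for the age data, not fitted values from the study:

```python
from scipy import stats

def lr_test(loglik_reduced, loglik_full, df_added):
    """Likelihood-ratio test for two nested models."""
    chi2 = 2.0 * (loglik_full - loglik_reduced)
    p = stats.chi2.sf(chi2, df_added)
    return chi2, p

def aic(loglik, n_params):
    """Akaike information criterion: lower values indicate better fit."""
    return 2 * n_params - 2 * loglik

# Illustrative (not fitted) log-likelihoods: a baseline model vs. a model
# adding two experience predictors; the difference of 2.42 reproduces the
# X2(2) = 4.84 comparison reported for the age data.
ll_baseline, ll_interaction = -3800.0, -3797.58
chi2, p = lr_test(ll_baseline, ll_interaction, df_added=2)
print(f"X2(2) = {chi2:.2f}, p = {p:.3f}")  # X2(2) = 4.84, p = 0.089

# With hypothetical parameter counts (1 vs. 3), AIC changes by under 1,
# mirroring the small AIC differences reported in the text.
print(f"AIC: {aic(ll_baseline, 1):.1f} -> {aic(ll_interaction, 3):.1f}")
```

This makes explicit why a model can "win" the likelihood-ratio comparison numerically yet still fail the alpha = 0.01 criterion while the AIC moves only fractionally.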
To examine whether the participant’s gender, recent childcare experience, or caregiving experience had an effect on identifying infant language, a hierarchical regression analysis was conducted. We first created a baseline model with only an intercept as a random effect. Recent child interaction experience (recent childcare and caregiving experience, entered separately) was then added to form the stage one model.
The random intercept model (the baseline model) was a significant predictor of language identification, β = 0.129, z = 4.79, p < .001. The AIC value for the baseline model was 7633.4. The stage one model (the child interaction model) contained both recent childcare and caregiving experience as predictors. We found no significant effect (alpha = 0.01) of childcare experience, β = 0.005, z = 1.98, p = .048. There was also no significant effect (alpha = 0.01) of recent caregiving experience, β = -0.004, z = -0.96, p = .336. The AIC value for the child interaction model was 7633.4. These results indicated that neither the childcare nor the caregiving experience of participants significantly affected identification of the infants’ language.
We then compared the baseline model with the child interaction model. The result, X2(2) = 3.93, p = .140, suggests that recent childcare and caregiving experience together did not significantly improve model fit (alpha = 0.01). Indeed, the AIC value remained at 7633.4, indicating that adding childcare and caregiving experience as predictors produced no improvement over the baseline model. These results indicate that neither recent childcare nor caregiving experience had a significant effect on identifying infant language.
Next, we added gender as another predictor to the child interaction model, creating the gender model (stage two model). Results for this model showed no effect of childcare (β = 0.005, z = 1.81, p = .071), caregiving (β = -0.004, z = -0.91, p = .365), or gender (β = -0.048, z = -0.84, p = .404). The AIC of the gender model was 7634.7.
Finally, we compared the gender model with the child interaction model. The comparison was not significant, X2(1) = 0.70, p = .404, and the AIC increased from 7633.4 to 7634.7. These results indicated that adding gender did not significantly improve the predictive potential of the model. Therefore, we conclude that participant gender did not have a significant effect on identifying infant language.
At the request of the reviewers, we deviated from our pre-registered plan to examine whether the participant’s gender, recent childcare experience, or caregiving experience had an effect on identifying infant sex (even though participants did not identify sex above chance). To do this, we conducted a hierarchical regression analysis. We first created a baseline model with only an intercept as a random effect. Recent child interaction experience (recent childcare and caregiving experience, entered separately) was then added to form the stage one model.
The random intercept model (the baseline model) was a significant predictor of sex identification, β = -0.086, z = -2.92, p = .004. The AIC value for the baseline model was 6372.4. The stage one model (the child interaction model) contained both recent childcare and caregiving experience as predictors. We found no significant effect (alpha = 0.01) of childcare experience, β = 0.001, z = 0.50, p = .620. There was also no significant effect (alpha = 0.01) of recent caregiving experience, β = -0.001, z = -0.18, p = .859. The AIC value for the child interaction model was 6376.2. These results indicated that neither the childcare nor the caregiving experience of participants significantly affected identification of infants’ sex.
We then compared the baseline model with the child interaction model. The result, X2(2) = 0.25, p = .883, suggests that recent childcare and caregiving experience together did not significantly improve model fit (alpha = 0.01). Indeed, the AIC value increased from 6372.4 to 6376.2, indicating that adding childcare and caregiving experience as predictors produced no improvement over the baseline model. These results indicate that neither recent childcare nor caregiving experience had a significant effect on identifying infant sex.
Next, we added gender as another predictor to the child interaction model, creating the gender model (stage two model). Results for this model showed no effect of childcare (β = 0.001, z = 0.33, p = .739), caregiving (β = -0.001, z = -0.12, p = .906), or gender (β = -0.057, z = -0.91, p = .361). The AIC of the gender model was 6377.3.
Finally, we compared the gender model with the child interaction model. The comparison was not significant, X2(1) = 0.84, p = .361, and the AIC values of the two models were nearly identical (6376.2 vs. 6377.3). These results indicate that adding gender did not significantly improve the predictive potential of the model. Therefore, we conclude that participant gender did not have a significant effect on identifying infant sex.
The current study examined adult participants’ ability to identify infant age, sex, and the language used in the infant’s environment from characteristics of the infants’ vocalizations. Contrary to our initial hypothesis, we found that participants were slightly worse than chance at identifying an infant’s sex. Participants could identify whether the language the infant was learning was English or non-English better than chance, although with an extremely small effect size. Participants were also able to identify the infant’s age range better than chance, with a large effect size, though performance was still well below perfect. Our exploratory analyses indicated that neither childcare nor caregiving experience significantly affected participants’ ability to identify the infants’ age, language, or sex; nor did participants’ gender have an effect on identifying the infants’ age, language, or sex.
Results from our study are consistent with previous findings that adults are able to identify the relative age and primary language of an infant (de Boysson-Bardies et al., 1984; Oller et al., 2001; Olney & Scholnick, 1976; Ramsdell-Hudock et al., 2019), although performance was far from perfect, particularly for the language task. While an adult’s ability to identify infants’ relative age and their ability to determine the maturity of infants’ vocalizations are not identical constructs, this approach allowed us to test participants without any training on linguistic constructs, and is meaningful in assessing the participants’ perception of the infant’s development based on their vocalizations. Previous research shows that infants’ vocalizations mature in response to sophisticated adult vocalizations (Goldstein & Schwade, 2008; Warlaumont et al., 2014). Our findings suggest that adults can use characteristics of infant vocalization to gauge infant vocal development, which facilitates appropriate and contingent responding to enhance language development in infants. However, given the low (though statistically significant) performance, it can be questioned how effective this might be for caregivers in influencing appropriate responding. Indeed, even in the case where we found a “large” effect size (infant age), the participants’ judgments were quite far from perfect. This highlights the challenge of relating these kinds of experimental findings to the real world. Our findings can speak to whether there is information in the signal, but not the extent to which this affects caregiver behaviour “in the wild”. It is also worth noting that this performance was based on a small number of brief, isolated audio clips. Caregivers in real-world conditions will have longer utterances and more contextual cues to aid their perception. 
Our findings are consistent with the variable findings in the literature regarding the ability to discriminate different kinds of infant cry (Soltis, 2004).
As prior research has not analyzed listeners’ ability to identify infants’ sex from vocalizations, our study makes a novel contribution to the existing literature on language development. Contrary to our proposed hypothesis, our findings suggest that participants were not able to identify infants’ sex better than chance. Surprisingly, performance was significantly below chance, suggesting that participants were more likely to identify male vocalizations as female and vice versa. While the effect size was very small, and smaller than that of the language discrimination task, this result complicates the interpretation of our positive findings. Unfortunately, we do not have a satisfactory explanation and suspect that this may be a Type I error. An alternative interpretation is that participants were able to discriminate infant sex but identified it inversely. We cannot rule out this possibility, but if so, the effect is quite weak. A post hoc acoustic analysis (see the project page for more detail) to ascertain whether the clips could be differentiated based on acoustic features did not identify any systematic differences by infant sex; however, this was also the case for language and infant age, so these null findings do not help us interpret the data. A more comprehensive acoustic analysis conducted on a larger dataset, similar to that of Moser et al. (2020) for infant-directed caregiver vocalizations and Mampe et al. (2009) for newborn cries, may shed additional light on this issue.
Also contrary to some prior research, we did not find that adult participants with childcare/caregiving experience performed better at identifying infants’ age, sex, or language than participants without such experience. Our results support Lindová et al. (2015) and Bouchet et al. (2020), who found that participants’ gender does not affect their ability to accurately identify infant characteristics. Our findings suggest that all potential caregivers, regardless of gender or caregiving experience, may have innate perceptual capabilities for interpreting infant vocal cues that support appropriate responding from the perspective of language development. However, as these were exploratory analyses, they must be interpreted with caution: it is likely, particularly given the small effect in the language task, that our data were underpowered to detect such an effect. Furthermore, our convenience sample of undergraduate students is not well situated to address this possibility. Future research may consider examining these questions more explicitly, for example by deliberately sampling from parent and non-parent populations. Additionally, it is possible that these effects might emerge in more indirect tests of discrimination; researchers may consider alternative methods of identifying these characteristics that do not rely on explicit judgement.
Due to the COVID-19 pandemic, we deviated from our original plan of in-person data collection. However, we have no reason to believe that this adversely affected our results, as online data collection has been validated across a large number of different psychological tasks (Crump et al., 2013). We also note a relatively high exclusion rate: 166 participants had to be excluded from our sample. However, these exclusions were motivated and implemented without bias and without knowledge of their impact on the results, so there is no reason to believe they biased the sample. That said, given that our sample consisted only of Introductory Psychology students in a university setting, our findings may not generalize to other demographics. Similarly, our findings are limited to the relatively small sample of infants available within the BabbleCor corpus and might not generalize to other languages or infants. In addition, despite the care we took in selecting our clips, it is possible that some auditory characteristic of the clips apart from the infant vocalization acted as either a confound or a source of noise in participants’ discrimination. We look forward to the results of ongoing data collection efforts by some members of the BabbleCor team with new infant vocal clips, which will greatly expand the pool of available stimuli and findings.
This research represents a novel contribution to the field of developmental science, as little research has examined adult judgements of infant vocalizations. This literature is important because it allows us to examine the dynamics of caregiver-infant vocal interactions from a largely neglected perspective: that of the adult. The judgements adults make about infant language and age range matter because language development in infants is a dynamic social process, and development is facilitated as adults modify their speech in response to infant vocalizations (Goldstein & Schwade, 2008; Warlaumont et al., 2014). This study demonstrates that adults can discriminate aspects of infant vocalizations from brief audio clips, albeit without a high degree of accuracy, an ability that may allow them to support infant vocal development.
This study was completed in partial fulfilment of the University of Manitoba course PSYC 4540 T39 (honours undergraduate) and PSYC 7310 T34 (graduate), Open and Collaborative Science: Theory and Practice. This study was registered with Open Collaborative Science (GUID: 2a6b4).
We would like to thank BabbleCor (https://osf.io/rz4tx/) for allowing us to use their data set for this project, and Trev Sie for assistance with acoustic measurement.
Contributed to conception and design: AAB, OC, MC, CSC, BJ, JAJ, SLM, MM, SM, JSa, BCS, JSp, EIP, TT, DT, JZ, MS
Contributed to acquisition of data: MC, BJ, MM, SM, BCS
Contributed to analysis and interpretation of data: MC, BCS, JSp, JZ
Drafted and/or revised the article: AAB, OC, CSC, BJ, JAJ, SLM, MM, SM, JSa, BCS, JSp, EIP, TT, DT, JZ, MS
Approved the submitted version for publication or provided advance permission for the manuscript to be submitted before the final draft was completed: AAB, OC, MC, CSC, BJ, JAJ, SLM, MM, SM, JSa, BCS, JSp, EIP, TT, DT, JZ, MS
No funding was used for this study.
The authors of this article declare that there are no competing interests.
Data Accessibility Material
Appendix: Demographic questions
In the last 5 years, estimate how many months you have worked/volunteered in childcare or an area with significant exposure to infants (answer “0” if none)
In the last 5 years, estimate how many months you have been the primary caregiver for a child under the age of 3 years (answer “0” if none)
What is your age in years?
What is your gender? (Male, Female, Other)
What is your country of residence? (Canada, USA, Other)
Do you have normal hearing? (Yes, No)
Is English your first language? (Yes, No)
Do you have significant exposure to/conversational experience with any of the following languages? Select all that apply: (Spanish, Tsimane’, Yêlí-Dnye, Tseltal Mayan, Quechua)
Do you consider yourself monolingual (only speaking one language)? (Yes, No)