Young children’s self-directed speech during activities has long fascinated developmental psychologists. Why do children talk to themselves, and does it support development? On Vygotsky’s sociocultural account, children use self-directed speech to regulate thought and behavior while performing challenging tasks, and this speech follows a specific developmental trajectory: from overt (i.e., talking at full volume) to partially covert (i.e., whispering or mumbling), before finally becoming fully covert (i.e., inner speech or thought). Many published empirical findings are consistent with this account; however, task-relevant self-directed speech does not always relate to task performance. The reported study aimed to gain insight into why by exploring, in a sample of five- and six-year-old children (n = 86), task-relevant overt and partially covert speech on three tasks, and relations with task performance as well as child and task characteristics. Trial-level multinomial models were used, allowing us to model trial number and task demands that varied by trial. Both overt and partially covert speech were observed on all tasks, with most children using at least one of these forms of speech some of the time. Relations between self-directed speech and performance varied across tasks, showing positive, null, or negative associations. Self-directed speech varied depending on the demands of the task on a given trial. Overt speech, but not partially covert speech, was related to a parent-report measure of child talkativeness as well as social speech during the task. Age was not consistently related to self-directed speech on the tasks. Together, these findings challenge the notion that overt speech is best understood simply as a transition to partially covert and then fully covert speech. Rather, whether speech is overt (vs. partially or fully covert) may be determined by social goals. More research with a wider variety of tasks is needed to better understand the nature of these two forms of self-directed speech.
Young children often spontaneously–and endearingly–talk to themselves while engaged in activities. Speculation about the nature of self-directed speech dates back to Piaget, who interpreted such talk as a manifestation of young children’s cognitive immaturity and egocentricity (Piaget, 2005). Vygotsky (1978, 1934/2012) famously challenged Piaget’s interpretation, suggesting instead that self-directed speech was functional, playing a key role in the emergence of higher cognitive functioning. For Vygotsky, the overt nature of children’s self-directed speech was an important step in a developmental transition of speech becoming increasingly internalized. This idea fit well with his theory of the social origins of thought, in which others support children’s thinking with their words until children learn to do this for themselves, at first overtly and then covertly. For example, a child completing a challenging activity, such as putting together a puzzle or using a tool, might have their attention and actions guided by an older peer or parent using words, such as labels, instructions, and questions. As a result of such experiences, the child learns to guide themselves with their own speech, first using audible (overt) speech, and later using subvocal (partially covert) speech (e.g., whispering or muttering). Moreover, on Vygotsky’s account, the internalization process was posited to be functional; as overt speech was internalized, it became condensed and more efficient and effective as a tool for self-regulation (Alderson-Day & Fernyhough, 2015; Vygotsky, 1934/2012).
These ideas captivated the field and generated decades of empirical work that appears generally consistent with Vygotsky’s ideas (Alderson-Day & Fernyhough, 2015; Winsler et al., 2009). Laboratory-based studies have found children talk to themselves while completing various cognitively demanding tasks, such as puzzles and planning, recalling, matching, and switching (Al-Namlah et al., 2006; Behrend et al., 1989; Fernyhough & Fradley, 2005; Flavell et al., 1966; Karbach & Kray, 2007; Winsler & Naglieri, 2003). Self-directed speech often correlates with age (e.g., Al-Namlah et al., 2006; Behrend et al., 1989; Elliott et al., 2021; Sawyer & Brooks, 2021; Winsler et al., 2005; Winsler & Naglieri, 2003) and children use more task-relevant speech after their performance is scaffolded by others’ speech (Winsler et al., 2005).
Self-directed speech is also often associated with task performance. For example, preschool-aged children are more successful at constructing models when they use planning-related speech (Mulvihill et al., 2021). Spontaneous rehearsal on delayed recall tasks is associated with better recall in five- and six-year-olds (Elliott et al., 2021; Flavell et al., 1966). Interfering with self-directed speech leads to deleterious effects on performance, in children and adults (Emerson & Miyake, 2003; Kray et al., 2008; Lidstone et al., 2010). Self-directed speech on some tasks varies with task difficulty, sometimes showing a curvilinear pattern with the greatest amount of speech occurring during tasks that are intermediate in difficulty (Behrend et al., 1989; Fernyhough & Fradley, 2005; Winsler & Naglieri, 2003).
However, self-directed speech during tasks does not always relate to performance. For example, one large cross-sectional study of 5- to 17-year-olds found that 5-year-olds were more accurate on a planning task when they used overt speech, yet this relation was not found for other age groups who continued to use such speech (Winsler & Naglieri, 2003). Similarly, in another study children who were taught labels for novel shapes were later better able to track the shapes in a sustained attention task than children who were not taught labels (Doebel et al., 2018); however, while children who were taught labels also tended to spontaneously verbalize them during the task, this verbalization did not relate to performance. Another study found some indication of a relation between self-directed speech and performance on a planning task, but findings were inconsistent, with overt speech and partially covert speech correlating with one of two measures of performance (and not the same ones), and doing so concurrently but not longitudinally (Fernyhough & Fradley, 2005).
Why does self-directed speech sometimes relate to performance and sometimes not? One proposed explanation is that after a certain age, self-directed speech may not be useful on some tasks yet children persist in using it (Winsler & Naglieri, 2003). In one previous study, between 10 and 33% of children aged 11 to 17 years spontaneously used overt or partially covert speech during a trail-making task without being prompted to do so (Winsler & Naglieri, 2003). It is not clear, however, why speech would altogether cease to be useful.
Another possibility is that overt speech is partly determined by children’s sociality and talkativeness. That is, children may use task-relevant speech out loud, not because it is a necessary transition in using speech to guide themselves but because they also have a social agenda (Bjorklund, 2009). In this case, some children could be using speech covertly and some overtly and thus the relation between overt speech and performance would be weak or non-existent despite speech being useful. Consistent with these ideas, Vygotsky (1962) found that children used less self-directed speech when they believed they were alone. More recent research found that children use less self-directed speech when engaged in digital as opposed to concrete tasks, which could be because they are more immersed in the task and less socially inclined (Bochicchio et al., 2022). Teacher ratings of talkativeness are associated with overt task-relevant speech but not partially covert self-directed speech (Fernyhough & Fradley, 2005). Similarly, overt task-relevant speech correlates with social speech (Al-Namlah et al., 2006). Overt task-relevant speech also correlates with positive affect and negatively correlates with neutral affect and parent perceptions of children’s social skills (Winsler & Naglieri, 2003). Children sometimes use more overt task-relevant speech when others are present versus when they are alone, whereas the same is not true for partially covert speech (McGonigle-Chalmers et al., 2014). Children also use more overt (but not partially covert) speech when their activities are framed by adults as play versus as work (Sawyer & Brooks, 2021).
In the field’s enthusiasm for Vygotsky’s ideas, there have been few attempts to generate data that could falsify or prompt adjustments to his theory. For example, if indeed the overtness of self-directed speech is partly determined by children’s social goals, this would suggest that the notion of a general overt-to-partially covert developmental trajectory may be worth interrogating further. That is, it may be the case that children are capable of internalizing self-directed speech to support self-regulation quite early in development, particularly on certain executive function tasks, without first using speech overtly. This is consistent with findings that three- to five-year-old children show better executive function when provided with goal-relevant verbal labels (e.g., Doebel et al., 2018; Doebel & Zelazo, 2013, 2015). For example, on the Dimensional Change Card Sort, children are instructed to sort cards (e.g., blue boats, red trucks) by one dimension (e.g., color) for several trials before being instructed to switch to a new dimension (e.g., shape). Young children struggle to shift to the new sorting dimension and typically “perseverate” on the old one; however, children are more likely to switch when the to-be-sorted card is first labeled by the relevant dimension (e.g., “Here’s a boat”) versus if it is labeled by both dimensions (e.g., “Here’s a red boat”) or by neither dimension (“Here’s one”) (Doebel & Zelazo, 2013, 2015). Prior theory and empirical findings indicate that the engagement of executive function on switching tasks is likely verbally mediated (e.g., Doebel & Zelazo, 2013; Emerson & Miyake, 2003; Karbach & Kray, 2007; Kray et al., 2008, 2009; Miyake et al., 2004; Zelazo, 2004), yet younger children rarely spontaneously vocalize labels on such tasks.
It is also possible that standard analytic approaches are not always sensitive enough to detect relations between speech and performance. Many analyses average or sum across task trials to create estimates of self-directed speech. In so doing, information may be lost regarding trial-level variation in task features and demands that could influence whether or not speech is used and benefits performance. For example, on the Tower of London, a planning task used to measure self-directed speech, success on some (but not all) trials requires making counterintuitive moves or resisting moves that are available but that will increase the total number of moves required to achieve a solution (Bull et al., 2004). Similarly, on the Selective Attention task, children match pictures by different dimensions (e.g., shape, number, color), which vary in salience (Chan & Mazzocco, 2017). Previous findings also suggest trial number may predict self-directed speech, with children using less speech as the task goes on (Berk & Spuhl, 1995).
Taken together, it is clear from previous work that children use task-relevant self-directed speech during cognitively challenging tasks. However, such speech is not always related to performance. The current exploratory study aimed to provide insights into when and why self-directed speech correlates with performance by examining self-directed speech and performance on three tasks that have been used previously to elicit and study self-directed speech: a planning task (Fernyhough & Fradley, 2005), a visual dimensional matching task (Winsler et al., 1997) and a delayed recall task (Flavell et al., 1966). We selected these tasks in part because they vary greatly in their performance demands (e.g., matching by dimension, recalling information in order, planning), allowing us to examine speech in different task contexts. We modeled self-directed speech at the trial level, predicting the odds of a particular form of speech being used on a given trial, allowing us to account for trial number and specific task trial characteristics that could influence performance and speech use.
The study examined overt speech and partially covert speech separately, testing if these forms differentially relate to performance. If overt speech is partially determined by social goals, it may be less related to performance. We also explored whether the two forms of speech are differentially predicted by social speech variables (talkativeness with strangers and social talk during the task). While this is not the first study to explore children’s self-directed speech on multiple tasks, by exploring overt and partially covert speech separately and modeling our data at the trial level, we hoped to learn more about how the manifestation of self-directed speech during a task relates to performance, task-trial demands, and child sociality.
Method
Participants
Eighty-six 5- to 7-year-old children (M = 5.99 years, SD = .61, range = 5.0 – 7.1; females = 47) were recruited to participate in an in-person (lab-based) study in 2015 and were drawn from a database of U.S.-based families who had previously indicated interest in participating in research. Four additional children were excluded from the study due to uncooperativeness (n = 3) and developmental delay (n = 1) that prohibited participation. The target sample size was 85 participants, which would provide 80% power to detect an r of .3 at an alpha level of .05.
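The target sample size can be checked with a short calculation. The sketch below uses the Fisher z approximation for a two-sided test of a correlation; this is an assumption, since the text does not state which power procedure was used, and the function name is illustrative only.

```python
import math
from statistics import NormalDist


def required_n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate sample size to detect correlation r with a two-sided test.

    Fisher z approximation (an assumed method, not necessarily the authors'):
    n = ((z_{1 - alpha/2} + z_{power}) / atanh(r))^2 + 3
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    effect = math.atanh(r)                         # Fisher z-transformed correlation
    return math.ceil(((z_alpha + z_power) / effect) ** 2 + 3)


print(required_n_for_correlation(0.3))  # 85 under this approximation
```

Under this approximation the result agrees with the stated target of 85 participants.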
Some children did not complete some tasks due to failure to satisfactorily complete the practice portion of a given task or uncooperativeness. In total, 84, 83, and 72 children completed the Delayed Recall task, Selective Attention task, and Tower of London task, respectively. Due to the geographical region in which this study was conducted, most participants (> 90%) had at least one parent with a four-year college degree and were White and non-Hispanic/Latino.
Design and Measures
The reported study used a cross-sectional correlational design. Children completed the three focal tasks across two test sessions. These tasks were administered as part of a larger study on self-directed speech and proactive control. The larger study included several measures that are not reported here. The tasks were completed across two 45-minute sessions that were a week or less apart. Tasks were administered in a fixed order, in order to minimize variation between subjects in task performance due to differences in order, as is recommended in individual differences studies (Friedman et al., 2008). At the first session, children completed the Delayed Recall and Selective Attention tasks, and at the second session they completed the Tower of London.
Delayed Recall Task. This computerized task was an adaptation based in part on a task used by Flavell, Beach, and Chinsky (1966). On each trial, children were shown images of common objects presented serially on a computer screen (e.g., a ball, a house, and a shoe), and after a brief delay were asked to recall the order in which they were presented. At test, the three items were presented side-by-side but in a new order, and children had to point to the pictures in the order that they saw them. The delay between the presentation of the last image and the response screen was eight seconds. Following three practice trials, children completed 10 test trials. The primary performance measure was accuracy in recalling the original order on a specific trial.
Selective Attention Task. This task was adapted from Manfra and Winsler (2006). In this task, children were presented with an easel that displayed three small pictures, side-by-side, that matched one another on one of three dimensions: shape, color, or number (Figure 2). For example, as illustrated in Figure 2, they might have been shown three pictures that varied in shape and number but were all the same color (purple). Children needed to identify the dimension or dimensional value on which the pictures matched (e.g., same color). To provide their response, children were given access to an open, translucent box containing 18 picture cards depicting a single dimensional value (e.g., a purple splotch). Following a demonstration trial and two practice trials, children completed 12 test trials (four of each dimension). The primary performance measure was children’s accuracy at identifying the correct matching dimension on a given trial. We also coded the dimension that children had to match by on a given trial, expecting that performance might vary depending on the dimension involved in line with previous work (Chan & Mazzocco, 2017).
Tower of London Task. This task was adapted from Fernyhough and Fradley (2005). Children were presented with two apparatuses, each of which had three wooden pegs of different lengths and three colored wooden spheres on the pegs (Figure 3). The spheres were configured in a different arrangement on each apparatus, and children were instructed to make one apparatus look like the other in as few moves as possible. They were also instructed that they could only move one sphere at a time and had to keep all spheres on the pegs (i.e., not holding a sphere in their hand while making moves with another sphere). Children completed six trials in total, the first half of which could be completed in three moves, and the second half of which could be completed in four moves. Performance was indexed in two ways: by whether or not children solved the puzzle, and the total number of moves they made beyond what the task trial required. Despite clear instructions, children often did not adhere to the rules and either used two hands or tried to stack two balls on the short peg. When children did not follow a rule, the experimenter restarted the trial and we include only the trial where there was no rule break. When children did not follow a rule but it was not observed by the experimenter, we excluded the trial. When a child became frustrated on a trial and refused to proceed after one or two moves, the trial was restarted, to avoid letting the child become too discouraged to continue with the task and study.
We also coded three trial-level variables for this task: the minimal number of moves to a solution (three vs. four), the number of counterintuitive moves, and the presence of a prepotent opening move. Counterintuitive moves were defined as “sub-goal” moves that would require moving a sphere to a peg that was not its final destination (Bull et al., 2004). We defined prepotent moves as those that were suggested by the starting state and/or goal state. For example, a prepotent move might be to move a sphere to an open peg rather than to a more strategic peg that already had a sphere on it. Another prepotent move might be to move a sphere directly to its final destination before ensuring other spheres were moved away from that location first (Bull et al., 2004). Two raters coded these features for each of the six trials with 100% agreement.
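The "moves beyond what the task trial required" index depends on knowing the minimum number of moves for each configuration. The sketch below finds that minimum with a breadth-first search. The peg capacities of 3, 2, and 1 spheres are an assumption based on the standard apparatus (the text only says the pegs differ in length), and the example configuration is hypothetical rather than one of the six study trials.

```python
from collections import deque

# Assumed capacities for the tall, medium, and short pegs.
PEG_CAPACITY = (3, 2, 1)


def legal_moves(state):
    """Yield every state reachable by moving one top sphere to another peg."""
    for src, src_peg in enumerate(state):
        if not src_peg:
            continue  # nothing to pick up from an empty peg
        for dst, dst_peg in enumerate(state):
            if dst == src or len(dst_peg) >= PEG_CAPACITY[dst]:
                continue  # same peg, or destination peg is full
            pegs = [list(p) for p in state]
            pegs[dst].append(pegs[src].pop())
            yield tuple(tuple(p) for p in pegs)


def minimum_moves(start, goal):
    """Breadth-first search for the fewest single-sphere moves from start to goal."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        state, depth = queue.popleft()
        if state == goal:
            return depth
        for nxt in legal_moves(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return None  # goal not reachable from start


def excess_moves(moves_made, start, goal):
    """Moves made beyond the minimum required for the trial."""
    return moves_made - minimum_moves(start, goal)


# Hypothetical trial: each peg is listed bottom-to-top; letters are sphere colors.
start = (("R",), ("G",), ("B",))
goal = (("R", "B", "G"), (), ())
print(minimum_moves(start, goal), excess_moves(5, start, goal))
```

A child who solved this hypothetical configuration in five moves would have an excess of three, because two moves suffice.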
Talkativeness. Parents were asked to rate their child’s level of talkativeness with people he or she does not know, on a five-point scale, with ‘1’ indicating that the child is not at all talkative, and ‘5’ indicating that the child is very talkative. This approach was adapted from prior work in which teachers were asked to rate children’s general talkativeness (Fernyhough & Fradley, 2005). By asking specifically about talkativeness with unfamiliar people we expected to reduce the likelihood that parents’ evaluations would reflect how talkative their child was generally at home.
Self-directed Speech Coding. Similar to previous approaches (e.g., Berk, 1986; Winsler et al., 2005), we first transcribed all of children’s speech during the tasks, noting for each task the trial during which the speech occurred. From the transcripts, we coded, for each task, the occurrence of task-relevant overt speech and partially covert self-directed speech on each trial. Task-relevant overt speech was defined as normal volume speech (using a regular speaking voice) that was not directed at another person. This definition excluded meta-comments about the task or stimuli (e.g., “This is hard.”) and comments unrelated to the task. Partially covert self-directed speech was defined as non-social speech that was lower-than-normal volume, including whispering, muttering, and lip movement. It was often not possible to determine whether such speech was task-relevant because, by its nature, it was difficult or impossible to hear. But in cases where it could be discerned that whispering was not task-relevant, it was not coded. In addition, we coded whether or not children used any social speech during the task, during or in between trials. Speech was classified as social on the basis of the content of the speech or social cues like looking at the experimenter. A second research assistant coded a random selection of 20% of the transcripts and agreement was high (Kappa for overt speech = .86, Kappa for partially covert self-directed speech = .82).
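For readers unfamiliar with the agreement statistic, a minimal sketch of Cohen's kappa for two coders follows. The codes shown are hypothetical and are not study data; how the authors computed kappa is not described, so this is only the textbook formula.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical codes of the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must code the same non-empty set of items.")
    n = len(rater_a)
    # Proportion of items on which the two raters gave the same code.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal code frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical trial codes (1 = overt speech present, 0 = absent).
primary = [1, 1, 0, 0, 1, 0, 0, 0]
second = [1, 0, 0, 0, 1, 0, 0, 0]
print(round(cohens_kappa(primary, second), 2))
```

Kappa corrects raw percent agreement for the agreement two coders would reach by chance, which matters when one code (such as "no speech") is much more common than the others.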
Results
Analytic Approach
Analytic code and data can be found on the Open Science Framework: https://osf.io/q7n5k/. We fit multinomial baseline logit models using the mblogit() function from the mclogit package in R (R Core Team, 2013) to predict the odds of children using a specific form of task-relevant self-directed speech (overt, partially covert, or mixed) versus no speech on a given trial of a given task. ‘No speech’ was the reference category. Thus, the dependent variable was the odds of speech in one of these categories occurring (versus no speech) on a given trial. This allowed us to cleanly investigate what predicted the presence of each of these forms of speech, with specific interest in overt and partially covert speech. Multinomial models allow parameters to be estimated more efficiently than a series of binomial models. Trials on which children used both forms of speech are inherently difficult to interpret; however, we report them for completeness. Random intercepts were included to account for subject-level variation and repeated measurement. Additional exploratory analyses were implemented using generalized linear mixed effects models using the glmer() function in the lme4 package (Bates et al., 2014).
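To make the baseline-category structure concrete, the sketch below converts log-odds for each speech form (relative to 'no speech') into probabilities for all four outcomes on one trial. It is written in Python for illustration and is not the authors' R code; the log-odds values are hypothetical, standing in for the sum of an intercept, trial-level predictors, and a child's random intercept.

```python
import math

CATEGORIES = ("overt", "partially_covert", "mixed")  # "no_speech" is the reference


def category_probabilities(linear_predictors):
    """Convert log-odds versus 'no_speech' into probabilities for all four outcomes."""
    odds = {cat: math.exp(linear_predictors[cat]) for cat in CATEGORIES}
    denom = 1.0 + sum(odds.values())  # the reference category contributes exp(0) = 1
    probs = {cat: value / denom for cat, value in odds.items()}
    probs["no_speech"] = 1.0 / denom
    return probs


# Hypothetical log-odds for a single trial (not estimates from the study).
eta = {"overt": -1.2, "partially_covert": -0.8, "mixed": -2.0}
probs = category_probabilities(eta)
for name, p in probs.items():
    print(f"{name}: {p:.3f}")
```

Each coefficient in such a model shifts the log-odds of one speech form against 'no speech', so the ratio of any category's probability to the reference probability equals the exponentiated linear predictor for that category.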
Description of Self-directed Speech and Relations with Age
Table 1 provides examples of overt and partially covert speech used on each task.
Task | Overt speech | Partially covert speech (whispering) |
---|---|---|
Selective Attention | “Oh, cause they're all moons.” “I know, four things.” “All purple.” “Where are you ‘three’…” | “All moons.” “Blue.” “One… two… three…four…” (counting items in pictures) |
Delayed Recall | “Lamp…tree…house.” (labeling at presentation) “House, dog, flag…house, dog, flag…” (rehearsal during interval) | “Ball…cup…shoe.” (labeling at presentation) “Ball, cup, shoe…ball, cup, shoe…” (rehearsal during interval) |
Tower of London | “Put the red one there and the green one there.” “The blue ball doesn't need to move but I need to move this… so I need to flip these over.” | “I have to take that off…” “This one needs to be over here and this one’s supposed to be here.” “I can put that one there and that one there and that one there.” “This one right there.” |
First, we provide descriptive information regarding self-directed speech at the task level. Some amount of overt or partially covert task-relevant speech (defined as presence on one trial or more) was used by 74% of children on the Selective Attention task, 76% on the Delayed Recall task, and 60% on the Tower of London task. Partially covert speech tended to be more prevalent, with 52%, 54%, and 44% of children using partially covert speech on at least one trial of the Selective Attention, Delayed Recall, and Tower of London tasks, respectively, versus 47%, 39%, and 33% using overt speech on at least one trial of these tasks. Many children used a mix of these types of speech: 39%, 47%, and 33% on the Selective Attention, Delayed Recall, and Tower of London tasks, respectively. Age was not a consistent predictor of the presence of overt or partially covert speech on the different tasks. The only age-related finding was that younger children were more likely to use any form of self-directed speech on the Selective Attention task, b = -.01, t = -2.02, p < .05. Post-hoc tests indicated this appeared to be driven by overt speech use, b = -.01, t = -1.88, p = .06, and not partially covert speech, p > .4. When talkativeness with strangers and the presence of social speech were included as covariates, age was no longer significant but a trend was still present, p = .08. By contrast, age was consistently associated with performance on the tasks (Selective Attention task: b = .27, z = 3.71, p < .001; Delayed Recall task: b = .04, z = 2.58, p < .01; Tower of London task: b = .13, z = 2.14, p < .05). Table 2 shows the bivariate correlations between age and speech on the tasks (proportion of trials on which overt and partially covert speech was present). Age was not associated with proportion of partially covert or overt speech on any task. Partially covert speech on each task was correlated with the others. Overt speech on the Selective Attention task and Tower of London task was correlated.
Variables | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
| ||||||
| -.02 | |||||
| -.11 | -.27 | ||||
| -.13 | -.05 | -.06 | |||
| .04 | .26* | .26* | -.39*** | ||
| .00 | .38*** | -.06 | .08 | .16 | |
| -.08 | .06 | .21† | -.03 | .27* | -.10 |
Note. Overt and partially covert speech are reported here as proportion of trials on which such speech was used, in order to explore associations among tasks. In our regression analyses speech is coded at the trial level with four categories (no speech, overt speech, partially covert speech, and both). SA = Selective Attention task, DR = Delayed Recall task, TOL = Tower of London task. †p < .10, *p < .05, **p < .01, ***p < .001.
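The aggregation described in the note can be sketched as follows: trial-level codes are collapsed into a per-child proportion for each speech form, and proportions are then correlated across tasks. The data are hypothetical, and counting 'both' trials toward each form is an assumption, since the note does not say how mixed trials were handled.

```python
import math


def proportion_with(codes, target):
    """Proportion of a child's trials on which a speech form was present.

    Trials coded 'both' count toward either form (an assumption).
    """
    hits = sum(code in (target, "both") for code in codes)
    return hits / len(codes)


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical trial codes for four children on two tasks (not study data).
task_one = [
    ["none", "overt", "both", "partial"],
    ["none", "none", "partial", "partial"],
    ["overt", "overt", "overt", "none"],
    ["none", "none", "none", "partial"],
]
task_two = [
    ["partial", "partial", "none"],
    ["partial", "none", "none"],
    ["none", "overt", "none"],
    ["none", "none", "none"],
]
partial_one = [proportion_with(child, "partial") for child in task_one]
partial_two = [proportion_with(child, "partial") for child in task_two]
print(round(pearson_r(partial_one, partial_two), 2))
```

Averaging over trials in this way supports cross-task comparisons but discards the trial-level variation that the regression analyses retain.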
Predictors of Overt and Partially Covert Speech
On the Selective Attention task, the odds of overt speech (compared to no speech) on a given trial were not predicted by age, successful matching, or trial number. Consistent with task-level findings, the odds of mixed speech on a trial were negatively related to age, such that the odds of producing both forms of speech on a given trial were higher among younger children, b = -.10, SE = .05, z = -2.21, p < .05. The odds of mixed speech on a trial were also negatively related to trial number, b = -.16, SE = .08, z = -2.07, p < .05, with more speech on earlier trials. The odds of mixed speech and partially covert speech on a given trial were predicted by the matching dimension: such speech tended to occur when children were matching by number versus other dimensions (mixed speech: b = .49, SE = .18, z = 2.8; partially covert speech: b = .25, SE = .06, z = 3.95; ps < .001). Similarly, the odds of partially covert speech on a given trial were marginally higher when children needed to match by shape as opposed to color, b = .25, SE = .13, z = 1.95, p = .05.
On the Delayed Recall task, the odds of overt speech and partially covert speech on a given trial were predicted by successful recall, b = .57, SE = .25, z = 2.23, p < .05, and b = .87, SE = .27, z = 3.27, p < .01, respectively. The strength of the relation between speech and accuracy did not vary by speech type, p > .35. Age and trial number were not significant predictors of overt, partially covert or mixed speech.
On the Tower of London task, the odds of using overt speech were lower on trials involving prepotent moves, b = -1.7, SE = .77, z = -2.1, p < .05. There was a similar trend for trials involving counterintuitive moves, b = -1.14, SE = .59, z = -1.92, p = .06. There was no association between self-directed speech and the number of moves beyond what the task required. However, the odds of partially covert speech on a trial were predicted by children’s failure to solve the trial, b = .25, SE = .06, z = 3.95, p < .001. There was no relation between overt speech and the odds of solving a trial. The strength of the predictive relation between speech and solving a trial varied significantly by speech type, b = 2.16, SE = .81, z = 2.68, p < .01. Age was not a significant predictor of overt, partially covert or mixed speech.
Exploring Child Talkativeness and Social Speech as Predictors
In the next set of analyses, we added talkativeness with strangers to our models to assess whether this variable predicted overt and partially covert speech. On the Selective Attention task, use of overt or mixed speech was associated with talkativeness, b = .86, SE = .26, z = 3.32, p < .001, and b = .77, SE = .35, z = 2.18, p < .05, but use of partially covert speech was not, p > .8. On the Delayed Recall task, the same pattern was found, with a trend of overt speech being predicted by talkativeness, b = .42, SE = .23, z = 1.88, p = .06. Mixed speech was significantly associated with talkativeness, b = .62, SE = .30, z = 2.07, p < .05. Partially covert speech was not associated with talkativeness, p > .14. Finally, on the Tower of London, overt speech was associated with talkativeness, b = .63, SE = .24, z = 2.7, p < .01. Mixed and partially covert speech were not associated with talkativeness, ps > .27.
We also tested whether the occurrence of social speech during a task predicted overt and partially covert self-directed speech. On all tasks, social speech tended to predict overt task-relevant speech (Selective Attention task: b = 1.89, SE = .69, z = 2.8, p < .01; Delayed Recall task: b = .94, SE = .54, z = 1.74, p = .08; Tower of London: b = 1.54, SE = .70, z = 2.20, p < .05). The same patterns were not found for partially covert speech, all ps > .23. Social speech on each task correlated with talkativeness with strangers (Selective Attention task: b = .70, SE = .25, z = 2.8, p < .01; Delayed Recall task: b = .65, SE = .26, z = 2.5, p < .05; Tower of London task: b = .61, SE = .25, t = 2.35, p < .05).
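The coefficients reported in this section are on the log-odds scale. As a reading aid, the sketch below converts a coefficient and its standard error into an odds ratio, a Wald z statistic, a two-sided p value, and a confidence interval. It uses the rounded values reported for overt speech and successful recall on the Delayed Recall task (b = .57, SE = .25), so the recomputed z can differ slightly from the reported one; the function itself is illustrative and not part of the authors' analysis code.

```python
import math
from statistics import NormalDist


def wald_summary(b, se, level=0.95):
    """Odds ratio, Wald z, two-sided p, and confidence interval from a log-odds coefficient."""
    z = b / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # e.g., about 1.96 for a 95% interval
    return {
        "odds_ratio": math.exp(b),
        "z": z,
        "p": p,
        "ci": (math.exp(b - crit * se), math.exp(b + crit * se)),
    }


# Rounded values as reported in the text for the Delayed Recall task.
summary = wald_summary(0.57, 0.25)
print({key: value for key, value in summary.items()})
```

An odds ratio above 1 here means that the odds of overt speech (versus no speech) were higher on trials where recall was successful.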
Discussion
The reported study provides new insights that can guide future research on self-directed speech aimed at clarifying its origins and function(s) in childhood. Using trial-level models, we found an overall pattern suggesting that the overtness of speech may be at least in part driven by children’s social goals. The relation between speech and performance varied by task, with positive, negative, and null relations found (on the Delayed Recall, Tower of London, and Selective Attention tasks, respectively). Task-relevant overt speech tended to be predicted by parent-reported talkativeness with strangers and children’s social talk during the tasks. This pattern was not found for partially covert speech. Moreover, in contrast to previous work, we did not find consistent evidence of age-related change in self-directed speech, despite our sample spanning two years in age and despite performance on the tasks improving with age.
The findings regarding speech and performance could in part be due to variation in children’s sociality. On the Selective Attention task, where no main effect of overt or partially covert speech was found, many children may have been supporting their performance using inner speech, which we could not measure. This is suggested by the finding that on trials that were likely more challenging (i.e., involving matching by number), children tended to use more partially covert speech. On less challenging trials, some children may have used inner speech whereas others used overt speech because they were more social, thus obscuring a relation with performance.
The negative relation between performance and partially covert speech on the Tower of London could, on the other hand, be explained by children’s failure to plan on some trials, with partially covert speech occurring more when they found themselves “stuck” without a solution and not knowing how to proceed. This interpretation is consistent with our finding that children used less overt speech on trials involving prepotent moves, and with prior findings of a lack of spontaneous planning on the Tower of London task (Lidstone et al., 2010). Our findings also align with previous research reporting inconsistent relations between self-directed speech and performance on this task (Fernyhough & Fradley, 2005). And while it is possible that partially covert speech could predict future task performance, as others have theorized (Frauenglass & Diaz, 1985), this idea has not found empirical support (Fernyhough & Fradley, 2005).
The finding of a limited relation between age and self-directed speech is surprising given that previous cross-sectional and longitudinal studies have found associations between self-directed speech and age (e.g., Al-Namlah et al., 2006; Kohlberg et al., 1968; Winsler & Naglieri, 2003). One possibility is that individual differences in children’s tendency to use specific forms of self-directed speech overwhelmed age effects in our study. Partially covert speech was correlated across tasks, and, to a lesser extent, a similar pattern was found for overt speech (on two tasks). Together these results suggest that there is some degree of consistency in self-directed speech patterns within individuals, despite differences in task demands, similar to what has been found in some previous studies (e.g., Lidstone et al., 2010; Sawyer & Brooks, 2021; Winsler & Naglieri, 2003). In the context of a cross-sectional age range of two years, individual differences may explain more variation in speech use than chronological age. However, this is not what would be expected on Vygotsky’s account, which would predict a general trend towards internalization. Depending on the difficulty of the task, children’s use of self-directed speech might be expected to be either increasing or decreasing with age. The relation between age and performance was comparatively robust.
Generally, our results present a challenge to the widely held idea that overt self-directed speech is best understood as a transition phase on the way to partially covert and then fully covert speech. They call for further exploration of overt and partially covert speech across a more diverse set of tasks and a broader age range, to gain a better understanding of when and why children use these different forms of speech. Longitudinal studies involving diverse tasks could also clarify whether overt, full-volume speech generally precedes partially covert speech across a broad range of tasks, or whether some young children appear to use inner speech on some tasks without first using overt or partially covert speech. Similarly, more research is needed to determine the extent to which these different forms of speech have unique roles in development.
A key limitation of the reported research is the limited number of tasks used to assess self-directed speech. Theoretically, self-directed speech would be expected to occur across many contexts in which children complete cognitively demanding tasks. Many tasks that are widely used in the literature have been preferred precisely because they tend to elicit more overt speech, which limits our ability to draw inferences about the importance and manifestation of self-directed speech on cognitively demanding tasks broadly. A broader selection of tasks would allow for further insight into when and how children use self-directed speech.
Another limitation of this study is its correlational nature. In the case of the Delayed Recall task, the consistent relation between self-directed speech and performance, and the strategic nature of children’s speech, make it reasonable to infer that self-directed speech causally supported performance. Similarly, on some trials of the Selective Attention task, children appeared to use speech strategically to identify the relevant matching dimension (i.e., by counting). However, it is nevertheless possible that the speech is epiphenomenal. For example, children’s speech may sometimes reflect their verbal narration of a non-verbal thought process. This is consistent, for example, with the negative relation between partially covert speech and problem solving on the Tower of London. Experimental research can clarify these patterns and what they suggest about the causal influence of self-directed speech.
Our study nevertheless provides new insights regarding the nature of self-directed speech in young children. The overtness of self-directed speech, even when task-related, may in part be a function of children’s desire to engage others in their activity (Bjorklund, 2009; McGonigle-Chalmers et al., 2014). Trial-level analyses of different types of self-directed speech can increase detection of these and other effects. While this is but one study, we suggest that the field has not sufficiently explored alternatives to, or attempted to falsify, Vygotsky’s ideas. We suggest doing so would advance theory and the practical value of research on this topic. Future research with a wider variety of tasks can further clarify the relevance of overt and partially covert speech in children’s emerging higher cognitive processes.
Funding
This research was supported by the Eunice Kennedy Shriver National Institute of Child Health and Human Development of the National Institutes of Health under Awards F32HD079191 and RHD086184A.
Acknowledgments
The authors thank Cleo Andersen-Green, Marina Blum, Jerome Hoover, Jessica Johnson, Taylor Kotary, Margaret McCandless, and Samantha Stone for their assistance with data collection and coding, and Akira Miyake and Al Kim for useful discussions.
Competing Interests
The authors have no competing interests to report.
Author Contributions
SD and YM designed the research. SD analyzed the data. SD and YM wrote the paper.
Data Accessibility Statement
Deidentified data and analytic scripts are shared on the Open Science Framework: https://osf.io/q7n5k/