Children’s mentalizing abilities and their moral reasoning both develop rapidly during the preschool years and are jointly critical for navigating our complex social world. As children develop, they place more emphasis on intentions in making moral judgments, yet little research has examined the role of belief understanding in morally-relevant situations. We examined children’s (N = 61) and adults’ (N = 36) judgments of agent intent, deserved consequence, and blame and praise in morally-relevant belief vignettes. By 5 years of age, children consistently rated agents with false beliefs as better intentioned in a good intent condition (even though the outcome was bad) than in a bad intent condition (even though the outcome was good). Yet, even 7-year-olds had difficulty assigning consequences (reward or punishment) based on an agent’s intent, and only adults assigned consequences as a function of intent. Nonetheless, children of both ages accurately assigned praise or blame as a function of intent. Understanding of praise and blame may represent a way station along the road to more accurate assignment of reward and punishment.

The ability to appreciate others’ mental states is critical for navigating our complex social world. Children’s mentalizing abilities develop rapidly during the preschool years. This Theory of Mind (ToM) allows them to understand, explain, and predict behavior (Premack & Woodruff, 1978) and is critical for other domains of social functioning (Astington, 2003; Baron-Cohen et al., 1985; Leslie, 1988; Wellman, 2020). The goal of the current study was to examine the interplay of mentalizing with one such domain: moral judgment.

Moral judgments can be made with respect to a wide range of behaviors. Most broadly, they encompass actions toward others intended to impact them in positive or negative ways: helping or harming. They thus reflect concern not only with wrongdoing but also with treating others prosocially, with fairness and kindness (Killen & Smetana, 2015; Walczyk & Cockrell, 2023).

In western cultures, the mental states of agents (their intentions, knowledge, and beliefs) are considered central to such judgments. Since at least Piaget (1932/1965), however, there has been evidence suggesting that when making moral judgments, young children tend to rely more on outcomes than intent (Cushman et al., 2013; Zelazo et al., 1996). However, children as young as three can use intent information in simple scenarios to make explicit moral judgments (Smetana, 2006), and in some cases even infants show sensitivity to moral intent (Hamlin, 2013).

As children develop, they begin to place more emphasis on intentions in making moral judgments (Cushman et al., 2013; Lagattuta & Kramer, 2022; Margoni & Surian, 2017). For example, Cushman et al. examined 4- to 8-year-olds’ judgments of accidental harm and attempted harm. Older children judged attempted (but failed) harm as more wrong than accidental harm, and there was a steep decrease over age in wrongness and punishment judgments for the accidental harm situation. The youngest children viewed the act as naughty and punishable while the oldest children did not, suggesting the youngest children were focused on the outcome of the situation, which is consistent with Piaget’s (1932/1965) research.

Less research, however, has examined how and when children integrate belief and false belief understanding into moral judgments. Yet the role of belief in moral evaluation is often critical. Take, for example, a scenario from Young and colleagues (2007) in which Grace and her friend are touring a chemical plant. Grace goes to get coffee and her friend asks for sugar in hers. Grace thinks the white powder by the coffee station is sugar, but unbeknownst to her the powder is toxic and her friend dies. Understanding Grace’s false belief regarding the sugar is essential in order to make fair moral judgments of intent, reward or punishment, and praise or blame.

Killen and colleagues (2011) were among the first to systematically examine children’s (3- to 8-year-olds) moral judgments within a false belief context. In their study, children heard a story about a well-intentioned boy (wanting to help his teacher clean up the classroom) accidentally causing a negative outcome (throwing away a special cupcake in a brown bag) because he acts on the basis of a false belief about a container’s contents (he thought the bag contained trash). Children were then asked a series of questions to assess their false belief understanding, their understanding of the agent’s intentions, and whether they thought punishment was warranted. Killen et al. found that children who had not attained false belief understanding were more likely to attribute ill-intentions and assign punishment for the accidental transgression than children who had such understanding. Further, it was not until at least age 7 years that children appropriately attributed positive intentions to the accidental transgressor.

In more recent work, Ochoa et al. (2022) refined and extended Killen et al.’s (2011) work. First, in Killen et al., only one belief story was presented to children and it involved only one combination of belief and intent: The agent had a good intention but held a false belief. In contrast, Ochoa et al. asked children (4- and 5-year-olds) to make judgments in a wider range of scenarios involving interactions between beliefs (true vs. false) and intentions (good vs. bad). For example, in one scenario (bad intent, false belief) a story character was trying to make their friend upset by giving them a skunk (a disliked animal) but due to a false belief (the skunk had moved to a different location), they shared a kitten (a liked animal). Second, Ochoa et al. made the agent’s intention more explicit, so that children did not have to make assumptions about intent. Third, Ochoa et al. also aimed to illuminate the developmental progression of moral understanding. Specifically, they examined whether children are able to immediately use their false belief understanding to make judgments of moral intentions, or whether a lag might be present such that children need time to integrate ToM with moral reasoning. Relatedly, they examined whether there is a lag between accurate intent judgments and consequence judgments involving reward or punishment.

Ochoa et al. (2022) found several important results. First, both 4- and 5-year-olds did very well in the true belief contexts: They rated agents as worse intentioned and more deserving of punishment in bad intent conditions compared to good intent conditions. That result is critical from a methodological standpoint because it shows that children of this age were able to accurately track the scenarios with which they were presented. By age 4, children have a rudimentary understanding of intent and use this information to judge whether agents holding true beliefs deserve punishment or reward. Second, however, only older children (5-year-olds) with greater false belief understanding displayed intention understanding in the false belief contexts, contexts that were otherwise identical to the true belief contexts. Nonetheless, children displayed this understanding some two years earlier than those in Killen et al. (2011). Third, a developmental lag was present for 4-year-olds: even 4-year-olds who displayed understanding of the agent’s false belief, often failed to make accurate intent judgments. In contrast, almost all 5-year-olds with false belief understanding made correct intent judgments in false belief contexts. Finally, a further lag was present such that children at both ages who had shown both belief understanding and intent understanding, nonetheless had difficulty assigning appropriate consequences based on intent.

In a follow-up study, Ochoa et al. (2022) examined whether the developmental lags seen in the first study would be present when simplified vignettes were presented. They also included an adult sample in order to establish a developmental endpoint. They found that when the vignettes were simplified (by only requiring two animals to be tracked over a single location in each vignette) and the length of the testing session was reduced (by only including 2 false belief vignettes), 4-year-olds with false belief understanding were now able to make accurate intent-based judgments. Yet, both 4- and 5-year-olds continued to have great difficulty making accurate consequence judgments in false belief contexts. In marked contrast, adults’ moral judgments were overwhelmingly made on the basis of intent, for both intent and consequence judgments.

The purpose of the current study was to provide a more thorough assessment of when children begin to make appropriate consequence judgments on the basis of intent in false belief contexts. We approached this in several ways. First, we extended the age range from 5 to 7 to examine whether adult levels of understanding would be achieved by this age. We hypothesized that older children would be more likely than younger children to correctly make intent judgments in false belief contexts, although based on Ochoa et al. (2022), we expected 5-year-olds to do quite well. As we were particularly interested in how children use their ToM understanding in these contexts, we narrowed our focus to only include children who passed the relevant false belief tasks as described below.

Second, following Nobes and Martin’s (2022) graded punishment scale, we included a more fine-grained scale of punishment and reward rather than the simpler trichotomy presented by Ochoa et al. (2022): assign trouble, assign a treat, or assign nothing. Specifically, we asked children to assign either a little trouble or a lot of trouble, assign a small treat or big treat, or assign nothing. We hypothesized that using a more finely-graded scale might uncover developmental increments in understanding that could be masked using the “all or nothing” response options in Ochoa et al.’s prior research.

In particular, children’s willingness to assign strong punishment to actors with ill intent may quite reasonably have been undercut by the fact that outcomes were positive; similarly their willingness to assign large rewards to actors with positive intent may have been undercut by the fact that outcomes were negative. It is possible that children may be more willing to assign a little punishment when intentions are bad but outcomes good, and a little reward when intentions are good but outcomes bad.

Third, we expanded our consideration of moral evaluation to include not only reward and punishment but also praise and blame. It’s possible that children’s judgments of praise and blame would be more closely linked to intention (and belief) than reward and punishment. Malle (2014), for example, suggests that when adults assign punishment, we are evaluating the act, but when we are judging someone as blameworthy or praiseworthy we are evaluating the agent for their role in the event. While we may hesitate to give rewards when outcomes are inadvertently bad, we may nonetheless offer praise for good intentions in those cases; conversely, while we may not punish when good outcomes are fortuitously brought about by ill-intentioned agents, we may nonetheless be quite willing to blame those agents in such cases. In that respect, praise and blame may represent something of a way station on children’s road to incorporating mental states, and specifically intent understanding, into consequence judgments: Evaluations of praise and blame may be much less constrained by outcomes than assignments of reward and punishment.

In addition, research with adults suggests that individuals show an asymmetry when assigning praise and blame, such that they are more likely to blame a person for bad side effects than praise them for good side effects. For example, in Knobe (2003) adults were presented with two scenarios. In one, Helen hates her sister and she wants her to look bad at the prom, so she agrees to make her sister a dress. She made an ugly dress but the dress ended up fitting perfectly and her sister was chosen as Prom Queen (bad intention, good side-effect). In a separate scenario, Helen loves her sister and she wants her to look good at the prom, so she agrees to make her a dress. Unfortunately, the dress ended up fitting very poorly and her sister got bullied at school for it (good intention, bad side-effect). Knobe (2003; see also Sarin et al., 2017) found that individuals are more likely to say that a negative side effect was brought on intentionally (and is therefore blameworthy) compared to a side effect that is positive.

Turning to children, there is mixed evidence regarding whether they take into account others’ epistemic states when making judgments of blame. Some research suggests that it’s not until age 7 that children consider a person’s knowledge and belief in assigning blame (Kalish & Cornelius, 2007; Wang et al., 2011). For example, Wang et al. presented children (3-, 5-, and 7-year-olds) with vignettes in which a child fails to follow a rule because they were not present during the rule change. Children were asked whether the child should be blamed for breaking the rule. Both 3- and 5-year-olds suggested that the agent should be blamed, while 7-year-olds did not. On the other hand, Mulvey et al. (2020) found that children (3- to 12-year-olds) and adults use information regarding negligence in making blame judgments. Mulvey et al. presented participants with two vignettes: in one, the transgressor was negligent and the victim was careful, and in another the transgressor was careful but the victim was negligent. Children were then asked, “Whose fault was this?” Children of all ages overwhelmingly said that the transgressor was to blame in the first scenario. Yet, in the scenario in which the transgressor was careful but the victim was negligent, younger children still blamed the transgressor, whereas older children (although not to the same extent as adults) blamed the victim. This suggests that judgments of blame are developing throughout childhood, and may be more salient as children age. In contrast, however, to our knowledge there is no research on children’s judgment of praise in morally-relevant scenarios. Hence, in this study we examined participants’ blame/praise judgments and whether children would do better making these judgments compared to consequence judgments.

In sum, in this study we presented children (5- and 7-year-olds) and adults with morally relevant vignettes (very similar to those in Ochoa et al., Study 2) in which agents held false beliefs. Participants were then asked to ascribe beliefs and intentions to the agents, and to assign rewards and punishments to them along a 5-point graded scale. In addition, participants were asked to make praise and blame judgments for the scenarios along a very similar graded scale.

We hypothesized a main effect of age, such that older children would be more likely than younger children to correctly answer false belief questions in these scenarios and that there would be an age-related change in making intent judgments, such that overall older children would be more likely than younger children to correctly make intent judgments in false belief contexts.

For the consequence evaluation, we hypothesized an interaction between age group and agent intent. We expected that 5-year-olds would generally not be successful in assigning reward and punishment in false belief contexts. Yet, we expected that 7-year-olds would differentiate their consequence rating based on agent intent. Whether the older children would do as well as adults was an open question. Further, we believed that older children and adults, compared to younger children, would be more likely to assign trouble in the bad intent condition. Yet, we did not expect a difference in the good intent condition, because we thought that the majority of participants would respond that the appropriate consequence would be nothing, as adults did in Ochoa et al. (2022).

When assessing blame and praise judgments, we hypothesized that, in contrast to punishment/reward, younger children would be able to appropriately blame the agent in the bad intent condition as they did in Mulvey et al. (2020). Therefore, we did not expect a main effect of age, but we did hypothesize a main effect of condition such that participants would blame the agent in the bad intent condition and offer a neutral response or praise in the good intent condition.

For children, we hypothesized that assigning blame and praise would be easier than assigning consequences, as suggested by Malle (2014). For 5-year-olds, we did not think children would differentiate their consequence judgments based on intent, but we did think they would when making blame/praise judgments. For 7-year-olds, we expected to see more of an adult-like pattern of responses for consequence judgments, such that they would often punish in the bad intent condition, but give nothing in the good intent condition. Further, we expected 7-year-olds to blame the agent for the bad intention, saying it was a bad/mean thing to do, and likely either say nothing or praise in the good intent condition. For adults, we expected a similar pattern of judgments of consequence as in Study 1 of Ochoa et al. (2022) - mostly assigning punishment in the bad intent condition, but assigning nothing in the good intent condition. As for assigning blame/praise, we predicted that in the bad intent condition most adults would assign blame, and sometimes assign praise in the good intent condition or choose to say nothing.

Our hypotheses and analysis plan were pre-registered on the Open Science Framework prior to analyzing the data. The preregistration and study materials are available on the Open Science Framework. The study was part of a larger project examining theory of mind, morality, and empathy. Here we report on relations between false belief understanding and moral judgments.


Based on Ochoa et al. (2022), we expected to find a medium to large interaction using an ANOVA framework (ηp² ≥ .10) between age and intent condition on consequence judgments. An a priori power analysis indicated that at least 30 false belief passers would be required in each age group (5-year-olds, 7-year-olds, and adults) for the study to be adequately powered. To achieve this goal, we oversampled to allow for the fact that some children would not pass the false belief questions. Doing this allowed us to achieve our sample size goal and, in fact, we exceeded it by one child (a 5-year-old in the bad intent condition) because the family had already been scheduled to visit the lab. It is important to note that our power analysis was based on an ANOVA framework, but our analysis plan was updated to use a nonparametric framework involving mixed-effects models with random effects. Therefore, because parametric frameworks are generally more powerful than nonparametric ones, some of our analyses may be somewhat underpowered. However, our sample size could not be increased because of practical constraints (i.e., time and participant compensation).
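The expected effect size can be placed on the scale that many power calculators take as input. The following is a minimal sketch, assuming the standard conversion from partial eta squared to Cohen's f, f = sqrt(ηp² / (1 − ηp²)). The function name is illustrative and is not taken from the authors' analysis scripts, and the source does not state which software was used for the power analysis.

```python
import math


def cohens_f_from_partial_eta_sq(eta_p2: float) -> float:
    """Convert partial eta squared to Cohen's f (standard conversion)."""
    if not 0.0 <= eta_p2 < 1.0:
        raise ValueError("partial eta squared must lie in [0, 1)")
    return math.sqrt(eta_p2 / (1.0 - eta_p2))


# Smallest interaction effect the study was planned to detect (eta_p^2 = .10).
f_expected = cohens_f_from_partial_eta_sq(0.10)
print(round(f_expected, 3))  # 0.333
```

By Cohen's conventional benchmarks (f = .25 for a medium effect, f = .40 for a large effect), a value of about .33 sits between medium and large, which matches the description above.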

Sixty-six children participated (24 female, 34 male, and 8 children for whom parents did not report a gender): 35 5-year-olds (Mage = 64.10 months, SD = 2.81) and 31 7-year-olds (Mage = 89.40 months, SD = 3.65). Sixty-two of these children were recruited via a developmental database from the University of Oregon, and 4 were recruited from a developmental science listserv called Children Helping Science. An additional 3 children were tested but excluded from analyses because the parent reported their child as having developmental delays. The research protocols for this study were approved by the University of Oregon Institutional Review Board.

Families were provided compensation in the form of a $10 electronic gift card of their choice. Fifty-two parents reported their child’s race as White or Caucasian, 3 reported their child was of Hispanic, Latino, or Spanish origin, 3 reported children as being two or more races, and 11 did not report their child’s race. Sixty-one percent of families reported making at least $75,000 a year, and 62% of children were in school or daycare at least part-time. The population from which the sample was drawn is a university town in the Western United States and is majority white with a median income in 2021 of $55,776. Our sample thus mirrors that population with respect to ethnicity but is higher socioeconomic status with respect to income.

Thirty-nine adults (75% female, Mage = 19.92 years, SD = 3.14 years) from the Psychology and Linguistics Human Subjects Pool at the University of Oregon also participated. We oversampled knowing that a few participants might not show up to their appointments, potentially not pass all the false belief questions, or be removed for not completing the tasks. Hence, our adult sample was ultimately somewhat larger than our child samples. The majority (78%) reported being White or Caucasian, and 53% reported being Freshman students. Adult participants received compensation in the form of class credit for a psychology course.


Participants were randomly assigned (within each age group) to one of two intent conditions (good or bad) in which they responded to two morally-relevant vignettes. Because of pandemic restrictions, children were tested individually in a single 15-minute live video session with a researcher. Children were engaging in remote learning and had the technology needed to participate in our study. We asked that the caregiver be present at the beginning of the session in case of any technical difficulties. It was then up to the caregiver whether they wanted to sit next to the child or not throughout the session. The researcher shared their computer screen and played the recorded video vignettes, and then asked children the test questions. Adults completed all tasks online through Qualtrics on their own time.


Morally-relevant Belief Vignettes

The two pre-recorded vignettes in each intent condition were approximately 2 minutes long each and were counterbalanced across conditions. See Table 1 for a depiction of the experimental manipulation. The vignettes each featured two characters. In the good intent stories, an agent wanted to make a friend happy by sharing a desirable animal (i.e., butterfly or kitten) but because of a false belief the agent ended up sharing an undesirable animal (i.e., spider or skunk). In the bad intent stories, the agent wanted to make their friend upset, but because of a false belief, ended up sharing a desirable animal.

Table 1.
Experimental Manipulation of Intent and Belief Factors to Yield Morally-Relevant Belief Vignette Outcomes

| Story Stimuli | Intent Manipulation | Belief Manipulation | Outcome |
| --- | --- | --- | --- |
| Kitten/Skunk | Good | False | Skunk shared (accidental harm) |
| Kitten/Skunk | Bad | False | Kitten shared (accidental positive outcome) |
| Butterfly/Spider | Good | False | Spider shared (accidental harm) |
| Butterfly/Spider | Bad | False | Butterfly shared (accidental positive outcome) |

As an example, in one of the bad intent vignettes, Susan (the agent) discovered a box containing a butterfly and another containing a spider. After returning the animals to their containers, she explicitly stated her intent to make a second, offstage character, Amy (the recipient), upset by sharing the container with the spider. The agent then left the scene, at which point the belief manipulation took place, with the animals switching locations. At this point, the recipient entered the scene, prompting the agent to offer what she thought was a “bad” container (the one containing the spider), but was really the container with the butterfly.

Following each vignette, participants were asked three comprehension questions: two of these concerned whether each of the items would make the recipient feel good or bad (e.g., “Do butterflies make [recipient] feel good or bad?”) and one concerned whether the agent was present or absent for the switch (e.g., “Was [agent] there to see the animals switch?”). For each of these questions children rarely erred: They were 96-98% correct for each question.

Participants then responded to the test questions described below. An image of the relevant character (agent/recipient) was paired with each question to direct children’s attention to the character asked about. A 5-point Likert-type happy-sad face scale was used to accompany the blame/praise question and the consequence scale was used for the consequence evaluation question. Training trials for the scales were conducted before using the scales for the first time. No participants had difficulty understanding the scales.

The test questions were as follows. 1. Agent intention evaluation (“When [AGENT] handed [RECIPIENT] the box, was [AGENT] trying to be nice, mean, or just okay?”). 2. Agent belief (“What does [AGENT] think is in the container?”). 3. Reality check (“What is really in the container?”). 4. Consequence evaluation (“What should happen to [AGENT]?” Should he/she get: in a lot of trouble, in a little trouble, nothing, a small treat, or should he/she get a big treat?). 5. Blame/praise evaluation (“What should we say to [AGENT]?” A. You tried to do a really, really nice thing. That was good! B. You tried to do a nice thing. C. Nothing. D. You tried to do a mean thing. E. You tried to do a really, really mean thing. That was bad!). The order of the blame/praise and consequence questions was counterbalanced between-subjects.

Participants were credited with false belief understanding if they correctly answered the agent belief and reality check questions in both stories. Scoring for the test measures fell into three schemes: agent belief and reality check (correct or incorrect), agent intention evaluation (a three-point scale based on answer valence: -1 = mean, 0 = just okay, and 1 = nice), and the remainder of the forced-choice questions for consequence and praise/blame (continuous, five-point scales based on answer valence, with the most negatively-valenced answers coded -2, neutral answers coded 0, and most positively-valenced answers coded 2). All variables were treated as ordinal data in analyses.
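The scoring scheme can be illustrated with a short sketch. The response labels, dictionary names, and record layout below are assumptions made for illustration and are not the authors' data dictionary; the intermediate codes of -1 and 1 on the five-point scales are also assumed, since the text specifies only the endpoints and the neutral midpoint.

```python
# Hypothetical label-to-code mappings based on the question wording above.
INTENT_CODES = {"mean": -1, "just okay": 0, "nice": 1}
CONSEQUENCE_CODES = {
    "a lot of trouble": -2,
    "a little trouble": -1,  # intermediate value assumed
    "nothing": 0,
    "a small treat": 1,      # intermediate value assumed
    "a big treat": 2,
}
# Blame/praise options A-E, from most positive (A) to most negative (E).
BLAME_PRAISE_CODES = {"a": 2, "b": 1, "c": 0, "d": -1, "e": -2}


def passes_false_belief(stories):
    """True only if belief and reality checks are correct in every story."""
    return all(s["belief_correct"] and s["reality_correct"] for s in stories)


def code_response(label, codes):
    """Map a response label to its numeric code, ignoring case and padding."""
    return codes[label.strip().lower()]


# One hypothetical participant in the bad intent condition.
stories = [
    {"belief_correct": True, "reality_correct": True,
     "intent": "mean", "consequence": "a little trouble", "blame_praise": "D"},
    {"belief_correct": True, "reality_correct": True,
     "intent": "Just okay", "consequence": "nothing", "blame_praise": "E"},
]

included = passes_false_belief(stories)
intent_scores = [code_response(s["intent"], INTENT_CODES) for s in stories]
consequence_scores = [code_response(s["consequence"], CONSEQUENCE_CODES) for s in stories]
blame_scores = [code_response(s["blame_praise"], BLAME_PRAISE_CODES) for s in stories]
print(included, intent_scores, consequence_scores, blame_scores)
# True [-1, 0] [-1, 0] [-1, -2]
```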

Caregiver Measures

Demographics and Family Questionnaire. Parents/guardians were asked to complete a 13-item questionnaire asking about race and ethnicity, and socioeconomic status.

Adult Measures

Demographics Questionnaire. Adult participants filled out a 6-item questionnaire about race and ethnicity, family socioeconomic status, class standing, and age.


We only included participants who passed both false belief and reality questions in the moral vignettes. A total of 61 children passed the false belief questions in both vignettes: 30 out of 35 5-year-olds (15 in the bad intent condition and 15 in the good intent condition, Mage = 64.00 months, SD = 2.88) and all 31 7-year-olds (17 in the bad intent condition and 14 in the good intent condition, Mage = 89.40 months, SD = 3.65). Thirty-six out of 39 adults passed the false belief questions, 18 in each condition. There were five 5-year-olds (3 in the bad intent condition) and three adults (2 in the bad intent condition) who did not pass false belief questions and were excluded from further analyses.

All analyses were conducted in R (R Core Team, 2017) and figures were produced using the ggplot2 package (Wickham, 2009). No main effects of participant gender, story order (kitten/skunk vs. butterfly/spider), or order of consequence and praise/blame questions were found for any morally relevant test questions and so these factors were collapsed in subsequent analyses.

For our central analyses, because our data were ordinal, we used the clmm function in the ordinal package to conduct mixed-effects multinomial logistic regressions on our main dependent variables of interest (Christensen, 2019). For each analysis we included intent condition and age group as fixed factors, and the interaction between the two, and included participant ID as a random factor. In all analyses, the negatively valenced responses (i.e., “mean,” “a lot of trouble,” “a lot of blame”) and the youngest children (5-year-olds) were used as the reference categories. If the interaction term was not significant, we dropped it and included only the main effects in the model. If the interaction term was significant, we followed up with simple effects analyses and reported the p value for each comparison.

Intent Evaluation

Figure 1 depicts performance on the agent intention question. Almost all adults and all 7-year-olds performed at ceiling: reporting that the agent in the bad intent condition was trying to be mean and the agent in the good intent condition was trying to be nice. Since there was no variability in 7-year-olds’ responses, and close to zero in adult responses, it was not appropriate to analyze group differences as planned. Even though 5-year-olds’ responses were more variable than those of 7-year-olds and adults, they still appropriately differentiated their intent judgments based on intent condition: mixed effects multinomial regression, B = 5.25, SE = 1.59, z = 3.31, p < .001. A one-sample t-test was conducted to examine children’s performance against chance (1 in 3); 5-year-olds performed significantly better than chance in both conditions: 73% of judgments in the bad intent condition were that the agent was trying to be mean (t(29) = 7.79, p < .001) and 85% of judgments in the good intent condition were that the agent was trying to be nice (t(29) = 36.12, p < .001).

Figure 1.
Percent of intent judgments separated by age group and intent condition.

Agent Consequence

Figure 2 depicts performance on the agent consequence question. We conducted a 3 (age group: 5, 7, adults) x 2 (intent: good vs bad) mixed-effects multinomial regression on the agent consequence measure. The interaction term was not significant and therefore was excluded from a reduced final model. In the final model there was a significant main effect of intent on the consequence judgments (B = 4.35, SE = 1.01, z = 4.29, p < .001), but no main effect of age (ps > .68). Overall, participants deemed agents in the bad intent condition as more deserving of a negative consequence than agents in the good intent condition.

Figure 2.
Percent of consequence judgments separated by age group and intent condition.

We next conducted a follow-up analysis, recoding responses as either correct or incorrect. Specifically, in the bad intent condition, assigning trouble is a clear correct response but assigning nothing could also be considered reasonable as there is no negative outcome. Assigning a treat in this condition is clearly incorrect. Conversely, in the good intent condition, assigning a treat is a clear correct response but assigning nothing could also be considered reasonable as there is no positive outcome. Assigning trouble in this condition is clearly incorrect. Hence, in the bad intent condition, we coded the following as correct: nothing, little trouble, and a lot of trouble. In the good intent condition, we coded the following as correct: nothing, little treat, and a big treat. Children thus received a score of 0 or 1 based on their incorrect/correct responses for each trial and a total score of 0-2.
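The recoding rule can be sketched as follows, assuming responses have already been coded from -2 (a lot of trouble) to 2 (a big treat) as in the scoring scheme described earlier. The function names are hypothetical and the code is an illustration of the rule, not the authors' script.

```python
def is_correct(code: int, intent: str) -> bool:
    """Lenient scoring: the neutral response (0) counts as correct in both
    conditions; only responses opposite in valence to the intent are wrong."""
    if intent == "bad":
        return code <= 0   # nothing, a little trouble, a lot of trouble
    if intent == "good":
        return code >= 0   # nothing, a small treat, a big treat
    raise ValueError(f"unknown intent condition: {intent!r}")


def total_correct(codes, intent: str) -> int:
    """Sum correct responses over a participant's trials (0-2 for two trials)."""
    return sum(is_correct(c, intent) for c in codes)


# Hypothetical participants, two vignettes each.
print(total_correct([-2, 0], "bad"))   # 2
print(total_correct([1, -1], "good"))  # 1
print(total_correct([2, 1], "bad"))    # 0
```

Because the neutral option is accepted in both conditions, this score distinguishes only responses that run against the agent's intent from all others; it does not reward stronger punishment or larger treats.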

A 3 (age group) x 2 (intent) ANOVA on correct consequence ratings (0-2) revealed main effects of age group (F(2, 91) = 6.89, p = .002, ηp² = .13) and intent condition (F(1, 91) = 4.42, p = .05, ηp² = .03), but no significant interaction between the two (F(2, 91) = 0.56, p = .57, ηp² = .01). Overall, participants made more correct consequence judgments in the bad intent condition (87% correct) compared to the good intent condition (62% correct). Further, adults (97.2%) made more correct judgments overall compared to both 5-year-olds (66.7% correct; B = -0.51, SE = 0.18, t = -2.86, p = .01) and 7-year-olds (64.5% correct; B = -0.61, SE = 0.18, t = -3.43, p = .003). When examining the total percent correct, neither 5- (t(29) = 1.49, p = .15) nor 7-year-olds (t(30) = 0.94, p = .35) performed better than chance. Only adults consistently differentiated consequence judgments based on intent (t(35) = 13.40, p < .001).

Agent Blame/Praise

Figure 3 depicts performance on the agent blame/praise judgment. We conducted a 3 (age group: 5, 7, adults) x 2 (intent: good vs. bad) mixed-effects multinomial regression on the agent blame/praise judgments. The interaction term was not significant and was therefore excluded from a reduced final model. In the final model there was a significant main effect of intent (B = 12.76, SE = 1.91, z = 6.70, p < .001), but no main effect of age (p > .29). All age groups appropriately rated the agent in the good intent condition as more praiseworthy than the agent in the bad intent condition.

Figure 3.
Percent of blame or praise judgments separated by age group and intent condition.

We conducted the same follow-up analysis as for consequence judgments to examine whether age groups differed in whether their blame/praise ratings were correct or incorrect. In the bad intent condition we coded the following judgments as correct: nothing, some blame, a lot of blame. In the good intent condition we coded the following as correct: nothing, some praise, and a lot of praise.

A 3 (age group) x 2 (intent) ANOVA on blame/praise judgments (0-2) did not reveal any significant effects. All age groups were near ceiling: 93.3% of 5-year-olds, 96.8% of 7-year-olds, and 97.2% of adults made correct blame/praise judgments. In contrast to the agent consequence judgments, all age groups performed better than chance on the blame/praise judgments (t(29) = 9.52, t(30) = 23.80, t(35) = 13.14, ps < .001, respectively).

Moral evaluations guide our decisions and actions in our daily lives, and being able to reason about others’ intentions and beliefs is essential to sustain positive social relationships. However, the interplay between the moral and the mentalizing domains is also crucial. In this study, we examined how children (5- and 7-year-olds) and adults make moral judgments in complex situations in which reasoning about others’ intentions and beliefs is fundamental to making accurate judgments. Building on prior work (Killen et al., 2011; Ochoa et al., 2022), we examined judgments of an agent’s intentions, deserved punishment or reward, and blame or praise in situations in which intention and outcome were misaligned because of the agent’s false beliefs.

In previous work, Ochoa et al. (2022) found that children (4- and 5-year-olds) and adults use intent information to make intent and consequence judgments in true belief conditions, regardless of false belief understanding. Young children with false belief understanding were also able to make appropriate belief and intent judgments in false belief contexts, but did not make intent-based judgments regarding punishment and reward. Only adults made consequence judgments based on agent intent: they chose to punish an agent with bad intent (the outcome was good due to false belief), but give nothing (no reward or punishment) to an agent with good intent (the outcome was bad due to a false belief).

In this study we extended the age range to include 7-year-olds to examine more fully the developmental progression in understanding of how intent influences rewards and punishments. In addition, we added a graded scale of consequences to examine whether children would be more likely to appropriately punish/reward agents if given a more nuanced set of response alternatives. Further, we asked participants if they wanted to blame or praise the agent using a graded scale to test whether they might find assigning blame and praise easier than assigning punishment and reward.

As hypothesized, a strong majority of children in this study appropriately attributed intent based on condition (good intent and bad intent) in false belief contexts. One hundred percent of the 7-year-olds and 70% of the 5-year-olds did so. Even 5-year-olds performed better than chance, differentiating their judgments based on agent intent. This corroborates findings from the second study of Ochoa et al. (2022), in which most 5-year-olds correctly attributed intent in false belief contexts, although some development in intent understanding is still occurring between 5 and 7 years of age. Further, although 4-year-olds with false belief understanding in Study 1 of Ochoa et al. (2022) did not attribute agent intent based on condition, they did so when the task was simplified in Study 2. This suggests that even at 4 years of age children display a fledgling understanding of intent in false belief conditions. Yet, these various findings differ from those of Killen et al. (2011): In their study, although on average 5.5-year-olds rated the act as “all right”, it was not until 7.5 years of age that children appropriately attributed positive intentions to the accidental transgressor. One potential reason for this difference is that in our study (and in Ochoa et al., 2022) the agent explicitly states that they want to make their friend either happy or sad, whereas in Killen et al. it is merely implied that the character has a good intention because they are helping clean up. It is likely that the agent’s intention was more salient in the later studies, thus helping children make intent judgments.

One of the primary goals of this study was to more thoroughly examine children’s and adults’ consequence judgments in morally-relevant scenarios. Using a more finely graded scale of consequences, in which participants could assign a little/big treat, a lot/a little trouble, or nothing, we hypothesized an interaction between age group and agent intent. There was no significant interaction but there was indeed a significant main effect of intent: Participants deemed agents in the bad intent condition as more deserving of a negative consequence than those in the good intent condition. Yet, when consequence judgments were coded as correct and incorrect, neither 5- nor 7-year-olds performed better than chance. The result is similar to findings from the two studies of Ochoa et al. for 5-year-olds. Our findings show that even by 7 years of age children continue to have difficulty assigning rewards and punishments in false belief contexts. In contrast, adults were essentially at ceiling in differentiating consequences on the basis of intent.

That said, using the graded scale of consequences did reveal some novel patterns. Descriptively, more 5-year-olds seemed willing to assign punishment to the agent in the bad intent condition than had done so in Studies 1 and 2 of Ochoa et al. (2022): Fewer than 25% of 5-year-olds in the earlier studies assigned trouble in the bad intent condition, whereas in this study 70% of judgments included trouble (a lot or a little). Similarly, fewer than 40% of 5-year-olds in Ochoa et al. assigned a treat in the good intent condition, whereas 50% did so in this study. Further, more adults (50%) were willing to suggest a treat (a small or big one) for the agent in the good intent condition compared to adults in Ochoa et al., who all reported the appropriate consequence as assigning neither trouble nor a treat. Hence, it is possible that participants found the more nuanced graded scale more appropriate in cases in which the intent and outcome were not aligned, militating against assigning large rewards and punishments. Nonetheless, whatever help the scale afforded was not enough to move even the 7-year-olds to above-chance performance. These results do not support our hypothesis that older children would differentiate their consequence ratings based on agent intent.

In addition to asking participants to make consequence judgments, we asked them to assign blame or praise along a graded scale. As hypothesized, even 5-year-olds blamed the agent in the bad intent condition and praised the agent in the good intent condition, and did so to the same extent as older children and adults. Moreover, when scored as correct/incorrect, over 90% of participants in each age group responded correctly. Thus, assigning blame/praise was considerably easier for children than assigning reward/punishment. That said, we note that our findings are at odds with previous research, at least with respect to blame. Recall that earlier research had suggested that children under 7 infrequently use an agent’s knowledge in making blame judgments (Kalish & Cornelius, 2007; Wang et al., 2011). Further research is needed to identify the reasons for this discrepancy, but one potential reason is that the violation in Wang et al. was a rule violation, without a bad outcome, and not psychological harm, as in our studies.

Assuming our findings are accurate, why is it that children are willing to assign blame and praise based on an agent’s intention, but do not differentiate their consequence judgments in these morally-relevant scenarios? Following Malle et al. (2014), we argue that evaluations of actors (praise and blame) are less constrained by misaligned outcomes than evaluations of actions (reward and punishment). From a young age, children are socialized by parents, teachers, and peers to behave morally in order to be able to successfully integrate into society (Grusec, 2023). Further, social blame and praise are often used to guide behavior until children have internalized these moral values. So it may be that children view blame and praise as consequences in their own right, and therefore punishment or reward may not be needed. Another possibility is that making blame and praise judgments serves as a stepping stone for making accurate consequence judgments. That said, we did not find an order effect of consequence and blame/praise questions, so it is possible that these judgments are independent of each other. In future research, asking children why they choose to blame or praise an agent may reveal on what basis they are evaluating agents.

Finally, unlike prior research with adults (Knobe, 2003; Sarin et al., 2017), and in contrast to our hypothesis, we did not find an asymmetry in participants’ attributions of blame and praise. Adults assigned blame (94%) in the bad intent condition and praise (94%) in the good intent condition to the same extent. In contrast, in Sarin et al. (2017) adults blamed an agent with a good intention that resulted in a bad outcome, and neither blamed nor praised an agent with a bad intention that resulted in a good outcome. It’s possible that the severity of the outcome influences how adults make blame and praise judgments. For example, in our stories the outcome resulted in short-term discomfort (receiving a disliked animal) or comfort (receiving a liked animal), whereas in Sarin et al. the outcome (winning or losing a big contract for a company) was much more strongly positive or negative and may therefore have influenced participants’ judgment to a greater degree.

Future Directions

There are many interesting avenues for future research based on our findings. First, as previously mentioned, understanding in which scenarios children choose to blame or praise may provide insight into why they assign blame/praise based on an agent’s intent but do not do the same for punishment/reward. For example, children and adults may judge an agent’s actions differently if the outcome results in physical harm or help compared to psychological harm or help. More generally, reducing or amplifying the salience of outcomes may well influence children’s moral judgments, likely as a function of their developing executive skills (Carlson & Moses, 2001; Devine & Hughes, 2014).

Second, what additional information may be needed for children to differentiate their assigned consequences based on agent intent in these scenarios? Providing details regarding the agents’ and recipients’ previous moral character may influence children’s judgments. In a recent study, Cameron et al. (2022) provided children (6- to 11-year-olds) with varying information on the agent’s previous moral actions (good, bad, mixed) and asked them to judge the unacceptability of the moral act (e.g., lying, cheating, stealing) and the deserved punishment. Children judged acts by characters described as having a good moral character as more acceptable than acts by characters with a mixed or bad moral description. Further, children assigned harsher punishment to the character with a bad moral character than to the agents with good or mixed moral character. Future studies should examine how providing information about an agent’s moral character influences moral judgments in situations in which a false belief is involved.

Third, it is likely that children, especially younger children, do not yet have much experience with, or opportunity for, assigning rewards and punishments in everyday life. If children do not yet have this experience, it may be harder for them to make such moral judgments. An interesting avenue for future research is to examine whether and how children’s past opportunities for, or frequency of, distributing rewards and punishments influence their judgments of deserved reward or punishment in these situations.

Fourth, children’s understanding of negligence could be explicitly examined in contexts like the one we studied. There is evidence suggesting that children can attend to negligence and foreseeability when making moral judgments (Mulvey et al., 2020; Nobes, 2009). For example, Mulvey et al. (2020) found that as children get older (7- to 12-year-olds compared to 3- to 6-year-olds), they give more weight to negligence-related events for both the transgressor and victim when making moral judgments. Negligence may be particularly relevant to our good intent condition in which the agent knew there was a disliked animal in the vicinity and so might be judged negligent (and therefore deserving of blame and punishment) for not checking the contents of the container. However, we saw no evidence of this in the current study because all children and adults praised agents in the good intent condition. That said, without specifically asking about negligence, we cannot know for sure whether children and adults are considering this factor in making consequence and blame/praise judgments.

Fifth, the extent to which children’s moral judgments are related to moral behavior in everyday life should be further explored. Moral behaviors include acting prosocially versus antisocially and lying/cheating versus truth telling. We study moral judgments because we believe they heavily influence the way people act. For example, Malti et al. (2010) found that children (5- to 9-year-olds) who exhibited prosocial behaviors assigned less punishment to agents in moral transgression stories compared to children who exhibited fewer prosocial behaviors. In addition, children who exhibited more aggressive behaviors assigned more punishment to the transgressors. One interesting way to follow up on our research would be to examine how empathy and prosocial versus antisocial behaviors influence moral judgments.

Finally, our study’s generalizability is limited in that our sample is not representative of individuals from diverse cultures and perspectives. Besides the four children recruited from a developmental science listserv, our sample was restricted to children and adults from a university town in the Pacific Northwest of the United States. We note that a focus on intent in moral situations is not universal even in adult moral reasoning across cultures (e.g., see McNamara et al., 2019, for work in a Fijian context). Further, research suggests that cultures emphasize different moral principles based on religion and ideology; for example, cooperation may be vital to survival in certain societies, whereas other societies may be more accepting of selfish behaviors (Baker et al., 2021; Gray et al., 2012). Future work could investigate how children and adults from different regions and diverse cultural contexts make moral judgments in scenarios like those in the present study.


In this study we build on prior research that finds theory of mind understanding is critically important for moral judgments and that both understandings are developing quickly and in parallel during childhood. We find that moral reasoning develops beyond the preschool years, as 5-year-olds’ understanding of moral intent itself lagged behind that of 7-year-olds and adults, and even 7-year-olds did not appropriately assign punishment/reward as a function of intent. Yet, by 5 years of age, children are making blame and praise judgments based on intent, indicating that an appreciation of blame and praise may serve as a stepping stone for more complex judgments like punishment and reward. Overall, our findings suggest that integrating theory of mind and moral judgment is a multi-faceted developmental achievement that unfolds only gradually over childhood.

Contributed to conception and design: KO, LM

Contributed to acquisition of data: KO

Contributed to analysis and interpretation of data: KO, LM

Drafted and/or revised the article: KO, KM, LM

Approved the submitted version for publication: KO, KM, LM

We gratefully acknowledge the help of our research assistants, Kadee Iha, Natasha Freudmann, Jaquelin Ubaldo-Flores, and Nicole Langpap with participant recruitment, data collection, and coding. We are especially thankful to the children and families who participated in our study.

Kathryn L. Mills and Louis J. Moses are associate editors at Collabra: Psychology. They were not involved in the review process of this manuscript.

The preregistration, study stimuli, and analysis scripts for this study can be found at The data that support the findings of this study are available from the corresponding author, Karlena D. Ochoa, upon reasonable request. Email Karlena D. Ochoa ([email protected]) for access to these data.


All primary analyses were also conducted with age (in months) as a continuous independent variable. We found the same pattern of results: no significant effects of age in months (for children) in the consequence and blame/praise models.

Astington, J. W. (2003). Sometimes necessary, never sufficient: False-belief understanding and social competence. In B. Repacholi & V. Slaughter (Eds.), Individual differences in theory of mind: Implications for typical and atypical development (pp. 13–38). Psychology Press.
Baker, E. R., Huang, R., Battista, C., & Liu, Q. (2021). Theory of mind development in impoverished U.S. children and six cross-cultural comparisons. Journal of Applied Developmental Psychology, 76, 101314.
Baron-Cohen, S., Leslie, A. M., & Frith, U. (1985). Does the autistic child have a “theory of mind”? Cognition, 21(1), 37–46.
Cameron, S., Wilks, M., Redshaw, J., & Nielsen, M. (2022). The effect of moral character on children’s judgements of transgressions. Cognitive Development, 63, 101221.
Carlson, S. M., & Moses, L. J. (2001). Individual differences in inhibitory control and children’s theory of mind. Child Development, 72(4), 1032–1053.
Christensen, R. H. B. (2019). Cumulative Link Models for Ordinal Regression with the R Package ordinal. R Package Vignette Version 2019.12-10, URL.
Cushman, F., Sheketoff, R. W., Wharton, S., & Carey, S. (2013). The development of intent-based moral judgment. Cognition, 127(1), 6–21.
Devine, R. T., & Hughes, C. (2014). Relations between false belief understanding and executive function in early childhood: A meta-analysis. Child Development, 85(5), 1777–1794.
Gray, K., Young, L., & Waytz, A. (2012). Mind Perception Is the Essence of Morality. Psychological Inquiry, 23(2), 101–124.
Grusec, J. E. (2023). Moral Development from a Socialization Perspective. In Handbook of Moral Development (3rd ed., pp. 323–338). Routledge.
Hamlin, J. K. (2013). Failed attempts to help and harm: Intention versus outcome in preverbal infants’ social evaluations. Cognition, 128(3), 451–474.
Kalish, C. W., & Cornelius, R. (2007). What is to be Done? Children’s Ascriptions of Conventional Obligations. Child Development, 78(3), 859–878.
Killen, M., Mulvey, K. L., Richardson, C., Jampol, N., & Woodward, A. (2011). The accidental transgressor: Morally-relevant theory of mind. Cognition, 119(2), 197–215.
Killen, M., & Smetana, J. G. (2015). Origins and development of morality. Handbook of Child Psychology and Developmental Science, 3(7), 701–749.
Knobe, J. (2003). Intentional action in folk psychology: An experimental investigation. Philosophical Psychology, 16(2), 309–324.
Lagattuta, K. H., & Kramer, H. J. (2022). Developmental Changes in Integrating Mental States and Moral Judgments. Handbook of moral development.
Leslie, A. M. (1988). Some implications of pretense for mechanisms underlying the child’s theory of mind. In J. W. Astington, P. L. Harris, & D. R. Olson (Eds.), Developing theories of mind (pp. 19–46). Cambridge University Press.
Malle, B. F., Guglielmo, S., & Monroe, A. E. (2014). A theory of blame. Psychological Inquiry, 25(2), 147–186.
Malti, T., Gasser, L., & Gutzwiller‐Helfenfinger, E. (2010). Children’s interpretive understanding, moral judgments, and emotion attributions: Relations to social behaviour. British Journal of Developmental Psychology, 28(2), 275–292.
Margoni, F., & Surian, L. (2017). Children’s intention-based moral judgments of helping agents. Cognitive Development, 41, 46–64.
McNamara, R. A., Willard, A. K., Norenzayan, A., & Henrich, J. (2019). Weighing outcome vs. intent across societies: How cultural models of mind shape moral reasoning. Cognition, 182, 95–108.
Mulvey, K. L., Gönültaş, S., & Richardson, C. B. (2020). Who is to blame? Children’s and adults’ moral judgments regarding victim and transgressor negligence. Cognitive Science, 44(4), e12833.
Nobes, G., & Martin, J. W. (2022). They should have known better: The roles of negligence and outcome in moral judgements of accidental actions. British Journal of Psychology, 113(2), 370–395.
Ochoa, K. D., Rodini, J. F., & Moses, L. J. (2022). False belief understanding and moral judgment in young children. Developmental Psychology, 58(11), 2022–2035.
Piaget, J. (1965). The moral judgment of the child. Free Press. (Original work published 1932)
Premack, D., & Woodruff, G. (1978). Does the chimpanzee have a theory of mind? Behavioral and Brain Sciences, 1(4), 515–526.
R Core Team. (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing.
Sarin, A., Lagnado, D. A., & Burgess, P. W. (2017). The intention-outcome asymmetry effect. Experimental Psychology, 64(2), 124–141.
Smetana, J. G. (2006). Social domain theory: Consistencies and variations in children’s moral and social judgments. In M. Killen & J. G. Smetana (Eds.), Handbook of moral development (pp. 119–153). Erlbaum.
Walczyk, J. J., & Cockrell, N. F. (2023). The nexus of morality and creativity vis-a-vis deception: A cognitive framework. In Creativity and Morality (pp. 81–99). Academic Press.
Wang, F., Zhu, L., & Shi, K. (2011). How do children coordinate information about mental states with social norms? Cognitive Development, 26(1), 72–81.
Wellman, H. M. (2020). Reading Minds: How Childhood Teaches Us to Understand People. Oxford University Press.
Young, L., Cushman, F., Hauser, M., & Saxe, R. (2007). The neural basis of the interaction between theory of mind and moral judgment. Proceedings of the National Academy of Sciences, 104(20), 8235–8240.
Zelazo, P. D., Helwig, C. C., & Lau, A. (1996). Intention, Act, and Outcome in Behavioral Prediction and Moral Judgment. Child Development, 67(5), 2478–2492.
This is an open access article distributed under the terms of the Creative Commons Attribution License (4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.