This paper reports four preregistered replications (total N = 2,008) of previously published studies from our lab. The first study replicates Cheek and Ward (2019) Study 3, which found that people want more options when adopting a maximizing goal of making the best possible choice compared to a satisficing goal of making a “good enough” choice. The second study replicates Cheek, Schwartz, and Shafir (2023) Study 2, which found that larger choice sets make choices feel more self-expressive even when they do not afford better preference matching. The third study replicates Cheek and Murray (2023) Supplemental Study 5, which found that people’s intuitions about adaptation to psychophysical experiences are associated with their intuitions about adaptation to life hardship. The final study replicates Cheek (2023) Study 4, which found that an informational intervention can reduce the biased belief that individuals in poverty are less vulnerable to harm than higher-income individuals. We successfully replicated all of the original effects, increasing confidence in the earlier findings.
The credibility revolution has provided psychologists with many tools to improve their research practices and increase confidence in their research findings (e.g., Beer et al., 2023; Korbmacher et al., 2023; Nelson et al., 2018). Among these tools, replication has emerged as one prominent and promising method for assessing the reliability of effects reported in the literature (e.g., Nosek et al., 2022; Zwaan et al., 2018). Indeed, from large-scale, multi-lab replication projects to smaller or more focused efforts, replications may help researchers judge how confident they should be about the robustness of a given study result. In the present research, we report four replications of our own lab’s recent work, joining others who have investigated the reliability of their previous research through self-replications (e.g., Aknin et al., 2020; Giessner & Schubert, 2019; Kraus, 2015; Shah et al., 2019).
Self-replications have been highlighted as one approach researchers can take to increase their confidence, and perhaps others’ confidence, in the reliability of their findings (e.g., Cesario, 2014; B. W. Roberts, 2015). Alternatively, when self-replications are not successful, they provide a path to updating and correcting the literature by identifying potential false positives in a research team’s studies (e.g., Heyman et al., 2017; Stanton & Campbell, 2016). Self-replications can take different forms—some attempt to replicate original findings with methods as similar as possible to the originals (e.g., Giessner & Schubert, 2019; Heyman et al., 2017; Kraus, 2015; Shah et al., 2019), while others attempt to generalize past findings to new contexts, populations, or time periods (e.g., Aiken et al., 2008; Gillebaart & Adriaanse, 2017; Lichtenstein & Slovic, 1973; Okazaki et al., 2002). Here, we take the former approach and attempt to replicate four previous studies from our lab using the materials from the original studies. Of course, it is not possible to identically replicate a past finding (e.g., because one cannot re-run the study in the same time period as the original), but we chose our replication targets and designed our studies so that they would, in our view, meet Nosek and Errington’s (2020) definition of a replication study as “a study for which any outcome would be considered diagnostic evidence about a claim from prior research” (p. 2).
Overview of the Present Research
We conducted replications of four previously published studies from our lab: Cheek and Ward (2019) Study 3; Cheek et al. (2023) Study 2; Cheek and Murray (2023) Supplemental Study 5; and Cheek (2023) Study 4. We chose these papers for two reasons. First, they reported research that was led by the lab director (Cheek) and investigated prominent topics of interest in our lab: maximizing during decision making; the effects of choice set size; lay intuitions about hardship; and beliefs about people in poverty. Second, we wanted to ensure that each of our chosen studies could be run on Prolific, and that, in our judgment, using the original materials on Prolific in June 2024 would constitute a diagnostic test of the original theorizing (Nosek & Errington, 2020). For each paper, we replicated the first reported study that we viewed as a clear theoretical test of the paper’s central ideas (i.e., not necessarily the first study reported overall). We explain our specific choice of each study when reporting the results of each replication below.
Our motivation for the present replication project was twofold. First, spurred by ongoing conversations about the importance of replication—and in this case, of self-replication—we wanted to conduct our own self-replications to explore the robustness of previous effects from our lab. Our reasoning was that the four target papers sampled a relatively wide range of the topics our lab investigates and thus that these self-replications would be informative for more than one line of work in our lab.1 Whereas some replication projects may focus on a narrower set of research questions or even just one research question, we drew inspiration from past replication efforts that involved many different areas of research (e.g., Klein et al., 2014; Open Science Collaboration, 2015) when deciding to sample the present set of papers. Just as some replication projects have sought to make conclusions about the robustness of a particular body of work (e.g., psychology research published in high-profile journals; Open Science Collaboration, 2015), we conceptualized our project as affording conclusions about the robustness of a particular body of work—namely, our own. Additionally, whereas some replication projects may focus on effects that seem especially robust or especially uncertain, we instead selected papers based on the extent to which they represented focal areas of research in the lab.
Our second goal was pedagogical. Replications have been highlighted as a valuable teaching tool (e.g., Frank & Saxe, 2012), and we decided that conducting the present studies and subsequently writing them up for publication would be a valuable learning opportunity for the second author, for whom the present work is the first peer-reviewed publication. Given our two goals, we opted to conduct our replications without adding additional measures or otherwise extending past studies, though of course self-replications can sometimes also contribute to the literature by both replicating and extending previous research.
We powered each replication to have a 95% chance of detecting an effect of d = 0.30, r = .15, or f = 0.15, depending on the effect size metric relevant to each study. This target balanced our aim of ensuring sufficient power to detect effect sizes we deemed meaningful against the constraints of available funding for this project. We felt that powering for this effect size was appropriate both because it ensured we could detect effects that were potentially meaningful for applications of this research outside of the lab (cf. Funder & Ozer, 2019) and because each original effect we sought to replicate was at least this large, so our studies provided more power than the original studies we were replicating.2
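The per-study sample-size targets reported below (147 for the within-subjects design, 580 total for the between-subjects design, and 571 for the correlational design) come from standard power analyses. As an illustrative cross-check only, and not the software used for the preregistrations, the normal-approximation formulas for these three designs can be sketched as:

```python
from math import atanh, ceil
from statistics import NormalDist

def n_paired(d, alpha=0.05, power=0.95):
    """Approximate total N for a paired (within-subjects) t-test."""
    z = NormalDist()
    za, zb = z.inv_cdf(1 - alpha / 2), z.inv_cdf(power)
    return ceil(((za + zb) / d) ** 2)

def n_per_group(d, alpha=0.05, power=0.95):
    """Approximate n per group for an independent-samples t-test."""
    z = NormalDist()
    za, zb = z.inv_cdf(1 - alpha / 2), z.inv_cdf(power)
    return ceil(2 * ((za + zb) / d) ** 2)

def n_correlation(r, alpha=0.05, power=0.95):
    """Approximate N to detect a correlation r, via Fisher's z transform."""
    z = NormalDist()
    za, zb = z.inv_cdf(1 - alpha / 2), z.inv_cdf(power)
    return ceil(((za + zb) / atanh(r)) ** 2 + 3)
```

For d = 0.30 these approximations yield 145 (paired) and 289 per group (578 total), and for r = .15 they yield 572, each within a few participants of the preregistered targets, which presumably reflect exact noncentral-distribution calculations.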
Transparency and Open Science Practices
We preregistered each of our replication studies through AsPredicted.org (Replication 1: https://aspredicted.org/4qcs-tcp9.pdf; Replication 2: https://aspredicted.org/c5kq-hvqg.pdf; Replication 3: https://aspredicted.org/dgs5-jbvj.pdf; Replication 4: https://aspredicted.org/47hr-78f3.pdf) and we “bundled” the preregistrations so that each would link to the others, making it clear that these four studies were all part of the same replication project. We also explicitly noted in the individual preregistrations that each study was part of a series of four replication studies, and we specified which four studies we were replicating. We did this to prevent concerns about unsuccessful replications being left in the “file drawer” (Rosenthal, 1979)—this way, readers would know we reported all of the studies comprising our replication effort. For each study, we report how we determined the sample size, all data exclusions, and all measures. All materials, data, and analysis scripts are available through the Open Science Framework: https://osf.io/s7p6f/.
Replication 1: Cheek & Ward (2019) Study 3
For our first replication, we chose Study 3 from Cheek and Ward (2019). This paper investigated how and why maximizing (seeking to make the best possible choice) relates to positive and negative experiences with choice. Maximizing has been conceptualized both as a goal and as a strategy (e.g., Cheek & Goebel, 2020; Dalal et al., 2015; Diab et al., 2008; Lai, 2010; Schwartz et al., 2002). Cheek and Ward examined whether this distinction could shed light on the apparently paradoxical experiences of maximizers, who seem at times to want more options to choose from than satisficers (individuals seeking only to make a “good enough” choice) and yet also experience more overload than satisficers when choosing from larger choice sets (e.g., Dar-Nimrod et al., 2009).
To help explain this “maximization paradox,” Cheek and Ward (2019) hypothesized that the maximizing goal of making the best choice would be associated with wanting larger choice sets, whereas the maximizing strategy of searching extensively for and thoroughly comparing alternatives could be associated with becoming more easily overwhelmed by choice, especially when there are many options. The first two studies in this paper were correlational studies that used the Maximization Scale (Schwartz et al., 2002) to test this theorizing, but because this scale has been widely criticized (e.g., Dalal et al., 2015; Diab et al., 2008; Lai, 2010; for a review, see Cheek & Schwartz, 2016), we decided not to replicate either of these first two studies. Our goal was to replicate the first reported study that was a clear test of each paper’s hypotheses, and we were concerned that the Maximization Scale’s limitations might make the first two studies suboptimal theory tests. Accordingly, we chose to replicate Study 3.
Study 3 of Cheek and Ward (2019) sought to test the hypothesis that adopting a maximizing goal (relative to a satisficing goal) could cause people to want larger choice sets from which to choose. In other words, Study 3 moved from the correlational approach of Cheek and Ward’s first two studies to an experimental approach to better test causal claims about the maximizing goal’s effect on choice set size preferences (the hypothesized effect of the maximizing strategy vs. satisficing strategy was tested in separate studies). In this study, participants were asked both to imagine adopting a maximizing goal of making the best possible choice and to imagine adopting a satisficing goal of making a “good enough” choice whether or not it is the best. In each of these goal conditions, participants reported how many options they would want to have when making three different consumer choices (a new car to buy, a new cellphone to buy, and a movie to see).
Cheek and Ward (2019) found that participants reported wanting more options from which to choose when they imagined adopting a maximizing goal compared to when they imagined adopting a satisficing goal. They interpreted this finding as support for their theorizing that the goal of making the best possible choice causes maximizers to want larger choice sets, even when they might be happier with smaller choice sets in the end (Dar-Nimrod et al., 2009). In this replication, we used the original materials from Cheek and Ward’s study, though we ran this study on Prolific, whereas Cheek and Ward ran their study on Amazon’s Mechanical Turk. Our judgment was that a sample of U.S. Prolific users in 2024 was a similar enough population to Cheek and Ward’s earlier U.S. Mechanical Turk population that our replication would provide a diagnostic test of the original theorizing (Nosek & Errington, 2020).3
Method
Participants
We aimed to recruit 200 participants through Prolific to achieve a final sample of at least 147 after exclusions, which provides a 95% chance of detecting an effect of d = 0.30 with α = .05. To be included in the analyses, participants needed to pass two attention check questions and confirm that they did not respond randomly or dishonestly. In total, 201 participants completed the study, of whom 192 met the inclusion criteria. See Table 1 for participant demographic information.
Table 1

| Characteristic | Cheek & Ward (2019) Study 3 Replication | Cheek, Schwartz, & Shafir (2023) Study 2 Replication | Cheek & Murray (2023) Supplemental Study 5 Replication | Cheek (2023) Study 4 Replication |
| --- | --- | --- | --- | --- |
| Initial Sample Size | 201 | 650 | 621 | 744 |
| Preregistered Exclusions | 9 | 69 | 35 | 95 |
| Final Analyzed Sample Size | 192 | 581 | 586 | 649 |
| Gender | | | | |
| Woman | 94 | 316 | 331 | 360 |
| Man | 95 | 251 | 244 | 266 |
| Non-binary | 3 | 14 | 9 | 22 |
| Not listed | 0 | 0 | 1 | 1 |
| Age | M = 36.21, SD = 11.06 | M = 36.22, SD = 11.92 | M = 36.78, SD = 12.58 | M = 37.16, SD = 12.40 |
| Race/Ethnicity | | | | |
| Asian | 30 | 71 | 69 | 77 |
| Black | 22 | 70 | 75 | 101 |
| Latine | 24 | 59 | 55 | 57 |
| Middle Eastern/North African | 1 | 4 | 5 | 5 |
| Native American | 3 | 8 | 8 | 12 |
| White | 129 | 397 | 403 | 423 |
| Multiracial | 3 | 18 | 16 | 30 |
| Not listed | 0 | 2 | 7 | 2 |
Note. Participants could select more than one option when reporting their race/ethnicity and could choose not to report demographic information. For the replication of Cheek, Schwartz, and Shafir (2023) Study 2, we report sample size and demographics after the additional (preregistered) exclusion of participants from the small choice set condition.
Materials and Procedure
In a counterbalanced order, participants were asked to imagine making choices with both a maximizing goal of making the best possible choice and a satisficing goal of making a “good enough” choice.4 In the maximizing condition, participants read the following description:
Imagine you are going to make some choices. You are going to try to make the best choice possible. You don’t want to settle—your goal is to choose the very best option.
In the satisficing goal condition, participants instead read:
Imagine you are going to make some choices. You are going to try to make a “good enough” choice. You don’t need to make the very best choice possible–you just want something that meets your standards, regardless of whether it’s the best possible option or not.
In each goal condition, participants rated how many options they would want to choose from when choosing a new car, when choosing a new cellphone, and when choosing a movie to watch on a 15-point scale with labels ranging from 2 options to 30+ options in increments of 2. Ratings for the three choices were averaged together to create a composite index of desired number of options (α = .77 for maximizing condition; α = .70 for satisficing condition).
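The reliability coefficients reported for these composites (and for the measures in Replication 2) are Cronbach’s alphas. As a minimal, self-contained illustration of the statistic, using made-up ratings rather than the study data, alpha can be computed from item-score columns as follows:

```python
from statistics import variance  # sample variance (n - 1 denominator)

def cronbach_alpha(items):
    """Cronbach's alpha for equal-length item-score columns.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

# Hypothetical desired-options ratings from four participants on the three
# scenarios (car, cellphone, movie); perfectly consistent items give alpha = 1.0.
car   = [1, 2, 3, 4]
phone = [2, 3, 4, 5]
movie = [3, 4, 5, 6]
alpha = cronbach_alpha([car, phone, movie])
```

With real, noisier ratings the statistic falls below 1; values in the .70s, like those reported here, are conventionally taken to indicate acceptable internal consistency.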
Results
As in the original study, participants reported wanting more options to choose from when imagining adopting a maximizing goal (M = 6.51, SD = 3.20) than when imagining adopting a satisficing goal (M = 3.97, SD = 2.15), t(191) = 16.47, p < .001, d = 1.19, 95% CI [1.00, 1.37].5
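The analysis above is a paired-samples t-test, with Cohen’s d computed on the difference scores. A minimal sketch of that arithmetic, using hypothetical ratings rather than the study data, is:

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_and_d(x, y):
    """Paired t statistic and Cohen's d (mean difference / SD of differences)."""
    diffs = [a - b for a, b in zip(x, y)]
    m, s, n = mean(diffs), stdev(diffs), len(diffs)
    t = m / (s / sqrt(n))
    d = m / s  # for this convention, d = t / sqrt(n)
    return t, d

# Hypothetical desired-options ratings under maximizing vs. satisficing goals
maximizing = [7, 8, 6, 9, 7]
satisficing = [6, 6, 3, 7, 5]
t, d = paired_t_and_d(maximizing, satisficing)
```

The reported statistics are consistent with this difference-score convention: t/√n = 16.47/√192 ≈ 1.19, matching the reported d.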
Discussion
This study successfully replicated the original finding from Cheek and Ward (2019) Study 3: imagining adopting a maximizing goal, compared with a satisficing goal, led people to desire larger choice sets. Unexpectedly, the effect size in our replication was substantially larger than in the original study (d = 1.19 in the replication vs. d = 0.66 in the original). One possibility is that this larger effect emerged because participants on Prolific in 2024 paid more attention than the original participants from MTurk (e.g., Krefeld-Schwalb et al., 2024). For the purposes of our replication, however, we view this study as successful: we found a large effect in the direction predicted by Cheek and Ward’s original finding, and our results thus support the original hypothesis.
Replication 2: Cheek, Schwartz, & Shafir (2023) Study 2
For our second replication, we chose Study 2 from Cheek et al. (2023). This paper explored the effects of choice set size on perceived self-expression–that is, the extent to which choosers see their choices as reflective of who they are (e.g., Bruner, 1990). Whereas much of the previous literature on choice set size has explored when and how having more options affects outcomes like choice paralysis, uncertainty, regret, and satisfaction (e.g., Chernev, 2003; Iyengar & Lepper, 2000; Reutskaja et al., 2022; Reutskaja & Hogarth, 2009; Schwartz, 2016; Tversky & Shafir, 1992; for reviews, see Chernev et al., 2015; Reutskaja et al., 2020), Cheek et al. focused on the meaning people ascribe to their choices. They theorized that choice set size can shape how people interpret their choices, such that larger choice sets can make choices feel more self-expressive. They further theorized that larger choice sets might increase self-expression in part by causing people to feel like they were better able to match their preferences during choice. This theorizing drew on the fact that, at least in a U.S. cultural context, there is a strong belief that the freedom to choose is a fundamental means through which preferences are satisfied and expressed (e.g., Bellah et al., 1985; Botti et al., 2023; Cheek et al., 2022; Cheek & Schwartz, in press; Markus & Kitayama, 1991; Reutskaja et al., 2022; Schwartz & Cheek, 2017). This belief in the link between choice freedom and preference satisfaction may thus make larger assortments feel like better paths to preference matching, which could then increase perceived self-expression. Indeed, previous research has found that people think their choices are more self-expressive when they feel they have more closely matched their preferences (e.g., Rozenkrants et al., 2017; Sela et al., 2017).
In their first study, Cheek et al. (2023) tested their predictions about the effects of choice set size on perceived self-expression and perceived preference matching by randomly assigning participants to choose from either a large assortment of 20 drinks or a randomly selected subset of 3 drinks from the larger set. As predicted, they found that participants who chose from the larger set thought their choices were more self-expressive and thought they had better matched their preferences. Perceived preference matching also mediated the effect of choice set size on self-expression.
Although this first study provided initial evidence in support of Cheek et al.’s theorizing, the authors also noted that, in this study, it may be normatively true that participants who chose from the larger set better matched their preferences–indeed, the larger set objectively offered a wider range of potentially preference-satisfying options. Cheek et al., however, hypothesized that, at least in the U.S., people might connect abundant choice and preference satisfaction so deeply that they would see their choices as better preference matches and as more self-expressive after choosing from larger assortments even when larger assortments objectively afforded no better opportunity to preference match than smaller assortments. In other words, larger choice sets might create the illusion of preference matching even when they do not actually offer any better options than smaller assortments.
Cheek et al. (2023) tested this potential illusory preference matching in their second study. Participants were randomly assigned either to choose from a set of 3 Italian restaurants (described with attributes including price, quality, parking availability, and whether they allowed reservations) or to choose from a larger set comprised of those 3 options and 17 additional options that were all objectively worse than the original 3 options on at least one dimension. Thus, the larger choice set did not afford participants any objectively better opportunity to match their preferences because the additional options were all objectively inferior. Nonetheless, Cheek et al. found that participants who chose from the larger choice set thought that they had better matched their preferences and thought their choice was more self-expressive. And, as in the first study, perceived preference matching mediated the effect of choice set size on self-expression.
When choosing a study to replicate, we viewed Study 2 as a more compelling theory test than Study 1 because it ruled out a normative interpretation of Cheek et al.’s (2023) results. Accordingly, we used the original materials from Cheek et al.’s Study 2, though we ran this study on Prolific, whereas Cheek et al. ran their study on Amazon’s Mechanical Turk. As in our first replication, our judgment was that a sample of U.S. Prolific users in 2024 was a similar enough population to Cheek et al.’s earlier U.S. Mechanical Turk population that our replication would provide a diagnostic test of the original theorizing (Nosek & Errington, 2020).
Method
Participants
We aimed to recruit 650 participants through Prolific to achieve a final sample of at least 580 after exclusions, which provides a 95% chance of detecting an effect of d = 0.30 with α = .05. To be included in the analyses, participants needed to pass two attention check questions and confirm that they did not respond randomly or dishonestly. In total, 650 participants completed the study, of whom 599 met those inclusion criteria (see results section below for a description of additional preregistered exclusions resulting in a final analyzed sample of 581). See Table 1 for participant demographic information.
Materials and Procedure
Participants were shown a list of fictional Italian restaurants and chose where they would want to eat. Participants were randomly assigned to choose from either the large choice set (n = 298) or the small choice set (n = 301). In the large choice set condition, participants chose from a set of 20 options, whereas participants in the small choice set condition chose from a set of 3 options. For each restaurant option, we provided the restaurant’s name and ratings indicating quality (number of stars out of five), price (number of dollar signs out of five), parking availability (free, meter, or paid parking garage), and whether the restaurant took reservations.
All 3 options in the small choice set condition received five stars and one dollar sign (i.e., were high-quality and very affordable), offered free parking, and accepted reservations. The additional 17 options in the large choice set condition were all worse on at least one dimension—they had a lower rating, were more expensive, offered less convenient parking, and/or did not accept reservations. In other words, the 3 options in the small choice set dominated the alternatives added to create the large choice set. Thus, the additional options in the large choice set were not expected to better match participants’ preferences than those already available in the small choice set.
After making their restaurant choice, participants completed two manipulation check questions (e.g., “How much do you agree with this statement: There were a lot of options to choose from”; α = .886), a two-item measure of perceived preference matching (e.g., “How well did your choice match your preferences?”; α = .83), and a four-item measure of perceived self-expression (e.g., “How much did your choice reflect your identity?”; α = .95).
Results
Preliminary Analyses
In the large choice set condition, 6% of participants (n = 17) chose one of the options exclusive to that set, that is, one of the additional, objectively inferior options added to the 3 options in the small choice set. Assuming that a comparable percentage of participants presented with the small choice set would have preferred one of the alternatives only available in the larger set, we followed our preregistration and the original study by excluding the 6% of participants (n = 18) in the small choice set condition who had the lowest scores on the perceived preference matching measure. This strategy aims to ensure that any difference between the choice set conditions is not driven by those participants in the small choice set condition whose preferences would be objectively better matched by the large choice set’s options, allowing us to test the illusory preference matching predicted and found by Cheek et al. (2023). (We found the same pattern of results when including these 18 participants in analyses; see supplemental material for details.)
Main Analyses
As in the original study, the manipulation of choice set size was effective: participants in the large choice set condition felt they had more choice (M = 6.92, SD = 1.66, n = 298) than participants in the small choice set condition (M = 3.14, SD = 1.67, n = 283), t(579) = 27.33, p < .001, d = 2.27, 95% CI [2.06, 2.48]. Replicating the original study, participants in the large choice set condition thought that they had better matched their preferences (M = 7.46, SD = 1.43) than those in the small choice set condition (M = 6.06, SD = 1.35), t(579) = 12.13, p < .001, d = 1.01, 95% CI [0.83, 1.18]. Further replicating the original study, participants in the large choice set condition thought that their choices were more self-expressive (M = 4.96, SD = 1.98) than those of participants in the small choice set condition (M = 3.52, SD = 1.94), t(579) = 8.90, p < .001, d = 0.74, 95% CI [0.57, 0.91]. Finally, to test the predicted pattern of mediation, we used Hayes’ (2018) PROCESS macro for SPSS with bootstrapping (5,000 samples) to estimate confidence intervals. Again replicating the original study, perceived preference matching significantly mediated the effect of choice set size on perceived self-expression, indirect effect = 0.67, 95% CI [0.50, 0.87] (see Table 2 for details).
Table 2

| Effect | Estimate | SE | 95% CI |
| --- | --- | --- | --- |
| Total effect of condition on perceived self-expression | 1.45 | 0.16 | [1.13, 1.76] |
| Direct effect of condition on perceived self-expression | 0.77 | 0.17 | [0.44, 1.11] |
| Indirect effect of condition on perceived self-expression | 0.67 | 0.10 | [0.50, 0.87] |
| Effect of condition on perceived preference matching | 1.40 | 0.12 | [1.17, 1.63] |
| Effect of perceived preference matching on perceived self-expression | 0.48 | 0.06 | [0.37, 0.59] |
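At its core, the PROCESS analysis reported above is a percentile bootstrap of the indirect effect a × b, where a is the effect of condition on perceived preference matching and b is the effect of preference matching on self-expression controlling for condition. The sketch below illustrates the procedure on synthetic data loosely calibrated to the coefficients in Table 2; it is our own illustration, not the original SPSS analysis script:

```python
import random
from statistics import mean

def slope(x, m):
    """a path: simple regression slope of the mediator on condition."""
    mx, mm = mean(x), mean(m)
    return sum((a - mx) * (b - mm) for a, b in zip(x, m)) / sum((a - mx) ** 2 for a in x)

def partial_slope(x, m, y):
    """b path: coefficient of m in the regression y ~ x + m (normal equations)."""
    mx, mm, my = mean(x), mean(m), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    smm = sum((a - mm) ** 2 for a in m)
    sxm = sum((a - mx) * (b - mm) for a, b in zip(x, m))
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    smy = sum((a - mm) * (b - my) for a, b in zip(m, y))
    return (smy * sxx - sxy * sxm) / (smm * sxx - sxm ** 2)

def indirect(x, m, y):
    """Indirect effect a * b of condition on the outcome through the mediator."""
    return slope(x, m) * partial_slope(x, m, y)

def bootstrap_ci(x, m, y, reps=1000, seed=1):
    """95% percentile-bootstrap confidence interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(y)
    ests = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        ests.append(indirect([x[i] for i in idx],
                             [m[i] for i in idx],
                             [y[i] for i in idx]))
    ests.sort()
    return ests[int(0.025 * reps)], ests[int(0.975 * reps)]

# Synthetic data with a built-in indirect effect of 1.4 * 0.48 = 0.672,
# echoing the a and b estimates in Table 2 (hypothetical, for illustration only).
rng = random.Random(0)
x = [rng.randrange(2) for _ in range(600)]                      # condition (0 = small, 1 = large)
m = [1.4 * xi + rng.gauss(0, 1.0) for xi in x]                  # perceived preference matching
y = [0.77 * xi + 0.48 * mi + rng.gauss(0, 1.5) for xi, mi in zip(x, m)]
lo, hi = bootstrap_ci(x, m, y)                                  # CI should exclude zero here
```

PROCESS draws 5,000 resamples and estimates the full model in SPSS; the simplified version above conveys the logic of resampling participants, re-estimating both paths, and reading the confidence interval off the percentiles of the resulting a × b distribution.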
Discussion
This study successfully replicated the original findings from Cheek et al. (2023) Study 2: choosing from a larger choice set caused participants to see their choices as better matching their preferences and as more self-expressive. The effect of choice set size on self-expression was also mediated by perceived preference matching. As in the original study, these effects emerged despite the fact that the larger choice set did not objectively afford a better opportunity for participants to match their preferences. In other words, having more options can lead people to have the illusion that they have better matched their preferences regardless of whether additional options are appealing or not.
Replication 3: Cheek & Murray (2023) Supplemental Study 5
For our third replication, we chose Supplemental Study 5 from Cheek and Murray (2023). Recent research has shown that people sometimes believe individuals from lower socioeconomic status (SES) backgrounds are less harmed by negative events than higher-SES individuals (Cheek, 2023; Cheek & Shafir, 2020, 2024; Summers et al., 2021). This belief, labeled the “thick skin bias” by Cheek and Shafir (2020), has been found among laypeople in a representative sample of the U.S. and among professionals including teachers and mental healthcare providers (Cheek & Shafir, 2020). It also extends to judgments about physical pain in addition to emotional harm (Bernardes et al., 2021; Cheek & Shafir, 2020; Summers et al., 2021) and to judgments about children in addition to adults (Cheek & Shafir, 2020; Summers et al., 2023). The belief that lower-SES individuals are less harmed by negative events is also associated with downstream forms of neglect, such as the belief that lower-SES women experiencing abuse or harassment are less in need of help and support (Cheek, Bandt-Law, et al., 2023). Building on this work, Cheek and Murray’s paper tested a potential cognitive cause of the thick skin bias–namely, the overgeneralization of lay intuitions about adaptation.
Research on the thick skin bias grew from a burgeoning literature on lay beliefs about hardship showing that people appear to believe that “what does not kill you makes you stronger” (i.e., that experiencing adversity can make people less vulnerable to future harm; Ben-Avi et al., 2018; Deska et al., 2020; Hoffman et al., 2016; Hoffman & Trawalter, 2016; Infurna & Jayawickreme, 2019; Owenz & Fowers, 2019; Trawalter & Hoffman, 2015; Zagefka, 2022). In an attempt to understand why people believe hardship is toughening instead of believing that hardship makes people more vulnerable, Cheek and Murray (2023) proposed that this belief emerges in part because people overgeneralize their other, accurate intuitions about adaptation. Specifically, Cheek and Murray (2023) suggested that people may have relatively accurate intuitions about some forms of adaptation. In particular, people may understand psychophysical adaptation, that is, adaptation to previous levels of exposure in physical perception (e.g., brightness, weight, volume). Decades of psychological research have shown that people do indeed adapt to physical stimuli (e.g., Bevan & Darby, 1955; Harvey & Campbell, 1963; Helson, 1964); for example, someone who has just been holding something very heavy will find a mid-weight object to be lighter than will someone who has just been holding something very light. Because people encounter adaptation in their own everyday sensory experiences, they likely hold fairly strong intuitions that people adapt to psychophysical experiences. People may then generalize these accurate intuitions about psychophysical adaptation to other contexts where they are less appropriate, such as the context of hardship.
In other words, people may believe that someone who has experienced more hardship would be less affected by new negative events than someone who has experienced less hardship in the same way that someone who has previously held a heavier weight would experience a new object as less heavy than someone who has previously held a lighter weight. If so, then the more people endorse intuitions about adaptation to psychophysical experiences, the more they should also endorse intuitions about adaptation to hardship. Cheek and Murray (2023) tested this prediction in their first series of studies.
In Study 1 of Cheek and Murray (2023), participants completed a measure of intuitions about psychophysical adaptation and a measure of intuitions about adaptation to hardship. In that study, these measures were positively correlated: the more strongly participants believed that people adapt to psychophysical stimuli, the more strongly they also believed that people adapt to hardship. Although this study provided evidence in support of Cheek and Murray’s hypothesis, the measures were worded in a somewhat different fashion than many social psychology measures (see Cheek & Murray, 2023, Study 1, for details).7 To reduce concerns that the effect was specific to the wording of the measure, Cheek and Murray conducted five follow-up studies (Supplemental Studies 1-5) in which they varied different aspects of the measures’ wording and response scales. The predicted correlation between psychophysical adaptation intuitions and hardship adaptation intuitions emerged consistently across all of these studies, and we therefore view all six studies (the original Study 1 and the five follow-ups) as equivalently diagnostic tests of Cheek and Murray’s theorizing. We ultimately chose to replicate Supplemental Study 5 because the measures in that study (described below) most closely resemble typical social psychology measures (e.g., they have 1-7 Likert-type scales that range from strongly disagree to strongly agree), and thus we expected that readers would find a replication of this study most compelling. The original Supplemental Study 5 was conducted on Prolific, and for this replication we ran the original materials on Prolific as well.
Method
Participants
We aimed to recruit 620 participants through Prolific to achieve a final sample of at least 571 after exclusions, which provides a 95% chance of detecting an effect of r = .15 with α = .05. To be included in the analyses, participants needed to pass two attention check questions and confirm that they did not respond randomly or dishonestly. In total, 621 participants completed the study, of whom 586 met the inclusion criteria. See Table 1 for participant demographic information.
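The sample-size target for a correlation of r = .15 can be approximated analytically with the Fisher z transformation. The sketch below uses that normal-approximation formula (the preregistered analysis presumably used dedicated power software, so it lands near, not exactly on, the stated 571):

```python
from math import atanh, ceil
from statistics import NormalDist

def n_for_correlation(r: float, alpha: float = 0.05, power: float = 0.95) -> int:
    """Approximate N needed to detect correlation r (two-tailed) via Fisher z."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = .05
    z_beta = NormalDist().inv_cdf(power)           # ~1.64 for power = .95
    return ceil(((z_alpha + z_beta) / atanh(r)) ** 2) + 3

print(n_for_correlation(0.15))  # 572, close to the preregistered target of 571
```

Smaller target correlations drive the required N up quickly, which is why a modest r = .15 already demands roughly 570 participants.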
Materials and Procedure
Participants completed both of the following measures in a counterbalanced order.
Psychophysical Adaptation Intuitions. Participants completed a 5-item measure of the strength of their intuitions about psychophysical adaptation. For each item, participants rated how much they agreed with a statement describing psychophysical adaptation on a 7-point scale (1 = strongly disagree; 7 = strongly agree). For example, one item read, “Someone who has just been in a very hot room would find a 60-degree (F) day to be cooler than would someone who has just been in a very cold room.” Following our preregistration and the original study, when scoring the composite measure (α = .74), we omitted the one reverse-coded item (because including it reduces the scale’s reliability to unacceptably low levels; see supplemental material for details). Higher scores indicate greater endorsement of psychophysical adaptation intuitions.
Hardship Adaptation Intuitions. Participants completed a 5-item measure of the strength of their intuitions about hardship adaptation (α = .84). For each item, participants rated how much they agreed with a statement describing adaptation to hardship on a 7-point scale (1 = strongly disagree; 7 = strongly agree). For example, one item read, “Someone who has previously experienced very little life hardship would find being ignored by a city council meeting more offensive than would someone who has previously experienced a lot of life hardship.” Higher scores indicate greater endorsement of hardship adaptation intuitions.
Results
As in the original study, participants who more strongly endorsed psychophysical adaptation intuitions also more strongly endorsed hardship adaptation intuitions, r = .30, 95% CI [.22, .37], p < .001.
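The reported confidence interval can be reproduced from r and N alone using the standard Fisher z interval. This sketch assumes the analytic r-to-z interval (rather than any particular software) and recovers [.22, .37] from r = .30 and N = 586:

```python
from math import atanh, tanh, sqrt
from statistics import NormalDist

def fisher_ci(r: float, n: int, conf: float = 0.95) -> tuple[float, float]:
    """Confidence interval for a Pearson correlation via the Fisher z transformation."""
    z = atanh(r)                # transform r to the z scale
    se = 1 / sqrt(n - 3)        # standard error on the z scale
    crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return tanh(z - crit * se), tanh(z + crit * se)

lo, hi = fisher_ci(0.30, 586)
print(round(lo, 2), round(hi, 2))  # 0.22 0.37
```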
Discussion
This study successfully replicated the original finding from Cheek and Murray (2023) Supplemental Study 5: participants who more strongly endorsed psychophysical adaptation intuitions also more strongly endorsed hardship adaptation intuitions. This replication provides further evidence of the relation between intuitions about adaptation to different kinds of experiences—low-level perceptual experiences versus emotionally complex experiences of adversity. Thus, one source of people’s belief that “what does not kill you makes you stronger” may be that they (accurately) perceive human adaptation in other domains (i.e., physical perception) and overgeneralize their intuitions about adaptation to the context of life hardship. Of course, this study was correlational and does not afford causal conclusions, though additional studies by Cheek and Murray (2023) found evidence for the causal role of psychophysical adaptation intuitions in the thick skin bias using experimental designs (for details, see the original paper).
Replication 4: Cheek (2023) Study 4
For our fourth and final replication, we chose Study 4 from Cheek (2023). As described above, people sometimes display a “thick skin bias,” whereby they incorrectly think lower-SES individuals are less vulnerable to harm than higher-SES individuals (Cheek & Shafir, 2020). Cheek (2023) built on previous thick skin bias research in two ways. First, Cheek tested whether people displayed the bias in the context of the global coronavirus pandemic, finding in Studies 1-2 that people indeed thought some everyday effects of COVID-19, such as being apart from loved ones, were less harmful for lower-SES individuals. Study 3 then tested whether reading an informational article that outlined how poverty actually increased the harm of the pandemic could reduce the thick skin bias in the context of the pandemic. Cheek found that reading an article reduced, but did not reverse (i.e., fully correct), the thick skin bias in perceptions of the everyday effects of the pandemic.
Although these three studies contributed to the thick skin bias literature and were diagnostic tests of Cheek’s theorizing, they were conducted in 2020, in the months after the U.S. went into lockdown and it became clear the coronavirus pandemic would last more than a couple of weeks. By June 2024, when we conducted our replications, the pandemic, though ongoing, had faded from many U.S. Americans’ attention, and the everyday effects used in Cheek’s studies were no longer the ubiquitous concerns they had been during the onset of the global crisis. As a result, we did not feel confident that replicating any of these three studies would provide diagnostic evidence in favor of or against Cheek’s original theorizing. Of course, replicating these studies could nonetheless inform ongoing theorizing by testing whether people still show the thick skin bias when judging, several years later, how past pandemic-related events affected different individuals, but that goal differs from our goal of conducting replications that could test the same theorizing as the original research (Nosek & Errington, 2020; for related discussions see Gergen, 1974; Schwarz & Strack, 2014; Shafir & Cheek, 2024). Accordingly, we ultimately chose to replicate Study 4 from Cheek (2023).
In Study 4, Cheek (2023) probed the potential debiasing effect of an informational article outside of the context of the pandemic. Specifically, participants were randomly assigned to read either a debiasing article that explicated how poverty exacerbates rather than mitigates the harm of new negative events or a control article about working memory. After reading one of these articles, participants rated how harmed they thought a lower-SES individual and a higher-SES individual would be by a series of 11 negative events sampled from everyday life (e.g., not getting enough sleep; being treated badly by one’s boss). Cheek found that participants displayed the thick skin bias by rating the lower-SES individual as less harmed than the higher-SES individual whether they read the debiasing article or the control article, but he also found that the bias was smaller in the debiasing article condition. The debiasing article did not appear to change perceptions of how harmed lower-SES individuals would be by negative events; rather, it only decreased how harmed participants thought the higher-SES individuals would be.
Cheek concluded by suggesting that informational interventions may be worth studying further as potential strategies for addressing the thick skin bias, but also noted that it was a striking illustration of the strength of the bias that it still emerged even after people had read an article explicitly arguing against it. In our replication, we used the original materials from Cheek (2023) Study 4 with two minor exceptions. First, in the original study, participants were randomly assigned to read about either two men (one lower-SES and one higher-SES) or two women (again one lower-SES and one higher-SES). The lower-SES individual was named Jordan in both conditions, whereas the higher-SES individual was named Thomas when participants were assigned to read about men and Tanya when participants were assigned to read about women. Although previous work has not found an effect of this name difference (see Cheek & Murray, 2023, Study 2), we decided to match names between target gender conditions by using the name Taylor in both conditions. Second, in the original study, the articles were attributed to Stanley Coren; we changed this name to Stanley Johnson. These were the only changes we made to the original materials, and we did not anticipate that either change would have meaningful effects on the results. In other words, we do not view these changes as undermining the diagnostic value of this replication (Nosek & Errington, 2020). The original study was conducted on Prolific, and for this replication we ran the original materials on Prolific as well.
Method
Participants
We aimed to recruit 750 participants through Prolific to achieve a final sample of at least 582 after exclusions, which provides a 95% chance of detecting an effect of f = 0.15 with α = .05. To be included in analyses, participants needed to pass two attention check questions, submit a valid response to one article comprehension check question after reading the article, and confirm (1) that they did not respond randomly or dishonestly and (2) that they did actually read the article. In total, 744 participants completed the study and provided consent to analyze their data both before and after participating (as required by our IRB for this study). Of these, 649 participants met the inclusion criteria. See Table 1 for participant demographic information.
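For a two-condition contrast, Cohen's f = 0.15 corresponds to d = 2f = 0.30, so the sample target can be roughly checked with a normal-approximation formula. This is a sketch under that two-group equivalence (the preregistered analysis presumably used dedicated power software for the full design, so exact targets differ slightly):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(f: float, alpha: float = 0.05, power: float = 0.95) -> int:
    """Approximate per-group N for a two-group contrast, treating Cohen's f as d = 2f."""
    d = 2 * f
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)

per_group = n_per_group(0.15)
print(per_group, 2 * per_group)  # 289 per group, 578 total (preregistered: 582)
```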
Materials and Procedure
At the beginning of the study, participants were randomly assigned to read one of two informational article excerpts. In the debiasing article condition, participants read a piece that explicated how poverty frequently exacerbates the effect of negative events for people in poverty. In the control article condition, participants instead read an unrelated excerpt about working memory (for full texts, see study materials on our OSF page). After reading the article, participants answered an open-ended article comprehension check question asking them to summarize the main point of the article along with a filler question asking how interesting the article was.
Participants then read about a low-SES and a high-SES target in a counterbalanced order. As in the original study, each target was described as White to prevent participants from inferring race based on SES information (e.g., participants might assume that the low-SES target was Black and that the high-SES target was White based on class-race associations in the U.S.; e.g., Lei & Bodenhausen, 2017). Target gender8 (man vs. woman) was also varied between subjects for potential exploratory analyses, as in the original study. For the low-SES target, participants read the following description:
Jordan, a white man [woman] in his [her] 20s, was born and raised in a large city in the U.S. Jordan has experienced many financial difficulties in his [her] life. He and his [She and her] siblings were raised by parents who struggled to find steady work to pay the bills. Jordan’s family is financially unstable; they often struggle to have enough money for food, rent, or other basic things.
For the high-SES target, participants read the following description:
Taylor, a white man [woman] in his [her] 20s, was born and raised in a large city in the U.S. Taylor has not experienced any financial difficulties in his [her] life. He and his [She and her] siblings were raised by parents who comfortably supported them by working well-paying jobs. Taylor’s family is financially stable; they never struggle to have enough money for food, rent, or other basic things.
After reading about each target, participants rated how harmed each target would be by 11 negative events (e.g., having heating break during winter, having an abusive boss, being kept awake all night by noise). The impact of each event was rated on a 0–10 scale (with higher numbers indicating a more severe impact) for a particular emotional reaction, such as how disappointed, sad, upset, or frustrated the target would be in that situation (αlow-SES = .90; αhigh-SES = .85).
Results
As in the original study, a 2 (article condition) × 2 (target SES) mixed ANOVA yielded a significant main effect of target SES, F(1, 647) = 190.89, p < .001, ηp2 = 0.23, 90% CI [0.18, 0.27], a nonsignificant main effect of article condition, F(1, 647) = 1.89, p = .170, ηp2 = 0.00, 90% CI [0.00, 0.01], and a significant interaction, F(1, 647) = 15.09, p < .001, ηp2 = 0.02, 90% CI [0.01, 0.05].9 To better understand this interaction, we broke down the results first by article condition and then by target SES condition, but we note that we preregistered only the ANOVA and not these pairwise comparisons (though they exactly mirrored Cheek’s, 2023, analyses). Replicating the original study, the debiasing article did not eliminate the thick skin bias. Indeed, participants thought that the low-SES target would be less affected by the negative events than the higher-SES target in both the control article condition (Mlow-SES = 6.32, SDlow-SES = 1.72 vs. Mhigh-SES = 7.87, SDhigh-SES = 1.31), t(319) = 13.04, p < .001, d = 0.73, 95% CI [0.61, 0.85], and the debiasing article condition (Mlow-SES = 6.78, SDlow-SES = 1.67 vs. Mhigh-SES = 7.65, SDhigh-SES = 1.45), t(328) = 6.78, p < .001, d = 0.37, 95% CI [0.26, 0.49]. However, also replicating the original study, the effect was smaller in the debiasing condition by about half a standard deviation.
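The reported partial eta squared values follow directly from each F statistic and its degrees of freedom via ηp2 = (F × df1) / (F × df1 + df2); a quick arithmetic check on the reported values:

```python
def partial_eta_sq(F: float, df1: int, df2: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (F * df1) / (F * df1 + df2)

print(round(partial_eta_sq(190.89, 1, 647), 2))  # 0.23 (target SES main effect)
print(round(partial_eta_sq(15.09, 1, 647), 2))   # 0.02 (interaction)
```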
Broken down differently, participants thought that the high-SES target would be less harmed by negative events after reading the debiasing article, t(647) = −2.07, p = .039, d = −0.16, 95% CI [−0.32, −0.01], whereas they thought the low-SES target would be more harmed after reading the debiasing article, t(647) = 3.43, p < .001, d = 0.27, 95% CI [0.12, 0.42].
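For these between-condition comparisons, Cohen's d can be recovered from each t statistic and the condition sizes using d = t√(1/n1 + 1/n2). The group sizes below (320 and 329) are inferred from the within-condition degrees of freedom, t(319) and t(328), so this is a check under that assumption:

```python
from math import sqrt

def d_from_t(t: float, n1: int, n2: int) -> float:
    """Cohen's d for an independent-samples comparison, from t and the group sizes."""
    return t * sqrt(1 / n1 + 1 / n2)

# Group sizes inferred from the within-condition dfs reported above
print(round(d_from_t(-2.07, 320, 329), 2))  # -0.16 (high-SES target)
print(round(d_from_t(3.43, 320, 329), 2))   # 0.27 (low-SES target)
```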
Discussion
Overall, this study successfully replicated the original findings from Cheek (2023) Study 4: a debiasing article reduced, but did not eliminate, the thick skin bias.10 Replicating previous research on the thick skin bias (Cheek & Shafir, 2020), participants in both conditions thought that the low-SES target would be less harmed than the high-SES target by the negative events, but this effect was smaller among participants who read the debiasing article. As in the original study, the debiasing article influenced judgments about the high-SES target, causing participants to think the high-SES target would be less harmed. However, unlike the original study, the debiasing article also influenced judgments about the low-SES target, causing participants to think the low-SES target would be more harmed.
We did not predict that the debiasing article would have a stronger effect than in the original study, but we still see this as a successful replication of the finding that reading the debiasing article reduces the thick skin bias. Indeed, if anything, the present study suggests the article may be more effective than the original study suggested. In sum, this study’s results echo the suggestion from Cheek (2023) that it might be worth further studying informational interventions on the thick skin bias given the documented decrease in the bias, while at the same time underlining the potential ability of the bias to persist despite the provision of information in direct opposition to it.
General Discussion
We conducted four replications of previous studies from our research lab to assess the reliability of our past findings. Across all studies, we successfully replicated the original findings (see Table 3 for a summary and interpretation of each replication). Two of our replications, those of Cheek et al. (2023) Study 2 and of Cheek and Murray (2023) Supplemental Study 5, closely mirrored the original findings. Our other two replications, those of Cheek and Ward (2019) Study 3 and of Cheek (2023) Study 4, also replicated the original effects, but found, if anything, stronger effects than the original studies.
Study | Original Result | Replication Result | Interpretation |
---|---|---|---|
Cheek & Ward (2019) Study 3 | Participants reported wanting more options to choose from when imagining adopting a maximizing goal (M = 7.69, SD = 3.63) than when imagining adopting a satisficing goal (M = 5.26, SD = 3.07), t(102) = 8.58, p < .001, d = 0.66, 95% CI [0.48, 0.85]. | Participants reported wanting more options to choose from when imagining adopting a maximizing goal (M = 6.51, SD = 3.20) than when imagining adopting a satisficing goal (M = 3.97, SD = 2.15), t(191) = 16.47, p < .001, d = 1.19, 95% CI [1.00, 1.37]. | Successful replication of the original effect. |
Cheek, Schwartz, & Shafir (2023) Study 2 | |||
Manipulation Check | Participants in the large choice set condition felt they had more choice (M = 6.53, SD = 1.93) than participants in the small choice set condition (M = 3.11, SD = 1.97), t(343) = 16.27, p < .001, d = 1.75, 95% CI [1.50, 2.00]. | Participants in the large choice set condition felt they had more choice (M = 6.92, SD = 1.66) than participants in the small choice set condition (M = 3.14, SD = 1.67), t(579) = 27.33, p < .001, d = 2.27, 95% CI [2.06, 2.48]. | Successful replication of the original effect. |
Perceived Preference Matching | Participants in the large choice set condition thought that they had better matched their preferences (M = 7.31, SD = 1.55) than those in the small choice set condition (M = 5.94, SD = 1.84), t(325.32) = 7.47, p < .001, d = 0.81, 95% CI [0.59, 1.03]. | Participants in the large choice set condition thought that they had better matched their preferences (M = 7.46, SD = 1.43) than those in the small choice set condition (M = 6.06, SD = 1.35), t(579) = 12.13, p < .001, d = 1.01, 95% CI [0.83, 1.18]. | Successful replication of the original effect. |
Perceived Self-Expression | Participants in the large choice set condition thought that their choices were more self-expressive (M = 5.02, SD = 2.20) than those of participants in the small choice set condition (M = 3.71, SD = 2.21), t(343) = 5.49, p < .001, d = 0.59, 95% CI [0.38, 0.81]. | Participants in the large choice set condition thought that their choices were more self-expressive (M = 4.96, SD = 1.98) than those of participants in the small choice set condition (M = 3.52, SD = 1.94), t(579) = 8.90, p < .001, d = 0.74, 95% CI [0.57, 0.91]. | Successful replication of the original effect. |
Mediation Analysis | Perceived preference matching significantly mediated the effect of choice set size on perceived self-expression, indirect effect = 0.69, 95% CI [0.46, 0.98]. | Perceived preference matching significantly mediated the effect of choice set size on perceived self-expression, indirect effect = 0.67, 95% CI [0.50, 0.87]. | Successful replication of the original effect. |
Cheek & Murray (2023) Supplemental Study 5 | Participants who more strongly endorsed psychophysical adaptation intuitions also more strongly endorsed hardship adaptation intuitions, r = .28, 95% CI [.17, .38], p < .001. | Participants who more strongly endorsed psychophysical adaptation intuitions also more strongly endorsed hardship adaptation intuitions, r = .30, 95% CI [.22, .37], p < .001. | Successful replication of the original effect. |
Cheek (2023) Study 4 | |||
Target SES x Article Interaction | There was a significant interaction between target SES and article condition, F(1, 299) = 6.47, p = .011, ηp2 = 0.02, 90% CI [0.00, 0.06]. | There was a significant interaction between target SES and article condition, F(1, 647) = 15.09, p < .001, ηp2 = 0.02, 90% CI [0.01, 0.05]. | Successful replication of the original effect. |
Target SES effect broken down by article condition | Participants thought that the low-SES target would be less affected by the negative events than the higher SES target in both the control article condition (Mlow-SES = 6.16, SDlow-SES = 1.63 vs. Mhigh-SES = 7.74, SDhigh-SES = 1.32), t(148) = 10.58, p < .001, d = 0.87, 95% CI [0.63, 1.04], and the debiasing article condition (Mlow-SES = 6.41, SDlow-SES = 1.78 vs. Mhigh-SES = 7.36, SDhigh-SES = 1.62), t(151) = 4.79, p < .001, d = 0.39, 95% CI [0.16, 0.62]. However, the effect was smaller in the debiasing condition by about half a standard deviation. | Participants thought that the low-SES target would be less affected by the negative events than the higher SES target in both the control article condition (Mlow-SES = 6.32, SDlow-SES = 1.72 vs. Mhigh-SES = 7.87, SDhigh-SES = 1.31), t(319) = 13.04, p < .001, d = 0.73, 95% CI [0.61, 0.85], and the debiasing article condition (Mlow-SES = 6.78, SDlow-SES = 1.67 vs. Mhigh-SES = 7.65, SDhigh-SES = 1.45), t(328) = 6.78, p < .001, d = 0.37, 95% CI [0.26, 0.49]. However, the effect was smaller in the debiasing condition by about half a standard deviation. | Successful replication of the original effect. |
Article effect broken down by target SES condition | Participants thought that the high-SES target would be less harmed by negative events after reading the debiasing article, t(299) = −2.24, p = .026, d = −0.26, 95% CI [−0.49, −0.03], whereas the perceived impact of the negative events for the low-SES target did not vary significantly between article condition, t(299) = 1.27, p = .205, d = 0.15, 95% CI [−0.08, 0.37]. | Participants thought that the high-SES target would be less harmed by negative events after reading the debiasing article, t(647) = -2.07, p = .039, d = -0.16, 95% CI [-0.32, -0.01], whereas they thought the low-SES target would be more harmed after reading the debiasing article, t(647) = 3.43, p < .001, d = 0.27, 95% CI [0.12, 0.42]. | Partially consistent replication of the original result. The replication found a stronger effect of article condition compared to the original study. In the replication, the debiasing article affected ratings of both the low-SES and the high-SES target, whereas in the original study the debiasing article only affected ratings of the high-SES target. |
Note. Full details of each replication study are reported in the main text. Portions of the original results are directly transposed from the original articles to facilitate easy comparison of results (see “When It Is and Isn’t OK to Recycle Text in Scientific Papers,” 2024). In the original results section of Cheek (2023) Study 4, 95% confidence intervals were given for the mean of harm ratings instead of standard deviations. To keep the results of the replication of Cheek (2023) Study 4 consistent with the results of the other studies, however, we report standard deviations. For the replication of Cheek (2023) Study 4, we preregistered the ANOVA but not the pairwise comparisons, though the latter analyses exactly mirrored those conducted in the original study.
In our replication of Cheek and Ward (2019) Study 3, we found the same pattern of results but an effect size almost twice as large as the original, and in our replication of Cheek (2023) Study 4, we found that the debiasing article affected not only perceptions of how harmed the high-SES target would be (as in the original) but also how harmed the low-SES target would be. That is, the debiasing article both decreased perceptions of the high-SES target’s vulnerability to harm and, unlike the original study, increased perceptions of the low-SES target’s vulnerability to harm, thereby influencing judgments more than in the original study (though with a similar overall impact on the size of the thick skin bias).
We did not predict these stronger effects, and we do not have a clear answer for why they emerged. One possibility is that participants on Prolific in 2024 paid more attention to our experimental stimuli, resulting in reduced noise and error (for evidence of the high attentiveness of Prolific participants, see, e.g., Douglas et al., 2023; Krefeld-Schwalb et al., 2024). Although the original Cheek (2023) Study 4 was run on Prolific, it was run during the height of the global coronavirus pandemic, which could have resulted in participants who were less attentive (e.g., due to stress), who were less experienced (e.g., having joined the platform amid the economic difficulties of the pandemic), or who otherwise differed from the present participants. Just as researchers are often cautioned to avoid excessive speculation about why replication effects may be smaller than original effects, we are hesitant to speculate too much about why our replications sometimes produced larger effects than the original studies. While future research could continue to explore these findings (e.g., to determine more precise effect size estimates and to explore causes of effect size variation), our view is that the findings from the well-powered studies in the present research are significant, meaningfully sized, and in the same direction as the original findings, and thus broadly replicate those original results, albeit with variation in the direction of even stronger effects. From a practical perspective, the best approach to estimating an effect size from the present findings (e.g., when an effect size estimate is needed for a future power analysis) may be to synthesize the original and replication effects via meta-analysis.
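As one illustration of the kind of synthesis we have in mind, correlations such as those from the Cheek and Murray (2023) replication (original r = .28, replication r = .30) can be pooled with fixed-effect inverse-variance weighting on the Fisher z scale. The original study's N of 300 below is a hypothetical placeholder for illustration only; the replication N of 586 is from the present study:

```python
from math import atanh, tanh

def pooled_r(studies: list[tuple[float, int]]) -> float:
    """Fixed-effect inverse-variance pooling of correlations on the Fisher z scale."""
    num = sum((n - 3) * atanh(r) for r, n in studies)  # weight each z by n - 3
    den = sum(n - 3 for _, n in studies)
    return tanh(num / den)

# (r, N) pairs: original N = 300 is a hypothetical placeholder; replication N = 586
print(round(pooled_r([(0.28, 300), (0.30, 586)]), 2))  # 0.29
```

A random-effects model would be more appropriate if effect sizes are believed to vary across settings, as the platform and timing differences discussed above suggest they might.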
The discrepancies in effect size also potentially contribute to ongoing conversations about effect size variation across crowdsourcing platforms and across time (e.g., Krefeld-Schwalb et al., 2024; Schooler, 2011) and highlight how even successful (self-)replications can raise new questions about the size of effects.
Limitations and Constraints on Generalizability
Although the present research provides insight into the replicability of previous research in our lab, replicability is only one of many important features of reliable and credible research (e.g., Vazire et al., 2022). The present work did not investigate measurement validity, for example, nor did it attempt to generalize previous findings to new contexts or populations. Thus, there are many directions for future research that can address other aspects, and indeed other limitations, of the studies considered here.
For instance, in our replication of Cheek (2023) Study 4, we followed the original study by only including White targets. This design allowed us to closely follow the original materials and to avoid inferences based on SES that might emerge if we did not specify any race/ethnicity information (e.g., Lei & Bodenhausen, 2017), but it also contributes to a research literature that often accepts Whiteness as a “neutral” identity (S. O. Roberts & Mortenson, 2022) and does not fully embrace intersectional perspectives that might, for instance, explore both SES and race (e.g., Cole, 2009; Crenshaw, 1989; Hudson et al., 2024). In fact, in another of Cheek’s (2023) studies that included both White and Black targets, there was some evidence that high-SES White targets were perceived as especially vulnerable to some everyday effects of the coronavirus pandemic, speaking to the importance of examining the intersections of SES and other identities in future work. Hence, the present research may increase confidence in the replicability of previous findings, but it does not address the limitations and the constraints on the generalizability of those findings.
Conclusion
Across four preregistered studies, we successfully replicated several previous findings from our research team. This project increases confidence in the reliability of the original findings and highlights the value of self-replications as a tool labs can adopt to assess the robustness of their research contributions. Indeed, reflecting on this project, we found it valuable to take a step back and evaluate the robustness of past findings from our lab. We also found this project to be a valuable educational opportunity for the second author, who gained a variety of hands-on experiences in the psychology research process from study planning and data collection to data analysis and publication. Based on our experiences, we join previous authors (e.g., Cesario, 2014; B. W. Roberts, 2015) in highlighting self-replications as a useful part of a larger set of open and transparent research practices that contribute to building a robust and credible psychological literature.
Data Accessibility Statement
All the materials, participant data, and analysis scripts can be found on this paper’s project page on the Open Science Framework: https://osf.io/s7p6f/
Contributions
Contributed to conception and design: NC, ZY
Contributed to acquisition of data: NC
Contributed to analysis and interpretation of data: NC, ZY
Drafted and/or revised the article: NC, ZY
Approved the submitted version for publication: NC, ZY
Funding Information
This research was supported by start-up funds provided to the first author from Purdue University.
Competing Interests
The authors have no competing interests to declare.
Footnotes
Though there were additional papers from our lab we could also have replicated, we limited our project to four papers based on the availability of research funds.
The original effect sizes were: d = 0.66 for Cheek & Ward (2019) Study 3; d’s = 1.75 (manipulation check), 0.81 (preference matching), and 0.59 (self-expression) for Cheek et al. (2023) Study 2; r = .28 for Cheek & Murray (2023) Supplemental Study 5; and f = 0.15 for Cheek (2023) Study 4. (We calculated f = 0.15 based on the effect size from Cheek (2023) Study 4 rounded to three digits [ηp² = 0.021]. If the effect size is rounded to two digits [ηp² = 0.02], as in the text of Cheek (2023) Study 4, the resulting effect size is slightly smaller, f = 0.143, but still very close to the effect size for which we powered our target sample size.)
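The conversion described in this footnote follows the standard relation f = √(ηp² / (1 − ηp²)). A minimal Python sketch, using the two rounded values reported above:

```python
import math

def eta_sq_to_f(eta_sq: float) -> float:
    """Convert partial eta squared to Cohen's f: f = sqrt(eta_p^2 / (1 - eta_p^2))."""
    return math.sqrt(eta_sq / (1.0 - eta_sq))

# Three-digit rounding of the original effect size (eta_p^2 = 0.021)
print(round(eta_sq_to_f(0.021), 3))  # 0.146, i.e., f ≈ 0.15 at two digits
# Two-digit rounding (eta_p^2 = 0.02), as in the text of Cheek (2023) Study 4
print(round(eta_sq_to_f(0.02), 3))   # 0.143
```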
We switched from Mechanical Turk to Prolific in the present research because recent changes to Amazon’s billing procedures created logistical difficulties in transferring funds into Mechanical Turk at our university. Our judgment was that this switch would not undermine the diagnosticity of our replications.
Because we report direct replications, some of the methods descriptions are reproduced nearly verbatim from the original studies to ensure the descriptions remain clear and consistent with the original study (see “When It Is and Isn’t OK to Recycle Text in Scientific Papers,” 2024).
See supplemental material for exploratory analyses of order effects for this and all subsequent studies. For all studies, order effects did not drive the results, and the original effects emerged in all stimulus presentation orders.
We report reliability coefficients calculated after having made the exclusions described in the results section (n = 581).
In Study 1 of Cheek and Murray (2023), rather than rating agreement with a series of statements on a Likert-type scale, participants rated which of two individuals would experience a physical stimulus more strongly, someone who had just experienced a high level of that stimulus or someone who had just experienced a low level of that stimulus. For example, one item read, “To whom would a lit candle seem brighter?” and the response scale ranged from Definitely someone who just came inside on a very sunny day (coded as 1) to Definitely someone who just left a very dark room (coded as 6). In the review process for that paper, questions were raised about this measure, and the supplemental studies were conducted to address these questions (e.g., about question wording and about the lack of a neutral midpoint on the scale). We do not have any evidence to suggest that this first scale is invalid, and indeed the fact that all six studies (Study 1 and Supplemental Studies 1-5) found the same pattern of results suggests the different versions of the measures similarly tapped the constructs of interest. Nonetheless, our prediction was that the measures used in Supplemental Study 5, which involved the common format of rating agreement from 1 (strongly disagree) to 7 (strongly agree), would seem most familiar to a social psychology audience. We would, however, consider replications of any of the six correlational studies to be diagnostic tests of the original theorizing.
We preregistered that we would test the effect of target gender in a supplemental analysis, though target gender was not a focal variable of interest and we did not predict an effect of target gender. We report the results of a 2 (target SES) × 2 (article condition) × 2 (target gender) mixed ANOVA, which yielded no significant effects of target gender, in supplemental material.
We report 90% CIs for effect sizes of F-tests because such tests are one-sided (see Lakens, 2013).
It is worth noting that this study does not provide direct evidence about the cognitive mechanisms driving participants’ judgments. The debiasing article reduced displays of the thick skin bias, but future research is needed to determine the mechanisms driving this change. For example, the article may have directly reduced the bias by changing participants’ beliefs with new information, or the article may have changed how participants were thinking about the judgments they made or the people they judged.