The theory of depressive realism holds that depressed individuals are less prone to optimistic bias, and are thus more realistic, in assessing their control or performance. Since the theory was proposed 40 years ago, many innovations have been validated for testing cognitive accuracy, including improved measures of bias in perceived control and performance. We incorporate several of those innovations in a well-powered, pre-registered study designed to identify depressive realism. Amazon MTurk workers (N = 246) and undergraduate students (N = 134) completed a classic contingency task, an overconfidence task, and measures of mental health constructs, including depression and anxiety. We measured perceived control throughout the contingency task, allowing us to compare control estimates at the trial-level to estimates assessed at task conclusion. We found no evidence that depressive symptoms relate to illusory control or to overconfidence. Our results suggest that despite its popular acceptance, depressive realism is not replicable.
Introduction
Depressive realism, the idea that depression is associated with more accurate perceptions of personal control, is widely accepted. Alloy and Abramson’s (1979) original paper, “Judgment of contingency in depressed and nondepressed students: Sadder but wiser?” has been cited over two thousand times. Depressive realism has been a focus of more than 75 empirical studies (M. T. Moore & Fresco, 2012), five books (Alloy et al., 1990; Alloy & Abramson, 1988; Baker et al., 2012; Feltham, 2017; Murphy et al., 2005), and articles published in The Guardian (Burkeman, 2006), Psychology Today (Burton, 2012), The New Yorker (Konnikova, 2014), and Vice (Bucklin, 2017). Taylor and Brown (1988) built their influential theory of “positive illusions” on these claims; their 1988 article has been cited over ten thousand times. Wong et al. summarize the scholarly consensus: “The depressed exhibit a remarkable degree of realism in judgments about their personal and social worlds, whereas the less depressed tend to exhibit unrealistic optimism” (2006, p. 284).
Despite its pervasive influence, empirical support for depressive realism is mixed. Some studies have failed to replicate the effect (for reviews see Ackermann & DeRubeis, 1991; Allan et al., 2007; and M. T. Moore & Fresco, 2012), even in clinical samples (Shruti Venkatesh et al., 2018). A meta-analysis of 75 studies found substantial variability in effect sizes (d = -.09 to .14) (M. T. Moore & Fresco, 2012), underscoring the need to refine our understanding of this effect.
In contrast to depressive realism, other research suggests that individuals with depression are prone to cognitive bias. Studies have found that depressed individuals underestimate their true capabilities (Fu et al., 2005, 2012; Stone et al., 2001), experience overly negative self-focused thoughts (Gotlib & Joormann, 2010), forecast excessively negative moods (Zetsche et al., 2019), and generally show biases as pronounced as their nondepressed counterparts (Dobson & Pusch, 1995), if not more so (Dunning & Story, 1991). These negative biases have been found to precede the onset of depression (Everaert et al., 2012; Lewinsohn et al., 2001; Mathews & MacLeod, 2005; Rude et al., 2003; Segal et al., 2006).
Despite abundant evidence establishing the presence of biases in depression and numerous failures to replicate the depressive realism effect, research on depressive realism continues to be published and cited, indicating that psychological scientists and laypeople retain some belief in its plausibility. In this paper, we seek to establish if the sadder are indeed wiser. We set out to test the depressive realism effect; to do so, we conducted a direct replication test of Alloy and Abramson (1979) but also implemented several design improvements, incorporating additional, validated measures of bias and considering several clinical features that could inform findings. Our study design benefits from methodological innovations since the original studies, including design elements and boundary conditions to help identify the depressive realism effect.
In Alloy and Abramson’s original depressive realism study (1979), participants estimated the control their responses (button pushes) gave them over an outcome (a lightbulb turning on). The outcome measure used in this original study, though, did not compare subjective control estimates to actual control, thereby making it impossible to measure bias. In one meta-analysis, only 36 of 75 depressive realism studies employed measures of bias. These 36 studies found smaller effects (weighted mean d = -.03) than did studies without an objective standard (weighted mean d = -.15; M. T. Moore & Fresco, 2012). Furthermore, only 15 of these 36 studies evaluated depressive realism via a contingency task. We employed a contingency task that allowed for objective assessment of the degree to which participants over- or underestimate their control. We also aimed to test alternate hypotheses of the depressive realism effect. First, we test the response criterion hypothesis, which posits that depressive ideation will influence post-task assessments of overall control more strongly than trial-level estimates (Allan et al., 2007). The logic behind this hypothesis grows from depression’s memorial and attentional biases, which operate more strongly on recollections and high-level construals than on immediate and direct assessments (Marchetti et al., 2018). To test this hypothesis, we administer measures of perceived control throughout the task and also at the end. Second, given that motor response rates profoundly influence beliefs about control and are often diminished among those with depression (Blanco et al., 2012), we measure whether the frequency of responding helps explain how depression relates to beliefs about control (Blanco et al., 2009). Finally, we ask participants to estimate outcome probabilities based on both possible responses (pressing and not pressing the button), since true contingencies exist for both.
Moore and Fresco (2012) conclude that future research should seek to clarify if there are any situations which do yield a robust depressive realism effect, and their suggestions guided some of our additional design decisions. Specifically, we include a high-control condition that allows us to test for accuracy when accuracy demands high estimates of control. In addition, we consider the generalizability of depressive realism to an additional overconfidence task. The latter is a validated laboratory task that assesses the accuracy of self-assessment. Just as we measure illusory control in the contingency task against objective reality, we compare self-assessments with the truth in the overconfidence task. Finding increased accuracy among individuals with higher depressive symptoms in the overconfidence task would support the theory of depressive realism.
We also consider key issues in symptom assessment. Because taxometric analyses suggest that depression is best conceptualized as a dimensional rather than categorical construct (Prisciandaro & Roberts, 2005), and dichotomizing participants as ‘depressed’ or ‘not depressed’ prevents analyzing symptoms continuously (Soderstrom et al., 2011), we examine depressive symptoms over a range of severity. Moore and Fresco’s (2012) meta-analysis suggests that on average, both clinically depressed and subsyndromal individuals show overestimation of control, but that subsyndromal participants demonstrate the strongest depressive realism effect. Thus, we focus on subsyndromal participants, who may be particularly likely to show the effect. Previous studies have also found more robust effects for self-ratings than for interview-based ratings of symptoms (M. T. Moore & Fresco, 2012); accordingly, we focus on self-ratings here. The original depressive realism study (Alloy & Abramson, 1979) examined only current depression; to replicate findings that biases improve after recovery (S. Venkatesh, 2013), we examine lifetime as well as current depression symptom severity. We also measure clinical characteristics that may moderate the depressive realism effect, such as negative affective state, hypomanic tendencies, trauma history, and anxiety (Ackermann & DeRubeis, 1991; Stern & Berrenberg, 1979; Wong et al., 2006).
Method
Transparency and Openness
We report how we determined our sample size, all measures, experimental conditions, and data exclusions. Data were analyzed using R. We preregistered the study design and primary analyses. Our materials, including survey instruments, data, analysis code, and copies of the preregistration documents, are posted on the Open Science Framework.
Participants
Our study includes two samples. Our first sample, collected in Fall 2020, came from Amazon Mechanical Turk. We sought to generalize and replicate our results with a second sample of undergraduates at a large, public U.S. university in Spring 2021.
Sample One
We based our sample size on a priori power analyses of three effects reported in experiments 2-4 of the original Alloy and Abramson (1979) paper. Averaging across these three effect sizes, an a priori power analysis (conducted in G*Power) suggested an average sample size of 40 to obtain 90% power for detecting the effect. To maximize our chances of detecting a true effect, we followed suggestions to conduct conservative power analyses based on the lower bound of the 90% confidence interval around each effect (Anderson & Maxwell, 2017). Results of these more conservative power analyses suggested N = 252 for 90% power to detect a depressive realism effect (see Supplement for more information).
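To illustrate why basing a power analysis on a smaller, lower-bound effect size raises the required sample so sharply, the following Python sketch uses a normal approximation for a simple two-group comparison. It is not the G*Power computation reported above, which drew on effects from the original ANOVA designs. The function name `required_n_per_group` and the effect sizes passed to it are hypothetical and chosen only for illustration.

```python
from math import ceil
from statistics import NormalDist


def required_n_per_group(d, alpha=0.05, power=0.90):
    """Approximate n per group for a two-sided, two-sample comparison.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2,
    where d is a standardized mean difference (hypothetical values only).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


# Illustrative effect sizes, not values from the original studies.
for d in (0.8, 0.5, 0.3):
    print(d, required_n_per_group(d))
```

Because the required sample grows with the inverse square of the effect size, halving the assumed effect roughly quadruples the sample, which is consistent with the jump from the average-based estimate to the more conservative one.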
Sample One (S1) included participants age 18 and older, located in the United States, recruited through Amazon Mechanical Turk (MTurk). To obtain at least 25 participants each with minimal, mild, moderate, or severe depressive symptoms, we pre-screened 800 participants with the Inventory to Diagnose Depression - Current (IDD-C; cut-offs based on Rogers et al., 2005) and invited 339 to take the full survey (180 accepted). Another 153 participants completed the survey without prescreening. After 57 exclusions for missing data, 29 exclusions for failing a comprehension check, and 1 exclusion of a duplicated response, Sample One included 246 participants (53% male, 1% gender nonbinary) with complete data. Of these, 128 had minimal depressive symptoms (IDD-C ≤ 15), 38 had mild depressive symptoms (IDD-C 16-24), 45 had moderate depressive symptoms (IDD-C 25-41), and 35 had severe depressive symptoms (IDD-C > 42).
Sample Two
Sample Two consisted of undergraduates from a large, public U.S. university, who participated in the study in return for course credit. While we sought to prescreen participants to oversample those with higher depression symptom severity, we were constrained by the sampling pool. After 16 exclusions for missing data, 24 exclusions for failing a comprehension check, and 9 exclusions of duplicate responses, Sample Two included 134 participants (55.9% male, 1.5% gender nonbinary) with complete data. Of these, 14 had minimal depressive symptoms (IDD-C ≤ 15), 104 had mild depressive symptoms (IDD-C 16-24), 14 had moderate depressive symptoms (IDD-C 25-41), and 2 had severe depressive symptoms (IDD-C > 42).
Procedure
Participants completed informed consent, the contingency task, the overconfidence task, and then questionnaires online using the Qualtrics survey platform.
Contingency Task
The contingency task was a revised version of the Alloy and Abramson (1979) task. In each of 40 trials, participants chose whether to press a button, after which a lightbulb or a black screen appeared. The survey randomly assigned participants to one of three experimental conditions. In two “zero contingency” conditions, there was no relationship between their action (the button) and the outcome (the lightbulb). Regardless of whether they pressed the button, the lightbulb appeared with either 25% or 75% chance (in the first and second zero contingency conditions, respectively). In the third “positive contingency” condition, the lightbulb appeared with a 75% chance following button press and never otherwise.
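The three conditions can be summarized by the probability of the lightbulb appearing after a press and after no press. The Python sketch below encodes those probabilities as described above and computes actual control as their difference on a 0 to 100 scale. The condition labels, function names, and the `press_rate` parameter are hypothetical, and the simulation is an illustration, not the Qualtrics implementation used in the study.

```python
import random

# (P(light | press), P(light | no press)) for each condition, as described in the text.
CONDITIONS = {
    "zero_25": (0.25, 0.25),
    "zero_75": (0.75, 0.75),
    "positive_75": (0.75, 0.00),
}


def actual_control(condition):
    """Contingency on a 0-100 scale: P(light | press) minus P(light | no press)."""
    p_press, p_no_press = CONDITIONS[condition]
    return 100 * (p_press - p_no_press)


def simulate_trials(condition, press_rate=0.5, n_trials=40, seed=0):
    """Simulate one participant as a list of (pressed, lit) pairs.

    press_rate is a hypothetical response tendency, not an estimate from the data.
    """
    rng = random.Random(seed)
    p_press, p_no_press = CONDITIONS[condition]
    trials = []
    for _ in range(n_trials):
        pressed = rng.random() < press_rate
        lit = rng.random() < (p_press if pressed else p_no_press)
        trials.append((pressed, lit))
    return trials


for name in CONDITIONS:
    print(name, actual_control(name))
```

Under this encoding, actual control is zero in both zero contingency conditions and 75 in the positive contingency condition, matching the values used as the objective standard in the analyses.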
Participants read: “On every round of the lightbulb task, you will have the chance to press or not press a virtual button. You will then either see a yellow lightbulb icon or a black box. Your job is to figure out how your button responses (pressing or not pressing) affect the probability of the yellow light icon appearing.” To replicate previous work, after the final trial, the survey asked “How much control did you have over the lightbulb?” on a scale from 0 (none) to 100 (complete control). To improve upon this single point estimate, we also used a more precise elicitation of subjective control. Every ten trials, participants estimated the likelihood (on a 0 to 100% scale) of the bulb lighting if they did press the button on the next trial, and if they did not. The survey randomly asked half of participants about the outcome of the lightbulb appearing and half about the outcome of the lightbulb not appearing. Participants read that more accurate predictions increased their chances of winning a drawing for a $50 prize.
We calculated three key measures. First, frequency of responding is the percent of trials on which participants pressed the button. The second and third measures were control bias and outcome prediction bias: both measured absolute values of subjective control minus actual control. Control bias, measured at the end of the task, was the final rating of control (“How much control did you have over the lightbulb?”) minus actual control (zero in the zero contingency conditions, 75 in the positive contingency condition). Outcome prediction bias, measured during the task, was the absolute difference between actual and estimated control. Estimated control was derived from the reported probabilities that the lightbulb would light after (1) pressing and (2) not pressing the button, averaged across the estimates made every ten trials.
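The three measures can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' analysis code (which was written in R). In particular, it assumes that estimated control at each probe is the reported probability after pressing minus the reported probability after not pressing, and it offers both signed and absolute versions of control bias. All function and parameter names are hypothetical.

```python
def frequency_of_responding(presses):
    """Percent of trials with a button press (presses: list of booleans)."""
    return 100 * sum(presses) / len(presses)


def control_bias(final_rating, actual, absolute=False):
    """End-of-task control rating (0-100) minus actual control (0 or 75)."""
    diff = final_rating - actual
    return abs(diff) if absolute else diff


def estimated_control(probe_estimates):
    """Mean over probes of (estimate after pressing - estimate after not pressing).

    Assumption: the difference between the two reported probabilities is
    the trial-level analogue of perceived control.
    """
    diffs = [press - no_press for press, no_press in probe_estimates]
    return sum(diffs) / len(diffs)


def outcome_prediction_bias(probe_estimates, actual):
    """Absolute difference between estimated and actual control."""
    return abs(estimated_control(probe_estimates) - actual)


# Hypothetical participant in the positive contingency condition (actual control = 75).
probes = [(60, 40), (70, 30), (50, 50), (80, 20)]  # one pair per ten-trial probe
print(frequency_of_responding([True, False, True, True]))
print(control_bias(30, 0))
print(outcome_prediction_bias(probes, 75))
```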
Overconfidence Task
Participants completed ten items from the Raven’s Progressive Matrices (RPM; Raven, 1963). They then provided two subjective probability distributions for performance on the Raven’s task: they estimated probabilities that they obtained each of the eleven possible scores (from zero correct to ten correct), for the self and for a random other (participants read: “We randomly selected one other participant out of the large number of people who have also completed this survey.”). As is standard for overconfidence tasks (D. A. Moore & Healy, 2008), we calculated three measures (see Supplement for detail). First, overestimation compares participants’ estimated scores for themselves to their actual scores. Second, overplacement compares participants’ predictions for how much better they score than others, and adjusts for how much better they actually do than others. Third, overprecision captures excessive certainty in participants’ estimates of others’ scores, assessed via the Gini index (a measure of concentration in the distribution of score estimates) and item confidence (maximum confidence attached to any single score when estimating the probability of different scores for a random other).
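The overconfidence measures can be illustrated with a short Python sketch. The exact formulas appear in the Supplement, so what follows is only one plausible reading: overestimation as the expected own score minus the actual score, overplacement as the believed advantage over others minus the actual advantage, and overprecision as the concentration of the reported distribution (Gini index) or its largest single probability. All function names and example values are hypothetical.

```python
def expected_score(probs):
    """Mean of a reported distribution over scores 0..len(probs)-1."""
    total = sum(probs)  # normalizes percentages or proportions alike
    return sum(score * p for score, p in enumerate(probs)) / total


def overestimation(self_probs, own_score):
    """Expected own score minus actual own score."""
    return expected_score(self_probs) - own_score


def overplacement(self_probs, other_probs, own_score, mean_other_score):
    """Believed advantage over others minus actual advantage over others."""
    believed_gap = expected_score(self_probs) - expected_score(other_probs)
    actual_gap = own_score - mean_other_score
    return believed_gap - actual_gap


def gini(probs):
    """Gini index of the reported probabilities (0 = flat, near 1 = concentrated)."""
    xs = sorted(probs)
    n = len(xs)
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * sum(xs)) - (n + 1) / n


def item_confidence(probs):
    """Largest probability attached to any single score."""
    return max(probs) / sum(probs)


# Hypothetical reports (in percent) over the eleven possible scores, 0 to 10.
self_probs = [0, 0, 0, 0, 0, 0, 0, 20, 60, 20, 0]
other_probs = [0, 0, 0, 0, 0, 25, 50, 25, 0, 0, 0]
print(overestimation(self_probs, own_score=6))
print(overplacement(self_probs, other_probs, own_score=6, mean_other_score=5.5))
print(gini(other_probs), item_confidence(other_probs))
```

A flat distribution over all eleven scores yields a Gini index of zero, whereas placing all probability on a single score yields the maximum value for eleven categories, so higher values indicate greater certainty.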
Questionnaires
The final section of the survey assessed the valence of participants’ current mood, from 0 (not negative at all) to 100 (very negative), and included several validated, psychometrically sound self-report questionnaires in randomized order: the Negative Generalization subscale from the Attitudes Toward Self-Revised (ATS-R; Carver et al., 1988), the Hypomanic Personality Scale (HPS; Eckblad & Chapman, 1986), the Inventory to Diagnose Depression-Lifetime (IDD-L; Zimmerman & Coryell, 1987), the Mood and Anxiety Symptom Questionnaire-D30 (MASQ-D30; Wardenaar et al., 2010), and the Risky Families Questionnaire (RF; Taylor et al., 2004). Participants also completed the IDD-C, either in this final section of the survey or as part of pre-screening. Sample Two included additional measures: the Sheehan Disability Scale (SDS; Sheehan et al., 1996), current and lifetime anxiety diagnosis and treatment, as well as current and lifetime depression diagnosis and treatment.
Analysis Plan
Our preregistered analysis plan used an ANOVA to test the results of our direct replication, and parallel hierarchical linear models to examine the effect of continuous IDD-C scores on our overconfidence and contingency task measures.
Results
All analyses were conducted using R statistical software. Skew and kurtosis (≤ |1.47|) and Cronbach’s alpha values (≥ .79) were within acceptable ranges for all self-report scales (see Table S1 in the Supplement).
Contingency Task Results
Table 1 shows descriptive statistics for the end-of-task subjective control measure (i.e., participants’ answer when asked “How much control did you have over the lightbulb?”). It also presents one-sample t-tests comparing the end-of-task subjective control measure to actual control.
Condition | Actual control | Sample One: Subjective control M (SD) | Sample One: t a | Sample One: df | Sample Two: Subjective control M (SD) | Sample Two: t a | Sample Two: df
Zero contingency, 25% probability | 0 | 27.64 (29.33) | 8.59*** | 82 | 18.15 (15.21) | 7.55*** | 39
Zero contingency, 75% probability | 0 | 34.23 (30.54) | 9.84*** | 76 | 36.83 (26.51) | 9.01*** | 41
75% positive contingency | 75 | 61.54 (28.53) | -4.22*** | 79 | 65.02 (26.97) | -2.51* | 45
Note. * p < .05 ** p < .01 *** p < .001. a Results from one-sample t-tests comparing the end-of-task subjective control measure to actual control.
In conditions with zero true contingency, end-of-task subjective control was significantly higher than actual control, indicating that participants overestimated control. In the positive contingency condition, end-of-task subjective control was significantly lower than actual control, indicating that participants underestimated control. Underestimation of control when control is high is a necessary consequence of regressive estimates of control given uncertainty and noise in participants’ control estimates (Gino et al., 2011).
Direct Replication
To test the depressive realism hypothesis, we conducted a 4 (IDD-C severity) x 3 (contingency condition) x 2 (gender) ANOVA estimating control bias.1 Depressive realism predicted a main effect of IDD-C.
In Sample One, the main effect of IDD-C score was indeed significant (F (3, 215) = 3.88, p = .01, ηp2 = 0.05). However, as Figure 1 shows, this result contradicts the sadder but wiser hypothesis, since more severe depression is associated with greater overestimates of control. We did not find any significant effect of the IDD-C in Sample Two (F (3, 110) = 0.821, p = 0.49). Our attempt to replicate the interaction that Alloy and Abramson (1979) report between contingency condition and IDD-C score was not successful in Sample One (F (6, 215) = 1.09, p = .37, ηp2 = 0.03) or in Sample Two (F (4, 110) = 1.23, p = 0.30, ηp2 = .04).
The main effect of contingency condition was significant in Sample One (F (2, 215) = 69.06, p < .001, ηp2 = 0.39) and in Sample Two (F (2, 110) = 41.597, p < .001, ηp2 = 0.43). This effect is a consequence of the fact that participants’ overestimates of control are limited to the zero contingency conditions in which they lacked control.
Hierarchical linear regressions. To examine IDD-C scores as continuous variables, we constructed three parallel hierarchical linear regressions estimating control bias, outcome prediction bias, and frequency of responding. Table 2 shows the preregistered models.2 Steps 1 and 2 included variables to replicate Alloy and Abramson (1979) and Steps 3-5 considered variables that may influence this effect. We only report significant variables below, but full model results appear in the supplementary material. We also conducted supplemental analyses: we tested models including polynomial terms to probe for potential curvilinear effects, and tested models including a trial-level random effect to better account for measurement error in the contingency task. Results from these supplemental analyses did not reveal quantitative or qualitative differences from our main analyses (see Supplement for full results). Note that supplemental analyses were not preregistered.
Step | Variable(s)
Step 1 | Contingency type (zero contingency = 0 and positive contingency = 1); Contingency outcome probability (25% or 75%); IDD-C score
Step 2 | IDD-C score x contingency type; IDD-C score x contingency outcome probability
Step 3 | Negative affect; IDD-L score
Step 4 a | Negative affect x IDD-L score; Treatment history (dummy coded: 0 (no) and 1 (yes)); Hypomanic Personality Scale (HPS) score; Risky Families (RF) score; Attitudes Toward Self (ATS) - Negative Generalization score; Sheehan Disability Scale (SDS) score; Depression diagnosis; Depression treatment; Anxiety diagnosis; Anxiety treatment
Step 5 b | Frequency of responding
Note. IDD-C = Inventory to Diagnose Depression – Current. IDD-L = Inventory to Diagnose Depression – Lifetime. a Additional variables included in Sample Two were depression diagnosis, depression treatment, anxiety diagnosis, anxiety treatment, as well as the Sheehan Disability Scale. Diagnosis and treatment variables captured current diagnosis/treatment, lifetime diagnosis/treatment, or no history of diagnosis/treatment. b Step 5 was included for the control and outcome prediction bias models only.
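Each step of a hierarchical regression is typically evaluated by asking whether the added predictors raise R2 by more than chance would allow. The Python sketch below shows the standard F-test for an R2 change between nested ordinary least squares models. It is a generic textbook formula, not the authors' R code, and the function name and numeric inputs are hypothetical.

```python
def r2_change_f(r2_reduced, r2_full, n, k_full, q):
    """F statistic for the R2 increase when q predictors are added.

    n is the sample size and k_full the number of predictors in the larger
    model. Returns (F, (df_numerator, df_denominator)).
    """
    df_resid = n - k_full - 1
    f = ((r2_full - r2_reduced) / q) / ((1 - r2_full) / df_resid)
    return f, (q, df_resid)


# Hypothetical example: adding two interaction terms to a three-predictor model.
f_stat, dfs = r2_change_f(r2_reduced=0.36, r2_full=0.38, n=240, k_full=5, q=2)
print(round(f_stat, 2), dfs)
```

The resulting statistic is compared against an F distribution with the returned degrees of freedom. A small R2 change spread over several added predictors produces a small F, which is the pattern that leads to a "no significantly increased variance" conclusion for later steps.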
Control bias. Step 1 accounted for significant variance in both samples (S1: F (3, 236) = 46.4, p < .001, R2 = 0.36; S2: F (3, 124) = 29.01, p < .00001, R2 = 0.40). No other model steps accounted for significantly increased variance (S1: R2change < .025, p > .06; S2: R2change < .023, p > .294). Contingency type (positive vs. 0) was associated with control bias (S1: β = -0.65, t (236) = -10.70, p < .001; S2: β = -0.74, t (124) = -9.242, p < .001). Higher IDD-C scores predicted higher control bias in Sample One (β = 0.18, t (236) = 3.38, p < .001); however, this result does not replicate in Sample Two (β = 0.05, t (124) = 0.723, p = 0.47). Figure 2 shows these results.
Given the significant effects for IDD-C in Sample One, we conducted preregistered analyses to examine depression and anxiety symptoms separately, by replacing the IDD-C with the MASQ Anhedonic Depression and Anxious Arousal subscales. When compared to Step 1, the Step 2 model predicted significantly increased variance, F (8, 232) = 23.75, p < .001, R2 = 0.43. As above, contingency type (positive vs. zero) significantly affected control bias, β = -0.63, t (231) = -3.01, p = .003. Anhedonic Depression did not significantly relate to control bias, but greater Anxious Arousal accompanied increased control bias, β = 0.26, t (231) = 3.72, p < .001. The interaction between contingency type and Anxious Arousal was also significant, β = -0.15, t (231) = -2.55, p = .01. No other model steps accounted for significantly increased variance, all R2change < .009, p > .46. See Figure 3. We did not conduct these follow-up analyses in Sample Two given the null effects of IDD-C on control bias.
Outcome prediction bias. As noted above, outcome prediction bias was calculated as the absolute difference between actual control and estimated control, wherein estimated control was the reported probability that the lightbulb would light after (1) pressing and (2) not pressing the button, averaged across the estimates participants made every ten trials. Table 3 shows descriptive statistics for estimated control and results of one-sample t-tests comparing estimated control to actual control.
Condition | Actual control | Sample One: Estimated control M (SD) | Sample One: t | Sample One: df | Sample Two: Estimated control M (SD) | Sample Two: t | Sample Two: df
Zero contingency, 25% probability | 0 | 11.33 (11.16) | 9.25*** | 82 | 10.83 (11.7) | 5.85*** | 39
Zero contingency, 75% probability | 0 | 17.51 (16.35) | 9.40*** | 76 | 23.71 (20.08) | 7.65*** | 41
75% positive contingency | 75 | 38.2 (24.6) | -13.4*** | 79 | 49.18 (24.15) | -7.25*** | 45
Note. * p < .05 ** p < .01 *** p < .001. The t values are from one-sample t-tests comparing estimated control to actual control.
Mirroring results for the end-of-task subjective control measure, participants’ estimated control was significantly higher than actual control in the zero contingency conditions and significantly lower than actual control in the positive contingency condition.
Overall, results for outcome prediction bias were consistent across samples. The Step 1 model accounted for significant variance in both samples (S1: F (3, 236) = 142.7, p < .001, R2 = 0.64; S2: F (3, 124) = 50.3, p < .001, R2 = 0.54). In this model, contingency type (positive vs. zero contingency) was significant (S1: β = -0.84, t(236) = -18.45, p < .001; S2: β = -0.83, t (124) = -11.767, p < .001), as was contingency probability (75% vs. 25%) (S1: β = 0.09, t (236) = 2.06, p = .04; S2: β = 0.21, t (124) = 2.97, p < .001). No other model steps accounted for significantly increased variance, (S1: R2change < .004, p > .36; S2: R2change < .03, p > .403). As with control bias, Figure 4 illustrates that participants overestimate control in the zero contingency conditions and underestimate it in the positive contingency condition. IDD-C score was not significantly associated with outcome prediction bias in either sample. Moreover, our results allow us to track the development of beliefs over time. As detailed in the Supplement, those results are consistent with those reported above: participants update their beliefs in the direction of the control they experience, and IDD-C scores do not affect either learning or reported control.
Frequency of responding. Participants pressed the button on roughly half of all trials (S1: M = 57%, SD = 19%, Range = 3% – 100%; S2: M = 58%, SD = 17%, Range = 28% – 100%).
In both samples, the Step 1 model accounted for significant variance (S1: F (3, 236) = 4.76, p = .003, R2 = 0.04; S2: F (3, 124) = 7.88, p < .001, R2 = 0.14). Contingency type was significantly associated with frequency of responding in both samples (S1: β = 0.17, t (236) = 2.29, p = 0.02; S2: β = 0.26, t (124) = 2.68, p = 0.008). The IDD-C score reached significance only in Sample Two, and only marginally so, with higher depression symptom severity corresponding to slightly higher response rates (β = 0.17, t (124) = 2.02, p = 0.046). No other model steps accounted for significantly increased variance in either sample (S1: R2change < .026, p > .26; S2: R2change < .078, p > .164).
Overconfidence Results
In parallel with the contingency task, we conducted four hierarchical linear regressions predicting overestimation, overplacement, and both forms of overprecision. With the exception of one significant negative effect of the IDD-L on overplacement, current and lifetime depressive symptoms were not significantly associated with any overconfidence measure in either Sample One or Sample Two (see Tables S8-S11). Across the overconfidence models, there was no consistent evidence of curvilinear effects, and inclusion of polynomial terms did not substantively change the nature of any overconfidence findings (see the Supplement for full results).
Discussion
In a well-powered, pre-registered study, we aimed to test the depressive realism hypothesis using the latest methodologies. We refined the contingency task to provide objective indices of control and to provide more measures of subjective control. Our study supplemented the original task with a well-validated measure of overconfidence, and included continuous indices of depressive severity, lifetime depression, hypomanic tendencies, anxiety, and trauma history.
Across two samples, we find no support for the depressive realism hypothesis. We first analyzed depressive realism using a standard, end-of-task measure eliciting control estimates. In our first sample, results contradicted depressive realism: participants with higher depressive symptoms displayed the greatest illusory control. This result appears to be driven by anxiety symptoms. In our second sample, depression symptoms did not significantly predict illusory control. In conditions with no contingency between response and outcome, participants tended to overestimate control in post-task ratings; higher anhedonic depression did not influence this illusory control. As expected, in circumstances with high actual control, they underestimated control (Gino et al., 2011); those with higher anhedonic depression displayed the same underestimation as did others. That is, estimates of control were noisy, regardless of current or lifetime depression. When we considered more frequent ratings of subjective control, we still found no association between depressive symptoms and outcome prediction bias. Past work has suggested that psychomotor retardation in depression could be one mechanism through which depressive symptoms contribute to lower illusory control (Blanco et al., 2012), but we found no support for this hypothesis when examining frequency of responding in our samples.
Results of the overconfidence task mirror this null effect, in that current depressive symptom severity was not associated with any form of overconfidence. This not only contradicts the depressive realism hypothesis, but also contradicts past work reporting decreased overestimation in depressed individuals (Fu et al., 2005, 2012; Stone et al., 2001). The possibility remains that other measures of depressive symptoms or other measures of perceived control could demonstrate such a relationship, but that would raise the question of how to conceptually reconcile differences with the two measures of depression and multiple measures of control that we used.
We explored a number of clinical features that could influence the effect of depression symptoms on illusory control and/or overconfidence. Our results suggest that anxiety may influence illusory control, supporting previous suggestions to consider anxiety symptoms when testing depressive realism (Ackermann & DeRubeis, 1991). We also examined the potential effect of levels of mental-health related impairment, current and lifetime diagnoses of or treatment for depression and anxiety, hypomanic tendencies, and trauma history on depressive realism. None of these variables showed a robust influence on either illusory control or overconfidence.
Our results must be considered in light of a number of potential methodological limitations. As with any behavioral task, it is possible that our null results are best attributed to measurement error. We took two steps to address this concern. First, we included not only the original contingency task but also a well-validated overconfidence task (D. A. Moore & Healy, 2008) to consider various domains in which depressive realism may manifest. Second, to assess measurement error as a potential cause of our null results, we tested mixed models controlling for trial-level random effects in the contingency task, a statistical approach that has been suggested to be one of the best available tools to account for measurement error (Parsons et al., 2019). Across these measurement and statistical approaches, and across two samples, we find no evidence that depressive symptoms are tied to greater realism. Of course, we may not have fully recovered true effects (Rouder et al., 2019). Future research could further probe the reliability of the contingency task. It is possible that different tasks may be better suited to capturing the depressive realism effect.
While we acknowledge the inherent limitations of sampling via MTurk, there is evidence that MTurk workers show more severe depression symptoms than does the general population (Arditte et al., 2016), and that their depression scores correspond to elevations in self-reported use of antidepressant medications (Ophir et al., 2019). We find remarkably consistent results across our MTurk and undergraduate samples, bolstering our confidence in the generalizability of our findings. That said, one limitation is that we exclusively sampled US participants. We also did not measure participants’ racial/ethnic identities, income, or socioeconomic status, limiting our ability to characterize our sample, and thus to understand how sample characteristics may influence the generalizability of our results. Extending our study to participants located outside the US could add valuable insight into generalizability, but we regard it as unlikely that a three-way interaction between culture, control, and depression proves a robust moderator of the results we report.
It is also worth noting that because we were able to pre-screen fewer undergraduate than MTurk participants, Sample One included more participants with moderate-to-severe depression symptoms than Sample Two. However, the original depressive realism study (Alloy & Abramson, 1979) was conducted within a subclinical undergraduate sample, casting doubt on sample selection as the cause of our failure to find depressive realism. Several other studies utilizing contingency tasks have detected a depressive realism effect in dysphoric populations (Ackermann & DeRubeis, 1991). Similarly, one may criticize the reliance on self-rated symptoms rather than clinical interviews in assessing depression, but previous meta-analytic results find a stronger depressive realism effect for self-rated than for interview-rated symptoms (M. T. Moore & Fresco, 2012).
The “sadder but wiser” hypothesis argues that depression enhances accuracy in judgment (Alloy & Abramson, 1979, 1988). Our results contradict these claims. We do not find that depressive symptoms correlate consistently with assessments of personal control. The implication is that errors in the assessment of personal control or overconfidence do not appear to be integral to depression.
Contributions
Contributed to conception and design: DAM, ASD, SLJ
Contributed to acquisition of data: ASD, KTG
Contributed to analysis and interpretation of data: ASD, KTG with supervision from DAM, SLJ
Drafted and/or revised the article: ASD, DAM, SLJ, KTG
Approved the submitted version for publication: ASD, DAM, SLJ, KTG
Competing Interests
An author on this manuscript (initials: DAM) serves as an associate editor at Collabra: Psychology.
Data Accessibility Statement
Our materials, including survey instruments, data, and analysis code, are posted publicly on the project’s Open Science Framework page.
Footnotes
Because Alloy and Abramson (1979) reported finding a depressive realism effect among only female participants, our preregistered analysis includes gender. We do not find any significant main or interaction effects of gender.
We erroneously omitted the IDD-C score × contingency type interaction from the pre-registration, but it is essential to understanding whether true contingency influences any potential depressive realism effect. Likewise, we realized after pre-registering that we needed a model including frequency of responding to test for a depressive realism effect in control and outcome bias.