Tests of generalizability can diversify psychological science and improve theories and measurement. To this end, we conducted five studies testing the cognitive vulnerability to depression hypothesis featured in the hopelessness theory of depression: Study 1 was conducted with Honduran young adults (n = 50); Study 2 was conducted with Nepali adults (n = 34); Study 3 was conducted with Western hemisphere adults (n = 104); Study 4 was conducted with Black U.S. adults (n = 119); and Study 5 was conducted with U.S. undergraduates (n = 110). Results showed that cognitive vulnerability could be measured reliably in diverse populations and the distribution of vulnerability scores was similar for all samples. However, the tendency to generate negative inferences about stress had different implications for depression depending on sample; the association between cognitive vulnerability and depressive symptoms did not generalize to Honduran and Nepali participants. It is now necessary to understand why a negative cognitive style confers risk for depression in some contexts but not others (e.g., whether the explanation lies in measurement, theory, or both). The results also suggest that understanding and reducing the global burden of depression will require more than simply “translating” existing cognitive measures and theories to other countries.
Psychological scientists and the samples they use tend to be White and WEIRD (Western, Educated, Industrialized, Rich, and Democratic; Clancy & Davis, 2019; Henrich et al., 2010). According to Thalmayer and colleagues (2021), only 11% of the world’s population is represented in the top psychology journals, with 60% of authors and samples specific to the United States. Further, only 3-6% of all published research articles on mental health include participants from low- and middle-income countries (El Khoury et al., 2021; Gallegos et al., 2013; Haque & Kamal, 2019; Patel & Sumathipala, 2001). Given such limited representation, it is not surprising that the validity and robustness of psychological theories have been called into question.
There are a number of reasons why the scientists and research samples in psychological science are White and WEIRD. First, there is systemic racism in who is admitted to doctoral programs and hired as faculty at research universities (e.g., Baumgartner et al., 2010; D’Augelli & Hershberger, 1993; Gildersleeve et al., 2011). Second, it is far easier to recruit and run studies with convenience samples (e.g., college students) than with more representative samples. It is not merely a matter of swapping White and WEIRD samples for non-White and non-WEIRD samples; this research often requires travel, money, and additional research personnel. There can also be language barriers and differences in access to technology (e.g., computers and the internet).
Another factor that may be preventing scientists from testing their theories in more diverse samples is fear that the original findings will not generalize. Although it is unrealistic to assume that all theories will apply equally well to all people and in all contexts, there seems to be a growing intolerance and stigmatization of failed replications. This is probably a result of psychology’s replication crisis (e.g., Open Science Collaboration, 2015), the prevalence of questionable research practices (e.g., p-hacking, HARKing, and piecemeal publication; Simmons et al., 2011), and recent high-profile cases of scientific misconduct (e.g., Giner-Sorolla, 2017; Ledford, 2010; Murray, 2002).
It is critical to incentivize and value research on generalizability (Haeffel & Cobb, 2022). This work can diversify psychological research, which makes for better science (e.g., Meadon & Spurrett, 2010; Medin & Lee, 2012; Plaut, 2010). Further, focusing on generalizability may help curb researchers’ tendency to overstate the generalizability and real-world implications of their findings (e.g., Rad et al., 2018). If researchers know that their theories will be tested in diverse samples, then they may be more conservative with their conclusions. Finally, studies that test generalizability can inform measurement and constrain theories. Replication attempts in diverse samples are going to expose problems with current theories and measures, which will require scientists to rethink and reformulate them. As scientists, we likely learn more from a failed replication than a successful replication because scientific progress comes from disagreement (i.e., we learn more from being wrong than from being right; Haeffel, 2022; Popper, 1959).
It is important to underscore that if a study does not replicate in a more diverse sample, then it does not mean that the original study was invalid or the result of scientific misconduct. Instead, it means that the theory and findings may only apply to a specific sample or within a specific context. For example, a study testing a theory about marriage perceptions in American college students may not generalize to other cultures around the globe (or even older adults in the U.S.), but that does not invalidate the work in college students. It simply constrains the theory and conclusions that can be made; namely, that the results only apply to American college students. The next step, then, is to understand why the theory does not apply to other populations and revise it. If researchers are to embrace more rigorous tests of their theories, then the field must accept the idea that being wrong can facilitate scientific progress. Clearly, a replicable knowledge base is a stalwart of science, but refutations are often more informative than confirmations. Failed replications push researchers to be more sophisticated and specific in their theorizing. They force researchers to specify the conditions under which a theory will and will not apply. This will lead to stronger theories and a more inclusive and global psychological science.
The purpose of this research is to test the generalizability of hopelessness theory’s cognitive vulnerability hypothesis (Abramson et al., 1989) in five unique samples from around the globe. We chose to focus on the hopelessness theory of depression because it is a well-articulated theory, has strong empirical support, and has a psychometrically validated measure of cognitive vulnerability (Haeffel et al., 2008). According to the theory (Abramson et al., 1989), some people have a cognitive vulnerability that puts them at heightened risk for developing depressive symptoms and depressive disorders. Cognitive vulnerability refers to the tendency of a person to generate overly negative inferences about the causes, consequences, and self-worth implications of stressful life events. Specifically, when faced with a stressful life event, an individual with high levels of cognitive vulnerability is likely to: (a) attribute the event to stable and global causes; (b) view the event as likely to lead to other negative consequences; and (c) construe the event as implying that he or she is unworthy or deficient. Individuals who generate these three types of negative inferences are hypothesized to be at greater risk for depression than people who do not generate these types of inferences.
Cognitive vulnerability tends to solidify during early adolescence, and then shows trait-like stability throughout the life span (see Romens et al., 2008 for review). It can be thought of as one’s “native language” for interpreting life stress (Haeffel & Kaschak, 2019). Research has largely supported the cognitive vulnerability hypothesis featured in the hopelessness theory (Abramson et al., 1999; Hankin et al., 2005; Hong et al., 2006; Metalsky et al., 1987; Swendsen, 1997). Over 50 published studies have shown that it is possible to reliably assess individual differences in cognitive vulnerability and that those differences precede and predict future depressive symptoms and disorders (see Haeffel et al., 2008 for review).
As is the case with much of psychological science, a limitation of research on hopelessness theory’s cognitive vulnerability hypothesis (negative attributional style) is that studies have largely focused on college students from the United States (Haeffel et al., 2008). Conducting studies with undergraduates makes sense because they are at the peak age for developing depression and experience increased levels of interpersonal stress (Hankin et al., 1998). However, it is now time to take the next logical step in this area of research and examine the generalizability of the theory to other populations. This is particularly important for understanding depression as it is a global disorder affecting over 300 million people around the world (World Health Organization, 2017).
To address this gap in the literature, we conducted five studies. Studies 1 and 2 were conducted with Honduran and Nepali adults, respectively. These were opportunity samples; they were chosen because they represent understudied populations and because we had established relationships with locals at the study sites. Study 1 was conducted in a sample of Latino young adults from an orphanage in the Republic of Honduras. Honduras has some of the highest rates of depression in the world, yet it has received among the least empirical attention by depression researchers (Gallegos et al., 2013; Gibbons, 2020; Thalmayer et al., 2021; VandenBos & Winkler, 2015). The rate of adolescent depression in the Republic of Honduras is 21% (Central American Refugee Health Profiles, 2017) whereas in the United States it is approximately 13% (NIMH, 2017). Study 2 was conducted with a small sample of Nepali adults. Nepal has a population of about 30 million, and it is one of the most economically disadvantaged countries in South Asia. Nearly 90% of the population lives in rural areas, and “psychiatric disorders make up approximately 11% of the total amount of disease and illness in the country” (Hall et al., 2016, p. 278). According to Luitel and colleagues (2015), the mental health care situation in Nepal is “dire.” Studies 3-5 recruited samples living in the Western Hemisphere. Study 3 was conducted with a sample of adults from Western countries. Study 4 was conducted with a sample of Black adults from the United States. Study 5 was conducted with a U.S. sample of college students.
We had three primary hypotheses. First, we hypothesized that it would be possible to reliably detect individual differences in cognitive vulnerability and depressive symptoms in all five samples. Specifically, we predicted the translated (Spanish and Nepali) measures of cognitive vulnerability and depressive symptoms would demonstrate good reliability as operationalized by measures of internal consistency (i.e., alpha and McDonald’s omega); based on prior research, we expected alpha and McDonald’s omega levels to exceed .80 for the measure of cognitive vulnerability and the two measures of depressive symptoms. Second, we hypothesized that cognitive vulnerability would show a normal distribution of scores in all five samples. These hypotheses are based on the assumption that generating causal attributions for events in one’s life is a universal human process. In other words, the tendency to generate interpretations of stressful life events that have negative implications for one’s future and self-worth is present to a greater or lesser degree in all humans. Finally, we hypothesized that cognitive vulnerability would be moderately positively correlated (ranging from .3-.6) with depressive symptoms (regardless of depressive symptom measure used) as found in prior research (see Haeffel et al., 2008 for review).
All three of our hypotheses assume that cognitive vulnerability is a universal human construct and that past research using Western and White samples will generalize to (i.e., the construct will behave similarly in) other cultures and populations. This assumption is partially supported by preliminary work investigating related vulnerability constructs such as those featured in Beck’s theory (1979; dysfunctional attitudes) and Nolen-Hoeksema’s ruminative response theory (1991; brooding). Research on these theories found that the results reported in U.S. samples generalized to populations living in the United Arab Emirates, Egypt, Asia, and Europe (Beshai et al., 2016; Chahar Mahali et al., 2020; Thomas & Altareb, 2012).
However, there is also reason to hypothesize that cognitive vulnerability may not generalize to more diverse populations. Research shows that a person’s level of cognitive vulnerability is influenced by his or her early social environment. One particularly important predictor of cognitive vulnerability levels is the direct feedback adolescents receive about stress from their parents, peers, and teachers (Alloy et al., 2001; Garber & Flynn, 2001; Mezulis et al., 2006; Peterson & Seligman, 1984; Stark et al., 1996). Cultural norms may also influence the types of feedback adolescents receive about stressful life events. In more individualistic cultures, adolescents might be more likely to receive feedback related to one’s worth and individual future. However, in more collectivistic cultures, feedback may focus on the role of the greater social context in interpreting the stressful event (Knyazev et al., 2017). This means that there may be some cultures in which cognitive vulnerability levels are lower on average.
Further, the cultural context may affect the degree to which cognitive vulnerability and depression levels are associated. In research using undergraduates, cognitive vulnerability scores are normally distributed, and stress tends to be acute (rather than chronic). In these studies, high levels of cognitive vulnerability combine with more frequent stress to predict depressive symptoms. However, in non-White and non-WEIRD samples, it is possible that cultural and geographic differences in stress will affect how cognitive vulnerability is associated with depression. For example, in populations with chronically high levels of stress (e.g., poverty or war), people may experience depressive symptoms regardless of their cognitive vulnerability level. In other words, when stress is overwhelming, it may push any individual (even those with low levels of cognitive vulnerability) over the threshold into depression. In this case, cognitive vulnerability will not be as highly correlated with depressive symptoms (e.g., titration model; Abramson et al., 1997). There is preliminary evidence from studies testing similar cognitive vulnerability constructs showing that cultural context matters for risk for depression. For example, Auerbach and colleagues (2010) found that the interaction effect of perceived control and life stress on depressive symptoms was different for Canadian and Chinese adolescents (see Cohen et al., 2013 for similar findings in a Chinese adolescent sample for negative attachment cognitions). These results indicate that it may be presumptuous to hypothesize that the results of the current study will directly map onto prior research using American college students.
Study 1 – Honduran Young Adults. Participants were 50 young adults (26 females, 24 males) living in a Honduran orphanage approximately one hour outside of the capital city, Tegucigalpa. Participants ranged in age from 18-27 (M = 24, SD = 3). The Honduran orphanage provides living accommodations, medical care, food, and water for approximately 300 orphans. Schooling is required during the week, and upon entering adolescence, each student is also given the opportunity to learn a “trade” (shoemaking, electricity, etc.) and receive higher education. Data were collected during May 2018.
Study 2 – Nepali Adults. Participants were 34 native Nepali adults (29 females, 5 males) ranging in age from 22-55 (M = 42, SD = 8). They were receiving services from Koshish, which is part of “a national self-help organization working in the field of mental health in Nepal.” Clients of Koshish (mainly women and girls) are often homeless, estranged from their families, and/or victims of sexual/physical abuse. All clients are screened at intake for acute episodes of mental illness and tend to stay at the resident home an average of three months. The Koshish transit home houses approximately thirty people at a time. The participants in this study were involved in self-help therapy groups, which Koshish runs for clients with chronic mental health problems. Data were collected during July 2019.
Study 3 – Western Hemisphere Adults. Participants were 104 adults (65 females, 39 males) from the Western hemisphere who were recruited via an online participant platform (Prolific; Palan & Schitter, 2018). Participants ranged in age from 18-61 (M = 36) and resided in a variety of Western countries including Canada, Ireland, Scotland, England, and the United States. Participants were homogeneous with regard to race and ethnicity, with 95% self-reporting as Caucasian, 3% African American, 4% “other”, 1% Hispanic, and 1% Asian. One hundred thirteen participants started the study, but 9 participants were removed for not completing at least one of the measures. All participants included in the analyses completed at least 95% of the items on all measures. Participants’ write-in answers on the CSQ were examined for repetitions and unusual/nonsensical answers; no participants were excluded for these reasons. Data were collected during January 2020.
Study 4 – Black U.S. Adults. Participants were 119 adults (70 females, 49 males) from the United States who were recruited via an online participant platform (Prolific). Participants ranged in age from 18-66 (M = 31), were all from the United States, and all self-identified as Black. One hundred twenty participants started the study, but 1 participant was removed for not completing at least one of the measures. All participants included in the analyses completed at least 95% of the items on all measures. Participants’ write-in answers on the CSQ were examined for repetitions and unusual/nonsensical answers; no participants were excluded for these reasons. Data were collected during October 2019.
Study 5 – U.S. Undergraduates. Participants were 110 undergraduates (79 females, 31 males) from a private, midsized university in the Midwestern United States who were recruited through the University’s online volunteer participant pool. Participants ranged in age from 18-23 (M = 19) and the ethnicity of the sample was representative of the race and ethnicity of the University: 78% Caucasian, 13% African American, 2% Asian-American, 2% Native American, and 4% “Other.” One hundred fifteen participants started the study, but five participants were removed for not completing at least one of the measures. All participants included in the analyses completed at least 95% of the items on all measures. Data were collected between November 2019 and January 2020.
Analyses were not focused on null hypothesis testing, but rather on the distribution of scores, reliability estimates for measures, and a simple bivariate correlation. A post hoc power analysis using G*Power showed that a sample size of 34 participants was needed to detect a one-tailed (positive) bivariate correlation of medium effect size (r ≈ .4) with power of .80 and an alpha level of .05. The medium effect size was based on the correlation sizes between cognitive vulnerability and depressive symptoms found in prior work (correlations typically range between .3 and .6; see Haeffel et al., 2008 for review). Note that the Honduran and Nepali samples were not sufficiently powered to detect small effects.
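To make the logic of this sample-size calculation concrete, the following is a minimal sketch using the Fisher r-to-z normal approximation. It is not the exact routine implemented in G*Power, so the sample size it returns can differ by a few participants from the value reported above. The function name and its defaults are illustrative assumptions, not part of the original analysis.

```python
import math
from statistics import NormalDist


def approx_n_for_correlation(r, alpha=0.05, power=0.80, one_tailed=True):
    """Approximate N needed to detect a population correlation r.

    Uses the Fisher r-to-z normal approximation (an assumption of this
    sketch; exact methods can give slightly different answers).
    """
    norm = NormalDist()
    # Critical value for the chosen alpha level (one- or two-tailed).
    z_alpha = norm.inv_cdf(1 - alpha if one_tailed else 1 - alpha / 2)
    # Quantile corresponding to the desired power.
    z_power = norm.inv_cdf(power)
    # Fisher transform of the expected correlation.
    effect = math.atanh(r)
    return math.ceil(((z_alpha + z_power) / effect) ** 2 + 3)


# Medium effect size assumed in the text (r of about .4), one-tailed.
print(approx_n_for_correlation(0.4))
# A smaller expected correlation requires a larger sample.
print(approx_n_for_correlation(0.3))
```

The second call illustrates why the smaller samples could not detect small effects: the required N rises quickly as the expected correlation shrinks.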
Cognitive Vulnerability. Two measures were used to assess cognitive vulnerability (as featured in the hopelessness theory of depression; Abramson et al., 1989). The Honduran young adults, Western adults, Black U.S. adults, and U.S. college students (Studies 1, 3, 4 and 5) were administered the Cognitive Style Questionnaire (CSQ; Haeffel et al., 2008). Participants are presented with 6 hypothetical negative events (3 achievement and 3 interpersonal) and asked to imagine the events happening to themselves; they then are asked to write down what they believe to be the cause of the event. Likert scale ratings (1-7) are then made for the three vulnerability dimensions featured in the hopelessness theory of depression: stability and globality; probable consequences of each event; and the self-worth implications of each event. An individual’s CSQ score is his/her average Likert scale rating across these three dimensions (stability and globality, consequences, and self-worth characteristics) for the 6 hypothetical negative life events. Composite scores range from 1 to 7, with higher scores reflecting greater levels of cognitive vulnerability to depression. The CSQ has good internal consistency, reliability, and validity (see Haeffel et al., 2008 for review; Haeffel & Howard, 2010). The CSQ was translated to Spanish for use with the Honduran young adults with the help of the administrators of the Honduran orphanage as well as the University of Notre Dame Romance Languages department. The final translated version of the CSQ was evaluated and approved by administrators of the orphanage.
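The composite scoring described above can be sketched as follows. This is an illustration only: it assumes one 1-7 rating per dimension per event, and the dimension labels and data layout are hypothetical rather than taken from the CSQ materials, whose actual item structure may differ.

```python
from statistics import mean

# Hypothetical labels for the three vulnerability dimensions.
DIMENSIONS = ("stability_globality", "consequences", "self_worth")


def composite_score(responses):
    """Average all 1-7 ratings across events and dimensions.

    responses: list with one dict per hypothetical event, mapping each
    dimension label to a Likert rating between 1 and 7.
    """
    ratings = []
    for event in responses:
        for dimension in DIMENSIONS:
            rating = event[dimension]
            if not 1 <= rating <= 7:
                raise ValueError(f"rating out of range: {rating}")
            ratings.append(rating)
    return mean(ratings)


# Made-up ratings for six events (illustrative, not study data).
example = [
    {"stability_globality": 5, "consequences": 4, "self_worth": 3},
    {"stability_globality": 2, "consequences": 3, "self_worth": 4},
    {"stability_globality": 6, "consequences": 5, "self_worth": 4},
    {"stability_globality": 3, "consequences": 3, "self_worth": 3},
    {"stability_globality": 4, "consequences": 5, "self_worth": 6},
    {"stability_globality": 4, "consequences": 4, "self_worth": 4},
]
print(composite_score(example))
```

Because the composite is a simple mean of the ratings, it stays on the same 1-7 metric as the individual items.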
The Nepali sample was administered a single-scenario CSQ called the Particular Inference Questionnaire (PIQ; Haeffel, 2011; Metalsky et al., 1987). This short measure of vulnerability was used because of time constraints and feasibility issues related to translating a longer measure from English to Nepali. The PIQ uses a single idiographic scenario about which participants make Likert ratings (1-7) on the three vulnerability dimensions featured in the hopelessness theory of depression: stability and globality; probable consequences of each event; and the self-worth implications of each event. Like the CSQ, an individual’s PIQ score is his/her average rating across these three dimensions (stability and globality, consequences, and self-worth characteristics) for the single scenario. The PIQ was translated from English to Nepali with the help of a paid native speaking translator. The translated version of the PIQ was then evaluated and approved by three separate KOSHISH staff, including the head of program development. The PIQ was also administered to the Western adult, Black adult, and college student samples so that we could make direct comparisons to the Nepali sample.
Depressive Symptoms. Two measures of depressive symptoms were used. The Honduran young adult, Western adult, Black adult, and college student samples (Studies 1, 3, 4 and 5) were administered the Beck Depression Inventory (BDI; Beck et al., 1979). The BDI is a 21-item self-report questionnaire. Scores are created by summing the items (range 0-63), with higher scores indicating greater levels of depressive symptoms. The BDI has demonstrated high internal consistency (coefficient alpha is typically greater than .8), good test-retest reliability (r = .60-.83 for nonpsychiatric samples), and validity with both college and psychiatric samples (see Beck et al., 1988, for review). The BDI was translated to Spanish for use with the Honduran young adults with the help of the administrators of the Honduran orphanage as well as the University of Notre Dame Romance Languages department. The translated version of the BDI was then evaluated and approved by administrators of the orphanage.
The Nepali sample was administered the anhedonic subscale of the Mood and Anxiety Symptom Questionnaire (MASQ; Watson et al., 1995). The MASQ is a 90-item self-report questionnaire that assesses general depressive and specific anhedonic symptoms of depression based on the tripartite theory of anxiety and depression (Clark & Watson, 1991). The anhedonic subscale contains 22 items that assess symptoms hypothesized to be specific to depression (e.g., low positive affect, loss of pleasure in daily activities). The MASQ has demonstrated good reliability and validity (e.g., Watson et al., 1995). The anhedonic subscale was translated from English to Nepali with the help of a paid native speaking translator. The translated version of the MASQ was then evaluated and approved by three separate KOSHISH staff, including the head of program development. The MASQ anhedonic subscale was also administered to the Western adult, Black adult, and college student samples so that we could make direct comparisons to the Nepali sample.
All five studies used a cross-sectional design in which measures of cognitive vulnerability and depressive symptoms were administered in a single study session. Measures were administered in-person for Studies 1 and 2 (Honduran and Nepali samples, respectively). In Study 1 (Honduran sample), native Spanish-speakers were on site during the questionnaire session to ensure an appropriate testing environment and to answer any questions. The questionnaires were administered during three classroom sessions, with the first two sessions consisting of twenty students and the third consisting of ten students. In Study 2 (Nepali sample), native Nepali speakers were on site during the questionnaire session to ensure an appropriate testing environment and to answer any questions. The questionnaires were administered across five group sessions with 5-9 people in each session. Studies 3-5 (Western adults, Black U.S. adults, and U.S. undergraduates, respectively) were conducted online. Studies 3 and 4 used the online platform Prolific to recruit participants, who were paid approximately $9.50 per hour. We chose Prolific because its participants tend to be more diverse, more scientifically naïve, and more honest, and they provide higher quality data than those on other popular online platforms (Peer et al., 2017, 2022). Study 5 used the University’s online extra credit psychology participant pool. Participants were given course credit for their participation.
For the Honduran study, we chose to create new Spanish versions of the CSQ and depressive symptom measures rather than use existing translations for several reasons (note, we did not find Nepali versions of these measures and, thus, had to create our own versions). The first reason is that our partnerships with the Honduran and Nepali sites were not primarily for research. The researchers from our lab were volunteers at these sites and were there to serve the needs of the local volunteer organizations. The research studies were secondary to the volunteer effort. When we were afforded the opportunity to conduct research, we wanted to ensure the studies were collaborative and not directive. To this end, the research members provided information about our work to the local administrators by presenting short talks and discussions about cognition and depression. We then consulted with the local leaders to determine how best to conduct the study. As part of this collaboration, the local team was involved in the study design and measure translations. We believe that this approach led to assessments that more closely matched the local dialect and had increased face validity. The drawback of this approach is that we did not use an existing measure validated by prior work.
All procedures for all five studies were approved by the University of Notre Dame human subject review board as well as by each on-site location (Honduran orphanage and Koshish, respectively). The complete data set and other materials can be found here: https://osf.io/umg9p/. No data were excluded from the data set or the analyses. All measures administered to participants are described in the sections below and included in the data set and analyses.
Hypothesis 1: Cognitive Vulnerability Can Be Measured Reliably
The measure of cognitive vulnerability (CSQ) exhibited strong psychometric properties: coefficient alpha and McDonald’s omega were greater than .80 in all four samples in which it was administered (see Figure 1; Honduran sample [alpha = .81; McDonald’s omega = .83], Western adult sample [alpha = .89; McDonald’s omega = .89], Black adult sample [alpha = .92; McDonald’s omega = .93], and the undergraduate sample [alpha = .89; McDonald’s omega = .88]).
In contrast, the single-scenario measure of cognitive vulnerability (PIQ) exhibited low reliability estimates in the four samples in which it was administered. Coefficient alpha and McDonald’s omega ranged between .45 and .60 (see Figure 1; Nepali sample [alpha = .45; McDonald’s omega = .60], Western adult sample [alpha = .52; McDonald’s omega = .60], Black adult sample [alpha = .48; McDonald’s omega = .59], and the undergraduate sample [alpha = .57; McDonald’s omega = .60]). The low reliability was most likely due to the small number of items on the scale relative to the CSQ.
Both measures of depressive symptoms also exhibited strong psychometric properties in all five samples (see Figure 1). Consistent with prior research, results showed that the BDI had strong reliability estimates in the four samples in which it was administered: Honduran sample (alpha = .91; McDonald’s omega = .92), the Western adult sample (alpha = .92; McDonald’s omega = .92), the Black adult sample (alpha = .93; McDonald’s omega = .93), and the undergraduate sample (alpha = .91; McDonald’s omega = .91). The anhedonic subscale of the MASQ also exhibited strong reliability in the four samples in which it was administered: the Nepali sample (alpha = .89; McDonald’s omega = .89), the Western adult sample (alpha = .93; McDonald’s omega = .93), the Black adult sample (alpha = .95; McDonald’s omega = .95), and the undergraduate sample (alpha = .93; McDonald’s omega = .93).
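For readers less familiar with the internal consistency estimates reported in this section, the following is a minimal sketch of how coefficient alpha can be computed from an item-level response matrix. The function name and the toy data are assumptions for illustration; the sketch does not reproduce the software used for the reported analyses, and McDonald’s omega (which requires fitting a factor model) is not covered.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Coefficient alpha for a respondents-by-items matrix.

    item_scores: list of respondents, each a list of k item ratings
    (complete data assumed; no missing values).
    """
    k = len(item_scores[0])
    # Transpose so that each tuple holds all responses to one item.
    items = list(zip(*item_scores))
    # Sum of the sample variances of the individual items.
    item_variance_sum = sum(variance(column) for column in items)
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Toy data (illustrative only): 5 respondents answering 4 items.
toy = [
    [2, 3, 2, 3],
    [4, 4, 5, 4],
    [1, 2, 1, 2],
    [5, 5, 4, 5],
    [3, 3, 3, 4],
]
print(round(cronbach_alpha(toy), 2))
```

The formula also shows why a very short scale can yield a low alpha: with few items, the item variances make up a larger share of the total-score variance unless the items are very highly intercorrelated.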
Hypothesis 2: Cognitive Vulnerability Scores Will Be Normally Distributed
Cognitive vulnerability scores, as operationalized by both the CSQ and PIQ, tended to be normally distributed (see Figures 1 and 2). The p-values for the Shapiro-Wilk test of normality were not significant in any of the samples for the CSQ. These results are consistent with the hypothesis that cognitive vulnerability, as measured by the CSQ, is a continuous construct present to a greater or lesser degree in most people; however, more research using diverse populations from around the world is needed before making definitive conclusions. Tests of normality for the PIQ were also consistent with a normal distribution, with one exception: the Shapiro-Wilk test for the PIQ was significant (p = .02) for the Western adult sample, which indicates non-normality.
Although the distributions of scores were highly similar for all five samples, overall levels of cognitive vulnerability varied by population (i.e., the distribution shifted). As shown in Figure 2, cognitive vulnerability scores, as measured by the CSQ, were significantly lower in the Honduran sample (M = 3.44, SD = .93) than in Western adults (M = 4.12, SD = .92; t = -4.22, p_tukey < .001), Black U.S. adults (M = 3.85, SD = 1.02; t = -2.61, p_tukey = .046), and U.S. undergraduates (M = 4.28, SD = .84; t = -5.11, p_tukey < .001). CSQ scores in undergraduates were not significantly different from those of Western hemisphere adults (t = -1.05, p_tukey = .72), but were significantly greater than those reported by Black U.S. adults (t = 3.23, p_tukey = .007). Western hemisphere adults and Black U.S. adults did not differ on CSQ scores (t = 2.12, p_tukey = .15).
Cognitive vulnerability scores, when measured with the PIQ, were significantly greater in the Nepali adults (M = 4.52, SD = 1.04) than in Western adults (M = 3.67, SD = 1.16; t = 3.67, p_tukey = .002), Black U.S. adults (M = 3.49, SD = 1.26; t = 4.51, p_tukey < .001), and U.S. undergraduates (M = 3.63, SD = 1.13; t = 3.89, p_tukey < .001).
In contrast to the results for cognitive vulnerability, the distribution of depressive symptom scores tended to be skewed (see Figure 1). Samples also differed on levels of depressive symptoms (see Figure 2). When measured with the BDI, U.S. undergraduates (M = 8.41, SD = 7.78) reported significantly lower levels of depressive symptoms than Honduran young adults (M = 13.97, SD = 10.93; t = 3.33, p_tukey = .005), Western hemisphere adults (M = 12.51, SD = 9.73; t = 3.06, p_tukey = .01), and Black U.S. adults (M = 11.95, SD = 10.98; t = 2.73, p_tukey = .03). When measured with the anhedonic subscale of the MASQ, Western adults (M = 67.20, SD = 18.52) had significantly greater levels of depressive symptoms than U.S. undergraduates (M = 58.50, SD = 16.87; t = 3.49, p_tukey = .003). There were no other significant differences among the four samples (Nepali adults [M = 61.13, SD = 13.95]; Black U.S. adults [M = 64.53, SD = 20.11]).
Hypothesis 3: Cognitive Vulnerability Will Be Correlated with Depressive Symptoms
The strength of the association between cognitive vulnerability and depressive symptoms was consistent across Western adult, Black U.S. adult, and U.S. undergraduate samples (i.e., WEIRD participants; see Figure 3). For these three populations, we found significant moderate correlations between the BDI and CSQ ranging from .53 to .59 (correlations ranged from .48 to .49 when measuring depressive symptoms using the anhedonic subscale of the MASQ; tests of significance between correlations found no differences for the three samples, all p values > .29).
There was greater variation in the association between cognitive vulnerability and depressive symptoms when cognitive vulnerability was measured using the PIQ. The correlations ranged from .31 (Western adults) to .54 (Black U.S. adults), with most of the correlations hovering around .40. The unusually low correlation in Western adults (.31) was partially due to an outlier, which, when removed, increased the correlation to .37. The sizes of the correlations found in these three populations were similar to those found in prior research with U.S. college students (correlations typically r = ~.4; Haeffel et al., 2008) and not significantly different from one another (tests of significance between correlations found all p values > .29).
In contrast, the association between cognitive vulnerability and depressive symptoms did not generalize to Honduran young adults and Nepali adults (non-WEIRD samples). In the Honduran young adults, the correlation between cognitive vulnerability (CSQ) and depressive symptoms (BDI) was .30, which was weaker (but not statistically significantly different; p = .11) than that found in the three WEIRD populations (in which correlations ranged from .53 to .59). Moreover, it appears that the small but significant correlation in Honduran young adults was due to a single outlier (see Figure 3). The correlation between cognitive vulnerability and depressive symptoms was no longer significant (r = .12, p = .42) when the outlier was removed from the data; the correlation without the outlier was significantly weaker in strength than the correlation in the other samples (p < .01 for all correlation comparisons). Similarly, the moderate association between cognitive vulnerability and depressive symptoms found in college students did not replicate in the Nepali adult sample. There was not a significant association between the PIQ and the anhedonic subscale of the MASQ (r = -.11, p = .52); this value was significantly weaker than that found in the WEIRD samples (p < .001 for all correlation comparisons).
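The comparisons between correlations reported above can be illustrated with a short sketch. The text does not name the test that was used, so the Fisher r-to-z test for independent samples shown here is an assumption, as is the pairing of r = .53 with the Western hemisphere sample size (n = 104); the function name is hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def compare_independent_correlations(r1, n1, r2, n2):
    """Two-tailed Fisher r-to-z test for correlations from two independent samples."""
    # Fisher's transformation makes the sampling distribution of r roughly normal.
    z1, z2 = atanh(r1), atanh(r2)
    # Standard error of the difference between the two transformed correlations.
    se = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Honduran young adults (r = .30, n = 50) against one WEIRD sample.
# Assumption: r = .53 is paired with n = 104 purely for illustration.
z, p = compare_independent_correlations(0.30, 50, 0.53, 104)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Under these assumed inputs the difference is not significant at the .05 level, which is in line with the non-significant comparison reported for the full Honduran sample.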
The people who might benefit most from psychological science (and from whom we might learn the most) are the least likely to be considered in our theories and studies. The purpose of the current research was to add to the 3-6% of published research articles on mental health that include participants from low and middle-income countries (Patel & Sumathipala, 2001), and to illustrate how tests of generalizability can advance science. Specifically, we tested the generalizability of hopelessness theory’s cognitive vulnerability hypothesis in five unique samples from around the world.
The results corroborated some aspects of hopelessness theory, but also revealed aspects of the theory that may not generalize. In support of the theory, results showed that individual differences in cognitive vulnerability could be measured reliably (with the CSQ) in diverse samples. Indicators of internal reliability (alpha and McDonald’s omega) were similar in strength to those reported in prior studies using college samples (Haeffel et al., 2007, 2008). This finding suggests that the Spanish translation of the CSQ can be used to measure individual differences in this construct. Notably, the Spanish translation of the BDI and Nepali translation of the MASQ also exhibited strong psychometric properties.
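For readers less familiar with the reliability indices mentioned above, the following is a minimal sketch of how coefficient alpha is computed from item-level responses. The item scores are hypothetical and are not data from any of the five studies; McDonald's omega requires fitting a factor model and is not shown.

```python
from statistics import variance


def cronbach_alpha(items):
    """Coefficient alpha for a list of items, each a list of respondent scores."""
    k = len(items)
    # Total scale score for each respondent (sum across items).
    totals = [sum(scores) for scores in zip(*items)]
    # Alpha compares the summed item variances with the variance of the totals.
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses from five respondents to three items.
hypothetical_items = [
    [2, 4, 3, 5, 1],
    [3, 5, 3, 4, 2],
    [2, 5, 4, 5, 1],
]
print(f"alpha = {cronbach_alpha(hypothetical_items):.2f}")
```

Alpha approaches 1 as the items covary more strongly with one another, which is why similar values across translated and English versions suggest the items hang together in a comparable way.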
Further, the distributions of vulnerability scores were highly similar in all five populations. Cognitive vulnerability was a normally distributed continuous construct present to a greater or lesser degree in all five samples. In other words, generating inferences about the stability and globality of life stress was something that all participants were able to do. This work is consistent with recent theorizing (Haeffel & Kaschak, 2019) proposing that cognitive vulnerability can be thought of as one’s native language for interpreting life stress.
However, results also revealed important differences among samples. The association between cognitive vulnerability and depressive symptoms did not generalize. Corroborating prior research, there was a moderately sized (r = ~.4 - .5) positive correlation between these constructs in samples of Western adults, Black U.S. adults, and U.S. undergraduates that was relatively robust to differences in measurement (e.g., CSQ vs. PIQ, BDI vs. MASQ). However, the association between cognitive vulnerability and depressive symptoms did not hold in Honduran young adults and Nepali adults. It is noteworthy that Honduran young adults reported significantly lower levels of cognitive vulnerability than U.S. undergraduates (about .5 of a standard deviation lower) yet reported twice the level of depressive symptoms. This is the opposite of what one would predict based on the cognitive vulnerability hypothesis. Further, the small significant correlation between cognitive vulnerability and depressive symptoms found in the Honduran group was no longer significant when removing an outlier. Similarly, in the Nepali adults, there was not a significant correlation between cognitive vulnerability and depressive symptoms. Although Nepali adults had among the highest levels of cognitive vulnerability, they did not report higher levels of depressive symptoms.
These results “disagree” with prior research testing hopelessness theory’s cognitive vulnerability hypothesis (Popper, 1959), and suggest that the generalizability of the theory and/or the measurement of its constructs may be limited to specific populations. In other words, it may not be possible to simply translate existing measures in this area and expect to find the same results in more diverse populations. The most parsimonious explanation for the current findings is that the cognitive vulnerability hypothesis does not hold in these populations; that is, the theory only applies to specific cultures or contexts. However, alternative explanations (e.g., methodological differences such as those discussed below) must be ruled out before concluding that cognitive vulnerability to depression (as conceptualized in the hopelessness theory) is largely a Western phenomenon.
One explanation for the discrepant results concerns the role of stress. The cognitive vulnerability hypothesis follows a diathesis-stress model (Abramson et al., 1989), which predicts that vulnerability and stress combine to predict depression. Typically, this interaction is assumed to be synergistic, whereby high levels of cognitive vulnerability combine with high levels of stress to predict depression (Lewinsohn et al., 2001). The synergistic interaction pattern cannot explain why Honduran young adults with low levels of cognitive vulnerability reported high levels of depressive symptoms, as they should be highly resistant to depression. However, a titration model (Abramson et al., 1989, 1997) of vulnerability-stress interactions can explain these findings. In the titration model, even low “doses” of stress are sufficient to trigger depression in highly vulnerable individuals, whereas high doses of stress can trigger depression in both low and highly vulnerable individuals. This means that under conditions of high stress, cognitive vulnerability should have a weak association with depressive symptoms because everyone is at risk. But, under conditions of low stress, only cognitively vulnerable individuals should be at risk for depression.
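The logic of the titration model can be made concrete with a toy sketch. The linear threshold, the 0 to 1 vulnerability scale, the maximum dose of 10, and all function names are illustrative assumptions rather than parameters drawn from hopelessness theory or from the current data.

```python
def stress_needed(vulnerability, max_dose=10.0):
    """Stress dose required to trigger depression (assumed linear, illustration only).

    vulnerability runs from 0 (not vulnerable) to 1 (highly vulnerable);
    the required dose falls as vulnerability rises.
    """
    return max_dose * (1.0 - vulnerability)


def at_risk(vulnerability, stress, max_dose=10.0):
    """True when the stress dose meets or exceeds the person's threshold."""
    return stress >= stress_needed(vulnerability, max_dose)


def risk_gap(stress, low_vulnerability=0.2, high_vulnerability=0.8):
    """Difference in risk status between a high- and a low-vulnerability person."""
    return int(at_risk(high_vulnerability, stress)) - int(at_risk(low_vulnerability, stress))


# Hypothetical stress doses chosen to fall below and above both thresholds.
for label, stress in [("low stress", 3.0), ("high stress", 9.0)]:
    print(label, "| low vulnerability at risk:", at_risk(0.2, stress),
          "| high vulnerability at risk:", at_risk(0.8, stress),
          "| gap:", risk_gap(stress))
```

In this sketch the gap between vulnerable and non-vulnerable individuals is present under low stress and disappears under high stress, which is the pattern that would weaken the vulnerability-symptom association in a highly stressed sample.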
The titration framework can help explain the weak association between cognitive vulnerability and depressive symptoms in Honduran young adults. Despite having relatively low levels of cognitive vulnerability, Honduran young adults still had high levels of depressive symptoms. This makes sense if Honduran young adults in an orphanage setting were also experiencing high levels of stress, because both vulnerable and non-vulnerable individuals would then be at risk for depression.
The lack of an association between cognitive vulnerability and depressive symptoms (see scatter plot in Figure 3) in the Nepali sample is more difficult to understand. The Nepali sample had significantly higher levels of cognitive vulnerability (as measured by the PIQ) than the other populations yet did not report greater levels of depressive symptoms. According to the titration model (and the synergistic model), this is possible if participants were not experiencing stress or had a reduction in stress. In this specific case, the services provided by Koshish (mental health treatment and housing) may have resulted in significant stress relief.
This study had both strengths and limitations. A major strength of the current research is the use of five unique samples, including both WEIRD and non-WEIRD participants. Further, we demonstrated that it is possible to create translated measures of cognitive vulnerability (e.g., Spanish version of the CSQ; Nepali version of the PIQ) that have psychometric properties similar to those of their English counterparts. Finally, this work shows that it is possible to falsify hypotheses and contribute new knowledge without relying solely on null hypothesis testing. Rejecting the null is neither necessary nor sufficient for acquiring new knowledge (Meehl, 1978). Good theories lend themselves to a variety of examinations that do not always require p < .05 (e.g., examining distributions of scores, plotting bivariate correlations, translating and testing commonly used measures, etc.; Cumming, 2014; Sakaluk, 2016).
This research also had limitations, the most prominent being the use of a cross-sectional design and small sample sizes in two of the studies. The cross-sectional design did not allow for conclusions about temporal precedence. Thus, it remains possible that the association between cognitive vulnerability and depressive symptoms could emerge over a prospective longitudinal time frame (although researchers would still need to explain why the cross-sectional associations do not appear to generalize to all populations). Relatedly, we did not include a measure of stress, which is needed to determine if the titration model can account for the non-replications found in our research. Determining the pattern of the vulnerability-stress interaction will be an important avenue for future research testing the generalizability of hopelessness theory.
The sample sizes for the Honduran and Nepali studies were small and not powered to detect correlations smaller than those reported in prior work. This underscores the difficulty in doing generalizability research; in our case, this included short time frames for assessment and the availability of people on-site who could help with the translation of measures and answering participant questions. It was not possible to run 300 Honduran young adults or Nepali women because our partnership sites did not even have access to that many people. That said, the analyses were largely focused on the distribution of scores and a simple bivariate correlation rather than more sophisticated analyses requiring large sample sizes for null-hypothesis testing. Prior to running the current set of studies, we had not given a great deal of thought to how to test theories without searching for p < .05. However, alternative options exist. For example, Sakaluk (2016) argues that science should “explore small, confirm big.” According to this approach, scientists start with small scale studies that test the hypothesis of interest, but then work to corroborate those effects with larger data sets. Similarly, Lakens (2021) argues that underpowered studies can still provide information about potential effect sizes. The caveat to these other approaches is that they require a strong theory (e.g., highly specific hypotheses), an established knowledge base, and valid and reliable measures. In our case, the hopelessness theory and its accompanying measure, the CSQ, have been tested extensively in the literature (Liu et al., 2015). The cognitive vulnerability hypothesis has been tested cross-sectionally (Just et al., 2001; Oliver et al., 2007), longitudinally (e.g., Hankin et al., 2004; Lakdawalla et al., 2007; Russell et al., 2014), and in behavioral high-risk designs (e.g., Abramson et al., 1999; Alloy et al., 2000, 2006). 
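The point about statistical power can be illustrated with a rough calculation. The sketch below uses the Fisher z approximation for a two-tailed test of a single correlation, takes r = .4 as the benchmark from prior work and the Honduran and Nepali sample sizes (n = 50 and n = 34), and treats the 80% power target and r = .2 comparison value as assumptions; function names are hypothetical.

```python
from math import atanh, ceil, sqrt
from statistics import NormalDist

NORMAL = NormalDist()


def power_for_correlation(r, n, alpha=0.05):
    """Approximate two-tailed power to detect a population correlation r with n cases."""
    z_crit = NORMAL.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the Fisher z approximation.
    expected_z = atanh(r) * sqrt(n - 3)
    return NORMAL.cdf(expected_z - z_crit) + NORMAL.cdf(-expected_z - z_crit)


def required_n(r, power=0.80, alpha=0.05):
    """Approximate sample size needed to detect r with the requested power."""
    z_alpha = NORMAL.inv_cdf(1 - alpha / 2)
    z_power = NORMAL.inv_cdf(power)
    return ceil(((z_alpha + z_power) / atanh(r)) ** 2 + 3)


# Sample sizes from the Honduran (n = 50) and Nepali (n = 34) studies.
for n in (50, 34):
    print(f"n = {n}: power for r = .4 is {power_for_correlation(0.4, n):.2f}, "
          f"power for r = .2 is {power_for_correlation(0.2, n):.2f}")
print("n needed for r = .4 at 80% power:", required_n(0.4))
```

Under this approximation, samples of this size have reasonable power for a correlation near .4 but little power for substantially smaller correlations, consistent with the limitation described above.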
There are decades of research (see reviews by Haeffel et al., 2008 and Liu et al., 2015) showing how cognitive vulnerability behaves; we know it is normally distributed, has moderate cross-sectional correlations with depressive symptoms, predicts depressive symptoms longitudinally, and has strong temporal stability. The CSQ and mood measures have been used extensively in past research, which means it is possible to compare the distributions of scores on the CSQ and correlation plots across studies.
Nonetheless, more research is needed before definitive conclusions can be made about the generalizability of hopelessness theory’s cognitive vulnerability hypothesis. Based on the current findings, we can conclude that “something is going on” regarding the generalizability of the hopelessness theory. Our results did not replicate/generalize to at least some non-Western populations when translating existing measures and using the same basic administration techniques. The question is “why?” Are the discrepant results due to a problem with the theory or a problem with the methodology (i.e., confounds such as measurement issues [translated vs. English] or differences in familiarity with answering questionnaires)? Future research will need to disentangle why and how the differences between samples emerged. In the meantime, it is still useful to publish these kinds of discrepant findings even if we do not fully understand why the differences arise. These disagreements make us question our theories as well as the generalizability of our methods. This two-step approach is used in a variety of areas in science, including psychotherapy research. There are hundreds of studies published every year demonstrating that Treatment A is more efficacious than Treatment B; however, we often do not understand why that is (or why any specific treatment works).
That said, the obstacles to understanding non-replications in diverse samples are not insignificant. It is often impossible to use the exact same methodologies used for convenience samples. For example, our Nepali women were not Prolific users, and it would make little sense to run Western adults by sitting on the floor with them and the paper-and-pencil questionnaires. These kinds of methodological differences may be inevitable. Thus, it may be necessary to take a multi-method approach in future research. For example, in this case, the next step is to gather further information (e.g., via qualitative interviews) about cultural norms and participants’ understanding of these constructs (both vulnerability and depressive symptoms). This information may provide clues about the relevance and understanding of these constructs in specific cultures and how to better adapt the language of the measures to tap the constructs of interest.
In conclusion, there is now a spotlight on systemic racism and a growing choir of voices requesting greater diversity in psychological science (Lewis, 2021). Tests of generalizability using diverse samples can help to solve this problem as well as lead to better theories. These results show that we cannot simply “transport” the same measures and methods to different samples and expect the same results. The results also suggest that the tendency to generate negative inferences about stress has different implications for depression depending on culture. This means that researchers must now try to understand why negative cognitions confer risk for depression in some contexts but not others.
G.H., H.B., and M.V.M. contributed to conceptualization and design.
G.H., H.B., and M.V.M. contributed to acquisition of data.
G.H., H.B., M.V.M., and L.B. contributed to analysis and interpretation of data.
G.H., H.B., M.V.M., and L.B. contributed to drafts and revisions of the article.
G.H. supervised and mentored the student authors on the manuscript.
All authors approved the submitted version for publication.
We would like to thank Nuestros Pequeños Hermanos International and Koshish for allowing us to work with their residents. We would also like to thank Andrea Alatorre for her comments on an earlier version of the manuscript. This research was presented as a poster at the 2021 Association for Psychological Science Conference (recipient of the RISE Award and SSCP Global Mental Health Award).
The research was funded, in part, by UROP grants from the University of Notre Dame. Funding for Study 4 was provided by a Race and Ethnicity research grant from the University of Notre Dame.
We declare no conflicts of interest.
Data Accessibility Statement
Measures and participant data can be found on this paper’s OSF project page: https://osf.io/umg9p/