The Moral Character Questionnaire (MCQ) is a new measure of global morality and six morality domains (i.e., honesty, compassion, fairness, loyalty, purity, and respect). Critically, there is limited evidence on the questionnaire’s psychometric properties, especially its predictive validity with measures other than self-reports. In two demographically diverse, German-speaking samples (N1 = 7,138, N2 = 616), we found acceptable-to-good reliability, unidimensionality of three domain-specific scales (i.e., compassion, loyalty, and respect), and good discriminant validity. Higher-order and bifactor solutions both fitted the data reasonably well. However, only the compassion and fairness scales showed acceptable convergent validity across 27 broad and narrow morality-related traits, and only the purity scale was consistently correlated with six different measures of self-reported (im)moral behavior (i.e., civic behavior, counterproductive work behavior, crime and analogous behavior, prosocial behavior intentions, vandalism, and violence). No MCQ scale predicted behavioral measures of dishonesty (Mind Game) or prosociality (Social Value Orientation Slider and Triple Dominance Measure). Overall, our results call into question the validity of the MCQ for predicting morality-related behavior in particular and call for further validation of the questionnaire to clarify what it can and cannot be used for.
Moral character describes a set of characteristics capturing individual differences in thoughts, feelings, and behaviors related to morality. In other words, it reflects individuals’ tendency to act in ethical as opposed to unethical ways and to think and feel accordingly. In line with this definition, various traits encompassing individual differences in moral character, such as honesty, compassion, and fairness, have been shown to reliably predict a range of morality-related behaviors, including honest vs. dishonest behavior in experimental cheating paradigms (Heck, Thielmann, et al., 2018; Pfattheicher, Schindler, et al., 2019), prosocial behavior in economic games (Thielmann et al., 2020; Zhao & Smillie, 2015), and donating money or time to those in need (Best & Freund, 2021; Ferguson et al., 2019; Nilsson et al., 2020; Yelbuz & Thielmann, 2024).
Given the ubiquity of morality-related behaviors, research on moral character is rapidly gaining traction within psychology and beyond (Lieder et al., 2022; McAdams & Mayukha, 2023). Recent efforts within personality psychology specifically focus on how to theoretically conceptualize and consequently assess the “moral personality.” In this vein, Furr et al. (2022) proposed a conceptualization of moral character along with a self-report scale to operationalize it, the Moral Character Questionnaire (MCQ). According to the authors, global moral character involves four psychological dimensions, namely cognition (“I believe it’s important to be moral”), motivation (“I want to be moral/act morally”), behavior (“I behave in a generally moral way”), and identity (“I see myself as a generally moral person”). These dimensions tap into individuals’ idiosyncratic understanding of what “moral” means, an approach also known as “first-person” or idiographic, and they align well with recent social-cognitive and personality-based views of morality (Dahl, 2023; Jayawickreme et al., 2019). In addition, Furr and colleagues proposed six specific domains of moral character, namely honesty, compassion, fairness, loyalty, purity, and respect, which were derived from the Honesty-Humility dimension of the HEXACO Model of Personality Structure (Ashton & Lee, 2007) and the five moral intuitions specified in Moral Foundations Theory (Haidt & Graham, 2007). The authors refer to this approach as “third-person” or “aggregated” because it draws on moral domains that have been specified elsewhere. Altogether, the MCQ provides a measure of global morality and six specific domains of moral character, all assessed with reference to cognition, motivation, behavior, and identity.
Across five calibration and five confirmation samples (96 ≤ N ≤ 9,365), Furr et al. (2022) provided initial support for the validity of the instrument. Specifically, the MCQ scales (global and domain-specific) turned out to be internally unidimensional; that is, each was adequately represented by its corresponding items, with no subdimensions emerging and no items loading poorly on their respective scale. Moreover, the MCQ scales showed acceptable internal consistency (i.e., Cronbach’s alpha and McDonald’s omega ≥ .70), although the domain-specific scales of purity and respect showed less than desirable (< .70) omega values. Finally, the six domain-specific scales were moderately to strongly correlated with each other (.28 ≤ r ≤ .56) and converged well in a higher-order global morality factor, suggesting that the six domains share a common denominator of global morality.
The authors also considered the correlations of the MCQ scales with basic personality traits (HEXACO and Big Five dimensions) and different criterion measures to investigate convergent, discriminant, and predictive validity. Among basic personality traits, the strongest correlations of the global morality scale were apparent for Big Five Agreeableness (r = .46) and Conscientiousness (both HEXACO and Big Five, mean r = .36 across samples), whereas descriptively weaker correlations occurred for Honesty-Humility (mean r = .22) and HEXACO Agreeableness (mean r = .14). At the domain-specific level, HEXACO Agreeableness showed the most consistent associations with all MCQ scales after accounting for the remaining HEXACO dimensions (with a mean coefficient of .16), whereas descriptively lower associations were apparent for Honesty-Humility (mean coefficient of .10). Regarding more specific criterion measures, more varied correlational patterns emerged, as had been predicted by the authors. All MCQ scales showed medium to large correlations (.35 ≤ r ≤ .73) with narrow moral traits (e.g., self-report measures of fairness, honesty, and moral identity), somewhat lower correlations (.17 ≤ r ≤ .56) with moral values and obligations (e.g., filial piety, moral concerns) – except for authoritarian child-rearing values, for which almost no correlations emerged – and relatively small correlations (.06 ≤ r ≤ .24) with the three self-report behavioral measures considered (i.e., hedonistic behavior, profanity use)1, with the exception of purity (both r = .42). These results were mostly consistent with the authors’ hypotheses, which were based on theoretical conceptualizations from five independent raters.
Notwithstanding these findings, there is still limited evidence on the psychometric properties of the MCQ. First, the evidence on convergent and discriminant validity provided by Furr et al. (2022) is rather weak, as the correlations with morality-related basic personality traits were only small in size and, in part, smaller than correlations with traits that are largely unrelated to morality2 (e.g., Conscientiousness). More critically still, the scales’ predictive validity has only been tested using a few self-reports of moral tendencies and behavior (Furr et al., 2022; Prentice et al., 2020). Using self-reports to validate a self-report measure arguably comes with caveats, not least that any correlation between the two may, at least in part, be attributable to common method bias rather than providing evidence for the measure’s true predictive validity. Finally, another issue pertains to the distinction of different moral character domains: If the MCQ is meant to provide researchers with a flexible instrument to measure only those domains they consider relevant to moral character (Furr et al., 2022), one would expect the domains to be empirically distinguishable from one another, in the sense that each domain best predicts (conceptually) closely related outcomes. For example, the honesty domain should particularly correlate with measures of (dis)honesty, whereas the fairness domain should particularly correlate with measures of prosociality (i.e., fair behavior). Furr et al. (2022) had hypothesized (and found) quite homogeneous correlations across the domains of honesty, fairness, and loyalty, while they anticipated some distinctions for compassion, purity, and respect, which were confirmed empirically. Such differential relations have, however, not been tested with criteria other than self-reports, leaving the question of whether the distinction of different morality domains is useful for prediction essentially unanswered.
The present investigation aimed to tackle these limitations and, thereby, provide an arguably more stringent test of the psychometric properties of the MCQ. Specifically, we thoroughly investigated the factorial structure and validity of the MCQ in two large and demographically heterogeneous samples. Sample 1 (N = 7,138) served to examine the factorial structure of the MCQ using both higher-order and bifactor modeling and to test convergent validity with (low levels of) the Dark Factor of Personality (D, Moshagen et al., 2018). D represents the common core of all socially/ethically aversive traits and is defined as the general tendency to maximize one’s utility at the expense of others. Thus, by definition, D is incompatible with moral character and has, correspondingly, been consistently linked to various immoral and socially aversive tendencies and behaviors (e.g., Hilbig et al., 2021, 2022, 2023; Moshagen et al., 2020; Scholz et al., 2022).
In Sample 2 (N = 616), we sought to replicate these analyses and, additionally, used several self-report and behavioral criterion measures to examine the convergent, discriminant, and predictive validity of the MCQ. Here, we relied on data from a larger project, the Prosocial Personality Project (PPP), which includes various measures of morality-related traits and fully consequential behaviors. Regarding the traits, we selected those measures from the PPP that cover both broad personality traits (i.e., HEXACO dimensions, Big Five Agreeableness, D, and Personality Inventory for DSM-5 domains) and narrower traits. Among the broad traits, we distinguished between morality-related traits to test convergent validity and traits that are conceptually unrelated to morality to test discriminant validity. We based this distinction on conceptual and empirical grounds (see Footnote 2), meaning, we regarded those traits as morality-related that are commonly considered moral in the literature (Graham et al., 2011; Gulliford et al., 2021; Hofmann et al., 2014; Peterson & Seligman, 2004; Stahlmann & Ruch, 2020; Wright et al., 2017) and/or show robust relations with (im)moral behavior (e.g., Scholz et al., 2022; Thielmann et al., 2020). Narrower traits, in turn, included various specific morality-related constructs (e.g., fairness concerns, moral disengagement, moral identity). We anticipated global morality to be related to all of them, compassion to be specifically related to other measures of compassion and altruism (i.e., the BFAS Compassion facet, the Prosocialness Scale, the Santa Clara Brief Compassion Scale, the Interpersonal Reactivity Index, the Prosocial Personality Inventory Altruism subscale), and fairness to be specifically related to the fairness subscale of the VIA Inventory. To assess predictive validity, we chose three established behavioral measures of (im)moral behavior from the PPP, namely the Mind Game (Jiang, 2013) and a lottery task (inspired by Heck, Hoffmann, et al., 2018) for dishonest behavior and the Social Value Orientation (SVO) Slider Measure (Murphy et al., 2011) for prosociality. We expected stronger correlations between honesty and dishonest behavior, and between fairness and prosocial behavior as compared to the remaining morality dimensions. Moreover, we selected a set of self-report measures of unethical and prosocial behavior (e.g., crime, counterproductive work behavior, prosocial behavior intentions) to further examine the MCQ’s predictive validity.
Overall, our studies provide novel insights into the MCQ’s usefulness as a measure of moral character, both at a general and a domain-specific level.
Methods
Transparency and Openness
We report how we determined our sample size, all data exclusions, all manipulations, and all measures in the study. The datasets and analysis scripts are available on the Open Science Framework (OSF; https://osf.io/7mp45/). Both studies followed ethical guidelines for the treatment of human subjects. The studies were not pre-registered.
Participants and Procedure
Sample 1. Sample 1 comprised 7,138 participants (2,749 females, 4,310 males, 68 diverse) aged 18 to 86 years (M = 38.9, SD = 12.9). The majority resided in Germany (89.1%), Austria (3.8%), or Switzerland (2.7%). A post-hoc power analysis using semPower (Moshagen & Erdfelder, 2016) indicated that this sample size ensured very high power > .99 to detect model misspecifications for the most complex factor model we specified, i.e., the higher-order model (df = 246), corresponding to RMSEA ≥ .08 with an alpha error of .05. Similarly, following a simulation procedure for sample size estimation for bifactor models (Bader et al., 2022), we estimated a power of > .99 to detect model misspecifications corresponding to RMSEA ≥ .08 with α = .05. A sensitivity power analysis using G*Power (Faul et al., 2007) further indicated that this sample size allowed us to detect very small correlations (r = .05) with very high power of .99 (and two-tailed α = .05).
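For illustration, the following R sketch mirrors the two types of power analyses described above; the function calls and settings are our own illustrative choices (the authors report using semPower and G*Power), and the pwr call merely stands in for the G*Power sensitivity analysis.

```r
# Illustrative power calculations (a sketch; the authors' own analyses are documented on the OSF).
library(semPower)  # Moshagen & Erdfelder (2016)
library(pwr)       # stand-in for the G*Power sensitivity analysis

# Post-hoc power to detect model misspecifications corresponding to RMSEA >= .08
# for the higher-order model (df = 246), with N = 7,138 and alpha = .05
semPower.postHoc(effect = .08, effect.measure = "RMSEA",
                 alpha = .05, N = 7138, df = 246)

# Sensitivity: power to detect a correlation of r = .05 with N = 7,138 (two-tailed alpha = .05)
pwr.r.test(n = 7138, r = .05, sig.level = .05, alternative = "two.sided")
```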
Sample 2. A total of 627 participants completed the MCQ as part of the PPP. Based on pre-specified exclusion criteria (see details in the PPP online documentation; https://osf.io/m2abp/), 616 participants (214 female, 402 male, 0 diverse) were retained. Participants were aged 22 to 74 years (M = 50.9, SD = 10.5) and had diverse educational backgrounds, with 38.2% holding a general certificate of secondary education, 25.7% a university-entrance diploma, and 36.2% a university/college degree. The post-hoc power analysis for the higher-order model again indicated power above .99. The sample size was also adequate for detecting misspecifications for the bifactor model with very high power (i.e., 1-β > .99). Finally, the sample size of Sample 2 allowed us to detect small correlations (r = .11) with satisfactory power of .80. Power simulations (with 10,000 samples) conducted with RRreg (Heck & Moshagen, 2018) further indicated that our sample size was sufficient to detect correlations of r = .15 between the MCQ scales and dishonest behavior in the Mind Game (where random noise is added by design; see details below) with satisfactory power of 1-β = .81 (assuming a true prevalence of dishonesty equal to 30%, as is commonly observed; Gerlach et al., 2019).
Participants in Sample 1 completed the MCQ on https://qst.darkfactor.org. This website provides information on D and allows individuals to take a measure of D (in different languages) and to receive feedback on their individual D score. In April 2023, participants completing a measure of D in German also had the opportunity to complete the German MCQ. Participation was voluntary and anonymous. All exclusion criteria were fixed a priori (for details, see https://osf.io/93tw6/). Data collection via the website was approved by the local ethics committee of the RPTU University Kaiserslautern-Landau, Department of Psychology (approval #LEK-154 and #LEK-567).
Participants in Sample 2 completed the MCQ in a follow-up wave of the PPP. The first measurement occasion (T1) was completed by 4,585 participants (2,356 female, 2,223 male, 6 diverse) spanning a wide age range from 18 to 78 years (M = 40.2, SD = 13.0). Participants were then re-invited to participate in several subsequent measurement occasions. The MCQ was completed in follow-up wave 2022-11a, which took place around three years after T1 (M = 1,087 days, SD = 6 days). Data used to evaluate the scale’s validity stem from eight different measurement occasions of the PPP (see the Measures section). Data for the PPP were collected online via a professional panel provider in Germany. Participation was voluntary and participants provided informed consent at each measurement occasion. Studies followed protocols and procedures preapproved by the local institutional review board. Different subsets of the PPP data have been used in prior publications (see https://osf.io/m2abp/); importantly, the MCQ data have not been used before.
Measures
Moral Character Questionnaire (MCQ). The MCQ (Furr et al., 2022) contains 30 items (see Appendix in the Supplementary Materials) measuring moral character along four dimensions, namely identity, cognition, motivation, and behavior. The instrument contains six items measuring global morality (three items for identity, and one for each remaining dimension) and four items for each of the six domain-specific scales of honesty, compassion, fairness, loyalty, purity, and respect (one for each dimension). To derive a German version, we used a standard backward translation procedure (Brislin, 1970).3
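To make the scoring explicit, a minimal R sketch follows. The column names (mcq_1 to mcq_30) and the block-wise item-to-scale assignment are assumptions inferred from the item numbers reported in the Results; negatively keyed items would need to be reverse-scored beforehand (see Furr et al., 2022, for the authoritative scoring key).

```r
# Minimal scoring sketch. Column names (mcq_1 ... mcq_30) and the block-wise assignment of
# items to scales are assumptions for illustration; negatively keyed items must be
# reverse-scored before averaging (not shown).
score_mcq <- function(d) {
  idx <- list(global = 1:6, honesty = 7:10, compassion = 11:14, fairness = 15:18,
              loyalty = 19:22, purity = 23:26, respect = 27:30)
  as.data.frame(lapply(idx, function(i) rowMeans(d[paste0("mcq_", i)], na.rm = TRUE)))
}
# usage: mcq_scores <- score_mcq(raw_data)
```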
Broad and Narrow (Morality-related) Personality Traits
Table S1 provides an overview of the self-report measures used to test convergent and discriminant validity of the MCQ, including construct definitions. Among the broad personality traits, we used the following morally tinged constructs to test for convergent validity: Honesty-Humility, HEXACO and Big Five Agreeableness, D, and Personality Inventory for DSM-5 (PID-5) Antagonism. To test for discriminant validity, we used the remaining four dimensions from the HEXACO model (Emotionality, Extraversion, Conscientiousness, and Openness to Experience) and the remaining four PID-5 dimensions (Negative Emotionality, Detachment, Psychoticism, and Disinhibition). Among narrow personality traits, we further selected 27 traits related to morality to test convergent validity of the MCQ (see Table S1 for details). All responses to the personality items were collected on a 5-point Likert scale (from 1 ‘strongly disagree’ to 5 ‘strongly agree’), except for the PID-5, which was assessed on a 4-point response scale (from 1 ‘does not apply at all’ to 4 ‘fully applies’).
Behavioral Measures
Dishonest Behavior. To measure dishonest behavior, we selected the Mind Game (Jiang, 2013) and a lottery task (Heck, Hoffmann, et al., 2018) from the PPP. In the Mind Game, participants were asked to think of a number (integer) between 1 and 8 and write it down on a sheet of paper. They were then presented with a randomly drawn target number between 1 and 8 (from a uniform distribution) and asked to report whether the number they had previously noted down matched the target number displayed on the screen, which should occur with a probability of 1/8, or 12.5%. If participants answered “yes”, they received a bonus payment of €2.00. If participants answered “no,” they did not receive a bonus payment. Participants were aware that the bonus payment solely depended on their response rather than on whether they had actually noted down the target number. In fact, the experimenter could not observe which number a participant had selected given that the data were collected online. Thus, cheating is possible and individually profitable while at the same time remaining non-incriminating. This is because there is a 12.5% chance of actually noting down the winning target number and thus any individual “yes”-response can plausibly stem from full honesty (and luck). However, although it is impossible to determine for any individual participant whether they lied, the prevalence of dishonesty can be estimated at the aggregate level since the probability of (honestly) obtaining the target number is known. More importantly, correlations between dishonesty and covariates (e.g., personality traits) can be computed at the individual level using tailored analyses that account for the random noise added to the responses (Heck & Moshagen, 2018; Moshagen & Hilbig, 2017). In other words, these analyses relate individual responses to the covariate(s) in question while correcting the estimate for the known probability of honestly winning. Thus, probabilistic cheating paradigms, such as the Mind Game, have become a gold standard for the study of individual differences in dishonesty (see, e.g., Hilbig, 2022). Participants completed the Mind Game as part of T6 of the PPP (April 2020).
Similarly, in the lottery task (Heck, Hoffmann, et al., 2018) participants were asked whether their mother’s birthday was in a certain randomly selected month (e.g., January), which should occur with a probability of 1/12, or 8.3%. If participants answered “yes”, they received a bonus payment of €5.00. If participants answered “no,” they did not receive a bonus payment. Participants were aware that the bonus payment solely depended on their response rather than on whether their mother’s birthday was indeed in that month. In fact, the experimenter could not observe which month the participant’s mother was born in given that this information was never collected. Thus, cheating was once again possible and individually profitable while at the same time remaining non-incriminating. Nonetheless, exactly as in the Mind Game, the prevalence of dishonesty can be estimated at the aggregate level and correlations between dishonesty and covariates can be computed. Participants completed the lottery task as part of follow-up wave 2023-05 of the PPP (May 2023).
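The aggregate-level prevalence estimate described above follows directly from the known baseline probability of honestly winning; the sketch below implements this logic (the individual-level analyses additionally rely on the RRreg package, see Data Analysis). Variable names are hypothetical.

```r
# Aggregate prevalence of dishonesty in probabilistic cheating paradigms:
# P(yes) = p_win + (1 - p_win) * d, where p_win is the known probability of honestly
# winning and d is the proportion of participants claiming a win dishonestly. Solving for d:
estimate_dishonesty <- function(yes_responses, p_win) {
  p_yes <- mean(yes_responses)            # observed proportion of "yes" responses (0/1 vector)
  d_hat <- (p_yes - p_win) / (1 - p_win)  # estimated prevalence of dishonesty
  max(0, d_hat)                           # sampling error can yield small negative values
}
# Mind Game:    estimate_dishonesty(mind_game_yes, p_win = 1/8)
# Lottery task: estimate_dishonesty(lottery_yes,   p_win = 1/12)
```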
Prosocial Behavior. To measure prosocial behavior, we relied on two different measures of Social Value Orientation (SVO), the six primary items from the SVO Slider Measure (Murphy et al., 2011) and the Triple Dominance Measure (Van Lange et al., 1997). The SVO Slider items involve choosing between nine distributions of outcomes (i.e., points that are worth money; in our case, €0.20 per 10 points) for oneself and another individual. Prosociality is measured as the resulting SVO angle (an individual’s position on the self/other-allocation plane), with higher values indicating more consideration of others’ payoffs at the expense of the self and, thus, more prosociality. To render decisions truly consequential, participants received behavior-contingent incentives. To this end, they were randomly assigned the role of the allocator or the recipient and received a bonus payment according to their own behavior (as allocator) or the randomly assigned partner’s behavior (as recipient) in one randomly chosen trial. Bonus payments varied between €0.30 and €2.00 (M = €1.54, SD = €0.31). Participants completed the SVO Slider as part of T5 of the PPP (March 2020). The Triple Dominance Measure, in turn, includes nine items, each of which asks respondents to choose one out of three own-other point allocations. One option maximizes the outcome for the self (e.g., 540 points for self, 280 for other), one option maximizes the sum of the outcomes for self and other (joint outcome; e.g., 480 points for both self and other), and one option maximizes the relative gain for the self vs. other (i.e., the positive difference between the outcomes for the self and the other; e.g., 480 points for self, 80 for other). As a measure of prosociality, we considered the number of prosocial (i.e., joint outcome maximizing) choices made in the Triple Dominance Measure. Points were converted at a rate of €0.30 per 100 points. Participants were randomly assigned to be paid either as sender or receiver, and then one of the nine trials was selected as relevant for payment (M = €1.37, SD = €0.08). Participants completed the Triple Dominance Measure at the same measurement occasion as the MCQ, i.e., follow-up wave 2022-11a (November 2022). In our sample, the Triple Dominance Measure and the Slider Measure assigned the same participants to the same SVO category 78.3% of the time, similar to previous reports (Murphy et al., 2011), and correlated moderately (r = .36, 95% CI [.28, .43], p < .001).
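For illustration, the SVO angle can be computed from the six primary Slider items as follows (Murphy et al., 2011); the function and variable names are hypothetical.

```r
# SVO angle (in degrees) from the six primary Slider items (Murphy et al., 2011):
# arctangent of the mean allocation to the other (minus 50) over the mean allocation
# to the self (minus 50). Higher angles indicate greater concern for the other's payoff.
svo_angle <- function(self_alloc, other_alloc) {
  atan((mean(other_alloc) - 50) / (mean(self_alloc) - 50)) * 180 / pi
}
# e.g., equal mean allocations to self and other (both > 50) yield an angle of 45 degrees,
# whereas purely self-maximizing choices yield an angle near 0.
```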
Self-Reported Behavioral Measures
Participants completed the following self-report behavioral measures as part of follow-up wave 2020-05b (May 2020).
Civic Behavior. Civic behavior was measured using six items from the World Values Survey (EVS/WVS, 2022) adapted by Hilbig et al. (2022) so as to assess behavioral tendencies to unduly claim government benefits, avoid the fare on public transport, steal property, cheat on taxes, accept a bribe or bribe someone, and engage in violence against other people. For instance, participants indicated on a 4-point Likert scale (from 1 ‘never’ to 4 ‘frequently’) how frequently they had claimed government benefits to which they were not entitled.
Counterproductive Work Behavior. Counterproductive work behavior was measured using the Deviant Workplace Behavior Scale (Bennett & Robinson, 2000; Zettler & Hilbig, 2010). This measure consists of 19 items that participants answered on a 7-point Likert scale (from 1 ‘never’ to 7 ‘every day’). The scale assesses the frequency with which participants have, in the past year, engaged in behaviors that violate important organizational norms, threatening the well-being of the organization or its members, or both (e.g., “Said something hurtful to someone at work”).
Crime and Analogous Behavior. To measure crime and analogous behavior, the PPP includes the Crime and Analogous Behavior Scale (Miller & Lynam, 2003). This scale consists of 10 yes/no items asking whether respondents have ever engaged in various, relatively infrequent unethical behaviors such as stealing or getting in a fight (e.g., “Have you ever stolen less than $50?”).
Prosocial Behavior Intentions. To measure prosocial behavior intentions, we selected the Prosocial Behavioral Intentions Scale (Baumsteiger & Siegel, 2019). This measure involves four items that participants responded to on a 7-point Likert scale (from 1 ‘would definitely not do’ to 7 ‘would definitely do’). The scale assesses how willing individuals would be to perform each helping behavior (e.g., “Comfort someone I know after they experience a hardship”).
Vandalism. Vandalism was measured using six items developed by Pfattheicher and colleagues (2019). These items assess intentional acts of destroying or defacing other people’s property (e.g., “Sometimes I have to hold myself back not to destroy anything for fun.”) on a 5-point Likert scale (from 1 ‘strongly disagree’ to 5 ‘strongly agree’).
Violence. As a measure of violence, we selected the MacArthur Community Violence Screening Instrument, Revised (Pailing et al., 2014) from the PPP. This scale contains 11 items measuring the frequency with which a person has engaged in violent behaviors over the past year (e.g., “Have you said something to another person, which was fully intended to hurt their feelings or make them feel bad about themselves?”). Responses were collected on a 4-point Likert scale (from 1 ‘never’ to 4 ‘frequently’).
Data Analysis
The analysis comprised several steps, including examinations of reliability, internal structure, and convergent, discriminant, and predictive validity of the MCQ scales. First, we calculated reliability indices (Cronbach’s alpha and McDonald’s omega) for both the MCQ’s global and the domain-specific scales. Values > .70 can be considered acceptable (Cortina, 1993). Next, we examined the internal structure of the MCQ scales using confirmatory factor analysis (CFA). We conducted three sets of CFAs (see Figures S1-3) using the R package lavaan (Rosseel, 2012), specifying a unidimensional model for the global morality scale and one for each of the six domain-specific scales (Figure S1), a higher-order model (Figure S2), and a bifactor model (Figure S3)4. The first two sets of models replicate the original validation study (Furr et al., 2022), whereas the bifactor model was added because it may provide some advantages over the higher-order model. Bifactor models are a more general version of higher-order models, which are nested within them as restricted special cases. Unlike higher-order models, bifactor models allow one to determine how much of the items’ explained variance is due to the general factor, and they do not assume proportionality (i.e., that the general factor explains an equal amount of variance across facets) – an assumption that is commonly violated and, thus, often induces model misfit. Indeed, the superiority of bifactor models has also been shown for other measures of morality, such as the Moral Foundations Questionnaire (Wormley et al., 2023). Bifactor models are also more robust than higher-order models when model fit is expected to be imperfect due to secondary loadings or residual correlations (Moshagen, 2023), as is the case for the MCQ (Furr et al., 2022, Table S3).
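The following lavaan sketch illustrates the three model types; the item names are hypothetical placeholders, and how the six global morality items enter the higher-order and bifactor models follows the specifications in Figures S2-S3 (omitted here for brevity).

```r
# Illustrative lavaan syntax for the three sets of CFAs (item names are hypothetical).
library(lavaan)

domains <- list(honesty = paste0("hon", 1:4), compassion = paste0("com", 1:4),
                fairness = paste0("fai", 1:4), loyalty = paste0("loy", 1:4),
                purity  = paste0("pur", 1:4), respect = paste0("res", 1:4))
first_order <- paste(mapply(function(f, i) paste(f, "=~", paste(i, collapse = " + ")),
                            names(domains), domains), collapse = "\n")

# (1) Unidimensional model for a single scale, e.g., global morality (items glo1-glo6)
uni_global <- paste("global =~", paste(paste0("glo", 1:6), collapse = " + "))

# (2) Higher-order model: six first-order domain factors plus a second-order global factor
higher_order <- paste0(first_order, "\nglobal =~ ", paste(names(domains), collapse = " + "))

# (3) Bifactor model: a general factor on all domain items plus specific domain factors
bifactor <- paste0("general =~ ", paste(unlist(domains), collapse = " + "), "\n", first_order)

# fit_bi <- cfa(bifactor, data = mcq_data, ordered = TRUE, orthogonal = TRUE)
# Reliability can be obtained with, e.g., psych::alpha() and psych::omega() on the item sets.
```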
For all CFAs, we treated items as ordinal and used robust standard errors and a Satorra-Bentler scaled test statistic (Satorra & Bentler, 2001). To assess model fit, we used multiple indices: the comparative fit index (CFI), the Tucker-Lewis index (TLI), the root mean squared error of approximation (RMSEA), and the standardized root mean squared residual (SRMR). According to common cutoff criteria (Hu & Bentler, 1999; Schermelleh-Engel et al., 2003), CFI and TLI > .95 and RMSEA and SRMR < .08 can be considered good. For the bifactor model, we additionally computed the explained common variance (ECV) as an indicator of how much variance is due to the general factor versus the specific factors. Although there is no consensus on cutoff criteria for the ECV, higher values (i.e., closer to 1) indicate that the general factor accounts for larger portions of the total variance (Rodriguez et al., 2016). We interpreted model fit as good overall if the majority of fit indices were acceptable in at least one of the two samples. When fit indices were discrepant, we prioritized RMSEA and SRMR over CFI and TLI. We gave lower priority to CFI because it is more strongly influenced by factor loadings than by actual model misfit (Moshagen & Auerswald, 2018). TLI, on the other hand, is biased by model size (Moshagen, 2012), which is particularly problematic in the present case given the large higher-order and bifactor models. In addition, we assessed the convergent and discriminant validity of the MCQ by computing zero-order correlations between the (manifest) mean scores of the MCQ scales and both broad and narrow personality traits. We conservatively considered correlations ≥ .30 as indicative of convergent validity and correlations < .20 as indicative of discriminant validity (and correlations between .20 and .30 as indicative of weak convergence). Although a correlation of .30 arguably represents a lower bound for convergent validity (considering meta-analytical correlations between Honesty-Humility and exploitation of ρ = −.48, or between Agreeableness and forgiveness of ρ = .33; Zettler et al., 2020), we opted for this conservative criterion because the time lag between the measurement of the validity criteria and the MCQ was considerable in most cases, reaching up to three years. To test the specificity of the correlations between the Compassion scale and other measures of compassion (BFAS Compassion facet, Prosocialness Scale, Santa Clara Brief Compassion Scale, Interpersonal Reactivity Index, and Prosocial Personality Inventory Altruism subscale) and between the Fairness scale and the fairness measure included in the VIA Inventory, we used one-tailed z-tests with α = .05 (critical z = 1.65), comparing the respective Compassion or Fairness correlation with the second-highest correlation among the remaining domains.
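The ECV can be computed from the standardized bifactor loadings as follows; this is a minimal sketch based on the formula in Rodriguez et al. (2016).

```r
# Explained common variance (ECV): squared general-factor loadings divided by the sum of
# all squared loadings (general plus specific) across items (Rodriguez et al., 2016).
ecv <- function(lambda_general, lambda_specific) {
  sum(lambda_general^2) / (sum(lambda_general^2) + sum(lambda_specific^2))
}
# Standardized loadings can be extracted from a fitted lavaan bifactor model via
# subset(standardizedSolution(fit_bi), op == "=~")$est.std and split by factor.
```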
Finally, we evaluated the predictive validity of the MCQ by examining the zero-order correlations with the different prosocial and (im)moral behaviors measured. Given that noise is added to participants’ responses in the Mind Game – there is a 12.5% chance of actually selecting the target number and thus honestly responding “yes” – we used a modified logistic regression as provided in the R package RRreg (Heck & Moshagen, 2018) to correct for this noise. We considered correlations of r ≥ .15 as indicative of a meaningful prediction of the behavioral outcomes by the MCQ scales, in line with previous research on the link between morality-related traits and behavioral measures of (im)morality (Heck, Thielmann, et al., 2018; Thielmann et al., 2020). This cutoff corresponds to a small-to-medium correlation with practical value according to recent effect size interpretation guidelines for personality research (Funder & Ozer, 2019). Moreover, we tested the empirical distinctiveness of two MCQ domains, namely honesty and fairness. For these two moral character domains in particular, the PPP contains highly relevant behavioral criteria (i.e., dishonesty in the Mind Game and a lottery task, and prosociality as captured in two established SVO measures) that allowed us to test whether the two domains show meaningful (r ≥ .15) and stronger correlations with these criteria compared to the other domain-specific scales. Thus, whenever the correlations for these two domains exceeded the predefined .15 cutoff, we conducted one-tailed z-tests with α = .05 (critical z = 1.65) to test whether the respective correlation was significantly stronger than the corresponding correlations for the other domain-specific scales.
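For the correlation comparisons, a simple one-tailed test based on Fisher's r-to-z transformation is sketched below; note that this simplified version treats the two correlations as independent, whereas the correlations compared here stem from the same sample (and share a criterion), so the exact test statistics reported may differ slightly. The Mind Game correlations themselves were estimated with the modified logistic regression in RRreg (presumably via its RRlog() function).

```r
# Sketch of a one-tailed z-test comparing two correlations via Fisher's r-to-z transformation.
# This simplified version assumes independent correlations; the comparisons reported in the
# Results involve correlations from the same sample, so the exact procedure may additionally
# account for their dependency.
compare_correlations <- function(r1, r2, n1, n2) {
  z1 <- atanh(r1); z2 <- atanh(r2)                  # Fisher r-to-z
  z  <- (z1 - z2) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
  c(z = z, p_one_tailed = pnorm(z, lower.tail = FALSE))
}
# e.g., compare_correlations(r1 = .15, r2 = .08, n1 = 616, n2 = 616)
```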
Results
Scale Reliability and Intercorrelations
Table 1 presents the descriptive statistics, reliability indices, and intercorrelations for all MCQ scales. Descriptively, all MCQ scores were above the scale midpoint of 3, ranging from 3.09 for Purity to 4.26 for Fairness in Sample 1 and from 3.51 for Purity to 4.25 for Fairness in Sample 2. The MCQ scores were slightly negatively skewed in Sample 1, ranging from –1.04 for Fairness to –.14 for Purity, and nearly symmetrical in Sample 2, ranging from –.43 for Fairness to .04 for Respect. Both Cronbach’s alpha and McDonald’s omega indicated good reliability for the Global Morality scale, yielding values above .80 in both samples. For the domain-specific scales, Cronbach’s alpha was mostly adequate to good, ranging from .69 (Respect) to .83 (Honesty and Fairness) in Sample 1 and from .71 (Respect) to .86 (Fairness) in Sample 2. McDonald’s omega was, however, generally lower, ranging from .57 (Respect) to .78 (Honesty) in Sample 1 and from .52 (Respect) to .79 (Fairness) in Sample 2, with four of the six scales yielding omega values < .70 in both samples. All domain-specific scales were significantly positively correlated with the Global Morality scale and with each other in both samples. Correlations with the Global Morality scale ranged from .46 (Loyalty) to .62 (Purity) in Sample 1 and from .39 (Compassion) to .59 (Purity) in Sample 2. Moreover, the Global Morality scale correlated very strongly with the sum score of the six domain-specific scales, yielding r = .85 in Sample 1 and r = .80 in Sample 2.
Scale | M (SD) Sample 1 | M (SD) Sample 2 | α Sample 1 | α Sample 2 | ω Sample 1 | ω Sample 2 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1. Global Morality | 3.72 (.63) | 3.68 (.58) | .83 | .85 | .81 | .83 | - | .54 | .50 | .59 | .46 | .62 | .49 |
2. Honesty | 3.87 (.68) | 3.94 (.54) | .83 | .78 | .78 | .70 | .51 | - | .36 | .53 | .49 | .48 | .39 |
3. Compassion | 3.82 (.74) | 3.72 (.68) | .75 | .75 | .68 | .69 | .39 | .34 | - | .51 | .33 | .40 | .43 |
4. Fairness | 4.26 (.56) | 4.25 (.53) | .83 | .86 | .77 | .79 | .47 | .51 | .44 | - | .48 | .47 | .54 |
5. Loyalty | 4.16 (.57) | 4.1 (.51) | .73 | .71 | .65 | .64 | .38 | .48 | .32 | .56 | - | .34 | .40 |
6. Purity | 3.09 (.77) | 3.51 (.66) | .71 | .69 | .67 | .67 | .59 | .47 | .31 | .39 | .31 | - | .45 |
7. Respect | 3.99 (.56) | 4.01 (.53) | .69 | .71 | .57 | .52 | .39 | .38 | .36 | .50 | .38 | .33 | - |
Note. N1 = 7,138; N2 = 616. All correlations are significant at p < .001. Correlations above the diagonal refer to Sample 1; correlations below the diagonal refer to Sample 2.
Internal Structure of the MCQ Scales
As summarized in Table S2, the results from the first set of unidimensional CFA models (see Figure S1) supported the unidimensionality of only some of the MCQ scales: Most fit indices were good in at least one of the samples, except for the RMSEA. In Sample 1, only Respect had TLI values > .95, whereas all other scales showed inadequate TLI values, ranging from .773 for Fairness to .930 for Global Morality. In Sample 2 as well, the TLI for Honesty, Loyalty, and Purity was inadequate (i.e., .949, .928, and .778, respectively), whereas it was acceptable for Global Morality, Compassion, and Fairness, together with Respect as in Sample 1. Furthermore, in Sample 1, only Respect had RMSEA values < .08, whereas Global Morality, Honesty, Compassion, Fairness, Loyalty, and Purity showed RMSEA values of .092, .127, .123, .196, .082, and .149, respectively. In Sample 2 as well, the RMSEA for Global Morality, Honesty, Fairness, and Purity was inadequate (i.e., .088, .081, .101, and .158, respectively), whereas this time it was acceptable (i.e., < .08) for Compassion and Loyalty, along with Respect as in Sample 1. Overall, Honesty and Purity showed consistently unacceptable fit indices across both samples. All item loadings were statistically significant (p < .001) and above .30 in both samples, except for two items, namely item 28 from the Respect scale (“It is not important to show respect to tradition and authority”; λ = .16, p < .001, in Sample 1 and λ = .06, p = .183, in Sample 2) and item 14 from the Compassion scale (“It’s not important to me to be compassionate”; λ = .27, p < .001, in Sample 1 and λ = .10, p = .049, in Sample 2). Full results for all CFAs are available on the OSF (Tables S3-5).
In Sample 1, the higher-order model (see Figure S2) showed good RMSEA and SRMR values, whereas the CFI and TLI values fell below .95 (χ2 = 6,600.2, p < .001, CFI = .855, TLI = .837, RMSEA = .067, SRMR = .056). All item and factor loadings were significant (p < .001) and mostly above .30. Again, item 28 had a relatively low loading (λ = .19), and so did item 14 (λ = .29). Factor loadings ranged from .71 (Compassion) to .93 (Fairness), with a mean of .80. The bifactor model (see Figure S3), too, showed a good fit to the data in terms of RMSEA and SRMR values (χ2 = 5,107.9, p < .001, CFI = .910, TLI = .891, RMSEA = .055, SRMR = .042), whereas the CFI and the TLI again indicated slightly poorer fit. The general factor explained 58% of the common variance across items. All item loadings on the general factor were significant (p < .001), with a mean item loading of .57. Item 28 from the Respect scale again had the weakest loading (λ = .19). At the level of specific factors, item 17 from the Fairness scale (“I don’t believe it is important to treat others fairly”) showed a negative loading (λ = −.03, p > .05). We also observed low loadings (< .30) again for items 14 (λ = .11) and 28 (λ = .02, p > .05) as well as for item 7 from the Honesty scale (“I don’t believe that honesty is that important”; λ = .28), item 18 from the Fairness scale (“I want to treat everyone as fairly as possible”; λ = .10), item 21 from the Loyalty scale (“I believe it is important not to betray people”; λ = .02, p > .05), item 26 from the Purity scale (“I want to think and act without vulgarity or filth”; λ = .22), and item 30 from the Respect scale (“I do not want to be rude or irreverent toward others”; λ = .21).
The results for Sample 2 were similar. The higher-order model showed a good fit to the data when considering the RMSEA and SRMR values, although the CFI and the TLI again fell below the prespecified cutoff (χ2 = 720.9, p < .001, CFI = .868, TLI = .851, RMSEA = .068, SRMR = .070). All item and factor loadings were significant (p < .001) and mostly above .30, except for items 14 and 28 (p > .05), which again had very low loadings (.10 and .05, respectively). Factor loadings ranged from .57 (Compassion) to .84 (Fairness), with a mean of .75. The bifactor model also showed a good fit to the data in terms of RMSEA and SRMR (χ2 = 762.86, p < .001, CFI = .900, TLI = .879, RMSEA = .062, SRMR = .056), although the CFI and the TLI values were once again below the cutoff for acceptable fit. The general factor explained 54% of the common variance across items. All item loadings on the general factor were significant (p < .001) except for items 14 and 28, which also had low loadings on the general factor (.08 and .03, respectively); the mean item loading was .32. At the level of specific factors, three items (item 7 from the Honesty scale, item 17 from the Fairness scale again, and item 21 from the Loyalty scale) showed negative, albeit nonsignificant, loadings on their respective factors (λ = −.04, λ = −.02, and λ = −.01, respectively). We also again observed low loadings (< .30) for items 14 (λ = .06, p > .05) and 28 (λ = .05, p > .05) as well as for item 18 from the Fairness scale (λ = .11, p > .05), items 20 (“I shift my loyalties easily”) and 22 (“I want to be loyal even when it’s hard”) from the Loyalty scale (λ = .18, p = .045 and λ = .22, p = .044, respectively), and items 25 (“I will admit that some things I do are indecent”) and 26 from the Purity scale (λ = .22, p < .001 and λ = .001, p > .05, respectively). These low loadings suggest that the shared explained variance of these items in particular is due to the general factor rather than their corresponding specific factor.
Convergent and Discriminant Validity
To test the convergent validity of the MCQ scales, we first examined their correlations with broad morality-related traits (see Table 2).
Trait | N | MCQ-Global Morality | MCQ-Honesty | MCQ-Compassion | MCQ-Fairness | MCQ-Loyalty | MCQ-Purity | MCQ-Respect |
---|---|---|---|---|---|---|---|---|
Broad traits | | | | | | | | |
Honesty-Humility | 616 | .08* | .24*** | .24*** | .29*** | .23*** | .15*** | .22*** |
HEXACO-Agreeableness | 616 | .19*** | .20*** | .31*** | .31*** | .20*** | .33*** | .31*** |
NEO-Agreeableness | 616 | .30*** | .30*** | .43*** | .44*** | .36*** | .37*** | .42*** |
BFAS-Agreeableness | 616 | .36*** | .32*** | .52*** | .47*** | .40*** | .33*** | .41*** |
BFAS-Compassion | 616 | .40*** | .27*** | .60*** | .42*** | .34*** | .28*** | .34*** |
BFAS-Politeness | 616 | .16* | .25*** | .19*** | .34*** | .32*** | .27*** | .33*** |
BFI-Agreeableness | 616 | .38*** | .29*** | .53*** | .48*** | .32*** | .42*** | .45*** |
IPIP-Agreeableness | 616 | .39*** | .29*** | .60*** | .44*** | .34*** | .34*** | .36*** |
IPIP-NEO-Agreeableness | 616 | .29*** | .34*** | .58*** | .48*** | .39*** | .30*** | .40*** |
Dark Factor (D), Sample 1 | 7,138 | −.57*** | −.54*** | −.58*** | −.63*** | −.47*** | −.56*** | −.50*** |
Dark Factor (D), Sample 2 | 616 | −.26*** | −.28*** | −.42*** | −.47*** | −.43*** | −.22*** | −.39*** |
PID-Antagonism | 569 | −.10* | −.21*** | −.23*** | −.26*** | −.27*** | −.15*** | −.21*** |
Narrow traits | | | | | | | | |
Altruism | 517 | .34*** | .20*** | .58*** | .39*** | .29*** | .23*** | .30*** |
Compassion | 528 | .28*** | .17*** | .50*** | .31*** | .18*** | .24*** | .21*** |
Empathy | 522 | .35*** | .29*** | .52*** | .44*** | .35*** | .27*** | .33*** |
Fairness concerns | 524 | .28*** | .30*** | .41*** | .45*** | .33*** | .34*** | .36*** |
Guilt proneness | 532 | .27*** | .28*** | .41*** | .36*** | .31*** | .27*** | .31*** |
Humility | 529 | .08 | .16*** | .11* | .20*** | .11** | .25*** | .16*** |
LTS – Faith in humanity | 525 | .23*** | .17*** | .41*** | .26*** | .15*** | .18*** | .29*** |
LTS – Humanism | 525 | .27*** | .18*** | .42*** | .25*** | .18*** | .18*** | .29*** |
LTS – Kantianism | 525 | .21*** | .34*** | .27*** | .31*** | .31*** | .10* | .23*** |
Moral idealism | 530 | .25*** | .24*** | .32*** | .38*** | .36*** | .28*** | .34*** |
Moral identity | 528 | .36*** | .23*** | .44*** | .31*** | .18*** | .26*** | .28*** |
PPI – Altruism | 530 | .17*** | .20*** | .39*** | .27*** | .29*** | .09* | .15*** |
PPI – Forgiveness | 530 | .13** | .18*** | .30*** | .27*** | .15*** | .21*** | .23*** |
PPI – Gratitude | 530 | .26*** | .30*** | .33*** | .34*** | .31*** | .29*** | .34*** |
Morality-as-Cooperation | 528 | .35*** | .28*** | .31*** | .26*** | .24*** | .37*** | .31*** |
Unmitigated communion | 526 | .34*** | .22*** | .44*** | .31*** | .25*** | .25*** | .28*** |
Amoralism-Crudalia | 517 | −.33*** | −.28*** | −.55*** | −.47*** | −.46*** | −.23*** | −.37*** |
Amoralism-Frustralia | 519 | −.20*** | −.24*** | −.33*** | −.39*** | −.34*** | −.12** | −.26*** |
Egoism | 516 | −.15*** | −.21*** | −.23*** | −.26*** | −.31*** | −.16*** | −.19*** |
Exploitativeness | 523 | −.19*** | −.21*** | −.36*** | −.39*** | −.34*** | −.18*** | −.31*** |
Greed | 522 | −.08 | −.15*** | −.28*** | −.22*** | −.20*** | −.12** | −.20*** |
Moral disengagement | 518 | −.12** | −.21*** | −.20*** | −.29*** | −.35*** | −.13** | −.21*** |
Moral relativism | 526 | −.11* | −.02 | −.02 | −.05 | −.03 | .01 | −.07 |
Self-centeredness | 521 | −.24*** | −.24*** | −.39*** | −.39*** | −.37*** | −.25*** | −.33*** |
Selfishness | 532 | −.24*** | −.29*** | −.48*** | −.36*** | −.32*** | −.27*** | −.31*** |
Spitefulness | 521 | −.21*** | −.24*** | −.24*** | −.34*** | −.38*** | −.22*** | −.31*** |
Psychological entitlement | 515 | .01 | −.07 | −.19*** | −.21*** | −.18*** | −.02 | −.15*** |
Note. BFAS = Big Five Aspect Scales; BFI = Big Five Inventory; IPIP = International Personality Item Pool; LTS = Light Triad Scale; PPI = Prosocial Personality Inventory.
In Sample 1, we found good convergent validity (r ≥ .30) with the Dark Factor (D) for all MCQ scales. In Sample 2, the pattern was more mixed, with good convergence with D for some scales (Compassion, Fairness, Loyalty, and Respect), but weak convergence (i.e., .20 ≤ r ≤ .30) for other scales (Global Morality, Honesty, and Purity).
Convergent validity with Big Five Agreeableness (as measured via the NEO-FFI, BFAS, BFI-2, IPIP-50, and IPIP-NEO) was typically moderate, except for Honesty, which showed only weak convergence with both BFI-2 and IPIP Agreeableness (both r = .29), and Global Morality, which showed weak convergence with IPIP-NEO Agreeableness (r = .29). At the facet level, correlations were higher for BFAS-Compassion (.27 ≤ r ≤ .60) than for BFAS-Politeness (.16 ≤ r ≤ .34). For HEXACO-Agreeableness, convergent validity was more mixed across scales, with r < .20 for Global Morality and weak convergence for Honesty and Loyalty (both r = .20). For Honesty-Humility, convergence was mostly weak across scales (.20 ≤ r ≤ .30) and fell below the prespecified .20 cutoff for Global Morality and Purity. A similar pattern emerged for PID-5 Antagonism, with mostly weak correlations and evidence for discriminant rather than convergent validity (r < .20) for Global Morality and Purity.
To further inform convergent validity, we also examined the MCQ’s correlations with a variety of narrower morality-related traits (see Table 2). Of the 189 correlations examined, 72 (38.1%) were indicative of convergent validity, which was mainly due to Compassion (19 out of 27), Fairness (16 out of 27), and Loyalty (14 out of 27) showing convergence with more than half of the validation measures. By contrast, Respect, Global Morality, Purity, and Honesty showed weak convergent validity for many measures, with only 12, 6, 3, and 2 correlations, respectively, reaching the r ≥ .30 cutoff. Most correlations (57 out of 72) fell between .30 and .40, with 11 correlations between .40 and .50 and only four above .50. Overall, we found mixed evidence in favor of the MCQ’s convergent validity. Global Morality was not consistently related to the convergent validity measures, with only six out of 27 correlations surpassing the .30 cutoff. Only Compassion and Fairness showed a relatively consistent pattern of acceptable convergent validity with both broad and narrow morality-related traits. Indeed, Compassion was the domain most strongly related to other measures of compassion, with correlations ranging from .39 (PPI Altruism) to .60 (BFAS-Compassion), all of which were significantly higher than the correlation for the domain with the second-highest correlation in each case (2.03 ≤ z ≤ 4.91, ps < .05). Fairness showed the strongest correlation among the domains with the VIA Fairness scale (r = .45), but this correlation was not significantly higher than that of Compassion (r = .41; z = 1.00, p = .16).
Next, we tested the discriminant validity of the MCQ scales based on their correlations with broad personality traits that are, conceptually and empirically (Zettler et al., 2020), largely unrelated to morality (i.e., Emotionality, Extraversion, Conscientiousness, and Openness to Experience). As summarized in Table 3, the majority of these correlations (66.1%) were indicative of good discriminant validity (r < .20). Honesty and Loyalty had the highest discriminant validity, with correlations only exceeding the weak convergence cutoff for Conscientiousness (both r = .22). A sizable proportion of correlations (32.1%) was indicative of weak convergence and, thus, relatively poor discriminant validity. This was particularly the case for Global Morality and Compassion, which showed correlations > .20 with 5 and 4 out of 8 discriminant validity measures, respectively. Nonetheless, our results mostly supported the MCQ scales’ discriminant validity.
Trait | N | MCQ-Global Morality | MCQ-Honesty | MCQ-Compassion | MCQ-Fairness | MCQ-Loyalty | MCQ-Purity | MCQ-Respect |
---|---|---|---|---|---|---|---|---|
Emotionality | 616 | .21*** | .10* | .27*** | .11** | .10* | .14*** | .16*** |
Extraversion | 616 | .22*** | .16*** | .30*** | .21*** | .13*** | .25*** | .17*** |
Conscientiousness | 616 | .21*** | .22*** | .08* | .22*** | .25*** | .24*** | .22*** |
Openness to Experiences | 616 | .16*** | .04 | .23*** | .16*** | .18*** | −.03 | .03 |
PID-Negative affectivity | 569 | −.04 | −.13** | −.16*** | −.16*** | −.15*** | −.11** | −.10* |
PID-Detachment | 569 | −.14** | −.17*** | −.26*** | −.20*** | −.14** | −.20*** | −.21*** |
PID-Psychoticism | 569 | −.06 | −.07 | −.05 | −.08 | −.11** | −.17*** | −.13** |
PID-Disinhibition | 569 | −.04 | −.11** | −.08 | −.18*** | −.16*** | −.11** | −.15*** |
Note. PID = Personality Inventory for DSM-5.
Predictive Validity
Finally, we assessed the MCQ’s predictive validity by inspecting the correlations of the MCQ scales with different indicators of immoral and prosocial behavior as measured in behavioral paradigms and self-report measures (see Table 4). First, there were no correlations above the prespecified |r| = .15 criterion between any of the MCQ scales and the behavioral measures of dishonesty (Mind Game and lottery task) or prosociality as measured by the SVO Slider. Strikingly, this was also true for Honesty and Fairness, which showed only weak, non-significant correlations with their corresponding behavioral measures (r = −.08 and r = .08, respectively). The only meaningful correlation emerged between Compassion and prosociality as measured by the Triple Dominance Measure of SVO, which amounted to r = .15. One-tailed z-tests indicated that this correlation was significantly higher than the correlations of Global Morality (z = 1.99, p = .024), Honesty (z = 1.74, p = .041), and Purity (z = 3.22, p < .001), but not higher than those of Fairness (z = 0.25, p = .401), Loyalty (z = 1.00, p = .160), and Respect (z = 0.75, p = .227). We even found a descriptively positive, albeit nonsignificant, correlation (r = .12, p = .06) between Global Morality and dishonesty as measured by the lottery task.
Table 4

| Criterion | N | MCQ-Global Morality | MCQ-Honesty | MCQ-Compassion | MCQ-Fairness | MCQ-Loyalty | MCQ-Purity | MCQ-Respect |
|---|---|---|---|---|---|---|---|---|
| Dishonesty (Mind Game) | 531 | −.07 | −.08 | −.09 | −.09 | −.06 | −.06 | −.08 |
| Dishonesty (Lottery task) | 289 | .12 | −.03 | −.10 | −.01 | .08 | −.03 | −.00 |
| Prosociality (SVO Slider) | 531 | .02 | .03 | .14** | .08 | .07 | −.05 | .05 |
| Prosociality (SVO TDM) | 616 | .07 | .08* | .15*** | .14*** | .11** | .02 | .12** |
| Civic behavior | 480 | −.10* | −.13** | −.01 | −.01 | −.11* | −.20*** | −.12** |
| CWB | 489 | −.11* | −.24*** | −.10* | −.19*** | −.16*** | −.26*** | −.13** |
| CAB | 460 | .11* | .12* | .06 | .02 | −.04 | .22*** | −.03 |
| Prosocial intentions | 531 | .27*** | .20*** | .43*** | .33*** | .34*** | .22*** | .26*** |
| Vandalism | 509 | −.08 | −.09* | −.09* | −.19*** | −.18*** | −.17*** | −.14** |
| Violence | 458 | −.12** | −.09* | −.05 | −.10* | −.05 | −.24*** | −.16*** |
Note. SVO = Social Value Orientation; TDM = Triple Dominance Measure; CWB = Counterproductive work behavior; CAB = Crime and analogous behavior.
As for the six self-reported indicators of (im)moral behavior (i.e., civic behavior, counterproductive work behavior, crime and analogous behavior, prosocial behavior intentions, vandalism, and violence), only 18 of the 42 correlations exceeded our validity criterion of |r| = .15. Purity was the only MCQ scale consistently related to the self-reported measures of immoral behavior. Fairness and Loyalty were both also correlated with counterproductive work behavior (r = −.19 and r = −.16, respectively) and vandalism (r = −.19 and r = −.18, respectively), Honesty was related to counterproductive work behavior (r = −.24), and Respect was related to violence (r = −.16). The only criterion consistently and meaningfully related to all MCQ scales was prosocial behavior intentions (.20 ≤ r ≤ .43). Overall, we found little support for the predictive validity of the MCQ scales.
Discussion
The Moral Character Questionnaire (MCQ; Furr et al., 2022) is a newly proposed measure of global moral character and specific moral character traits (i.e., honesty, compassion, fairness, loyalty, purity, and respect) that may account not only for individual differences in various moral tendencies and behaviors, but also for socially aversive psychopathology. Previous validation studies of the MCQ are, however, limited in several respects, especially because they rely on a small set of exclusively self-report criteria. With the current investigation, we aimed to overcome these limitations by examining the internal structure and validity of the MCQ in two large and diverse samples, using a variety of broad and narrow morality-related traits as well as objective and subjective measures of (im)moral behavior as validity criteria. Overall, the MCQ scales showed acceptable reliability, factor structure, and discriminant validity, but there were notable weaknesses regarding the scales’ convergent and predictive validity. We discuss each of these aspects in detail in what follows.
Reliability
Our results from the reliability analyses of the MCQ were largely consistent with Furr et al.’s (2022) results, showing good internal consistency for the global morality scale and lower values for the purity and respect subscales. As Furr et al. (2022) suggested, possible reasons for this may be the brevity of the scales (four items per domain) and the goal of capturing different psychological modalities (cognition, behavior, motivation, and identity). The lower reliability of the purity and respect subscales may also be attributable to specific items showing poor (below .30) loadings in some of the factorial models. For the purity scale, these were items 25 (“I will admit that some things I do are indecent”) and 26 (“I want to think and act without vulgarity or filth”) in particular. Whereas the former stands out due to its wordier formulation (i.e., “I will admit that some things I do…”), the latter is the only item referring to vulgarity and filth. For respect, in turn, items 28 (“It’s not important to show respect to tradition and authority”) and 30 (“I do not want to be rude or irreverent toward others”) showed the poorest loadings across models. As discussed further below, item 28 is the only one referring to respect for authority; item 30 is the only one that does not include the term “respect” (or “respectful”). These items also showed the poorest loadings on their corresponding factors in Furr et al.’s (2022) analyses.
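To make the link between scale brevity, loading magnitude, and reliability explicit, consider McDonald’s omega for a unidimensional scale with standardized items (shown here purely as an illustration; the reliability coefficient actually reported may differ):

\[
\omega \;=\; \frac{\Bigl(\sum_{i=1}^{k}\lambda_i\Bigr)^{2}}{\Bigl(\sum_{i=1}^{k}\lambda_i\Bigr)^{2} \;+\; \sum_{i=1}^{k}\bigl(1-\lambda_i^{2}\bigr)}.
\]

For a hypothetical four-item scale with loadings of .40 throughout, this yields \(\omega = 1.6^{2}/(1.6^{2} + 4 \times 0.84) \approx .43\), illustrating that a scale with few items and modest loadings is bound to show limited internal consistency regardless of item content.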
Internal Structure
Concerning the structure of the MCQ, the present results showed acceptable to good fit for global morality and some of the domain-specific structural models, supporting the unidimensionality of three of the six MCQ scales (i.e., compassion, loyalty, and respect) in at least one of our two samples. However, the RMSEA values were consistently higher than desirable for global morality and the domain-specific models of honesty, fairness, and purity. This was also the case in the investigation by Furr et al. (2022). Together with the small SRMR values and relatively low factor loadings observed, this suggests misspecified measurement models (Moshagen & Auerswald, 2018). In line with this, there were some items that consistently showed poor loadings on their respective scale across models and samples. This was particularly apparent for two negatively keyed items measuring the cognitive dimensions of compassion (item 14; “It’s not important to me to be compassionate”) and respect (item 28; “It’s not important to show respect to tradition and authority”). In Furr et al.’s (2022) data as well, these two items showed the lowest loadings on their respective scale. On the one hand, this finding may be due to the reverse keying of these items. Indeed, for the remaining MCQ scales as well, the reverse-keyed items produced (very) low loadings throughout. On the other hand, it may be attributable to the specific item content of these two items. Unlike the other items measuring the cognitive component of a domain, item 14 from the compassion scale specifically refers to individuals’ identity (“It’s not important to me to be compassionate.”; emphasis added). Item 28 from the respect scale, in turn, specifically refers to respect for authority and tradition, unlike the remaining three respect items, which tap into general respectfulness and respect for others. In summary, our results supported the unidimensionality of the MCQ’s compassion, loyalty, and respect domains, while questioning it for the global morality scale and the honesty, fairness, and purity domains.
The higher-order model conceptualizing the specific morality domains as indicators of one higher-order factor of global morality described the data reasonably well, thereby largely replicating the findings of Furr et al. (2022). Of note, however, the CFI and TLI values were not always satisfactory. The discrepancy between these indices and the RMSEA and SRMR may have several reasons (Lai & Green, 2016), including the specific estimation method used (Xia & Yang, 2019) and the low correlations between the observed variables. Specifically, these comparative fit indices compare the model at hand with the null model (i.e., the baseline model assuming no covariance between variables). Thus, whenever the correlations between variables are relatively small on average (as in the present case, where the average item loading was around .30), the target model’s improvement over the null model is necessarily limited, resulting in lower comparative fit indices.
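To make this argument concrete, recall how the two comparative fit indices relate the target model (subscript M) to the null model (subscript 0):

\[
\mathrm{CFI} \;=\; 1-\frac{\max\bigl(\chi^{2}_{M}-df_{M},\,0\bigr)}{\max\bigl(\chi^{2}_{0}-df_{0},\;\chi^{2}_{M}-df_{M},\,0\bigr)},
\qquad
\mathrm{TLI} \;=\; \frac{\chi^{2}_{0}/df_{0}-\chi^{2}_{M}/df_{M}}{\chi^{2}_{0}/df_{0}-1}.
\]

When inter-item correlations are low, the null model is not grossly misspecified, so \(\chi^{2}_{0}-df_{0}\) is comparatively small; the denominators shrink, and even a target model with acceptable absolute fit cannot reach high CFI or TLI values.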
In both samples, the general factor explained, on average, over 50% of the common variance in item responses, and it subsumed fairness best (reaching 70% explained common variance in both samples). This means that for most of the MCQ domains, the common variance of the respective items was due more to the general factor than to the domain-specific factors, as also evidenced by the low (and even negative) item loadings at the specific-factor level. In other words, our data consistently showed that the general factor explained the commonalities across MCQ items better than the domain-specific factors did, thus questioning the specific scales’ added value. This was further supported by the very strong (.80 ≤ r ≤ .85) correlations between the global morality scale (i.e., the average of the six items measuring global morality) and the overall morality score based on the domain-specific items (i.e., the average of the 24 items measuring the six specific morality domains). By implication, the former may already be a good indicator of moral character, whereas the domain-specific scales may not add much.
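For reference, the explained common variance (ECV) of the general factor in a bifactor model can be written as

\[
\mathrm{ECV}_{g} \;=\; \frac{\sum_{i}\lambda_{g,i}^{2}}{\sum_{i}\lambda_{g,i}^{2} \;+\; \sum_{s}\sum_{i\in s}\lambda_{s,i}^{2}},
\]

where \(\lambda_{g,i}\) denotes the loading of item \(i\) on the general factor and \(\lambda_{s,i}\) its loading on specific factor \(s\). A domain-level value can be obtained by restricting both sums to the items of that domain; on this reading, an ECV of about .70 for fairness means that roughly 70% of the common variance among the fairness items is carried by the general factor rather than the specific fairness factor.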
Convergent and Discriminant Validity
Although the compassion and fairness scales showed convergent validity with the selected measures of morality-related traits, the correlations of both the global morality scale and the remaining four domain-specific scales (i.e., honesty, loyalty, purity, and respect) with these measures largely fell below our predefined, conservative cutoff for convergent validity of r = .30. By implication, these scales seem to measure the respective constructs differently than existing measures do. We suspect that this may be due to the first-person approach used in MCQ item construction. For example, by asking participants to report whether they are honest without providing them with a definition of what honesty entails, one may end up with many different conceptualizations of honesty (in the extreme, as many as there are participants), and this may result in the MCQ not converging well with established measures of honesty-related characteristics, such as Honesty-Humility. This may also explain the poor convergent validity of the global morality scale in particular, the items of which all incorporate the first-person approach (e.g., “I tend to act morally” or “I consistently want to do the moral thing.”; emphases added). In contrast, this may be less of an issue for compassion, for which we found satisfactory convergent validity. In fact, the term “compassionate” seems semantically narrower and less ambiguous, as evidenced by the ease with which lay people can identify compassionate acts in their daily life (Gulliford et al., 2021; Hofmann et al., 2014).
Regarding discriminant validity, the pattern of correlations was largely indicative of validity. However, the lack of convergent validity for the honesty and loyalty domains in particular (see Table 2) undermines the value of these findings: these domains barely correlate with any other self-report measure (see Table 3), whether broad or narrow, related or unrelated to morality, so their low discriminant correlations are not particularly informative.
Overall, our findings do not provide reliable evidence for the convergent and discriminant validity of the MCQ, except for its compassion scale: The instrument does not seem to measure morality-related characteristics in a way comparable to established scales and, at the same time, falls short of reliably distinguishing moral characteristics from theoretically dissimilar constructs.
Predictive Validity
The MCQ scales were neither meaningfully correlated with behavioral measures of dishonesty (i.e., cheating) nor with behavioral measures of prosociality (i.e., Social Value Orientation; SVO). The only exception was compassion, which was weakly positively related to one of the two SVO measures used. Overall, the lack of predictive validity for behavioral measures limits the practical value of the MCQ, especially when considering that other self-report measures of morality-related characteristics – most prominently Honesty-Humility and D – show consistent, at least small-to-medium-sized correlations (r > .20) with both dishonest (Heck, Thielmann, et al., 2018; Hilbig, 2022) and prosocial behavior (Thielmann et al., 2020; Zettler et al., 2020), including SVO (Hilbig et al., 2014, 2023). For comparison, evidence using the same dataset shows that other traits (e.g., Honesty-Humility, D) do show meaningful relations with prosocial behavior (Hilbig et al., 2023) and dishonest behavior (Hilbig et al., 2024; Thielmann et al., 2025). This speaks against the possibility that the weak correlations observed for the MCQ are due to this specific sample or the particular behavioral measures. Moreover, the fact that the MCQ fails to adequately account for individual differences in moral behavior is at odds with the conceptualization of moral character as having a behavioral dimension. Critically, there was also no evidence for the predictive validity of the domain-specific scales: There was neither a meaningful correlation between the honesty scale and the measures of dishonest behavior nor between the fairness scale and the measures of prosocial behavior. Thus, even when the domain-specific scales’ content closely matched the measured behavior, the scales failed to account for meaningful variance in behavior. A similar picture emerged for the self-reports of (im)moral behavior: Only purity was consistently related to all six measures of self-reported criminal, deviant, and prosocial behavior employed. This lack of predictive validity may partly be due to the questionable unidimensionality of some domain-specific scales compared to compassion. However, considering that purity also showed fit issues while predicting corresponding criteria reasonably well, this alone may not explain the lack of predictive validity of the honesty and fairness scales (see also Hopwood & Donnellan, 2010). Overall, our results fail to support the predictive validity of the MCQ.
Limitations
Although the current results provide valuable insights into the psychometric properties of the MCQ, they should be considered in light of the considerable time lag between the measurement of the MCQ and some of the validation criteria in Sample 2, which in the extreme reached three years for the measurement of basic personality traits (e.g., Honesty-Humility, Big Five Agreeableness, D). This time lag may explain, at least in part, why the correlations between the basic personality traits and the MCQ scales were descriptively smaller than in Sample 1 (for D) and than the corresponding correlations observed by Furr et al. (2022).
Indeed, convergent correlations were similar in size to those in the original validation study for the IPIP-NEO indicator of Big Five Agreeableness, which was measured at the same measurement occasion as the MCQ. Then again, contrary to the idea that the time lag may have decreased the convergence between measures in meaningful ways, the correlations between the MCQ scales and the two measures of SVO administered were essentially the same, even though the SVO Triple Dominance Measure was administered at the same measurement occasion as the MCQ, whereas the SVO Slider was measured more than two years before the MCQ. Additionally, neither of the two cheating tasks was meaningfully related to the MCQ dimensions, even though one, the lottery task, was administered much closer in time to the MCQ (i.e., only six months after the MCQ) than the other, the Mind Game (i.e., nearly three years before the MCQ). For comparison, note that Honesty-Humility showed significant positive correlations with both SVO measures (r = .23, p < .001 for the SVO Slider; r = .24, p < .001 for the SVO Triple Dominance Measure) and negative correlations with both cheating tasks (r = −.15, p < .001 for the Mind Game; r = −.09, p < .05 for the lottery task), irrespective of the time lag between the measurement of Honesty-Humility and these criteria (five months for the Mind Game and the SVO Slider, nearly three years for the SVO Triple Dominance Measure, and three and a half years for the lottery task). Thus, the time lag alone cannot explain the lack of convergent and predictive validity that we observed, even though it may have reduced some effect sizes.
Conclusions
Across two samples (total N = 7,754), the MCQ showed mostly acceptable reliability and acceptable fit to the data for the higher-order and bifactor solutions. However, we found rather poor convergent, discriminant, and predictive validity across a variety of self-report and behavioral measures of (im)moral tendencies and behaviors. In conclusion, our findings question the validity of the MCQ as a measure of moral character that is able to predict individual differences in moral behaviors, with the partial exception of the compassion domain. This criticism especially applies to the domain-specific scales, some of which exhibited low reliability and structural misfit in addition to limited validity. Even when the validity criterion closely matched the content of a domain-specific scale, there was a lack of predictive validity. In this sense, the global morality scale may be a better candidate for further development of the measure than the domain-specific scales. At present, though, this scale also failed to account for individual differences in (im)moral behavior. Thus, as it stands, we cannot recommend using the MCQ as a measure to explain and predict moral behavior (Egloff, 2020).
Contributions
Contributed to conception and design: NC, IT
Contributed to acquisition of data: IT, BEH
Contributed to analysis and interpretation of data: NC, BEH, IT
Drafted and/or revised the article: NC, BEH, IT
Approved the submitted version for publication: NC, BEH, IT
Funding Information
This research was funded by grants from the German Research Foundation (DFG, HI 1600/1-2) to the second author and the European Union (ERC, KNOW-THYSELF, 101039433) to the third author. Views and opinions expressed are those of the authors only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency. Neither the European Union nor the granting authority can be held responsible for them.
Competing Interests
The authors have no competing interests to disclose.
Data Accessibility Statement
All the participant data and analysis scripts can be found on this paper’s project page on the OSF: https://osf.io/7mp45/
Supplemental Material
The supplemental material accompanying this manuscript can be found on the OSF: https://osf.io/7mp45/
Footnotes
Furthermore, a study using the experience sampling method (ESM) with an earlier version of the MCQ that also includes humility among the domain-specific scales (Prentice et al., 2020) showed that individuals who scored in the top 5% on global morality (the “morally exceptional”, n = 3) reported engaging in moral behavior (e.g., actively contributing to the happiness and well-being of others) more often than both individuals scoring average (n = 40, d = .71) and individuals scoring low (n = 26, d = .85) on global morality.
Commonly considered dimensions of morality are honesty (often equated with morality by laypeople; Gulliford et al., 2021), modesty (Stahlmann & Ruch, 2020; Wright et al., 2017), fairness (Graham et al., 2011; Peterson & Seligman, 2004; Stahlmann & Ruch, 2020), and compassion (Gulliford et al., 2021; Hofmann et al., 2014; Stahlmann & Ruch, 2020), which best reflect the basic traits of Honesty-Humility and Agreeableness. By comparison, Extraversion, Emotionality, Conscientiousness, and Openness show weaker, if any, conceptual and empirical links to morality (Gulliford et al., 2021; Lawn et al., 2022; Stahlmann & Ruch, 2020; Thielmann et al., 2020).
In order to test for measurement invariance across our German translation of the MCQ and the original English version, we conducted a measurement invariance analysis using Sample J (N = 9,365) from Furr et al. (2022). Detailed results can be found in the online supplementary materials (Table S6). In brief, in both of our samples, we found the higher-order model, the bifactor model, and the honesty domain to be strictly invariant across languages. For global morality, the compassion, and the respect domains, we found strict invariance in Sample 2 (but scalar or metric invariance in Sample 1). Weaker invariance (i.e., mixed results across indices and samples) emerged for the fairness, loyalty, and purity domains. Overall, we found evidence of strong or strict invariance for all scales except purity (Meredith, 1993). Based on the measurement invariance analyses we conclude that any differences in results between our study and the one by Furr et al. (2022) cannot be attributed to different language versions. We thank Mike Furr for making the data from their Sample J available to us.
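As an aside, the decision logic behind such sequential invariance comparisons can be sketched as follows. The text above does not state the exact criteria applied; the ΔCFI ≤ .01 rule below is a common convention (Cheung & Rensvold, 2002), and the fit values are placeholders, not results from our analyses (see Table S6 for those).

```python
# Illustrative decision logic for sequential measurement-invariance testing
# (configural -> metric -> scalar -> strict). The CFI values below are
# placeholders, not results from this study; the delta-CFI <= .01 rule
# (Cheung & Rensvold, 2002) is one common convention.

STEPS = ["configural", "metric", "scalar", "strict"]

def highest_supported_invariance(cfi_by_step, delta_cfi_cutoff=0.01):
    """Return the most constrained invariance level whose CFI drops by no more
    than the cutoff relative to the preceding, less constrained model."""
    supported = "configural"
    for prev, curr in zip(STEPS, STEPS[1:]):
        if cfi_by_step[prev] - cfi_by_step[curr] <= delta_cfi_cutoff:
            supported = curr
        else:
            break
    return supported

# Hypothetical multigroup comparison (e.g., German vs. English MCQ versions)
example = {"configural": 0.950, "metric": 0.948, "scalar": 0.944, "strict": 0.941}
print(highest_supported_invariance(example))  # -> strict
```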
Note that the higher-order model and the bifactor model do not include the global morality scale, but only the six domain-specific scales.