Numeracy is individuals’ capacity to understand and process the basic probability and numerical information required to make decisions. We conducted a Replication Registered Report of Peters et al. (2006) examining numeracy as a predictor of the positive-negative framing effect (Study 1), frequency-percentage effect (Study 2), ratio effect (Study 3), and bets effect (Study 4). With an online US American Amazon Mechanical Turk sample (N = 860), our replication using the target’s dichotomization of the numeracy measure found support for the original findings regarding interactions between numeracy and three of the four decision-making effects. Numeracy was associated with a weaker framing effect (η2p = 0.01, 90% CI [0.00, 0.02]), a weaker ratio bias (Cramer’s V = 0.17, 95% CI [0.10, 0.24]), and a stronger bets effect (η2p = 0.02, 90% CI [0.01, 0.04]), yet we found no support for an association with the frequency-percentage effect (η2p = 0.00, 90% CI [0.00, 0.01]). However, we found support for associations in all four studies when treating numeracy as a continuous variable. We extended the replication to examine confidence, yet the results were mixed, with support found for only three conditions (Study 1 positive framing condition: r = -0.11, 95% CI [-0.20, -0.02]; Study 3: r = 0.15, 95% CI [0.08, 0.21]; Study 4 no-loss bet condition: r = 0.10, 95% CI [0.01, 0.20]), suggesting a much weaker and more complex relationship than anticipated. Materials, data, and code are available on: https://osf.io/4hjck/.
Numeracy
Decisions involving numbers, math, and statistics are common, and people rely heavily on their ability to accurately interpret, think about, and act on them. Numeracy is defined as individuals’ capacity to understand and process the basic probability and numerical information required to make decisions. Research by Peters (2012) demonstrated that numeracy is a predictor of behavior in judgment and decision-making tasks.
We embarked on a direct replication of Peters et al. (2006) with two primary goals. Our first goal was to conduct an independent replication of the associations between numeracy and four decision-making paradigms. Our second goal was to examine an extension regarding the role of numeric confidence (or, subjective numeracy).
We begin by introducing the literature on numeracy and the decision-making biases examined in the article chosen for replication, Peters et al. (2006). We provide a brief overview of the decision-making paradigms in relation to numeracy. We then discuss our chosen target article, summarize its hypotheses and findings, and introduce our extension on the relationship between confidence, numeracy, and decision-making.
Question | Hypothesis | Analysis plan | Rationale for deciding the sensitivity of the test for confirming or disconfirming the hypothesis | Interpretation given different outcomes | Theory that could be shown wrong by the outcomes | Observed outcome (Added in Stage 2) |
What is the relationship between numeracy and positive-negative framing effects? | Higher numeracy is associated with a weaker positive-negative framing effect. | Mixed ANOVA; correlations | Our strategy for all replicated studies: 1. We keep the statistical method of the original paper, which treats numeracy as dichotomized. 2. We treat numeracy as a continuous variable and therefore adopt correlations. | Based on the criteria used by Lebel et al. (2019), we examine the replicability of the findings of Peters et al. (2006) and support for our suggested extensions. | Attribute framing effect | Numeracy was associated with a weaker framing effect (η2p = 0.01, 90% CI [0.00, 0.02]). |
What is the relationship between numeracy and percentage and frequency effects? | Higher numeracy is associated with weaker frequency-percentage effects. | Factorial ANOVA; correlations | (as above) | (as above) | Frequency-percentage framing effect | We found mixed support for an association between numeracy and the frequency-percentage effect: no support in the replication using the dichotomy (η2p = 0.00, 90% CI [0.00, 0.01]), but support in the extension using the continuous measure. |
What is the relationship between numeracy and ratio bias? What is the relationship between numeracy and affect precision? | Higher numeracy is associated with more optimal choices in competing affective decisions. Higher numeracy is associated with higher affective precision in competing affective decisions. | Chi-square test; independent t-test; correlations | (as above) | (as above) | Deliberate-experiential thinking modes; ratio bias | Numeracy was associated with a weaker ratio bias (Cramer’s V = 0.17, 95% CI [0.10, 0.24]). |
What is the relationship between numeracy and affective precision and affect in probabilities and numerical comparisons? | Higher numeracy is associated with higher affective precision in probabilities and numerical comparisons. Higher numeracy is associated with greater affect in probabilities and numerical comparisons. | Factorial ANOVA; independent t-test; correlations | (as above) | (as above) | The highly numerate will focus more on details of numbers and draw more affective meanings; bets effect | Numeracy was associated with a stronger bets effect (η2p = 0.02, 90% CI [0.01, 0.04]). |
What is the relationship between objective numeracy and confidence under specific conditions? | Higher numeracy is related to higher subjective confidence. | Correlations | | | Associations between subjective confidence and objective numeracy | Mixed weak results. |
Note. For the sampling plan please see power analysis in the methods section.
Attribute framing and numeracy
The framing effect is a well-established phenomenon in psychology and behavioral economics, in which decisions are influenced by the way information is presented, such as variations in valence between positive and negative framing (Tversky & Kahneman, 1985).
Attribute framing is a type of framing effect and relates to the labeling of a particular attribute of an object or an event. For instance, ground beef with a 75%-25% meat-fat ratio could be presented as “75% lean” or “25% fat”. Levin and Gaeth (1988) found that people evaluated beef more positively under the “% lean” framing than under the “% fat” framing. Such framing effects have received empirical support in many follow-up studies (e.g., Freling et al., 2014; Piñon & Gambara, 2005).
Attribute framing is related to how people understand and process numerical concepts, suggesting a possible link between numeracy and framing effects. Some studies found that less numerate people were more susceptible to framing effects (including attribute framing) (Choi et al., 2011; Gamliel et al., 2016; Gamliel & Kreiner, 2017). For instance, Gamliel and Kreiner (2017) demonstrated the relationship between numeracy and attribute framing bias: students with lower numeracy rated a university course higher if presented with success rates compared to failure rates. They suggested that decision makers with lower numeracy rely more heavily on “non-numerical information”, whereas those with high numeracy pay more attention to numerical information and attain greater accuracy with numbers. Therefore, lower numeracy may be associated with stronger polarization due to the positive or negative valence of framing presentations.
Peters (2012) suggested that highly numerate individuals have the capacity to go beyond the specific numerical information and understand underlying relational information. For example, when a positive outcome is presented as 75% success rate, the highly numerate are more likely to also infer the complementary proportion of the failure rate of 25%, with similar logic for when the failure rates are presented and success rates are inferred. Therefore, the argument in relation to numeracy was that framing bias is attenuated when one is capable of grasping and processing both the positive and the negative information in a decision.
Frequency-percentage effect and numeracy
The frequency-percentage effect (or “frequency effect”) is the phenomenon that decision making changes when numbers are presented as frequencies (e.g., 10 out of 100) compared to percentages (e.g., 10%) (Gigerenzer, 1991; Hill & Brase, 2012).
Those higher on numeracy seem less likely to be affected by whether a number is represented as a frequency or as a percentage (Dickert et al., 2011; Hill & Brase, 2012; Peters et al., 2011). For instance, Peters et al. (2011) tested the relationship among patients. They informed patients of the side-effects of a medication in either a frequency or a percentage format (i.e., 10 out of 100 versus 10%) and then asked them to rate risk levels. They found that the less numerate were more likely to perceive the medication as less risky when it was presented in a percentage format than in a frequency format. The possible mechanisms could be similar to those we previously discussed regarding the framing effect. Those higher on numeracy may be better able to understand the frequency and probability information as the same mathematical quantity (Hill & Brase, 2012; Peters, 2012).
Ratio bias and numeracy
Ratio bias (or numerosity effect) is the phenomenon that people tend to focus on absolute numbers rather than on probabilities (Peters et al., 2008; Reyna et al., 2009). For example, people are more likely to choose to select from a sample with a relatively large numerator/large denominator (e.g., 9 in 100) rather than the preferred odds yet relatively smaller numerator/small denominator (e.g., 1 in 10).
Reyna and Brainerd (2008) separated ratio bias into a heuristic ratio bias (i.e., identical probabilities in the two samples) and a non-optimal ratio bias (i.e., higher probabilities but smaller absolute numerator or lower probabilities but greater absolute numerator). One classic example of heuristic ratio bias comes from the study of Miller et al. (1989). Children randomly chose a cookie from one of two cookie jars, one containing 1 chocolate chip and 19 oatmeal cookies and the other containing 10 chocolate chip and 190 oatmeal cookies. The probabilities of drawing a chocolate chip cookie are the same, yet Miller et al. (1989) found that children preferred to choose from the latter jar, the one with the larger numbers.
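The distinction between the two types of ratio bias can be made concrete with a short sketch. The function name and the labels below are our own illustrative choices, not terminology or code from the cited studies; each option is written as (target items, total items).

```python
from fractions import Fraction

def classify_ratio_pair(a, b):
    """Classify a choice between two samples given as (targets, total).

    Illustrative helper only: the labels mirror the heuristic versus
    non-optimal distinction described in the text.
    """
    pa, pb = Fraction(*a), Fraction(*b)
    if pa == pb:
        # Identical odds: only the absolute numbers differ.
        return "heuristic"
    if a[0] == b[0]:
        # Same numerator, so there is no "larger number" to be drawn to.
        return "no conflict"
    larger_numerator = a if a[0] > b[0] else b
    better_odds = a if pa > pb else b
    if larger_numerator != better_odds:
        # The option with more target items has the worse odds.
        return "non-optimal"
    return "no conflict"

# Cookie jars: 1 of 20 versus 10 of 200 (equal probabilities)
print(classify_ratio_pair((1, 20), (10, 200)))   # heuristic
# 9 in 100 versus 1 in 10 (larger numerator, worse odds)
print(classify_ratio_pair((9, 100), (1, 10)))    # non-optimal
```

Exact fractions are used instead of floating-point division so that equal probabilities such as 1/20 and 10/200 compare as equal.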
Peters et al. (2006) demonstrated that lower numeracy was associated with less optimal choices in ratio related decisions.
Affect, bets effect, and numeracy
Information appears to be processed in two modes, affective-experiential and deliberative, a distinction also known as the dual process model (Kahneman, 2003; Sloman, 1996). The model suggests that the affective-experiential mode produces thoughts and feelings in a relatively effortless and spontaneous manner, whereas the deliberative mode requires conscious, reason-based, and analytical thinking. Affect may provide information about the goodness and badness of an option and might as a consequence influence further choice processes.
Numeracy has been argued to moderate the association between affect and decision-making (Rottenstreich & Hsee, 2001; Traczyk & Fulawka, 2016), with the potential of aiding decision making, yet it may sometimes lead to number overuse and worsen decisions (Pachur & Galesic, 2013; Peters & Bjalkebring, 2015). Those with higher numeracy seem to draw more precise affective information, then form relevant risk perceptions, and use that information in making related decisions.
In a demonstration of the possible advantages, Petrova et al. (2014) conducted a study about decision making regarding camera insurance. They found that participants with higher numeracy reported greater negative emotions to a 90% chance of losing a camera compared to a 50% chance. In addition, they were willing to pay more for insurance against the loss when the loss probability was higher. By contrast, participants with lower numeracy seemed less sensitive to the two probability levels.
However, there are possible nuances and unintended side-effects to drawing precise numerical information, depending on the defined desired outcome. For example, Kleber et al. (2013) conducted research on donations and they found that numeracy was associated with donation behavior, with the more numerate focusing on projects with the greatest proportion of recipients, whereas those lower on numeracy tended to donate more with increases in both the number of recipients and the total number of people in need.
Choice of study for replication: Peters et al. (2006)
We chose the article by Peters et al. (2006) as the target for replication based on two factors: its impact, and the potential to improve on methodological limitations in the original studies.
The article has had much impact on scholarly research in the areas of social psychology and judgment and decision making. At the time of writing (July, 2022), there were over 1400 Google Scholar citations of the article. In addition, the work of Peters et al. (2006) had important practical implications, especially in the domains of medical decision making (Okamoto et al., 2012; Reyna et al., 2009) and financial decision making (Estrada-Mejía et al., 2016; Traczyk et al., 2018).
We reached out to the authors to request assistance with the original materials, and to try and assess any published and ongoing replication work. They indicated most of the original materials have been lost to time, yet kindly referred us to some of the extensive follow-up literature with conceptual replications and related materials, from which we were able to reconstruct most of the studies. We have also learned of other attempts at a replication of the broader numeracy literature in other languages (Polish) and have been in touch with their authors to coordinate efforts. To our knowledge, there are no published direct close replications of the target article’s studies.
Examining the studies, we believe a direct replication is especially relevant given the low power and some of the statistical method choices. Their Studies 1-4 had 100, 46, 46, and 171 participants, respectively, which may seem low, especially given the interaction and supplementary analyses. Furthermore, the methods employed dichotomizing of the continuous numeracy scale, which we thought could be improved by analyzing as the intended continuous measure, and may allow for more accurate insights and conclusions.
We therefore aimed to revisit the classic phenomenon to examine the reproducibility and replicability of the findings with independent replications. We followed recent growing recognition of the importance of reproducibility and replicability in psychological science (e.g., Brandt et al., 2014; Open Science Collaboration, 2015; van’t Veer & Giner-Sorolla, 2016; Zwaan et al., 2018) and embarked on a well-powered pre-registered very close replication of Peters et al. (2006).
Hypotheses and findings in target article
Peters et al. (2006) conducted four studies. We aimed to replicate all of them, with needed adjustments, in a single data collection in which the experiments were displayed in a random order (more on that in the methods section). Below we review the findings in each of the target’s studies. We summarized the target’s hypotheses and our extension’s hypotheses in Table 1.
Study | Hypothesis | Description of hypothesis |
1 | 1 (original) | The less numerate show a stronger framing effect than the highly numerate. |
1 | 1 (extension) | Higher numeracy is associated with weaker positive-negative framing effects. |
2 | 1 (original) | The less numerate are affected more by the frequency-percentage effect than the highly numerate. |
2 | 1 (extension) | Higher numeracy is associated with weaker frequency-percentage effects. |
3 | 1 (original) | The less numerate make more sub-optimal choices in competing affective decisions than the highly numerate. |
3 | 1 (extension) | Higher numeracy is associated with more optimal choices in competing affective decisions. |
3 | 2 (original) | The less numerate have more positive affect about the affectively appealing bowl with less favorable objective probabilities in competing affective decisions than the highly numerate. |
3 | 2 (extension) | Higher numeracy is associated with more negative affect about the affectively appealing bowl with less favorable objective probabilities. |
3 | 3 (original) | The less numerate have lower affective precision about the affectively appealing bowl with less favorable objective probabilities than the highly numerate. |
3 | 3 (extension) | Higher numeracy is associated with higher affective precision about the affectively appealing bowl with less favorable objective probabilities. |
4 | 1 (original) | The less numerate show a smaller difference in the rating of bets than the highly numerate. |
4 | 1 (extension) | Higher numeracy is associated with larger differences in the rating of bets. |
4 | 2 (original) | The less numerate draw less affective meaning in probabilities and numerical comparisons than the highly numerate. |
4 | 2 (extension) | Higher numeracy is associated with drawing more affective meaning in probabilities and numerical comparisons. |
Extension: Confidence | | |
1, 2, 3, 4 | 1 | Numeracy is positively associated with confidence. |
Note. For each of the hypotheses we reframed the hypotheses deduced from the conclusions in the original article from a dichotomy (high numerate versus low numerate, labeled as “original”) to a continuous association (higher numeracy is associated with…, labeled as “extension”).
Study 1: Numeracy and Positive-negative Framing
Study 1 sought to examine the relationship between numeracy and attribute framing. They hypothesized that participants with low numeracy are more likely to be affected by attribute framing.
To test this, they recruited participants through campus newspapers. Participants first answered the numeracy scale developed by Lipkus et al. (2001). Then, they rated the quality of five psychology students’ work. Participants were randomly assigned to positive or negative framing conditions. For instance, Emily received either 74% correct or 26% incorrect on her exam.
Peters et al. (2006) dichotomized numeracy into high numerate (9-11 items correct) and low numerate (2-8 items correct) with a median split. To test the hypothesis, they used a mixed ANOVA. They reported that highly numerate participants were less susceptible to framing bias (f = 0.25, 90% CI [0.00, 0.42]).
Study 2: Numeracy and Frequency-percentage Effect
Study 2 aimed to examine the relationship between numeracy and the frequency-percentage effect. They hypothesized that participants with low numeracy are more likely to be affected by the frequency-percentage effect. To test this, they recruited university students from a psychology course. Participants read the mental-patient scenario in either a frequency or a percentage format and rated the risk that the patient would harm someone. They ran a factorial ANOVA and found that the less numerate gave lower risk ratings in the percentage condition than in the frequency condition, whereas the highly numerate rated both conditions similarly (f = 0.31, 90% CI [0.00, 0.58]).
Study 3: Numeracy, Affect, and Ratio Bias
Study 3 intended to explore the association between numeracy and ratio bias as well as numeracy and the influence of affective information. They hypothesized that numeracy is associated with more optimal choices, evoking less affect and higher affective precision.
To test this, they recruited university students from a psychology course. Participants from Studies 2 and 3 were the same group. Participants read about a choice between two bowls, Bowl A-9-100 with affectively appealing description but less objectively favorable outcome (9 jellybeans of a bowl of 100) and Bowl B-1-10 with less appealing description but better results (1 jellybean of a bowl of 10). Participants rated their preference for a bowl and selected one. After indicating the preference and choice, they rated affect towards the Bowl A-9-100 option.
The authors used a chi-square test to examine participants’ choices between the two bowls and found that the less numerate were more likely to choose Bowl A-9-100 (φ = 0.77) and that the highly numerate showed a higher preference for Bowl B-1-10 (d = -0.74, 95% CI [-1.33, -0.13]). In addition, the highly numerate reported higher affect precision towards Bowl A-9-100 (d = 0.78, 95% CI [0.17, 1.36]). The study reported no support for differences in feelings (d = 0.46).
Study 4: Numeracy, Affect, and Bets Effect
Study 4 examined the relationship between numeracy and affect in probabilities and numerical comparisons. They hypothesized that numeracy is associated with affect arousal and affective precision.
To test this, they recruited volunteers from the psychology department’s subject pool. Participants read a scenario about either a bet with a 7/36 chance to win $9 and a 29/36 chance to win nothing, or a bet with a 7/36 chance to win $9 but a 29/36 chance to lose 5 cents. The possible bets were visualized with a roulette wheel. Participants evaluated the attractiveness of the bet and their affect precision and affect, using the same scales as in Study 3.
They employed a factorial ANOVA and an independent samples t-test. They found that those high on numeracy rated the bet as more attractive in the loss condition than in the no-loss condition, whereas participants with low numeracy rated the two conditions the same on average (f = 0.23, 90% CI [0.10, 0.35]). With respect to affect precision, participants with high numeracy had clearer feelings about the bets than those with low numeracy. The highly numerate also reported more positive affect in the loss condition than in the no-loss condition, whereas there were weaker differences for the less numerate (f = 0.20, 90% CI [0.10, 0.50]). Peters (2020) summarized such findings as the “bets effect” in her book and we therefore also use this term.
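To see why a preference for the loss bet is notable, it helps to compare the expected values of the two bets described above. The following sketch is our own arithmetic on the stated probabilities and amounts, not an analysis from the target article; the variable names are illustrative.

```python
from fractions import Fraction

p_win, p_other = Fraction(7, 36), Fraction(29, 36)

# No-loss bet: 7/36 chance to win $9, 29/36 chance to win nothing.
ev_no_loss = p_win * 9 + p_other * 0
# Loss bet: 7/36 chance to win $9, 29/36 chance to lose 5 cents.
ev_loss = p_win * 9 - p_other * Fraction(5, 100)

print(float(ev_no_loss))          # 1.75
print(round(float(ev_loss), 4))   # 1.7097
```

The loss bet is worth about 4 cents less in expectation, so rating it as more attractive cannot be explained by expected value alone.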
We summarized the findings in the target article in Table 2.
Study | Factors | Effect size type | Effect | CI level | CIL | CIH |
1 | Numeracy and framing effect | f | 0.25 | 90% | 0.00 | 0.42 |
2 | Numeracy and frequency-percentage effect | f | 0.31 | 90% | 0.00 | 0.58 |
3 | Numeracy and bowl choice | φ | 0.77 | 95% | / | / |
3 | Numeracy and preference for bowls | d | -0.74 | 95% | -1.33 | -0.13 |
3 | Numeracy and affect precision | d | 0.78 | 95% | 0.17 | 1.36 |
3 | Numeracy and affect | d | 0.46 | 95% | / | / |
4 | Numeracy and attractiveness of bet | f | 0.23 | 90% | 0.10 | 0.35 |
4 | Numeracy and affect precision | f | / | / | / | / |
4 | Numeracy and affect | f | 0.20 | 90% | 0.10 | 0.50 |
Note. CIL = lower bound of the CI. CIH = upper bound of the CI. We report 90% CIs for ANOVA eta-squared given the effect size is always positive.
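Because the target's ANOVA effects are summarized here as Cohen's f while the replication reports partial eta-squared, a conversion can help when comparing the two. The sketch below uses the standard relation f² = η²p / (1 − η²p); the function names are our own, and whether the tabled f values were derived in exactly this way is an assumption.

```python
import math

def f_to_eta2p(f):
    """Convert Cohen's f to partial eta-squared: f^2 / (1 + f^2)."""
    return f ** 2 / (1 + f ** 2)

def eta2p_to_f(eta2p):
    """Convert partial eta-squared to Cohen's f: sqrt(eta / (1 - eta))."""
    return math.sqrt(eta2p / (1 - eta2p))

# Example: the Study 1 effect from the table above, f = 0.25
print(round(f_to_eta2p(0.25), 3))   # 0.059
# Example: a partial eta-squared of 0.02 expressed as f
print(round(eta2p_to_f(0.02), 3))   # 0.143
```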
Extension: Numeracy as a Continuous Measure
We added analyses to treat numeracy as a continuous variable instead of the dichotomy used in the target article. Methodologists have increasingly expressed concerns regarding the dichotomization of continuous variables as it might result in suboptimal interpretations (Altman & Royston, 2006; Fedorov et al., 2009; Lazic, 2018; Mariooryad & Busso, 2015). One of the primary limitations is the loss of information, and treating samples within the same group as having the same underlying properties.
Peters et al. (2006) conducted a median split of numeracy scores: Participants who achieved a score of 9 or more were categorized as highly numerate whereas those who achieved 8 or lower were categorized as less numerate. However, the differences between individuals who achieved 8 and 9 might be negligible, and no different than the differences between individuals who achieved 9 compared to 10 or 7 compared to 8. In addition, dichotomization reduces the power of statistical tests and effect sizes (Bakhshi et al., 2012). Fedorov et al. (2009) argued that 100 continuous observations are statistically equivalent to 158 dichotomized observations. Thus, the aim of treating numeracy as a continuous variable is to obtain more accurate effects, maximize power, and address potential misinterpretations resulting from dichotomization.
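Both points can be illustrated with a brief sketch. The first part applies the 9-or-above cutoff described above. The second part shows one textbook route to a figure of about 158: a median split of a normally distributed variable retains roughly 2/π of the statistical efficiency of the continuous measure. Treating this as the basis of the Fedorov et al. (2009) figure is our assumption, and the function name is illustrative.

```python
import math

def median_split(score, cutoff=9):
    """Label a numeracy score using the cutoff described in the text."""
    return "high" if score >= cutoff else "low"

# Scores of 8 and 9 receive different labels, while 9 and 11 share one.
print(median_split(8), median_split(9), median_split(11))  # low high high

# Asymptotic relative efficiency of a median split of a normal variable
# (assumption: normality and a split exactly at the median).
efficiency = 2 / math.pi
equivalent_n = math.ceil(100 / efficiency)
print(round(efficiency, 3), equivalent_n)  # 0.637 158
```

Under these assumptions, about 36% of the information in the continuous scores is discarded, which is the loss that the continuous analyses are meant to avoid.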
Extension: Confidence
We aimed to extend the replication by examining decision-making confidence. Confidence regarding a decision involving statistics may be considered as a measure of subjective numeracy or numeric self-efficacy, concerning how confident people are in their ability to understand numeric information and use mathematical concepts (Peters, 2020, p. 5). We discuss two rationales for this extension.
First, there are mixed findings regarding the association between subjective and objective numeracy. A body of research illustrates that subjective numeracy is positively associated with objective numeracy (Garcia-Retamero et al., 2015; Nelson et al., 2013; Peters, Fennema, et al., 2019). According to the Health Information National Trends Survey conducted by Nelson et al. (2013), participants who regarded themselves as high in subjective numeracy had higher rates of correct answers on objective numeracy questions. Another recent study by Rolison et al. (2020) illustrated that individuals with higher objective numeracy were more likely to answer health risk comprehension questions correctly. However, some research found no support for such an association, and people with low objective numeracy sometimes deem themselves highly numerate (Gamliel et al., 2016; Liberali et al., 2012; Peters, Tompkins, et al., 2019). For instance, Peters et al. (2019) reported that objective numeracy sometimes mismatches subjective confidence, with 31% of participants showing high numeracy but low confidence and 44% showing low numeracy but high confidence.
Second, most current studies measure trait subjective numeracy with self-report questionnaires; two frequently used scales are the Subjective Numeracy Scale developed by Fagerlin et al. (2007) and the STAT-Confidence Scale developed by Woloshin et al. (2005). Self-report questionnaires target participants’ traits or general impressions about their numeracy competence and preference for numbers. These may differ from numeric confidence regarding specific decision-making paradigms. Very few studies directly ask participants to rate their confidence about their decisions and answers in response to specific scenarios. Therefore, this study intends to examine the relationship between objective numeracy and subjective confidence in the four studies of Peters et al. (2006). We hypothesized that objective numeracy is positively associated with confidence in each study.
Pre-Registration and Open-science
We pre-registered the experiment on the Open Science Framework (OSF) and data collection was launched later that week. Pre-registrations, power analyses, and all materials used in these experiments are available in the supplementary materials. We provided all materials, data, code, and pre-registration on: https://osf.io/4hjck/. The IPA registration link is https://osf.io/r73fb.
We provided additional open-science details and disclosures in the supplementary materials under “Open Science disclosures” sub-section. All measures, manipulations, exclusions conducted for this investigation are reported, all studies were pre-registered with power analyses, and data collection was completed before analyses.
Methods
Power Analysis
We calculated effect sizes (ES) and power based on the statistics reported in the target article (see supplementary materials). We then conducted a power analysis using G*Power (Faul et al., 2007) for the statistical tests in each of the decision-making risk paradigms separately (i.e., framing effect, frequency-percentage effect, ratio bias, and bets effect).
Power analyses were conducted on the main findings in the original study that yielded significant effects and supported the hypotheses for Studies 1 to 4. The largest required sample size across all effects came from the two-way between-subjects ANOVA, which, when aiming for a power of 0.95 and a one-tailed alpha of 0.05, was N = 314. We provide further information regarding our calculations in the “Power analysis of original study effect to assess required sample for replication” section in the supplementary materials.
Given the possibility that the original effects are overestimated, we used the rule of thumb suggested by Simonsohn (2015), even though it was meant for other designs, and multiplied 314 by 2.5, resulting in 785 participants. Allowing for possible exclusions, we set a total target sample of 850 participants. Our sensitivity analysis indicated that a sample of 850 would allow the detection of f = 0.12 (one covariate, groups = 2, df = 1, 95% power, alpha = 5%, one-tailed), an effect much weaker than any of the effects reported in the original, and the detection of r = 0.12 in our continuous measures extension, an effect considered weak in social psychology (Lovakov & Agadullina, 2021).
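The sample-size arithmetic above can be reproduced in a few lines. This is only a sketch of the calculation described in the text; the variable names are illustrative and are not taken from the authors' code.

```python
import math

# Values taken from the text; variable names are illustrative assumptions.
required_n_original = 314   # largest N from the G*Power analyses
multiplier = 2.5            # Simonsohn (2015) rule of thumb
target_n = 850              # planned sample, leaving room for exclusions

# Sample needed after applying the rule of thumb
replication_n = math.ceil(required_n_original * multiplier)

# Participants that could be excluded while still meeting that requirement
exclusion_buffer = target_n - replication_n

print(replication_n)     # 785
print(exclusion_buffer)  # 65
```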
Participants
We recruited 919 participants, but 59 of them failed the verifications or consent at the beginning. They were not considered as participants and were filtered out. Therefore, we had 860 participants (Mage = 43.19, SD = 12.73; 415 (48.3%) females) from Amazon Mechanical Turk using the CloudResearch/Turkprime platform (Litman et al., 2017). We summarized sample demographics and details in Table 3.
Peters et al. (2006) Study 1 | Peters et al. (2006) Study 2 and 3 | Peters et al. (2006) Study 4 | US MTurk workers | ||
Sample size | 100 | 46 | 171 | 860 | |
Geographic origin | US American | US American | |||
Gender | 55 males, 45 females | Not reported | 79 males, 92 females | 441 males, 415 females, 4 other/did not disclose | |
Median age (years) | Not reported | Not reported | Not reported | 40 | |
Average age (years) | 26 | Not reported | 19 | 43.19 | |
Standard deviation age (years) | Not reported | Not reported | Not reported | 12.73 | |
Age range (years) | Not reported | Not reported | Not reported | 19-81 | |
Medium (location) | Pencil and paper | Not reported | Not reported | Computer (online) | |
Compensation | $10 | Not reported | Not reported | Nominal payment | |
Year | 2005 | 2005 | 2005 | 2022 |
Based on our extensive experience of running similar judgment and decision-making replications on MTurk, to ensure high-quality data collection, we employed the following CloudResearch options: Duplicate IP Block, Duplicate Geocode Block, Suspicious Geocode Block, Verify Worker Country Location, Enhanced Privacy, CloudResearch Approved Participants, and Block Low Quality Participants. We also employed the Qualtrics fraud and spam prevention measures: reCAPTCHA, prevent multiple submissions, prevent ballot stuffing, bot detection, security scan monitor, and RelevantID. We also reported the details of payment and duration of the study in the “additional information in the study” section in the supplementary materials.
Design: Replication and Extension
We summarized the experimental design in Tables 4, 5, 6, and 7. To conduct a replication of the four studies in the original article, we ran the four studies together in a single data collection. The display of scenarios and conditions was counterbalanced using the randomizer “evenly present” function in Qualtrics. Scenarios were presented in random order and participants were randomly and evenly assigned to the different conditions. This method was previously tested successfully in many of the replications and extensions conducted by our team (e.g., Adelina & Feldman, 2021; Vonasch et al., 2022; Yeung & Feldman, 2022). The methodology is especially powerful in addressing potential concerns about the target sample (e.g., naivety and attentiveness): when some studies in the target article replicate successfully whereas others from the same article do not, it is more likely that the failure is due to the failed study itself than to the participants’ characteristics. This methodology also allows for examining possible links between the different studies and the consistency of participants’ responses to similar decision-making paradigms.
IV1: Numeracy [between subject/continuous] IV2: Positive-negative framing [between subject] | IV1: Numeracy Original numeracy scale. Extension numeracy scale. |
IV2: Positive framing condition Scores framed positively “% correct” | Dependent variable Evaluation of students’ performance Please rate each student’s quality of work "Very poor" (-3) to "Very good" (3) (for each of the five students) Extension dependent variable Evaluation of subjective confidence level How confident are you that you made an accurate assessment of the five students? “Not at all confident” (0) to “Very confident” (6) |
IV2: Negative framing condition Scores framed negatively “% incorrect” |
IV1: Numeracy [between subject/continuous] IV2: Frequency-percentage description (risk format) [between subject] | IV1: Numeracy Original numeracy scale. Extension numeracy scale. |
IV2: Frequency condition “Of every 100… 10 are estimated…” | Dependent variable Evaluation of risk level Please rate the level of risk that Mr. Jones would harm someone “Low risk” (1) to “High risk” (6) Extension dependent variable Evaluation of subjective confidence level How confident are you that made an accurate risk assessment? “Not at all confident” (0) to “Very confident” (6) |
IV2: Percentage condition “Of every 100… 10% are estimated…” |
IV: Numeracy [between subject/continuous] Original numeracy scale. Extension numeracy scale. |
Dependent variables Preference of bowl Bowl A-9-100; 100 jellybeans, 9% colored (odds = 9 out of 100 = 9%) Bowl B-1-10: 10 jellybeans, 10% colored (odds = 1 out of 10 = 10%) “Imagine that if you select a colored bean, you will WIN $5. Would you prefer to pick from Bowl A or Bowl B?” “Strong preference for Bowl A” (6) to “Strong preference for Bowl B” (6) Affect precision for Bowl A-9-100 choice How clear a feeling do you have about the goodness or badness of Bowl A’s 9% chance of winning? “Completely unclear” (0) to “Completely clear” (6) Affect for Bowl A-9-100 choice How good or bad does Bowl A’s 9% chance of winning make you feel? “Very bad” (-3) to “Very good” (3) [Added adjustment dependent variables] Affect precision for Bowl B-1-10 choice How clear a feeling do you have about the goodness or badness of Bowl B’s 10% chance of winning? “Completely unclear” (0) to “Completely clear” (6) Affect for Bowl B-1-10 choice How good or bad does Bowl B’s 10% chance of winning make you feel? “Very bad” (-3) to “Very good” (3) Forced choice of bowls If you were forced to choose, which bowl would you prefer to choose from? “Bowl A” or “Bowl B” Extension dependent variables Evaluation of subjective confidence level How confident are you that made an optimal selection between Bowl A and Bowl B? “Not at all confident” (0) to “Very confident” (6) |
IV1: Numeracy [between subject/continuous] IV2: Bet type (loss vs. no-loss) [between subject] | IV1: Numeracy Original numeracy scale. Extension numeracy scale. |
IV2: Bet - No loss condition “There is a 7/36 chance to win $9 and 29/36 chance to win nothing.” | Dependent variable Evaluation of bet’s attractiveness Please indicate your opinion of this bet’s attractiveness "Not at all attractive bet" (0) to "Extremely attractive bet" (20) Affect precision for bet How clear a feeling do you have about the goodness or badness of the bet? “Completely unclear” (0) to “Completely clear” (6) Affect for bet How good or bad does the bet make you feel? “Very bad” (-3) to “Very good” (3) Extension dependent variable Evaluation of subjective confidence level How confident are you that you made an accurate assessment of the bet’s attractiveness? “Not at all confident” (0) to “Very confident” (6) |
IV2: Bet - Loss condition “There is a 7/36 chance to win $9 and 29/36 chance to lose 5 cents.” |
Procedures
Participants first read the consent form and study outline, and then acknowledged a warning about not looking up answers online. They were then randomly assigned to a condition in each of the four studies. The order of the four studies and the assignment to their conditions were randomized. After completing the tasks for the four scenarios, they completed the two numeracy scales in random order. Then, they verified that they had not used external aids in answering the questionnaire. At the end, participants answered a number of funneling questions (seriousness towards the survey, study purpose conjecture, and feedback) and provided demographic information. We added a more comprehensive overview of the survey procedure in the “procedure” section in the supplementary materials.
Measures
Numeracy
The objective numeracy predictor was measured using the Numeracy Scale developed by Lipkus et al. (2001) (Cronbach’s α = 0.64). We refer to it as the “original numeracy scale”; it has 11 items and a maximum total score of 11.
We added an additional numeracy measure as an extension: the Numeracy Scale developed by Weller et al. (2013) (Cronbach’s α = 0.66). We refer to it as the “Rasch-based numeracy scale”; it has eight items and a maximum total score of 8.
Manipulations
Study 1: Positive versus Negative Framing
Participants were randomly assigned to either positive framing or negative framing conditions. They were asked to rate the quality of five psychology students’ exam scores framed positively or negatively. The order of five exam scores was randomized.
Study 2: Frequency and Percentage Condition
Participants were randomly assigned to the frequency or percentage condition. Participants read the scenario of Mr. Jones, a mental patient with the potential to harm someone when released. Participants then rated the risk level of patients like Mr. Jones under either frequency framing (i.e., 10 out of 100 patients) or percentage framing (i.e., 10% of 100 patients).
Study 3: Ratio Bias
Participants first read a scenario describing two jellybean bowls. Bowl A-9-100 is the more attractive option, yet it has a less objectively favorable outcome than Bowl B-1-10. Participants rated their preference between Bowl A-9-100 and Bowl B-1-10, and then chose one of the bowls. They then rated affect and affect precision for both bowls.
Study 4: No Loss versus Loss Condition
Participants were randomly assigned to the loss or no-loss condition. Participants read a scenario about a bet with “a 7 out of 36 chance to win $9 and a 29 out of 36 chance to win nothing” or with “a 7 out of 36 chance to win $9 but a 29 out of 36 chance to lose 5 cents”. The chances of the bet were visualized using a picture of a roulette wheel. Participants evaluated the attractiveness of the bet, and then rated affect and affect precision towards the bet.
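To make the two bets concrete, the sketch below computes their expected values from the probabilities and payoffs quoted above. Expected value is not an analysis reported in this section; it is included only to illustrate that the loss bet is, objectively, very slightly worse than the no-loss bet. The function and variable names are illustrative.

```python
from fractions import Fraction

def expected_value(outcomes):
    """Sum of probability * payoff over (probability, payoff) pairs."""
    return sum(p * x for p, x in outcomes)

# Probabilities and payoffs (in dollars) as quoted in the two conditions
p_win, p_other = Fraction(7, 36), Fraction(29, 36)
no_loss_bet = [(p_win, Fraction(9)), (p_other, Fraction(0))]
loss_bet = [(p_win, Fraction(9)), (p_other, Fraction(-5, 100))]  # lose 5 cents

ev_no_loss = expected_value(no_loss_bet)
ev_loss = expected_value(loss_bet)

print(float(ev_no_loss))         # 1.75
print(round(float(ev_loss), 2))  # 1.71
```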
Deviations
We made several adjustments that deviate from the original’s design. We summarize the details of the deviations, with comparisons between the original paper and our replication, in Table 8.
Design facet | Replication | Details of deviation |
---|---|---|
Hypothesis | Same+extension | We ran the original analyses and added a reframing of the hypotheses treating numeracy as a continuous variable. |
IV construct | Same | |
DV construct | Similar | We reconstructed our version of the scores of four of the students described in Study 1, as the stimuli were not provided in the article. |
IV operationalization | Similar | We randomized the order of the numeracy questions |
DV operationalization | Similar | In Study 3, we added exploratory extra questions for more optimal choice on affect and affect precision (on Bowl B-1-10). We also added the question on compulsory bowl choices. |
IV stimuli | Similar | Added an extra numeracy scale |
DV stimuli | Same | |
Procedural details | Similar+extensions | The dependent variables of the four studies were completed together, in random order; added a warning pledge before the test; added a question confirming not using external aids to find answers; added familiarity questions in Studies 2, 3, and 4; did not collect SAT scores |
Physical settings | Different | Online questionnaire |
Population (e.g., age) | Different | Online US American MTurk workers |
Replication classification | Close replication |
Note. We classified the replication as a close replication using the criteria by LeBel et al. (2018), summarized in the supplementary materials in section “Replication closeness”.
In terms of the measurement of numeracy, we added an objective numeracy scale developed by Weller et al. (2013). The rationale for this extension is that this scale has demonstrated sound psychometric properties based on Rasch analysis and is argued to have better predictive validity than previous scales. Several recent studies have adopted it and shown support for high internal consistency (Cheng, 2020; Dolan et al., 2016; Peters, Fennema, et al., 2019).
We added a warning pledge block at the beginning of the questionnaire asking participants not to look for answers, and added a question after the completion of the two numeracy scales asking participants whether they had used any external aids to search for answers.
We made minor visual adjustments to the original numeracy scale (Lipkus et al., 2001): we removed the decimals in 10.00 and turned 1,000 into 1000. Given that we asked for and validated the input of numbers without decimals and commas, the original formats may have confused participants.
The target article used SAT scores as a proxy measure for intelligence, as they demonstrated that intelligence is positively associated with objective numeracy. Collection of SAT scores is not applicable to our target sample, and is not a core component of the target article.
The target article ran data collection for each of the studies separately, and reported using pencil and paper in Study 1 (Studies 2, 3, and 4 not reported). We conducted data collection online in a unified design in which participants answer all the dependent variables of the four studies in random order.
In Study 1, the original paper did not report the specific scores of the five psychology students and provided only one example. Therefore, we reconstructed our own version of the other four students’ scores in increments or decrements of four percentage points (i.e., 66%, 70%, 74%, 78%, and 82%).
In Study 3, we added the questions of affect precision and affect for Bowl B-1-10. These were meant as exploratory measures to allow us to determine how participants feel about both options to allow for baseline comparisons. We considered the possibility that drawing conclusions from the ratings of only one of the two bowls may be lacking, whereas a comparison of the two options would be more accurate. In addition, we added the forced bowl choice after the rating of preference between the two bowls. The original paper conducted the chi-square test but did not report how responses were categorized into Bowl A-9-100 and Bowl B-1-10 choices. Therefore, we required an extra question to confirm the choices.
Data Analysis Strategy
Replication: As in the original
The original paper dichotomized the numeracy scores as high numerate and low numerate. They conducted a 2 between x 5 within mixed ANOVA in Study 1, a two-way ANOVA in Study 2, and in Study 3 a chi-square test on the question of choosing Bowl, and an independent t-test to test bowl preference, affect, and affect precision. In Study 4, they conducted a factorial ANOVA for main interaction effect (i.e., numeracy and attractiveness of bet, numeracy and affect, numeracy and affect precision) and independent t-test to compare the responses (i.e., rate of attractiveness, affect and affect precision) of the high numerate under two conditions.
Extension: Additional analyses
To our understanding, one of the major weaknesses of the target article is in their decision to dichotomize a continuous measure. In the replication, we supplemented the original analyses with additional analyses treating numeracy scores as intended - a continuous variable. Therefore, in Studies 1, 2, 3 and 4 we conducted correlational analyses and in Study 3 we conducted an extra independent t-test for bowl choice.
Extension: Confidence
We conducted correlational analyses for confidence level and numeracy scale score.
Results
In this section, we report the results for the full sample without exclusions. Our original plan for exclusions was intended for the case in which the replication failed, which for the most part it did not, and we also realized that the large number of exclusions severely limited our power to detect the effects. Therefore, we focus our analyses on the full sample. We provided the results post-exclusions in the section “overview of post-exclusions” in the supplementary materials, and provided a table comparing the results with and without exclusions in Table S14.
Replication: Main effects
We first examined the main effect of each study, examining classic phenomena in judgment and decision-making.
In Study 1, we conducted an independent t-test and found a framing effect: participants rated the students’ exam performance higher when the results were positively framed than when they were negatively framed (positive framing: M = 0.48, SD = 0.75; negative framing: M = -0.09, SD = 1.07; t(858) = 9.12, p < .001, d = 0.62, 95% CI [0.48, 0.76]).
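As a rough check on how such statistics relate to one another, the sketch below recomputes Cohen's d and the independent-samples t from the Study 1 summary statistics. It assumes two equal groups of 430 participants, which is an assumption based on the even random assignment and the total sample of 860, not a reported figure. Because the means and standard deviations are rounded, the result only approximates the reported values.

```python
import math

def cohens_d_and_t(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d (pooled SD) and Student's t from summary statistics."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    pooled_sd = math.sqrt(pooled_var)
    d = (m1 - m2) / pooled_sd
    t = (m1 - m2) / (pooled_sd * math.sqrt(1 / n1 + 1 / n2))
    return d, t

# Study 1 summary statistics; group sizes of 430 are an assumption
d, t = cohens_d_and_t(0.48, 0.75, 430, -0.09, 1.07, 430)
print(round(d, 2), round(t, 2))  # d is about 0.62, t is about 9.05
```

The small gap between this approximate t and the reported t(858) = 9.12 is of the size expected from rounding and from the actual group sizes.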
In Study 2, we conducted an independent t-test to test the frequency-percentage effect. Participants rated a higher level of risk in the frequency condition than in the percentage condition (frequency: M = 3.03, SD = 1.27; percentage: M = 2.58, SD = 1.17; t(858) = 9.12, p < .001, d = 0.37, 95% CI [0.23, 0.50]).
In Study 3, we conducted a one-sample t-test on the preference of bowls and found that participants showed a stronger preference towards Bowl B-1-10 (t(859) = 20.04, p < .001, d = 0.68, 95% CI [0.61, 0.76]). We conclude that we failed to find support for previous ratio bias findings which showed stronger preference for the suboptimal choice, Bowl A-9-100.
In Study 4, we ran an independent t-test to examine the bets effect and found that participants rated the loss bet as more attractive than the no-loss bet (no-loss bet: M = 6.22, SD = 4.57; loss bet: M = 9.33, SD = 7.07; t(858) = 7.66, p < .001, d = 0.52, 95% CI [0.38, 0.66]), which supported the phenomenon.
Replication: Dichotomized numeracy
We first conducted statistical analyses that closely followed the methods used in the original article, which dichotomized the continuous measure of numeracy into high numerate and low numerate via median split. The median of numeracy scores was 10 (mean = 9.69, range = 0-11). Therefore, participants whose overall score was 10 or above were classified as highly numerate and those whose overall score was 9 or below were classified as low numerate. We summarized the results of Studies 1, 2, and 4 in Table 9 and the results of Study 3 in Table 10. Further, we provided the descriptives of each subgroup analyzed in the following ANOVA tests in the supplementary materials (Table S17) to elaborate on the interaction effects.
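The median split described above can be sketched as follows. The scores in the example are hypothetical and serve only to illustrate the rule that participants at or above the median are coded as highly numerate; this is not the authors' analysis code.

```python
from statistics import median

def median_split(scores):
    """Label each score 'high' if at or above the sample median, else 'low'."""
    cutoff = median(scores)
    labels = ["high" if s >= cutoff else "low" for s in scores]
    return cutoff, labels

# Hypothetical numeracy scores on the 0-11 scale (not the study data)
example_scores = [11, 10, 10, 9, 7, 10, 11, 4, 10]
cutoff, labels = median_split(example_scores)

print(cutoff)  # 10
print(labels)  # ['high', 'high', 'high', 'low', 'low', 'high', 'high', 'low', 'high']
```

With a median of 10, a score of 10 falls in the high group and a score of 9 in the low group, matching the classification used in the replication.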
F | df | p | η2p and CI | Interpretation | |
Study 1 (Mixed ANOVA) | |||||
Numeracy and framing effect | 5.02 | 1, 855 | .025 | 0.01 [0.00, 0.02] | signal inconsistent smaller |
Study 2 (Factorial ANOVA) | |||||
Numeracy and frequency-percentage effect | 3.40 | 1, 856 | .065 | 0.00 [0.00, 0.01] | no-signal inconsistent |
Study 4 (Factorial ANOVA) | |||||
Numeracy and attractiveness of bet | 17.87 | 1, 856 | < .001 | 0.02 [0.01, 0.04] | signal inconsistent smaller |
Numeracy and affect of bet | 9.27 | 1, 856 | .002 | 0.01 [0.00, 0.03] | signal inconsistent smaller |
Numeracy and affect precision of bet | 0.02 | 1, 856 | .890 | 0.00 [0.00, 0.00] | no-signal consistent |
Note. CI = 90% confidence intervals. The interpretation of outcome is based on LeBel et al. (2019).
Low versus high numerate and bowl choice (Bowl A-9-100 and Bowl B-1-10) | |||||
Chi-square test | χ2 | df | p | Cramer’s V and CI | |
Dichotomized continuous numeracy | 12.53 | 1 | <.001 | 0.13 [0.06, 0.20] | signal |
Dichotomized forced bowl choices | 24.91 | 1 | <.001 | 0.17 [0.10, 0.24] | signal |
Low versus high numerate | t | df | p | d and CI | Interpretation |
Preference of Bowls | 5.51 | 858 | <.001 | 0.40 [0.25, 0.54] | signal inconsistent smaller |
Affect for Bowl A-9-100 | -4.62 | 858 | <.001 | 0.33 [0.19, 0.48] | signal consistent |
Affect precision for Bowl A-9-100 | 3.00 | 858 | .003 | 0.22 [0.07, 0.36] | signal inconsistent smaller |
Note. CI = 95% confidence intervals. Independent t-tests compared the stated DVs between the high and low numerate split sub-samples. “Dichotomized continuous numeracy” refers to bowl choices categorized according to the preference-of-bowl rating. “Dichotomized forced bowl choices” refers to the adjusted DV. The interpretation of outcome is based on LeBel et al. (2019).
In Study 1, we performed a mixed ANOVA and found an interaction effect of numeracy on the framing effect (F(1, 855) = 5.02, p = .025, η2p = 0.01, 90% CI [0.00, 0.02]) (Figure 1). The effect was rather weak, with a p-value barely below the pre-registered alpha threshold. We concluded support for the hypothesis that the less numerate show a stronger framing effect than the highly numerate, though with weaker effects than in the original.
In Study 2, we conducted a two-way ANOVA and failed to find support for an interaction between numeracy and the frequency-percentage effect, with a weak effect just above the set alpha (F(1, 856) = 3.40, p = .065, η2p = 0.00, 90% CI [0.00, 0.01]) (Figure 2). The findings were in the expected direction, and the differences in effect sizes and p-values between our replications of Study 1 and Study 2 are rather minor, so we hesitate to conclude that one was supported and the other failed. Nevertheless, the results did not meet our pre-set criteria and are therefore inconsistent with the hypothesis that the less numerate are affected more by the frequency-percentage effect than the highly numerate. We will return to this in our additional extension analyses.
In Study 3, we conducted two chi-square tests, one based on the preference-of-bowl rating and the other based on the forced choice of bowl, to test the association between numeracy and bowl choice. As mentioned above, the original article did not report how bowl choices were categorized from the bowl preference ratings. We therefore assumed that the original authors coded participants who indicated a preference for Bowl A-9-100 (i.e., 1-6) as having selected Bowl A-9-100 and those who indicated a preference for Bowl B-1-10 (i.e., 1-6) as having selected Bowl B-1-10, with the neutral value (0) excluded. We found support for an interaction between numeracy and ratio bias, and the result based on the forced choice of bowl (χ2(1, N = 860) = 24.91, p < .001, Cramer’s V = 0.17, 95% CI [0.10, 0.24]) showed a stronger effect than that based on the preference of bowls (χ2(1, N = 789) = 12.53, p < .001, Cramer’s V = 0.13, 95% CI [0.06, 0.20]).
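The two Cramer’s V values follow directly from the reported chi-square statistics and sample sizes via the standard formula V = sqrt(χ2 / (N × min(rows − 1, columns − 1))). The sketch below applies it to the two 2 × 2 tests above; the function name is illustrative.

```python
import math

def cramers_v(chi2, n, n_rows=2, n_cols=2):
    """Cramer's V from a chi-square statistic, sample size, and table shape."""
    return math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))

# Chi-square statistics and Ns as reported for Study 3
v_forced = cramers_v(24.91, 860)      # forced choice of bowl
v_preference = cramers_v(12.53, 789)  # categorized preference ratings

print(round(v_forced, 2))      # 0.17
print(round(v_preference, 2))  # 0.13
```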
In addition, we conducted independent t-tests on preference of bowl and the affective variables, comparing the two numeracy groups. Participants with high numeracy (M = 3.01, SD = 3.50) showed greater preference for Bowl B-1-10 (over Bowl A-9-100) than participants with low numeracy (M = 1.56, SD = 3.89; t(858) = 5.51, p < .001, d = 0.40, 95% CI [0.25, 0.54]) (Figure 3). Those with low numeracy (M = -0.48, SD = 1.48) showed higher affect for Bowl A-9-100 than those with high numeracy (M = -0.94, SD = 1.30; t(858) = -4.62, p < .001, d = 0.33, 95% CI [0.19, 0.48]). By contrast, the less numerate (M = 4.19, SD = 1.64) had less precise feelings towards Bowl A-9-100 than the highly numerate (M = 4.53, SD = 1.53; t(858) = 3.00, p = .003, d = 0.22, 95% CI [0.07, 0.36]).
Therefore, our findings for Study 3 were consistent with the hypothesis that the less numerate make less optimal choices in competing affective decisions than the highly numerate, with lower affective precision.
In Study 4, highly numerate participants rated the loss bet as more attractive than the no-loss bet (loss bet: M = 10.27, SD = 7.21; no-loss: M = 5.95, SD = 4.47; t(570) = 8.63, p < .001, d = 0.72, 95% CI [0.55, 0.90]). By contrast, we found no support for a difference among low numerate participants, with much weaker effects (loss bet: M = 7.50, SD = 6.44; no-loss: M = 6.78, SD = 4.74; t(286) = 1.09, p = .277, d = 0.13, 95% CI [-0.10, 0.36]). We conducted a two-way ANOVA and found support for an interaction between numeracy and the bets effect (F(1, 856) = 17.87, p < .001, η2p = 0.02, 90% CI [0.01, 0.04]) (Figure 4).
In addition, the highly numerate rated stronger affect towards bets in the loss condition (M = 0.16, SD = 1.83) than no-loss condition (M = -0.72 , SD = 1.35; t(570) = 6.62, p < .001, d = 0.55, 95% CI [0.38, 0.72]), with no support and weaker effects for the low numerate (loss bet: M = -0.18, SD = 1.61; no-loss: M = -0.38 , SD = 1.36; t(286) = 1.13, p = .260, d = 0.13, 95% CI [-0.10, 0.36]). We analyzed the interaction effect between numeracy and affect of bets and the results supported the hypothesis that the highly numerate experience stronger affect in probabilities and numerical comparisons than the low numerate (F(1, 856) = 17.87, p < .001, η2p = 0.02, 90% CI [0.01, 0.04]).
Concerning affect precision, both the low numerate and the highly numerate showed greater affect precision towards the loss bet than the no-loss bet (low numeracy: t(286) = 2.59, p = .010, d = 0.31, 95% CI [0.07, 0.54]; high numeracy: t(570) = 4.36, p < .001, d = 0.36, 95% CI [0.20, 0.53]). Accordingly, we found no support for an interaction between numeracy and affect precision of bets (F(1, 856) = 0.02, p = .890, η2p = 0.00, 90% CI [0.00, 0.00]), consistent with the original findings of the target article.
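For readers who wish to check the interaction effect sizes, partial eta squared can be recovered directly from an F statistic and its degrees of freedom. The following is a minimal sketch using the values reported above; the function name is ours and the snippet is not taken from the analysis code on the OSF.

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared = (F * df_effect) / (F * df_effect + df_error)."""
    return (f * df_effect) / (f * df_effect + df_error)

# Numeracy x bets interaction on attractiveness, F(1, 856) = 17.87.
print(round(partial_eta_squared(17.87, 1, 856), 2))  # 0.02
# Numeracy x bets interaction on affect precision, F(1, 856) = 0.02.
print(round(partial_eta_squared(0.02, 1, 856), 2))   # 0.0
```

With 856 error degrees of freedom, even a clearly detectable interaction (F = 17.87) corresponds to about 2% of variance explained, which is consistent with the small effects reported throughout.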
Extension: Continuous numeracy
Original numeracy scale
In Study 1, we found support for a stronger association between numeracy and ratings of students in the positive framing condition (r = -0.10, 95% CI [-0.19, -0.01], p = .036) than in the negative framing condition (r = 0.07, 95% CI [-0.03, 0.16], p = .177) (Figure 5). We compared the two correlations with the tool “cocor” (Diedenhofen & Musch, 2015) and found support for differences in the strength of the two associations (z = -2.49, p = .013).
In Study 2, we found support for a stronger association between numeracy and ratings of risk level in the frequency condition (r = -0.17, 95% CI [-0.26, -0.07], p < .001) than in the percentage condition (r = -0.03, 95% CI [-0.12, 0.07], p = .543) (Figure 6). We compared the two correlations using “cocor” and found support for differences in the strength of the two associations (z = -2.07, p = .039).
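The comparisons of correlations in Studies 1 and 2 can be illustrated with Fisher’s z test for two independent correlations, one of the tests offered by “cocor”. The sketch below is not the cocor implementation. The per-condition sample size of 430 is a hypothetical equal split of N = 860 (the actual condition sizes are not reported in this section), and the correlations are the rounded values from the text, so the result is only approximate.

```python
import math

def compare_independent_correlations(r1, n1, r2, n2):
    """Fisher's z test for the difference between two independent Pearson correlations."""
    z1, z2 = math.atanh(r1), math.atanh(r2)          # Fisher r-to-z transform
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))      # standard error of z1 - z2
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2))             # two-sided p from the normal distribution
    return z, p

# Study 1: positive vs. negative framing (n = 430 per condition is an assumption).
z, p = compare_independent_correlations(-0.10, 430, 0.07, 430)
print(round(z, 2), round(p, 3))  # approximately the reported z = -2.49, p = .013
```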
In Study 3, we conducted an independent t-test comparing the numeracy of the two bowl selections. The numeracy of participants who selected Bowl A-9-100 (M = 9.05, SD = 1.90) was lower than those who chose Bowl B-1-10 (M = 9.98, SD = 1.39; t(858) = 6.81, p < .001, d = 0.55, 95% CI [0.38, 0.72]). We also found support for associations between numeracy and bowl preference towards Bowl B-1-10 (r = 0.21, 95% CI [0.14, 0.27], p < .001) (Figure 7), lower affect for Bowl A-9-100 (r = -0.19, 95% CI [-0.25, -0.12], p < .001), and higher affect precision for Bowl A-9-100 (r = 0.14, 95% CI [0.07, 0.20], p < .001). These findings support the association between higher numeracy with the more rational choice of Bowl B-1-10, and less affect and higher affect precision in such a competing affective decision paradigm.
In Study 4, we found support for an association between numeracy and attractiveness of the two bets: a negative association in the no-loss condition (r = -0.13, 95% CI [-0.22, -0.04], p = .006) and a positive association in the loss condition (r = 0.21, 95% CI [0.11, 0.30], p < .001; comparison of the associations: z = -5.03, p < .001) (Figure 8). We also found differences in associations between numeracy and affect of bets (z = -3.82, p < .001), but no support for an effect regarding affect precision (z = 0.45, p = .654).
Study | Measure | Original scale: r and CI | Original scale: p | Original scale: Spearman’s rho | Rasch scale: r and CI | Rasch scale: p | Rasch scale: Spearman’s rho |
1 | Students rating in Positive framing | -0.10 [-0.19, -0.01] | .036 | -0.11 | -0.12 [-0.21, -0.02] | .017 | -0.11 |
1 | Students rating in Negative framing | 0.07 [-0.03, 0.16] | .177 | 0.07 | 0.07 [-0.02, 0.17] | .134 | 0.09 |
2 | Risk rating in Frequency condition | -0.17 [-0.26, -0.07] | < .001 | -0.12 | -0.17 [-0.26, -0.08] | < .001 | -0.15 |
2 | Risk rating in Percentage condition | -0.03 [-0.12, 0.07] | .543 | -0.02 | 0.00 [-0.09, 0.10] | .919 | -0.03 |
3 | Bowl preference | 0.21 [0.14, 0.27] | < .001 | 0.22 | 0.20 [0.14, 0.27] | < .001 | 0.00 |
3 | Affect for Bowl A-9-100 | -0.19 [-0.25, -0.12] | < .001 | -0.17 | -0.20 [-0.26, -0.13] | < .001 | -0.18 |
3 | Affect precision for Bowl A-9-100 | 0.14 [0.07, 0.20] | < .001 | 0.14 | 0.17 [0.11, 0.24] | < .001 | 0.18 |
4 | No Loss condition: Attractiveness | -0.13 [-0.22, -0.04] | .006 | -0.07 | -0.07 [-0.16, 0.03] | .161 | -0.03 |
4 | No Loss condition: Affect | -0.16 [-0.25, -0.06] | .001 | -0.12 | -0.09 [-0.19, 0.00] | .053 | -0.06 |
4 | No Loss condition: Affect precision | 0.16 [0.07, 0.25] | < .001 | 0.11 | 0.15 [0.05, 0.24] | .002 | 0.13 |
4 | Loss condition: Attractiveness | 0.21 [0.11, 0.30] | < .001 | 0.23 | 0.21 [0.12, 0.30] | < .001 | 0.22 |
4 | Loss condition: Affect | 0.10 [0.01, 0.20] | .032 | 0.14 | 0.11 [0.02, 0.21] | .020 | 0.14 |
4 | Loss condition: Affect precision | 0.13 [0.04, 0.22] | .006 | 0.09 | 0.13 [0.03, 0.22] | .010 | 0.10 |
Note. CI = 95% confidence intervals.
Study | Comparison | Fisher’s z | p | Interpretation |
Original numeracy scale | | | | |
Study 1 | Numeracy and framing effect | -2.49 | .013 | signal |
Study 2 | Numeracy and frequency-percentage effect | -2.07 | .039 | signal |
Study 4 | Numeracy and attractiveness of bets | -5.03 | < .001 | signal |
Study 4 | Numeracy and affect | -3.82 | < .001 | signal |
Study 4 | Numeracy and affect precision | 0.45 | .654 | no-signal consistent |
Rasch-based numeracy scale | | | | |
Study 1 | Numeracy and framing effect | -2.79 | .005 | signal |
Study 2 | Numeracy and frequency-percentage effect | -2.51 | .012 | signal |
Study 4 | Numeracy and attractiveness of bets | -4.14 | < .001 | signal |
Study 4 | Numeracy and affect | -2.93 | .003 | signal |
Study 4 | Numeracy and affect precision | 0.30 | .766 | no-signal consistent |
Independent t-test | t | df | p | d and CI | Interpretation |
Original numeracy scale | |||||
Bowl Choice | 6.81 | 858 | < .001 | 0.55 [0.38, 0.72] | signal consistent |
Rasch-based numeracy scale | |||||
Bowl Choice | 6.59 | 858 | < .001 | 0.54 [0.37, 0.70] | / |
Note. CI = 95% confidence intervals. Independent t-test comparing the numeracy between Bowl A-9-100 and Bowl B-1-10.
Extension: Rasch-based numeracy scale
Extension: Confidence
We added an extension examining numeric confidence and tested the association between objective numeracy and numeric confidence with both the original numeracy scale and the Rasch-based numeracy scale. The findings were mixed. Using the original numeracy scale, we found support for an association only in the positive framing condition in Study 1 (r = -0.11, 95% CI [-0.20, -0.02], p = .021), in Study 3 (r = 0.15, 95% CI [0.08, 0.21], p < .001), and in the no-loss bet condition in Study 4 (r = 0.10, 95% CI [0.01, 0.20], p = .030). Analyses using the Rasch-based numeracy scale yielded similar results, detailed in Table 14.
Study | Condition | Original: r and CI | Original: p | Original: Spearman’s rho | Rasch: r and CI | Rasch: p | Rasch: Spearman’s rho |
1 | Positive framing condition | -0.11 [-0.20, -0.02] | .021 | -0.12 | -0.10 [-0.19, -0.01] | .038 | -0.07 |
1 | Negative framing condition | -0.03 [-0.13, 0.06] | .474 | -0.02 | -0.04 [-0.14, 0.05] | .376 | 0.00 |
2 | Frequency condition | -0.01 [-0.10, 0.09] | .868 | 0.02 | -0.01 [-0.10, 0.09] | .852 | 0.03 |
2 | Percentage condition | -0.01 [-0.11, 0.08] | .801 | -0.01 | 0.00 [-0.10, 0.09] | .952 | 0.03 |
3 | | 0.15 [0.08, 0.21] | < .001 | < .001 | 0.14 [0.08, 0.21] | < .001 | 0.19 |
4 | No loss condition | 0.10 [0.01, 0.20] | .030 | 0.08 | 0.11 [0.01, 0.20] | .027 | 0.10 |
4 | Loss condition | 0.05 [-0.05, 0.14] | .325 | 0.03 | 0.06 [-0.03, 0.16] | .196 | 0.08 |
Note. CI = 95% confidence intervals
Assumption checks and non-parametric tests
We used Levene’s test to check the homogeneity of variances and the Shapiro-Wilk test to check the normality of variables for the ANOVAs and independent t-tests. Homogeneity and normality were violated, primarily because of the strong negative skew of both the original numeracy scale and the Rasch-based numeracy scale. We first supplemented the analyses with Spearman correlations, provided in the tables alongside the Pearson correlations. We also conducted non-parametric tests: the Aligned Rank Transform (ART) (Kay et al., 2021) to supplement the mixed ANOVA (Study 1) and the factorial ANOVAs (Study 2 and Study 4), and the Mann-Whitney U test to supplement the independent t-test (Study 3). These robust tests yielded similar results and comparable conclusions except for Study 2, where the p-value shifted from just above the threshold to just below it (parametric: F(1, 856) = 3.40, p = .065, η2p = 0.00, 90% CI [0.00, 0.01]; non-parametric: F(1, 856) = 4.18, p = .04), which again illustrates the problem with over-relying on a p-value threshold as a dichotomous success/failure criterion.
We summarized the results of the robustness checks in Table S18 in the supplementary materials.
Exploratory analyses: Affect precision for Bowl B-1-10 and numeracy scale associations
We added extra questions on affect and affect precision for Bowl B-1-10 in Study 3. We found that participants reported more positive affect towards Bowl B-1-10 than towards Bowl A-9-100 (Bowl B-1-10: M = -0.22, SD = 1.50; Bowl A-9-100: M = -0.78, SD = 1.38; t(858) = 11.86, p < .001, d = 0.40, 95% CI [0.33, 0.47]). Participants also showed greater affect precision for Bowl B-1-10 than for Bowl A-9-100 (Bowl B-1-10: M = 4.65, SD = 1.47; Bowl A-9-100: M = 4.42, SD = 1.58; t(858) = 6.09, p < .001, d = 0.21, 95% CI [0.14, 0.28]).
We examined the association between the original numeracy scale and the extension Rasch-based numeracy scale and found that they were strongly correlated (r = 0.83, 95% CI [0.81, 0.85], p < .001).
As we failed to find support for Study 2 using dichotomized numeracy, we ran an additional analysis to examine possible order effects, with display order as a covariate, and found no support for the interaction (F(1, 855) = 3.29, p = .070, η2p = 0.00, 90% CI [0.00, 0.02]). In addition, we analyzed Study 2 only when it was the first study displayed to the participant, and also found no support for an interaction, with an even weaker effect than for the whole sample (n = 224; F(1, 220) = 0.00, p = .986, η2p = 0.00, 90% CI [0.00, 0.00]). This suggests that display order is not the reason for the failed replication using the dichotomous measure.
Comparing replication to original findings
Compared to the original findings (Tables S1, S2, and S3 in the supplementary materials; Peters et al., 2006, p. 6), our replication findings based on dichotomous numeracy suggest support for numeracy as a predictor of the framing effect (Study 1), ratio bias (Study 3), and bets effect (Study 4). When treating numeracy as a continuous variable, all four studies (including Study 2’s frequency-percentage effect) could be regarded as successful.
According to the criteria of LeBel et al. (2019) for evaluating replication results (see Figures S5 and S6), the replication effect sizes (i.e., Studies 1, 2, and 4) showed signals and were inconsistent with, and smaller than, those reported in the original. We note two minor discrepancies: (1) we found no support for numeracy as a predictor of the frequency-percentage effect in Study 2 using the dichotomization method applied in the original, and (2) the highly numerate felt more negative than the low numerate about the affectively appealing bowl with less favorable objective probabilities (i.e., Bowl A-9-100).
Overall, we conclude this to be a successful replication of the target article, yet with much weaker effects than those reported by the original, and better aligned results when improving on the original’s methods using continuous measures rather than dichotomizing.
Discussion
We conducted a pre-registered replication and extension of Peters et al. (2006) with a larger, well-powered, and more diverse sample. Our findings mirroring the original’s method of dichotomizing numeracy were mostly consistent with the original: (1) the highly numerate showed a weaker framing effect (Study 1), (2) the low numerate showed a stronger preference for suboptimal choices, along with more positive affect and lower affect precision about their choices (Study 3), and (3) the highly numerate showed a stronger bets effect (i.e., a larger difference in rated attractiveness of the bet between the no-loss and loss conditions) and drew more affect from the less objectively favorable choices (Study 4). The findings for numeracy and the frequency-percentage effect were weaker, though in the expected direction and narrowly missing our pre-set significance threshold (Study 2). Our additional extension analyses using the continuous numeracy measure successfully replicated the results of all four original studies. Therefore, we conclude that our replication was mostly successful, with findings in the expected direction, yet with weaker effects. Concerning our added extension examining confidence, our findings regarding the association between objective numeracy and confidence were mixed.
Replication
The goal of the project was to assess the replicability of the research presented by Peters et al. (2006) in support of the interaction effects between numeracy and four decision-making paradigms. We first demonstrated support for three of the four classic effects: we showed main effects for the framing effect, frequency-percentage effect, and bets effect, yet failed to show support for the ratio bias. That we were nonetheless able to find numeracy as a predictor of ratio bias suggests that the bias is sensitive to the population and that factors such as sample numeracy impacted the results. Our sample generally showed high numeracy, which may have resulted in weaker ratio bias effects. Also, our exploratory analysis of Bowl B-1-10 revealed that participants expressed more positive affect towards the optimal choice (Bowl B-1-10) than towards the non-optimal choice (Bowl A-9-100), as suggested by the dual process model. Bourdin and Vetschera (2018) discussed the situations in which ratio bias may occur and found that it happens more frequently for low probabilities. However, the ratios they manipulated were more complex and required extra mental calculations compared to our target’s paradigm (e.g., 1:9 vs. 9:90, and 1:9 vs. 8:91). It is possible that such ratios would show stronger effects, impacting the affective understanding of numbers. If that were the case, then those with lower numeracy would rely more on absolute numbers, which are more readily available. The scenario that we used might not have been challenging enough, and may therefore have been unable to serve as an affective hit to decision making.
That we failed to detect numeracy as a predictor of the frequency-percentage effect using the original’s method of dichotomizing was inconsistent with the target’s findings and other previous studies (e.g., Dickert et al., 2011; Hill & Brase, 2012), though it is reassuring that we found support for the effect using the more accurate continuous method. One likely explanation is that the dichotomization of numeracy leads to a loss of power, as we noted in our discussion of the target’s methodological weaknesses, and that, given the weaker effects, we would have needed a larger sample.
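The loss of information from dichotomization can be illustrated with a small simulation. The sketch below is not based on our data: it assumes a normally distributed predictor with a true correlation of 0.3 and a median split, all of which are hypothetical choices for illustration, whereas the numeracy scores in our sample were strongly negatively skewed. Under these assumptions, a median split is expected to attenuate the correlation to roughly 80% of its continuous value.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate(rho=0.3, n=20000, seed=1):
    """Return (r with continuous predictor, r with median-split predictor)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    # Outcome correlated with x at approximately rho (hypothetical value).
    y = [rho * xi + math.sqrt(1 - rho**2) * rng.gauss(0, 1) for xi in x]
    median = sorted(x)[n // 2]
    x_split = [1.0 if xi > median else 0.0 for xi in x]  # high vs. low group
    return pearson_r(x, y), pearson_r(x_split, y)

r_cont, r_split = simulate()
print(round(r_cont, 2), round(r_split, 2), round(r_split / r_cont, 2))
```

A smaller observed correlation translates into lower power at a fixed sample size, which is in line with the Study 2 interaction being supported with continuous numeracy but not with the dichotomized measure.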
Extensions
Analyses Using Continuous Numeracy and the Rasch-based Numeracy Scale
We successfully replicated the original findings when we treated numeracy as a continuous variable, including Study 2 and affect for Bowl A-9-100 in Study 3. The results based on the Rasch-based numeracy scale were consistent with those based on the original numeracy scale, which provides additional evidence for the robustness of our findings. In future studies of numeracy and decision-making, we strongly recommend conducting analyses with continuous measures. Given that the two numeracy scales showed comparable results, we consider either one or both to be good options.
Confidence
We ran extensions examining the relationship between objective numeracy and numeric confidence under specific conditions, and found three significant results with small effects. We take these mixed findings as an indicator that such a relationship is not consistent or robust. It is possible that our single-item question measuring confidence was insufficiently validated or not comprehensive enough to measure participants’ confidence in engaging with and processing numeric information. We recommend more work to construct and test well-validated questions. For instance, Peters et al. (2019) selected the first four items from the subjective numeracy scale (Fagerlin et al., 2007) to measure numeric confidence (e.g., “How good are you at calculating 15% tip?”). In addition, Peters and Shoots-Reinhard (2022) suggested in their latest paper that numeric confidence is associated with persistence in the choices being made and with emotional reactions to experienced difficulty. Future studies could adopt such measures of these variables. Another likely possibility is that numeracy and confidence are simply different constructs that capture different aspects of decision-making abilities and impact decision-making in different ways. Future studies can build on our data and initial investigation to examine the associations between confidence and decision-making biases and heuristics, and relate those to the literature on overconfidence, underconfidence, and the need for accuracy and calibration.
Limitations and Future Directions
As with all studies, several limitations should be addressed in future research. We initially set out to examine whether our participants were familiar with these very common decision-making paradigms and to use that as an exclusion criterion. When analyzing the results, we realized the problem with this approach, as the number of participants who indicated familiarity with the paradigms was much higher than we anticipated. Though we found support for the target’s findings regardless, it is possible that this resulted in the much weaker effects.
The first limitation is that our questions regarding familiarity (i.e., familiarity with the scales and scenarios) were too ambiguous. Reviewing the feedback given by participants, some of them were confused about the meaning of familiarity. For instance, several participants interpreted familiarity as understanding the questions, rather than our intended meaning of having seen the paradigms before, having experience with similar decisions in real life, or already knowing the right answer to the paradigm. This likely resulted in many participants being flagged for possible exclusion despite not knowing the paradigms, which meant a severe loss of power and difficulty in detecting effects in the post-exclusion analyses. However, when we compared the effects before and after exclusions, the effects were overall much stronger before exclusions, which may indicate that familiarity, at least as we measured it, does not necessarily weaken the effects. This is an empirical question that should be examined in future studies using large samples, and we therefore hesitate to recommend excluding all participants who indicate knowing the paradigm. Instead, familiarity or experience with the paradigm can be considered as a possible moderator.
The high rate of indicated familiarity might also have to do with our target sample of highly experienced MTurk workers, a point that was raised in our Stage 1 review process. A large proportion of individuals indicated familiarity with the numeracy scales (56.2% for the original numeracy scale and 63.4% for the Rasch-based numeracy scale), and given the popularity of these scales and MTurk workers’ experience with online studies, it is possible that they have indeed come across some of those scales before.
Moreover, the scores on both numeracy scales were not normally distributed. The non-normal distribution of the original numeracy scale has been discussed by Weller et al. (2013), who developed the Rasch-based numeracy scale to avoid such statistical violations. However, the Rasch-based scale appears to have been as easy as the original numeracy scale for our sample. Therefore, we conclude that future studies would need to plan for a much larger sample size and higher exclusion rates than we anticipated, and consider employing alternative numeracy scales or testing for sample naivete.
Another methodological issue we faced was that our response validation for certain questions was too strict and lacked sufficiently clear instructions. For instance, the answer to Question 3 in the original numeracy scale is 0.1%, and we allowed only decimal input. However, several participants did not enter 0.001 and were confused about whether they had given a correct answer. Such issues should be noted in future studies with Qualtrics or other online questionnaire platforms, so that researchers are mindful of all the likely ways in which participants may perceive the question or use the answer field.
Another obvious limitation of running studies online is the inability to completely prevent participants from using online shortcuts or calculators to answer numerical questions. We tried to address that as best we could with a warning and by asking participants to pledge not to do so, and we relied on participants’ built-in incentive to finish the survey quickly to get paid, hoping that they would prefer to answer intuitively and fast rather than take the lengthier and costlier route of looking up answers. Similar online studies may consider implementing extra measures, such as scripts that detect whether the participant has left the survey window, and taking response time into account, to address possible issues.
To conclude, we were able to find support for the target’s findings despite all those limitations, which is an indication of robust findings. Our weaker effects may well be attributed to some of these limitations, and it is possible, and likely, that a more tightly controlled study would yield larger effects. Future research can now use our materials to design stronger studies.
Competing Interests
The author(s) declared no potential conflicts of interests with respect to the authorship and/or publication of this article.
Funding
This project has been supported by the Teaching Development Grant from the University of Hong Kong awarded to Gilad Feldman.
Authorship Declaration
Minrui Zhu conducted the replication as part of his thesis in psychology.
Gilad Feldman guided and supervised each step in the project, (later: conducted the pre-registrations, ran data collection), and edited the manuscript for submission.
Important Links and Information
Citation of the target research article:
Peters, E., Västfjäll, D., Slovic, P., Mertz, C. K., Mazzocco, K., & Dickert, S. (2006). Numeracy and decision making. Psychological science, 17(5), 407-413. https://doi.org/10.1111/j.1467-9280.2006.01720.x
In-principle Acceptance and Open-review
Provided on: https://rr.peercommunityin.org/articles/rec?id=165
Data Accessibility Statement
Materials, data, and code are available on: https://osf.io/4hjck/.
Contributor Roles Taxonomy
The table below uses CRediT (Contributor Roles Taxonomy) to identify the contributions and roles of the contributors in the current replication effort. Please refer to https://www.casrai.org/credit.html for details and definitions of each of the roles listed below.
Role | Minrui Zhu | Gilad Feldman |
Conceptualization | X | X |
Pre-registration | X | |
Data curation | X | |
Formal analysis | X | |
Funding acquisition | X | |
Investigation | X | |
Pre-registration peer review / verification | X | |
Data analysis peer review / verification | X | |
Methodology | X | |
Project administration | X | |
Resources | X | |
Software | X | |
Supervision | X | |
Validation | X | |
Visualization | X | |
Writing-original draft | X | |
Writing-review and editing | X |