Women and girls consistently outperform men and boys in educational outcomes. A large body of qualitative work suggests that boys’ difficulties in integrating peer acceptance with being studious may explain this gender gap. Gender differences in personality have also been suggested as an explanation. We investigated whether empirically constructed “male-typicality” in personal values, personality traits, and cognitive performance profiles is related to poorer school performance. In a sample of Finnish 15-year-olds, three male-typicality variables were formed empirically via supervised machine learning techniques. These variables were used to predict 9th grade GPA derived from school records. “Male-typicality” based on personal values and cognitive skills was related to lower GPA, but male-typicality in personality traits was unrelated to GPA. The results tentatively suggest that a male-typical value profile may be problematic for academic effort. The results also suggest that a “girl-typical” cognitive skill profile emphasizing verbal skills may provide students with a small overall educational advantage.

In most industrialized countries, women and girls outperform men and boys in educational outcomes. Girls receive better grades than do boys on average (O’Dea et al., 2018; Voyer & Voyer, 2014), boys are more likely to repeat grades (OECD, 2015), men are more likely to leave early from education and training in young adulthood (Eurostat, 2020), and women are more likely to obtain a tertiary degree (Buchmann & DiPrete, 2006; England et al., 2020; Snyder & Dillow, 2011). Similarly gendered patterns of educational attainment are emerging or already in place in many developing countries as well (Grant & Behrman, 2010; Ullah & Ullah, 2019).

Explanations for gender differences in educational outcomes have been targets of intense research in sociology, psychology, and educational sciences. Whilst the educational gender gap is a complex issue likely to have multiple causes, explanations related to boys’ social environment may be particularly helpful in understanding the phenomenon (Mittleman, 2022). To elaborate, ethnographic and sociological studies have repeatedly shown that being a hardworking, conscientious student may undermine peer acceptance among boys (e.g. Coleman, 1961; Jackson, 2003; Jackson & Dempster, 2009; Morris, 2011; Pascoe, 2005).

Recently, advanced computational methods have made it possible to investigate explanations drawing on these qualitative findings from a novel angle, thus complementing the qualitative approaches (Ilmarinen et al., 2023; Mittleman, 2022; Yavorsky & Buchmann, 2019). In two recent studies using the gender-diagnosticity approach (Lippa & Connelly, 1990), Mittleman (2022) and Yavorsky and Buchmann (2019) showed that empirically constructed male-typicality, i.e. a summation variable based on a set of actual gender differences in the given sample of participants, is negatively related to school performance. The present research builds on and extends this work by investigating how male-typicality based on human values, personality traits, and performance in cognitive tasks is related to grade point average among Finnish 15-year-olds, with the aim of clarifying the nature of the “male-typicality” that may be disadvantageous to school performance.

The current research builds on recent advances of computational methods (Ilmarinen et al., 2023; Mittleman, 2022; Yavorsky & Buchmann, 2019) in the gender-diagnosticity approach (Lippa & Connelly, 1990). In this approach, a composite variable is empirically created using supervised machine learning techniques by predicting (binary) gender with items describing participants in a penalized logistic regression model and saving the participant-specific predicted continuous log-odds scores on gender (e.g. Ilmarinen et al., 2023). The resulting variable represents a bipolar dimension ranging from “male-typical” to “female-typical” that summarizes multiple actual gender differences present in that sample of individuals. Therefore, this approach opens the possibility of investigating whether summated empirical differences between men and women, or boys and girls, capture something that may be driving gender differences in important outcomes, such as academic performance.

In contrast to conventional logistic regression models where predictor variables are directly weighted, penalized logistic regression introduces a regularization mechanism that penalizes these weights. This regularization, implemented through techniques like Lasso, ridge regression, or Elastic Net, serves different purposes, ranging from promoting sparsity to addressing multicollinearity (McNeish, 2015). However, when applied within the gender-diagnosticity approach, the primary goal is not necessarily to interpret individual predictor coefficients that have gone through regularization. Instead, the focus lies on generating gender predictions based on these coefficients with robustness against overfitting, ensuring that the model’s performance extends well to unseen data (Yarkoni & Westfall, 2017). This is achieved through cross-validation, a procedure where the model performance is evaluated on independent data partitions. Commonly, k-fold cross-validation is applied, where k is the number of partitions (folds) across which performance is examined.
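For reference, the elastic net objective for logistic regression, written in the parametrization used by the glmnet package, is given below. The notation is introduced here for illustration and does not appear in the original text: y_i is binary gender, x_i the vector of predictors, N the number of participants, λ the overall penalty strength, and α the mixing parameter (α = 1 gives the Lasso, α = 0 gives ridge regression).

```latex
\min_{\beta_0,\,\beta}\;
-\frac{1}{N}\sum_{i=1}^{N}\Bigl[\,y_i\,(\beta_0 + x_i^{\top}\beta)
  - \log\bigl(1 + e^{\beta_0 + x_i^{\top}\beta}\bigr)\Bigr]
+ \lambda\Bigl[\tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^{2}
  + \alpha\,\lVert\beta\rVert_1\Bigr]
```

The first term is the average negative log-likelihood of the unpenalized model; the second term is the penalty whose strength λ is typically chosen by cross-validation.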

In k-fold cross-validation, the strength of the penalty is chosen so that prediction error in the held-out partitions is minimized. Under the selected penalty, a coefficient whose predictor consistently contributes to prediction across different data partitions remains largely unaffected by penalization. Conversely, coefficients of predictors that contribute little to out-of-sample prediction are shrunk toward zero or set precisely to zero, indicating that their predictive performance generalized poorly. This approach prevents the model from becoming overly tailored to the specific dataset at hand, thus enhancing its ability to generalize effectively. By employing penalized logistic regression rather than traditional non-penalized methods, better generalizability in regression coefficients can be achieved and, consequently, more reliable individual-level gender predictions, such as male-typicality scores, can be obtained.
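To make the shrinkage mechanism concrete, the following is a minimal sketch in pure Python, not the glmnet-based procedure used in this article. It fits an elastic net logistic regression by proximal gradient descent with a fixed, hand-picked penalty (no cross-validation) on simulated toy data. The function names, the toy data, and the values of lam, alpha, and step are all illustrative assumptions.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(x, t):
    """Proximal operator of the L1 penalty: shrink x toward zero by t."""
    return math.copysign(max(abs(x) - t, 0.0), x)

def fit_elastic_net_logit(X, y, lam, alpha=0.5, step=0.5, n_iter=300):
    """Proximal-gradient fit; lam and alpha mimic the glmnet-style parametrization."""
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals (predicted probability minus observed gender) at current weights
        resid = [sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, row))) - yi
                 for row, yi in zip(X, y)]
        b0 -= step * sum(resid) / n                      # intercept is not penalized
        for j in range(p):
            grad = sum(r * row[j] for r, row in zip(resid, X)) / n
            grad += lam * (1.0 - alpha) * b[j]           # ridge part of the penalty
            b[j] = soft_threshold(b[j] - step * grad,    # lasso part of the penalty
                                  step * lam * alpha)
    return b0, b

def log_odds(X, b0, b):
    """Participant-specific log-odds of being a boy (the male-typicality score)."""
    return [b0 + sum(bj * xj for bj, xj in zip(b, row)) for row in X]

# Simulated toy data (an assumption): item 1 differs by gender, item 2 is pure noise.
rng = random.Random(0)
gender = [i % 2 for i in range(400)]                     # 1 = boy, 0 = girl
X = [[(1.0 if g else -1.0) + rng.gauss(0, 1), rng.gauss(0, 1)] for g in gender]

b0, b = fit_elastic_net_logit(X, gender, lam=0.3, alpha=0.5)
scores = log_odds(X, b0, b)
```

In this toy setting the weight of the gender-differentiating item should stay positive while the weight of the noise item should be shrunk to zero, so the resulting log-odds scores depend only on the item that actually separates boys from girls.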

Among the first empirical applications of the computational gender-typicality approach, Mittleman (2022) and Yavorsky and Buchmann (2019) showed that empirically constructed male-typicality among adolescents was negatively related to school performance, measured by students’ GPA. These results were seminal in that they provided quantitative evidence for the idea that empirical male-typicality may produce an academic disadvantage. However, an important question remained open: what kind of male-typicality is responsible for this possible disadvantage? The items Mittleman (2022) and Yavorsky and Buchmann (2019) had at their disposal represented a diverse collection of self-reports on behaviors, preferences, and interests (e.g. “how often do you cry?” and “what is the job that you expect or plan to have at age 30?”). The items functioned well in generating the male-typicality scores but did not facilitate a clear interpretation of the resulting variable. Thus, the driving force behind the postulated male-typicality academic disadvantage remained unidentified. The relevant gender differences might be related to, for instance, beliefs, attitudes, motivation, values, or personality. Clarifying whether certain psychological domains are more important than others for the possible “male-typicality disadvantage” is an important step towards understanding educational gender gaps.

Given the (near-)endless number of psychological constructs potentially intertwining with behavior, “male-typicality vs. female-typicality” will mean many different things, depending on which theoretical framework and which variables one is working with. However, studies using theoretically justified variable sets as basis of empirical male-typicality would shed light on the phenomenon. The present study uses three variable sets with distinct theoretical and conceptual backgrounds: personal values, personality traits, and cognitive performance.

Personal values represent abstract, trans-situational goals that guide actions and evaluations through informing individuals on what is desirable and worthy (e.g. Sagiv et al., 2017). In the present study, we adopt Schwartz’ theory of basic values (e.g. Schwartz, 1992), in which values are considered to form a hierarchy with ten lower-order basic values (e.g. Achievement, Benevolence, Self-Direction) that can be further organized into two higher-order value dimensions: Self-Enhancement vs. Self-Transcendence, and Openness to Change vs. Conservation. Personal values are thought of as integral to identity and as providing individuals with abstract-level life goals (e.g. Sagiv et al., 2017).

Considering adolescent values that might translate into attitudes detrimental to schoolwork and that might be more prevalent among boys than girls, Power and Conformity values seem most relevant. A considerable body of qualitative work in ethnography, sociology, and masculinities research has established that hardworking, conscientious, and rule-following boys are often low in status, and are undermined and bullied, among their peers (e.g. Coleman, 1961; Jackson, 2003; Jackson & Dempster, 2009; Morris, 2011; Musto, 2019; Pascoe, 2005). In Schwartz’ theory, the Power value represents valuing social status and dominance over other people, and the Conformity value represents valuing adherence to social expectations and norms (e.g. Schwartz, 1992). In light of the above qualitative studies, both valuing social status and the need to conform to others’ (that is, peers’) expectations could be seen as leading boys to reject the role of the hardworking, rule-following student in adolescence. Being responsible and playing by the rules is typically associated with femininity among adolescents (e.g., Morris, 2011; Riegle-Crumb et al., 2018). This implies that Self-Transcendence values, which represent responsibility towards others over self-interest (Schwartz, 1992), could also be relevant. To summarize, within the framework of personal values, strong adherence to Power and Conformity values and low adherence to Self-Transcendence values could be relevant for boys’ educational disadvantage.

Personality traits, on the other hand, are understood as stable tendencies to behave, feel, and think in a certain way (e.g. McCrae & Costa, 2013). Personality traits and values are moderately correlated (Parks-Leduc et al., 2015) but are considered conceptually and empirically distinct. While values represent abstract goals and valuations, personality traits represent behavioral dispositions more directly associated with everyday behaviors.

From the perspective of a possible personality-based male-typicality disadvantage in education, the traits of Conscientiousness and Agreeableness could be of particular importance. Conscientiousness encompasses individual differences in dutifulness, responsibility, industriousness, punctuality, self-control and goal-directedness, which are attributes that are likely to predict engaging in schoolwork and doing it diligently. Agreeableness captures individual differences in cooperativeness, kindness, and honesty, which can be seen as important correlates of following school rules and getting along with teachers and peers. Indeed, previous studies have shown that Conscientiousness is substantially and positively related to academic performance (Mammadov, 2021; Poropat, 2009), and Agreeableness is also positively related to school performance, though mostly during elementary and middle school (Mammadov, 2021).

Several studies have found that girls score higher on Conscientiousness and Agreeableness in childhood and adolescence than boys (e.g. Laidra et al., 2017; Soto, 2016; Van den Akker et al., 2014). Furthermore, gender differences in self-regulation and self-discipline, which can be seen as parts of Conscientiousness, have been put forth as explanations for the educational gender gap (Duckworth & Seligman, 2006). Thus, within the framework of trait theory, boys’ lower Conscientiousness and Agreeableness could help understand boys’ academic disadvantage.

Based on prior research, the “male-typicality” hypothesized to be detrimental to school performance would most likely be visible in socio-emotional or behavioral variables such as values or personality traits. However, gendered cognitive performance patterns could also contribute to the gender gap. Although there are no gender differences in general cognitive ability (e.g. Colom et al., 2002; Hedges & Nowell, 1995), more fine-grained gender differences in cognitive profiles may exist (Hedges & Nowell, 1995; Hyde, 2005; Voyer et al., 2021). As some cognitive skills might better support performance within a typical school curriculum than others, gender differences in cognitive skill sets could also be related to the gender gap in school outcomes. Using a wide and diverse battery of cognitive test results as the basis of our third male-typicality variable, we explore this possibility.

To sum up, we will create three empirical male-typicality variables based on the three distinct sets of variables described above. With this design, we hope to provide information about the nature of the postulated “male-typicality disadvantage” (e.g. Mittleman, 2022) in the educational context. Value-based male-typicality will reflect overall male-typicality (vs. female-typicality) in personal values, personality trait-based male-typicality will reflect overall male-typicality in behavioral tendencies, and cognitive test-based male-typicality will reflect overall male-typicality in cognitive performance. Investigating whether all, none, or some of these three variables are related to school performance will provide novel information about the postulated “male-typicality disadvantage” in the academic domain.

The data we use were collected from Finnish 15-year-olds in 2015. Finland has a high level of accessibility to education (OECD, 2015) and high gender equality (World Economic Forum, 2021), but relatively large gender differences in favor of girls in school performance (OECD, 2015, 2018; Pöysä & Kupiainen, 2018). Thus, given that there are no gender-related structural obstacles to obtaining schooling, but a clear gender difference in schooling outcomes, Finland’s school system offers a good environment for testing the link between male-typicality and school performance.

Preregistration

The hypotheses and the analysis plan of this study were not preregistered. However, large parts of the analyses follow a preregistration made for another study (Ilmarinen et al., 2023). Variable selection, variable transformations, and the use of elastic net logistic regression for measuring gender atypicality were all conducted exactly in the manner that was specified in the preregistration that can be found at https://osf.io/vrzy2/. Nevertheless, the current research questions and hypotheses were not included in the preregistration. The authors declare no conflict of interest.

Data Accessibility Statement

In agreement with the Education Department of the city where the study was conducted, the data are stored on a private university network to which researchers can gain access only by application and no part of the data are allowed to be downloaded from that network to another location. Doing so would be a breach of contract. Thus, the data are not publicly available.

Participants and Procedure

Participants (total initial sample = 4074, 49.7% male) were in their last year of Finnish comprehensive school and lower secondary education (ninth grade). The mean age of the participants was 15.79 years (SD = 0.41). Participants were from 242 classrooms in 49 urban schools in Southern Finland. Academic achievement (GPA) and gender were available for all participants. Data on each variable domain were incomplete, with availability ranging from 63.6% (personality) to 88.5% (cognitive tests); thus, the effective sample size varies from 2537 to 3621. Missing values were treated in two different ways for domain-specific analyses: listwise deletion and imputation.

Personality

Personality was measured by having participants complete, in self-report format, the 30-item National Character Survey (NCS; Terracciano et al., 2005; for the approved Finnish translation, see Realo et al., 2009). This measure, designed to mimic the original 240-item NEO PI-R (Costa & McCrae, 1992), consists of 30 bipolar items, each of which measures a facet of the Five-Factor Model (FFM; Costa & McCrae, 1992). Cross-instrument correlations between the NCS personality factors and longer measures of the FFM personality factors tend to vary between .70 and .80 (Konstabel et al., 2012). Participants rated themselves on the 30 NCS items using a seven-point scale; ‘I am…’ was printed at the top of the questionnaire. For instance, the two poles of the Extraversion Warmth facet were ‘Friendly, warm, affectionate’ and ‘Cool, aloof’. Personality reports were available for 2565 participants. Reliabilities (omega hierarchical/omega total/alpha) were .65/.83/.78 for Neuroticism, .63/.80/.74 for Extraversion, .47/.66/.66 for Openness, .63/.75/.69 for Agreeableness, and .63/.80/.75 for Conscientiousness.

Values

Personal values were measured with the ten-item Short Schwartz’ Value Survey (SSVS; Lindeman & Verkasalo, 2005). The SSVS has been validated in a Finnish sample against two longer instruments, the 56-item Schwartz’ Value Survey (Schwartz, 1992) and the 40-item Portrait Value Questionnaire (Schwartz et al., 2001), showing correlations of .45 to .70 between corresponding value constructs (Lindeman & Verkasalo, 2005, Study 1). The SSVS also has good psychometric qualities and reproduces the circumplex value structure postulated in Schwartz’ value theory (Lindeman & Verkasalo, 2005). Participants were presented with the name of each of the ten basic values (Power, Achievement, Hedonism, Stimulation, Self-Direction, Universalism, Benevolence, Tradition, Conformity, and Security) identified by Schwartz’ theory of basic values (Schwartz, 1992) along with the corresponding items from the original Schwartz Value Survey (Schwartz, 1992). For instance, participants were asked to rate on a 9-point scale from 0 (opposed to my principles), 1 (not important), 4 (important), to 8 (of supreme importance), the importance of “Power, that is, social power, authority, wealth” and “Achievement, that is, success, capability, ambition, and influence on people and events” as life-guiding principles. Value reports were available for 2537 participants.

Cognitive performance

Nine tests of cognitive performance were administered. Altogether 3621 participants completed at least one test, and 2628 (72.6%) completed all tests. In terms of the general taxonomy of cognitive abilities (Schneider & McGrew, 2012), our tests assessed domains and sub-domains of fluid reasoning, reading and writing, short-term memory, and quantitative knowledge. The scores on the nine tests were used separately for computing an index of the male-typicality of cognitive ability because a previous study (Ilmarinen et al., 2023) showed that they clearly outperformed the summary index (g-factor) in predicting gender.

Invented mathematical concepts. In a modified version of the Creative-Quantitative test in Sternberg’s Triarchic Abilities Test (Sternberg et al., 2001), students (n = 3424) were presented with the novel concepts “lag” and “sev,” the definitions of which varied in a way that was conditioned on the involved numbers (whether the first number is greater than, equal to, or less than the second number). Participants answered ten items (e.g., “How much is 2 sev 3 lag 4?”), each of which had four multiple-choice alternatives. The sum score on the test (one point for each correct item) was used (M = 5.88, SD = 2.59). Scale reliability, indexed as item-response theory based reliability for dichotomous data (Cheng et al., 2012), was π(2) = .79, whereas alpha for nominal scales was α = .76.

Hidden arithmetic operators. In a task based on the quantitative-relational arithmetic operators task (Demetriou et al., 1996), ten items (e.g. “6 a 2 = 3, what is a?”) with multiple choice (+, –, ×, and ÷) were given. One point was given for each correct answer and the sum of correct answers was used (n = 3135, M = 3.66, SD = 2.04, α = .75, π(2) = .80).

Visual working memory. This ten-item task measured the capacity of the visuospatial sketchpad (e.g. Logie & Pearson, 1997). Participants were presented with ten grids of different sizes in which some of the squares were painted black. After being shown the grid for three seconds, participants were asked to reproduce it by coloring the correct squares in an empty grid. One point was given for each correctly reproduced grid and the sum of correct answers was used as the score for visual working memory (n = 3621, M = 5.55, SD = 2.42, α = .74, π(2) = .76).

Mental arithmetics. Participants listened to the teacher read aloud a mathematics problem (e.g. “Employee earned 360 euro and was paid 40 euro per day. How many days did the employee work?”) and responded on their answering sheet. The task comprised eight items adapted from the Mental Arithmetics task of the Wechsler Adult Intelligence Scale-Revised (Wechsler, 1981). One point was given for each correct answer and the sum of correct answers was used as the score for mental arithmetics (n = 3621, M = 4.73, SD = 2.47, α = .82, π(2) = .83).

Analogical reasoning. In each of eight tasks adapted from the geometric analogies test (Hosenfeld et al., 1997), participants were presented with an initial pair of geometric figures that were transformations of each other. Simultaneously, participants were to find a match for a third figure (from five options) using the same transformation as in the initial pair. One point was given for each correct answer, and the sum of correct answers was used as a score for analogical reasoning (n = 2906, M = 3.74, SD = 2.28, α = .73, π(2) = .75).

Reading comprehension: Multiple-choice. A narrative passage concerning a visit to a travel agency was followed by four multiple choice questions (four options, one of which was correct; Lehto et al., 2001). Participants were allowed to consult the passage when answering. One point was given for each correct answer, and the sum of correct answers was used as a measure of multiple-choice reading comprehension (n = 3262, M = 2.75, SD = 1.25, α = .63, π(2) = .65).

Reading comprehension: Macroprocessing. To test for multi-layer mental text representation and macroprocessing (i.e., distinguishing central themes from minor details; Lyytinen & Lehto, 1998), participants first read a passage about US cities in the 19th century (279 words and six paragraphs). After this, participants selected the two most important topic statements and six main themes out of 16 statements (the eight remaining were considered minor details). To give some examples, a topic statement was “The passage tells about the development of cities in the USA in the 1800s”, a main point was “Slums were problematic neighborhoods”, and a minor detail was “Garbage had been eaten by pigs in the street”. The student had access to the text when responding. One point was given for each correctly identified statement, and the sum of correct answers was used as a measure of text macroprocessing (n = 2871, M = 7.54, SD = 3.32, α = .70, π(2) = .69).

Verbal proportional reasoning. The missing premises task was adapted from the Ross test of Higher Cognitive Processes (Ross & Ross, 1979). The task consisted of eight items each presenting participants with one premise and the conclusion. Participants then selected a second premise, based on which the conclusion would be correct, from five alternatives. Only one of the alternatives was correct, and one point was given for each correct answer. The sum of correct answers was used as a measure of verbal proportional reasoning (n = 3522, M = 4.38, SD = 1.98, α = .65, π(2) = .67).

Scientific reasoning. A Piagetian formal operations task was used to assess the level of formal thinking (Hautamäki, 1989; Thuneberg et al., 2015). For instance, students were asked to consider F1 drivers, cars, tires, and racetracks (four variables, each with two given values from which to select: Räikkönen, Schumacher; Ferrari, McLaren; Michelin, Bridgestone; Monaco, Silverstone). In half of the items, students were given a set of values for the four variables (such as Räikkönen, Ferrari, Michelin, Monaco) and asked to construct another set that would clarify the role of a specified variable (say, tires). Students were to produce a set of values for all four variables in such a way that the focal variable could be studied in an unconfounded pair (see Strand-Cary & Klahr, 2008). In the other half of the items, the students were given a dual set (Räikkönen, McLaren, Michelin, Monaco vs. Räikkönen, Ferrari, Michelin, Monaco) and asked if this was a good test of, for example, the role of tires (in this case, the question is confounded for tires, unconfounded for the nonfocal variable car). The response options were “yes”, “I do not know”, and “no”, with “I do not know” always coded 0. The number of items was six, and one point was given for each correct answer. The sum of correct answers was used as a measure of scientific reasoning (n = 3413, M = 2.51, SD = 1.63, α = .67, π(2) = .77).

Between-classroom variation in continuous variables. As expected, classroom membership did not explain much variance in personality and values. Intraclass correlations (ICCs) equal to or larger than .05 were observed only for the Openness to Experience item ‘Unartistic, uninterested in art - Sensitive to art and beauty’ (ICC = .05) and Universalism values (ICC = .05). For cognitive tests and grades, substantial variation between classrooms was observed for all variables as well as for general cognitive ability and GPA (ICCs for these variables ranged between .11 and .33). In line with the preregistration for an earlier study (Ilmarinen et al., 2023), we did not transform variables for which the ICC was smaller than .05; when the ICC was .05 or larger, scores on the variable were centered around the classroom mean. The cutoff point of .05 was arbitrary and chosen based on a common rule of thumb in the field of multilevel modeling. For classrooms in which fewer than seven data points were available, centering was done around the grand mean.
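The centering rule described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: the ICC is computed here with the one-way ANOVA estimator, ICC(1), whereas the estimator actually used is not specified in the text, and the function names icc1 and center_if_clustered and the toy data are hypothetical.

```python
from statistics import mean

def icc1(groups):
    """One-way ANOVA ICC(1) for a dict mapping classroom -> list of scores."""
    sizes = [len(v) for v in groups.values()]
    n_total, k = sum(sizes), len(sizes)
    grand = sum(sum(v) for v in groups.values()) / n_total
    ss_between = sum(len(v) * (mean(v) - grand) ** 2 for v in groups.values())
    ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    # Adjusted average classroom size for unbalanced designs
    n0 = (n_total - sum(s * s for s in sizes) / n_total) / (k - 1)
    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)

def center_if_clustered(groups, icc_cutoff=0.05, min_n=7):
    """Leave scores untouched if ICC < cutoff; otherwise center them.

    Classrooms with fewer than min_n data points are centered around the
    grand mean, larger classrooms around their own classroom mean.
    """
    if icc1(groups) < icc_cutoff:
        return {g: list(v) for g, v in groups.items()}
    grand = mean(x for v in groups.values() for x in v)
    centered = {}
    for g, v in groups.items():
        reference = mean(v) if len(v) >= min_n else grand
        centered[g] = [x - reference for x in v]
    return centered

# Toy data (an assumption): two full classrooms that differ in level, one small one.
clustered = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
    "b": [11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0],
    "c": [8.0, 10.0],
}
centered = center_if_clustered(clustered)

# Two classrooms with identical score distributions: no between-classroom variance.
unclustered = {"a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
               "b": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]}
untouched = center_if_clustered(unclustered)
```

In the clustered toy example, classrooms "a" and "b" are centered around their own means, while the two-pupil classroom "c" is centered around the grand mean of 9; the unclustered example is returned unchanged.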

Academic achievement

The dependent variable used in this study, grade-point average (GPA), was calculated from archival data of grades on a total of 16 school subjects at the end of the school year (9th grade): native language, first foreign language, biology, physics, geography, history, chemistry, home economics, handicraft, ethical studies, visual arts, physical training, mathematics, music, health education, and social studies. Pupils are graded on a scale from 4 to 10. The total number of participants for whom all obligatory grades were available was 3991. GPA was calculated as the arithmetic mean across school subjects, excluding missing values (M = 8.18, SD = 0.96).

Statistical Analysis

We first computed male-typicality scores for each participant separately for each domain (personal values, personality, and cognitive ability) using regularized logistic regressions with elastic net penalty (McNeish, 2015) and back-transforming the obtained log-odds into probabilities. As noted earlier in this article, regularized regression is especially suitable when there are many highly intercorrelated potential predictor variables – the method offers parsimonious and precise models, which leads to better predictive performance in independent datasets.
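The back-transformation from log-odds to probabilities is the standard inverse-logit function. In the notation below, which is introduced here for illustration, x_ij is participant i's score on item j, p is the number of items in the domain, and the hatted β are the regularized coefficients.

```latex
\hat{\eta}_i = \hat{\beta}_0 + \sum_{j=1}^{p} \hat{\beta}_j\, x_{ij},
\qquad
\hat{p}_i = \frac{1}{1 + e^{-\hat{\eta}_i}}
```

For example, a log-odds score of 0 corresponds to a predicted probability of .50 of being a boy, and a log-odds score of 1 corresponds to a probability of approximately .73.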

The obtained male-typicality scores were a variant of the femininity-masculinity scores used in Ilmarinen et al. (2023). For the resulting variable, higher scores represented higher probability of being a boy, and lower scores represented higher probability of being a girl. These calculations were conducted with the multid (Ilmarinen, 2021) and glmnet packages (Friedman et al., 2010) in R (R Core Team, 2019). The scores were then correlated with GPA, similarly to Mittleman’s (2022) approach.

Two sets of analyses were run with somewhat different analytical pipelines. The first analysis was similar to Mittleman (2022): gender was predicted in the original sample, from which cases with missing values were excluded, using a regularized logistic regression with elastic net penalty. Predicted values (male-typicality) were calculated in this same sample, with 10-fold cross-validation within the sample used for regularizing the regression coefficients. These predicted values were then correlated with GPA in this same sample, separately among boys and girls.

An alternative strategy that more closely corresponds to that used in the preregistration for Ilmarinen et al. (2023) was conducted by first imputing the missing values and then splitting the data into training and testing sets (50/50). The log-odds coefficients were estimated in the training data, and the predicted values (male-typicality) were calculated in the independent testing data. The association between male-typicality and GPA was also calculated only in the testing data. This second procedure was repeated 1000 times so that variability in data imputation, data splits, and elastic net regression was included in the uncertainty of the estimates. In the regularized regression (training data), gender was regressed on each of the above-described domain- and bandwidth-specific variable sets by binomial link regression in which the regularization parameter was obtained using 10-fold cross-validation that sought to minimize cross-validated prediction error. Missing data were imputed with the predictive mean matching method in the mice package (Van Buuren & Groothuis-Oudshoorn, 2011).
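The logic of the repeated split-half procedure can be sketched as below. This is a simplified illustration, not the authors' pipeline: imputation is omitted, a hypothetical mean-difference scorer (fit_mean_difference) stands in for the cross-validated elastic net, the correlation is computed across all pupils in the testing half rather than by gender, and the toy data are simulated.

```python
import random
from statistics import mean, quantiles

def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def fit_mean_difference(X, gender):
    """Hypothetical stand-in for the elastic net: weight = boys' mean - girls' mean."""
    boys = [row for row, g in zip(X, gender) if g == 1]
    girls = [row for row, g in zip(X, gender) if g == 0]
    return [mean(r[j] for r in boys) - mean(r[j] for r in girls)
            for j in range(len(X[0]))]

def repeated_split_correlations(X, gender, gpa, fit, n_repeats=1000, seed=1):
    """Fit on a random half, score the other half, correlate scores with GPA."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    estimates = []
    for _ in range(n_repeats):
        rng.shuffle(idx)
        half = len(idx) // 2
        train, test = idx[:half], idx[half:]
        weights = fit([X[i] for i in train], [gender[i] for i in train])
        # Male-typicality scores are computed only for pupils not used in fitting
        scores = [sum(w * x for w, x in zip(weights, X[i])) for i in test]
        estimates.append(pearson(scores, [gpa[i] for i in test]))
    return estimates

# Simulated toy data (an assumption): item 1 is male-typical and lowers GPA.
rng = random.Random(0)
gender = [i % 2 for i in range(300)]                     # 1 = boy, 0 = girl
X = [[g + rng.gauss(0, 1), rng.gauss(0, 1)] for g in gender]
gpa = [8.0 - 0.5 * row[0] + rng.gauss(0, 0.5) for row in X]

estimates = repeated_split_correlations(X, gender, gpa, fit_mean_difference,
                                        n_repeats=200)
cuts = quantiles(estimates, n=40)                        # cut points at 2.5%, 5%, ...
summary = {"mean_r": mean(estimates), "lower": cuts[0], "upper": cuts[-1]}
```

The spread of the estimates across repetitions (here summarized by the 2.5th and 97.5th percentiles) reflects the uncertainty due to the random data splits; in the full procedure described above, imputation and elastic net fitting would add further variability.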

We primarily report results provided by the first method, because it is more straightforward, and report all the deviations that the alternative strategy produced. All the analysis scripts with related output are available at https://osf.io/vrzy2/.

Statistical Power

For the gender-specific correlation estimates (about half of each domain-specific dataset), the sample sizes (ranging between n = 1300 and n = 2000) were sufficient for detecting effects between sizes r = .06 and r = .08 (see https://osf.io/vrzy2/ for power calculations).
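As a rough plausibility check, the smallest detectable correlation for a given sample size can be approximated with the Fisher z transformation. The sketch below assumes a two-sided test with alpha = .05 and power = .80; these settings are assumptions that are not stated in the text, and the function name min_detectable_r is hypothetical. The authors' own power calculations are in the linked OSF repository.

```python
from math import sqrt, tanh
from statistics import NormalDist

def min_detectable_r(n, alpha=0.05, power=0.80):
    """Smallest correlation detectable with the given power (Fisher z approximation)."""
    z = NormalDist()
    # Sum of the critical value (two-sided test) and the quantile for the desired power
    z_total = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    # The standard error of Fisher-transformed r is 1 / sqrt(n - 3)
    return tanh(z_total / sqrt(n - 3))

r_large_sample = min_detectable_r(2000)
r_small_sample = min_detectable_r(1300)
```

Under these assumed settings, the approximation gives values that round to .06 for n = 2000 and .08 for n = 1300, in line with the range reported above.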

Means, standard deviations, and intercorrelations of the measures are presented in Tables 1-3 (descriptive statistics by gender can be found in the online supplement in Supplementary Tables S4-S9). The logistic regression coefficients from the penalized elastic net models are presented in Tables 4-6. As shown there, for values, Benevolence, Universalism, Power, and Tradition had the strongest contributions to the value-based male-typicality variable, with endorsement of Power and Tradition predicting the likelihood of being a boy positively and endorsement of Benevolence and Universalism predicting the likelihood of being a boy negatively (see Table 4). For personality traits and facets, the most important contributors were the Conscientiousness facet Dutifulness (C3), the Extraversion facet Positive Emotions (E6), the Openness facets Openness to Feelings (O3) and Openness to Values (O6), and the Neuroticism facet Anxiety (N1), all negatively predicting being a boy (see Table 5). For cognitive tests, mental arithmetics and multiple-choice reading comprehension were most relevant for the male-typicality variable, with higher scores on the former positively predicting being a boy, and higher scores on the latter negatively predicting being a boy (see Table 6). Means, standard deviations, and intercorrelations of the male-typicality variables and their correlations with GPA are presented in Supplementary Tables S1-S3.

Table 1.
Means, standard deviations, and correlations for personal values
Variable M SD 1 2 3 4 5 6 7 8 9
1. Power 3.33 2.29          
2. Achievement 5.58 1.89 .41         
3. Hedonism 5.88 1.85 .22 .34        
4. Stimulation 5.47 1.94 .18 .29 .41       
5. Self-direction 5.77 1.85 .07 .25 .32 .44      
6. Universalism -0.01 2.00 -.06 .05 .17 .21 .40     
7. Benevolence 6.42 1.71 -.14 .13 .26 .23 .36 .40    
8. Tradition 4.07 2.34 .11 .12 .08 .12 .12 .10 .24   
9. Conformity 5.51 1.89 .10 .24 .18 .15 .19 .17 .42 .49  
10. Security 6.41 1.68 .04 .21 .21 .11 .21 .26 .39 .29 .46 

Note. Univariate sample sizes range from 2,604 to 2,630. Bivariate sample sizes range from 2,559 to 2,601. At least one measure from n = 2,666. M and SD are used to represent mean and standard deviation, respectively. Universalism centered around classroom mean-levels.

Table 2.
Means, standard deviations, and correlations for personality facets
 M SD N1 N2 N3 N4 N5 N6 E1 E2 E3 E4 E5 E6 O1 O2 O3 O4 O5 O6 A1 A2 A3 A4 A5 A6 C1 C2 C3 C4 C5 
N1 3.09 1.50                              
N2 3.18 1.47 .44                             
N3 3.01 1.58 .50 .47                            
N4 3.01 1.47 .36 .42 .45                           
N5 3.40 1.61 .30 .28 .29 .16                          
N6 3.20 1.49 .40 .42 .32 .34 .30                         
E1 5.02 1.57 -.20 -.45 -.32 -.39 -.16 -.22                        
E2 4.94 1.72 -.26 -.25 -.38 -.50 -.10 -.17 .35                       
E3 4.61 1.48 -.24 -.27 -.31 -.52 -.10 -.30 .21 .39                      
E4 4.93 1.47 -.22 -.28 -.39 -.37 -.18 -.22 .28 .45 .30                     
E5 4.93 1.49 -.12 -.22 -.21 -.41 .01 -.21 .25 .32 .30 .35                    
E6 4.98 1.49 -.20 -.26 -.36 -.35 -.09 -.10 .33 .38 .20 .33 .24                   
O1 4.49 1.65 .02 -.14 -.03 -.16 .06 -.00 .25 .11 .10 .07 .23 .15                  
O2 -0.01 1.93 .08 .05 .03 .05 .02 .03 .04 .02 -.08 .01 .03 .07 .19                 
O3 4.99 1.55 .04 -.18 -.07 -.24 .03 -.08 .39 .21 .15 .14 .21 .20 .25 .20                
O4 4.23 1.60 -.09 -.06 -.11 -.12 -.09 -.07 .04 .25 .11 .30 .23 .20 .05 .11 .07               
O5 5.01 1.56 -.18 -.31 -.23 -.30 -.15 -.27 .27 .15 .23 .21 .31 .12 .18 .15 .27 .09              
O6 5.05 1.52 -.07 -.13 -.16 -.17 -.07 -.08 .20 .19 .05 .17 .12 .29 .12 .21 .22 .19 .21             
A1 4.24 1.53 -.13 -.30 -.17 -.24 -.12 -.14 .39 .18 .09 .14 .15 .20 .12 -.03 .23 .05 .11 .05            
A2 4.83 1.42 -.18 -.25 -.26 -.15 -.27 -.21 .35 .15 -.02 .17 .02 .19 .01 .15 .25 .05 .17 .20 .26           
A3 5.13 1.45 -.18 -.39 -.28 -.39 -.19 -.28 .57 .29 .15 .24 .25 .25 .16 .08 .45 .04 .34 .21 .33 .42          
A4 4.13 1.73 .00 -.05 -.04 .08 -.14 -.01 .08 -.00 -.17 -.03 -.14 .09 -.05 .12 .06 .11 -.05 .13 .08 .23 .15         
A5 4.58 1.35 -.02 -.17 -.01 -.03 -.13 -.15 .22 -.07 -.14 -.04 .04 .06 -.01 .03 .18 -.05 .13 .07 .16 .28 .35 .19        
A6 5.27 1.48 -.17 -.29 -.33 -.31 -.20 -.20 .49 .30 .09 .29 .23 .41 .14 .11 .38 .10 .26 .37 .26 .41 .51 .15 .24       
C1 5.09 1.48 -.31 -.48 -.37 -.46 -.23 -.38 .40 .30 .38 .34 .29 .17 .16 -.00 .21 .08 .41 .15 .20 .19 .37 -.11 .06 .27      
C2 4.65 1.61 -.24 -.25 -.28 -.20 -.32 -.22 .21 .13 .14 .23 .06 .07 -.05 .05 .11 .02 .22 .09 .11 .28 .26 .07 .10 .22 .32     
C3 5.54 1.44 -.23 -.41 -.31 -.46 -.23 -.34 .48 .25 .24 .24 .29 .24 .17 .04 .40 -.02 .39 .24 .24 .40 .61 .02 .26 .47 .47 .35    
C4 4.51 1.52 -.16 -.22 -.31 -.22 -.27 -.20 .19 .24 .21 .38 .13 .12 .03 .07 .15 .19 .27 .12 .07 .20 .18 -.02 -.01 .23 .37 .39 .28   
C5 4.63 1.45 -.22 -.29 -.25 -.28 -.25 -.40 .19 .15 .25 .25 .19 .04 .01 -.03 .15 .07 .32 .07 .08 .17 .24 -.07 .20 .14 .42 .28 .30 .36  
C6 4.56 1.52 -.16 -.17 -.18 -.10 -.31 -.20 .11 .03 .05 .11 -.04 .02 -.05 .05 .08 -.03 .18 .11 .04 .24 .22 .11 .14 .28 .22 .44 .28 .33 .19 

Note. Univariate sample sizes range from 2,474 to 2,559. Bivariate sample sizes range from 2,414 to 2,522. At least one measure from n = 2,592. M and SD are used to represent mean and standard deviation, respectively. N = Neuroticism. E = Extraversion. O = Openness to Experience. A = Agreeableness. C = Conscientiousness. O2 centered around classroom mean-level.

Table 3.
Means, standard deviations, and correlations for cognitive performance
Variable M SD 1 2 3 4 5 6 7 8
1. Invented math concepts -0.03 2.33
2. Hidden arithmetic operators -0.02 1.76 .41
3. Visual working memory -0.03 2.19 .33 .34       
4. Mental arithmetics -0.05 2.05 .29 .27 .29      
5. Analogical reasoning -0.02 2.03 .35 .41 .33 .20     
6. Reading comprehension multiple choice -0.01 1.11 .33 .48 .23 .16 .34    
7. Verbal proportional reasoning -0.02 1.76 .43 .35 .29 .27 .34 .35   
8. Reading comprehension macroprocessing -0.02 3.01 .24 .28 .18 .13 .32 .25 .29  
9. Scientific reasoning -0.01 1.46 .44 .35 .32 .30 .32 .33 .44 .28 

Note. Univariate sample sizes range from 2,860 to 3,606. Bivariate sample sizes range from 2,723 to 3,606. At least one measure from n = 3,606. M and SD are used to represent mean and standard deviation, respectively. All variables centered around classroom mean-levels.

Table 4.
Coefficient Weight Distributions across Training Set Permutations for Personal Values
 B SE 2.5% 97.5% OR Non-zero
Intercept 1.06 0.25 0.56 1.54  1000 
PO 0.16 0.02 0.12 0.20 1.18 1000 
AC 0.01 0.02 -0.02 0.06 1.01 663 
HE -0.01 0.02 -0.06 0.04 0.99 575 
ST -0.04 0.02 -0.09 0.00 0.96 930 
SD -0.01 0.02 -0.06 0.02 0.99 663 
UN -0.10 0.02 -0.15 -0.05 0.91 1000 
BE -0.28 0.03 -0.35 -0.22 0.75 1000 
TR 0.08 0.02 0.04 0.13 1.09 1000 
CO 0.04 0.03 0.00 0.10 1.05 891 
SE -0.04 0.03 -0.10 0.00 0.96 826 

Note: PO = Power. AC = Achievement. HE = Hedonism. ST = Stimulation. SD = Self-direction. UN = Universalism. BE = Benevolence. TR = Tradition. CO = Conformity. SE = Security. B = log odds of being male. OR = Odds Ratio of being male. Non-zero = Number of permutations in which the coefficient was not penalized to zero. Table copied and redistributed from Ilmarinen et al. (2023) under CC BY 4.0 license (https://creativecommons.org/licenses/by/4.0/). No changes, except for table numbering, were made.
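To make the meaning of these coefficients concrete, the following sketch turns the mean B values from Table 4 into a value-based male-typicality score for one hypothetical student whose value ratings equal the sample means in Table 1. It assumes that the score is the predicted log-odds of being a boy and that the predictors enter on the scale shown in Table 1; both are simplifying assumptions for illustration. Because the tabled coefficients are rounded, exponentiating them may differ from the tabled odds ratios in the last digit.

```python
import math

# Mean log-odds coefficients (B) and intercept as printed in Table 4.
INTERCEPT = 1.06
B = {"PO": 0.16, "AC": 0.01, "HE": -0.01, "ST": -0.04, "SD": -0.01,
     "UN": -0.10, "BE": -0.28, "TR": 0.08, "CO": 0.04, "SE": -0.04}


def male_typicality(profile):
    """Predicted log-odds of being a boy for one value profile."""
    return INTERCEPT + sum(B[k] * profile[k] for k in B)


def to_probability(log_odds):
    """Convert log-odds to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical student scoring exactly at the Table 1 sample means.
profile = {"PO": 3.33, "AC": 5.58, "HE": 5.88, "ST": 5.47, "SD": 5.77,
           "UN": -0.01, "BE": 6.42, "TR": 4.07, "CO": 5.51, "SE": 6.41}
score = male_typicality(profile)
probability = to_probability(score)

# One extra point on Power multiplies the odds of being a boy by exp(B_PO).
higher_power = dict(profile, PO=profile["PO"] + 1)
odds_ratio = math.exp(male_typicality(higher_power) - score)

print(round(score, 2), round(probability, 2), round(odds_ratio, 2))
```

A student at the sample means lands close to even odds, which is what one would expect from a roughly gender-balanced sample.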

Table 5.
Coefficient Weight Distributions across Training Set Permutations for Personality Facets
 B SE 2.5% 97.5% OR Non-zero
Intercept 7.44 0.82 5.86 9.01  1000 
N1: Anxiety -0.18 0.04 -0.25 -0.10 0.84 1000 
E1: Warmth -0.02 0.03 -0.08 0.00 0.98 606 
O1: Fantasy 0.01 0.02 -0.01 0.07 1.01 604 
A1: Trust 0.02 0.02 -0.01 0.07 1.02 622 
C1: Competence -0.04 0.04 -0.12 0.00 0.96 756 
N2: Angry hostility -0.06 0.04 -0.14 0.00 0.94 928 
E2: Gregariousness -0.03 0.03 -0.09 0.00 0.97 772 
O2: Aesthetics -0.12 0.02 -0.17 -0.07 0.89 1000 
A2: Straightforwardness -0.02 0.03 -0.09 0.00 0.98 677 
C2: Order -0.04 0.03 -0.10 0.00 0.96 841 
N3: Depression -0.13 0.04 -0.20 -0.05 0.88 1000 
E3: Assertiveness 0.05 0.04 0.00 0.13 1.06 905 
O3: Feelings -0.20 0.03 -0.26 -0.13 0.82 1000 
A3: Altruism -0.01 0.02 -0.06 0.04 0.99 377 
C3: Dutifulness -0.20 0.04 -0.29 -0.12 0.82 1000 
N4: Self-consciousness -0.05 0.04 -0.14 0.00 0.95 859 
E4: Activity -0.05 0.04 -0.13 0.00 0.95 893 
O4: Actions 0.00 0.02 -0.04 0.04 1.00 495 
A4: Compliance -0.01 0.02 -0.05 0.02 0.99 518 
C4: Achievement striving -0.02 0.02 -0.08 0.01 0.98 591 
N5: Impulsiveness -0.08 0.03 -0.14 -0.01 0.93 988 
E5: Excitement-seeking -0.05 0.03 -0.13 0.00 0.95 919 
O5: Ideas 0.00 0.02 -0.05 0.03 1.00 422 
A5: Modesty -0.11 0.04 -0.18 -0.03 0.90 998 
C5: Self-discipline 0.03 0.03 0.00 0.10 1.03 753 
N6: Vulnerability -0.09 0.04 -0.16 -0.01 0.92 985 
E6: Positive emotions -0.19 0.04 -0.27 -0.12 0.82 1000 
O6: Values -0.20 0.03 -0.26 -0.14 0.82 1000 
A6: Tender-mindedness -0.05 0.04 -0.13 0.00 0.95 896 
C6: Deliberation -0.02 0.03 -0.09 0.00 0.98 685 

Note: N = Neuroticism. E = Extraversion. O = Openness to Experience. A = Agreeableness. C = Conscientiousness. OR = Odds-ratio. Non-zero = Number of permutations in which the coefficient was not penalized to zero. B = log odds of being male. OR = Odds Ratio of being male. Table copied and redistributed from Ilmarinen et al. (2023) under CC BY 4.0 license (https://creativecommons.org/licenses/by/4.0/). No changes, except for table numbering, were made.

Table 6.
Coefficient Weight Distributions across Training Set Permutations for Cognitive Performance
 B SE 2.5% 97.5% OR Non-zero
Intercept -0.01 0.04 -0.08 0.06  1000 
Invented mathematical concepts -0.01 0.02 -0.04 0.02 0.99 709 
Hidden arithmetic operators -0.06 0.03 -0.12 -0.01 0.94 993 
Visual working memory 0.04 0.02 0.00 0.07 1.04 954 
Mental arithmetics 0.25 0.02 0.21 0.29 1.28 1000 
Analogical reasoning -0.05 0.02 -0.09 0.00 0.95 974 
Reading comprehension: Multiple-choice -0.24 0.04 -0.32 -0.17 0.78 1000 
Verbal proportional reasoning -0.08 0.02 -0.13 -0.04 0.92 1000 
Reading comprehension: Macroprocessing -0.03 0.01 -0.06 -0.01 0.97 994 
Scientific reasoning 0.03 0.03 -0.01 0.09 1.03 792 

Note: OR = Odds-ratio. Non-zero = Number of permutations in which the coefficient was not penalized to zero. B = log odds of being male. OR = Odds Ratio of being male. Table copied and redistributed from Ilmarinen et al. (2023) under CC BY 4.0 license (https://creativecommons.org/licenses/by/4.0/). No changes, except for table numbering, were made.

Next, we turned to our main analyses, predicting GPA from the male-typicality variables. These analyses were conducted separately for boys and girls because the male-typicality variable is confounded with gender differences (for confounded overall associations between male-typicality and GPA, see Supplementary Figure S1). Male-typicality in personality was not associated with GPA among boys, r(855) = .00, p = .888, 95% CI [-.07, .06], or girls, r(1106) = -.04, p = .202, 95% CI [-.10, .02]. However, male-typicality in both personal values and cognitive ability profiles was associated with GPA. Male-typical girls had lower GPA scores, r(1277) = -.15, p < .001, 95% CI [-.20, -.09] and r(1278) = -.07, p = .007, 95% CI [-.13, -.02], for personal values and cognitive test profiles, respectively. Male-typical boys also had lower GPA scores, r(1111) = -.13, p < .001, 95% CI [-.19, -.08] and r(1337) = -.07, p = .014, 95% CI [-.12, -.01], for personal values and cognitive test profiles, respectively. These associations are depicted in Figure 1 (value-based male-typicality and GPA), Figure 2 (cognitive performance-based male-typicality and GPA), and Figure 3 (personality-based male-typicality and GPA). All interpretations based on the above analyses were supported by the alternative set of analyses that employed the different analytical pipeline described above (missing data imputation and split into training and testing sets; see https://osf.io/vrzy2/).
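The confidence intervals reported above can be approximated from r and its degrees of freedom with the Fisher z transformation. The sketch below is a generic textbook calculation, not the authors' code, and the function name is hypothetical. Because the published correlations are rounded to two decimals, the computed bounds can differ from the published ones in the last digit.

```python
import math
from statistics import NormalDist


def correlation_ci(r, df, level=0.95):
    """Confidence interval for a Pearson r reported as r(df), via Fisher z."""
    n = df + 2  # r(df) is reported with df = n - 2
    z = math.atanh(r)
    half_width = NormalDist().inv_cdf(0.5 + level / 2) / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


# Value-based male-typicality and GPA among girls: r(1277) = -.15.
low, high = correlation_ci(-0.15, 1277)
print(round(low, 3), round(high, 3))
```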

Figure 1.
Associations between gender atypicality in values and academic achievement
Figure 2.
Associations between gender atypicality in cognitive performance and academic achievement
Figure 3.
Associations between gender atypicality in personality and academic achievement

We also compared the strengths of these correlations. For boys, the association with GPA was stronger for male-typicality in values than for male-typicality in personality, z = -3.40, p < .001. The difference between the associations with GPA for male-typicality in values and for male-typicality in cognitive performance was non-significant, z = -1.83, p = .067, and the latter association also did not differ significantly from the association between GPA and male-typicality in personality, z = -1.50, p = .134. For girls, male-typicality in values was more strongly associated with GPA than was male-typicality in personality, z = -2.95, p = .003, or male-typicality in cognitive performance, z = -2.20, p = .028, but male-typicality in the personality and cognitive performance domains showed no difference in associations with GPA, z = -.50, p = .615.
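The text does not state which test produced these z statistics. As one possible illustration, the sketch below implements the Meng, Rosenthal, and Rubin (1992) z-test for two dependent correlations that share one variable (here GPA). The correlation between the two male-typicality scores (r_x = .30) and the common sample size (n = 857) are hypothetical placeholders, so the output is not expected to reproduce the published values.

```python
import math
from statistics import NormalDist


def compare_dependent_correlations(r1, r2, r_x, n):
    """z-test for corr(y, x1) vs corr(y, x2) measured in the same sample.

    r_x is the correlation between the two predictors x1 and x2.
    """
    mean_r2 = (r1 ** 2 + r2 ** 2) / 2
    f = min((1 - r_x) / (2 * (1 - mean_r2)), 1.0)
    h = (1 - f * mean_r2) / (1 - mean_r2)
    z = (math.atanh(r1) - math.atanh(r2)) * math.sqrt((n - 3) / (2 * (1 - r_x) * h))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Boys: values (r = -.13) versus personality (r = .00); r_x and n are made up.
z, p = compare_dependent_correlations(r1=-0.13, r2=0.00, r_x=0.30, n=857)
z_equal, p_equal = compare_dependent_correlations(r1=-0.07, r2=-0.07, r_x=0.30, n=857)
print(round(z, 2), round(p, 4))
```

The more strongly the two male-typicality scores correlate with each other, the larger the z statistic for the same difference in correlations, which is why the predictor intercorrelation has to be supplied.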

We also explored possible non-linearity in the associations between the male-typicality measures and GPA by estimating quantile correlation coefficients (Choi & Shin, 2022) at the 10th and 90th score percentiles of these measures and comparing them with the above-reported correlations, which indicate the average linear association. The bootstrap 95% confidence intervals for the quantile correlation coefficients included the reported correlation coefficients (see Supplementary Tables S13 and S14). There was therefore no general support for the associations being driven by either tail of the continua, whether male-typical or female-typical, or lower or higher GPA. There was one exception, however: for girls, the quantile correlation coefficient estimated at the 90th percentiles of value male-typicality and GPA was weaker than the average association (r = -.15), ρ.90 = -.05, 95% CI [-.14, -.01]. Nevertheless, because this was a data exploration with multiple tests conducted for tail dependence, it should be interpreted only as an anecdotal, weak pattern to be examined more thoroughly in future studies. The dashed lines in Figures 1-3 are smoothed lines, obtained with a generalized additive modeling approach, that depict possible non-linearities as deviations from the linear trend lines.

The widespread and robust gender differences in educational and academic outcomes favoring girls and women (Buchmann et al., 2008; England et al., 2020; Mittleman, 2022; OECD, 2015, 2018) have been the subject of much interest and research in the social and educational sciences. One prominent explanation for the gender gap, emerging from ethnographic and sociological work (Jackson, 2003; Morris, 2011; Musto, 2019; Pascoe, 2005), refers to the difficulties boys often face in reconciling conscientious and studious behavior with peer acceptance (Epstein, 1998; Jackson, 2003; Mittleman, 2022; Pascoe, 2003).

In line with this explanation, we found in a sample of Finnish 15-year-olds that male-typicality based on values (a construct strongly entwined with personal identity, goals, and worldviews) was negatively related to GPA. Male-typicality in cognitive profiles, based on a variety of heterogeneous cognitive tasks, was also negatively, though very weakly, related to GPA. By contrast, male-typicality based on personality traits was unrelated to GPA, even though the personality-based male-typicality variable clearly separated girls and boys.

The results suggest that the framework provided by personal values may be helpful in understanding why male-typicality can be harmful to school performance. The values that contributed most strongly to the value-based male-typicality variable were Benevolence and Power, followed by Universalism and Tradition (see Table 4). Valuing Power and Tradition while placing low value on Benevolence and Universalism was most strongly diagnostic of being a boy in the present sample. Power values represent prioritizing social status and dominance over other people and resources, whereas Universalism and Benevolence represent altruistic goals and caring for other people and for nature (Schwartz, 1992). The Tradition value represents respecting and committing to the traditional and/or religious customs and ideas of one’s culture (Schwartz, 1992). In sum, the results suggest that the “male-typicality” that has been linked to an academic disadvantage (Mittleman, 2022; Yavorsky & Buchmann, 2019) can in part be understood in terms of gender differences in personal values: boys prioritize self-enhancement and conservatism over self-transcendence.

Male-typicality in personality was unrelated to GPA. This can be considered somewhat surprising, first, because there were gender differences in personality in the current sample, and second, because differences in behavioral tendencies between boys and girls, especially those related to Conscientiousness, have been cited as an important reason for the gender gap (e.g. Duckworth & Seligman, 2006). However, the present results suggest that even when gender differences in personality trait profiles are found, these differences are not necessarily related to GPA. It should be noted, though, that although the Conscientiousness facet Dutifulness was among the most important contributors to the personality-based male-typicality variable, so were several other facets, such as the Neuroticism facets Anxiety and Depression, the Openness facets Openness to Feelings and Openness to Values, and the Extraversion facet Positive Emotions (higher scores on these facets meant lower male-typicality, i.e. higher “girl-typicality”; see Table 5). Thus, Conscientiousness did not emerge as a particularly prominent personality trait in distinguishing between boys and girls. This could in part explain why the gender gap in GPA was unrelated to differences in girls’ and boys’ personality profiles.

The personality-related results are interesting because gender differences in behavioral tendencies, and schools favoring girl-typical behavioral tendencies, have been cited in the popular literature as a potential cause of the academic gender gap (Sommers, 2000; Tyre, 2008). Policymakers and gender researchers have criticized such accounts on empirical and theoretical grounds (King, 2000; Morris, 2011). The present results provide a novel angle on this discussion: a nuanced variable created by capitalizing on gender differences in personality was found to be unrelated to GPA. Thus, the present results suggest that gender differences in behavioral tendencies, conceptualized as personality traits, are not behind the gender gap, at least not in Finland (which has a quite large educational gender gap). Of course, behavior in specific situations is not strongly determined by personality traits, so gender differences in actual behavior may still play a role. However, in the present sample, personality traits did not.

Results regarding cognitive profiles mirrored those for values: having a male-typical cognitive performance profile predicted (very slightly) lower GPA. To our knowledge, this is the first time a gender-diagnosticity approach based on cognitive performance has been investigated in relation to school performance. The mental arithmetic task and the multiple-choice reading comprehension task had the most weight in the cognitive performance-based male-typicality variable (see Table 6). Boys scored better in the former and girls in the latter. The verbal proportional reasoning, hidden arithmetic operators, and analogical reasoning tasks also carried a moderate amount of weight; girls outperformed boys in these three tasks. Considering this, one explanation for the results could be that verbal skills carry a lot of weight in school performance in adolescence and help the student to do well in general. Previous research has suggested that girls’ better verbal skills can explain grade differences only to a small extent (Calvin et al., 2010; Spinath et al., 2014). Given that gender-based profile differences in cognitive abilities were moderate, and that their predictive power with regard to GPA was very small, the present results are in line with these previous results (Calvin et al., 2010). However, as most schoolwork in adolescence requires some level of reading and/or verbal reasoning skills, it seems plausible that a cognitive ability profile that emphasizes verbal skills (a “girl-typical” profile in the present study) may give girls a small edge.

Given the novelty of the gender-diagnosticity method and its disparity in comparison to existing measures and approaches, the utility of our male-typicality construct can be legitimately challenged: is it merely a statistical proxy rather than a substantive construct? In response, we argue, first, that given the ubiquity of the gender gap in education, there is a need to investigate gender differences in relation to educational outcomes from many perspectives, including the gender-diagnosticity approach. Second, the gender-diagnosticity method offers distinct advantages over traditional approaches to “genderedness” by prioritizing predictive performance across diverse datasets rather than relying solely on researcher-selected variables weighted according to the researcher’s perception of masculinity or femininity. Unlike methods based on predetermined gender stereotypes, which may inadvertently incorporate biased or stereotypical content into their weighting schemes, our approach aims to mitigate such influences. By adopting a bottom-up, empirical approach, our method ensures that variables are weighted based on their ability to reliably predict gender across different data partitions. The variables demonstrating consistent predictive power are included in male-typicality scores, while those that do not meet this criterion are excluded. This data-driven strategy not only reduces the risk of stereotype-driven bias but also allows for the incorporation of more nuanced or surprising content that may be overlooked by researcher-selected measures of femininity or masculinity.

Moreover, in contrast to more subjective methods, which are potentially undermined by cultural stereotypes and researchers’ biases, the gender-diagnosticity approach yields a measure that is keyed to a criterion, so its performance can be evaluated. That is, the researcher has an estimate, be it high or low, of how well the male-typicality measure performs in predicting known gender. This is not the case if the researcher constructs a measure by picking (or writing) items judged by someone to be relevant for masculinity and femininity (and weights them in some way). Even if such a subjectively constructed measure were subsequently validated with gender as a criterion, its performance could not surpass that of the measure offered by the gender-diagnosticity approach, with its use of robust cross-validation (given, of course, the same initial set of items).

Limitations

The first and foremost limitation of the present study was that it was fully exploratory; the hypotheses were not pre-registered or set in advance. This makes it imperative to view all results and conclusions as preliminary and subject to alternative interpretations. Second, our main finding was that differences in empirically constructed male-typicality based on personal values and cognitive profiles predict school performance. However, the associations were small and can only explain a small proportion of the gender gap in educational outcomes. Third, we did not investigate mechanisms that could help explain why value profiles or cognitive profiles are linked to grades. Therefore, the associations observed remain descriptive. Fourth, our results are fully correlational and we cannot draw causal conclusions about the relations between gender, values, personality, cognitive performance, and grades.

The use of a bipolar male-typicality vs. female-typicality variable may raise questions about construct validity, as it can be argued that male-typicality and female-typicality would be better conceptualized as separate, unipolar dimensions, as is often done in the literature. This is a valid criticism, and a bipolar dimension is likely to lose some of the nuance in male-typicality and female-typicality. However, a bipolar dimension is unavoidable when using an empirical approach to male-typicality, and we believe that the advantages of the empirical approach outweigh this limitation. Such advantages include avoiding stereotype influences in responding, making use of reliable population patterns, excluding irrelevant content, and using cross-validation (see also Revelle, 2024, with regard to the benefits of using predictive modeling to construct measures).

Finally, our study rationale was inspired by the qualitative findings according to which for boys it is often difficult to reconcile being hard-working and studious with peer acceptance (e.g. Musto, 2019; Pascoe, 2005; Plummer, 2001). However, adolescent boys’ peer groups are not a monolith, and there are likely plenty of different routes to peer acceptance depending on many factors related to the boys themselves and to the idiographic social environment they grow up and go to school in. Our study design lacked nuance in this sense.

Conclusions

A value profile typical for boys related to prioritizing power and dominance over other-focus and altruism was revealed in the present study, and endorsement of such a profile predicted lower 9th-grade GPA. As values have no inherent connection to cognitive ability or school performance, we believe that this type of value profile, more common among boys than girls, promotes the rejection of the role of a “good student” and, therefore, may undermine one’s academic performance, offering a possible, partial explanation for the gender gap in educational outcomes.

Substantial contributions to conception and design: VJI, JEL, SL

Contributed to acquisition of data: MPV

Contributed to analysis and interpretation of data: VJI

Drafted the article: SL

Revised the article: SL, VJI, JEL

Approved the submitted version for publication: SL, VJI, MPV, JEL

We have no known conflict of interest to disclose.

This research was supported by Academy of Finland, Grant numbers 338891 and 309537. Open access funded by Helsinki University Library.

References

Buchmann, C., & DiPrete, T. A. (2006). The growing female advantage in college completion: The role of family background and academic achievement. American Sociological Review, 71(4), 515–541. https://doi.org/10.1177/000312240607100401
Buchmann, C., DiPrete, T. A., & McDaniel, A. (2008). Gender inequalities in education. Annual Review of Sociology, 34, 319–337. https://doi.org/10.1146/annurev.soc.34.040507.134719
Calvin, C. M., Fernandes, C., Smith, P., Visscher, P. M., & Deary, I. J. (2010). Sex, Intelligence and Educational Achievement in a National Cohort of Over 175,000 11-year-old School Children in England. Intelligence, 38(4), 424–432. https://doi.org/10.1016/j.intell.2010.04.005
Cheng, Y., Yuan, K. H., & Liu, C. (2012). Comparison of reliability measures under factor analysis and item response theory. Educational and Psychological Measurement, 72(1), 52–67. https://doi.org/10.1177/0013164411407315
Choi, J. E., & Shin, D. W. (2022). Quantile correlation coefficient: a new tail dependence measure. Statistical Papers, 63(4), 1075–1104. https://doi.org/10.1007/s00362-021-01268-7
Coleman, J. S. (1961). The Adolescent Society. Free Press.
Colom, R., García, L. F., Juan-Espinosa, M., & Abad, F. J. (2002). Null sex differences in general intelligence: Evidence from the WAIS-III. The Spanish Journal of Psychology, 5(1), 29–35. https://doi.org/10.1017/S1138741600005801
Costa, P. T., & McCrae, R. R. (1992). Neo personality inventory-revised (NEO PI-R). Psychological Assessment Resources.
Duckworth, A. L., & Seligman, M. E. (2006). Self-discipline gives girls the edge: Gender in self-discipline, grades, and achievement test scores. Journal of Educational Psychology, 98(1), 198–208. https://doi.org/10.1037/0022-0663.98.1.198
England, P., Levine, A., & Mishel, E. (2020). Progress toward gender equality in the United States has slowed or stalled. Proceedings of the National Academy of Sciences, 117(13), 6990–6997. https://doi.org/10.1073/pnas.1918891117
Epstein, D. (1998). Real Boys Don’t Work: Underachievement, Masculinity, and the Harassment of Sissies. In D. Epstein, J. Elwood, V. Heym, & J. Maw (Eds.), Failing Boys? Issues in Gender and Achievement (pp. 96–108). Open University Press.
Eurostat. (2020). Early leavers from education and training by sex and labour status. http://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=edat_lfse_14&lang=en
Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1–22. https://doi.org/10.18637/jss.v033.i01
Grant, M. J., & Behrman, J. R. (2010). Gender gaps in educational attainment in less developed countries. Population and Development Review, 36(1), 71–89. https://doi.org/10.1111/j.1728-4457.2010.00318.x
Hautamäki, J. (1989). The application of a Rasch model on Piagetian measures of stages of thinking. In P. Adey, J. Bliss, J. Head, & M. Shayer (Eds.), Adolescent development and school science (pp. 342–349). Falmer Press.
Hedges, L. V., & Nowell, A. (1995). Sex differences in mental test scores, variability, and numbers of high-scoring individuals. Science, 269(5220), 41–45. https://doi.org/10.1126/science.7604277
Hosenfeld, B., van den Boom, D. C., & Resing, W. C. M. (1997). Constructing geometric analogies for the longitudinal testing of elementary school children. Journal of Educational Measurement, 34(4), 367–372. https://doi.org/10.1111/j.1745-3984.1997.tb00524.x
Hyde, J. S. (2005). The gender similarities hypothesis. American Psychologist, 60(6), 581–592. https://doi.org/10.1037/0003-066X.60.6.581
Ilmarinen, V. J. (2021). multid: Multivariate Difference between Two Groups. R package version 0.7.0. https://CRAN.R-project.org/package=multid
Ilmarinen, V. J., Vainikainen, M. P., & Lönnqvist, J. E. (2023). Is There a g-factor of genderedness? Using a Continuous Measure of Genderedness to Assess Sex Differences in Personality, Values, Cognitive Ability, School Grades, and Educational Track. European Journal of Personality. https://doi.org/10.1177/08902070221088155
Jackson, C. (2003). Motives for ‘laddishness’ at school: Fear of failure and fear of the ‘feminine.’ British Educational Research Journal, 29(4), 583–598. https://doi.org/10.1080/01411920301847
Jackson, C., & Dempster, S. (2009). ‘I sat back on my computer… with a bottle of whisky next to me’: Constructing ‘cool’ masculinity through ‘effortless’ achievement in secondary and higher education. Journal of Gender Studies, 18(4), 341–356. https://doi.org/10.1080/09589230903260019
King, J. E. (2000). Gender Equity in Higher Education: Are Male Students at a Disadvantage? American Council on Education.
Konstabel, K., Lönnqvist, J. E., Walkowitz, G., Konstabel, K., & Verkasalo, M. (2012). The ‘Short Five’ (S5): Measuring personality traits using comprehensive single items. European Journal of Personality, 26(1), 13–29. https://doi.org/10.1002/per.813
Laidra, K., De Fruyt, F., & Konstabel, K. (2017). Assessing childhood personality with the Estonian short version of the Hierarchical Personality Inventory for Children (HiPIC). Personality and Individual Differences, 112, 31–36. https://doi.org/10.1016/j.paid.2017.02.050
Lehto, J., Scheinin, P., Kupiainen, S., & Hautamäki, J. (2001). National survey of reading comprehension in Finland. Journal of Research in Reading, 24(1), 99–110. https://doi.org/10.1111/1467-9817.00135
Lindeman, M., & Verkasalo, M. (2005). Measuring values with the short Schwartz’s value survey. Journal of Personality Assessment, 85(2), 170–178. https://doi.org/10.1207/s15327752jpa8502_09
Lippa, R., & Connelly, S. (1990). Gender diagnosticity: A new Bayesian approach to gender-related individual differences. Journal of Personality and Social Psychology, 59(5), 1051–1065. https://doi.org/10.1037/0022-3514.59.5.1051
Logie, R. H., & Pearson, D. G. (1997). The inner eye and the inner scribe of visuo-spatial working memory: Evidence from developmental fractionation. European Journal of Cognitive Psychology, 9(3), 241–257. https://doi.org/10.1080/713752559
Lyytinen, S., & Lehto, J. E. (1998). Hierarchy rating as a measure of text macroprocessing: Relationship with working memory and school achievement. Educational Psychology, 18(2), 157–169. https://doi.org/10.1080/0144341980180202
Mammadov, S. (2021). The Development of the High Ability Child. Routledge.
McCrae, R. R., & Costa, P. T., Jr. (2013). Introduction to the empirical and theoretical status of the five-factor model of personality traits. In T. A. Widiger & P. T. Costa Jr. (Eds.), Personality disorders and the five-factor model of personality (pp. 15–27). American Psychological Association. https://doi.org/10.1037/13939-002
McNeish, D. M. (2015). Using lasso for predictor selection and to assuage overfitting: A method long overlooked in behavioral sciences. Multivariate Behavioral Research, 50(5), 471–484. https://doi.org/10.1080/00273171.2015.1036965
Mittleman, J. (2022). Intersecting the academic gender gap: The education of lesbian, gay, and bisexual America. American Sociological Review. https://doi.org/10.31235/osf.io/26a8d
Morris, E. W. (2011). Bridging the gap: ‘Doing gender’, ‘hegemonic masculinity’, and the educational troubles of boys. Sociology Compass, 5(1), 92–103. https://doi.org/10.1111/j.1751-9020.2010.00351.x
Musto, M. (2019). Brilliant or bad: The gendered social construction of exceptionalism in early adolescence. American Sociological Review, 84(3), 369–393. https://doi.org/10.1177/0003122419837567
O’Dea, R. E., Lagisz, M., Jennions, M. D., & Nakagawa, S. (2018). Gender differences in individual variation in academic grades fail to fit expected patterns for STEM. Nature Communications, 9(1), 1–8. https://doi.org/10.1038/s41467-018-06292-0
OECD. (2015). Education at a Glance 2015: OECD Indicators. OECD Publishing. https://doi.org/10.1787/eag-2015-en
OECD. (2018). Education at a Glance 2018: OECD Indicators. OECD Publishing. https://doi.org/10.1787/eag-2018-en
Parks-Leduc, L., Feldman, G., & Bardi, A. (2015). Personality traits and personal values: A meta-analysis. Personality and Social Psychology Review, 19(1), 3–29. https://doi.org/10.1177/1088868314538548
Pascoe, C. J. (2003). Multiple masculinities? Teenage boys talk about jocks and gender. American Behavioral Scientist, 46(10), 1423–1438. https://doi.org/10.1177/0002764203046010009
Pascoe, C. J. (2005). ‘Dude, you’re a fag’: Adolescent masculinity and the fag discourse. Sexualities, 8(3), 329–346. https://doi.org/10.1177/1363460705053337
Plummer, D. C. (2001). The quest for modern manhood: Masculine stereotypes, peer culture and the social significance of homophobia. Journal of Adolescence, 24(1), 15–23. https://doi.org/10.1006/jado.2000.0370
Poropat, A. E. (2009). A meta-analysis of the five-factor model of personality and academic performance. Psychological Bulletin, 135(2), 322–338. https://doi.org/10.1037/a0014996
Pöysä, S., & Kupiainen, S. (2018). Tytöt ja pojat koulussa. Miten selättää poikien heikko suoriutuminen peruskoulussa? [Girls and boys in school. How to overcome boys’ weak elementary school performance?]. Valtioneuvoston selvitys- ja tutkimustoiminnan julkaisusarja 36/2018. http://urn.fi/URN:ISBN:978-952-287-541-9
R Core Team. (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/
Realo, A., Allik, J., Lönnqvist, J. E., Verkasalo, M., Kwiatkowska, A., Kööts, L., … Renge, V. (2009). Mechanisms of the national character stereotype: How people in six neighbouring countries of Russia describe themselves and the typical Russian. European Journal of Personality, 23(3), 229–249. https://doi.org/10.1002/per.719
Revelle, W. (2024). The seductive beauty of latent variable models: Or why I don’t believe in the Easter Bunny. Personality and Individual Differences, 221, 112552. https://doi.org/10.1016/j.paid.2024.112552
Riegle-Crumb, C., Kyte, S. B., & Morton, K. (2018). Gender and racial/ethnic differences in educational outcomes: Examining patterns, explanations, and new directions for research. In Handbook of the Sociology of Education in the 21st Century (pp. 131–152). https://doi.org/10.1007/978-3-319-76694-2_6
Ross, J. D., & Ross, C. M. (1979). Ross test of higher cognitive processes. Academic Therapy.
Sagiv, L., Roccas, S., Cieciuch, J., & Schwartz, S. H. (2017). Personal values in human life. Nature Human Behaviour, 1(9), 630–639. https://doi.org/10.1038/s41562-017-0185-3
Schneider, W. J., & McGrew, K. S. (2012). The Cattell-Horn-Carroll model of intelligence. In D. P. Flanagan & P. L. Harrison (Eds.), Contemporary intellectual assessment: Theories, tests, and issues (pp. 99–144). The Guilford Press.
Schwartz, S. H. (1992). Universals in the content and structure of values: Theoretical advances and empirical tests in 20 countries. In M. Zanna (Ed.), Advances in experimental social psychology (Vol. 25, pp. 1–65). Academic Press. https://doi.org/10.1016/S0065-2601(08)60281-6
Schwartz, S. H., Melech, G., Lehmann, A., Burgess, S., Harris, M., & Owens, V. (2001). Extending the cross-cultural validity of the theory of basic human values with a different method of measurement. Journal of Cross-Cultural Psychology, 32, 519–542. https://doi.org/10.1177/0022022101032005001
Snyder, T. D., & Dillow, S. A. (2011). Digest of Education Statistics, 2010 (No. NCES 2011-015). National Center for Education Statistics. http://ies.ed.gov/pubsearch/pubsinfo.asp?pubid=2011015
Sommers, C. H. (2000). The war against boys: How misguided feminism is harming our young men. Simon and Schuster.
Soto, C. J. (2016). The little six personality dimensions from early childhood to early adulthood: Mean-level age and gender differences in parents’ reports. Journal of Personality, 84(4), 409–422. https://doi.org/10.1111/jopy.12168
Spinath, B., Eckert, C., & Steinmayr, R. (2014). Gender differences in school success: What are the roles of students’ intelligence, personality and motivation? Educational Research, 56(2), 230–243. https://doi.org/10.1080/00131881.2014.898917
Sternberg, R. J., Castejón, J. L., Prieto, M. D., Hautamäki, J., & Grigorenko, E. L. (2001). Confirmatory factor analysis of the Sternberg Triarchic Abilities Test in three international samples. European Journal of Psychological Assessment, 17(1), 1–16. https://doi.org/10.1027//1015-5759.17.1.1
Strand-Cary, M., & Klahr, D. (2008). Developing elementary science skills: Instructional effectiveness and path independence. Cognitive Development, 23(4), 488–511. https://doi.org/10.1016/j.cogdev.2008.09.005
Terracciano, A., Abdel-Khalek, A. M., Adam, N., Adamovova, L., Ahn, C., … Ahn, H. N. (2005). National character does not reflect mean personality trait levels in 49 cultures. Science, 310(5745), 96–100. https://doi.org/10.1126/science.1117199
Thuneberg, H., Hautamäki, J., & Hotulainen, R. (2015). Scientific reasoning, school achievement and gender: A multilevel study of between and within school effects in Finland. Scandinavian Journal of Educational Research, 59(3), 337–356. https://doi.org/10.1080/00313831.2014.904426
Tyre, P. (2008). The Trouble With Boys: A Surprising Report Card on Our Sons, Their Problems at School, and What Parents and Educators Must Do. Three Rivers Press.
Ullah, R., & Ullah, H. (2019). Boys versus Girls’ Educational Performance: Empirical evidence from Global North and Global South. African Educational Research Journal, 7(4), 163–167. https://doi.org/10.30918/AERJ.74.19.036
Van Buuren, S., & Groothuis-Oudshoorn, K. (2011). mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45, 1–67. https://doi.org/10.18637/jss.v045.i03
Van den Akker, A. L., Deković, M., Asscher, J., & Prinzie, P. (2014). Mean-level personality development across childhood and adolescence: a temporary defiance of the maturity principle and bidirectional associations with parenting. Journal of Personality and Social Psychology, 107(4), 736–750. https://doi.org/10.1037/a0037248
Voyer, D., Saint Aubin, J., Altman, K., & Gallant, G. (2021). Sex differences in verbal working memory: A systematic review and meta-analysis. Psychological Bulletin, 147(4), 352–398. https://doi.org/10.1037/bul0000320
Voyer, D., & Voyer, S. D. (2014). Gender differences in scholastic achievement: a meta-analysis. Psychological Bulletin, 140(4), 1174–1204. https://doi.org/10.1037/a0036620
Wechsler, D. (1981). WAIS-R manual: Wechsler adult intelligence scale-revised. Brace Jovanovich for Psychological Corp.
World Economic Forum. (2021). Global gender gap report. https://www.weforum.org/publications/global-gender-gap-report-2021/
Yarkoni, T., & Westfall, J. (2017). Choosing prediction over explanation in psychology: Lessons from machine learning. Perspectives on Psychological Science, 12(6), 1100–1122. https://doi.org/10.1177/1745691617693393
Yavorsky, J. E., & Buchmann, C. (2019). Gender Typicality and Academic Achievement among American High School Students. Sociological Science, 6, 661–683. https://doi.org/10.15195/v6.a25
This is an open access article distributed under the terms of the Creative Commons Attribution License (4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Supplementary data