It is a long-held belief in psychology and beyond that individuals’ music preferences reveal information about their personality traits. While initial evidence relates self-reported preferences for broad musical styles to the Big Five dimensions, little is known about day-to-day music listening behavior and the intrinsic attributes of melodies and lyrics that reflect these individual differences. The present study (N = 330) proposes a personality computing approach to fill these gaps with new insights from ecologically valid music listening records from smartphones. We quantified participants’ music preferences via audio and lyrics characteristics of their played songs through technical audio features from Spotify and textual attributes obtained via natural language processing. Using linear elastic net and non-linear random forest models, these behavioral variables served to predict Big Five personality on domain and facet levels. Out-of-sample prediction performances revealed that – on the domain level – Openness was most strongly related to music listening (r = .25), followed by Conscientiousness (r = .13), while several facets of the Big Five also showed small to medium effects. Hinting at the incremental value of audio and lyrics characteristics, both musical components were differentially informative for models predicting Openness and its facets, whereas lyrics preferences played the more important role for predictions of Conscientiousness dimensions. In doing so, the models’ most predictive variables displayed generally trait-congruent relationships between personality and music preferences. These findings contribute to the development of a cumulative theory on music listening in personality science and may be extended in numerous ways by future work leveraging the computational framework proposed here.

Music was my first love and it will be the last
Music of the future and music of the past
To live without my music would be impossible to do
In this world of troubles my music pulls me through
(Miles, 1976)

Most of us will agree with John Miles’ iconic song quote that music plays an important role in our lives. Indeed, we spend nearly one-fourth of our waking time listening to music (Billboard, 2019), and the digitalization of the music market is further increasing these numbers as online streaming services make music more pervasive than ever, with tens of millions of songs accessible anywhere and anytime by over 440 million paid subscribers (IFPI, 2021). This transformation in music consumption has turned streaming platforms and devices into digital sources of music listening data, creating an unprecedented opportunity to investigate natural music listening behavior “in the wild” (see I. Anderson et al., 2021). In doing so, digital listening records provide fine-grained data on various psychologically relevant behavioral outcomes such as music preferences or listening durations (Greenberg & Rentfrow, 2017). Music preferences, in particular, can be automatically represented in terms of the intrinsic properties of the songs played on an everyday basis using tools from computational music information retrieval (e.g., Fricke et al., 2018).

This new ecological validity and granularity in music listening assessment has the potential to push the boundaries of research in personality science, which has long been adopting the interactionist perspective that the music people listen to calibrates their external environments with their personalities and, hence, reflects their individual traits (Greenberg et al., 2020; Rentfrow et al., 2011). Parallel to other types of digital data, such as app usage (Stachl et al., 2017) or social media postings (Schwartz et al., 2013), music listening records can now be assessed for personality-relevant information via machine learning algorithms (see Phan & Rauthmann, 2021). The present study adopts this so-called personality computing approach to overcome methodological limitations of the past and model personality from various indicators of natural music listening on smartphones.

Personality researchers have been exploring the associations between music listening and the Big Five personality traits for the past two decades. These studies have mainly focused on individuals’ preferences for different styles of music, finding the most robust patterns for the personality dimension of Openness, which correlated positively with preferences for intense (e.g., Rock) and complex (e.g., Classical) musical styles (e.g., I. Anderson et al., 2021; Bonneville-Roussy et al., 2013; Greenberg et al., 2016; Langmeyer et al., 2012; Nave et al., 2018; Rentfrow & Gosling, 2003).

However, a meta-analysis of the correlation between musical style preferences and personality concluded that the effect sizes for Openness were rather small across studies (r = .12 for intense and r = .21 for complex music), while the remaining Big Five dimensions exhibited average correlations near zero (Schäfer & Mehlhorn, 2017). The studies included in this meta-analysis shared a limitation, though: they assessed music preferences via self-reported genre preferences (e.g., Bonneville-Roussy et al., 2013; Rentfrow & Gosling, 2003) or ratings of musical excerpts (e.g., Langmeyer et al., 2012), which may not accurately represent natural music listening behavior (Greenberg & Rentfrow, 2017). That is because self-reports may suffer from socially desirable responding (e.g., towards music favored by one’s peer group; cf. Tarrant et al., 2000) or biased memory recollection (Baumeister et al., 2007), while affective reactions to artificially manipulated or unreleased music excerpts may not reflect preferences displayed on the natural music market.

Only recently, I. Anderson et al. (2021) overcame this limitation by investigating music listening behavior exhibited on the streaming service Spotify. They predicted personality in a machine learning framework and achieved moderate to high performances for the Big Five dimensions, whose predicted and self-reported scores correlated in a range between .26 for Agreeableness and .37 for Emotional Stability. While these findings deviate from those of self-report-based studies with regard to the strength and rank order of associations, these discrepancies cannot be directly attributed to the ecologically valid assessment. That is because I. Anderson et al. (2021) included not only behavioral music preferences as personality predictors but also streaming behaviors (e.g., the streaming device or the number of artists followed) and participants’ demographics (i.e., age and gender). In particular, the demographic predictors, which are known to correlate with personality (see Soto et al., 2011) and improve personality predictions from music preferences (Nave et al., 2018), were among the most predictive variables across all Big Five dimensions except Openness (I. Anderson et al., 2021). Thus, the current state of literature does not allow for unambiguous conclusions about the personality-relevant information contained in natural music listening behavior.

While previous studies reported important insights into personality correlates in broad musical style or genre preferences, they rarely investigated music preferences on a more granular level, preventing inferences about the intrinsic musical properties underlying these personality associations (Aucouturier & Pachet, 2003; Rentfrow et al., 2011). Non-instrumental songs, in particular, are defined by audio and lyrics characteristics, which may play a distinct role in music preferences and their association with personality. While empirical findings suggest that melodies and lyrics are independently processed when listening to music (Besson et al., 1998; Bonnel et al., 2001) and that both components have a unique impact on the affective listening experience (Ali & Peynircioğlu, 2006; C. A. Anderson et al., 2003), they were never compared in a comprehensive analysis in personality psychology.

However, a few studies have separately related personality traits to preferences for either audio or lyrics characteristics. For audio characteristics, they found that Openness was correlated with preferences for music with a slow tempo, minor mode, acoustic sounds, and negative valence, while Extraversion was related to preferences for music with major mode, high tones, and positive valence (Dobrota & Reić Ercegovac, 2015; Flannery & Woolhouse, 2021; Fricke & Herzberg, 2017; Vuoskoski & Eerola, 2011). Regarding lyrics characteristics, a pioneering study by Qiu et al. (2019) connected the Big Five personality traits with linguistic style preferences in lyrics, reporting the strongest associations for Conscientiousness, which, for example, correlated positively with a preference for achievement words, and for Emotional Stability, which was related to a preference for positive emotion words in lyrics. These preliminary findings indicate that different aspects of music preferences may be of incremental value for personality prediction.

The sparsity of studies investigating intrinsic musical attributes may be ascribed to a lack of automated extraction tools as researchers had to rely on human labeling to quantify audio and lyrics characteristics (e.g., Dobrota & Reić Ercegovac, 2015; Rentfrow et al., 2011). This approach was not only burdensome and practically infeasible for large collections of songs in natural music listening records but also at risk of assessing music’s subjective experience rather than intrinsic musical properties. However, advances in music information retrieval now enable the automatic extraction of musical characteristics from audio recordings or song lyrics. In particular, technical audio characteristics, ranging from basic physical parameters (e.g., tempo) to more complex aggregated features (e.g., valence) learned via machine learning algorithms, can now be obtained in a ready-to-use format from external sources such as Spotify (I. Anderson et al., 2021; Stachl et al., 2020) or via music analysis software (e.g., ESSENTIA; Fricke et al., 2018). To obtain the textual lyrics characteristics, researchers can apply natural language processing (NLP), choosing between closed-vocabulary approaches, which count word usage in a text over pre-defined word categories (see Qiu et al., 2019), and open-vocabulary approaches, which analyze language in a bottom-up manner (e.g., by word clusters). While the closed-vocabulary approaches are often easier to interpret, they are restricted by the word coverage and subjectivity of the underlying dictionaries, which may be why open approaches have proven to be more informative of personality when investigating other sources of written text (e.g., G. Park et al., 2015; Schwartz et al., 2013). These automated approaches for extracting various musical characteristics open up new possibilities for comparing the contribution of melodies and lyrics when predicting personality from music preferences.

The present study applied a personality computing approach to efficiently collect, computationally represent, and jointly model different aspects of music listening behavior. For the ecologically valid assessment of music listening, we used smartphones, which are currently the most used device for music listening besides radios (IFPI, 2019) and provide granular digital listening records. We analyzed a smartphone sensing dataset of 330 participants collected over 3 to 85 study days and represented music preferences in terms of intrinsic musical attributes of the songs listened to. Here, we distinguished between preferences quantified via technical audio characteristics from Spotify.com and textual lyrics variables obtained through different natural language models. In addition, we considered habitual listening behaviors that quantified participants’ engagement with music (e.g., their listening duration). An extensive set of 844 strictly behavioral variables served to predict self-reported Big Five personality trait scores on the domain and facet levels. To counteract overfitting, we applied two machine learning algorithms suitable for high-dimensional data (i.e., data in which the number of predictors is larger than the number of observations) and evaluated prediction performance in a strict out-of-sample fashion. Finally, we used interpretable machine learning techniques to compare the independent contribution of audio- and lyrics-based preferences and explored which single music listening variables were most important in personality predictions.

We conducted a secondary data analysis based on three mobile sensing datasets summarized in Stachl et al. (2020). Since the datasets were previously published, we focus our report on procedures and decisions relevant to the present study. Additional details on the study procedures are available in the original articles (Schoedel et al., 2019; Schuwerk et al., 2019; Stachl et al., 2017).

This study’s design and analyses are purely exploratory and were not pre-registered. However, preliminary (and also exploratory) groundwork provided in a student thesis was preregistered under https://osf.io/as3ze. While this preregistration does not directly pertain to the current study, we still communicate deviations in our Disclosure of Prior Data Uses available in our project’s online repository under https://osf.io/x7dar/. In this repository, we also provide the code for preprocessing, variable extraction, and predictive modeling, as well as a dataset of aggregated variables used for predictive modeling. However, please understand that the privacy-sensitive nature of the smartphone usage data prevents us from sharing the raw logging data.

Dataset

In the present study, we re-analyzed data from three separate studies conducted within the PhoneStudy project at LMU Munich between 2014 and 2018 (Schoedel et al., 2019; Schuwerk et al., 2019; Stachl et al., 2017). In the supplemental Table S1, we provide an overview of the included datasets. The procedures of all three studies were approved by institutional review boards and carried out according to EU laws and ethical standards. All subjects participated willingly and gave informed consent prior to their participation.

In all three studies, participants completed a series of self-report questionnaires, including the personality inventory used here. Furthermore, they installed an Android research application on their private smartphones, which logged a variety of smartphone usage behaviors, including music listening, for a period of at least 14 study days. A detailed description of the individual study procedures and all collected measures is available in the respective research articles and in Stachl et al. (2020).

The initial sample was determined by the availability of secondary data and contained logging and self-report data from 684 participants. During pre-processing, we removed participants who had played fewer than five different songs with available lyrics characteristics (see our section on Song-Level Variables), resulting in a sample size of 330 participants (54% women) with sufficient music listening data. We additionally assessed the response validity of our self-report measure but refrained from removing participants based on inconclusive evidence of careless responding (see Text S1; Curran, 2016; Ward & Meade, 2023). Our final sample was skewed towards younger age (M = 22.42, SD = 4.33, Min = 18, Max = 57) and better education (93% with A-levels and 20% with a university degree).

Personality Measure

All three studies used the German Big Five Structure Inventory (BFSI; Arendasy, 2009) to assess personality based on the well-established Big Five taxonomy: Openness, Conscientiousness, Extraversion, Agreeableness, and Emotional Stability (McCrae & John, 1992). The BFSI consists of 300 items (adjectives and short phrases) and measures the Big Five personality dimensions on five broad domains and 30 more specific facets. Item agreement is rated on a 4-point Likert scale with the response options “untypical for me,” “rather untypical for me,” “rather typical for me,” and “typical for me.” The BFSI corresponds to the partial credit model (Masters, 1982), which defines an individual’s observed item response as a function of their latent trait value (i.e., their person parameter) and the item’s latent difficulty thresholds. Correspondingly, we used the person parameters assigned to participants based on their item sum scores as personality estimates in our analyses. Confidence intervals of internal consistencies obtained in our sample are available in Table S2 in the supplemental material.

Behavioral Music Listening Measures

An Android-based research application provided raw sensing data on participants’ natural smartphone usage, including their music listening records. Whenever participants had listened to locally stored or streamed music, the app created time-stamped event logs with the title, artist, and album name of the played song.

Song-Level Variables

To describe the played songs in terms of musical attributes, we enriched the music event logs with audio and lyrics characteristics. To this end, we retrieved additional song-level data from two external sources. We visualize the data enrichment workflow with exemplary songs in Figure 1 and provide further details in the supplemental material (see Text S2).

Figure 1.
Workflow for Enriching Smartphone-Sensed Music Listening Records

Note. Sensed music listening data were enriched with different song-level information. The exemplary songs in the tables demonstrate the face validity of the different audio and lyrics variables. Details on the APIs can be found on the respective websites Spotify.com and Genius.com. The enriched musical attributes are defined in Table 1. API = application programming interface; NLP = Natural Language Processing. This Figure is available at https://osf.io/x7dar/, under a CC-BY4.0 license.

First, we used Spotify’s Track API to retrieve 12 song-level variables provided by Spotify.com (see Table 1; Spotify, 2022). These variables contained 11 computationally derived technical audio characteristics (e.g., the songs’ “tempo” and “acousticness”) based on the songs’ audio recordings and one lyrics-based variable indicating the presence of explicit lyrical contents (i.e., strong language or references to sexual or violent behavior).
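As a rough illustration of what such a retrieval yields, the following sketch collects the 11 audio characteristics and the explicit flag from payloads that have already been downloaded. The field names follow Spotify's publicly documented audio-features and track objects. The helper name `extract_song_variables`, the split into two payloads, and all example values are assumptions made for illustration and are not taken from the study's code.

```python
# Field names of the 11 audio characteristics, assumed to match Table 1.
AUDIO_FIELDS = (
    "mode", "key", "tempo", "loudness", "energy", "danceability",
    "acousticness", "valence", "speechiness", "instrumentalness", "liveness",
)


def extract_song_variables(audio_features: dict, track: dict) -> dict:
    """Combine audio features and the explicit flag into one song-level record.

    Missing fields are stored as None so that incompletely enriched songs
    can be filtered out later instead of raising an error here.
    """
    record = {name: audio_features.get(name) for name in AUDIO_FIELDS}
    record["explicit"] = track.get("explicit")
    return record


# Made-up payloads standing in for two API responses about the same song.
example_audio = {
    "mode": 1, "key": 9, "tempo": 120.0, "loudness": -7.5, "energy": 0.62,
    "danceability": 0.48, "acousticness": 0.31, "valence": 0.55,
    "speechiness": 0.04, "instrumentalness": 0.0, "liveness": 0.12,
    "duration_ms": 210000,  # fields outside AUDIO_FIELDS are ignored
}
example_track = {"name": "Example Song", "explicit": False}

song = extract_song_variables(example_audio, example_track)
print(len(song))  # number of song-level variables in the record
```

Storing absent fields as `None` rather than dropping them keeps every record the same shape, which simplifies the later aggregation across songs.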

Table 1.
Description of Variables Representing Musical Attributes on the Song-Level

| Song-level Variable | Data Source | Description |
| --- | --- | --- |
| Audio Characteristics | | |
| Mode | Spotify API | The song’s modality (major vs. minor), i.e., the type of scale the song’s melodic content is derived from. |
| Key | Spotify API | The song’s melodic key in standard pitch class notation (e.g., 0 = C, 1 = C♯/D♭, 2 = D). |
| Tempo | Spotify API | The song’s overall estimated tempo in beats per minute (BpM). |
| Loudness | Spotify API | The song’s average loudness in decibels (dB). |
| Energy | Spotify API | The song’s perceived intensity and activity on a scale from 0.0 to 1.0, whereby songs with a high energy feel fast, loud, and noisy. The measure is defined by several elements such as perceived loudness. |
| Danceability | Spotify API | The song’s suitability for dancing on a scale from 0.0 to 1.0, whereby higher values represent more danceable songs. The measure combines several musical elements including tempo and overall regularity. |
| Acousticness | Spotify API | The song’s acousticness (i.e., absence of electronic sounds) on a scale from 0.0 to 1.0, whereby higher values represent an increased confidence that the song is acoustic. |
| Valence | Spotify API | The musical positiveness conveyed by the song, whereby songs with valence closer to 1.0 sound more positive (e.g., happy) and songs with values closer to 0.0 sound more negative (e.g., sad). |
| Speechiness | Spotify API | The probability of spoken words in a song, whereby values below 0.33 most likely represent pure music, while higher values represent songs containing both music and speech (e.g., rap music). |
| Instrumentalness | Spotify API | The probability that a song contains no vocals, whereby values closer to 1.0 represent a greater likelihood that the song contains no vocal content. Values above 0.5 most likely represent instrumental songs. |
| Liveness | Spotify API | The probability of an audience in the song’s recording, whereby values closer to 1.0 represent a greater likelihood that the song was performed live. Values above 0.8 most likely represent live songs. |
| Lyrics Characteristics | | |
| Length | Genius API + lyrics NLP | The number of words of the song’s lyrics. |
| Language | Genius API + lyrics NLP | The song’s language (i.e., English vs. German vs. Other) derived via language detection from the lyrics. |
| Explicit content | Spotify API | The presence of explicit words (e.g., swear words) in the song’s lyrics. |
| 10 Emotionality scores | Genius API + lyrics NLP | The probability by which the song’s lyrics contain words from ten emotion categories of the NRC Emotion Lexicon (e.g., Positivity, Negativity, Sadness, Anger, Joy, Trust). |
| 30 Topics | Genius API + lyrics NLP | The probability by which the song’s lyrics belong to each of 30 lyrical topics derived via Latent Dirichlet Allocation. |
| 768 Word embeddings | Genius API + lyrics NLP | The song’s value on each of the 768 dimensions in the lyrics embedding space of the BERT-model. |

Note. Spotify variable descriptions were derived from Spotify.com. API = application programming interface; API calls retrieved ready-to-use variables from Spotify.com and raw song lyrics from Genius.com. NLP = natural language processing; NLP extracted variables from the song lyrics.

In addition, we retrieved song lyrics from Genius.com and created meaningful textual variables via a text-mining pipeline combining closed and open vocabulary approaches (see Table 1; Genius, 2022). We describe all lyrics analyses at an abstract level here and provide further details in the supplemental material (see Text S3). We extracted two stylistic variables representing the lyrics’ length and language and applied three natural language models to quantify the content characteristics of the lyrics. First, we detected the emotional content of the lyrics using the NRC Word-Emotion Association Lexicon (Mohammad & Turney, 2013). Based on word occurrences, the NRC lexicon assigned each song a score on two sentiments (positive and negative valence) and eight emotion categories (anger, anticipation, disgust, fear, joy, sadness, surprise, and trust). Second, we applied Latent Dirichlet Allocation (LDA; Blei et al., 2003) to obtain the topics covered in the song lyrics. This generative probabilistic model assumes that each document (in our case, song lyrics) in a corpus contains a mixture of latent topics, where each topic is a cluster of co-occurring words. To avoid overfitting the LDA to sample-specific patterns in our lyrics corpus, we pre-trained the model on a large external lyrics corpus (the Million Song Dataset; Bertin-Mahieux et al., 2011). We determined the topic count such that the topic coherence (i.e., the semantic similarity between words within a topic; Chang et al., 2009) was maximized, which resulted in a model with 30 topics. This pre-trained topic model assigned each song in our corpus a score on each of the 30 topics. We provide details on the topic modeling, including coherence metrics (see Table S3) and topic keywords (see Table S4), in the supplemental material. Finally, we represented the lyrics as word embeddings using the state-of-the-art Bidirectional Encoder Representations from Transformers (BERT; Devlin et al., 2018). BERT embeddings use a neural network architecture to convert textual data into context-sensitive numerical representations. We employed the pre-trained BERT implementation from the HuggingFace framework (Wolf et al., 2020) to extract one embedding vector for each song’s full lyrics. This BERT vector had a length of 768, so each song was assigned a score on 768 embedding dimensions. Again, more details on the BERT modeling are available in the supplemental material (see Text S3).
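To make the closed-vocabulary step more concrete, the sketch below scores lyrics against a word-to-category lexicon. It is a minimal illustration only: the tiny `TOY_LEXICON` is invented and does not reproduce entries of the NRC lexicon, and normalizing category counts by the total number of tokens is an assumption about how "scores based on word occurrences" could be computed (the exact scoring is described in the supplemental Text S3).

```python
import re
from collections import Counter

# Invented toy lexicon for illustration; these are NOT actual NRC entries.
TOY_LEXICON = {
    "love": {"joy", "positive"},
    "happy": {"joy", "positive"},
    "cry": {"sadness", "negative"},
    "alone": {"sadness", "negative"},
    "fight": {"anger", "negative"},
}


def emotion_scores(lyrics: str, lexicon: dict) -> dict:
    """Return, per category, the share of lyric tokens linked to that category."""
    tokens = re.findall(r"[a-z']+", lyrics.lower())
    if not tokens:
        return {}
    counts = Counter()
    for token in tokens:
        # A word can belong to several categories, so it may count more than once.
        for category in lexicon.get(token, ()):
            counts[category] += 1
    return {category: n / len(tokens) for category, n in counts.items()}


scores = emotion_scores("I cry alone but love will fight", TOY_LEXICON)
for category in sorted(scores):
    print(category, round(scores[category], 3))
```

Because the score depends entirely on which words the dictionary covers, this kind of measure inherits the coverage limits of closed-vocabulary approaches noted earlier, which is one motivation for complementing it with topics and embeddings.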

In total, we computed 822 variables quantifying different intrinsic musical characteristics of the songs played in our study (see Table 1). These song-level variables were assigned to the respective music events in the logging data. Figure 1 illustrates this matching and provides examples of the face validity of song-level variables. However, not all music listening events could be enriched because some contained non-musical tracks (e.g., audiobooks), had incorrect song information (e.g., typos in the song title), or were not covered by the respective online sources.

Person-Level Variables

In the next preprocessing step, we used the song-level enriched music event logs to extract person-level variables capturing music preferences and habitual listening behaviors. To this end, we first reduced the logs to music events that lasted longer than 20 seconds to exclude skipped songs. Furthermore, we removed music events from the first study day to avoid potential reactivity biases.
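The two filtering rules can be sketched as follows. The event layout (keys `participant`, `day`, `duration_s`, `title`) is hypothetical, and treating the earliest day with a logged music event as a participant's first study day is a simplifying assumption; the actual study start dates may have been defined differently.

```python
from datetime import date

MIN_DURATION_SECONDS = 20  # threshold from the text; shorter events count as skips


def filter_events(events: list[dict]) -> list[dict]:
    """Keep events longer than the threshold that do not fall on the first day."""
    first_day = {}
    for event in events:
        pid = event["participant"]
        # Assumption: first study day = earliest day with any logged music event.
        if pid not in first_day or event["day"] < first_day[pid]:
            first_day[pid] = event["day"]
    return [
        e for e in events
        if e["duration_s"] > MIN_DURATION_SECONDS
        and e["day"] != first_day[e["participant"]]
    ]


# Made-up event log for two participants.
events = [
    {"participant": "p1", "day": date(2017, 5, 1), "duration_s": 200, "title": "A"},
    {"participant": "p1", "day": date(2017, 5, 2), "duration_s": 12, "title": "B"},
    {"participant": "p1", "day": date(2017, 5, 2), "duration_s": 185, "title": "C"},
    {"participant": "p2", "day": date(2017, 5, 3), "duration_s": 240, "title": "D"},
    {"participant": "p2", "day": date(2017, 5, 4), "duration_s": 21, "title": "E"},
]
kept = filter_events(events)
print([e["title"] for e in kept])
```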

We aggregated the distribution of the song-level variables (see Table 1) over each participant’s played songs via the arithmetic mean (for numeric variables) or percentage scores (for factor variables). We focused on participants’ average music preferences to limit our predictor space while also enabling a comparison with past research (e.g., Nave et al., 2018; Rentfrow & Gosling, 2003; Schäfer & Mehlhorn, 2017). The resulting 833 variables covered average music preferences for 1) audio characteristics (e.g., the mean tempo of played songs) and 2) lyrics characteristics represented by a) emotion scores (e.g., the mean negative emotionality of played songs), b) topics (e.g., the mean probability of the topic “love” in played songs), c) word embeddings (e.g., the mean embedding dimension 1 of played songs), and d) other lyrics characteristics (e.g., the percentage of English songs among all played songs). As noted above, the external song-level variables were not available for all tracks, so the music preference variables only covered a portion of participants’ played tracks. On average, participants’ preferences for Spotify-based variables covered 57% (SD = 0.21) of played tracks, while preferences for lyrics-based variables covered 42% (SD = 0.19). To account for the limited song coverage, we created an additional validity variable indicating the proportion of participants’ songs represented by lyrics-based preference variables.
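The aggregation described above can be illustrated with a small sketch: numeric song-level variables are averaged, and factor variables are turned into one share per level. Variable names, the `mean_`/`share_` prefixes, and the example songs are hypothetical, and shares are expressed as proportions between 0 and 1 rather than percentages.

```python
from collections import Counter
from statistics import mean


def aggregate_preferences(songs, numeric_vars, factor_vars):
    """Aggregate one participant's enriched songs into person-level variables."""
    person = {}
    for var in numeric_vars:
        # Songs without a value (not enriched) are skipped for this variable.
        values = [s[var] for s in songs if s.get(var) is not None]
        person[f"mean_{var}"] = mean(values) if values else None
    for var in factor_vars:
        levels = [s[var] for s in songs if s.get(var) is not None]
        for level, n in Counter(levels).items():
            person[f"share_{var}_{level}"] = n / len(levels)
    return person


def coverage(n_enriched: int, n_played: int) -> float:
    """Proportion of played tracks that could be enriched (validity variable)."""
    return n_enriched / n_played if n_played else 0.0


# Made-up songs of one participant; the last one lacks audio features.
songs = [
    {"tempo": 120.0, "language": "English"},
    {"tempo": 100.0, "language": "German"},
    {"tempo": 140.0, "language": "English"},
    {"tempo": None, "language": "English"},
]
person = aggregate_preferences(songs, ["tempo"], ["language"])
print(person)
print(coverage(n_enriched=3, n_played=4))
```

Averaging compresses each participant's listening history into a single value per variable, which keeps the predictor space manageable but discards information about variability across songs.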

In addition, we extracted ten variables on habitual listening behaviors by quantifying the extent of participants’ music consumption, for example, the total number of played songs, the number of unique artists listened to, or the average daily number of played songs.
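Three of these habitual listening variables can be sketched as below. The event keys are hypothetical, and dividing by the number of days with at least one music event is an assumption; the study may instead have averaged over all study days.

```python
def habitual_listening(events: list[dict]) -> dict:
    """Compute simple indicators of how much a participant listens to music."""
    n_songs = len(events)
    n_artists = len({e["artist"] for e in events})
    # Assumption: only days with at least one music event enter the denominator.
    n_days = len({e["day"] for e in events})
    return {
        "total_songs": n_songs,
        "unique_artists": n_artists,
        "mean_daily_songs": n_songs / n_days if n_days else 0.0,
    }


# Made-up, already filtered event log of one participant.
log = [
    {"artist": "Artist A", "day": "2017-05-02"},
    {"artist": "Artist A", "day": "2017-05-02"},
    {"artist": "Artist B", "day": "2017-05-02"},
    {"artist": "Artist C", "day": "2017-05-03"},
]
habits = habitual_listening(log)
print(habits)
```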

In total, we obtained 844 variables capturing participants’ music preferences and habitual listening behaviors, which served as predictors in our personality predictions. We provide a list of all person-level variables, including summary statistics, in our online repository.

Personality Predictions

Machine Learning Analyses

We trained machine learning models for the prediction of the five domains and 30 facets of our personality inventory. While we provide a short overview of basic machine learning concepts relevant to understanding our study here, a more detailed introduction to supervised machine learning can be found in a state-of-the-art tutorial by Pargent et al. (2022).

Models. For each personality outcome, we ran a benchmark comparing the predictive performance of elastic net (Zou & Hastie, 2005) and random forest models (Breiman, 2001) with that of a featureless baseline model. The baseline model predicted the mean personality score of a training set for all observations in a respective test set. The elastic net model is an extension of basic linear regression that applies two regularization penalties to encourage simpler models, and the random forest aggregates the output of multiple decision trees to account for non-linear relationships. We chose these models because of their ability to automatically perform a selection of relevant predictors, allowing them to cope with high-dimensional and inter-correlated predictor spaces in small samples. We used the default settings of the models’ hyperparameters as specified in their implementation within the mlr3 environment (e.g., Lang et al., 2019).
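For readers unfamiliar with the two penalties of the elastic net, one common way of writing its objective is shown below. The notation is a standard textbook parameterization and is not taken from the article: $n$ is the number of training observations, $x_i$ the predictor vector and $y_i$ the trait score of person $i$, $\lambda \ge 0$ the overall penalty strength, and $\alpha \in [0, 1]$ the mixing weight between the lasso ($\ell_1$) and ridge ($\ell_2$) penalties.

```latex
\[
(\hat{\beta}_0, \hat{\beta})
  = \arg\min_{\beta_0,\, \beta}\;
    \frac{1}{2n} \sum_{i=1}^{n} \bigl( y_i - \beta_0 - x_i^{\top} \beta \bigr)^2
    + \lambda \Bigl[ \alpha \lVert \beta \rVert_1
    + \frac{1 - \alpha}{2} \lVert \beta \rVert_2^2 \Bigr]
\]
```

The $\ell_1$ term can shrink coefficients exactly to zero, which is what yields the automatic predictor selection mentioned above, while the $\ell_2$ term stabilizes estimates when predictors are highly correlated.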

Resampling Strategy. For a strict separation of training and test data, we estimated the models’ expected predictive performance on unseen data using 10-times repeated 10-fold cross-validation (10x10 CV). In this cross-validation scheme, a dataset is randomly split into 10 folds, and each fold serves as an unseen hold-out set for prediction (i.e., the test set) once, while the models are trained on the data of the remaining nine folds (i.e., the training set). Prediction performance is computed separately for each fold of 10x10 CV and then aggregated to the mean across the 100 iterations per model. Such out-of-sample prediction performances have a reduced risk of overfitting sample-specific patterns and provide a more reliable estimate of the models’ ability to make predictions in new samples (e.g., Yarkoni & Westfall, 2017).
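The resampling scheme can be sketched in a few lines. This is an illustrative reimplementation under simplifying assumptions, not the mlr3 code used in the study: trait scores are simulated, and only the featureless baseline is evaluated, since it needs no predictors.

```python
import random
from statistics import mean


def repeated_kfold(n, k=10, repeats=10, seed=1):
    """Yield (train_idx, test_idx) for each of the repeats * k iterations."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)  # new random partition in every repetition
        folds = [order[i::k] for i in range(k)]
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            yield train_idx, test_idx


def featureless_mse(y, train_idx, test_idx):
    """MSE of a baseline predicting the training-set mean for every test case."""
    prediction = mean(y[i] for i in train_idx)
    return mean((y[i] - prediction) ** 2 for i in test_idx)


rng = random.Random(42)
y = [rng.gauss(0, 1) for _ in range(330)]  # simulated trait scores, N = 330

splits = list(repeated_kfold(len(y)))
fold_mses = [featureless_mse(y, train, test) for train, test in splits]
baseline_mse = mean(fold_mses)  # aggregated across all iterations
print(len(splits), round(baseline_mse, 3))
```

Note that the baseline's prediction is computed from the training indices only, so no information from a test fold leaks into the model that is evaluated on it.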

Performance Evaluation. We evaluated model performances by correlating predicted personality scores with the person-parameter estimates from the self-reported personality trait measure using Spearman rank order correlations (rs). However, the baseline model produced invariant predictions (i.e., the training set’s mean) across all observations, preventing us from calculating this correlation metric. Hence, we additionally determined the mean squared error (MSE) for all models (see the supplemental Text S4 for the respective formulas). We tested if the MSEs of our prediction models were significantly lower than those of the corresponding featureless baselines. We treated the MSEs of prediction vs. baseline models obtained in the same cross-validation iteration as dependent pairs (due to their shared training set) and compared them across iterations using variance-corrected pairwise Student’s t-tests (one-sided; Bouckaert & Frank, 2004; Nadeau & Bengio, 2003; Stachl et al., 2020). For each personality outcome, we adjusted for multiple comparisons (n = 2 models against the common baseline) via Bonferroni correction. Based on this conservative approach, prediction models with a significantly smaller MSE than the baseline were considered predictive as they were consistently successful across resampling iterations.
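The variance correction can be illustrated with a minimal Python sketch of the corrected resampled t-statistic described by Nadeau and Bengio (2003). The function names and the four MSE values below are made up for illustration and are not results from this study. The correction inflates the variance of the paired differences by the ratio of test to training set size (1/9 in 10-fold cross-validation) because the training sets of different iterations overlap.

```python
from math import sqrt
from statistics import mean, variance

def corrected_t_statistic(mse_model, mse_baseline, test_train_ratio):
    """Variance-corrected paired t-statistic over resampling iterations."""
    diffs = [m - b for m, b in zip(mse_model, mse_baseline)]
    n_iter = len(diffs)
    # the naive variance term 1/J is inflated by n_test / n_train
    corrected_var = (1.0 / n_iter + test_train_ratio) * variance(diffs)
    return mean(diffs) / sqrt(corrected_var)

def bonferroni(p_value, n_comparisons=2):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_comparisons)

# Hypothetical per-iteration MSEs (toy numbers, four iterations only)
mse_rf = [0.48, 0.51, 0.49, 0.52]
mse_base = [0.50, 0.52, 0.52, 0.52]
t_stat = corrected_t_statistic(mse_rf, mse_base, test_train_ratio=1 / 9)
```

A negative statistic indicates a lower MSE for the prediction model. The one-sided p-value would be read from the lower tail of a t-distribution with J - 1 degrees of freedom (J being the number of iterations) and then passed to `bonferroni`.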

Interpretable Machine Learning

Machine learning models often lack natural interpretability, so we combined different approaches to gain insights into successful prediction models. First, we grouped our variables by the overarching aspects of music listening (e.g., audio vs. lyrics preferences) they represented and investigated the unique importance of these groups as a whole. We used the settings described above (10x10 CV) and ran additional benchmark analyses with seven different subsets of music listening variables. More specifically, we compared the independent predictive performance of 1) habitual listening behaviors, 2) preferences for audio characteristics, and 3) preferences for lyrics characteristics, whereby the third group was considered both in aggregation and separately by the types of lyrics information, namely lyrics’ a) emotionality, b) topics, c) word embeddings, and d) other lyrics characteristics. As an effect size index of the groups’ importance, we considered their individual performance in terms of the Spearman correlation (rs) metric and computed variance-corrected 95% confidence intervals based on the Student’s t-distribution (Bouckaert & Frank, 2004; Nadeau & Bengio, 2003). However, we refrained from conducting significance tests for between-group comparisons due to the highly exploratory nature of these analyses.

For insights into the importance of single predictors within the full set of music listening variables, we applied interpretable machine learning tools to the full personality prediction models. For random forest models, we computed permutation variable importance, which measures the decrease in a model’s prediction performance after randomly permuting one single variable (Casalicchio et al., 2019). Variable importance scores were aggregated across 50 iterations to provide stable estimates. For elastic net models, we considered the model-inherent, non-standardized beta weights known from simple linear regression.
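The logic of permutation variable importance can be sketched as follows. This is a simplified, model-agnostic illustration in plain Python rather than the iml implementation used in the study; the function names, the toy data, and the stand-in `toy_model` are assumptions for this example. The default of 50 repetitions mirrors the aggregation described above.

```python
import random
from statistics import mean

def mse(y_true, y_pred):
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))

def permutation_importance(predict, X, y, j, n_repeats=50, seed=0):
    """Mean increase in MSE after shuffling column j of X."""
    rng = random.Random(seed)
    base_error = mse(y, [predict(row) for row in X])
    increases = []
    for _ in range(n_repeats):
        column = [row[j] for row in X]
        rng.shuffle(column)  # break the link between variable j and the outcome
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        increases.append(mse(y, [predict(r) for r in X_perm]) - base_error)
    return mean(increases)

# Toy data: the outcome depends on the first variable only
rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [2.0 * x0 for x0, _ in X]
toy_model = lambda row: 2.0 * row[0]  # stand-in for a fitted model

imp_relevant = permutation_importance(toy_model, X, y, j=0)
imp_irrelevant = permutation_importance(toy_model, X, y, j=1)
```

Shuffling the relevant variable increases the error, whereas shuffling the irrelevant one leaves it unchanged because the model never uses it.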

To further explore predictor effects, we extracted the 15 most important variables of the respective models and illustrated their influence on the prediction with accumulated local effects (ALE; Apley & Zhu, 2020). ALE plots visualize the effect of an individual predictor variable by showing how its manifestations, on average, affect the model prediction.
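For readers unfamiliar with ALE, the following plain-Python sketch shows a first-order ALE estimate for a single predictor. It follows the general recipe of Apley and Zhu (2020) in simplified form and is not the iml implementation used here; the function name, the quantile-based binning, and the toy model are assumptions for illustration.

```python
import random
from statistics import mean

def ale_1d(predict, X, j, n_bins=10):
    """Return bin edges and centered first-order ALE values for column j."""
    values = sorted(row[j] for row in X)
    n = len(values)
    # bin edges at empirical quantiles of the predictor
    edges = sorted({values[round(k * (n - 1) / n_bins)] for k in range(n_bins + 1)})
    if len(edges) < 2:  # constant predictor: no effect to accumulate
        return edges, [0.0]
    effects, counts = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = [r for r in X if lo < r[j] <= hi or (lo == edges[0] and r[j] == lo)]
        # local effect: prediction change when moving from lower to upper edge
        diffs = [predict(r[:j] + [hi] + r[j + 1:]) - predict(r[:j] + [lo] + r[j + 1:])
                 for r in in_bin]
        effects.append(mean(diffs))
        counts.append(len(in_bin))
    ale = [0.0]
    for e in effects:  # accumulate local effects across bins
        ale.append(ale[-1] + e)
    # center so that the count-weighted average effect is zero
    center = sum(c * (a + b) / 2 for c, a, b in zip(counts, ale, ale[1:])) / sum(counts)
    return edges, [a - center for a in ale]

rng = random.Random(2)
X = [[rng.random(), rng.random()] for _ in range(200)]
toy_model = lambda r: 3.0 * r[0] + r[1]  # stand-in for a fitted model
edges, ale = ale_1d(toy_model, X, j=0)
```

For this linear toy model, the ALE curve of the first predictor is a straight line with slope 3, centered around zero.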

Statistical Software

API calls and natural language processing analyses were conducted in Python, version 3.7.10 (Python Software Foundation, 2021). We used the libraries MALLET (McCallum, 2002) and gensim (Rehurek & Sojka, 2010) for Latent Dirichlet Allocation, the library NRCLex (Bailey, 2019) for emotion analysis, and the Hugging Face Transformers library (Wolf et al., 2020) for extracting BERT embeddings.

All other analyses were conducted with the statistical software R (version 4.0.3 for preprocessing and version 4.2.1 for data analysis; R Core Team, 2022). We used the packages dplyr (version 1.0.7, Wickham et al., 2021) and fxtract (version 0.9.4, Au, 2020) for extracting person-level variables. For predictive modeling, we employed the packages mlr3 (version 0.14.1, Lang et al., 2019), glmnet (version 4.1-6, Friedman et al., 2010), and ranger (version 0.14.1, Wright & Ziegler, 2017). Furthermore, we used iml (version 0.11.1, Molnar et al., 2018) for interpretable machine learning. Finally, the packages ggplot2 (version 3.3.5, Wickham, 2016) and ggwordcloud (version 0.5.0, Le Pennec & Slowikowski, 2019) served for visualizing our results.

Descriptive Statistics

Across our sample, participants provided between 3 and 85 days of logged smartphone data (M = 43.4, SD = 15.8) and, on average, listened to music on about half of these days (M = 47.1%, SD = 28.5%). They used an average of 2.3 different music apps (SD = 1.4), with Spotify being the most used app (40.6%), followed by Android Music (19.7%) and Google Play Music (9.1%). The number of songs listened to per participant ranged between 5 and 4387 (M = 397.6, SD = 547.2), and, on average, participants played 9.4 songs per day (SD = 12.7) for 31.4 minutes (SD = 42.9). Participants’ self-reports are summarized in the supplemental material (see Table S2). Furthermore, we provide detailed descriptive statistics for behavioral variables, including pairwise Spearman correlations with self-reports, in our online project repository.

Personality Predictions

In our main benchmark analysis, we evaluated the performance of two machine learning algorithms predicting personality from our full spectrum of music listening variables. In this analysis, the linear elastic net and non-linear random forest models obtained similar prediction performances for most Big Five dimensions (see Figure 2). However, the elastic net produced only one significant model (instead of three; see Table 2) and failed to make variable-based predictions in many of the 100 resampling iterations of the 10x10 CV scheme for several personality dimensions in Figure 2 (e.g., the facet Modesty of Agreeableness)¹. Hence, we focus our reports on the random forest models in the remainder of this article.

Figure 2.
Box and Whisker Plot of Prediction Performance From Repeated Cross-Validation for Each Personality Dimension and Algorithm

Note. Prediction performance over the 100 resampling iterations of the cross-validation scheme (10x10 CV). Performance is measured via the Spearman rank correlation between predicted and measured personality scores. The middle symbol represents the median, boxes include values between the 25% and 75% quantiles, and whiskers extend to the 2.5% and 97.5% quantiles. Outliers are depicted by single points. The grey line indicates a correlation of 0.0 between the predicted and self-reported personality scores. Asterisks indicate significantly predictive models. This Figure is available under a CC-BY 4.0 license at https://osf.io/x7dar/.

Table 2.
Mean Prediction Performance per Personality Dimension and Algorithm
| Personality Dimension | Random Forest \(r_{\text{s}}\) | Random Forest MSE | Random Forest \(p_{\text{adj}}\) | Elastic Net \(r_{\text{s}}\) | Elastic Net MSE | Elastic Net \(p_{\text{adj}}\) | Baseline MSE |
| --- | --- | --- | --- | --- | --- | --- | --- |
| *(O) Openness | .25 | 0.50 | .041 | .27 | 0.49 | .012 | 0.53 |
| *(O1) Openness to imagination | .23 | 1.91 | .032 | .19 | 1.98 | .400 | 2.04 |
| (O2) Openness to aesthetics | .21 | 1.64 | .151 | .18 | 1.66 | .371 | 1.71 |
| *(O3) Openness to feelings | .22 | 4.39 | .027 | .19 | 4.53 | .237 | 4.64 |
| (O4) Openness to actions | .03 | 2.23 |  | .12 | 2.16 | .726 | 2.18 |
| (O5) Openness to ideas | .15 | 2.16 | .378 | .18 | 2.13 | .166 | 2.22 |
| (O6) Openness to value & norm | .15 | 1.09 |  | .10 | 1.09 |  | 1.08 |
| (C) Conscientiousness | .13 | 0.55 | .987 | .14 | 0.54 | .527 | 0.55 |
| (C1) Competence | -.02 | 1.45 |  | -.01 | 1.40 |  | 1.39 |
| (C2) Love of order | .15 | 2.37 | .686 | .11 | 2.39 | .998 | 2.39 |
| (C3) Sense of duty | .11 | 1.91 |  | .09 | 1.91 |  | 1.91 |
| (C4) Ambition | .13 | 2.87 |  | .10 | 2.85 |  | 2.84 |
| (C5) Discipline | .04 | 2.23 |  | .08 | 2.17 |  | 2.17 |
| (C6) Caution | .05 | 1.92 |  | -.09 | 1.89 |  | 1.87 |
| (E) Extraversion | .07 | 0.53 |  | .03 | 0.54 |  | 0.53 |
| (E1) Friendliness | .13 | 1.55 | .540 | .17 | 1.55 | .568 | 1.57 |
| (E2) Sociableness | .10 | 2.97 | .444 | .06 | 3.03 |  | 3.03 |
| (E3) Assertiveness | .07 | 1.90 |  | -.08 | 1.90 |  | 1.88 |
| (E4) Dynamism | .08 | 2.56 |  | -.07 | 2.57 |  | 2.54 |
| (E5) Adventurousness | .06 | 2.29 |  |  | 2.28 |  | 2.26 |
| (E6) Cheerfulness | .01 | 2.82 |  | .13 | 2.71 | .612 | 2.74 |
| (A) Agreeableness | .04 | 0.63 |  |  | 0.63 |  | 0.62 |
| (A1) Willingness to trust | .11 | 2.14 | .851 |  | 2.18 |  | 2.15 |
| (A2) Genuineness | .11 | 1.01 | .898 | .13 |  | .386 | 1.01 |
| (A3) Helpfulness | -.03 | 1.98 |  | -.14 | 1.92 |  | 1.91 |
| (A4) Obligingness | -.02 | 1.96 |  | -.15 | 1.94 |  | 1.92 |
| (A5) Modesty | -.09 | 1.38 |  | -.28 | 1.31 |  | 1.31 |
| (A6) Good naturedness | .03 | 3.52 |  | -.03 | 3.51 |  | 3.47 |
| (ES) Emotional Stability | -.06 | 0.55 |  | -.30 | 0.53 |  | 0.52 |
| (ES1) Carefreeness | .05 | 1.80 |  | -.17 | 1.78 |  | 1.77 |
| (ES2) Equanimity | -.01 | 1.20 |  | -.11 | 1.16 |  | 1.16 |
| (ES3) Positive mood | -.03 | 2.24 |  | -.08 | 2.18 |  | 2.15 |
| (ES4) Self consciousness | .08 | 1.40 | .905 | .05 | 1.42 |  | 1.40 |
| (ES5) Self control | -.08 | 1.03 |  |  | 0.99 |  | 0.99 |
| (ES6) Emotional robustness | -.05 | 1.44 |  | -.24 | 1.40 |  | 1.39 |

Note. Performance metrics were first computed separately for each of the 100 iterations of our cross-validation scheme (10x10 CV) and then aggregated to the mean. rs = Spearman’s rank order correlation between predicted and measured personality scores. MSE = Mean squared error. padj = Bonferroni adjusted p-values of variance corrected one-sided t-tests comparing the MSE measures of prediction models with the baseline. Overarching personality domains are printed in bold font. Significant models (α = .05) are indicated by an asterisk.

The results summarized in Table 2 show that the Big Five personality dimension Openness (O) and its facets Openness to imagination (O1) and Openness to feelings (O3) were successfully predicted from our music listening variables. That means the MSEs of their random forest models across resampling iterations were, on average, significantly lower than those of the featureless baseline model. While the remaining Big Five criteria exhibited no significant reduction in MSEs, the distribution of correlations between predicted and self-reported personality scores in Figure 2 reveals promising prediction performances in many resampling iterations for several other personality dimensions. More specifically, 14 outcomes in Table 2 exhibited a small- to medium-sized mean correlation at or above the threshold of .10 suggested by Cohen’s (1992) effect size conventions (rs between .10 and .25). Inspection of these selected outcomes suggests that our random forest models worked best for the domain Openness (O, rs = .25) and its facets Openness to imagination (O1, rs = .23), followed by Openness to feelings (O3, rs = .22), Openness to aesthetics (O2, rs = .21), Openness to ideas (O5, rs = .15), and Openness to value and norm (O6, rs = .15). Second best prediction performances were obtained for the domain Conscientiousness (C, rs = .13) and its facets Love of order (C2, rs = .15), followed by Ambition (C4, rs = .13) and Sense of duty (C3, rs = .11). In contrast, the remaining facets of Openness and Conscientiousness exhibited correlations close to zero. While the domains Extraversion (E) and Agreeableness (A) obtained correlations below .10, two of the six facets of each domain showed moderate prediction performances, namely Friendliness (E1, rs = .13) and Sociableness (E2, rs = .10) as well as Willingness to trust (A1, rs = .11) and Genuineness (A2, rs = .11). Only the dimensions of Emotional Stability were completely unrelated to music listening behavior according to our performance metrics. Please note that all models with moderate prediction performance, including those reaching significance, also contained a few resampling iterations with a negative correlation between predicted and self-reported outcomes in Figure 2, indicating that the random forests failed to learn systematic patterns in some instances.

Interpretation of Prediction Models

After providing an overview of how well different personality dimensions can be predicted from music listening variables, we considered what aspects of music listening drove our models’ predictions. We applied two interpretation approaches to all random forest models with a minimum mean performance of rs = .10 listed above.

Importance of Variable Groups

We conducted an additional benchmark analysis comparing the independent performance of each group of music listening variables when separately predicting the respective personality scores. We report prediction performances in terms of the average Spearman correlation with 95% confidence intervals across iterations in Table S5 of the supplemental material and illustrate them in Figure 3. The unique prediction performance represents the relevance of each variable group as a whole (i.e., including all of its variables and their interactions) for our random forest models.

Figure 3.
Heatmap of Prediction Performance by Variable Group for Illustration of Grouped Variable Importance

Note. Prediction performance when using each group of music listening variables (see columns) separately for predicting personality outcomes (see rows). One benchmark comparing the seven variable groups was conducted for each personality outcome predicted with a minimum performance of rs ≥ .10 by the full variable set (see Table 2). The average Spearman rank correlation (rs) between predicted and measured personality scores across resampling iterations serves as an indicator of grouped variable importance, whereby higher values indicate greater relevance of the respective variable group. The higher-level group of “Lyrics Characteristics” comprised the four lower-level groups of Lyrics’ Emotionality, Topics, Word Embeddings, and Other Lyrics Characteristics (see Table 1). This Figure is available under a CC-BY 4.0 license at https://osf.io/x7dar/.

Figure 3 shows that, across all personality outcomes, habitual listening behaviors (range rs = -.04 to .14) were less predictive than music preferences (range rs = -.04 to .29). In contrast, preferences for audio and lyrics characteristics were relevant for many outcomes. Audio characteristics obtained the highest prediction performances for Openness dimensions (range rs = .09 to .29) and the lowest ones for Conscientiousness facets (range rs = -.04 to .17). Lyrics characteristics were also particularly informative about Openness dimensions (range rs = .15 to .23) but least relevant for facets of Extraversion (range rs = .10 to .13) and Agreeableness (range rs = .09 to .12). Among lyrics characteristics, word embeddings were the most relevant group for the largest number of outcomes (8), followed by topics (2), other lyrics characteristics (2), and emotionality (1), disregarding ties between groups. One may argue that the superiority of word embeddings is related to the large size of this predictor group (i.e., 768 lyrics word embeddings vs. 30 lyrics topics in the second largest group). However, as seen in Figure 3, other aspects of lyrics (e.g., topics for Conscientiousness) and also audio characteristics (e.g., for the facet Openness to aesthetics) outweighed word embeddings for several outcomes, indicating that the number of variables per group does not determine its prediction performance.

Looking further into the relevance of music preferences, we can compare their importance for all four personality domains featured in Figure 3. For several Openness dimensions, preferences for audio (range rs = .09 to .29) and lyrics characteristics (range rs = .15 to .23) were both informative for predictions with differential patterns per dimension. In particular, lyrics characteristics were more relevant for the domain itself (O, rs = .20 for audio vs. rs = .23 for lyrics) and its facets Openness to imagination (O1, rs = .13 for audio vs. rs = .23 for lyrics) and Openness to ideas (O5, rs = .09 for audio vs. rs = .16 for lyrics), while audio characteristics were more important for the facets Openness to aesthetics (O2, rs = .26 for audio vs. rs = .19 for lyrics) and Openness to feelings (O3, rs = .29 for audio vs. rs = .21 for lyrics). For Openness to value and norm (O6), audio and lyrics preferences were equally predictive (both rs = .15). Taking a closer look at the different types of lyrics information, word embeddings were most relevant for Openness predictions (range rs = .15 to .23), followed by topics (range rs = .07 to .19), other lyrics characteristics (range rs = .09 to .24), and emotionality (range rs = -.03 to .10). Only for the facet Openness to ideas did other lyrics characteristics produce the best predictions (rs = .24). For the domain Conscientiousness (C, rs = .07 for audio vs. rs = .12 for lyrics) and its facets Sense of duty (C3, rs = -.04 for audio vs. rs = .11 for lyrics) and Ambition (C4, rs = -.03 for audio vs. rs = .15 for lyrics), lyrics characteristics were more informative for prediction models because audio characteristics were (almost) unpredictive. Only for the facet Love of order (C2, rs = .17 for audio vs. rs = .16 for lyrics) were audio and lyrics characteristics similarly important. More specifically, lyrics’ topics (range rs = .10 to .16) and word embeddings (range rs = .11 to .16) were particularly meaningful, while emotionality (range rs = -.10 to .03) and other lyrics characteristics (range rs = .02 to .11) were not very predictive. For the Extraversion facets Friendliness (E1, rs = .16 for audio vs. rs = .13 for lyrics) and Sociableness (E2, rs = .16 for audio vs. rs = .10 for lyrics), audio characteristics were more relevant for predictions compared to lyrics, whose different types of variables were similarly predictive. Finally, lyrics preferences were slightly more predictive for the Agreeableness facet Willingness to trust (A1, rs = .08 for audio vs. rs = .12 for lyrics), while audio preferences were more relevant for the facet Genuineness (A2, rs = .18 for audio vs. rs = .09 for lyrics). Here, the different aspects of song lyrics were again of comparable relevance.

As seen in the importance measures above, some variable groups performed better on their own than in combination with the remaining variables (see Table 2 and Table S5). For example, Openness to feelings (O3) obtained better performances when predicted only from audio characteristics (rs = .29) compared to the performance of the full predictor set in Table 2 (rs = .22). Please note, however, that these results were obtained for different benchmarks, and that the performances of the full variable set in the grouped benchmark are reported in Table S5. Such discrepancies highlight the predictive power of single variable groups for the respective outcome and indicate that some of the other groups introduced noise that hindered random forest models from learning systematic patterns.

Importance of Single Variables

We also explored which variables – considered individually among the full set of music listening variables – were most important for predicting each personality dimension. Therefore, we considered the loss in prediction performance after permuting a single variable of the random forest models. In Table 3, we present the top ten variables (i.e., those causing the greatest performance loss) for each outcome with some exemplary variable effects in ALE plots. In addition, we provide full lists of variable importance and beta weights for the elastic net models in the online project repository.

Table 3.
Top 10 Most Important Music Listening Variables per Random Forest Model with Selected Accumulated Local Effect Plots

Note. The top 10 most important music listening variables in decreasing order for each personality outcome that was predicted with a minimum performance of rs ≥ .10 (see Table 2). Significant prediction models are marked with an asterisk. The variables were selected and ranked based on the permutation feature importance extracted from the respective random forest models. Pairwise Spearman correlations between music listening variables and personality outcomes illustrate the directionality of prediction effects. Colors in the two left-most columns indicate the group membership of each music listening variable to either [pink] (A) Audio Characteristics or [green] (L) Lyrics Characteristics, and, in the latter case, the specific type of Lyrics Characteristics, namely [light blue] Emotionality, [orange] Topics, or [teal] Word Embeddings. The two remaining groups of Habitual Listening Behavior and Other Lyrics Characteristics were not represented in the top 10 variables. For visibility, variables based on Lyrics’ Word Embeddings, which are non-interpretable but make up most top predictors, are printed in grey font. In the right-most column, exemplary accumulated local effects (ALEs) are presented to illustrate how model predictions changed on average across local value ranges of the respective predictor. The x-axis differs depending on the respective variable’s scale (see Table 1) and ranges between the 10th and 90th percentile of the variable’s distribution. ALE values are centered around zero. Further ALE plots are available in the supplemental material (see Figure S1).

The leftmost column in Table 3 shows that, across all outcomes, the majority of the most important variables represented lyrics’ characteristics (127), followed by audio characteristics (13), while none of the top predictors captured habitual listening behaviors. This finding confirms the superiority of music preferences over habitual listening behaviors visible in the grouped importance presented earlier (see Figure 3). The color-coding in Table 3’s second leftmost column further indicates that among lyrics characteristics, word embeddings were by far the most relevant group (121), followed by topics (5) and emotionality (1), while other lyrics characteristics were not among the most predictive variables.

For the different outcomes, the variables featured in the top 10 single predictors mostly represent groups identified as most relevant in Figure 3. For example, topics were the most predictive group as a whole and were among the most relevant individual variables for Conscientiousness (C) and its facets Sense of duty (C3) and Ambition (C4). However, there were also some discrepancies, where the most relevant individual variables did not (or only sparsely) contain predictors from the most important group. For example, the facet Openness to aesthetics (O2) had only two audio characteristics but eight lyrics characteristics in its top 10 predictors, even though the combined importance of these groups was reversed in Figure 3. One possible explanation is that audio characteristics are most predictive when combined as a group. For example, the music’s loudness, tempo, and danceability may not be as informative on their own as they are together because only their constellations reveal what a song sounds like (e.g., a fast song vs. a fast, loud, energetic, and danceable song). If that were the case, the random forest models in our grouped benchmarks could have learned interaction effects from audio characteristics, resulting in high grouped prediction performances. In contrast, our single variable importance metric indicates the performance loss after permuting one specific variable and, thus, captures the relevance of a single variable but not its interactions.

For most of the Big Five domains, some individual music listening variables were repeatedly listed in the top ten predictors across facets, highlighting the relevance of these particular variables in the respective random forest models. While many of these recurring variables were word embeddings (e.g., embedding 315 for Openness, embedding 486 for Conscientiousness), we refrain from elaborating on them because word embeddings are non-interpretable. For Openness, predictions across several dimensions were higher for people listening to melodies with quieter, less danceable, and more acoustic audio characteristics (see Table 3). Similarly, two other audio characteristics representing lower energy and higher instrumentalness of melodies were also relevant for the prediction of one Openness facet each. Providing an exemplary effect interpretation, the ALE plots in Table 3 illustrate that random forest models using all variables predicted higher scores in Openness to imagination (O1) for participants listening to music with lower average values on the audio characteristics variable danceability. Regarding Conscientiousness, participants listening to lyrics with more love-related topics (see Figure 4 for topic interpretations) received higher predictions on the domain and two of its facets. For Extraversion, none of the top predictors were relevant across both facets inspected in Table 3. However, for the facet Sociableness (E2), people obtained higher predicted scores if they listened to songs with more celebration-themed and less goth-themed lyrics. Finally, for Agreeableness, people predicted to score high on the first two facets listened to melodies with less energetic and more acoustic audio characteristics. Furthermore, the models for the facet Willingness to trust (A1) predicted higher scores for participants listening to music with less emotionally negative lyrics, as visible in the corresponding ALE plot in Table 3.

Figure 4.
Word Clouds of Most Predictive Lyrics Topics

Note. Keywords of lyrics topics that belong to the most important predictors of personality in random forest models (see Table 3). Preference for these topics was predictive for different Big Five scores indicated by square brackets: Topic 5 was relevant for higher Sociableness (E2); topic 7 for higher Conscientiousness (C) and its facets Sense of duty (C3) and Ambition (C4); and topic 11 for lower Sociableness (E2). Topics are part of a model with 30 topics obtained from training Latent Dirichlet Allocation on song lyrics. Keywords of the remaining, non-depicted topics can be found in Table S3 in the supplemental material. Word clouds show the 50 most frequent words of each topic. Words occurring in more than 60% of the topics’ top 50 words and meaningless fill words (e.g., “yeah”, “ooh”) were removed for better interpretability. Word size indicates the relative frequency of a word within the topic, whereby larger words are more frequent. Quotation marks contain a post-hoc topic label based on visual inspection of the keywords. This Figure is available under a CC-BY 4.0 license at https://osf.io/x7dar/.

In the present study, we adopted a personality computing approach to explore individual differences in music listening behavior on smartphones. We extracted an extensive set of variables representing natural music preferences in terms of various audio and lyrics characteristics as well as habitual listening behaviors, which we used to predict the Big Five dimensions on domain and facet level in a machine learning framework. Afterward, we compared the independent contribution of the aspects of music listening, paying special attention to audio vs. lyrics preferences, and we inspected which single variables were most relevant in personality predictions.

Personality Prediction Based on Music Listening Behavior

To quantify the amount of personality-relevant information in digital music listening records from smartphones, we assessed out-of-sample predictions of personality based on an extensive set of music listening variables.

Overall Predictability Levels

Our results show that music listening behavior was moderately predictive of personality with performances of rs > .20 for the significant models, which corresponds to the average reported effect size in personality psychology (Funder & Ozer, 2019). However, we obtained only three significant prediction models and small to moderate effects (rs between .10 and .21) for 11 other personality outcomes. This limited number and magnitude of effects is in line with the few and weak pooled correlations (six out of 30 coefficients ranging between .10 and .21) obtained between the Big Five domains and self-reported music style preferences in a meta-analysis by Schäfer and Mehlhorn (2017). In contrast, our out-of-sample prediction performances were lower, across domains, than those reported in a similar personality computing study by Anderson et al. (2021), who achieved correlations ranging from .26 to .37 between the Big Five and their predictions based on music listening behavior on Spotify. While this latter study may seem to provide a fair comparison due to the close proximity in design, the discrepancy in results may be attributed to Anderson et al.’s (2021) substantially larger sample size (N > 5000) or their inclusion of demographic predictor variables (i.e., age and gender), which are known to be related to personality (Soto et al., 2011). Because our personality models used only behavioral predictors, our results seem reasonable, in particular, considering the bandwidth-fidelity dilemma we faced when predicting the Big Five dimensions, which aggregate the entirety of a person’s thoughts, feelings, and behaviors, from music listening as one narrow excerpt of human behavior (Cronbach & Gleser, 1957; Rauthmann, 2021). Our performances thus fell at the lower end of the range of successful personality predictions obtained from diverse behavioral indicators of smartphone usage (r = .20 to .40; Stachl et al., 2020) or from digital behaviors explicitly communicating self-views like social media postings (r = .28 to .42; Schwartz et al., 2013).

Differential Predictability Across Personality Dimensions

In our study, Openness and its facets were most predictable from music listening behavior compared to the remaining Big Five dimensions. While this pattern is consistent with past findings on musical style and audio preferences (e.g., Dobrota & Reić Ercegovac, 2015; Greenberg et al., 2016; Nave et al., 2018; Rentfrow & Gosling, 2003; Schäfer & Mehlhorn, 2017), it seemingly contradicts Anderson et al.’s (2021) recent finding that Openness only ranked third in predictability from natural music listening behavior on Spotify. However, their two top-ranking prediction performances for Emotional Stability and Conscientiousness strongly relied on the demographic predictor variable age, while their Openness models were predominantly based on music listening predictors, so our findings align after all. The pattern of Openness being most strongly related to music listening corroborates the Big Five’s conceptualization that more open individuals are generally more interested in different forms of art (DeYoung, 2015).

Although its prediction models did not reach significance, Conscientiousness was the dimension second most strongly related to music listening on smartphones. In previous work, Conscientiousness was associated with individuals’ favorite song lyrics (Qiu et al., 2019) but not with preferences for musical styles or audio characteristics (e.g., Greenberg et al., 2016; Nave et al., 2018; Schäfer & Mehlhorn, 2017). This pattern was supported by our grouped and single variable importance metrics indicating that lyrics were of greater relevance than audio characteristics when relating music preferences to Conscientiousness.

The dimensions of Extraversion and Agreeableness were not strongly predicted by our music listening variables, which is in line with a meta-analysis on musical style preferences by Schäfer and Mehlhorn (2017) and findings from music listening behavior on Spotify (I. Anderson et al., 2021). As Anderson et al. (2021) noted, privately listening to music does not provide opportunities for social interaction, which, in turn, may suppress the expression of these socially defined traits (Goldberg, 1990). However, music from smartphones may also be used to promote social interactions (e.g., at parties), so associations with Extraversion and Agreeableness may become visible when considering the social listening context, for example, with whom somebody is listening to music.

Emotional Stability was the least predictable personality dimension in our study, which, again, corresponds to previous studies reporting weak relationships with musical style preferences (e.g., Nave et al., 2018; Schäfer & Mehlhorn, 2017). However, our results conflict with Qiu et al. (2019), who successfully related Emotional Stability to lyrics-based music preferences when only investigating participants’ favorite songs, whose lyrics may be particularly meaningful compared to those of all played songs. While it seems reasonable that Emotional Stability may be connected to music listening (e.g., the emotionality of song lyrics), which is commonly used for emotion regulation, such relationships may vary intra-individually and be dependent on the emotional context of a music listening situation (i.e., the listener’s mood; e.g., Chamorro-Premuzic et al., 2010).

Importance of Different Aspects of Music Listening Behavior

Beyond disclosing its general predictive power, we applied interpretable machine learning techniques to explore which granular aspects of natural music listening behavior were most informative for personality predictions.

Variable Groups

Overall, music preferences in terms of audio and lyrics characteristics were both predictive of listeners’ personalities (especially the Openness dimension), while habitual listening behaviors played no major role in our models. Among lyrics characteristics, the technically most sophisticated but non-interpretable word embeddings were most informative across outcomes, followed by lyrics’ topics (especially for Conscientiousness), while lyrics’ emotionality and other aspects (e.g., lyrics length) appeared less relevant. This rank order among natural language models hints at the advantages of open-vocabulary approaches when predicting personality from textual properties, which was previously reported for other text sources (e.g., G. Park et al., 2015; Schwartz et al., 2013).

At the trait level, preferences for audio and lyrics characteristics exhibited differential prediction performances for most personality dimensions, most notably for Conscientiousness, where lyrics outperformed audio characteristics, and for Extraversion, where audio characteristics outperformed lyrics. These findings may relate to the independent cognitive processing of melodies and lyrics (Besson et al., 1998; Bonnel et al., 2001) and indicate that both audio and lyrics should be considered when investigating music preferences in personality science.

Individual Variables

When considering individual music listening variables, the most important (interpretable) predictors were generally congruent with both past findings and the Big Five conceptualization (see DeYoung, 2015; Goldberg, 1990). As an example, that was the case for the positive associations between calm melodies and Openness to feelings, which were previously reported on the domain-level by other studies on audio characteristics (Dobrota & Reić Ercegovac, 2015; Fricke & Herzberg, 2017), or for the positive relations between celebration-themed lyrics and the Extraversion facet of Sociableness, which support previously found associations between the Extraversion domain and positive emotion words in lyrics (Qiu et al., 2019).

Because our study design did not consider causality, these associations may indicate that listeners adjusted their auditory environments to their personalities or vice versa (e.g., Bleidorn et al., 2020; Buss, 1987; Fleeson, 2001; Rauthmann, 2021; Swann, 1987). On the one hand, people with high levels of Openness to feelings may choose calm melodies to accommodate their emotional sensitivity, and those high in Sociableness may listen to celebration-themed lyrics to help them experience positive social interactions. On the other hand, repeated exposure to calm melodies may provide opportunities for emotional experiences, which, in turn, accumulate to higher levels of Openness to feelings. Similarly, frequently listening to celebration-themed lyrics may give rise to positive social interactions and, in the long run, cause people to become more extraverted. While most of our variable importance ranking seems plausible in this sense, some findings were surprising, adding potentially new facets to the theoretical trait concepts. For example, the preference for love-related lyrics is rather difficult to reconcile with high levels of Conscientiousness, a trait typically characterized by planning behavior and obedience to norms (Roberts et al., 2004). In sum, these results demonstrate that specific granular aspects of music listening behavior are distinctly informative about the different Big Five dimensions.

Constraints on Generalizability

We follow the recommendation by Simons et al. (2017) and discuss the generalizability of our empirical findings for different samples, materials, and contexts.

The present study investigated three ad hoc samples of mostly young participants with high education levels, which, given our university recruiting context, suggests that German university students were our proximal population. We are, however, confident that our findings generalize beyond this specific population because the associations we found between personality traits and music preferences generally aligned with those obtained in past studies investigating university students from other countries (e.g., Dobrota & Reić Ercegovac, 2015; Qiu et al., 2019; Rentfrow & Gosling, 2003) or more diverse samples of Facebook users (e.g., Greenberg et al., 2016; Nave et al., 2018). Nevertheless, the young mean age of participants in both our and past music research referenced above may have reduced our sample’s variance in personality traits and music listening behaviors, which both appear to change with age (e.g., Bonneville-Roussy et al., 2013; Lucas & Donnellan, 2011). Hence, we believe our results may not necessarily generalize to samples including older adults, who are currently underrepresented in music listening research. Furthermore, our and past samples were exclusively representative of WEIRD (i.e., Western, Educated, Industrialized, Rich, Democratic) populations (Henrich et al., 2010). While the Big Five structure of personality (e.g., McCrae & Terracciano, 2005) and its reflection in preferences for Western musical styles were found to generalize across countries (Greenberg et al., 2022), the musical styles actually listened to differ between countries and cultures (e.g., Bello & Garcia, 2021; M. Park et al., 2019). Thus, natural music listening behavior and its relation to personality may look different in non-Western populations. Finally, our specific study’s sample was limited to users of Android smartphones due to technical reasons, excluding those owning iOS devices. However, as previous studies found no meaningful differences in demographic and personality characteristics between Android and iOS users, this bias should not dramatically impact the generalizability of our findings (Götz et al., 2017; Keusch et al., 2020). To summarize, we believe that our findings are representative of young adults in Western societies and recommend that follow-up studies generalize our approach to samples including older adults and other cultures.

While the subject of our study was natural music listening behavior exhibited on smartphones, we assume that our personality predictions transfer to all forms of private digital music consumption, including all listening instances where participants can freely choose what music to listen to from their own or a very large collection of songs. That should include music listening on any digital device with music storing or streaming functionalities, such as computers or smart TVs, because our data collection took place at a time when music streaming was on the rise, but when some people still listened to locally stored music on their smartphones. In contrast, music listening on more old-fashioned analog devices such as record players may differ from that on smartphones due to the restricted availability of contemporary songs in the respective formats. This may, in turn, introduce systematic differences in music preferences between non-digital and digital devices (e.g., playing only oldies on the record player but more modern hits on digital devices) and hinder replication of our study. Furthermore, our personality patterns may not generalize to individuals’ full spectrum of music listening behavior when including instances where music is not self-chosen, such as music listening on the radio or at a café.

The most important aspect of our procedure was that we assessed music listening behavior with high ecological validity and in an unobtrusive and objective manner via smartphone sensing. To replicate our findings, future studies should also assess digital music listening records, either obtained from listening devices or directly from streaming services such as Spotify (see I. Anderson et al., 2021). This procedure, however, excludes populations currently not listening to music digitally, such as older people and people in developing countries with very low smartphone penetration. In contrast, when assessing music listening in a more intrusive way (e.g., in laboratory settings), participants may adapt their behaviors in a socially desirable manner (e.g., based on assumptions about researcher goals), so replication is not guaranteed. Similarly, we do not expect our findings to fully generalize to self-reported music listening behaviors, even though they exhibited some overlap with studies on self-reported music preferences (see Schäfer & Mehlhorn, 2017). Finally, to replicate our procedure, it is important to represent music in terms of the intrinsic properties of its melodies and lyrics instead of broad musical styles or genres. The automatic approaches for extracting these musical characteristics can be transferred to all samples of music worldwide and are, thus, widely applicable.

Beyond the considerations outlined above, we currently have no reason to believe that our results depended on other characteristics of the participants, materials, or procedure.

Limitations and Future Directions

The present study has several limitations. First, the relatively small sample size may have prevented our machine learning algorithms from detecting stable patterns that transfer from training to test sets in our cross-validated resampling scheme. Thus, our low prediction performances represent a rather conservative estimate of how well personality may be predicted from music listening behavior in larger samples. Second, careless or insufficient effort responding to our lengthy self-report measure (300 items) may have further attenuated associations between music listening and personality traits (Curran, 2016; Ward & Meade, 2023). While most of our self-reports appeared plausible, different post-hoc response validity analyses identified a few participants suspected of careless responding (see Text S1; Curran, 2016). However, our random forest algorithms are rather robust to outliers and should, thus, not have been impacted too dramatically by the inclusion of potentially careless responses (Breiman, 2001). Third, the music preference variables extracted from participants’ song records depended on the availability of external song-level information (i.e., Spotify’s audio metrics and Genius’ lyrics), possibly resulting in an underrepresentation of uncommon songs and restricted prediction performances for participants with an exotic taste in music. Fourth, our lyrics-based variables may not necessarily represent conscious preferences for song lyrics because we could not confirm that our German participants had fully understood the mainly English lyrics they were listening to. While most young Germans speak English fluently,² personality patterns in lyrics preferences may be even more pronounced when considering only lyrics in the sample’s mother tongue. Fifth, we could not distinguish instances where participants played the music on their smartphones themselves from those where others (e.g., friends or children) initiated music listening events, which may have introduced noise to participants’ music-listening metrics.

Our study demonstrates the potential of smartphone sensing for music listening research in personality psychology and beyond. As popular music listening devices, smartphones allowed us to collect digital records of participants’ day-to-day listening habits and music preferences over time (Greenberg & Rentfrow, 2017). However, our rather traditional approach to investigating average music listening metrics captures only a small proportion of the information in these longitudinal data. For example, if a person listens to either very calm or very energetic melodies, their average score cannot accurately represent their music listening behavior. Thus, to seize the full potential of digital music listening records, future studies should analyze variations in single listening events over time instead of aggregating them. When investigating listening events nested within persons, personality traits may exhibit relations to intra-individual variations in music listening. For example, the trait Openness, which was previously associated with more diverse average music preferences (Bansal et al., 2021), may be even more predictable from variations within individuals’ music listening events than from aggregated scores. Beyond stable personality traits, future research may also include momentary aspects such as mood states or situation perceptions to explain intra-individual variance in music listening behavior, which remains largely unexplored to this date. Smartphone-based ambulatory assessment has laid the foundation for this kind of research because it enables the simultaneous collection of objective music listening and other contextual data (e.g., places where music listening occurred; see Schoedel et al., 2022) via smartphone sensing and self-reported subjective experiences in-situ via the experience sampling methodology (ESM; van Berkel et al., 2017). 
The combination of passive smartphone sensing with active experience sampling is quite novel but provides great opportunities for personality research in general (Schoedel et al., 2022). In sum, smartphones open up ample possibilities for investigating the interplay of various enduring and fluctuating variables, which will broaden our understanding of music listening behavior.
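
The point about averages made above can be illustrated with a small sketch. The data are invented for illustration: the listener names and the per-song "energy" values (scaled from 0 to 1) are hypothetical, and the within-person standard deviation is only one of several possible ways to capture intra-individual variation.

```python
from statistics import mean, pstdev

# Hypothetical per-song "energy" values (0 to 1) for two listeners.
listening_events = {
    # Either very calm or very energetic, never in between.
    "listener_a": [0.05, 0.95, 0.10, 0.90, 0.05, 0.95],
    # Consistently moderate.
    "listener_b": [0.50, 0.45, 0.55, 0.50, 0.52, 0.48],
}


def summarize(events):
    """Return the person-level mean and the within-person standard deviation."""
    return {"mean": mean(events), "sd": pstdev(events)}


summaries = {person: summarize(values) for person, values in listening_events.items()}
for person, stats in summaries.items():
    print(person, round(stats["mean"], 2), round(stats["sd"], 2))
```

Both hypothetical listeners share the same average energy, so an aggregated score cannot tell them apart, whereas the within-person spread separates them clearly.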

The present study demonstrates that smartphone sensing is a promising method to investigate natural music listening behavior and its association with personality. Overcoming self-report assessments of broad musical style preferences, we introduced a personality computing framework for predicting the Big Five dimensions from preferences for intrinsic musical properties and habitual listening behaviors extracted from digital music listening records. Machine learning models revealed that only the personality dimension of Openness was successfully predicted from our music listening variables, corroborating past findings that out of the Big Five, Openness is most strongly related to music listening. In contrast, Conscientiousness and several personality facets showed non-significant but small to moderate prediction effects in our models. Furthermore, our study compared the contribution of audio and lyrics characteristics for relating music preferences to personality, finding that they are both distinctly predictive and that the associations between specific music preference variables and certain personality traits were generally in line with the Big Five’s theoretical conceptualization. In sum, our findings provide new insights into personality patterns in natural music listening behavior, which may be extended in numerous ways using the methodological framework proposed here.

Contributed to conception and design: LS

Contributed to acquisition of data: CS, MB, RS

Contributed to analysis and interpretation of data: LS, GK

Drafted and/or revised the article: LS, RS

Approved the submitted version for publication: LS, RS

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

The authors declare no competing interests, financial or otherwise.

As indicated in the main text, we provide the dataset of aggregated variables used for personality modeling as well as the reproducible code for preprocessing, variable extraction, and predictive modeling in the project’s OSF-repository accessible under the following URL: https://osf.io/x7dar/. However, the privacy-sensitive nature of the smartphone usage data prevents us from openly sharing the raw logging data. This study’s design and analyses are purely exploratory and were not pre-registered. However, a preliminary (and also exploratory) analysis based on a small fraction of our music listening variables was previously part of the author’s master thesis preregistered under https://osf.io/as3ze. While this preregistration does not directly pertain to the current study, it may be considered its groundwork, so we communicate all deviations in our Disclosure of Prior Data Uses available in our project repository.

Acknowledgements

We thank the entire PhoneStudy team at LMU Munich for their continuous and diligent work on the PhoneStudy app, making our research possible. In particular, we thank Tobias Schuwerk for his role in providing parts of the data. We also thank Schuhfried GmbH for providing the Big Five Structure Inventory as a digital version. We further thank Florian Pargent and Felix Schönbrodt for giving insightful modeling advice. Special thanks go to Monika Wintergerst for advising on the language modeling in our research project.

1.

If predictors contain no relevant information for predicting an outcome, the elastic net shrinks their coefficients to zero and returns intercept-only predictions (i.e., it constantly predicts the training data mean), which are mathematically equivalent to our baseline predictions. The intercept-only predictions produce NAs for the Spearman correlation metric (due to their invariance). Thus, outcomes that produced many intercept-only predictions exhibited low variance in the Spearman correlation metric across iterations in Figure 2.
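
The undefined correlation described in this footnote can be reproduced with a small sketch. It is written in plain Python for illustration and is not the code used in the study; the helper functions `ranks` and `spearman` and the toy trait scores are hypothetical.

```python
import math


def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of ranks; NaN if undefined."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return math.nan  # a constant vector has no rank variance
    return cov / math.sqrt(var_x * var_y)


# Hypothetical trait scores; an intercept-only model predicts the training mean.
train_scores = [2.0, 3.0, 4.0, 5.0]
test_scores = [2.5, 4.5, 3.5]
intercept_only = [sum(train_scores) / len(train_scores)] * len(test_scores)
informative = [2.0, 4.0, 3.0]

rho_constant = spearman(test_scores, intercept_only)
rho_informative = spearman(test_scores, informative)
print(rho_constant, rho_informative)
```

Because the intercept-only predictions are identical for every test case, their ranks have zero variance and the correlation is undefined, which is why such iterations drop out of the metric.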

2.

Due to compulsory English language schooling from kindergarten onwards.

Ali, S. O., & Peynircioğlu, Z. F. (2006). Songs and emotions: Are lyrics and melodies equal partners? Psychology of Music, 34(4), 511–534. https://doi.org/10.1177/0305735606067168
Anderson, C. A., Carnagey, N. L., & Eubanks, J. (2003). Exposure to violent media: The effects of songs with violent lyrics on aggressive thoughts and feelings. Journal of Personality and Social Psychology, 84(5), 960–971. https://doi.org/10.1177/0093650203258281
Anderson, I., Gil, S., Gibson, C., Wolf, S., Shapiro, W., Semerci, O., & Greenberg, D. M. (2021). “Just the way you are”: Linking music listening on Spotify and personality. Social Psychological and Personality Science, 12(4), 561–572. https://doi.org/10.1177/1948550620923228
Apley, D. W., & Zhu, J. (2020). Visualizing the effects of predictor variables in black box supervised learning models. Journal of the Royal Statistical Society Series B: Statistical Methodology, 82(4), 1059–1086. https://doi.org/10.1111/rssb.12377
Arendasy, M. (2009). BFSI: Big-Five Struktur-Inventar (Test Manual). SCHUHFRIED GmbH.
Au, Q. (2020). fxtract: Feature Extraction from Grouped Data (Version 0.9.4). https://rdrr.io/cran/fxtract/
Aucouturier, J.-J., & Pachet, F. (2003). Representing musical genre: A state of the art. Journal of New Music Research, 32(1), 83–93. https://doi.org/10.1076/jnmr.32.1.83.16801
Bailey, M. (2019). NRCLex (Version 3.0.0). https://pypi.org/project/NRCLex/
Bansal, J., Flannery, M. B., & Woolhouse, M. H. (2021). Influence of personality on music-genre exclusivity. Psychology of Music, 49(5), 1356–1371. https://doi.org/10.1177/0305735620953611
Baumeister, R. F., Vohs, K. D., & Funder, D. C. (2007). Psychology as the science of self-reports and finger movements: Whatever happened to actual behavior? Perspectives on Psychological Science, 2(4), 396–403. https://doi.org/10.1111/j.1745-6916.2007.00051.x
Bello, P., & Garcia, D. (2021). Cultural divergence in popular music: The increasing diversity of music consumption on Spotify across countries. Humanities and Social Sciences Communications, 8(1), 1–8. https://doi.org/10.1057/s41599-021-00855-1
Bertin-Mahieux, T., Ellis, D. P. W., Whitman, B., & Lamere, P. (2011). The Million Song Dataset. In A. Klapuri & C. Leider (Eds.), Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR 2011) (pp. 1–6). University of Miami. https://doi.org/10.7916/D8NZ8J07
Besson, M., Faïta, F., Peretz, I., Bonnel, A.-M., & Requin, J. (1998). Singing in the brain: Independence of lyrics and tunes. Psychological Science, 9(6), 494–498. https://doi.org/10.1111/1467-9280.00091
Billboard. (2019, September 12). Weekly time spent listening to music in the United States from 2015 to 2019 (in hours). Statista. https://www.statista.com/statistics/828195/time-spent-music/
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(1), 993–1022.
Bleidorn, W., Hopwood, C. J., Back, M. D., Denissen, J. J. A., Hennecke, M., Jokela, M., Kandler, C., Lucas, R. E., Luhmann, M., Orth, U., Roberts, B. W., Wagner, J., Wrzus, C., & Zimmermann, J. (2020). Longitudinal experience–wide association studies - A framework for studying personality change. European Journal of Personality, 34(3), 285–300. https://doi.org/10.1002/per.2247
Bonnel, A.-M., Faita, F., Peretz, I., & Besson, M. (2001). Divided attention between lyrics and tunes of operatic songs: Evidence for independent processing. Perception & Psychophysics, 63(7), 1201–1213. https://doi.org/10.3758/bf03194534
Bonneville-Roussy, A., Rentfrow, P. J., Xu, M. K., & Potter, J. (2013). Music through the ages: Trends in musical engagement and preferences from adolescence through middle adulthood. Journal of Personality and Social Psychology, 105(4), 703–717. https://doi.org/10.1037/a0033770
Bouckaert, R. R., & Frank, E. (2004). Evaluating the replicability of significance tests for comparing learning algorithms. In H. Dai, R. Srikant, & C. Zhang (Eds.), Advances in Knowledge Discovery and Data Mining (Vol. 3056, pp. 3–12). Springer. https://doi.org/10.1007/978-3-540-24775-3_3
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/a:1010933404324
Buss, D. M. (1987). Selection, evocation, and manipulation. Journal of Personality and Social Psychology, 53(6), 1214–1221. https://doi.org/10.1037/0022-3514.53.6.1214
Casalicchio, G., Molnar, C., & Bischl, B. (2019). Visualizing the feature importance for black box models. In M. Berlingerio, F. Bonchi, T. Gärtner, N. Hurley, & G. Ifrim (Eds.), Machine Learning and Knowledge Discovery in Databases (Vol. 11051, pp. 655–670). Springer. https://doi.org/10.1007/978-3-030-10925-7_40
Chamorro-Premuzic, T., Fagan, P., & Furnham, A. (2010). Personality and uses of music as predictors of preferences for music consensually classified as happy, sad, complex, and social. Psychology of Aesthetics, Creativity, and the Arts, 4(4), 205–213. https://doi.org/10.1037/a0019210
Chang, J., Gerrish, S., Wang, C., Boyd-Graber, J. L., & Blei, D. M. (2009). Reading tea leaves: How humans interpret topic models. In Y. Bengio, D. Schuurmans, J. Lafferty, C. Williams, & A. Culotta (Eds.), Advances in Neural Information Processing Systems (Vol. 22, pp. 288–296). Curran Associates, Inc.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155–159. https://doi.org/10.1037/0033-2909.112.1.155
Cronbach, L. J., & Gleser, G. C. (1957). Psychological tests and personnel decisions. University of Illinois Press.
Curran, P. G. (2016). Methods for the detection of carelessly invalid responses in survey data. Journal of Experimental Social Psychology, 66, 4–19. https://doi.org/10.1016/j.jesp.2015.07.006
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv. https://arxiv.org/abs/1810.04805
DeYoung, C. G. (2015). Openness/intellect: A dimension of personality reflecting cognitive exploration. In M. Mikulincer, P. R. Shaver, M. L. Cooper, & R. J. Larsen (Eds.), APA handbook of personality and social psychology, Volume 4: Personality processes and individual differences (pp. 369–399). American Psychological Association. https://doi.org/10.1037/14343-017
Dobrota, S., & Reić Ercegovac, I. (2015). The relationship between music preferences of different mode and tempo and personality traits – Implications for music pedagogy. Music Education Research, 17(2), 234–247. https://doi.org/10.1080/14613808.2014.933790
Flannery, M. B., & Woolhouse, M. H. (2021). Musical Preference: Role of personality and music-related acoustic features. Music & Science, 4, 205920432110140. https://doi.org/10.1177/20592043211014014
Fleeson, W. (2001). Toward a structure- and process-integrated view of personality: Traits as density distributions of states. Journal of Personality and Social Psychology, 80(6), 1011–1027. https://doi.org/10.1037/0022-3514.80.6.1011
Fricke, K. R., Greenberg, D. M., Rentfrow, P. J., Herzberg, P. Y. (2018). Computer-based music feature analysis mirrors human perception and can be used to measure individual music preference. Journal of Research in Personality, 75, 94–102. https://doi.org/10.1016/j.jrp.2018.06.004
Fricke, K. R., Herzberg, P. Y. (2017). Personality and self-reported preference for music genres and attributes in a German-speaking sample. Journal of Research in Personality, 68, 114–123. https://doi.org/10.1016/j.jrp.2017.01.001
Friedman, J., Hastie, T., Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1–22. https://doi.org/10.18637/jss.v033.i01
Funder, D. C., Ozer, D. J. (2019). Evaluating effect size in psychological research: Sense and nonsense. Advances in Methods and Practices in Psychological Science, 2(2), 156–168. https://doi.org/10.1177/2515245919847202
Genius. (2022). Getting Started, Genius. https://docs.genius.com/#/getting-started-h1
Goldberg, L. R. (1990). An alternative “description of personality”: The Big-Five factor structure. Journal of Personality and Social Psychology, 59(6), 1216–1229. https://doi.org/10.1037/0022-3514.59.6.1216
Götz, F. M., Stieger, S., Reips, U.-D. (2017). Users of the main smartphone operating systems (iOS, Android) differ only little in personality. PLoS ONE, 12(5), e0176921. https://doi.org/10.1371/journal.pone.0176921
Greenberg, D. M., Kosinski, M., Stillwell, D. J., Monteiro, B. L., Levitin, D. J., Rentfrow, P. J. (2016). The song is you: Preferences for musical attribute dimensions reflect personality. Social Psychological and Personality Science, 7(6), 597–605. https://doi.org/10.1177/1948550616641473
Greenberg, D. M., Matz, S. C., Schwartz, H. A., Fricke, K. R. (2020). The self-congruity effect of music. Journal of Personality and Social Psychology, 121(1), 137–150. https://doi.org/10.1037/pspp0000293
Greenberg, D. M., Rentfrow, P. J. (2017). Music and big data: A new frontier. Current Opinion in Behavioral Sciences, 18, 50–56. https://doi.org/10.1016/j.cobeha.2017.07.007
Greenberg, D. M., Wride, S. J., Snowden, D. A., Spathis, D., Potter, J., Rentfrow, P. J. (2022). Universals and variations in musical preferences: A study of preferential reactions to Western music in 53 countries. Journal of Personality and Social Psychology, 122(2), 286–309. https://doi.org/10.1037/pspp0000397
Henrich, J., Heine, S. J., Norenzayan, A. (2010). Most people are not WEIRD. Nature, 466(7302), 29–29. https://doi.org/10.1038/466029a
IFPI. (2019). Global Music Listening Report 2019. https://www.ifpi.org/resources/
IFPI. (2021). Engaging with Music. https://www.ifpi.org/resources/
Keusch, F., Bähr, S., Haas, G.-C., Kreuter, F., Trappmann, M. (2020). Coverage error in data collection combining mobile surveys with passive measurement using apps: Data from a German national survey. Sociological Methods & Research, 004912412091492. https://doi.org/10.1177/0049124120914924
Lang, M., Binder, M., Richter, J., Schratz, P., Pfisterer, F., Coors, S., Au, Q., Casalicchio, G., Kotthoff, L., Bischl, B. (2019). mlr3: A modern object-oriented machine learning framework in R. Journal of Open Source Software, 4(44), 1903–1905. https://doi.org/10.21105/joss.01903
Langmeyer, A., Guglhör-Rudan, A., Tarnai, C. (2012). What do music preferences reveal about personality? Journal of Individual Differences, 33(2), 119–130. https://doi.org/10.1027/1614-0001/a000082
Le Pennec, E., Slowikowski, K. (2019). ggwordcloud: A word cloud geom for “ggplot2” (Version 0.5.0). https://CRAN.R-project.org/package=ggwordcloud
Lucas, R. E., Donnellan, M. B. (2011). Personality development across the life span: Longitudinal analyses with a national sample from Germany. Journal of Personality and Social Psychology, 101(4), 847–861. https://doi.org/10.1037/a0024298
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47(2), 149–174. https://doi.org/10.1007/bf02296272
McCallum, A. K. (2002). MALLET: A Machine Learning for Language Toolkit (Version 2.0.8). http://mallet.cs.umass.edu
McCrae, R. R., John, O. P. (1992). An introduction to the five-factor model and its applications. Journal of Personality, 60(2), 175–215. https://doi.org/10.1111/j.1467-6494.1992.tb00970.x
McCrae, R. R., Terracciano, A. (2005). Universal features of personality traits from the observer’s perspective: Data from 50 cultures. Journal of Personality and Social Psychology, 88(3), 547–561. https://doi.org/10.1037/0022-3514.88.3.547
Miles, J. (1976). Music. Decca.
Mohammad, S. M., Turney, P. D. (2013). Crowdsourcing a word-emotion association lexicon. Computational Intelligence, 29(3), 436–465. https://doi.org/10.1111/j.1467-8640.2012.00460.x
Molnar, C., Bischl, B., Casalicchio, G. (2018). iml: An R package for interpretable machine learning. Journal of Open Source Software, 3(26), 786. https://doi.org/10.21105/joss.00786
Nadeau, C., Bengio, Y. (2003). Inference for the generalization error. Machine Learning, 52(3), 239–281. https://doi.org/10.1023/a:1024068626366
Nave, G., Minxha, J., Greenberg, D. M., Kosinski, M., Stillwell, D., Rentfrow, J. (2018). Musical preferences predict personality: Evidence from active listening and Facebook likes. Psychological Science, 29(7), 1145–1158. https://doi.org/10.1177/0956797618761659
Pargent, F., Schoedel, R., Stachl, C. (2022). Best Practices in Supervised Machine Learning: A Tutorial for Psychologists. PsyArXiv. https://doi.org/10.31234/osf.io/89snd
Park, G., Schwartz, H. A., Eichstaedt, J. C., Kern, M. L., Kosinski, M., Stillwell, D. J., Ungar, L. H., Seligman, M. E. P. (2015). Automatic personality assessment through social media language. Journal of Personality and Social Psychology, 108(6), 934–952. https://doi.org/10.1037/pspp0000020
Park, M., Thom, J., Mennicken, S., Cramer, H., Macy, M. (2019). Global music streaming data reveal diurnal and seasonal patterns of affective preference. Nature Human Behaviour, 3(3), 230–236. https://doi.org/10.1038/s41562-018-0508-z
Phan, L. V., Rauthmann, J. F. (2021). Personality computing: New frontiers in personality assessment. Social and Personality Psychology Compass, 15(7), e12624. https://doi.org/10.1111/spc3.12624
Python Software Foundation. (2021). Python: A dynamic, open source programming language (Version 3.7.10). Python Software Foundation. https://www.python.org
Qiu, L., Chen, J., Ramsay, J., Lu, J. (2019). Personality predicts words in favorite songs. Journal of Research in Personality, 78, 25–35. https://doi.org/10.1016/j.jrp.2018.11.004
R Core Team. (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.r-project.org
Rauthmann, J. F. (2021). Capturing interactions, correlations, fits, and transactions: A person-environment relations model. In J. F. Rauthmann (Ed.), The handbook of personality dynamics and processes (pp. 427–522). Elsevier. https://doi.org/10.1016/b978-0-12-813995-0.00018-2
Rehurek, R., Sojka, P. (2010). Software framework for topic modelling with large corpora. In R. Witte, H. Cunningham, J. Patrick, E. Beisswanger, E. Buyko, U. Hahn, K. Verspoor, A. R. Coden (Eds.), Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks (pp. 45–50). University of Malta.
Rentfrow, P. J., Goldberg, L. R., Levitin, D. J. (2011). The structure of musical preferences: A five-factor model. Journal of Personality and Social Psychology, 100(6), 1139–1157. https://doi.org/10.1037/a0022406
Rentfrow, P. J., Gosling, S. D. (2003). The do re mi’s of everyday life: The structure and personality correlates of music preferences. Journal of Personality and Social Psychology, 84(6), 1236–1256. https://doi.org/10.1037/0022-3514.84.6.1236
Roberts, B. W., Bogg, T., Walton, K. E., Chernyshenko, O. S., Stark, S. E. (2004). A lexical investigation of the lower-order structure of conscientiousness. Journal of Research in Personality, 38(2), 164–178. https://doi.org/10.1016/s0092-6566(03)00065-5
Schäfer, T., Mehlhorn, C. (2017). Can personality traits predict musical style preferences? A meta-analysis. Personality and Individual Differences, 116, 265–273. https://doi.org/10.1016/j.paid.2017.04.061
Schoedel, R., Au, Q., Völkel, S. T., Lehmann, F., Becker, D., Bühner, M., Bischl, B., Hussmann, H., Stachl, C. (2019). Digital footprints of sensation seeking. Zeitschrift für Psychologie, 226(4), 232–245. https://doi.org/10.1027/2151-2604/a000342
Schoedel, R., Kunz, F., Bergmann, M., Bemmann, F., Bühner, M., Sust, L. (2022). Snapshots of Daily Life: Situations Investigated Through the Lens of Smartphone Sensing. PsyArXiv. https://doi.org/10.31234/osf.io/f3htz
Schuwerk, T., Kaltefleiter, L. J., Au, J.-Q., Hoesl, A., Stachl, C. (2019). Enter the wild: Autistic traits and their relationship to mentalizing and social interaction in everyday life. Journal of Autism and Developmental Disorders, 49(10), 4193–4208. https://doi.org/10.1007/s10803-019-04134-6
Schwartz, H. A., Eichstaedt, J. C., Kern, M. L., Dziurzynski, L., Ramones, S. M., Agrawal, M., Shah, A., Kosinski, M., Stillwell, D., Seligman, M. E. P., Ungar, L. H. (2013). Personality, gender, and age in the language of social media: The open-vocabulary approach. PLoS ONE, 8(9), e73791. https://doi.org/10.1371/journal.pone.0073791
Simons, D. J., Shoda, Y., Lindsay, D. S. (2017). Constraints on generality (COG): A proposed addition to all empirical papers. Perspectives on Psychological Science, 12(6), 1123–1128. https://doi.org/10.1177/1745691617708630
Soto, C. J., John, O. P., Gosling, S. D., Potter, J. (2011). Age differences in personality traits from 10 to 65: Big Five domains and facets in a large cross-sectional sample. Journal of Personality and Social Psychology, 100(2), 330–348. https://doi.org/10.1037/a0021717
Spotify. (2022). Get Track’s Audio Features, Spotify for Developers. https://developer.spotify.com/documentation/web-api/reference/#/operations/get-several-audio-features
Stachl, C., Au, Q., Schoedel, R., Gosling, S. D., Harari, G. M., Buschek, D., Völkel, S. T., Schuwerk, T., Oldemeier, M., Ullmann, T., Hussmann, H., Bischl, B., Bühner, M. (2020). Predicting personality from patterns of behavior collected with smartphones. Proceedings of the National Academy of Sciences, 117(30), 17680–17687. https://doi.org/10.1073/pnas.1920484117
Stachl, C., Hilbert, S., Au, J. Q., Buschek, D., De Luca, A., Bischl, B., Hussmann, H., Bühner, M. (2017). Personality traits predict smartphone usage. European Journal of Personality, 31(6), 701–722. https://doi.org/10.1002/per.2113
Swann, W. B. (1987). Identity negotiation: Where two roads meet. Journal of Personality and Social Psychology, 53(6), 1038–1051. https://doi.org/10.1037/0022-3514.53.6.1038
Tarrant, M., North, A. C., Hargreaves, D. J. (2000). English and American adolescents’ reasons for listening to music. Psychology of Music, 28(2), 166–173. https://doi.org/10.1177/0305735600282005
van Berkel, N., Ferreira, D., Kostakos, V. (2017). The experience sampling method on mobile devices. ACM Computing Surveys, 50(6), 1–40. https://doi.org/10.1145/3123988
Vuoskoski, J. K., Eerola, T. (2011). The role of mood and personality in the perception of emotions represented by music. Cortex, 47(9), 1099–1106. https://doi.org/10.1016/j.cortex.2011.04.011
Ward, M. K., Meade, A. W. (2023). Dealing with careless responding in survey data: Prevention, identification, and recommended best practices. Annual Review of Psychology, 74(1), 577–596. https://doi.org/10.1146/annurev-psych-040422-045007
Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis (Version 3.3.5). Springer-Verlag. https://ggplot2.tidyverse.org
Wickham, H., François, R., Henry, L., Müller, K. (2021). dplyr: A Grammar of Data Manipulation (Version 1.0.4). https://CRAN.R-project.org/package=dplyr
Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., Funtowicz, M., Davison, J., Shleifer, S., von Platen, P., Ma, C., Jernite, Y., Plu, J., Xu, C., Le Scao, T., Gugger, S., … Rush, A. (2020). HuggingFace’s Transformers: State-of-the-Art Natural Language Processing. In Q. Liu & D. Schlangen (Eds.), Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations (pp. 38–45). https://doi.org/10.18653/v1/2020.emnlp-demos.6
Wright, M. N., Ziegler, A. (2017). Ranger: A fast implementation of random forests for high dimensional data in C++ and R. Journal of Statistical Software, 77(1), 1–17. https://doi.org/10.18637/jss.v077.i01
Yarkoni, T., Westfall, J. (2017). Choosing prediction over explanation in psychology: Lessons from machine learning. Perspectives on Psychological Science, 12(6), 1100–1122. https://doi.org/10.1177/1745691617693393
Zou, H., Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society Series B: Statistical Methodology, 67(2), 301–320. https://doi.org/10.1111/j.1467-9868.2005.00503.x
This is an open access article distributed under the terms of the Creative Commons Attribution License (4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Supplementary Material