It is a long-held belief in psychology and beyond that individuals’ music preferences reveal information about their personality traits. While initial evidence relates self-reported preferences for broad musical styles to the Big Five dimensions, little is known about day-to-day music listening behavior and the intrinsic attributes of melodies and lyrics that reflect these individual differences. The present study (N = 330) proposes a personality computing approach to fill these gaps with new insights from ecologically valid music listening records from smartphones. We quantified participants’ music preferences via audio and lyrics characteristics of their played songs through technical audio features from Spotify and textual attributes obtained via natural language processing. Using linear elastic net and non-linear random forest models, these behavioral variables served to predict Big Five personality on the domain and facet levels. Out-of-sample prediction performances revealed that, on the domain level, Openness was most strongly related to music listening (r = .25), followed by Conscientiousness (r = .13), while several facets of the Big Five also showed small to medium effects. Hinting at the incremental value of audio and lyrics characteristics, both musical components were differentially informative for models predicting Openness and its facets, whereas lyrics preferences played the more important role for predictions of Conscientiousness dimensions. Moreover, the models’ most predictive variables displayed generally trait-congruent relationships between personality and music preferences. These findings contribute to the development of a cumulative theory on music listening in personality science and may be extended in numerous ways by future work leveraging the computational framework proposed here.
Music was my first love and it will be the last
Music of the future and music of the past
To live without my music would be impossible to do
In this world of troubles my music pulls me through

(Miles, 1976)
Most of us will agree with John Miles’ iconic song quote that music plays an important role in our lives. Indeed, we spend nearly one-fourth of our waking time listening to music (Billboard, 2019), and the digitalization of the music market is further increasing these numbers as online streaming services make music more pervasive than ever, with tens of millions of songs accessible anywhere and anytime by over 440 million paid subscribers (IFPI, 2021). This transformation in music consumption has turned streaming platforms and devices into digital sources of music listening data, creating an unprecedented opportunity to investigate natural music listening behavior “in the wild” (see I. Anderson et al., 2021). These digital listening records provide fine-grained data on various psychologically relevant behavioral outcomes such as music preferences or listening durations (Greenberg & Rentfrow, 2017). Music preferences, in particular, can be automatically represented in terms of the intrinsic properties of the songs played on an everyday basis using tools from computational music information retrieval (e.g., Fricke et al., 2018).
This new ecological validity and granularity in music listening assessment has the potential to push the boundaries of research in personality science, which has long been adopting the interactionist perspective that the music people listen to calibrates their external environments with their personalities and, hence, reflects their individual traits (Greenberg et al., 2020; Rentfrow et al., 2011). Parallel to other types of digital data, such as app usage (Stachl et al., 2017) or social media postings (Schwartz et al., 2013), music listening records can now be assessed for personality-relevant information via machine learning algorithms (see Phan & Rauthmann, 2021). The present study adopts this so-called personality computing approach to overcome methodological limitations of the past and model personality from various indicators of natural music listening on smartphones.
Music Listening in Personality Research
Personality researchers have been exploring the associations between music listening and the Big Five personality traits for the past two decades. These studies have mainly focused on individuals’ preferences for different styles of music, finding the most robust patterns for the personality dimension of Openness, which correlated positively with preferences for intense (e.g., Rock) and complex (e.g., Classical) musical styles (e.g., I. Anderson et al., 2021; Bonneville-Roussy et al., 2013; Greenberg et al., 2016; Langmeyer et al., 2012; Nave et al., 2018; Rentfrow & Gosling, 2003).
However, a meta-analysis of the correlation between musical style preferences and personality concluded that the effect sizes for Openness were rather small across studies (r = .12 for intense and r = .21 for complex music), while the remaining Big Five dimensions exhibited average correlations near zero (Schäfer & Mehlhorn, 2017). The studies included in this meta-analysis shared the limitation, though, that they analyzed music preferences via self-reported genre preferences (e.g., Bonneville-Roussy et al., 2013; Rentfrow & Gosling, 2003) or ratings of musical excerpts (e.g., Langmeyer et al., 2012), which may not accurately represent natural music listening behavior (Greenberg & Rentfrow, 2017). That is because self-reports may suffer from socially desirable responding (e.g., towards music favored by one’s peer group; cf. Tarrant et al., 2000) or biased memory recollection (Baumeister et al., 2007), while affective reactions to artificially manipulated or unreleased music excerpts may not reflect preferences displayed on the natural music market.
Only recently, I. Anderson et al. (2021) overcame this limitation by investigating music listening behavior exhibited on the streaming service Spotify. They predicted personality in a machine learning framework and achieved moderate to high performances for the Big Five dimensions, whose predicted and self-reported scores correlated between .26 (Agreeableness) and .37 (Emotional Stability). While these findings deviate from those of self-report-based studies with regard to the strength and rank order of associations, these discrepancies cannot be directly attributed to the ecologically valid assessment. That is because I. Anderson et al. (2021) included not only behavioral music preferences as personality predictors but also streaming behaviors (e.g., the streaming device or the number of artists followed) and participants’ demographics (i.e., age and gender). In particular, the demographic predictors, which are known to correlate with personality (see Soto et al., 2011) and improve personality predictions from music preferences (Nave et al., 2018), were among the most predictive variables across all Big Five dimensions except Openness (I. Anderson et al., 2021). Thus, the current state of the literature does not allow for unambiguous conclusions about the personality-relevant information contained in natural music listening behavior.
Audio vs. Lyrics Characteristics
While previous studies reported important insights into personality correlates of broad musical style or genre preferences, they rarely investigated music preferences on a more granular level, preventing inferences about the intrinsic musical properties underlying these personality associations (Aucouturier & Pachet, 2003; Rentfrow et al., 2011). Non-instrumental songs, in particular, are defined by audio and lyrics characteristics, which may play distinct roles in music preferences and their association with personality. While empirical findings suggest that melodies and lyrics are independently processed when listening to music (Besson et al., 1998; Bonnel et al., 2001) and that both components have a unique impact on the affective listening experience (Ali & Peynircioğlu, 2006; C. A. Anderson et al., 2003), the two have never been compared in a comprehensive analysis in personality psychology.
However, a few studies have separately related personality traits to preferences for either audio or lyrics characteristics. For audio characteristics, they found that Openness correlated with preferences for music with a slow tempo, minor mode, acoustic sounds, and negative valence, while Extraversion was related to preferences for music with major mode, high tones, and positive valence (Dobrota & Reić Ercegovac, 2015; Flannery & Woolhouse, 2021; Fricke & Herzberg, 2017; Vuoskoski & Eerola, 2011). Regarding lyrics characteristics, a pioneering study by Qiu et al. (2019) connected the Big Five personality traits with linguistic style preferences in lyrics, reporting the strongest associations for Conscientiousness, which, for example, correlated positively with a preference for achievement words, and for Emotional Stability, which was related to a preference for positive emotion words in lyrics. These preliminary findings indicate that different aspects of music preferences may be of incremental value for personality prediction.
The sparsity of studies investigating intrinsic musical attributes may be ascribed to a lack of automated extraction tools as researchers had to rely on human labeling to quantify audio and lyrics characteristics (e.g., Dobrota & Reić Ercegovac, 2015; Rentfrow et al., 2011). This approach was not only burdensome and practically infeasible for large collections of songs in natural music listening records but also at risk of assessing music’s subjective experience rather than intrinsic musical properties. However, advances in music information retrieval now enable the automatic extraction of musical characteristics from audio recordings or song lyrics. In particular, technical audio characteristics, ranging from basic physical parameters (e.g., tempo) to more complex aggregated features (e.g., valence) learned via machine learning algorithms, can now be obtained in a ready-to-use format from external sources such as Spotify (I. Anderson et al., 2021; Stachl et al., 2020) or via music analysis software (e.g., ESSENTIA; Fricke et al., 2018). To obtain the textual lyrics characteristics, researchers can apply natural language processing (NLP), choosing between closed-vocabulary approaches, which count word usage in a text over pre-defined word categories (see Qiu et al., 2019), and open-vocabulary approaches, which analyze language in a bottom-up manner (e.g., by word clusters). While the closed-vocabulary approaches are often easier to interpret, they are restricted by the word coverage and subjectivity of the underlying dictionaries, which may be why open approaches have proven to be more informative of personality when investigating other sources of written text (e.g., G. Park et al., 2015; Schwartz et al., 2013). These automated approaches for extracting various musical characteristics open up new possibilities for comparing the contribution of melodies and lyrics when predicting personality from music preferences.
The Present Research
The present study applied a personality computing approach to efficiently collect, computationally represent, and jointly model different aspects of music listening behavior. For the ecologically valid assessment of music listening, we used smartphones, which are currently the most widely used device for music listening besides radios (IFPI, 2019) and provide granular digital listening records. We analyzed a smartphone sensing dataset of 330 participants collected over 3 to 85 study days and represented music preferences in terms of intrinsic musical attributes of the songs listened to. Here, we distinguished between preferences quantified via technical audio characteristics from Spotify.com and textual lyrics variables obtained through different natural language models. In addition, we considered habitual listening behaviors that quantified participants’ engagement with music (e.g., their listening duration). An extensive set of 844 strictly behavioral variables served to predict self-reported Big Five personality trait scores on the domain and facet levels. To counteract overfitting, we applied two machine learning algorithms suitable for high-dimensional data (i.e., data in which the number of predictors is larger than the number of observations) and evaluated prediction performance in a strict out-of-sample fashion. Finally, we used interpretable machine learning techniques to compare the independent contribution of audio- and lyrics-based preferences and explored which single music listening variables were most important in personality predictions.
Methods
We conducted a secondary data analysis based on three mobile sensing datasets summarized in Stachl et al. (2020). Since the datasets were previously published, we focus our report on procedures and decisions relevant to the present study. Additional details on the study procedures are available in the original articles (Schoedel et al., 2019; Schuwerk et al., 2019; Stachl et al., 2017).
This study’s design and analyses are purely exploratory and were not pre-registered. However, preliminary (and also exploratory) groundwork provided in a student thesis was preregistered under https://osf.io/as3ze. While this preregistration does not directly pertain to the current study, we still communicate deviations in our Disclosure of Prior Data Uses available in our project’s online repository under https://osf.io/x7dar/. In this repository, we also provide the code for preprocessing, variable extraction, and predictive modeling, as well as a dataset of aggregated variables used for predictive modeling. However, please understand that the privacy-sensitive nature of the smartphone usage data prevents us from sharing the raw logging data.
Dataset
In the present study, we re-analyzed data from three separate studies conducted within the PhoneStudy project at LMU Munich between 2014 and 2018 (Schoedel et al., 2019; Schuwerk et al., 2019; Stachl et al., 2017). In the supplemental Table S1, we provide an overview of the included datasets. The procedures of all three studies were approved by institutional review boards and carried out according to EU laws and ethical standards. All subjects participated willingly and gave informed consent prior to their participation.
In all three studies, participants completed a series of self-report questionnaires, including the personality inventory used here. Furthermore, they installed an Android research application on their private smartphones, which logged a variety of smartphone usage behaviors, including music listening, for a period of at least 14 study days. A detailed description of the individual study procedures and all collected measures is available in the respective research articles and in Stachl et al. (2020).
The initial sample was determined by the availability of secondary data and contained logging and self-report data from 684 participants. During pre-processing, we removed participants who had played fewer than five different songs with available lyrics characteristics (see our section on Song-Level Variables), resulting in a sample size of 330 participants (54% women) with sufficient music listening data. We additionally assessed the response validity of our self-report measure but refrained from removing participants based on inconclusive evidence of careless responding (see Text S1; Curran, 2016; Ward & Meade, 2023). Our final sample was skewed towards younger ages (M = 22.42, SD = 4.33, Min = 18, Max = 57) and higher educational attainment (93% with A-levels and 20% with a university degree).
Personality Measure
All three studies used the German Big Five Structure Inventory (BFSI; Arendasy, 2009) to assess personality based on the well-established Big Five taxonomy: Openness, Conscientiousness, Extraversion, Agreeableness, and Emotional Stability (McCrae & John, 1992). The BFSI consists of 300 items (adjectives and short phrases) and measures the Big Five personality dimensions on five broad domains and 30 more specific facets. Item agreement is rated on a 4-point scale ranging from “untypical for me” to “rather untypical for me” to “rather typical for me” to “typical for me.” The BFSI corresponds to the partial credit model (Masters, 1982), which defines an individual’s observed item response as a function of their latent trait value (i.e., their person parameter) and the item’s latent difficulty thresholds. Correspondingly, we used the person parameters assigned to participants based on their item sum scores as personality estimates in our analyses. Confidence intervals of internal consistencies obtained in our sample are available in Table S2 in the supplemental material.
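For reference, the partial credit model specifies this response process as follows (standard notation after Masters, 1982, not symbols from the BFSI manual): the probability that person i with trait level \(\theta_i\) responds in category \(x \in \{0, \ldots, m\}\) of item j with difficulty thresholds \(\delta_{jk}\) is

\[
P(X_{ij} = x \mid \theta_i) \;=\; \frac{\exp \sum_{k=1}^{x} (\theta_i - \delta_{jk})}{\sum_{h=0}^{m} \exp \sum_{k=1}^{h} (\theta_i - \delta_{jk})}, \qquad \text{with } \sum_{k=1}^{0} (\cdot) \equiv 0.
\]

For the BFSI’s 4-point scale, m = 3; the estimated person parameters \(\theta_i\) are the personality scores entering our prediction models.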
Behavioral Music Listening Measures
An Android-based research application provided raw sensing data on participants’ natural smartphone usage, including their music listening records. Whenever participants had listened to locally stored or streamed music, the app created time-stamped event logs with the title, artist, and album name of the played song.
Song-Level Variables
To describe the played songs in terms of musical attributes, we enriched the music event logs with audio and lyrics characteristics. To this end, we retrieved additional song-level data from two external sources. We visualize the data enrichment workflow with exemplary songs in Figure 1 and provide further details in the supplemental material (see Text S2).
First, we used Spotify’s Track API to retrieve 12 song-level variables provided by Spotify.com (see Table 1; Spotify, 2022). These variables contained 11 computationally derived technical audio characteristics (e.g., the songs’ “tempo” and “acousticness”) based on the songs’ audio recordings and one lyrics-based variable indicating the presence of explicit lyrical contents (i.e., strong language or references to sexual or violent behavior).
| Song-level Variable | Data Source | Description |
| --- | --- | --- |
| **Audio Characteristics** | | |
| Mode | Spotify API | The song’s modality (major vs. minor), i.e., the type of scale the song’s melodic content is derived from. |
| Key | Spotify API | The song’s melodic key in standard pitch class notation (e.g., 0 = C, 1 = C♯/D♭, 2 = D). |
| Tempo | Spotify API | The song’s overall estimated tempo in beats per minute (BPM). |
| Loudness | Spotify API | The song’s average loudness in decibels (dB). |
| Energy | Spotify API | The song’s perceived intensity and activity on a scale from 0.0 to 1.0; songs with high energy feel fast, loud, and noisy. The measure is defined by several elements such as perceived loudness. |
| Danceability | Spotify API | The song’s suitability for dancing on a scale from 0.0 to 1.0; higher values represent more danceable songs. The measure combines several musical elements, including tempo and overall regularity. |
| Acousticness | Spotify API | The song’s acousticness (i.e., absence of electronic sounds) on a scale from 0.0 to 1.0; higher values represent an increased confidence that the song is acoustic. |
| Valence | Spotify API | The musical positiveness conveyed by the song; songs with valence closer to 1.0 sound more positive (e.g., happy), and songs with values closer to 0.0 sound more negative (e.g., sad). |
| Speechiness | Spotify API | The probability of spoken words in a song; values below 0.33 most likely represent pure music, while higher values represent songs containing both music and speech (e.g., rap music). |
| Instrumentalness | Spotify API | The probability that a song contains no vocals; values closer to 1.0 represent a greater likelihood that the song has no vocal content. Values above 0.5 most likely represent instrumental songs. |
| Liveness | Spotify API | The probability of an audience in the song’s recording; values closer to 1.0 represent a greater likelihood that the song was performed live. Values above 0.8 most likely represent live songs. |
| **Lyrics Characteristics** | | |
| Length | Genius API + lyrics NLP | The number of words in the song’s lyrics. |
| Language | Genius API + lyrics NLP | The song’s language (i.e., English vs. German vs. Other), derived via language detection from the lyrics. |
| Explicit content | Spotify API | The presence of explicit words (e.g., swear words) in the song’s lyrics. |
| 10 Emotionality scores | Genius API + lyrics NLP | The probability that the song’s lyrics contain words from each of ten emotion categories of the NRC Emotion Lexicon (e.g., Positivity, Negativity, Sadness, Anger, Joy, Trust). |
| 30 Topics | Genius API + lyrics NLP | The probability that the song’s lyrics belong to each of 30 lyrical topics derived via Latent Dirichlet Allocation. |
| 768 Word embeddings | Genius API + lyrics NLP | The song’s value on each of the 768 dimensions in the lyrics embedding space of the BERT model. |
Note. Spotify variable descriptions were derived from Spotify.com. API = application program interface; API calls retrieved ready-to-use variables from Spotify.com and raw song lyrics from Genius.com. NLP = natural language processing; NLP extracted variables from the song lyrics.
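To make this retrieval step concrete, the following minimal Python sketch queries the Spotify Track API via the spotipy client; the example query, the credential setup (environment variables read by SpotifyClientCredentials), and the printed fields are illustrative assumptions rather than our project’s actual pipeline code.

```python
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

# Credentials are read from the SPOTIPY_CLIENT_ID / SPOTIPY_CLIENT_SECRET
# environment variables (an assumption for this sketch).
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials())

# Resolve a logged music event (title + artist) to a Spotify track ID ...
hits = sp.search(q="track:Music artist:John Miles", type="track", limit=1)
track_id = hits["tracks"]["items"][0]["id"]

# ... and retrieve the ready-to-use audio characteristics listed in Table 1.
features = sp.audio_features([track_id])[0]
print(features["tempo"], features["energy"], features["valence"])

# The explicit-lyrics flag is part of the track object itself.
print(sp.track(track_id)["explicit"])
```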
In addition, we retrieved song lyrics from Genius.com and created meaningful textual variables via a text-mining pipeline combining closed and open vocabulary approaches (see Table 1; Genius, 2022). We describe all lyrics analyses at an abstract level here and provide further details in the supplemental material (see Text S3). We extracted two stylistic variables representing the lyrics’ length and language and applied three natural language models to quantify the content characteristics of the lyrics. First, we detected the emotional content of the lyrics using the NRC Word-Emotion Association Lexicon (Mohammad & Turney, 2013). Based on word occurrences, the NRC lexicon assigned each song a score on two sentiments (positive and negative valence) and eight emotion categories (anger, anticipation, disgust, fear, joy, sadness, surprise, and trust). Second, we applied Latent Dirichlet Allocation (LDA; Blei et al., 2003) to obtain the topics covered in the song lyrics. This generative probabilistic model assumes that each document (in our case, song lyrics) in a corpus contains a mixture of latent topics, where each topic is a cluster of co-occurring words. To avoid overfitting the LDA to sample-specific patterns in our lyrics corpus, we pre-trained the model on a large external lyrics corpus (the Million Song Dataset; Bertin-Mahieux et al., 2011). We determined the topic count such that the topic coherence (i.e., the semantic similarity between words within a topic; Chang et al., 2009) was maximized, which resulted in a model with 30 topics. This pre-trained topic model assigned each song in our corpus a score on each of the 30 topics. We provide details on the topic modeling, including coherence metrics (see Table S3) and topic keywords (see Table S4), in the supplemental material. Finally, we represented the lyrics as word embeddings using the state-of-the-art Bidirectional Encoder Representations from Transformers (BERT; Devlin et al., 2018). BERT embeddings use a neural network architecture to convert textual data into context-sensitive numerical representations. We employed the pre-trained BERT implementation from the HuggingFace framework (Wolf et al., 2020) to extract one embedding vector for each song’s full lyrics. This BERT vector had a length of 768, so each song was assigned a score on 768 embedding dimensions. Again, more details on the BERT modeling are available in the supplemental material (see Text S3).
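As a minimal sketch of this final step, the snippet below derives one 768-dimensional BERT vector for a single song with the HuggingFace transformers library; the checkpoint (bert-base-uncased) and mean pooling over token embeddings are assumptions for illustration, whereas our exact configuration is documented in Text S3.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Pre-trained BERT checkpoint (an assumed choice for this sketch).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

lyrics = "Music was my first love and it will be the last ..."

# Tokenize; BERT accepts at most 512 tokens, so longer lyrics are truncated.
inputs = tokenizer(lyrics, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # shape: (1, n_tokens, 768)

# One 768-dimensional vector per song via mean pooling over token embeddings.
song_embedding = hidden.mean(dim=1).squeeze(0)  # shape: (768,)
```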
In total, we computed 822 variables quantifying different intrinsic musical characteristics of the songs played in our study (see Table 1). These song-level variables were assigned to the respective music events in the logging data. Figure 1 illustrates this matching and provides examples of the face validity of song-level variables. However, not all music listening events could be enriched because some contained non-musical tracks (e.g., audiobooks), had incorrect song information (e.g., typos in the song title), or were not covered by the respective online sources.
Person-Level Variables
In the next preprocessing step, we used the song-level enriched music event logs to extract person-level variables capturing music preferences and habitual listening behaviors. To this end, we first reduced the logs to music events that lasted longer than 20 seconds to exclude skipped songs. Furthermore, we removed music events from the first study day to avoid potential reactivity biases.
We aggregated the distribution of the song-level variables (see Table 1) over each participant’s played songs via the arithmetic mean (for numeric variables) or percentage scores (for factor variables). We focused on participants’ average music preferences to limit our predictor space while also enabling a comparison with past research (e.g., Nave et al., 2018; Rentfrow & Gosling, 2003; Schäfer & Mehlhorn, 2017). The resulting 833 variables covered average music preferences for 1) audio characteristics (e.g., the mean tempo of played songs) and 2) lyrics characteristics represented by a) emotion scores (e.g., the mean negative emotionality of played songs), b) topics (e.g., the mean probability of the topic “love” in played songs), c) word embeddings (e.g., the mean embedding dimension 1 of played songs), and d) other lyrics characteristics (e.g., the percentage of English songs among all played songs). As noted above, the external song-level variables were not available for all tracks, so the music preference variables only covered a portion of participants’ played tracks. On average, participants’ preferences for Spotify-based variables covered 57% (SD = 21%) of played tracks, while preferences for lyrics-based variables covered 42% (SD = 19%). To account for the limited song coverage, we created an additional validity variable indicating the proportion of participants’ songs represented by lyrics-based preference variables.
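The aggregation logic can be sketched in a few lines of Python/pandas (our actual extraction used the R packages dplyr and fxtract; the column names below are hypothetical):

```python
import pandas as pd

# events: one row per retained music event (> 20 s, first study day removed),
# enriched with the song-level variables from Table 1 (hypothetical columns).
numeric_prefs = events.groupby("user_id")[["tempo", "valence", "nrc_sadness"]].mean()

# Factor variables become percentage scores, e.g., the share of English songs.
english_pref = (events["language"].eq("English")
                .groupby(events["user_id"]).mean()
                .rename("pct_english_songs"))

# Validity variable: proportion of a user's songs with lyrics-based variables.
lyrics_coverage = (events["nrc_sadness"].notna()
                   .groupby(events["user_id"]).mean()
                   .rename("lyrics_coverage"))

person_level = numeric_prefs.join([english_pref, lyrics_coverage])
```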
In addition, we extracted ten variables on habitual listening behaviors by quantifying the extent of participants’ music consumption, for example, the total number of played songs, the number of unique artists listened to, or the average daily number of played songs.
In total, we obtained 844 variables capturing participants’ music preferences and habitual listening behaviors, which served as predictors in our personality predictions. We provide a list of all person-level variables, including summary statistics, in our online repository.
Personality Predictions
Machine Learning Analyses
We trained machine learning models for the prediction of the five domains and 30 facets of our personality inventory. While we provide a short overview of basic machine learning concepts relevant to understanding our study here, a more detailed introduction to supervised machine learning can be found in a state-of-the-art tutorial by Pargent et al. (2022).
Models. For each personality outcome, we compared, as a benchmark, the predictive performance of elastic net (Zou & Hastie, 2005) and random forest models (Breiman, 2001) with that of a featureless baseline model. The baseline model predicted the mean personality score of a training set for all observations in a respective test set. The elastic net model is an extension of basic linear regression that applies two regularization penalties to encourage simpler models, and the random forest aggregates the output of multiple decision trees to account for non-linear relationships. We chose these models because of their ability to automatically select relevant predictors, allowing them to cope with high-dimensional and inter-correlated predictor spaces in small samples. We used the default settings of the models’ hyperparameters as specified in their implementation within the mlr3 environment (e.g., Lang et al., 2019).
Resampling Strategy. For a strict separation of training and test data, we estimated the models’ expected predictive performance on unseen data using 10-times repeated 10-fold cross-validation (10x10 CV). In this cross-validation scheme, a dataset is randomly split into 10 folds, and each fold serves as an unseen hold-out set for prediction (i.e., the test set) once, while the models are trained on the data of the remaining nine folds (i.e., the training set). Prediction performance is computed separately for each fold of 10x10 CV and then aggregated to the mean across the 100 iterations per model. Such out-of-sample prediction performances have a reduced risk of overfitting sample-specific patterns and provide a more reliable estimate of the models’ ability to make predictions in new samples (e.g., Yarkoni & Westfall, 2017).
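Translated into a minimal Python/scikit-learn sketch (our analyses used R’s mlr3 with glmnet and ranger, whose defaults differ from scikit-learn’s), the benchmark logic looks as follows; X and y stand for the person-level predictor matrix and one personality outcome:

```python
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import RepeatedKFold, cross_validate

# 10-times repeated 10-fold CV -> 100 train/test splits per model.
cv = RepeatedKFold(n_splits=10, n_repeats=10, random_state=1)

models = {
    "baseline": DummyRegressor(strategy="mean"),  # featureless: training-set mean
    "elastic_net": ElasticNet(),                  # default hyperparameters
    "random_forest": RandomForestRegressor(),     # default hyperparameters
}

# One out-of-sample MSE per CV iteration; aggregated to the mean afterwards.
mse = {
    name: -cross_validate(model, X, y, cv=cv,
                          scoring="neg_mean_squared_error")["test_score"]
    for name, model in models.items()
}
```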
Performance Evaluation. We evaluated model performances by correlating predicted personality scores with the person-parameter estimates from the self-reported personality trait measure using Spearman rank order correlations (rs). However, the baseline model produced invariant predictions (i.e., the training set’s mean) across all observations, preventing us from calculating this correlation metric. Hence, we additionally determined the mean squared error (MSE) for all models (see the supplemental Text S4 for the respective formulas). We tested whether the MSEs of our prediction models were significantly lower than those of the corresponding featureless baselines. We treated the MSEs of prediction vs. baseline models obtained in the same cross-validation iteration as dependent pairs (due to their shared training set) and compared them across iterations using variance-corrected pairwise one-sided Student t-tests (Bouckaert & Frank, 2004; Nadeau & Bengio, 2003; Stachl et al., 2020). For each personality outcome, we adjusted for multiple comparisons (n = 2 models against the common baseline) via Bonferroni correction. Based on this conservative approach, prediction models with a significantly smaller MSE than the baseline were considered predictive as they were consistently successful across resampling iterations.
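Continuing the sketch above, the variance-corrected test can be written in a few lines; Nadeau and Bengio’s (2003) correction inflates the variance term by the test/training set size ratio (1/9 for 10-fold CV) to account for the overlap between training sets across iterations.

```python
import numpy as np
from scipy import stats

def corrected_paired_ttest(mse_model, mse_baseline, test_train_ratio=1 / 9):
    """One-sided, variance-corrected paired t-test (Nadeau & Bengio, 2003):
    is the model's MSE systematically lower than the baseline's?"""
    d = np.asarray(mse_model) - np.asarray(mse_baseline)  # 100 paired diffs
    k = len(d)
    # Correction: the usual 1/k variance term plus the test/train size ratio.
    t = d.mean() / np.sqrt((1 / k + test_train_ratio) * d.var(ddof=1))
    p = stats.t.cdf(t, df=k - 1)  # left tail: model better if differences < 0
    return t, p

t, p = corrected_paired_ttest(mse["random_forest"], mse["baseline"])
p_adj = min(1.0, 2 * p)  # Bonferroni: two models tested against one baseline
```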
Interpretable Machine Learning
Machine learning models often lack natural interpretability, so we combined different approaches to gain insights into successful prediction models. First, we grouped our variables by the overarching aspects of music listening (e.g., audio vs. lyrics preferences) they represented and investigated the unique importance of these groups as a whole. We used the settings described above (10x10 CV) and ran additional benchmark analyses with seven different subsets of music listening variables. More specifically, we compared the independent predictive performance of 1) habitual listening behaviors, 2) preferences for audio characteristics, and 3) preferences for lyrics characteristics, whereby the third group was considered both in aggregation and separately by the types of lyrics information, namely lyrics’ a) emotionality, b) topics, c) word embeddings, and d) other lyrics characteristics. As an effect size index of the groups’ importance, we considered their individual performance in terms of the Spearman correlation (rs) metric and computed variance-corrected 95% confidence intervals based on the Student t-distribution (Bouckaert & Frank, 2004; Nadeau & Bengio, 2003). However, we refrained from conducting significance tests for between-group comparisons due to the highly exploratory nature of these analyses.
For insights into the importance of single predictors within the full set of music listening variables, we applied interpretable machine learning tools to the full personality prediction models. For random forest models, we computed permutation variable importance, which measures the decrease in a model’s prediction performance after randomly permuting one single variable (Casalicchio et al., 2019). Variable importance scores were aggregated across 50 iterations to provide stable estimates. For elastic net models, we considered the model-inherent, non-standardized beta weights known from simple linear regression.
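In scikit-learn terms, this permutation step could look like the sketch below (our analyses used the R packages iml and ranger); fitted_rf, X_test, and y_test are placeholders for a trained random forest and held-out data.

```python
from sklearn.inspection import permutation_importance

result = permutation_importance(
    fitted_rf, X_test, y_test,
    scoring="neg_mean_squared_error",
    n_repeats=50,  # stabilizes the estimates, mirroring our 50 iterations
    random_state=1,
)

# Mean performance loss per variable; larger values = more important.
importance = dict(zip(X_test.columns, result.importances_mean))
```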
To further explore predictor effects, we extracted the 15 most important variables of the respective models and illustrated their influence on the prediction with accumulated local effects (ALE; Apley & Zhu, 2020). ALE plots visualize the effect of an individual predictor variable by showing how its manifestations, on average, affect the model prediction.
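For intuition about what an ALE curve computes, here is a bare-bones first-order ALE for a single numeric predictor of a fitted model (our plots were produced with the R package iml; quantile binning and the bin count are common defaults, assumed here):

```python
import numpy as np

def ale_1d(model, X, feature, n_bins=10):
    """Minimal first-order ALE for one numeric feature of a fitted model."""
    x = X[feature].to_numpy()
    # Quantile-based bin edges over the feature's observed range.
    edges = np.unique(np.quantile(x, np.linspace(0, 1, n_bins + 1)))
    bin_of = np.clip(np.searchsorted(edges, x, side="right") - 1,
                     0, len(edges) - 2)
    local, counts = np.zeros(len(edges) - 1), np.zeros(len(edges) - 1)
    for b in range(len(edges) - 1):
        rows = X[bin_of == b]
        if rows.empty:
            continue
        hi, lo = rows.copy(), rows.copy()
        hi[feature], lo[feature] = edges[b + 1], edges[b]
        # Local effect: average prediction change across this bin's width.
        local[b] = (model.predict(hi) - model.predict(lo)).mean()
        counts[b] = len(rows)
    ale = np.cumsum(local)                  # accumulate local effects
    ale -= np.average(ale, weights=counts)  # center the curve around zero
    return edges[1:], ale                   # bin upper edges, centered effects
```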
Statistical Software
API calls and natural language processing analyses were conducted in Python, version 3.7.10 (Python Software Foundation, 2021). We used the libraries MALLET (McCallum, 2002) and gensim (Rehurek & Sojka, 2010) for Latent Dirichlet Allocation, the library NRCLex (Bailey, 2019) for emotion analysis, and the Hugging Face transformers library (Wolf et al., 2020) for extracting BERT embeddings.
All other analyses were conducted with the statistical software R (version 4.0.3 for preprocessing and version 4.2.1 for data analysis; R Core Team, 2022). We used the packages dplyr (version 1.0.7, Wickham et al., 2021) and fxtract (version 0.9.4, Au, 2020) for extracting person-level variables. For predictive modeling, we employed the packages mlr3 (version 0.14.1, Lang et al., 2019), glmnet (version 4.1-6, Friedman et al., 2010), and ranger (version 0.14.1, Wright & Ziegler, 2017). Furthermore, we used iml (version 0.11.1, Molnar et al., 2018) for interpretable machine learning. Finally, the packages ggplot2 (version 3.3.5, Wickham, 2016) and ggwordcloud (version 0.5.0, Le Pennec & Slowikowski, 2019) served for visualizing our results.
Results
Descriptive Statistics
Across our sample, participants provided between 3 and 85 days of logged smartphone data (M = 43.4, SD = 15.8) and, on average, listened to music on half of these days (M = 47.1%, SD = 28.5%). They used an average of 2.3 different music apps (SD = 1.4), with Spotify being the most used app (40.6%), followed by Android Music (19.7%) and Google Play Music (9.1%). The number of songs listened to per participant ranged between 5 and 4387 (M = 397.6, SD = 547.2), and, on average, participants played 9.4 songs per day (SD = 12.7) for 31.4 minutes (SD = 42.9). Participants’ self-reports are summarized in the supplemental material (see Table S2). Furthermore, we provide detailed descriptive statistics for behavioral variables, including pairwise Spearman correlations with self-reports, in our online project repository.
Personality Predictions
In our main benchmark analysis, we evaluated the performance of two machine learning algorithms predicting personality from our full spectrum of music listening variables. In this analysis, the linear elastic net and non-linear random forest models obtained similar prediction performances for most Big Five dimensions (see Figure 2). However, the elastic net produced only one significant model (instead of three; see Table 2) and failed to make variable-based predictions in many of the 100 resampling iterations of the 10x10 CV scheme for several personality dimensions in Figure 2 (e.g., the facet Modesty of Agreeableness)¹. Hence, we focus our reports on the random forest models in the remainder of this article.
| Personality Dimension | RF \(r_{\text{s}}\) | RF MSE | RF \(p_{\text{adj}}\) | EN \(r_{\text{s}}\) | EN MSE | EN \(p_{\text{adj}}\) | Baseline MSE |
| --- | --- | --- | --- | --- | --- | --- | --- |
| **\*(O) Openness** | .25 | 0.50 | .041 | .27 | 0.49 | .012 | 0.53 |
| \*(O1) Openness to imagination | .23 | 1.91 | .032 | .19 | 1.98 | .400 | 2.04 |
| (O2) Openness to aesthetics | .21 | 1.64 | .151 | .18 | 1.66 | .371 | 1.71 |
| \*(O3) Openness to feelings | .22 | 4.39 | .027 | .19 | 4.53 | .237 | 4.64 |
| (O4) Openness to actions | .03 | 2.23 | 1 | .12 | 2.16 | .726 | 2.18 |
| (O5) Openness to ideas | .15 | 2.16 | .378 | .18 | 2.13 | .166 | 2.22 |
| (O6) Openness to value & norm | .15 | 1.09 | 1 | .10 | 1.09 | 1 | 1.08 |
| **(C) Conscientiousness** | .13 | 0.55 | .987 | .14 | 0.54 | .527 | 0.55 |
| (C1) Competence | -.02 | 1.45 | 1 | -.01 | 1.40 | 1 | 1.39 |
| (C2) Love of order | .15 | 2.37 | .686 | .11 | 2.39 | .998 | 2.39 |
| (C3) Sense of duty | .11 | 1.91 | 1 | .09 | 1.91 | 1 | 1.91 |
| (C4) Ambition | .13 | 2.87 | 1 | .10 | 2.85 | 1 | 2.84 |
| (C5) Discipline | .04 | 2.23 | 1 | .08 | 2.17 | 1 | 2.17 |
| (C6) Caution | .05 | 1.92 | 1 | -.09 | 1.89 | 1 | 1.87 |
| **(E) Extraversion** | .07 | 0.53 | 1 | .03 | 0.54 | 1 | 0.53 |
| (E1) Friendliness | .13 | 1.55 | .540 | .17 | 1.55 | .568 | 1.57 |
| (E2) Sociableness | .10 | 2.97 | .444 | .06 | 3.03 | 1 | 3.03 |
| (E3) Assertiveness | .07 | 1.90 | 1 | -.08 | 1.90 | 1 | 1.88 |
| (E4) Dynamism | .08 | 2.56 | 1 | -.07 | 2.57 | 1 | 2.54 |
| (E5) Adventurousness | .06 | 2.29 | 1 | 0 | 2.28 | 1 | 2.26 |
| (E6) Cheerfulness | .01 | 2.82 | 1 | .13 | 2.71 | .612 | 2.74 |
| **(A) Agreeableness** | .04 | 0.63 | 1 | 0 | 0.63 | 1 | 0.62 |
| (A1) Willingness to trust | .11 | 2.14 | .851 | 0 | 2.18 | 1 | 2.15 |
| (A2) Genuineness | .11 | 1.01 | .898 | .13 | 1.00 | .386 | 1.01 |
| (A3) Helpfulness | -.03 | 1.98 | 1 | -.14 | 1.92 | 1 | 1.91 |
| (A4) Obligingness | -.02 | 1.96 | 1 | -.15 | 1.94 | 1 | 1.92 |
| (A5) Modesty | -.09 | 1.38 | 1 | -.28 | 1.31 | 1 | 1.31 |
| (A6) Good naturedness | .03 | 3.52 | 1 | -.03 | 3.51 | 1 | 3.47 |
| **(ES) Emotional Stability** | -.06 | 0.55 | 1 | -.30 | 0.53 | 1 | 0.52 |
| (ES1) Carefreeness | .05 | 1.80 | 1 | -.17 | 1.78 | 1 | 1.77 |
| (ES2) Equanimity | -.01 | 1.20 | 1 | -.11 | 1.16 | 1 | 1.16 |
| (ES3) Positive mood | -.03 | 2.24 | 1 | -.08 | 2.18 | 1 | 2.15 |
| (ES4) Self consciousness | .08 | 1.40 | .905 | .05 | 1.42 | 1 | 1.40 |
| (ES5) Self control | -.08 | 1.03 | 1 | 0 | 0.99 | 1 | 0.99 |
| (ES6) Emotional robustness | -.05 | 1.44 | 1 | -.24 | 1.40 | 1 | 1.39 |
Note. Performance metrics were first computed separately for each of the 100 iterations of our cross-validation scheme (10x10 CV) and then aggregated to the mean. RF = random forest; EN = elastic net. \(r_{\text{s}}\) = Spearman’s rank order correlation between predicted and measured personality scores. MSE = mean squared error. \(p_{\text{adj}}\) = Bonferroni-adjusted p-value of a variance-corrected one-sided t-test comparing the MSE of the prediction model with the baseline. Overarching personality domains are printed in bold font. Significant models (α = .05) are indicated by an asterisk.
The results summarized in Table 2 show that the Big Five personality dimension Openness (O) and its facets Openness to imagination (O1) and Openness to feelings (O3) were successfully predicted from our music listening variables. That means the MSEs of their random forest models across resampling iterations were, on average, significantly lower than those of the featureless baseline model. While the remaining Big Five criteria exhibited no significant reduction in MSEs, the distribution of correlations between predicted and self-reported personality scores in Figure 2 reveals promising prediction performances in many resampling iterations for several other personality dimensions. More specifically, 14 outcomes in Table 2 exhibited a small- to medium-sized mean correlation at or above a threshold of .10 suggested by Cohen’s (1992) effect size conventions (rs between .10 and .25). Inspection of these selected outcomes suggests that our random forest models worked best for the domain Openness (O, rs = .25) and its facets Openness to imagination (O1, rs = .23), followed by Openness to feelings (O3, rs = .22), Openness to aesthetics (O2, rs = .21), Openness to ideas (O5, rs = .15), and Openness to value and norm (O6, rs = .15). The second-best prediction performances were obtained for the domain Conscientiousness (C, rs = .13) and its facets Love of order (C2, rs = .15), followed by Ambition (C4, rs = .13) and Sense of duty (C3, rs = .11). In contrast, the remaining facets of Openness and Conscientiousness exhibited correlations close to zero. While the domains Extraversion (E) and Agreeableness (A) obtained correlations below .10, two of their six facets each showed moderate prediction performances, namely Friendliness (E1, rs = .13) and Sociableness (E2, rs = .10) as well as Willingness to trust (A1, rs = .11) and Genuineness (A2, rs = .11). Only the dimensions of Emotional Stability were completely unrelated to music listening behavior according to our performance metrics. Please note that all models with moderate prediction performance, including those reaching significance, also included a few resampling iterations with a negative correlation between predicted and self-reported outcomes in Figure 2, indicating that the random forests failed to learn systematic patterns in some instances.
Interpretation of Prediction Models
After providing an overview of how well different personality dimensions can be predicted from music listening variables, we considered which aspects of music listening drove our models’ predictions. We applied two interpretation approaches to all random forest models with a minimum mean performance of rs = .10 listed above.
Importance of Variable Groups
We conducted an additional benchmark analysis comparing the independent performance of each group of music listening variables when separately predicting the respective personality scores. We report prediction performances in terms of the average Spearman correlation with 95% confidence intervals across iterations in Table S5 of the supplemental material and illustrate them in Figure 3. The unique prediction performance represents the relevance of each variable group as a whole (i.e., including all of its variables and their interactions) for our random forest models.
Figure 3 shows that, across all personality outcomes, habitual listening behaviors (range rs = -.04 to .14) were less predictive than music preferences (range rs = -.04 to .29). In contrast, preferences for audio and lyrics characteristics were relevant for many outcomes. Audio characteristics obtained the highest prediction performances for Openness dimensions (range rs = .09 to .29) and the lowest for Conscientiousness facets (range rs = -.04 to .17). Lyrics characteristics were also particularly informative about Openness dimensions (range rs = .15 to .23) but least relevant for facets of Extraversion (range rs = .10 to .13) and Agreeableness (range rs = .09 to .12). Among lyrics characteristics, word embeddings were the most relevant group for the largest number of outcomes (8), followed by topics (2), other lyrics characteristics (2), and emotionality (1), disregarding ties between groups. One may argue that the superiority of word embeddings is related to the large size of this predictor group (i.e., 768 lyrics word embeddings vs. 30 lyrics topics in the second largest group). However, as seen in Figure 3, other aspects of lyrics (e.g., topics for Conscientiousness) and also audio characteristics (e.g., for the facet Openness to aesthetics) outweighed word embeddings for several outcomes, indicating that the number of variables per group does not determine its prediction performance.
Looking further into the relevance of music preferences, we can compare their importance for all four personality domains featured in Figure 3. For several Openness dimensions, preferences for audio (range rs = .09 to .29) and lyrics characteristics (range rs = .15 to .23) were both informative for predictions, with differential patterns per dimension. In particular, lyrics characteristics were more relevant for the domain itself (O, rs = .20 for audio vs. rs = .23 for lyrics) and its facets Openness to imagination (O1, rs = .13 for audio vs. rs = .23 for lyrics) and Openness to ideas (O5, rs = .09 for audio vs. rs = .16 for lyrics), while audio characteristics were more important for the facets Openness to aesthetics (O2, rs = .26 for audio vs. rs = .19 for lyrics) and Openness to feelings (O3, rs = .29 for audio vs. rs = .21 for lyrics). For Openness to value and norm (O6), audio and lyrics preferences were equally predictive (both rs = .15). Taking a closer look at the different types of lyrics information, word embeddings were most relevant for Openness predictions (range rs = .15 to .23), followed by topics (range rs = .07 to .19), other lyrics characteristics (range rs = .09 to .24), and emotionality (range rs = -.03 to .10). Only for the facet Openness to ideas did other lyrics characteristics produce the best predictions (rs = .24). For the domain Conscientiousness (C, rs = .07 for audio vs. rs = .12 for lyrics) and its facets Sense of duty (C3, rs = -.04 for audio vs. rs = .11 for lyrics) and Ambition (C4, rs = -.03 for audio vs. rs = .15 for lyrics), lyrics characteristics were more informative for prediction models because audio characteristics were (almost) unpredictive. Only for the facet Love of order (C2, rs = .17 for audio vs. rs = .16 for lyrics) were audio and lyrics characteristics similarly important. More specifically, lyrics’ topics (range rs = .10 to .16) and word embeddings (range rs = .11 to .16) were particularly meaningful, while emotionality (range rs = -.10 to .03) and other lyrics characteristics (range rs = .02 to .11) were not very predictive. For the Extraversion facets Friendliness (E1, rs = .16 for audio vs. rs = .13 for lyrics) and Sociableness (E2, rs = .16 for audio vs. rs = .10 for lyrics), audio characteristics were more relevant for predictions than lyrics, whose different types of variables were similarly predictive. Finally, lyrics preferences were slightly more predictive for the Agreeableness facet Willingness to trust (A1, rs = .08 for audio vs. rs = .12 for lyrics), while audio preferences were more relevant for the facet Genuineness (A2, rs = .18 for audio vs. rs = .09 for lyrics). Here, the different aspects of song lyrics were again of comparable relevance.
As seen in the importance measures above, some variable groups performed better on their own than in combination with the remaining variables (see Table 2 and Table S5). For example, Openness to feelings (O3) obtained better performances when predicted only from audio characteristics (rs = .29) compared to the performance of the full predictor set in Table 2 (rs = .22). Please note, however, that these results were obtained from different benchmarks, and that the corresponding full-variable-set performances for the grouped benchmark are reported in Table S5. Such discrepancies highlight the predictive power of single variable groups for the respective outcome and indicate that some of the other groups introduced noise that hindered random forest models from learning systematic patterns.
Importance of Single Variables
We also explored which variables – considered individually among the full set of music listening variables – were most important for predicting each personality dimension. To this end, we considered the loss in prediction performance after permuting a single variable of the random forest models. In Table 3, we present the top ten variables (i.e., those causing the greatest performance loss) for each outcome, together with some exemplary variable effects in ALE plots. In addition, we provide full lists of variable importance scores and elastic net beta weights in the online project repository.
The leftmost column in Table 3 shows that, across all outcomes, the majority of the most important variables represented lyrics characteristics (127), followed by audio characteristics (13), while none of the top predictors captured habitual listening behaviors. This finding confirms the superiority of music preferences over habitual listening behaviors visible in the grouped importance presented earlier (see Figure 3). The color-coding in Table 3’s second leftmost column further indicates that among lyrics characteristics, word embeddings were by far the most relevant group (121), followed by topics (5) and emotionality (1), while other lyrics characteristics were not among the most predictive variables.
For the different outcomes, the variables featured in the top 10 single predictors mostly represented the groups identified as most relevant in Figure 3. For example, topics were the most predictive group as a whole and were among the most relevant individual variables for Conscientiousness (C) and its facets Sense of duty (C3) and Ambition (C4). However, there were also some discrepancies, where the most relevant individual variables did not (or only sparsely) contain predictors from the most important group. For example, the facet Openness to aesthetics (O2) had only two audio characteristics but eight lyrics characteristics in its top 10 predictors, even though the combined importance of these groups was reversed in Figure 3. One possible explanation is that audio characteristics are most predictive when combined as a group. For example, a song’s loudness, tempo, and danceability may not be as informative on their own as they are together because only their constellations reveal what a song sounds like (e.g., a fast song vs. a fast, loud, energetic, and danceable song). If that were the case, the random forest models in our grouped benchmarks could have learned interaction effects from audio characteristics, resulting in high grouped prediction performances. In contrast, our single variable importance metric indicates the performance loss after permuting one specific variable and, thus, captures the relevance of a single variable but not its interactions.
For most of the Big Five domains, some individual music listening variables were repeatedly listed in the top ten predictors across facets, highlighting the relevance of these particular variables in the respective random forest models. While many of these recurring variables were word embeddings (e.g., embedding 315 for Openness, embedding 486 for Conscientiousness), we refrain from elaborating on them because word embeddings are not directly interpretable. For Openness, predictions across several dimensions were higher for people listening to melodies with quieter, less danceable, and more acoustic audio characteristics (see Table 3). Similarly, two other audio characteristics representing lower energy and higher instrumentalness of melodies were each relevant for the prediction of one Openness facet. Providing an exemplary effect interpretation, the ALE plots in Table 3 illustrate that random forest models using all variables predicted higher scores in Openness to imagination (O1) for participants listening to music with lower average values on the audio characteristics variable danceability. Regarding Conscientiousness, participants listening to lyrics with more love-related topics (see Figure 4 for topic interpretations) received higher predictions on the domain and two of its facets. For Extraversion, none of the top predictors were relevant across both facets inspected in Table 3. However, for the facet Sociableness (E2), people obtained higher predicted scores if they listened to more celebration-themed and less goth-themed lyrics. Finally, for Agreeableness, people predicted to score high on the first two facets listened to melodies with less energetic and more acoustic audio characteristics. Furthermore, the models for the facet Willingness to trust (A1) predicted higher scores for participants listening to music with less emotionally negative lyrics, as visible in the corresponding ALE plot in Table 3.
Discussion
In the present study, we adopted a personality computing approach to explore individual differences in music listening behavior on smartphones. We extracted an extensive set of variables representing natural music preferences in terms of various audio and lyrics characteristics as well as habitual listening behaviors, which we used to predict the Big Five dimensions on domain and facet level in a machine learning framework. Afterward, we compared the independent contribution of the aspects of music listening, paying special attention to audio vs. lyrics preferences, and we inspected which single variables were most relevant in personality predictions.
Personality Prediction Based on Music Listening Behavior
To quantify the amount of personality-relevant information in digital music listening records from smartphones, we assessed out-of-sample predictions of personality based on an extensive set of music listening variables.
Overall Predictability Levels
Our results show that music listening behavior was moderately predictive of personality, with performances of rs > .20 for the significant models, which corresponds to the average reported effect size in personality psychology (Funder & Ozer, 2019). However, we obtained only three significant prediction models and small to moderate effects (rs between .10 and .21) for 11 other personality outcomes. This limited number and magnitude of effects is in line with the few and weak pooled correlations (six out of 30 coefficients ranging between .10 and .21) obtained between the Big Five domains and self-reported music style preferences in a meta-analysis by Schäfer and Mehlhorn (2017). In contrast, our out-of-sample prediction performances were lower, across domains, than those reported in a similar personality computing study by I. Anderson et al. (2021), who achieved correlations ranging from .26 to .37 between the Big Five and their predictions based on music listening behavior on Spotify. While this latter study may seem to provide a fair comparison due to the close proximity in design, the discrepancy in results may be attributed to I. Anderson et al.’s (2021) considerably larger sample size (N > 5000) or their inclusion of demographic predictor variables (i.e., age and gender), which are known to be related to personality (Soto et al., 2011). Because our personality models used only behavioral predictors, our results seem reasonable, particularly considering the bandwidth-fidelity dilemma we faced when predicting the Big Five dimensions, which aggregate the entirety of a person’s thoughts, feelings, and behaviors, from music listening as one narrow excerpt of human behavior (Cronbach & Gleser, 1957; Rauthmann, 2021). Our performances reached the lower end of the range of successful personality predictions obtained from diverse behavioral indicators of smartphone usage (r = .20 to .40; Stachl et al., 2020) or from digital behaviors explicitly communicating self-views, such as social media postings (r = .28 to .42; Schwartz et al., 2013).
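As a technical note, the performance metric reported throughout can be illustrated with a minimal, hedged sketch: predictions are generated only for held-out observations via cross-validation and then correlated with the observed trait scores. The data below are simulated, and our actual resampling scheme (with hyperparameter tuning for the elastic net and random forest models) is more elaborate.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)

# Simulated stand-ins: 330 participants, 40 music listening variables.
X = rng.normal(size=(330, 40))
y = X[:, :3] @ np.array([0.4, 0.3, 0.2]) + rng.normal(scale=1.0, size=330)

model = RandomForestRegressor(n_estimators=500, random_state=0)
y_hat = cross_val_predict(model, X, y, cv=10)  # every prediction is out-of-sample
print(f"out-of-sample Spearman r = {spearmanr(y, y_hat)[0]:.2f}")
```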
Differential Predictability Across Personality Dimensions
In our study, Openness and its facets were most predictable from music listening behavior compared to the remaining Big Five dimensions. While this pattern is consistent with past findings on musical style and audio preferences (e.g., Dobrota & Reić Ercegovac, 2015; Greenberg et al., 2016; Nave et al., 2018; Rentfrow & Gosling, 2003; Schäfer & Mehlhorn, 2017), it seemingly contradicts I. Anderson et al.’s (2021) recent finding that Openness only ranked third in predictability from natural music listening behavior on Spotify. However, their two top-ranking prediction performances for Emotional Stability and Conscientiousness strongly relied on the demographic predictor variable age, while their Openness models were predominantly based on music listening predictors, so the two sets of findings are reconcilable after all. The pattern of Openness being most strongly related to music listening corroborates the Big Five’s conceptualization that more open individuals are generally more interested in different forms of art (DeYoung, 2015).
Although its predictions did not reach significance, the dimension of Conscientiousness was second most strongly related to music listening on smartphones. In previous work, Conscientiousness was associated with individuals’ favorite song lyrics (Qiu et al., 2019) but not with preferences for musical styles or audio characteristics (e.g., Greenberg et al., 2016; Nave et al., 2018; Schäfer & Mehlhorn, 2017). This pattern was supported by our grouped and single variable importance metrics, which indicated that lyrics were of greater relevance than audio characteristics when relating music preferences to Conscientiousness.
The dimensions of Extraversion and Agreeableness were not strongly predicted by our music listening variables, which is in line with a meta-analysis on musical style preferences by Schäfer and Mehlhorn (2017) and findings from music listening behavior on Spotify (I. Anderson et al., 2021). As I. Anderson et al. (2021) noted, privately listening to music does not provide opportunities for social interaction, which, in turn, may suppress the expression of these socially defined traits (Goldberg, 1990). However, music from smartphones may also be used to promote social interactions (e.g., at parties), so associations with Extraversion and Agreeableness may become visible when considering the social listening context, for example, with whom someone is listening to music.
Emotional Stability was the least predictable personality dimension in our study, which, again, corresponds to previous studies reporting weak relationships with musical style preferences (e.g., Nave et al., 2018; Schäfer & Mehlhorn, 2017). However, our results conflict with Qiu et al. (2019), who successfully related Emotional Stability to lyrics-based music preferences when investigating only participants’ favorite songs, whose lyrics may be particularly meaningful compared to those of all played songs. While it seems reasonable that Emotional Stability is connected to music listening, which is commonly used for emotion regulation (e.g., via the emotionality of song lyrics), such relationships may vary intra-individually and depend on the emotional context of a music listening situation (i.e., the listener’s mood; e.g., Chamorro-Premuzic et al., 2010).
Importance of Different Aspects of Music Listening Behavior
Beyond quantifying its overall predictive power, we applied interpretable machine learning techniques to explore which granular aspects of natural music listening behavior were most informative for personality predictions.
Variable Groups
Overall, music preferences in terms of audio and lyrics characteristics were both predictive of listeners’ personalities (especially the Openness dimension), while habitual listening behaviors played no major role in our models. Among lyrics characteristics, the technically most sophisticated but non-interpretable word embeddings were most informative across outcomes, followed by lyrics’ topics (especially for Conscientiousness), while lyrics’ emotionality and other aspects (e.g., lyrics length) appeared less relevant. This rank order among natural language models hints at the advantages of open-vocabulary approaches when predicting personality from textual properties, which was previously reported for other text sources (e.g., G. Park et al., 2015; Schwartz et al., 2013).
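To illustrate how topic-based lyrics variables of this kind can be derived (a hedged sketch, not our exact NLP pipeline), one can fit a topic model such as latent Dirichlet allocation to song lyrics and then average each person’s topic shares over their played songs. The lyrics strings and listening records below are toy examples.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

lyrics = [
    "love heart kiss tonight forever",        # toy lyrics, one string per song
    "party dance celebrate all night long",
    "darkness shadows pain eternal night",
]
vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(lyrics)        # songs x words count matrix

lda = LatentDirichletAllocation(n_components=2, random_state=0)
song_topics = lda.fit_transform(dtm)          # songs x topics (shares sum to 1)

# Person-level aggregation: average topic shares of the songs a person played.
played = {"person_1": [0, 1], "person_2": [2]}  # hypothetical listening records
person_topics = {p: song_topics[idx].mean(axis=0) for p, idx in played.items()}
print(person_topics)
```

In such a scheme, a preference for, say, love-related lyrics corresponds to a high average share of the respective topic across a person’s listening records.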
At the trait level, preferences for audio and lyrics characteristics exhibited differential prediction performances for most personality dimensions, most notably for Conscientiousness, where lyrics outperformed audio characteristics, and for Extraversion, where audio characteristics outperformed lyrics. These findings may relate to the independent cognitive processing of melodies and lyrics (Besson et al., 1998; Bonnel et al., 2001) and indicate that both audio and lyrics should be considered when investigating music preferences in personality science.
Individual Variables
When considering individual music listening variables, the most important (interpretable) predictors were generally congruent with both past findings and the Big Five conceptualization (see DeYoung, 2015; Goldberg, 1990). For example, this was the case for the positive associations between calm melodies and Openness to feelings, which were previously reported at the domain level by other studies on audio characteristics (Dobrota & Reić Ercegovac, 2015; Fricke & Herzberg, 2017), and for the positive relations between celebration-themed lyrics and the Extraversion facet of Sociableness, which support previously found associations between the Extraversion domain and positive emotion words in lyrics (Qiu et al., 2019).
Because our study design does not permit causal inference, these associations may indicate that listeners adjusted their auditory environments to their personalities or vice versa (e.g., Bleidorn et al., 2020; Buss, 1987; Fleeson, 2001; Rauthmann, 2021; Swann, 1987). On the one hand, people with high levels of Openness to feelings may choose calm melodies to accommodate their emotional sensitivity, and those high in Sociableness may listen to celebration-themed lyrics to help them experience positive social interactions. On the other hand, repeated exposure to calm melodies may provide opportunities for emotional experiences, which, in turn, accumulate to higher levels of Openness to feelings. Similarly, frequently listening to celebration-themed lyrics may give rise to positive social interactions and, in the long run, cause people to become more extraverted. While most of our variable importance ranking seems plausible in this sense, some findings were surprising, potentially adding new facets to the theoretical trait concepts. For example, the preference for love-related lyrics is rather difficult to reconcile with high levels of Conscientiousness, a trait typically characterized by planning behavior and obedience to norms (Roberts et al., 2004). In sum, these results demonstrate that specific granular aspects of music listening behavior are distinctly informative about the different Big Five dimensions.
Constraints on Generalizability
We follow the recommendation by Simons et al. (2017) and discuss the generalizability of our empirical findings for different samples, materials, and contexts.
The present study investigated three ad hoc samples of mostly young participants with high education levels, which, given our university recruiting context, suggests that German university students were our proximal population. We are, however, confident that our findings generalize beyond this specific population because the associations we found between personality traits and music preferences generally aligned with those obtained in past studies investigating university students from other countries (e.g., Dobrota & Reić Ercegovac, 2015; Qiu et al., 2019; Rentfrow & Gosling, 2003) or more diverse samples of Facebook users (e.g., Greenberg et al., 2016; Nave et al., 2018). Nevertheless, the young mean age of participants in both our and past music research referenced above may have reduced our sample’s variance in personality traits and music listening behaviors, which both appear to change with age (e.g., Bonneville-Roussy et al., 2013; Lucas & Donnellan, 2011). Hence, we believe our results may not necessarily generalize to samples including older adults, who are currently underrepresented in music listening research. Furthermore, our and past samples were exclusively representative of WEIRD (i.e., Western, Educated, Industrialized, Rich, Democratic) populations (Henrich et al., 2010). While the Big Five structure of personality (e.g., McCrae & Terracciano, 2005) and its reflection in preferences for Western musical styles were found to generalize across countries (Greenberg et al., 2022), the musical styles actually listened to differ between countries and cultures (e.g., Bello & Garcia, 2021; M. Park et al., 2019). Thus, natural music listening behavior and its relation to personality may look different in non-Western populations. Finally, our study’s sample was limited to users of Android smartphones for technical reasons, excluding those owning iOS devices. However, as previous studies found no meaningful differences in demographic and personality characteristics between Android and iOS users, this bias should not dramatically impact the generalizability of our findings (Götz et al., 2017; Keusch et al., 2020). To summarize, we believe that our findings are representative of young adults in Western societies and recommend that follow-up studies generalize our approach to samples including older adults and other cultures.
While the subject of our study was natural music listening behavior exhibited on smartphones, we assume that our personality predictions transfer to all forms of private digital music consumption, including all listening instances where participants can freely choose what music to listen to from their own or a very large collection of songs. This should include music listening on any digital device with music storing or streaming functionalities, such as computers or smart TVs: our data collection took place at a time when music streaming was on the rise but some people still listened to locally stored music on their smartphones, so both modes of digital listening are represented in our records. In contrast, music listening on more old-fashioned analog devices such as record players may differ from that on smartphones due to the restricted availability of contemporary songs in the respective formats. This may, in turn, introduce systematic differences in music preferences between non-digital and digital devices (e.g., playing only oldies on the record player but more modern hits on digital devices) and hinder replication of our study. Furthermore, our personality patterns may not generalize to individuals’ full spectrum of music listening behavior when including instances where music is not self-chosen, such as music listening on the radio or at a café.
The most important aspect of our procedure was that we assessed music listening behavior with high ecological validity and in an unobtrusive and objective manner via smartphone sensing. To replicate our findings, future studies should also assess digital music listening records, either obtained from listening devices or directly from streaming services such as Spotify (see I. Anderson et al., 2021). This procedure, however, excludes populations currently not listening to music digitally, such as older people and people in developing countries with very low smartphone penetration. In contrast, when assessing music listening in a more intrusive way (e.g., in laboratory settings), participants may adapt their behaviors in a socially desirable manner (e.g., based on assumptions about researcher goals), so replication is not guaranteed. Similarly, we do not expect our findings to fully generalize to self-reported music listening behaviors, even though they exhibited some overlap with studies on self-reported music preferences (see Schäfer & Mehlhorn, 2017). Finally, to replicate our procedure, it is important to represent music in terms of the intrinsic properties of its melodies and lyrics instead of broad musical styles or genres. The automatic approaches for extracting these musical characteristics can be transferred to all samples of music worldwide and are, thus, widely applicable.
Beyond the considerations outlined above, we currently have no reason to believe that our results depended on other characteristics of the participants, materials, or procedure.
Limitations and Future Directions
The present study has several limitations. First, the relatively small sample size may have prevented our machine learning algorithms from detecting stable patterns that transfer from training to test sets in our cross-validated resampling scheme. Thus, our low prediction performances represent a rather conservative estimate of how well personality may be predicted from music listening behavior in larger samples. Second, careless or insufficient-effort responding to our lengthy self-report measure (300 items) may have further attenuated associations between music listening and personality traits (Curran, 2016; Ward & Meade, 2023). While most of our self-reports appeared plausible, different post-hoc response validity analyses identified only a few participants suspected of careless responding (see Text S1; Curran, 2016). However, our random forest algorithms are rather robust to outliers and should, thus, not have been impacted too dramatically by the inclusion of potentially careless responses (Breiman, 2001). Third, the music preference variables extracted from participants’ song records depended on the availability of external song-level information (i.e., Spotify’s audio metrics and Genius’ lyrics), possibly resulting in an underrepresentation of uncommon songs and restricted prediction performances for participants with an exotic taste in music. Fourth, our lyrics-based variables may not necessarily represent conscious preferences for song lyrics because we could not confirm that our German participants had fully understood the mainly English lyrics they were listening to. While most young Germans speak English fluently², personality patterns in lyrics preferences may be even more pronounced when considering only lyrics in the sample’s mother tongue. Fifth, we could not distinguish instances where participants played the music on their smartphones themselves from those where others (e.g., friends or children) initiated music listening events, which may have introduced noise to participants’ music listening metrics.
Our study demonstrates the potential of smartphone sensing for music listening research in personality psychology and beyond. As popular music listening devices, smartphones allowed us to collect digital records of participants’ day-to-day listening habits and music preferences over time (Greenberg & Rentfrow, 2017). However, our rather traditional approach of investigating average music listening metrics captures only a small proportion of the information in these longitudinal data. For example, if a person listens to either very calm or very energetic melodies, their average score cannot accurately represent their music listening behavior (see the sketch after this paragraph). Thus, to realize the full potential of digital music listening records, future studies should analyze variations in single listening events over time instead of aggregating them. When investigating listening events nested within persons, personality traits may exhibit relations to intra-individual variations in music listening. For example, the trait Openness, which was previously associated with more diverse average music preferences (Bansal et al., 2021), may be even more predictable from variations within individuals’ music listening events than from aggregated scores. Beyond stable personality traits, future research may also include momentary aspects such as mood states or situation perceptions to explain intra-individual variance in music listening behavior, which remains largely unexplored to date. Smartphone-based ambulatory assessment has laid the foundation for this kind of research because it enables the simultaneous collection of objective music listening and other contextual data (e.g., places where music listening occurred; see Schoedel et al., 2022) via smartphone sensing and self-reported subjective experiences in situ via the experience sampling methodology (ESM; van Berkel et al., 2017). The combination of passive smartphone sensing with active experience sampling is still novel but provides great opportunities for personality research in general (Schoedel et al., 2022). In sum, smartphones open up ample possibilities for investigating the interplay of various enduring and fluctuating variables, which will broaden our understanding of music listening behavior.
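As a minimal illustration of this aggregation issue (hypothetical event-level records; the column names are assumptions for the example), two listeners can share the same mean audio characteristic while differing strongly in within-person variability, which only event-level summaries reveal:

```python
import pandas as pd

# Hypothetical event-level music listening records for two listeners.
events = pd.DataFrame({
    "person": ["a"] * 4 + ["b"] * 4,
    "energy": [0.1, 0.9, 0.1, 0.9,   # person a: alternates calm / energetic
               0.5, 0.5, 0.5, 0.5],  # person b: consistently moderate
})

summary = events.groupby("person")["energy"].agg(["mean", "std"])
print(summary)  # identical means (0.5); only the SD separates a from b
```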
Conclusion
The present study demonstrates that smartphone sensing is a promising method to investigate natural music listening behavior and its association with personality. Moving beyond self-report assessments of broad musical style preferences, we introduced a personality computing framework for predicting the Big Five dimensions from preferences for intrinsic musical properties and habitual listening behaviors extracted from digital music listening records. Machine learning models revealed that only the personality dimension of Openness was successfully predicted from our music listening variables, corroborating past findings that, of the Big Five, Openness is most strongly related to music listening. In contrast, Conscientiousness and several personality facets showed non-significant but small to moderate prediction effects in our models. Furthermore, our study compared the contributions of audio and lyrics characteristics for relating music preferences to personality, finding that both are distinctly predictive and that the associations between specific music preference variables and certain personality traits were generally in line with the Big Five’s theoretical conceptualization. In sum, our findings provide new insights into personality patterns in natural music listening behavior, which may be extended in numerous ways using the methodological framework proposed here.
Author Contributions
Contributed to conception and design: LS
Contributed to acquisition of data: CS, MB, RS
Contributed to analysis and interpretation of data: LS, GK
Drafted and/or revised the article: LS, RS
Approved the submitted version for publication: LS, RS
Funding Information
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Competing Interests
The authors declare no competing interests, financial or otherwise.
Data Accessibility Statement
As indicated in the main text, we provide the dataset of aggregated variables used for personality modeling as well as the reproducible code for preprocessing, variable extraction, and predictive modeling in the project’s OSF repository, accessible at the following URL: https://osf.io/x7dar/. However, the privacy-sensitive nature of the smartphone usage data prevents us from openly sharing the raw logging data. This study’s design and analyses are purely exploratory and were not pre-registered. However, a preliminary (and also exploratory) analysis based on a small fraction of our music listening variables was previously part of the author’s master’s thesis, preregistered at https://osf.io/as3ze. While this preregistration does not directly pertain to the current study, it may be considered its groundwork, so we communicate all deviations in our Disclosure of Prior Data Uses available in our project repository.
Acknowledgements
We thank the entire PhoneStudy team at LMU Munich for their continuous and diligent work on the PhoneStudy app, making our research possible. In particular, we thank Tobias Schuwerk for his role in providing parts of the data. We also thank Schuhfried GmbH for providing the Big Five Structure Inventory as a digital version. We further thank Florian Pargent and Felix Schönbrodt for giving insightful modeling advice. Special thanks go to Monika Wintergerst for advising on the language modeling in our research project.
Footnotes
1. If predictors contain no relevant information for predicting an outcome, the elastic net shrinks their coefficients to zero and returns intercept-only predictions (i.e., it constantly predicts the training data mean), which are mathematically equivalent to our baseline predictions. The intercept-only predictions produce NAs for the Spearman correlation metric (due to their invariance). Thus, outcomes that produced many intercept-only predictions exhibited low variance in the Spearman correlation metric across iterations in Figure 2.
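This point can be verified directly: a constant prediction vector has zero variance, so its Spearman correlation with any outcome is undefined. A minimal demonstration using SciPy (toy values):

```python
import numpy as np
from scipy.stats import spearmanr

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_const = np.full_like(y_true, y_true.mean())  # intercept-only prediction
print(spearmanr(y_true, y_const)[0])           # nan (SciPy also issues a warning)
```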
2. Due to compulsory English language schooling from Kindergarten onwards.