**The neural activation patterns provoked in** response to music listening can reveal whether a subject did or did not receive music training. In the current exploratory study, we have approached this two-group (musicians and nonmusicians) classification problem through a computational framework composed of the following steps: acoustic feature extraction; acoustic feature selection; trigger selection; EEG signal processing; and multivariate statistical analysis. We are particularly interested in analyzing the brain data on a global level, considering its activity registered in electroencephalogram (EEG) signals at a given time instant. Our experimental results, obtained with 26 volunteers (13 musicians and 13 nonmusicians) who listened to the classical piece Hungarian Dance No. 5 by Johannes Brahms, have shown that it is possible to linearly differentiate musicians and nonmusicians with classification accuracies that range from 69.2% (test set) to 93.8% (training set), despite the limited sample sizes available. Additionally, given the whole brain vector navigation method described and implemented here, our results suggest that it is possible to highlight the most expressive and discriminant changes in the participants' brain activity patterns depending on the acoustic feature extracted from the audio.

**Music requires a high neural demand** from those who play it (Münte, Altenmüller, & Jäncke, 2002; Peretz & Zatorre, 2005) and is an important tool for understanding the organization of the human brain (Münte et al., 2002; Peretz & Zatorre, 2005; Schlaug, 2015). In the last two decades, the processing of music by our brain has attracted the attention of researchers worldwide, and a number of scientific works have identified neural activation differences between musicians and nonmusicians in distinct experiments using wearable and non-wearable technologies, mainly EEG (electroencephalography) and fMRI (functional magnetic resonance imaging), respectively (Liang, Hsieh, Chen, & Lin, 2011; Mikutta, Maissen, Altorfer, Strik, & Koenig, 2014; Münte et al., 2002; Peretz & Zatorre, 2005; Saari, Burunat, Brattico, & Toiviainen, 2018; Schlaug, 2015; Tervaniemi, Castaneda, Knoll, & Uther, 2006; Virtala, Huotilainen, Partanen, & Tervaniemi, 2014; Vuust, Brattico, Seppänen, Näätänen, & Tervaniemi, 2012).

In experiments on music listening and EEG signal analysis, past studies focused on understanding the neural processing of artificial stimuli that rely on controlled auditory paradigms. The most recent works on this issue (Abrams et al., 2013; Alluri et al., 2012, 2013; Markovic, Kühnis, & Jäncke, 2017; Poikonen, Alluri, et al., 2016; Poikonen, Toiviainen, & Tervaniemi, 2016; Saari et al., 2018) have, however, analyzed naturalistic music pieces rather than artificial ones, with the goal of describing the association between the dynamic changes in the audio features and the time courses of the neural activations recorded in volunteers (or subjects). Interestingly, these recent works have been able to achieve almost the same results found previously with artificial stimuli, indicating the recruitment of new brain areas related to music processing.

However, all these state-of-the-art findings have been obtained on a local level, based on the concept of mass-univariate analyses (Markovic et al., 2017; Poikonen, Alluri, et al., 2016; Poikonen, Toiviainen, & Tervaniemi, 2016; Rigoulot, Pell, & Armony, 2015; Virtala et al., 2014; Vuust et al., 2012), which reveal statistical aspects of specific and unique areas of the brain that supposedly do not depend on changes in other cognitive areas. Although this approach is mathematically sound, it inevitably ignores possible global and interregional brain dependencies in music processing, providing limited understanding of the behavior of our brain during such a cognitive task. Since EEG brain data inherently include simultaneous measurements on a given number of electrodes, it seems appropriate, in order to understand the complexity of their information, to disclose the relationships between all these electrodes in a multivariate way.

In the current exploratory study, we propose to analyze on a global level the EEG brain data collected during a music listening task, considering the whole brain activity registered in all the EEG electrodes simultaneously. More specifically, we approach this problem through a novel multivariate statistical perspective, using this global EEG brain information to predict musicianship as well. Therefore, rather than using a mass-univariate model to find statistical differences between distinct sample groups, we describe and implement here a two-group pattern recognition framework that linearly separates the samples into musicians and nonmusicians, categorized beforehand according to whether or not they have undergone music training, and we evaluate this separation by means of classification accuracy.

To the best of our knowledge, we present here the first results on whole brain EEG analysis of musicianship in the context of naturalistic music listening, elaborating on Ribeiro and Thomaz (2018) by adding new experimental evidence in EEG feature extraction and classification, using several acoustic features, and new multivariate and comparative statistical analyses. Our experimental results, based on high-dimensional encoding of variance and discriminant information of EEG data, indicate a feasible and plausible linear separation between the musician and nonmusician sample groups, despite the limited sample sizes available.

## Materials and Method

### PARTICIPANTS

Two groups of 13 subjects took part in our experiment. The musicians (7 males and 6 females) were all amateurs, aged between 18 and 45 years (mean 26.8 years; all right-handed). They had started playing between 4 and 17 years of age (mean 17.1 years) and were currently playing 4 hours per week on average, in distinct musical styles (classical, pop, and rock). All of them had received more than two years of formal training in music, but none had a professional degree in music performance. The nonmusicians (10 males and 3 females) were aged between 25 and 45 years (mean 31.2 years; 9 right-handed). Two of them had received less than one year of formal training in music (neither of them still plays) and the others had no formal music training. All participants gave written informed consent to participate in the study.

### STIMULUS

As stimulus, the classical piece Hungarian Dance No. 5 by Johannes Brahms was used, presented via in-ear earphones. The composition was performed by the Fulda Symphonic Orchestra, conducted by Simon Schindler at the Fürstensaal des Stadtschlosses.^{1} The ending of the audio, corresponding to the applause, was replaced with silence using the free, open-source audio software Audacity (version 2.2.2), resulting in an audio signal that was 3.2 minutes long.

All subjects were instructed to listen to the music, remaining as still as possible while their EEG signals were recorded. The musical piece was selected, as a benchmark classical piece used in other works (Alluri et al., 2012; Poikonen, Alluri, et al., 2016), due to its wide range of variations during the performance in several musical features such as dynamics, timbre, and rhythm, and because it has an appropriate duration for the experiment.

### METHODOLOGY

Our computational framework consists of the following five steps: (I) acoustic feature extraction; (II) acoustic feature selection; (III) trigger selection; (IV) EEG signal processing; and (V) multivariate statistical analysis. The first three steps refer to the audio signal and the last two to the EEG signal. This framework, illustrated in Figure 1, is based on previous multidisciplinary works of EEG music processing (Poikonen, Alluri, et al., 2016; Poikonen, Toiviainen, & Tervaniemi, 2016) and high-dimensional data analysis (Davatzikos, 2004; Gregori, Sanches, & Thomaz, 2017; Sato et al., 2008; Thomaz, Duran, Busatto, Gillies, & Rueckert, 2007; Xavier et al., 2015).

#### Acoustic feature extraction

In this first step, the low-level acoustic features that describe the audio signal are extracted using MIRtoolbox (version 1.6.1) (Lartillot, 2014). To perform such extraction, the signal is decomposed into 50 ms windows with 50% overlap, in order to study the evolution of each of these features over time. These characteristics correspond to aspects of the audio that can be identified in terms of human perception and, even though they may not accurately describe what is acoustically perceived, they generate significant EEG neural responses (Poikonen, Alluri, et al., 2016; Poikonen, Toiviainen, & Tervaniemi, 2016).

We have selected the following standard and well-known acoustic features (Alluri et al., 2012; Lartillot, 2014; Lerch, 2012; Poikonen, Alluri, et al., 2016): (1) *root mean square* (RMS): measure of the energy in the signal, computed as the square root of the average of the squared amplitude; (2) *zero crossing rate* (ZCR): measure of the number of times the signal crosses the *x*-axis; (3) *spectral rolloff*: frequency below which 85% of the total energy of the signal is contained; (4) *spectral roughness*: estimation of the sensory dissonance; (5) *brightness*: measure of the amount of energy above 1500 Hz; (6) *spectral entropy*: measure of the relative Shannon entropy of the signal, which indicates whether the spectrum contains predominant peaks or not; (7) *spectral flatness*: measure of the uniformity of the spectrum, defined as the ratio between the geometric mean and the arithmetic mean, also known as the Wiener entropy; (8) *spectral skewness*: the third central moment of the spectrum distribution, a measure of the asymmetry of the distribution; (9) *spectral kurtosis*: the fourth central moment of the spectrum distribution, which indicates the flatness of the spectrum and whose sudden changes can indicate transients in the audio; (10) *spectral centroid*: the first moment (mean) of the spectrum distribution, that is, its geometric center; (11) *spectral spread*: the second central moment of the spectrum distribution; and (12) *spectral flux*: measure of the temporal changes in the spectrum between successive frames. A detailed explanation of these 12 features can be found in the user manual of the MIRtoolbox (Lartillot, 2014).
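To make the windowing concrete, the following Python sketch computes two of the simplest features, RMS and ZCR, over 50 ms windows with 50% overlap. It is only an illustration under stated assumptions: the function names, the synthetic 440 Hz tone, the 8 kHz sampling rate, and the choice of expressing ZCR per second are ours, and MIRtoolbox may use different conventions.

```python
import math

def frame_signal(x, sr, win_s=0.050, overlap=0.5):
    """Split a sequence into fixed-length windows with fractional overlap."""
    win = int(round(win_s * sr))
    hop = max(1, int(round(win * (1.0 - overlap))))
    return [x[i:i + win] for i in range(0, len(x) - win + 1, hop)]

def rms(frame):
    """Square root of the average of the squared amplitude."""
    return math.sqrt(sum(v * v for v in frame) / len(frame))

def zero_crossing_rate(frame, sr):
    """Number of sign changes in the frame, scaled to crossings per second
    (the per-second scaling is an assumption of this sketch)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings * sr / len(frame)

# Hypothetical test signal: one second of a 440 Hz sine sampled at 8 kHz.
sr = 8000
tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]

frames = frame_signal(tone, sr)
rms_series = [rms(f) for f in frames]                    # one value per window
zcr_series = [zero_crossing_rate(f, sr) for f in frames]
```

Each feature thus becomes a time series with one value every 25 ms, which is the kind of series on which the later feature selection and trigger selection steps operate.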

#### Acoustic feature selection

In this second step we assess whether all the *s* = 12 standard features extracted from the audio are statistically non-redundant, through a cluster analysis, disclosing the correlation among the acoustic features to adequately represent the data.

We have used factor analysis (FA) (Johnson & Wichern, 2007), a well-known multivariate statistical technique, to describe the association between the values extracted from the windows of each acoustic feature in a non-supervised way. Our motivation here is to reduce the redundancy of the data extracted from the audio signal, selecting the *R* most significant factors (higher loadings), where *R* ≤ *s*. We have chosen here principal component analysis (PCA) (Johnson & Wichern, 2007) to estimate the parameters of our factor model because its spectral decomposition solution simplifies the issue of how many factors to retain.

Ideally, a pattern of factor loadings where each cluster of acoustic features is highly represented by a single factor and has low coefficients on the remaining ones is desirable. Therefore, those **F** = [**f**_{1}, **f**_{2}, … , **f**_{R}] factors can replace the initial s variables on R rotated common factor loadings, where the association between the acoustic features would be most significant in terms of variance (varimax rotation), choosing conventionally the factor loadings with corresponding eigenvalues greater than 1 to determine the adequate number of factors (Johnson & Wichern, 2007).
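As a rough illustration of the principal component solution of the factor model, the sketch below derives unrotated loadings from the correlation matrix and keeps the factors whose eigenvalues exceed 1. The function name and the synthetic six-feature data are hypothetical, and the varimax rotation used in the study is deliberately omitted, so the loadings shown here are not expected to display the clean cluster structure that rotation provides.

```python
import numpy as np

def pca_factor_loadings(features):
    """features: (windows x s) array of acoustic feature values.

    Returns the unrotated loadings (s x R) of the factors whose eigenvalues
    exceed 1, together with all eigenvalues in decreasing order."""
    corr = np.corrcoef(features, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)        # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                           # eigenvalue-greater-than-1 rule
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return loadings, eigvals

# Synthetic stand-in: six features, three driven by each of two latent sources.
rng = np.random.default_rng(0)
latent = rng.standard_normal((2000, 2))
noise = 0.3 * rng.standard_normal((2000, 6))
features = np.repeat(latent, 3, axis=1) + noise

loadings, eigvals = pca_factor_loadings(features)
n_factors = loadings.shape[1]                      # number of retained factors R
```

Because the eigenvalues of a correlation matrix sum to the number of variables, an eigenvalue above 1 marks a factor that explains more variance than any single standardized feature, which is the rationale behind this retention rule.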

#### Trigger selection

The changes in the acoustic features used in this work are known as triggers (Poikonen, Alluri, et al., 2016), which are able to elicit sensory components similar to those found with artificial stimuli. More precisely, triggers are instants in the time series generated by the extracted features where a high contrast occurs, as detailed below.

We have adopted the Poikonen, Alluri, et al. (2016) method to identify the triggers, where the upper and lower thresholds V_{p+} and V_{p−} are determined from the mean value of the acoustic feature as a given percentage q above or below it, respectively (q = ±20%, here). In other words, a trigger occurs when the signal remains below V_{p−} for a minimum interval of time, called the *preceding low-feature phase* (PLFP) and defined as 500 ms for all acoustic features, followed by an ascending phase where the signal reaches V_{p+} (Poikonen, Alluri, et al., 2016). The intensity of the acoustic contrast depends on the parameters chosen for trigger selection (length of the PLFP and the upper and lower thresholds V_{p+} and V_{p−}). These parameters can be changed accordingly, based on a trade-off between the number of triggers and how high each acoustic contrast might be defined; that is, the higher the acoustic contrast, the lower the number of triggers identified.
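The trigger rule can be sketched as a small state machine. This is a minimal illustration and not the implementation of Poikonen, Alluri, et al. (2016): the function name is hypothetical, the feature is assumed to have a positive mean, the trigger is placed at the first frame after the PLFP, and a candidate is discarded if the signal falls below the lower threshold again before reaching the upper one.

```python
def find_triggers(feature, frame_period_s=0.025, q=0.20, min_plfp_s=0.5):
    """Return frame indices where a long low-feature phase is followed by a
    rise that reaches the upper threshold.

    feature: sequence of feature values, one per analysis frame
    frame_period_s: time between frames (25 ms for 50 ms windows, 50% overlap)
    """
    mean = sum(feature) / len(feature)
    upper = mean * (1 + q)          # V_{p+}, assuming a positive mean
    lower = mean * (1 - q)          # V_{p-}
    min_frames = round(min_plfp_s / frame_period_s)

    triggers = []
    low_run = 0                     # consecutive frames below the lower threshold
    candidate = None                # frame where a sufficiently long PLFP ended
    for i, value in enumerate(feature):
        if value < lower:
            low_run += 1
            candidate = None        # fell back down before reaching the upper threshold
        else:
            if low_run >= min_frames:
                candidate = i       # first frame of the ascending phase
            low_run = 0
            if candidate is not None and value >= upper:
                triggers.append(candidate)
                candidate = None
    return triggers

# Toy series: 25 low frames (625 ms) followed by a sharp rise.
series = [1.0] * 10 + [0.1] * 25 + [0.5, 1.0, 3.0] + [1.0] * 10
# Toy series whose low phase is too short (5 frames = 125 ms) to qualify.
short_low = [1.0] * 10 + [0.1] * 5 + [1.0] * 10

triggers = find_triggers(series)
no_triggers = find_triggers(short_low)
```

Raising `q` or `min_plfp_s` demands a stronger contrast and therefore yields fewer triggers, which is the trade-off described above.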

#### EEG signal processing

We have used the OpenBCI EEG device to acquire the brain electrical signals (OpenBCI, 2018). This device has a sampling rate of 127 Hz and is composed of 16 dry electrodes, positioned according to the international 10–20 system, including the additional two electrodes placed on each earlobe as reference. All EEG signals have been preprocessed using a band-pass Butterworth filter of 1–30 Hz, analogously to other works (Poikonen, Alluri, et al., 2016; Virtala et al., 2014). The continuous EEG data were separated into epochs according to the triggers and averaged for each electrode. The epochs started 100 ms before the trigger and ended 300 ms after the trigger. The baseline has been defined according to the 100 ms time period before the trigger (Poikonen, Alluri, et al., 2016; Virtala et al., 2014; Vuust et al., 2012). All signals have been inspected visually, and EEG channels detected with noise have been removed from the analysis, as well as the epochs with amplitudes above 100 µV (in absolute values). The standard and well-known EEG Matlab toolbox (EEGLab 13.5.4b) was used to process the EEG data in Matlab R2015a.
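The epoching, baseline correction, amplitude rejection, and averaging described above can be sketched as follows. The study used EEGLab in Matlab; this NumPy version is only an illustration with a hypothetical function name, synthetic data, and one assumption about ordering, namely that the amplitude criterion is checked after baseline correction.

```python
import numpy as np

def average_epochs(eeg, trigger_samples, sr, pre_s=0.100, post_s=0.300,
                   reject_uv=100.0):
    """eeg: (channels x samples) array in microvolts, already band-pass filtered.

    Returns the trigger-averaged epoch (channels x epoch_len) and the number
    of epochs that survived rejection."""
    pre = int(round(pre_s * sr))
    post = int(round(post_s * sr))
    kept = []
    for t in trigger_samples:
        if t - pre < 0 or t + post > eeg.shape[1]:
            continue                                  # epoch falls off the recording
        epoch = eeg[:, t - pre:t + post].astype(float)
        epoch -= epoch[:, :pre].mean(axis=1, keepdims=True)   # baseline correction
        if np.abs(epoch).max() > reject_uv:
            continue                                  # amplitude-based rejection
        kept.append(epoch)
    if not kept:
        raise ValueError("no epochs survived rejection")
    return np.mean(kept, axis=0), len(kept)

# Synthetic 16-channel recording with one simulated artifact.
sr = 127
rng = np.random.default_rng(1)
eeg = 5.0 * rng.standard_normal((16, 60 * sr))
eeg[:, 3000:3010] += 500.0
trigger_samples = [1000, 3000, 5000]

erp, n_kept = average_epochs(eeg, trigger_samples, sr)

# Electrode values 100 ms after the trigger, one number per channel.
pre = int(round(0.100 * sr))
values_at_100ms = erp[:, pre + int(round(0.100 * sr))]
```

The last line extracts, for one subject and one acoustic feature, the vector of electrode values at 100 ms post trigger, which corresponds to one row of the data matrix used in the multivariate analysis.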

#### Multivariate statistical analysis

In this fifth and last step of our computational framework, we first describe the data matrix $\mathbf{X}_r$ composed of the EEG values at instant $t = 100$ ms post stimulus (predefined timestamps), defined by the average along triggers of each of the $r$ selected acoustic features separately, where $r = 1, 2, \ldots, R$, in a way that the corresponding sampled input signal can be treated as a high-dimensional point in a multivariate space, as follows:

$$
\mathbf{X}_r =
\begin{bmatrix}
x_{1,1} & x_{1,2} & \cdots & x_{1,n} \\
x_{2,1} & x_{2,2} & \cdots & x_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
x_{N,1} & x_{N,2} & \cdots & x_{N,n}
\end{bmatrix},
\tag{1}
$$

where $N$ is the total number of volunteers (or subjects) and $n$ the total number of electrodes ($n = 16$, here).

According to this data representation, we are assuming that our EEG multivariate statistical analyses might include the electrical potentials at the predefined timestamps to understand the complexity of the brain activity not only in terms of the *N* samples, but also in terms of all the *n* electrodes simultaneously.

Principal component analysis (PCA) and linear discriminant analysis (LDA) (Fukunaga, 1990; Johnson & Wichern, 2007) were explored here as two alternative multivariate statistical methods to understand how the information is changing in the original space of the EEG brain data, looking not only for the most expressive (higher variance) changes, but also for the most discriminant (higher separability) ones, according to the neural responses reflected by the acoustic feature extracted from the audio used as stimulus.

#### Principal component analysis

We have used PCA to highlight the most expressive changes in terms of the total variance information of the electrical potentials.

PCA calculates the spectral decomposition of the correlation matrix of the $N \times n$ matrix $\mathbf{X}_r$, described in Equation (1). We have composed the PCA transformation matrix by selecting all the $M$ eigenvectors with non-zero eigenvalues, where $M \le n$. Data projected onto these eigenvectors $[\mathbf{p}_{1,r}, \mathbf{p}_{2,r}, \ldots, \mathbf{p}_{M,r}]$, ranked in decreasing order of their corresponding eigenvalues, that is, $\lambda_{1,r}, \lambda_{2,r}, \ldots, \lambda_{M,r}$, follow the directions of higher variance of $\mathbf{X}_r$.

Each $\mathbf{p}_{m,r}$ vector, where $m = 1, 2, \ldots, M$, can be used to form a spatial map of the brain regions that most vary in the data, moving from one side of the principal component axis (or dimension) to the other (Cootes, Edwards, & Taylor, 1998; Davatzikos, 2004). Thus, it is possible to navigate along such dimensions to capture and understand the most expressive changes of the data matrix $\mathbf{X}_r$. This vector navigation method (Cootes et al., 1998; Gregori et al., 2017; Sato et al., 2008; Xavier et al., 2015) can be mathematically described as

$$
\mathbf{y}_{m,r} = \bar{\mathbf{x}}_r + k \sqrt{\lambda_{m,r}}\, \mathbf{p}_{m,r},
\tag{2}
$$

where $k \in \{-3, -2, -1, 0, 1, 2, 3\}$, $m$ is the index number of the corresponding principal component to navigate, and $\bar{\mathbf{x}}_r$ is the $n$-dimensional global mean vector of $\mathbf{X}_r$ for each $r$ selected acoustic feature.
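The navigation along a principal component can be illustrated with a few lines of NumPy. This is a sketch under stated assumptions rather than the study's implementation: the function name and the random stand-in data are hypothetical, and for simplicity the decomposition is applied to the covariance matrix, so that the navigated points stay in the original units, instead of the correlation matrix mentioned in the text.

```python
import numpy as np

def pca_navigation(X, m=0, ks=(-3, -2, -1, 0, 1, 2, 3)):
    """X: (N x n) data matrix, one row per subject, one column per electrode.

    Returns one n-dimensional point per k, located at
    mean + k * sqrt(lambda_m) * p_m along the m-th principal component."""
    mean = X.mean(axis=0)
    centered = X - mean
    cov = centered.T @ centered / (X.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)           # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    lam = eigvals[order][m]
    p = eigvecs[:, order][:, m]
    return np.array([mean + k * np.sqrt(lam) * p for k in ks])

# Random stand-in for the 26 x 16 matrix of electrode values.
rng = np.random.default_rng(0)
X = rng.standard_normal((26, 16))

maps = pca_navigation(X, m=0)     # seven 16-dimensional points, k = -3 ... 3
```

Each row of `maps` is a vector of 16 electrode values that could be rendered as a topographic map; the middle row (k = 0) is the global mean, and the other rows move away from it in steps of one standard deviation along the chosen component.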

#### Linear discriminant analysis

We have used LDA to identify the most discriminant dimension for separating the two sample groups (or classes) of interest, that is, musicians and nonmusicians, by maximizing their between-class separability while minimizing their within-class variability.

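Let the between-class and within-class scatter matrices $\mathbf{S}_b$ and $\mathbf{S}_w$ be defined, respectively, as (Fukunaga, 1990)

$$
\mathbf{S}_b = \sum_{i=1}^{g} N_i \left(\bar{\mathbf{x}}_i - \bar{\mathbf{x}}_r\right)\left(\bar{\mathbf{x}}_i - \bar{\mathbf{x}}_r\right)^{T},
\tag{3}
$$

$$
\mathbf{S}_w = \sum_{i=1}^{g} \sum_{j=1}^{N_i} \left(\mathbf{x}_{i,j} - \bar{\mathbf{x}}_i\right)\left(\mathbf{x}_{i,j} - \bar{\mathbf{x}}_i\right)^{T},
\tag{4}
$$

where $\mathbf{x}_{i,j}$ is the $n$-dimensional sample $j$ from class $i$, $N_i$ is the number of training samples of class $i$ ($N_1 = 13$ and $N_2 = 13$, here), $g$ represents the total number of classes ($g = 2$, here), $\bar{\mathbf{x}}_i$ is the sample group mean vector given by the corresponding $N_i$ samples only, and $\bar{\mathbf{x}}_r$ is the global mean vector of $\mathbf{X}_r$.

The main objective of LDA here is to find a projection vector $\mathbf{w}_r$ that maximizes the ratio of the determinant of $\mathbf{S}_b$ to the determinant of $\mathbf{S}_w$ (Fisher's criterion), formulated by:

$$
\mathbf{w}_r = \arg\max_{\mathbf{w}} \frac{\left|\mathbf{w}^{T} \mathbf{S}_b \mathbf{w}\right|}{\left|\mathbf{w}^{T} \mathbf{S}_w \mathbf{w}\right|}.
\tag{5}
$$

Fisher's criterion is maximized when the projection vector $\mathbf{w}_r$ is composed of the leading eigenvector of $\mathbf{S}_w^{-1}\mathbf{S}_b$ with nonzero corresponding eigenvalue (Fukunaga, 1990; Johnson & Wichern, 2007).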

Analogous to PCA, this vector $\mathbf{w}_r$ can be used to form a spatial map of the most discriminant brain regions, moving from one side of the separating dimension to the other (Davatzikos, 2004; Sato et al., 2008). Thus, it is possible to navigate along such dimension to capture and understand the most discriminant changes of the data matrix $\mathbf{X}_r$, assuming its dispersion follows a Gaussian distribution (Gregori et al., 2017; Sato et al., 2008; Xavier et al., 2015). This vector navigation method can be mathematically described as

$$
\mathbf{y}_{i,j,r} = \bar{\mathbf{x}}_r + \left(\bar{x}_i + j\,\sigma_i\right) \mathbf{w}_r,
\tag{6}
$$

where $j \in \{-1, 0, 1\}$, $i$ is again the corresponding sample group, $\bar{\mathbf{x}}_r$ the $n$-dimensional global mean vector of $\mathbf{X}_r$, and $\sigma_i$ and $\bar{x}_i$ are, respectively, the standard deviation and mean of each sample group on the LDA space for each $r$ selected acoustic feature.
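For two classes, the leading eigenvector of the product of the inverse within-class scatter and the between-class scatter is proportional to the inverse within-class scatter applied to the difference of the group means, which gives a compact way to sketch the discriminant direction and its navigation. The code below is an illustration only: the function names and the synthetic groups are hypothetical, a pseudo-inverse is used as a guard against an ill-conditioned scatter matrix, and the navigation adds the projected group mean plus j standard deviations along the unit-norm direction, which is our reading of the method.

```python
import numpy as np

def fisher_direction(X1, X2):
    """Unit-norm Fisher discriminant direction for two groups (rows = samples)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)   # within-class scatter
    w = np.linalg.pinv(Sw) @ (m1 - m2)
    return w / np.linalg.norm(w)

def lda_navigation(X1, X2, js=(-1, 0, 1)):
    """Return the direction w and a dict {(group, j): n-dimensional point}."""
    X = np.vstack([X1, X2])
    mean = X.mean(axis=0)
    w = fisher_direction(X1, X2)
    maps = {}
    for i, Xi in enumerate((X1, X2), start=1):
        z = (Xi - mean) @ w                  # group i projected on the LDA axis
        for j in js:
            maps[(i, j)] = mean + (z.mean() + j * z.std(ddof=1)) * w
    return w, maps

# Synthetic stand-in: two groups of 13 subjects with 16 electrodes each.
rng = np.random.default_rng(0)
group1 = rng.standard_normal((13, 16)) + 1.0
group2 = rng.standard_normal((13, 16))

w, maps = lda_navigation(group1, group2)
```

Here `maps[(1, 0)]` and `maps[(2, 0)]` are the points at the projected means of the two groups, and the entries with j = ±1 lie one group standard deviation to either side along the discriminant axis.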

## Results

### AUDIO ANALYSIS

First, the audio was analyzed by means of the aforementioned factor analysis. The factor loading values obtained by the FA are shown in Figure 2 and Table 1. From these, it is possible to observe that the acoustic features extracted from the audio clearly form three clusters, which can indicate that the acoustic information expressed by each cluster may be similar among the features that presented the highest loadings on each factor, as follows: RMS (cluster 1), spectral kurtosis (cluster 2), and spectral rolloff (cluster 3). Note that for factor 2 the feature spectral skewness had a higher loading than spectral kurtosis, but since only three triggers were found for this feature, we chose to use the spectral kurtosis instead, given the trade-off described previously between the number of triggers and how high each acoustic contrast may be defined.

Table 1. Factor loadings of the 12 acoustic features on the three retained factors.

| Feature | F1 | F2 | F3 |
|---|---|---|---|
| RMS | 0.6291 | 0.1337 | 0.7203 |
| ZCR | 0.7851 | −0.3235 | −0.3240 |
| S. Rolloff | 0.9584 | −0.1939 | −0.1629 |
| S. Roughness | 0.3883 | −0.0242 | 0.7185 |
| Brightness | 0.9471 | −0.1643 | −0.1790 |
| S. Entropy | 0.8779 | 0.3712 | −0.1710 |
| S. Flatness | 0.9329 | 0.0290 | −0.0764 |
| S. Skewness | 0.0995 | 0.9847 | −0.1210 |
| S. Kurtosis | −0.0761 | 0.9617 | −0.1139 |
| S. Centroid | 0.9563 | −0.1484 | −0.1975 |
| S. Spread | 0.9347 | 0.2493 | −0.1738 |
| S. Flux | 0.5880 | 0.0931 | 0.7174 |


Using these selected acoustic features, we have found 9 triggers for the RMS, 7 for the spectral rolloff, and 8 for the spectral kurtosis throughout the music used as stimulus. As can be seen in Figure 3, the spectrograms are similar within each cluster and there are triggers that occur at the same time within the same cluster, although some of them are different. These differences can be related to the methodological parameters used to select the triggers, showing that some of these features might be complementary to each other, especially within the same cluster.

### EEG ANALYSIS

The topographic maps were generated by the vector navigation method on the four most expressive dimensions of PCA, according to Equation (2), and the discriminant dimension of LDA, according to Equation (6), for each acoustic feature selected, using the entire dataset. Such multivariate statistical analyses disclose visually the electrical potential changes and their data distributions along the corresponding dimensions.

Given the limited sample sizes available, we have adopted the leave-one-out and resubstitution methods (Fukunaga, 1990) to estimate, respectively, the lower (test set) and upper (training set) bounds of the classification accuracy on the most discriminant dimension, using the Euclidean sample mean distance classifier to decide whether the projected data are more similar to the musician or nonmusician group.
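The two accuracy estimates can be sketched as follows. This is an illustration with hypothetical function names and synthetic data, not the code used in the study: the discriminant direction is the two-class Fisher solution, and a projected sample is assigned to the group whose projected mean is closest.

```python
import numpy as np

def fit(X, y):
    """Fit a two-class Fisher direction and the projected class means."""
    X1, X0 = X[y == 1], X[y == 0]
    m1, m0 = X1.mean(axis=0), X0.mean(axis=0)
    Sw = (X1 - m1).T @ (X1 - m1) + (X0 - m0).T @ (X0 - m0)
    w = np.linalg.pinv(Sw) @ (m1 - m0)
    return w, float(m1 @ w), float(m0 @ w)

def predict(model, X):
    """Nearest projected class mean (Euclidean distance on the LDA axis)."""
    w, c1, c0 = model
    z = X @ w
    return (np.abs(z - c1) < np.abs(z - c0)).astype(int)

def resubstitution_accuracy(X, y):
    """Train and test on the same samples (optimistic, upper bound)."""
    return float(np.mean(predict(fit(X, y), X) == y))

def leave_one_out_accuracy(X, y):
    """Hold out each sample in turn (pessimistic, lower bound)."""
    hits = 0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        model = fit(X[mask], y[mask])
        hits += int(predict(model, X[i:i + 1])[0] == y[i])
    return hits / len(y)

# Synthetic stand-in: 13 + 13 subjects, 16 electrodes, shifted group means.
rng = np.random.default_rng(0)
y = np.array([1] * 13 + [0] * 13)
X = rng.standard_normal((26, 16))
X[y == 1] += 0.8

resub = resubstitution_accuracy(X, y)
loo = leave_one_out_accuracy(X, y)
```

With 16 dimensions and only 26 samples, the resubstitution estimate tends to be markedly higher than the leave-one-out estimate, which is why the two are reported together as bounds.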

The topographic brain maps and the projection of the data generated by PCA from the acoustic feature spectral rolloff are shown in Figure 4, where we can see the changes of the brain responses along each principal component (PC), disclosing major electrical potential differences in several parts of the brain, especially at frontal areas. However, when the data are projected on the axis of each PC separately for classification evaluation, it is not possible to see a distinction between the musician and nonmusician sample groups. In fact, all the classification rates were around chance level using the entire data set. In contrast to PCA, Figure 5 shows the navigation on the most discriminant axis of LDA for the same acoustic feature spectral rolloff. This vector navigation presents differences between the groups on the frontal areas, where the nonmusicians show a positive electrical potential distributed all over the frontal area, whereas the musicians present high activity at the prefrontal area. Such differences are not discriminant either, as can be seen in the data distribution, with a classification rate that ranges from 15.4% (test set) to 74.5% (training set).

Additionally, Figure 6 shows the topographic maps and the projection of the data generated by PCA from the acoustic feature RMS. It is possible to see changes occurring mostly on the frontal areas of the brain, but again it is not possible to see a distinction between the musician and nonmusician sample groups in the projection of the data on each PC dimension. Likewise, Figure 7 shows the RMS navigation along its discriminant axis, presenting visually subtle electrical potential differences between the groups, but more statistically discriminant than the previous spectral rolloff, with a classification rate that ranges from 53.8% (test set) to 88.0% (training set).

Analogous to the other most expressive analyses, the PCA results from the acoustic feature spectral kurtosis, shown in Figure 8, highlight major but not discriminant electrical potential differences in several parts of the brain. However, this acoustic feature presents, in Figure 9, a clear electrical potential distinction between the sample groups of interest. In this case, the nonmusicians present high activity at the prefrontal area and the musicians show high activity at the right temporal area and negative electrical potential distributed over the parietal area. The data distribution exhibits a clear separation between these sample groups, with a classification rate between 69.2% (test set) and 93.8% (training set). For comparison, we have used the same approach to discriminate the sample groups using only a reduced number of electrodes, instead of all of them, with the following arrangements: 6 electrodes (F3, F4, C3, C4, P3, P4); 4 electrodes (F3, F4, C3, C4); 2 electrodes (F3, F4); and 2 electrodes (C3, C4). Table 2 presents these LDA classification rates of each acoustic feature and Table 3 the detailed classification rates using all the electrodes.

Table 2. LDA classification rates of each acoustic feature for different electrode arrangements.

| Set | Feature | All electrodes | 6 electrodes | 4 electrodes | 2 electrodes (F3-F4) | 2 electrodes (C3-C4) |
|---|---|---|---|---|---|---|
| Test | S. Kurtosis | 69.2% | 53.8% | 65.4% | 53.8% | 73.1% |
| Test | RMS | 53.8% | 57.7% | 53.8% | 61.5% | 61.5% |
| Test | S. Rolloff | 15.4% | 42.3% | 50.0% | 65.4% | 53.8% |
| Training | S. Kurtosis | 93.8% | 72.9% | 71.8% | 54.0% | 73.1% |
| Training | RMS | 88.0% | 72.2% | 65.4% | 68.2% | 67.5% |
| Training | S. Rolloff | 74.5% | 68.5% | 70.9% | 69.4% | 54.6% |


Table 3. Detailed LDA classification rates using all the electrodes.

| Set | Feature | Accuracy (%) | Sensitivity (%) | Specificity (%) | F measure (%) |
|---|---|---|---|---|---|
| Test | S. Kurtosis | 69.2 | 69.2 | 69.2 | 69.2 |
| Test | RMS | 53.8 | 61.5 | 46.2 | 57.1 |
| Test | S. Rolloff | 15.4 | 30.8 | 0 | 26.7 |
| Training | S. Kurtosis | 93.8 | 87.7 | 100 | 93.4 |
| Training | RMS | 88.0 | 89.8 | 86.2 | 88.2 |
| Training | S. Rolloff | 74.5 | 71.4 | 77.5 | 73.7 |
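As a worked check of how these four measures relate, the sketch below recomputes them from confusion counts. The counts are not reported in the text: they are inferred here from the test-set percentages and the 13 subjects per group, under the assumption that one group (taken to be the musicians) is the positive class. The function name is hypothetical.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity and F measure, in percent."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = 2 * tp + fp + fn
    f_measure = 2 * tp / denom if denom else 0.0
    values = {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": f_measure,
    }
    return {name: round(100 * value, 1) for name, value in values.items()}

# Inferred counts (assumption): 13 positives and 13 negatives per feature.
kurtosis = classification_metrics(tp=9, fn=4, tn=9, fp=4)
rms_feature = classification_metrics(tp=8, fn=5, tn=6, fp=7)
rolloff = classification_metrics(tp=4, fn=9, tn=0, fp=13)
```

Under these inferred counts the function reproduces the test-set rows: for instance, the spectral rolloff accuracy of 15.4% corresponds to 4 correct decisions out of 26, with every subject of the negative group misclassified.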


## Discussion

In the present study, we applied a multivariate statistical framework to characterize the differences between subjects categorized as musicians and nonmusicians depending on whether or not they have undergone musical training. Our main findings can be summarized as follows: (1) in order to investigate the neural activation patterns evoked by the acoustic features, it was shown that not all the 12 standard and well-known features are necessary, given the redundancy found by FA, which grouped these 12 commonly used acoustic features into only 3 clusters that were statistically nonredundant; (2) the discriminative patterns required to classify the musician and nonmusician sample groups exist predominantly in the frontal areas of the brain, but there are differences between the responses for each cluster's representative acoustic feature, which indicate that some aspects of the music are better than others at predicting musicianship.

Our results show that the Poikonen, Alluri, et al. (2016) method proposed to identify triggers in music can be very helpful to differentiate musicians and nonmusicians, considering the trade-off between the number of triggers and how high the acoustic contrast can be defined. Considering the similarity between the extracted acoustic features on the factor loadings (Figure 2), it is interesting to observe that each extracted acoustic feature corresponds to one of the clusters found, and the spectrograms (Figure 3) clearly display these similarities. The triggers selected mostly occur at the same instant within each cluster, even though some features present more triggers than others. These extra triggers are related to the methodological parameters used, since here we decided to use the same parameters for all the acoustic features. A different approach could be, instead of selecting only one representative acoustic feature within a cluster, to concatenate the triggers found for each feature within each cluster. It is important to highlight, though, that all these findings were achieved with the classical piece used here. We would expect this clustering behavior to be similar within the same musical genre, but there is no guarantee that these findings can be generalized to other pieces of music, and further analysis on this issue is necessary.

It is known that music training promotes differences between musicians and nonmusicians not only during music listening but also during task-free conditions (Klein, Liem, Hänggi, Elmer, & Jäncke, 2016). There is also evidence of differences between musicians with different types of training, musical style/genre, and listening experiences (Tervaniemi, Janhunen, Kruck, Putkinen, & Huotilainen, 2016; Vuust et al., 2012).

Thus, there is a large variability among individuals for this two-group classification problem. However, our experimental results have shown that the discriminant vector navigation method described and implemented here can give a comprehensive description of these sample groups’ activity patterns and classification boundary. The topographic maps generated by this navigation method clearly present distinct neural activation patterns between musicians and nonmusicians, enhancing the understanding of the transient states between these sample groups, despite the limited sample sizes available.

The results of the acoustic feature spectral kurtosis (Figure 9) disclose the relationship between musicians and nonmusicians better than the other ones, indicating the best linear separation between the sample groups. It is interesting to observe that spectral kurtosis and spectral skewness are in the same cluster, since they characterize the spectral data distribution in terms of its shape, along with spectral flatness. Both skewness and kurtosis provide information about the type and magnitude of departures from normality (DeCarlo, 1997), and it has been demonstrated that a sudden change in spectral kurtosis can indicate transients in non-stationary signals (Antoni, 2006; Dwyer, 1984). However, even though this acoustic feature seems suitable to discriminate musicians from nonmusicians, its relationship with what is indeed perceived by the subjects remains uncertain and needs further analysis.

Our multivariate statistical navigation on the PC dimensions shows that although PCA finds the directions along which the EEG data vary the most, these directions are not necessarily the ones that best discriminate the sample groups. Meanwhile, LDA showed interesting information about the discriminant brain aspects of each sample group. We may anticipate that the use of professional musicians in our approach would promote a better discrimination between the groups of interest and, consequently, a better classification performance. However, an additional limitation of our study is that the subjects who took part in our experiments were not matched according to some background variables, especially the musicians (e.g., gender, onset of music training, and formal training received). Future works must pay attention to these matching criteria as well.

Additionally, by taking into account the whole brain activity (considering the information registered simultaneously in all the EEG electrodes) during the listening task, we intend not to overlook possible interactions among different brain regions, because such interrelations might be useful to separate the sample groups under investigation (Davatzikos, 2004; Friston & Ashburner, 2004; Thomaz et al., 2007). In fact, this multivariate statistical approach offers new perspectives for identifying novel, potentially discriminant regions that characterize musicianship beyond specific regions of interest. The statistical differences between musicians and nonmusicians captured by our approach revealed that the brain areas that best describe musicianship lie predominantly in the frontal areas of the brain, which corroborates previous work (Saari et al., 2018) on how the brain processes certain acoustic characteristics of music, especially those related to the motor and auditory areas.

Nevertheless, in the work of Poikonen, Alluri, et al. (2016), brain activity was defined by searching for a signal peak in the vicinity of 100 ms corresponding to the presence of an N100 component. In our work, only the musicians presented this phenomenon, preventing us from using this peak information to compose our data matrix. Therefore, we have defined the time instant at 100 ms as an equal and comparable predefined timestamp for all the subjects, musicians and nonmusicians. However, even if we could ensure that this N100 component occurs in all subjects, we would still be using predefined information based on the response of specific brain regions of interest (related to the electrodes where this N100 component should be expected to occur) without considering the information contained beyond, in other regions of the brain. We believe that this limitation could be properly addressed using an algorithm that summarizes all the information on the average epoch of each subject (Rocha, Massad, Thomaz, & Rocha, 2014), or through a multilinear perspective (Lu, Plataniotis, & Venetsanopoulos, 2013).

Finally, all our multivariate statistical analyses have been carried out on a limited sample size setup, given the original dimensionality of our data matrix and the difficulty of recruiting volunteers, particularly professional musicians, which has an impact on the estimates of the classification accuracy of the discriminant dimensions. This becomes clear when we use fewer electrodes (Table 2) to compare our results. It is possible to see that, in this case, the training accuracies are close to the test ones, whereas the multivariate estimates (that is, using all the electrodes) are more sensitive to this choice, giving larger ranges between the lower (test set) and upper (training set) classification bounds. This limited sample size issue could be addressed using regularized versions of LDA, such as MLDA (Sato et al., 2008; Thomaz et al., 2007), or other small sample size classifiers like Support Vector Machines (Davatzikos, 2004). Another interesting possibility would be to address this problem through a multilinear subspace learning perspective (Lu et al., 2013), considering time, electrodes and subjects as a third-order tensor, which, as mentioned in the previous paragraph, might also overcome the timestamp limitation of our work.
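
As a rough illustration of why regularization helps in this setting, the sketch below stabilizes the pooled within-class covariance when there are fewer samples than dimensions by raising every eigenvalue below the average eigenvalue up to that average. This flooring rule is our assumption about how a regularized LDA in the spirit of MLDA might be sketched, and it should not be read as the exact procedure of the cited works. The function name, the 64-dimensional feature space, and the random data are hypothetical; only the group size of 13 comes from this study.

```python
import numpy as np

def regularized_lda_direction(X_a, X_b):
    """Two-class discriminant direction with an eigenvalue-floored covariance.

    Assumed regularization: eigenvalues of the pooled within-class covariance
    that fall below their mean are replaced by the mean, which makes the
    matrix invertible even when samples are fewer than features.
    """
    n_a, n_b = len(X_a), len(X_b)
    mean_a, mean_b = X_a.mean(axis=0), X_b.mean(axis=0)
    Sw = (X_a - mean_a).T @ (X_a - mean_a) + (X_b - mean_b).T @ (X_b - mean_b)
    Sp = Sw / (n_a + n_b - 2)                       # pooled covariance (singular here)
    eigvals, eigvecs = np.linalg.eigh(Sp)
    floored = np.maximum(eigvals, eigvals.mean())   # raise small eigenvalues
    Sp_reg = (eigvecs * floored) @ eigvecs.T        # rebuild a well-conditioned matrix
    w = np.linalg.solve(Sp_reg, mean_b - mean_a)
    return w / np.linalg.norm(w), floored

rng = np.random.default_rng(1)
n_per_group, n_features = 13, 64    # 13 per group as in the study; 64 is hypothetical
X_a = rng.normal(0.0, 1.0, size=(n_per_group, n_features))
X_b = rng.normal(0.3, 1.0, size=(n_per_group, n_features))

w, floored = regularized_lda_direction(X_a, X_b)
print("smallest regularized eigenvalue:", floored.min())
print("discriminant vector norm:", np.linalg.norm(w))
```

With 26 samples in 64 dimensions the unregularized pooled covariance has rank at most 24 and cannot be inverted, whereas the floored version is strictly positive definite and yields a finite discriminant vector.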

## Conclusion

This work proposed and implemented a multivariate statistical framework to automatically classify whether the listeners were musicians or not, based on the most expressive and discriminant features obtained through the analysis of the brain EEG data on a global level. This is an exploratory study that describes in mathematical and computational terms the differences in musical processing between musicians and nonmusicians, allowing a plausible linear separation between the sample groups, based on the analysis of high-dimensional and limited sample size data, with relatively high classification accuracy.

We believe that this multivariate statistical analysis generates a more comprehensive description of the neural activation patterns, transition states, and classification boundary that cognitively separate musicians from nonmusicians. Further work with professional musicians and larger sample groups might increase the statistical discriminant power of our findings. The applicability of the proposed multivariate statistical framework is not restricted to the classical music used here, and other types of music can benefit from this whole brain signal analysis as well.