The structure of the self and its relationship with wellbeing are of interest to researchers in many areas of psychology, including social, clinical and differential psychology. Psychologists seeking to calculate the self-concept structure indices associated with the self-complexity framework have long faced a computational bottleneck. The complex formulae, variable approaches and lack of ready-to-use programs that allow streamlined calculation of dimensionality (Linville’s H), complexity (Sakaki’s SC), overlap (OL) and compartmentalisation (Showers’ Phi) mean that the research area has been limited in both research speed and participation. The low volume of studies and the computational disincentive for replications have led to a situation where the evidence on self-complexity and related constructs is equivocal. In this article, we discuss approaches to calculating self-concept structure, providing a practical guide for researchers who would like to implement the published formulas or use existing computational tools made available by the authors. This tutorial will allow both efficiency and broader access for researchers seeking to examine constructs related to self-concept structure. We hope that these contributions might foster a new wave of data collection in this area, allowing for clarity on the utility and implications of dimensionality and complexity in the self-concept.
The concept of self-complexity (Linville, 1987) and the associated trait-sort task (Showers & Kling, 1996) are used widely across a range of psychological disciplines, including social psychology (e.g. Banas & Smyth, 2021), clinical psychology (e.g. Woolfolk et al., 1999), personality psychology (e.g. Miller et al., 1991), educational psychology (e.g. Hsu et al., 2022) and cognitive psychology (e.g. Sakaki, 2004). The self-complexity framework is a network-type model of the self-concept structure with two central elements: the number of “aspects” that comprise the self, and the degree to which these aspects are differentiated from one another. However, despite a fairly clear set of theoretical predictions in the original papers (Linville, 1985, 1987), the evidence for the key theorised effect (the impact of complexity and dimensionality of the self-concept structure on wellbeing) remains equivocal (Rafaeli-Mor et al., 1999; Rafaeli-Mor & Steinberg, 2002). The area of research has been plagued by measurement challenges and has produced a range of indices that can be used to summarise various aspects of the self-concept structure captured by the rich datasets generated when participants describe their self-concepts (Pilarska & Suchańska, 2015). This variation in measurement approaches has been identified as one of the sources of the inconsistent evidence on the self-complexity-wellbeing relationship (Koch & Shepperd, 2004; Rafaeli-Mor & Steinberg, 2002). As such, this area is ripe for new research that might resolve the inconsistent findings. The main obstacle for a researcher attempting to join this conversation, however, is the computational burden and the lack of ready-to-use programs to calculate these indices. Many of the indices used to capture properties of the self-concept structure are calculated using multi-step formulae and require significant re-arrangement of the data.
This is particularly the case for the most popular index of self-concept structure, Linville’s H dimensionality measure, where evidence is highly variable. As a result of this difficulty, comparatively little evidence has accumulated for any one index, and it is difficult to draw inferences about the findings. The current article discusses these indices, walks the reader through their calculations and compiles available information on expected ranges and interpretation. The tutorial covers “self-complexity”/dimensionality (H; Linville, 1987), complexity (SC; Sakaki, 2004), overlap (OL; Rafaeli-Mor et al., 1999) and compartmentalisation (Phi; Showers & Zeigler-Hill, 2003) on data generated using a trait-sort type self-concept description task (Showers, 1992).
Background
Linville’s (1985, 1987) self-complexity model is a commonly used framework for understanding the structure of people’s self-concept (that is, the way in which the aspects of the cognitive representation of the self are named, qualified, arranged and relate to one another), and has demonstrated utility in a range of contexts and processes (Cohen et al., 1997; Linville, 1987; Steinberg et al., 2003). This model has also been used as a way to account for differences in well-being, particularly how people respond to stressors (G. Brown & Rafaeli, 2007; Linville, 1985, 1987; Mavor et al., 2014). These effects are explained in terms of the stress buffering hypothesis (Linville, 1987), where higher levels of self-complexity are found to moderate the impact of stressful events on both physical and mental health: people who have higher self-complexity are able to better buffer the negative effects of stress, and therefore experience higher levels of wellbeing. In this understanding, higher self-complexity serves as a buffer against the effects of stress because only the relevant aspect of self is affected (Linville, 1987), while other parts remain largely unaffected.
The self-complexity model has two central elements: the number of self-aspects and the degree of content overlap among these aspects. Self-aspects refer to the different ways people think of themselves, such as in terms of their various roles, groups or relationships (Mavor et al., 2014). Each self-aspect is associated with several attributes, usually expressed as adjectives (e.g. friendly, indecisive, calm). The extent to which individuals perceive the sets of attributes associated with these self-aspects to be consistent across aspects is known as the overlap score. The raw overlap score has been used as one way to operationalise self-complexity (Rafaeli-Mor et al., 1999). In a slightly more complex approach proposed by Sakaki (2004), overlap and the number of aspects are used in conjunction to quantify complexity: low self-complexity is characterised by fewer self-aspects with a greater degree of overlap, while individuals high in self-complexity will report a greater number of self-aspects and a higher degree of differentiation (i.e. lower overlap).
Another useful way to quantify “complexity” is to use the Scott dimensionality statistic H (Scott, 1969) to capture the degree to which the patterning of attributes varies with self-aspect. In this approach, proposed by Linville (1987), the focus is not so much on how much the attributes associated with one self-aspect overlap with those associated with another self-aspect, but rather on the patterning of the self-aspects associated with each attribute. As an important corollary, this also allows for a number of broader applications of the current tutorial and tools: any research where the dimensionality of the data can be understood using the Scott H can employ the approach and tools discussed below.
And finally, the compartmentalisation index (phi; Showers & Zeigler-Hill, 2003) focuses solely on the distribution of positive and negative attributes across the self-aspects, and indicates whether some aspects appear to be primarily positive, while others are primarily negative, or if the positive and negative attributes are more evenly distributed. In this context, high compartmentalisation would indicate that some aspects are positive and others negative, while low compartmentalisation would indicate that most aspects have both positive and negative attributes.
The Linville approach has been plagued with variable findings and measurement issues. There is some evidence (e.g. Cohen et al., 1997; Linville, 1985, 1987; Steinberg et al., 2003) for the Linville “stress buffering hypothesis” (Linville, 1987), where higher levels of self-complexity are found to moderate the impact of stressful events on both physical and mental health. However, there is also evidence from the same original researcher that having a large number of self-aspects may be a source of chronic, low level stress (Linville, 1985) and, similarly, that the maintenance of multiple self-aspects may be inherently stressful (McConnell, 2011). These equivocal findings are further unpacked by a meta-analysis (Rafaeli-Mor & Steinberg, 2002), which found limited support for the buffering hypothesis. In essence, while a simple reading of self-complexity theory (Linville, 1985, 1987) would suggest that bolstering numbers of diverse group memberships might be a straightforward and effective intervention to buffer the stress of transition, ensuing findings suggest that there are cases where this might cause more stress (McConnell, 2011; Woolfolk et al., 1995, 1999), have no effect at all (J. D. Campbell et al., 1991; Morgan & Janoff-Bulman, 1994), be dependent on the valence of self-aspects (Showers, 1992; Showers & Kling, 1996) or the distribution pattern of positive and negative perceptions across different aspects (Showers, 1992; Showers & Zeigler-Hill, 2003; Woolfolk et al., 1995).
Addressing the inconsistencies
These inconsistent findings are difficult to reconcile and, in many cases, would have opposing practical applications: do we encourage more or fewer self-perceptions in those at risk of poorer wellbeing? Review and synthesis of the literature (Koch & Shepperd, 2004; Rafaeli-Mor & Steinberg, 2002) indicates that much of the inconsistency in findings likely arises as a direct result of the variation in measurement approaches, the wide range of available indices in use and limited available literature to review. In order to address these inconsistencies, the field requires two key things:
1. A greater volume of research in the area, particularly replication studies that might clarify earlier findings;
2. A standardised approach to calculation and interpretation of the indices.
While the first of these is a long-term aspiration, the current paper provides an important first step to the second issue. We provide documentation and discussion of the issues surrounding collection of self-concept structure data and calculation of the relevant indices. We also provide a set of worked examples and a sample dataset with summary statistics. In a tutorial format, we outline the formulae used for calculation of the indices manually. Next, we present a sample (real) dataset analysed using the described approach, alongside some descriptive analyses and commentary on the relationships between the indices produced by our tools. We also provide links to both an SPSS macro and an R package created by the authors and now available to researchers in this area.
Measurement Approaches
Before describing how the indices are computed, we must first consider the ways in which the data are collected and formatted. These vary in three key ways: data collection vectors, attribute/trait lists and task instructions.
Data collection vectors
The original task and many of its clinically-based adaptations collect self-concept structure data via a hands-on, paper-based trait sort task. In this task, participants are given a stack of trait cards (one trait per card) and are asked to sort these into sets, each describing an aspect of themselves and each recorded by the experimenter. In some versions of this task, the participant is asked to give each set a descriptive label after the attributes have been chosen. In others, the groupings of attributes are not labelled. The participant can use and re-use trait cards as many times as they like; each “sort” is a standalone arrangement of the deck to describe one aspect, and there is no requirement to arrange the aspects of the self in space or to group different sets of cards visually. This approach has the limitation that it requires the participant to deal with 40 physical cards in a deck, to describe self-aspects, under observation. This may result in an artificially limited range of attribute use, owing to the physical task of sorting through the deck each time to find the desired “trait”. It is also a bottom-up approach to describing the self-concept that experimentally demands differentiation among attributes ascribed to each aspect, as the traits are first sorted and then labelled (or not labelled at all). Despite evidence suggesting that a greater degree of similarity or compatibility among the aspects of the self-concept may be protective of wellbeing (Bentley et al., 2019), it would be a very contrary participant who repeatedly arranged the same or similar sets of attributes and ascribed them different labels.
More recent adaptations have used a computer-based version of this task, in which participants complete an online survey-type collection tool. In this version, participants are asked to generate a list of n labels of the aspects of the self and are then presented with n copies of the attribute set and asked to describe the aspect of the self associated with each label. This is an immediate departure from the original task: no physical cards, no spatial sorting, a two-phase process (all labels, then all attributes, rather than sort-label-sort-label) and a top-down, labels-first direction of work. We argue that this approach addresses the issue of experimental demand for differentiation and allows participants to report greater similarity among self-aspects. It also allows for the introduction of self-report questions related to participant perceptions of each self-aspect (e.g. whether the aspect is associated with a social group, or whether the participant considers it positive; see Banas & Smyth, 2021, for a discussion of the value this can add). At the same time, it limits the number of sorts, which is in principle unbounded in the card-based task. The online tools require a practical limit to the number of labels and, at the same time, the pre-sort nomination of labels disallows spontaneous addition of another aspect that the participant might remember while undertaking the description task.
Attribute/ Trait lists
The second source of methodological variability is the list of attributes provided to participants to “sort” among self-aspects. The most commonly used set is the Showers (Showers, 1995) set of 40, with 20 positive and 20 negative. This is the set used in the worked example and sample dataset in this paper. However, there is variation in the literature in terms of the number of attributes. Supplementary material A provides a list of published trait sorts on which readers may wish to practice calculations. It also provides a tidy demonstration of this variability in attribute lists. The number of supplied attributes ranges from 33 (Linville, 1987), through 40 (Showers & Kevlyn, 1999) and 44 (Luo & Watkins, 2008), up to 48 (Clifford et al., 2020).
Task instructions
The final source of variation is the instructions provided to participants for completion of the task. We provide the instructions that were used in our example data (Supplementary material B), wherein the participant is instructed to “think of up as many different subtypes of yourself… that would describe who you are” and prompted by suggestions of possible types of selves (situation, role, group-based, relational), but also encouraged not to limit their descriptions to these suggestions. The original Linville (1987) instructions ask participants to “form groups of traits that go together…[such that] Each group of traits might represent a different aspect of yourself.” (p666). Showers and colleagues (e.g. Showers & Kevlyn, 1999) use an adaptation of these instructions, but explicitly note that the experimenter is not going to “give examples of the groups because I want you to form the ones that are most meaningful to you” (p961).
Tutorial
Study Setup
For the purposes of illustrating how the various self-concept structure indices are calculated, we selected a single participant from a larger published dataset, associated with Banas and Smyth (2021), and available online at https://osf.io/zt7an/. This participant’s card sort is presented in Table 1 below. Participants were asked to generate a list of labels of the aspects of the self, and were instructed to generate as many labels as they felt were required to fully describe themselves. For each of these identified aspects, they were then given a set of 40 attributes (based on the Showers and Zeigler-Hill (2003) set), 20 positive and 20 negative, and asked to use this set to describe the aspect qualitatively. For each self-aspect, the participant was able to choose as many or as few attributes as they wanted. Negative attributes are denoted with (-) in Table 1.
Me alone | Me with family | Me with friends |
Hardworking | Comfortable | Communicative |
Independent | Confident | Fun and Entertaining |
Isolated (-) | Fun and Entertaining | Indecisive (-) |
Mature | Immature (-) | Independent |
Optimistic | Intelligent | Insecure (-) |
 | Optimistic | Intelligent |
 | Self-centered (-) | Optimistic |
 | Successful | Organised |
 | | Outgoing |
Indices
Overlap
Formula and Its Implementation. In order to calculate overlap, we applied the following formula (Rafaeli-Mor et al., 1999):
$$OL = \frac{\sum_{i=1}^{n} \sum_{j \neq i} C_{ij} / T_j}{n(n-1)}$$
where $C_{ij}$ is the number of attributes common to aspects i and j, $T_j$ is the total number of attributes in the referent aspect (the second aspect of the pair), n is the total number of aspects in the individual’s trait-sort, and i and j vary from 1 to n with i ≠ j. In order to do this, we followed these steps:
1. Make a list of all possible pairs of self-aspects (order is important, so 1-2 is a different pair to 2-1).
2. For each pair, count the attributes that appear in both self-aspects and divide that number by the total number of attributes in the second self-aspect in that pair.
3. Add up the results of step 2 from all pairs.
4. Divide the result of step 3 by n*(n-1), where n is the number of self-aspects that this participant listed.
Worked Example. To apply this procedure to our example case, we followed the steps outlined above:
1. List all possible pairs (populate column A, Table 2).
2. Count the number of overlapping attributes in each pair, and divide by the number of attributes in the second self-aspect in the pair (populate columns B-D).
3. Add up the numbers in column D, to arrive at the sum.
4. Divide the sum by n*(n-1), where n is the number of self-aspects (in our example, n = 3).
A | B | C | D |
Pair | Number of overlapping attributes | Number of attributes in second self-aspect of the pair | (B) divided by (C) |
1 and 2 | 1 | 8 | 0.125 |
2 and 1 | 1 | 5 | 0.2 |
1 and 3 | 2 | 9 | 0.22 |
3 and 1 | 2 | 5 | 0.4 |
2 and 3 | 3 | 9 | 0.33 |
3 and 2 | 3 | 8 | 0.375 |
Sum | | | 1.66 |
n | | | 3 |
n*(n-1) | | | 6 |
Overlap | | | 0.27 |
Note: The overlap concept requires that the participant listed at least 2 self-aspects. If only one self-aspect was listed, overlap cannot be calculated, by definition.
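The overlap steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code of the authors' R package or SPSS macro: the dictionary layout and the function name `overlap` are hypothetical, and each aspect is assumed to contain at least one attribute. The data are the example participant's sort from Table 1.

```python
from itertools import permutations

# Example participant's trait sort (Table 1); valence markers are omitted
# because overlap ignores valence.
sort = {
    "Me alone": {"Hardworking", "Independent", "Isolated", "Mature", "Optimistic"},
    "Me with family": {"Comfortable", "Confident", "Fun and Entertaining", "Immature",
                       "Intelligent", "Optimistic", "Self-centered", "Successful"},
    "Me with friends": {"Communicative", "Fun and Entertaining", "Indecisive",
                        "Independent", "Insecure", "Intelligent", "Optimistic",
                        "Organised", "Outgoing"},
}

def overlap(aspects):
    """Average, over all ordered pairs, of shared attributes / size of second aspect."""
    n = len(aspects)
    if n < 2:
        return None  # overlap is undefined with fewer than two self-aspects
    total = 0.0
    # permutations(..., 2) yields every ordered pair, so 1-2 and 2-1 both count
    for first, second in permutations(aspects.values(), 2):
        total += len(first & second) / len(second)
    return total / (n * (n - 1))

ol = overlap(sort)
print(round(ol, 3))  # about 0.276 when no intermediate rounding is applied
```

Because the sketch carries full precision through every step, it returns roughly 0.276, which is slightly higher than the 0.27 shown in Table 2 where intermediate values are rounded.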
H Statistic
Formula and Its Implementation. In order to calculate H, we followed Linville’s (1987) approach and used the Scott dimensionality formula (Scott, 1969):
$$H = \log_2 n - \frac{\sum_i n_i \log_2 n_i}{n}$$
where n is the total number of attributes available in the task, and $n_i$ is the number of attributes that appear in a particular combination in the descriptions of self-aspects (that is, attributes sharing the same pattern of presence and absence across aspects). Higher H scores indicate greater complexity. To do this, we took the following steps:
1. List the possible patterns of attribute combinations across the aspects.
2. Count the number of attributes that take each of these patterns.
3. Calculate $n_i \log_2 n_i$ for each of these patterns.
4. Sum the values from step 3 and divide by the total number of available attributes.
5. Subtract the result of step 4 from $\log_2$ of the total number of attributes.
Worked Example. To apply this to our worked example, we first listed all the possible patterns across the three aspects reported (column A in Table 3) and then counted the number of attributes that took each of these patterns (column B). We then calculated $n_i \log_2 n_i$ (calculated in two steps below in columns C and D, for clarity).
A | B | C | D |
Pattern | $n_i$ | $\log_2 n_i$ | product ($n_i \log_2 n_i$) |
not used at all | 23 | 4.52 | 104.04 |
only in self-aspect 1 | 3 | 1.58 | 4.75 |
only in 1 and 3 (e.g. “independent”) | 1 | 0 | 0 |
present in all 3 (e.g. “optimistic”) | 1 | 0 | 0 |
only in self-aspect 2 | 5 | 2.32 | 11.61 |
only in 2 and 3 | 2 | 1 | 2 |
only in self-aspect 3 | 5 | 2.32 | 11.61 |
We then summed the products in column D (134.01) and entered the sum into the formula:
$$H = \log_2 40 - \frac{134.01}{40} \approx 5.32 - 3.35 = 1.97$$
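As a cross-check on the hand calculation, the pattern-counting procedure can be sketched in Python. The function name `h_statistic` and the dictionary layout are assumptions made for this illustration and do not reflect the interface of the authors' tools. Attributes that were never used are treated as sharing one "not used at all" pattern, as in Table 3.

```python
from collections import Counter
from math import log2

# Example participant's trait sort (Table 1)
sort = {
    "Me alone": {"Hardworking", "Independent", "Isolated", "Mature", "Optimistic"},
    "Me with family": {"Comfortable", "Confident", "Fun and Entertaining", "Immature",
                       "Intelligent", "Optimistic", "Self-centered", "Successful"},
    "Me with friends": {"Communicative", "Fun and Entertaining", "Indecisive",
                        "Independent", "Insecure", "Intelligent", "Optimistic",
                        "Organised", "Outgoing"},
}

def h_statistic(aspects, n_available):
    """Scott's H: log2(n) minus the weighted average of log2 of pattern sizes."""
    names = list(aspects)
    used = set().union(*aspects.values())
    # Each used attribute gets a presence/absence pattern across the aspects
    patterns = Counter(
        tuple(attr in aspects[name] for name in names) for attr in used
    )
    counts = list(patterns.values())
    unused = n_available - len(used)
    if unused > 0:
        counts.append(unused)  # the single "not used at all" pattern
    return log2(n_available) - sum(c * log2(c) for c in counts) / n_available

h = h_statistic(sort, n_available=40)
print(round(h, 2))  # about 1.97 for the example participant
```

Representing each pattern as a tuple of booleans means the pattern list does not have to be enumerated in advance, which matters because the number of possible patterns doubles with every additional self-aspect.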
Sakaki’s (2004) Self-complexity
Formula and Its Implementation. In order to calculate self-complexity, we used the formula introduced by Sakaki (2004): SC = NASP/OL, where NASP is the total number of self-aspects in the person’s sort and OL is the person’s overlap score, as calculated above.
Worked Example. To apply this procedure to our example participant, we simply divided the number of aspects (here: 3) by the degree of overlap as calculated above (here: 0.27). In this case SC= 11.11.
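The division itself is trivial, but an implementation has to decide what to do when overlap is zero or missing. A minimal Python sketch follows; the function name `sakaki_sc` and the choice to return `None` in the undefined cases are assumptions for illustration only.

```python
def sakaki_sc(n_aspects, overlap):
    """Sakaki's SC = NASP / OL; undefined when overlap is zero or missing."""
    if overlap is None or overlap == 0:
        return None  # division by zero: SC cannot be computed
    return n_aspects / overlap

# Worked example: 3 self-aspects and the reported overlap of 0.27
sc = sakaki_sc(3, 0.27)
print(round(sc, 2))  # 11.11
```

Because SC is a ratio, it is sensitive to rounding of the overlap score: using an unrounded overlap of about 0.276 instead of 0.27 gives an SC of roughly 10.87 for the same participant.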
Compartmentalisation: Phi
Formula and Its Implementation. Phi is based on a chi-squared statistic and provides an index of the extent to which the distribution of positive and negative traits across self-aspect groups deviates from what would be expected based on chance. In order to calculate this index, we followed the steps below:
1. Categorise all attributes as positive or negative and tabulate counts of each, per aspect;
2. On the basis of the total number of positive and negative attributes used (including repeats), calculate an overall negative and positive proportion for the trait sort;
3. Using the overall negative and positive proportions, calculate expected positive and negative frequencies per aspect, on the basis of attribute count per aspect;
4. Compare both positive and negative expected frequencies to observed frequencies for each aspect, using the formula:
$$\frac{(O - E)^2}{E}$$
where O is the observed frequency and E is the expected frequency;
5. Calculate a chi-squared value for the sort, by summing all of the values generated in step 4;
6. Calculate phi from the chi-squared value using the formula:
$$\phi = \sqrt{\frac{\chi^2}{N}}$$
where N is the total number of attributes used (with repeats).
Worked Example. To apply this procedure to our worked example, we first tabulated the positive and negative frequencies across aspects, as in Table 4.
Subtype | Me alone | Me with Family | Me With Friends | Total |
Positive frequency | 4 | 6 | 7 | 17 |
Negative frequency | 1 | 2 | 2 | 5 |
Total | 5 | 8 | 9 | 22 |
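We then calculated the proportions of all used attributes that were positive and negative:
$$p_{\text{positive}} = \frac{17}{22} \approx 0.77, \qquad p_{\text{negative}} = \frac{5}{22} \approx 0.23$$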
Using these proportions and the total attributes in each aspect, we calculated expected frequencies (see Table 5):
Subtype | Me alone | Me with Family | Me With Friends |
Total attributes | 5 | 8 | 9 |
Expected positive frequency (total x positive proportion) | 3.86 | 6.18 | 6.95 |
Expected negative frequency (total x negative proportion) | 1.14 | 1.82 | 2.05 |
We then calculated a chi-squared value as the sum of (observed - expected)^2 / expected across all aspects and both valences (Table 6).
Subtype | Me alone | Me with Family | Me With Friends |
Observed (positive) | 4 | 6 | 7 |
Expected (positive) | 3.86 | 6.18 | 6.95 |
Observed - expected (positive) | 0.14 | -0.18 | 0.05 |
(observed - expected)^2 (positive) | 0.0196 | 0.0324 | 0.0025 |
(observed - expected)^2 / expected (positive) | 0.005 | 0.005 | 0.0003 |
Observed (negative) | 1 | 2 | 2 |
Expected (negative) | 1.14 | 1.82 | 2.05 |
Observed - expected (negative) | -0.14 | 0.18 | -0.05 |
(observed - expected)^2 (negative) | 0.0196 | 0.0324 | 0.0025 |
(observed - expected)^2 / expected (negative) | 0.016 | 0.018 | 0.001 |
Finally, we calculated phi by taking the square root of χ2 (which was equal to 0.045, as calculated above) divided by the total number of attributes used (equal to 22):
$$\phi = \sqrt{\frac{0.045}{22}} \approx 0.045$$
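The whole phi procedure, from observed counts to the final index, can be sketched in Python as follows. The function name `phi` and the (positive, negative) tuple layout are assumptions made for this illustration. Returning `None` when a sort contains only positive or only negative attributes is also an assumption, made because the expected frequencies for the missing valence would be zero.

```python
from math import sqrt

# Observed (positive, negative) counts per aspect, taken from Table 4
observed = {
    "Me alone": (4, 1),
    "Me with family": (6, 2),
    "Me with friends": (7, 2),
}

def phi(counts):
    """Compartmentalisation: sqrt(chi-squared / N) over aspects x valence."""
    total_pos = sum(pos for pos, _ in counts.values())
    total_neg = sum(neg for _, neg in counts.values())
    n_total = total_pos + total_neg
    if total_pos == 0 or total_neg == 0:
        return None  # only one valence used, so phi is undefined
    chi2 = 0.0
    for pos, neg in counts.values():
        size = pos + neg
        for obs, total in ((pos, total_pos), (neg, total_neg)):
            expected = size * total / n_total  # aspect size x overall proportion
            chi2 += (obs - expected) ** 2 / expected
    return sqrt(chi2 / n_total)

value = phi(observed)
print(round(value, 3))  # about 0.046 without intermediate rounding
```

Without intermediate rounding the chi-squared value is about 0.046 rather than 0.045, so the resulting phi differs from the hand calculation only in the third decimal place. A sort in which every aspect is purely positive or purely negative, such as `{"a": (3, 0), "b": (0, 2)}`, yields a phi of 1.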
Tools
While these calculations are relatively straightforward and can be calculated by hand for a single participant with few listed self-aspects, as in our example, for large datasets and larger and more complex self-concepts, researchers require computational support. To this end, we would like to make the research community aware of two tools available that will automatically calculate the above-mentioned indices in a trait-sort data set. These are an R-package (Name: selfcomplexity, see Banas, 2022) and an SPSS macro (Name: CatComplex, available on request from the authors). These tools were developed by the current authors and have been made available freely to the research community.
Data format for R package
The functions provided as part of the R package require data to be provided in the so-called tidy format, where each observation (i.e. each self-aspect) is entered as a separate row. This means that the participant whose self-aspects are listed in Table 1 would have their data spread across three rows. The functions assume that the dataset has a column with unique participant IDs, a column with self-aspect names, and a column with a comma-separated list of attributes that are associated with a given self-aspect. The names of these columns can be provided by the user as arguments when calling the functions.
Apart from the dataset, the user needs to supply the set of attributes that were made available to participants in the card-sort task. As there are various versions of the task, these sets will vary between users, and we did not want to be prescriptive in setting a default set. Nevertheless, the package includes a dataset corresponding to the set of 40 attributes (20 positive and 20 negative) used in studies conducted by the authors, and forming the basis of the examples provided in this paper. An outline of the approach taken to calculating indices using the R package is in Supplementary Material C.
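To make the tidy layout concrete, the sketch below reads a small file with one row per self-aspect and rebuilds each participant's sort. It is written in Python purely to illustrate the data layout and does not show the R package's functions. The column names (`id`, `aspect`, `attributes`), the participant ID `p01` and the helper `read_tidy` are all hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical tidy file: one row per self-aspect, with a comma-separated
# attribute list. The rows hold the example participant's sort (Table 1).
TIDY_CSV = """id,aspect,attributes
p01,Me alone,"Hardworking, Independent, Isolated, Mature, Optimistic"
p01,Me with family,"Comfortable, Confident, Fun and Entertaining, Immature, Intelligent, Optimistic, Self-centered, Successful"
p01,Me with friends,"Communicative, Fun and Entertaining, Indecisive, Independent, Insecure, Intelligent, Optimistic, Organised, Outgoing"
"""

def read_tidy(handle, id_col="id", aspect_col="aspect", attr_col="attributes"):
    """Return {participant: {aspect: set of attributes}} from tidy rows."""
    sorts = defaultdict(dict)
    for row in csv.DictReader(handle):
        # Split the comma-separated list and drop surrounding whitespace
        attrs = {a.strip() for a in row[attr_col].split(",") if a.strip()}
        sorts[row[id_col]][row[aspect_col]] = attrs
    return dict(sorts)

sorts = read_tidy(io.StringIO(TIDY_CSV))
for aspect, attrs in sorts["p01"].items():
    print(aspect, len(attrs))
```

Passing the column names as arguments mirrors the flexibility described above, where the user tells the functions which columns hold the IDs, aspect names and attribute lists.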
Data format for SPSS macro
Use of the SPSS macro requires data in “wide” format, with each participant in a row and each attribute of each aspect in a column. The macro assumes a column with unique participant IDs, and N_aspects sets of attribute variables (one set per aspect) in columns. The tool assumes binary data (attribute allocated/not allocated) organised by attributes nested in aspects. These data can be generated by asking participants to name up to X aspects of the self and then cycling through the list of Y attributes for each named aspect and asking participants to indicate which attributes from the list describe each aspect. There is no upper limit to the number of attributes participants can select per aspect, but a minimum of one is required. Participants are in cases and each aspect/attribute combination is in a separate variable. That is, for a single participant reporting on 40 attributes across 10 self-aspects, we would have 400 variables.
The macro allows customisation on the number of aspects and attributes for analysis and also the valence assigned to each attribute. The macro does not require a list of labels for attributes. The user is expected to provide input on: maximum number of aspects, number of available attributes, variable names where the attributes are stored and the valence pattern across the list of attributes (i.e., for the attribute list intelligent, unreliable, friendly, the user needs to specify the valence as 1,-1,1).
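The wide layout can be illustrated with a short Python sketch that flattens one participant's sort into binary aspect-by-attribute variables. Everything specific here is an assumption made for illustration: the variable naming scheme (`asp1_att1` and so on), the helper `to_wide_row`, the made-up participant `p01` and the choice to code unused aspect slots as 0. The macro's actual expectations should be checked against its documentation.

```python
# Attribute list and valence coding taken from the example in the text
attributes = ["intelligent", "unreliable", "friendly"]
valence = [1, -1, 1]  # 1 = positive, -1 = negative, in attribute order
max_aspects = 2       # hypothetical cap on the number of aspects

def to_wide_row(participant_id, aspects, attributes, max_aspects):
    """Flatten one participant's sort into binary aspect-by-attribute variables."""
    row = {"id": participant_id}
    for a in range(max_aspects):
        # Aspect slots the participant did not use are coded as all zeros here
        chosen = aspects[a] if a < len(aspects) else set()
        for t, attribute in enumerate(attributes, start=1):
            # Hypothetical naming scheme: asp<aspect number>_att<attribute number>
            row[f"asp{a + 1}_att{t}"] = int(attribute in chosen)
    return row

# A made-up participant with two aspects, for illustration only
row = to_wide_row("p01", [{"intelligent", "friendly"}, {"unreliable"}],
                  attributes, max_aspects)
print(row)
```

The number of attribute variables is the number of aspect slots multiplied by the number of attributes (here 2 x 3 = 6), which is how the 400 variables in the 10-aspect, 40-attribute case arise. The `valence` list is not used by the helper; it stands for the valence pattern that the user supplies separately.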
Sample Dataset
To further illustrate the use of the indices calculated by the tools and to demonstrate the ways in which the various indices relate to one another, we analysed a small trait-sort data set. The sample data set contains self-concepts described using a trait-sort task that allowed up to 10 self-aspects to be described using any of 40 attributes (20 positive, 20 negative), the same approach as that taken in the worked example above. Data were collected from 62 predominantly Australian and South-East Asian participants in 2019 (before the COVID-19 pandemic). The demographics of the participants are in Table 7. The data file, associated codebooks and R scripts with all analyses are available online (https://osf.io/4ve62/).
 | Count | Mean (SD) |
Age | - | 24.64 (6.26) |
Gender | | |
Male | 38 | |
Female | 21 | |
Ethnicity | | |
South-East Asian | 30 | |
Australian | 13 | |
East Asian | 9 | |
South Asian | 4 | |
Other | 3 | |
Religious | | |
Yes | 15 | |
No | 44 | |
We used the selfcomplexity R package (Banas, 2022) to calculate all indices described in this paper: number of self-aspects (NASP), overlap, H, Sakaki’s SC index (NASP/OL), and Phi. Descriptive statistics for all indices are presented in Table 8, violin and box plots of the distributions are presented in Figure 1, and correlations between indices are presented in Table 9. As the questionnaire allowed participants to list between 0 and 10 self-aspects, the range of NASP was equal to 0-10, with a median of five.
Index name | Minimum | Maximum | Mean | SD | Median | IQR |
NASP | 0.00 | 10.00 | 5.58 | 1.91 | 5.00 | 1.00 |
Overlap | 0.00 | 0.55 | 0.26 | 0.13 | 0.24 | 0.20 |
H | 1.18 | 4.72 | 2.57 | 0.76 | 2.48 | 0.80 |
Sakaki’s SC | 10.38 | 120.00 | 28.04 | 18.53 | 24.38 | 16.84 |
Phi | 0.18 | 0.92 | 0.51 | 0.17 | 0.47 | 0.22 |
Note: SD stands for standard deviation, IQR stands for interquartile range. As not all indices are normally distributed, both mean and median are reported, alongside measures of spread.
 | Overlap | H | Sakaki’s SC | Phi |
NASP | 0.38** | 0.82*** | 0.03 | 0.22 |
Overlap | | 0.38** | -0.70*** | -0.09 |
H | | | 0.02 | 0.08 |
Sakaki’s SC | | | | 0.20 |
Note: * p < .05, ** p < .01, *** p < .001.
To contextualise these summary statistics, we list below some guiding information for interpreting each index.
Number of Aspects (NASP)
As illustrated above, the attributes of this index (e.g. min/max) are largely driven by experimental design decisions. In the original literature, where the task was conducted by grouping attribute cards, the participant was instructed to produce an exhaustive self-concept, that is, to continue naming aspects until the self was fully described. Operationalising this, however, particularly online, often requires setting limits (see, for example, Banas & Smyth, 2021).
Overlap (OL)
By definition, overlap can range from 0 to 1, with 0 indicating that the aspects are perfectly non-overlapping and 1 indicating perfect overlap, where all self-aspects are described with the same attributes.
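The two end points can be checked with a minimal sketch. It assumes the pairwise definition of overlap from Rafaeli-Mor et al. (1999): for every ordered pair of self-aspects, the number of shared attributes is divided by the number of attributes in the first aspect, and these proportions are averaged. The function name `overlap` and the handling of empty aspects are our illustrative assumptions and are not taken from the selfcomplexity package.

```python
from itertools import permutations


def overlap(aspects):
    """Average proportion of shared attributes over all ordered pairs of aspects.

    `aspects` is a list of collections of attribute labels, one per self-aspect.
    Returns None when fewer than two non-empty aspects are available
    (an assumption made for this sketch; the index is undefined in that case).
    """
    groups = [set(a) for a in aspects if a]  # drop empty aspects (assumption)
    k = len(groups)
    if k < 2:
        return None
    total = 0.0
    for first, second in permutations(groups, 2):
        # Shared attributes relative to the size of the first aspect in the pair
        total += len(first & second) / len(first)
    return total / (k * (k - 1))


# Hypothetical sorts illustrating the bounds
print(overlap([{"kind", "calm"}, {"kind", "calm"}]))   # identical aspects
print(overlap([{"kind", "calm"}, {"lazy", "tense"}]))  # disjoint aspects
```

Identical aspects give a value of 1 and disjoint aspects give a value of 0, matching the interpretation above.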
H Statistic
The range of the H statistic is variable; see Luo, Watkins and Lam (2008) for a full discussion. In a practical sense, H ranges from 0 (i.e. log₂ 1) to the smaller of k and log₂ n, where n is the number of adjectives that were made available to the participant in the card sort task (in this case, n = 40, and so the upper bound for H was log₂ 40 = 5.32) and k is the number of self-aspects. A higher value of H indicates a higher level of complexity.
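These bounds can be illustrated with a short sketch, assuming Linville's (1987) formula H = log₂ n − (Σ nᵢ log₂ nᵢ)/n, where nᵢ is the number of attributes sharing the same pattern of membership across self-aspects (attributes that are never used form one such pattern). The function name `h_statistic` and the toy data are hypothetical.

```python
import math
from collections import Counter


def h_statistic(aspects, attributes):
    """Linville's H for one participant.

    `aspects` is a list of sets of attribute labels; `attributes` is the full
    list of n attributes offered in the sort, including unused ones.
    """
    n = len(attributes)
    # Each attribute's membership pattern across aspects, e.g. (True, False)
    patterns = Counter(
        tuple(attr in aspect for aspect in aspects) for attr in attributes
    )
    return math.log2(n) - sum(c * math.log2(c) for c in patterns.values()) / n


attributes = ["a", "b", "c", "d"]
# Two aspects, four attributes, every attribute in a different pattern:
# H reaches the smaller of k = 2 and log2(4) = 2.
print(h_statistic([{"a", "b"}, {"b", "c"}], attributes))
# No attribute used at all: a single pattern, so H = 0.
print(h_statistic([], attributes))
# Upper bound implied by a 40-attribute sort
print(round(math.log2(40), 2))
```

Because k self-aspects can produce at most 2^k distinct membership patterns, H cannot exceed k, which is why the practical upper bound is the smaller of k and log₂ n.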
Sakaki’s Self-complexity
By definition, the minimum possible value of SC for a participant is equal to the number of self-aspects they listed (i.e. when OL = 1, SC = NASP). SC grows without bound as the OL value for the given participant approaches 0. Thus, higher values of SC indicate a larger number of self-aspects and/or lower overlap. When the overlap score is equal to zero, SC cannot be computed, because the division NASP/OL is undefined.
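A minimal sketch of this calculation, using the definition SC = NASP/OL given above, makes the undefined case explicit. The function name `sakaki_sc` and the choice to return `None` when SC cannot be computed are our assumptions for illustration; other tools may flag this case differently.

```python
def sakaki_sc(nasp, overlap):
    """Sakaki's SC as NASP divided by overlap.

    Returns None when overlap is missing or zero, since the ratio is
    undefined in that case (the return convention is an assumption).
    """
    if overlap is None or overlap == 0:
        return None
    return nasp / overlap


print(sakaki_sc(5, 1.0))   # perfect overlap: SC equals NASP
print(sakaki_sc(5, 0.25))  # lower overlap: SC increases
print(sakaki_sc(5, 0))     # zero overlap: SC cannot be computed
```

In practice, participants with an overlap of zero need to be handled explicitly (for example, treated as missing on SC) before computing descriptive statistics or correlations.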
Compartmentalisation: Phi
By definition, Phi can range from 0 to 1, where 0 indicates perfect integration (positive and negative attributes are evenly distributed across self-aspects), and 1 indicates perfect compartmentalisation (each self-aspect is either purely negative or purely positive). Unlike some of the other measures, Phi does not depend on the number of self-aspects listed by the participant, or the number of attributes used in the card sort (Zeigler-Hill & Showers, 2007). Among participants in this example, the distribution is not completely symmetrical, indicating more participants with very high compartmentalisation than participants with very low compartmentalisation.
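The two end points of Phi can be reproduced with a short sketch. It assumes the usual computation of Phi as the square root of chi-square divided by N, where chi-square is calculated on the table of self-aspects by valence (counts of positive and negative attributes per aspect) and N is the total number of attribute assignments. The function name `phi`, the +1/-1 valence coding and the `None` return for participants who use attributes of only one valence are illustrative assumptions.

```python
import math


def phi(aspects, valence):
    """Compartmentalisation index from an aspects-by-valence table.

    `aspects` is a list of lists of attribute labels; `valence` maps each
    attribute to +1 (positive) or -1 (negative), as coded by the experimenter.
    """
    pos = [sum(1 for a in aspect if valence[a] > 0) for aspect in aspects]
    neg = [len(aspect) - p for aspect, p in zip(aspects, pos)]
    total_pos, total_neg = sum(pos), sum(neg)
    n = total_pos + total_neg
    if total_pos == 0 or total_neg == 0:
        return None  # chi-square is undefined without both valences (assumption)
    chi2 = 0.0
    for p, q in zip(pos, neg):
        size = p + q
        if size == 0:
            continue  # skip empty aspects
        for observed, column_total in ((p, total_pos), (q, total_neg)):
            expected = size * column_total / n
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / n)


valence = {"kind": 1, "smart": 1, "lazy": -1, "rude": -1}
print(phi([["kind", "smart"], ["lazy", "rude"]], valence))  # compartmentalised
print(phi([["kind", "lazy"], ["smart", "rude"]], valence))  # integrated
```

The first hypothetical sort places all positive attributes in one aspect and all negative attributes in the other, giving a value of 1; the second spreads them evenly, giving a value of 0.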
Discussion
We provided here a summary of the key indices used in self-complexity approaches to measuring the self-concept structure, summarised the steps in calculating the indices and provided links to computational tools to streamline the process. We did this with a view to paving the way for a greater volume of research in this area, more quickly. The self-complexity approach to self-concept structure, while intuitively useful and theoretically based, has been plagued by measurement issues and the non-replication issues inherent in small evidence bases. Our contribution clarifies and summarises the measurement approach with a view to building that evidence base and allowing replication studies to resolve some key questions in this literature. We summarise below five important ways forward.
Future Directions
Future direction 1: more diverse data. With very few exceptions (e.g. C. M. Brown et al., 2017), the vast majority of self-concept structure data has been collected on Western participants. The tutorial and tools we present here represent an opportunity for rapid replication of the literature, particularly the original literature on the links between self-structure and wellbeing, in a range of cultural and linguistic contexts.
Future direction 2: replication towards an answer. The key question in this literature is about the link between self-structure indices and wellbeing. Synthesis of the available literature (G. Brown & Rafaeli, 2007; Rafaeli-Mor & Steinberg, 2002) indicates that the evidence cannot demonstrate either the existence or the direction of this link. The current paper streamlines the process for replication to allow the amassing of a larger evidence base from which to examine this question.
Future direction 3: Consideration of test-retest reliability. There is an assumption in a lot of the work in this area that the self-concept is stable and that the same self-perceptions that were documented in the lab would be brought to bear in a stressful situation in which wellbeing might be threatened. The authors, however, come from a theoretical tradition where we would expect contextual and temporal variation in self-concept. An important future direction would be to use test-retest reliability approaches to establish whether the way in which participants respond to a task such as this is better conceptualised as a stable individual difference or a contextual self-perception.
Future direction 4: developing a clearer understanding of relationships among the self-concept structure variables. We would, theoretically, expect that the self-concept structure indices would be correlated and there is existing evidence of a range of zero-order correlations (including those presented here). The next step in establishing the construct validity of these indices would be a rigorous, systematic approach to determining the inter-relationships, for example a multi-trait, multi-method approach (D. T. Campbell & Fiske, 1959). Conversely, there is also scope to consider more complex patterns of relationship, for example consideration of whether the relationships between complexity and wellbeing might not be linear.
Future direction 5: Considering how individual differences might impact the way participants engage with the task. We have alluded above to the potential effects of idiosyncratic vocabulary, construals of the valence of the attributes, and perceptions of inter-attribute and inter-aspect compatibility. The scope for research is much broader than this. For example, verbal intelligence, self-reflection and insight, and personality could all plausibly have an impact.
Limitations
In pressing forward to use these indices and the tools to calculate them, researchers should be aware of a number of limitations of both our approach and the trait-sort approach more generally.
Limitations of our approach. The first, and most obvious limitation of our approach is that it is tailored to the use of the trait sort task only, using an on/off approach to assign attributes. In this approach, there is no way to assign attributes by degree, which limits the richness of our understanding of the self-concept. For instance, two self-aspects might have an identical set of attributes, but the aspects may be functionally different owing to the centrality, strength or priority of the attributes assigned. There is already a great wealth of literature in social psychology that unpacks the potential impact of varying levels of centrality or importance of aspects of the self (e.g. Leach et al., 2008; Settles, 2004).
The second limitation is a practical issue. Where the original task of sorting hard-copy attribute cards into categories allowed for an infinite possible number of self-aspects (and this is typically reflected in the instructions for the task), moving the trait-sort online and building computational tools requires a limit on the number of self-aspects. In our data collection vector, while the number of labels a participant can provide is notionally “infinite”, there is an element of experimental demand in how many possible label slots are displayed on the screen at once. In the computational tools, we have set the limit arbitrarily high, but it is not infinite. It is possible this approach artificially reduces both the number of aspects and, consequently, the complexity of self-concepts reported by participants.
Limitations of the approach more generally. There are also three key issues associated with the trait-sort/self-complexity approach more broadly. These issues are “baked in” to the approach and require research and analysis to delineate and address their impact. First, the compartmentalisation index (Phi) requires the experimenter to ascribe the positivity and negativity of the attributes. For example, “intelligent” is coded as positive, and “incompetent” as negative. These valences are then used to calculate how positive and negative attributes are distributed across self-aspects. This approach, however, fails to account for the subjectivity of such valuations. It is not difficult to imagine a participant who might think that “lazy” is positive or “confident” is negative. In this case, the valence of the self-perception is misrepresented by the use of experimenter-coded Phi. This potential mismatch requires addressing, as there is a demonstrated link between the self-perceived positivity of an aspect of the self and its impacts on wellbeing (e.g. Bentley et al., 2019 address this in the context of social identities).
Second, the attributes are treated as orthogonal. Calculations of overlap and the flow-on calculations of complexity (as either dimensionality, H, or Sakaki’s SC) rely on measuring the degree to which the same attribute is assigned across self-aspects and do not consider that the list of attributes contains a significant degree of semantic overlap. For example, a self-aspect described only as “friendly” would be treated as entirely non-overlapping with one described only as “outgoing”. Considerable data on similarity and compatibility among self-aspects is lost in not considering the degree to which, and ways in which, these attributes might be related. Given the documented impact of compatibility among self-perceptions on wellbeing and success (e.g. Iyer et al., 2009; Rosenthal et al., 2011; Smyth et al., 2019), this lost data is a considerable missed opportunity.
Third, the labels assigned to each self-aspect are typically not analysed. Practically speaking, this is because a participant is free to use whatever label they like (“me 1”, “me 2” are entirely acceptable), limiting the scope for analysis. A recent paper (Banas & Smyth, 2021), however, demonstrates the possible utility in including this participant-provided information in a nuanced understanding of the self-concept.
Conclusions
We provide here a tutorial for calculating the indices associated with the trait-sort task used in the self-complexity literature. We have provided a written explanation, a worked example and a sample dataset, as well as links to existing analysis tools. We do this with a view to stimulating a greater volume of research in this area, in the hope of achieving two goals. First, we hope to facilitate replication studies to resolve the relationships between self-concept complexity and wellbeing. Second, we hope to scaffold new uses, new data and the answering of new questions in this area.
Contributions
Contributed to conception and design: KM, LS, KB
Contributed to acquisition of data: KB, LS
Contributed to analysis and interpretation of data: KB, LS
Drafted and/or revised the article: LS, KB, KM
Approved the submitted version for publication: LS, KB, KM
Acknowledgements
We would like to acknowledge Mr James Li for contributions to data collection for the sample data set.
We would like to acknowledge two anonymous reviewers for their constructive feedback, including some highly valuable suggestions for future directions for this research, which were added to our discussion.
Funding
This work received no specific funding.
Competing Interests
The authors declare no competing interests.
Data Accessibility Statement
All data and code used in this tutorial are available on OSF.
Sample sort: https://osf.io/zt7an/
Sample data set: https://osf.io/4ve62/
Footnotes
Readers wishing to practise these calculations on a range of sample sorts can refer to Supplementary Materials A for a list of published sample trait sorts with calculated indices. Participant ID for the case presented here: R_2wmF4JXKNx6McGd
NB: symbols denoting valence of attributes are not visible to participants and are included here for the reader’s reference in understanding the ensuing calculation steps.