Concurrent and retrospective reports correspond for personality, affect, and coping. The present study described how autonomy, competence, and relatedness components of eudaemonic well-being (EWB) change over days and months and tested correspondences of daily and retrospective reports between and within people. Midlife and older (50-75 years) women (N = 200) completed online diaries daily for 1 week for 9 bursts over 2 years and answered questionnaires at the end of each burst (burst n = 1,529). Multilevel models partialed levels of variance and tested correspondence. Women varied in their daily experiences of EWB but did not vary substantially between bursts. Burst-level diary means and questionnaire responses corresponded between people, but changes within people were less strongly related. The daily, but not monthly, time scale of change is important for capturing within-person changes in EWB. Finding EWB change over months to years may depend on measurement designed to capture medium-term change.
Eudaemonia refers to well-being achieved when people are growing personally, flourishing, using their talents, engaging with their true self, and focusing on their own path, while not allowing societal norms to affect their decision making and behavior (Heintzelman, 2018; Ryan & Deci, 2001). Although dozens of constructs have been included under the umbrella of eudaemonic well-being (EWB), two frameworks defining what constitutes EWB predominate (Heintzelman, 2018). Self-determination theory (SDT) proposed that autonomy, relatedness, and competence comprise the basic human needs for growth and well-being (Martela & Sheldon, 2019; Ryan & Deci, 2001). Psychological well-being (PWB) theory proposed six components: autonomy, environmental mastery, personal growth, positive relations with others, purpose in life, and self-acceptance (Ryff & Keyes, 1995). The present study described the degree of change over days and months and tested the correspondences between autonomy, competence/mastery, and relatedness/positive relations when assessed as concurrent EWB, reported daily, and retrospective EWB, reported every 3 months, both between and within people.
Differences between concurrent and retrospective reports can arise because concurrent assessments yield reports from current experience, whereas retrospective assessments yield reports from cognitive summaries and appraisals (Bolger et al., 2003; Kahneman, 1999; Robinson & Clore, 2002; Schwarz, 2007). Over periods ranging from 2 days to 1 year, aggregate concurrent reports correlated moderately with retrospective reports for personality traits (r = .50), coping (r = .24 - .77), and emotion (r = .13 - .77) (Fleeson & Gallagher, 2009; Mill et al., 2016; Röcke et al., 2011; Smith et al., 1999; Stone et al., 1998; Thomas & Diener, 1990; Todd et al., 2004). The magnitude of correspondence between concurrent and retrospective reports of EWB has not been described, although experiences of autonomy, relatedness, and competence have been proposed to lead to long-term EWB (Prentice et al., 2019). Furthermore, different kinds of well-being change on different time scales; for example, day-to-day variance in affect is greater than that in life satisfaction (Willroth et al., 2020). The appropriate measurement interval and method for EWB, therefore, relies on the time scale of change as well as whether concurrent and retrospective reports correspond (Kuiper & Ryan, 2018).
In the present study, we estimated the time scale of change in EWB and linked concurrent reports of EWB in daily diaries - autonomy, relatedness, and competence - to retrospective reports of EWB. The Daily Activity and Health in the Lives of Adult Women (DAHLiA) study of midlife and older women (age 50 to 75) utilized a longitudinal burst design that included 7 daily reports followed by one retrospective report at each burst. In addition, we performed parallel analyses with hedonic well-being (HWB; e.g., affect) because it is perhaps the best-studied component of well-being with regard to the time scale of change and correspondence between concurrent and retrospective reports (cf., Willroth et al., 2020). Its inclusion served several purposes: to examine whether variability and correspondence in EWB resembled those in HWB; to allow evaluation of discriminant validity for EWB components; and to establish that the findings for HWB were similar to those in previous research, thereby situating the present study in the literature and validating comparison between HWB and EWB.
The purposes of this analysis of DAHLiA data were to:
1. Estimate the levels of variance in concurrent (day, burst, and person) and retrospective (burst and person) EWB and, for the purpose of comparison, distress.
2. Link person- and burst-level concurrent EWB and, for the purpose of comparison, distress to retrospective reports. We predicted that, as was true for personality, coping, and emotion, there would be moderate correspondence.
Finally, studies with frequently repeated measurements (e.g., diary studies) and large-scale surveys necessarily rely on short forms, including single items, to alleviate participant burden and maximize retention. Construct validity of a single-item assessment with regard to a multi-item scale depends on (1) reliability of the item; (2) correspondence between the item and scale in the focal content; and (3) content coverage of the item with regard to the scale. Although adequate reliability and validity of single-item assessments have been demonstrated for other well-being constructs such as self-esteem and life satisfaction (Cheung & Lucas, 2014; Robins et al., 2001), such evidence is not readily available for single-item measures of EWB. Therefore, correspondence between concurrent and retrospective reports was contextualized with analyses examining the validity of the single items.
Materials and Methods
Participants
Participants were 200 women aged 50 to 75 years at study entry (M = 62 years, SD = 6.4). The sample was predominantly white (99%) with the rest African-American (1%) and generally well-educated (M = 16.7 years, SD = 2.3) in the context of a large range (12-22 years).
DAHLiA was originally powered based on the effective sample size (“Neffective”; Snijders & Bosker, 1999), which considers the number of units at each level in light of the intraclass correlation (ICC) of key study variables. The target N was 300, with 2,306 bursts and 16,142 diaries after attrition, yielding .80 power to detect within-person effect sizes between r = .10 and r = .16 and between-person effects of r = .16. However, as the study progressed, retention and compliance were significantly higher than projected. Therefore, the target N was decreased to 200. This design change increased the smallest between-person effect size for which the study was powered at .80 (two-tailed α = .05) from r = .16 to r = .20.
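As a rough illustration of the idea behind an effective sample size, the sketch below uses the standard two-level design-effect formula, N_effective = N × n / (1 + (n − 1) × ICC). The function name and all input values are illustrative assumptions; the exact computation and ICC values used in the DAHLiA power analysis are not reported here.

```python
def effective_n(n_people, obs_per_person, icc):
    """Effective sample size under the two-level design-effect formula."""
    # The design effect grows as repeated observations become more alike.
    design_effect = 1 + (obs_per_person - 1) * icc
    return n_people * obs_per_person / design_effect


# Illustrative inputs only (not the study's power-analysis values):
# 200 people, 9 bursts each, and an assumed ICC of 0.80.
print(round(effective_n(200, 9, 0.80), 1))
```

With an ICC of 0, every observation counts fully (1,800 in this example); with an ICC of 1, repeated bursts add nothing beyond the 200 people.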
DAHLiA was designed to examine the interactions among pain, daily activity, and psychological adjustment and well-being in midlife and older women (Judge et al., 2020; Leger et al., 2020; Salt et al., 2017; Segerstrom et al., 2016). In the United States, this demographic group is particularly likely to experience physical pain (Johannes et al., 2010). Participants from the Kentucky Women’s Health Registry aged 50-75 and living in a seven-county area in Central Kentucky were sent an email invitation to participate. Respondents were further screened for study exclusion criteria: BMI > 40; pacemaker; ongoing treatment for serious heart or other medical conditions; infectious or chronic inflammatory diseases; serious mental disorders; oral, inhaled, or injected corticosteroids in the three months prior to enrollment; severe hypertension (> 200/100 mm Hg), tachycardia, bradycardia, or atrioventricular block; or any medical, neurological, or musculoskeletal condition preventing treadmill exercise (for estimation of cardiovascular fitness). Women who reported physical pain lasting more than 2 months in the registry were oversampled: 37 reported pain in more than one site, 54 reported pain in one site, and 109 reported no pain.
Procedures
Study participation was initiated at a clinic visit in which investigators obtained informed consent, assessed physical parameters, and oriented participants to online daily diaries. Participants were sent links to 7 consecutive online daily diaries at each of 9 bursts, which occurred every 3 months over 2 years. At the end of the burst, interviewers administered questionnaire measures in person. Participants were compensated $50 at the clinic assessment and $25 at each burst, with a $25 bonus for completing all 7 diaries between 8 pm and 2 am. All procedures were approved by the University of Kentucky Institutional Review Board.
Measures
Demographics. Participants reported their age, race/ethnicity, and education at the clinic visit.
EWB. Daily autonomy was measured with the item “I felt free to decide for myself”, adapted from the Basic Need Satisfaction scale (Johnston & Finney, 2010). Competence was measured with the item “I felt competent and capable in my activities” and relatedness with the item “I felt close and connected to others” (from Reis et al., 2000). The response scale for all items ranged from 1 (not at all) to 5 (very much). Single-item measurement raises concern about reliability. The intraclass correlation (ICC) provides a lower bound on reliability if within-person variance is greater than 0, because the ICC for single items captures both stability and reliability. If true scores are perfectly stable, reliability is equivalent to the ICC. If there is real change over time (i.e., instability), reliability is higher than the ICC. With ICCs for unaggregated EWB items (over time) in the range of 0.39 – 0.44 (ICC1, Table 1), there was evidence for reasonable measurement reliability with these items. Furthermore, the correspondence analyses were performed using the week-long mean of diary items, which had good reliability in the range of 0.80 – 0.83 (ICC2, Table 1).
Variable | ICC1 | ICC2/1 | ICC3 | 1. DA | 2. DC | 3. DR | 4. DD | 5. QA | 6. QC | 7. QR | 8. QD |
1. Diary Autonomy (DA) | 0.42 | 0.83 | 0.50 | - | 0.56 | 0.39 | -0.32 | 0.00 | 0.13 | 0.00 | -0.09 |
2. Diary Competence (DC) | 0.39 | 0.80 | 0.48 | 0.82 [.77, .86] | - | 0.49 | -0.38 | 0.02 | 0.15 | 0.02 | -0.14 |
3. Diary Relatedness (DR) | 0.44 | 0.83 | 0.53 | 0.61 [.51, .69] | 0.69 [.61, .76] | - | -0.34 | 0.04 | 0.16 | 0.11 | -0.09 |
4. Diary Distress (DD) | 0.37 | 0.65 | 0.57 | -0.49 [-.59, -.38] | -0.56 [-.65, -.46] | -0.51 [-.61, -.40] | - | -0.04 | -0.21 | -0.07 | 0.20 |
5. Questionnaire Autonomy (QA) | - | 0.83 | - | 0.34 [.21, .46] | 0.39¹ [.27, .50] | 0.21 [.07, .34] | -0.27 [-.40, -.14] | - | 0.21 | 0.20 | -0.06 |
6. Questionnaire Competence (QC) | - | 0.87 | - | 0.60² [.50, .68] | 0.74 [.67, .80] | 0.55³ [.44, .64] | -0.57 [-.66, -.47] | 0.47 [.36, .57] | - | 0.37 | -0.32 |
7. Questionnaire Relatedness (QR) | - | 0.83 | - | 0.30⁴ [.16, .42] | 0.44⁵ [.32, .55] | 0.56 [.46, .65] | -0.31 [-.43, -.18] | 0.28 [.15, .41] | 0.55 [.45, .64] | - | -0.22 |
8. Questionnaire Distress (QD) | - | 0.81 | - | -0.47 [-.57, -.36] | -0.59 [-.67, -.49] | -0.52 [-.61, -.41] | 0.60 [.51, .68] | -0.24 [-.37, -.10] | -0.78 [-.83, -.72] | -0.53 [-.62, -.42] | - |
ICC1 = intraclass correlation between any two days in the diary or any two bursts in the questionnaire; ICC2 = intraclass correlation between any two bursts (mean across all days) in the diary; ICC3 = intraclass correlation between any two days within a burst in the diary. Between-person correlations (with 95% confidence intervals in brackets) appear below the diagonal; within-person correlations appear above the diagonal. Differences between correlations were tested following Lee & Preacher (2013): ¹ r(QA,DA) vs. r(QA,DC), t(197) = 1.27, p = .10; ² r(QA,DA) vs. r(QC,DA), t(197) = 4.45, p = .00001; ³ r(QR,DR) vs. r(QC,DR), t(197) = 2.59, p = .005; ⁴ r(QA,DA) vs. r(QR,DA), t(197) = 2.01, p = .023; ⁵ r(QR,DR) vs. r(QR,DC), t(197) = 0.19, p = .43.
The Scales of Psychological Well-Being (SPWB) questionnaire with 14-item subscales (Ryff, 1989) was administered at every burst in the post-diary interview. The present analysis employed the Autonomy subscale (e.g., “My decisions are not usually influenced by what everyone else is doing”), the Environmental Mastery subscale (e.g., “I am quite good at managing the many responsibilities of my daily life”), and the Positive Relations with Others subscale (e.g., “I feel like I get a lot out of my friendships”). The response scale ranged from 1 (strongly disagree) to 6 (strongly agree). Scale reliability was calculated both in terms of whether items covaried between people (across all bursts; RKF) and within people (between bursts; RC) (Cranford et al., 2006). All three subscales were reliable between people across bursts (Autonomy, 0.98; Environmental Mastery, 0.99; Positive Relations, 0.99). Reliability within people was lower, particularly for Autonomy, suggesting that Autonomy items did not covary as they changed over time within people (Autonomy, 0.18; Environmental Mastery, 0.57; Positive Relations, 0.55).
These measures were derived from SDT and PWB perspectives, but the theoretical constructs align. For SDT and PWB respectively, autonomy is defined as the need to “self-organize experience and behavior and to have activity be concordant with one’s integrated sense of self” (Ryan & Deci, 2001, p. 213) and as “self-determining and independent; able to resist social pressures…; regulat[ing] behaviors from within; evaluates self by personal standards” (Ryff, 1989, p. 1072). Competence/mastery is defined as “a propensity to have an effect on the environment as well as to attain valued outcomes within it” (Ryan & Deci, 2001, p. 213) and as “a sense of mastery and competence in managing the environment; controls complex array of external activities; makes effective use of surrounding opportunities; able to choose or create contexts suitable to personal needs and values” (Ryff, 1989, p. 1072). Relatedness/positive relations is defined as feeling “connected to others – to love and care, and to be loved and cared for” (Ryan & Deci, 2001, p. 213) and as “has warm, satisfying, trusting relationships with others; is concerned about the welfare of others; capable of strong empathy, affection, and intimacy; understands give and take of human relationships” (Ryff, 1989, p. 1072). Although PWB dimensions can have broader theoretical scope, the core of each component of EWB aligns. Cross-sectional correlations between PWB and SDT measures among college students provide support for this assertion (Church et al., 2013; Johnston & Finney, 2010).
Distress. In the diary, depression and anxiety items from the Patient-Reported Outcomes Measurement Information System (PROMIS; Cella et al., 2007) measured distress (e.g., “I felt worthless”, “I felt depressed”, “I felt uneasy”, “I felt fearful.”) Due to a clerical error, only 7 of the 8 items were included in the diary. The response scale ranged from 1 (never) to 5 (always). The negative affect items were reliable between people (0.99) and within people (across days; 0.85).
The 30-item Geriatric Depression Scale (GDS; Yesavage et al., 1982) was administered at every burst in the post-diary interview. The GDS was designed to measure affect in older adults without confounding from physical symptoms. Twenty yes-no items reflect distress (e.g., “Do you often feel downhearted and blue?”; “Do you frequently get upset over little things?”), and 10 items reflect well-being and are reversed in scoring (e.g., “Are you hopeful about the future?”; “Do you feel happy most of the time?”). Because GDS responses were dichotomous, reliability was calculated on 5 parcels of 6 items. To account for allocation variability, 150 datasets with random allocation of items to parcels were created, reliability was calculated in each dataset, and the mean and range across the 150 datasets were obtained (Sterba & MacCallum, 2010; Sterba & Rights, 2016). The GDS reliability was very high between people across bursts (mean RKF = 0.99, range = 0.99 – 0.99) but lower within people between bursts (mean RC = 0.48, range = 0.42 – 0.52).
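The random allocation step described above can be sketched as follows. This covers only the assignment of the 30 items to 5 parcels of 6, repeated 150 times; the reliability computation itself is not shown. The function name, the use of item numbers 1 to 30, and the seeding scheme are illustrative assumptions.

```python
import random


def random_parcels(n_items=30, n_parcels=5, seed=None):
    """Randomly split item numbers 1..n_items into equally sized parcels."""
    rng = random.Random(seed)
    items = list(range(1, n_items + 1))
    rng.shuffle(items)
    size = n_items // n_parcels
    # Consecutive slices of the shuffled list form the parcels.
    return [sorted(items[i * size:(i + 1) * size]) for i in range(n_parcels)]


# One allocation per dataset; seeds are arbitrary and only make the run repeatable.
allocations = [random_parcels(seed=s) for s in range(150)]
print(allocations[0])
```

In each allocation, a parcel score would be the sum of its six dichotomous items, and reliability would then be estimated on the five parcel scores before summarizing across the 150 allocations.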
Data analysis
All measures were converted to percent of maximum possible, which facilitates comparison and interpretation of model parameters and is preferable to standardization for longitudinal models (Moeller, 2015). Sources of variance and correspondence between diary and interview measures were analyzed in multilevel models (MLM) using the lmer function of the lme4 package in R (3.6.1; see Reproducibility Statement in the supplemental online material for packages, versions, and references). Sources of variance were estimated in “empty” models using REML estimation. For diary measures, person was at the top level, burst was at the middle level, and day was at the bottom level. For questionnaire measures, person was at the top level and burst was at the bottom level. For example, for diary autonomy measured on day i during burst j for person k:
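$$DA_{ijk} = \gamma_{000} + V_{00k} + U_{0jk} + e_{ijk}$$
The variances of $e_{ijk}$, $U_{0jk}$, and $V_{00k}$ represent the amount of variance at the day, burst, and person levels of the model, respectively. (The model for questionnaire measures yielded only variances of $e_{ij}$ and $U_{0j}$.)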
Another way of expressing variance components is the intraclass correlation (ICC). For questionnaire measures, which had two levels, the ICC is the ratio of person-level variance to person- and burst-level variance and represents the correlation between any two bursts within a person. For diary measures, which had three levels, there are multiple ICCs. ICC1 is the ratio of person-level to person-, burst-, and day-level variance and represents the correlation between any two days within a person. ICC2 is the ratio of person-level to person- and burst-level variance and represents the correlation between any two bursts within a person (when burst is represented as mean across days). ICC3 is the ratio of person- and burst-level variance to person-, burst-, and day-level variance and represents the correlation between any two days within a person and burst.
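The three diary ICCs defined above follow directly from the variance components. The sketch below uses illustrative variance values (not model output), chosen so that the results land close to the diary autonomy ICCs in Table 1; the function and argument names are assumptions.

```python
def three_level_iccs(var_person, var_burst, var_day):
    """Return (ICC1, ICC2, ICC3) for a day-within-burst-within-person design."""
    total = var_person + var_burst + var_day
    icc1 = var_person / total                     # any two days within a person
    icc2 = var_person / (var_person + var_burst)  # any two burst means within a person
    icc3 = (var_person + var_burst) / total       # any two days within the same burst
    return icc1, icc2, icc3


# Illustrative components expressed as percentages of total variance.
icc1, icc2, icc3 = three_level_iccs(var_person=42, var_burst=8, var_day=50)
print(f"ICC1 = {icc1:.2f}, ICC2 = {icc2:.2f}, ICC3 = {icc3:.2f}")
```

Because ICC1 and ICC3 differ only by the burst-level variance in the numerator, a small burst component keeps the two values close together.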
For estimating correspondence between diary and questionnaire measures, two-level MLM were constructed for burst i and person j with ML estimation and specifying a random intercept and random slope for time. A random time slope imposed a weak autoregressive covariance structure (vs. AR(1), which imposes a stronger autoregressive structure). The model with random intercept and time slope was significantly better by likelihood ratio test than a random intercept-only model, which imposed a compound symmetric covariance structure in which time interval does not affect covariance between bursts. The fixed slope for time was insubstantial and not included in the fixed portion of the model.
The explanatory diary variables (mean across all diaries [D] within a burst) were person-mean-centered so that there were two diary terms. The mean of means across all bursts reflected between-person individual differences (Level 2), and deviations from that mean at each burst reflected within-person changes (Level 1). The outcome variable was the questionnaire (Q) score at each burst. For example, for burst i and person j:
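$$Q_{ij} = \gamma_{00} + \gamma_{10}(D_{ij} - \bar{D}_{j}) + \gamma_{01}\bar{D}_{j} + U_{0j} + U_{1j}(D_{ij} - \bar{D}_{j}) + U_{2j}\,\mathrm{Time}_{ij} + e_{ij}$$
Here $\bar{D}_{j}$ is the mean of person j's burst-level diary means, and $U_{2j}$ denotes the random time slope described above. The presence of a random slope (reflected by $U_{1j}$) was tested using the likelihood ratio of models with and without random slopes for $D_{ij} - \bar{D}_{j}$, with mixture degrees of freedom. Models with random slopes were indicated (all p < .05) and included. All parameters were evaluated using Kenward-Roger degrees of freedom to correct for bias. Random effects are reported as $\sigma^2$ (i.e., VAR($e_{ij}$)), $\tau_{00}$ (i.e., VAR($U_{0j}$)), and $\tau_{11}$ (i.e., VAR($U_{1j}$)).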
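The person-mean centering that produces the two diary terms can be sketched as below. The record layout, function name, and values are hypothetical; the study's analyses were run in R, so this is only an illustration of the decomposition, not the authors' code.

```python
from collections import defaultdict
from statistics import mean


def person_mean_center(records):
    """Split burst-level diary means into between- and within-person parts.

    records: iterable of (person_id, burst, diary_mean) tuples.
    Returns (person_id, burst, person_mean, deviation) tuples.
    """
    records = list(records)
    by_person = defaultdict(list)
    for person, _burst, value in records:
        by_person[person].append(value)
    person_means = {p: mean(values) for p, values in by_person.items()}
    return [(p, b, person_means[p], v - person_means[p]) for p, b, v in records]


# Hypothetical POMP-scaled diary means for two people over three bursts.
data = [("a", 1, 70.0), ("a", 2, 80.0), ("a", 3, 75.0),
        ("b", 1, 50.0), ("b", 2, 55.0), ("b", 3, 45.0)]
centered = person_mean_center(data)
for row in centered:
    print(row)
```

The person mean enters the model as the Level 2 (between-person) term and the deviation as the Level 1 (within-person) term; deviations sum to zero within each person by construction.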
Finally, the distress model that paralleled the well-being models is shown for comparison. However, GDS scores were better reflected by a zero-inflated Poisson (ZIP) distribution, as 28% of the bursts yielded GDS scores of 0. Therefore, a multilevel ZIP model with MLF estimation was conducted in Mplus. The results of this model were substantively the same as those of the MLM using lmer and are available in the supplemental material.
To probe the validity of the diary EWB items, first, the mean of the diary item (across all days and bursts) was correlated with each of the means of the questionnaire subscale items (across all bursts) and compared with the mean item-total correlation from the questionnaire subscale. Second, two structural equation models were fit: one with the diary item regressed on each subscale item with the weights freely estimated, and one with the diary item regressed on each subscale item with the weights constrained to be equal (Gonzalez et al., 2020). Whether this constraint compromised the model fit was tested using change in χ2 with change in degrees of freedom between the two models. Where model fit was compromised (i.e., p < .05), items with high weights were allowed to have a different constrained weight b from the remainder of the items’ constrained weight a. Finally, setting a to 0 tested whether the diary item was related to all of the scale items or only to a few strongly related items.
Results
Compliance and missing data
Within diary bursts, most data were complete with 7 diary observations (85% for distress, 84% for autonomy, 83% for competence and relatedness), and almost all had 6 or 7 observations (97%). Of the 1800 possible bursts, 1529 were completed (85%). Reasons for burst-level missingness were dropout (too busy, N = 8; moved, N = 4; illness, N = 7; schedule, N = 2), lost to follow-up (N = 20), and death (N = 1).
Variance components
Figure 1 shows the sources of variance in the daily diary and burst-level questionnaire components of autonomy, relatedness, competence, and distress. When measured daily, variance for all four constructs was primarily found at the day and person levels. Note that the day level contains both day variance and measurement error for the single items in the diary. However, correction for unreliability (see ICC results below) suggests that most of the variance at the day level was true change. EWB varied little at the burst level. Distress varied more from burst to burst, particularly in the daily diary. Therefore, women varied substantially from day to day in their experiences of EWB, and they also varied substantially from each other. However, their levels of diary well-being averaged across a week-long burst did not change substantially between bursts. When measured retrospectively at end-of-burst, questionnaire variance for all four constructs was primarily at the person level; women varied substantially from each other but did not change substantially between bursts.
Table 1 shows the descriptive statistics for person-level variables (mean across all days and/or bursts), ICCs, and correlations among the person-level variables. For diary variables, the correlation between any two days within a person (ICC1) was moderate at ~ r = 0.40. Assuming single item reliability of 0.50, reliability-adjusted ICC1s were 0.55 for autonomy (vs. 0.42 uncorrected), 0.52 for competence (vs. 0.39), and 0.58 for relatedness (vs. 0.44) (Wilms et al., 2020). It is therefore reasonable to conclude that approximately half of the variance was due to person, with the remainder due to changes among bursts and days.
For diary variables, the correlation between any two days within a burst (ICC3) was also modest at approximately r = .50 and not much higher than the ICC1, as would be expected from the small amount of burst variance (see Figure 1). For both diary and questionnaire variables, the correlation between any two bursts within a person (ICC2 for diary, ICC1 for questionnaire) was high at r > .80 (except for diary distress, r = .65).
Correspondence between concurrent and retrospective well-being: Between- and within-person
Table 1 correlations provide the first evidence for correspondence and convergent validity. Diary and questionnaire components of EWB were positively correlated with each other at the between-person (mean across all assessments) and within-person (using person-mean-centered values to remove between-person variance) levels. Between people, diary and questionnaire competence were most highly correlated (r = .74), and diary and questionnaire autonomy were least highly correlated (r = .34). At the burst level, there was less evidence for correspondence. Within people, changes in diary and questionnaire autonomy were uncorrelated across bursts (r = .00). There were small correlations for competence and relatedness (r = .15 and .11, respectively), and the largest correlation was for distress (r = .20).
To further test correspondence at between-person and within-person levels, burst-level, concurrent diary means were used as explanatory variables with retrospective EWB (SPWB questionnaire) as outcome variables in multilevel models. These models provide inferential statistics for within-person effects in clustered data, which correlations cannot, and also estimate variances and covariances for random effects. Because both variables were expressed as percent of maximum possible (POMP), the estimates reflect the POMP difference in the SPWB questionnaire for each POMP within-person change or between-person difference in the diary mean. Table 2 gives the estimates (γ10 for within-person effects, γ01 for between-person effects in the equations above) with their 95% confidence intervals and p values. All the relationships between diary reports and questionnaire reports were statistically significant, except for burst-level changes in autonomy. The magnitude of these relationships varied; in particular, between-person individual differences corresponded more closely than did within-person changes.
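To make the POMP metric concrete, the sketch below converts raw scores to percent of maximum possible and reads one Table 2 coefficient in those units. The function name and the example scores are illustrative assumptions; the coefficient 0.78 is the between-person estimate for competence in Table 2.

```python
def pomp(score, scale_min, scale_max):
    """Percent of maximum possible: rescale a score to run from 0 to 100."""
    return 100.0 * (score - scale_min) / (scale_max - scale_min)


# Hypothetical raw scores on the two response scales used in the study.
diary_pomp = pomp(4, 1, 5)            # diary item, 1-5 scale
questionnaire_pomp = pomp(4.5, 1, 6)  # SPWB item mean, 1-6 scale
print(diary_pomp, questionnaire_pomp)

# One raw point on the 1-5 diary scale spans 25 POMP units, so a between-person
# coefficient of 0.78 implies a difference of about 19.5 POMP units on the questionnaire.
one_raw_point = pomp(2, 1, 5) - pomp(1, 1, 5)
print(0.78 * one_raw_point)
```

Putting both measures on the same 0 to 100 range is what allows the coefficients to be read as POMP-per-POMP differences despite the different response scales.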
Correspondence between concurrent and retrospective well-being: Validity evidence
Low correlations between diary and questionnaire could reflect poor correspondence between daily and retrospective reports but also could reflect poor convergent validity either because the two are measuring different constructs (low correlations between the diary item and the questionnaire items; Strauss & Smith, 2009) or because the diary item does not have the breadth of coverage of the questionnaire (the diary item correlates highly with some questionnaire items but does not correlate with others; Cheung & Lucas, 2014). Table 3 shows the between-person correlations between the diary item and each of the questionnaire items.
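How the bracketed intervals in Table 3 were computed is not stated. One common choice, assumed here, is the Fisher z interval for a Pearson correlation. With N = 200, this sketch gives an interval of about .06 to .33 for r = .20, which is in line with the first autonomy entry in Table 3. The function name is an assumption.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def fisher_ci(r, n, level=0.95):
    """Confidence interval for a Pearson correlation via the Fisher z transform."""
    z = atanh(r)                  # transform r to the z scale
    se = 1 / sqrt(n - 3)          # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return tanh(z - crit * se), tanh(z + crit * se)


low, high = fisher_ci(0.20, 200)
print(f"{low:.2f}, {high:.2f}")
```

Because the interval is symmetric on the z scale, it is asymmetric around r once transformed back, and more noticeably so for larger correlations.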
For autonomy, correlations between the diary and the questionnaire items were lower than the mean item-total correlation within the questionnaire. To gauge overall coverage, a structural equation model in which the diary item was regressed on each questionnaire item with weights freely estimated was compared with a model in which all weights were constrained to be equal. Although the construct validity for autonomy was low, constraining all weights to be equal did not significantly compromise model fit (χ2 (13) = 17.3, p = .19). This pattern generally indicates low construct convergence but equal breadth of coverage.
Construct validity for competence was better, with diary correlations resembling the mean item-total correlation. However, constraining all weights to be equal significantly compromised model fit (χ2 (13) = 24.4, p = .029). The weights for two questionnaire items, #1 “In general, I feel I am in charge of the situation in which I live” and #4 “I am quite good at managing the many responsibilities of my daily life”, were higher than the rest. Constraining these two weights to be equal but different from the remainder of the weights (which were all constrained to be equal) did not significantly compromise model fit compared with the freely estimated model (χ2 (12) = 15.2, p = .23) and improved fit compared with the fully constrained model (χ2 (1) = 9.0, p = .0027). Constraining the remaining weights to 0 compromised model fit compared with this third model (χ2 (1) = 17.9, p < .0001). Although the weights were smaller with the remainder of items, they were necessary. This pattern generally indicates good construct convergence for the diary item, with coverage favoring some content but spread across the questionnaire items.
Outcome | Questionnaire Autonomy | | | Questionnaire Competence | | | Questionnaire Relatedness | | | Questionnaire Distress | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Predictors | Estimates | CI | p | Estimates | CI | p | Estimates | CI | p | Estimates | CI | p |
(Intercept) | 69.12 | 67.50, 70.74 | <0.001 | 74.91 | 73.70, 76.12 | <0.001 | 76.53 | 75.16, 77.89 | <0.001 | 9.17 | 8.07, 10.28 | <0.001 |
D Mean | 0.30 | 0.19, 0.41 | <0.001 | 0.78 | 0.68, 0.88 | <0.001 | 0.39 | 0.31, 0.47 | <0.001 | 1.01 | 0.83, 1.19 | <0.001 |
D Change | 0.01 | -0.02, 0.04 | 0.631 | 0.08 | 0.04, 0.12 | <0.001 | 0.05 | 0.03, 0.08 | <0.001 | 0.22 | 0.14, 0.30 | <0.001 |
Random Effect Estimates | | | | | | | | | | | | |
σ² | 26.26 | | | 20.29 | | | 18 | | | 18.72 | | |
τ₀₀ Intercept | 132.48 | | | 83.72 | | | 102.25 | | | 69.12 | | |
τ₁₁ D change slope | 0.00 | | | 0.02 | | | 0.00 | | | 0.05 | | |
τ₁₁ Time slope | 0.65 | | | 0.34 | | | 0.59 | | | 0.24 | | |
ρ₀₁ Intercept-D change | 1.00 | | | -0.08 | | | -0.97 | | | 0.16 | | |
ρ₀₁ Intercept-Time | -0.12 | | | -0.40 | | | -0.30 | | | -0.38 | | |
CI = 95% confidence interval; D = diary.
Questionnaire Subscale Item Number | Diary Autonomy “I felt free to decide for myself” | Diary Competence “I felt competent and capable in my activities” | Diary Relatedness “I felt close and connected to others” |
1 | .20 [.06, .33] | .66 [.57, .73] | .46 [.34, .56] |
2 | .23 [.09, .36] | .58 [.48, .67] | .51 [.40, .61] |
3 | .26 [.13, .38] | .36 [.23, .48] | .52 [.41, .62] |
4 | .36 [.23, .47] | .71 [.64, .78] | .42 [.30, .53] |
5 | .26 [.13, .39] | .57 [.47, .66] | .31 [.18, .43] |
6 | .24 [.11, .37] | .48 [.36, .58] | .50 [.39, .60] |
7 | .24 [.10, .37] | .45 [.33, .55] | .43 [.31, .54] |
8 | .17 [.03, .30] | .57 [.47, .66] | .43 [.31, .54] |
9 | .32 [.19, .44] | .63 [.54, .71] | .46 [.34, .56] |
10 | .27 [.14, .40] | .60 [.50, .68] | .48 [.37, .58] |
11 | .21 [.08, .34] | .62 [.53, .70] | .45 [.33, .55] |
12 | .24 [.11, .37] | .44 [.32, .54] | .35 [.22, .47] |
13 | .31 [.18, .43] | .61 [.52, .69] | .32 [.19, .44] |
14 | .33 [.20, .45] | .51 [.40, .60] | .29 [.16, .41] |
Mean Item-Total Correlation | .53 | .52 | .55 |
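The bracketed intervals in the table are consistent with 95% confidence intervals from the Fisher z transformation with N = 200. The source does not state how the intervals were computed, so the method and the sample size used here are assumptions; the sketch only shows the arithmetic for one entry.

```python
import math

def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for a Pearson correlation via the Fisher z transformation."""
    z = math.atanh(r)              # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)    # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# Item 1 with the diary competence item, assuming N = 200.
low, high = fisher_ci(0.66, 200)
print(f"r = .66, 95% CI [{low:.2f}, {high:.2f}]")
```

Under these assumptions the interval for r = .66 rounds to the tabled [.57, .73], and the same holds for r = .20 and [.06, .33].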
Correspondence for relatedness was also generally good, with correlations approaching the mean item-total correlation. Constraining all weights to be equal did not significantly compromise model fit (χ2 (13) = 22.0, p = .056). However, the weights for two questionnaire items, #2 [reversed] “Maintaining close relationships has been difficult and frustrating for me” and #3 [reversed] “I often feel lonely because I have few close friends with whom to share my concerns”, were higher than the rest. Constraining these two weights to be equal but different from the remainder of the weights (which were all constrained to be equal) did not significantly compromise model fit compared with the freely estimated model (χ2 (12) = 16.1, p = .18) and improved fit compared with the fully constrained model (χ2 (1) = 5.8, p = .016). Constraining the remaining weights to 0 compromised model fit compared with this third model (χ2 (1) = 7.5, p = .0062). Although the weights were smaller with the remainder of items, they were necessary. This pattern generally indicates good convergent validity, with coverage favoring some content but spread across the questionnaire items.
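The nested-model comparisons above are chi-square difference tests, and the reported p values can be checked from the statistics and degrees of freedom alone. The sketch below is a standard-library calculation of the chi-square upper-tail probability for integer degrees of freedom; it is a check on the arithmetic, not the authors' analysis code, and the comparison labels are paraphrases.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df."""
    y = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum.
        return math.exp(-y) * sum(y ** k / math.factorial(k) for k in range(df // 2))
    # Odd df: start from the df = 1 tail and add the half-integer terms.
    tail = math.erfc(math.sqrt(y))
    extra = sum(y ** (k - 0.5) / math.gamma(k + 0.5)
                for k in range(1, (df - 1) // 2 + 1))
    return tail + math.exp(-y) * extra

# Statistics as reported for the relatedness item weights.
reported = [
    ("all weights equal vs. freely estimated", 22.0, 13),
    ("two-weight model vs. freely estimated", 16.1, 12),
    ("two-weight model vs. all weights equal", 5.8, 1),
    ("remaining weights fixed to 0 vs. two-weight model", 7.5, 1),
]
for label, stat, df in reported:
    print(f"{label}: chi2({df}) = {stat}, p = {chi2_sf(stat, df):.4f}")
```

Because the printed statistics are rounded to one decimal place, the recomputed p values can differ slightly in the last reported digit.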
Table 1 correlations also speak to discriminant validity. Diary components of EWB were negatively correlated with distress between people, and correlations among components were larger than those with distress (all p < .05, except the difference between r(comp,rel) and r(comp,distress), p = .056). Therefore, there is evidence for discriminant validity of EWB components with regard to distress for the diary items. For questionnaire components, all components correlated negatively with distress. However, the correlation between components was larger than that with distress only for the difference between r(auto,comp) and r(auto,distress). In the remaining cases, either the difference was not statistically significant, or the correlation between the EWB component and distress was significantly larger in magnitude than the correlation between that component and another component. Therefore, although correlations with distress were in the expected direction, there was more evidence against discriminant validity in the questionnaire than for it when the magnitude of the relationships was considered.
Among EWB components across diary and questionnaire, evidence was worst for autonomy: diary autonomy correlated significantly more highly with questionnaire competence than with questionnaire autonomy between people, and the same pattern held within people, although all within-person discriminant validity correlations were small. Questionnaire autonomy was also not more highly correlated with diary autonomy than with diary competence. Discriminant validity of competence and relatedness was better, as the convergent correlations were generally higher than the discriminant correlations. However, between people, although diary relatedness was more highly correlated with questionnaire relatedness than was diary competence, the relationship between diary relatedness and questionnaire relatedness was not significantly different from the relationship with questionnaire competence. In addition, the high correlations among diary EWB measures also speak to a lack of discrimination among them.
Discussion
When midlife and older women reported their EWB daily for 7 days over 9 bursts at 3-month intervals, the vast majority of the variance was associated with daily changes and between-person differences, and a minority of variance was associated with changes between bursts (Figure 1). Intraclass correlations between days within a person were moderate (ICC1 across all days and bursts, ICC3 across all days within bursts), and correlations across bursts were high (ICC2; Table 1). When EWB was reported retrospectively at each burst, the results were similar: variance in between-person differences was larger than that in burst-level change (Figure 1), and correlations across bursts were high (ICC1; Table 1).
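The idea of partitioning variance into person, burst, and day levels can be sketched with a simple moment-based (nested ANOVA) estimator on simulated, fully balanced data. This is a stand-in for illustration, not the multilevel models used in the study; the variance values, the mean of 75, and the function names are arbitrary assumptions chosen only to mimic the qualitative pattern of small burst-level variance.

```python
import random
from statistics import mean

def simulate(n_people=200, n_bursts=9, n_days=7,
             person_var=100.0, burst_var=5.0, day_var=35.0, seed=1):
    """Simulate balanced diary data: data[person][burst][day]."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_people):
        u = rng.gauss(0.0, person_var ** 0.5)          # stable person effect
        person = []
        for _ in range(n_bursts):
            v = rng.gauss(0.0, burst_var ** 0.5)       # burst-level deviation
            person.append([75.0 + u + v + rng.gauss(0.0, day_var ** 0.5)
                           for _ in range(n_days)])    # day-level deviation
        data.append(person)
    return data

def partition_variance(data):
    """Return the share of variance at the person, burst, and day levels."""
    n_people, n_bursts, n_days = len(data), len(data[0]), len(data[0][0])
    burst_means = [[mean(b) for b in person] for person in data]
    person_means = [mean(bm) for bm in burst_means]
    grand = mean(person_means)
    ss_day = sum((y - burst_means[p][b]) ** 2
                 for p, person in enumerate(data)
                 for b, burst in enumerate(person)
                 for y in burst)
    ss_burst = n_days * sum((burst_means[p][b] - person_means[p]) ** 2
                            for p in range(n_people) for b in range(n_bursts))
    ss_person = n_bursts * n_days * sum((m - grand) ** 2 for m in person_means)
    ms_day = ss_day / (n_people * n_bursts * (n_days - 1))
    ms_burst = ss_burst / (n_people * (n_bursts - 1))
    ms_person = ss_person / (n_people - 1)
    # Solve the expected mean squares for the three variance components.
    day = ms_day
    burst = max((ms_burst - ms_day) / n_days, 0.0)
    person = max((ms_person - ms_burst) / (n_bursts * n_days), 0.0)
    total = person + burst + day
    return {"person": person / total, "burst": burst / total, "day": day / total}

shares = partition_variance(simulate())
print({level: round(share, 3) for level, share in shares.items()})
```

With real diary data, missing days and unequal numbers of bursts make the design unbalanced, which is one reason a multilevel model is the more appropriate tool.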
The time scale of EWB change is important for studying within-person phenomena (Kuiper & Ryan, 2018). Among midlife and older women, there was little change in autonomy, competence, and relatedness at the time scale of months. Therefore, within-person psychological, behavioral, and health correlations with EWB may not be observed at that time scale. One reason may be that the SPWB measures of autonomy, competence, and relatedness all had poor reliability for change (Autonomy, 0.18; Environmental Mastery, 0.57; Positive Relations, 0.43; see Method), particularly Autonomy, which was the only component uncorrelated with changes in diary EWB. The SPWB scales were designed to capture between-person differences and not within-person change (Ryff, 1989). As a result, SPWB total and subscale scores are impressively stable. Over the course of several years among older men and women, the ICC1 for the total SPWB score was .83, and the ICC1s for the autonomy and mastery scales were .67 and .64, respectively (Segerstrom et al., 2015; Wettstein et al., 2015). Over a broader age range, the test-retest correlation for the SPWB over 9 years was .63 (Rush et al., 2019). These estimates are not substantially lower than the ICCs obtained over 18 days among older adults, suggesting little decay in stability with increasing time (total, 0.64; autonomy, 0.49; mastery, 0.61; positive relations, 0.76; Saajanaho et al., 2020). A measure of EWB designed specifically to be sensitive to change over months to years might detect more such change. On the other hand, the daily measure also did not change much over months. It did change substantially at the time scale of days, where daily activities and stressors may have contributed to daily changes (Reis et al., 2000). These findings suggest that studies that do not measure EWB daily miss an important level of experience.
For the most part, EWB and distress had similar variance structures. Compared with EWB, diary distress had more burst-level variance (Figure 1; ICC1 < ICC3 in Table 1), but questionnaire distress had similar burst-level variance (Figure 1; ICC1 in Table 1). The higher amount of burst-level, within-person variance in diary distress potentially explains the larger within-person correspondence between diary and questionnaire (Table 2). This result also suggests that women were in fact experiencing psychological change between bursts; they were simply not changing as much in EWB.
At the person level, correlations between aggregated concurrent and retrospective EWB over 1 week resembled those for other psychological variables over similar time frames. Over 1 week, concurrent and retrospective accounts of coping correlated r = .50 to .77 (Smith et al., 1999). Over 2 weeks, concurrent and retrospective accounts of affect correlated r = .31 to .47 (Mill et al., 2016). Over 10 days among older, nontraditional students, concurrent accounts of behavior related to personality traits correlated r = .29 to .52 with self-reported personality traits (Fleeson & Gallagher, 2009, Study 12). In the present study, correlations for competence (r = .74) were at the high end of these ranges, and those for autonomy (r = .34) were at the low end. It may be easier to recollect specific instances of competence and relatedness, but harder to recollect specific instances of autonomous action. Additionally, instances of low autonomy might be infrequent in this largely white and well-educated sample, whose resources might have enabled more autonomy. However, between-person standard deviations were similar for all three EWB components, and within-person standard deviations had similar medians and ranges, indicating no restriction of range between or within people. Autonomy also had the poorest convergent and discriminant validity, particularly with regard to competence. Although the theoretical definitions for SDT and PWB autonomy correspond, the poorer reliability for change in SPWB autonomy and the lower correlations with concurrent reports of autonomy indicate a need for further investigation into the meaning and measurement of this construct.
By contrast, there was more evidence for convergent validity for competence and relatedness. When averaged across all assessments to maximize reliability, the single competence item correlated with the SPWB items at or above the mean item-total correlation among the SPWB items. There was evidence that the item did not relate to all of the content of the scale to the same degree in that two items regarding “the situation in which I live” and “my daily life” were more strongly related than the remainder of the items. These items seem to be more global assessments of competence; however, the single item provided content coverage beyond these two SPWB items. The single relatedness item was correlated with the SPWB items at or below the mean item-total correlation among the SPWB items. Although the model in which the item was related to all SPWB items equally was acceptable, there was evidence that the item did not capture all of the content of the scale to the same degree in that two items regarding “close relationships” and “close friends” were more strongly related than the remainder of the items. This is sensible, given that the diary item refers to feeling “close”. However, the item did provide content coverage beyond these two items. Therefore, correspondence analyses for competence and relatedness can be attributed at least in part to similarities and differences between retrospective and concurrent reports.
There was a surprising proportion of women, about 10% of the sample, who reported maximum EWB or minimum distress across all diaries. One possibility is that this subsample experienced very high well-being on some dimension across the entire study. Another, however, is that these women are psychologically different from those who did not report perfect well-being. Very low scores (0-1) on the Beck Depression Inventory, compared with low scores, are associated with efforts to present oneself in a favorable light (Clark et al., 1998; Joiner et al., 2000). However, most women did not report perfect scores for each of the 4 constructs, suggesting that they differentiated between the constructs and were perhaps not globally engaging in impression management (perfect on 4, n = 3; on 3, n = 6; on 2, n = 14; on 1, n = 22). Repeating the correspondence analyses without scores that were perfect across the study did not substantively change the results reported in Table 2 (see supplemental material).
Methodological strengths of DAHLiA include the burst design, which permitted measurement of well-being at multiple levels as well as variance partition at short-, medium-, and long-term intervals. The study had excellent compliance confirmed by time stamp in the online diary. The study was also powered to detect small to medium effects. However, its greatest methodological weakness is generalizability. Because the study focused on a group likely to experience physical pain (midlife and older women), these results do not necessarily generalize to younger people, men, or both. The sample was also primarily white (99%), which is generally characteristic of the population of the sampling area (94% white), but further limits generalizability. In addition, the older age, high education, and low racial diversity of the present sample might have restricted range in EWB. EWB and education are positively correlated, as are EWB and age (Ryff & Singer, 2008). In one study, minority status positively correlated with EWB (Ryff et al., 2003). Therefore, a more diverse sample might find different distributions of variance in EWB and distress. Furthermore, random slopes for correlated change in EWB components indicated individual differences in correspondence between concurrent and retrospective reports even within this homogeneous sample. These individual differences need to be explained theoretically and empirically (cf. Willroth et al., 2020). Extending these findings in more diverse samples will be useful to understanding when and how EWB differs between people and changes within people.
Conclusion
Personality, affect, and coping have all exhibited modest correspondence between daily or momentary concurrent reports and global, retrospective reports. The present study extended this framework to EWB, obtaining similar results. In addition to testing correspondence, the results point to the presence of short- but not medium-term change in EWB, the need to develop better measures of medium-term change in EWB and of autonomy, and the need to extend these findings to more diverse samples.
Funding
This research was supported by the National Institutes of Health (R01-AG046116, K99-AG056635, UL1TR001998).
Conflicts of Interest
The authors declare that there were no conflicts of interest with respect to the authorship or the publication of this article.
Data Accessibility Statement
All data and code used to generate this report are available at https://osf.io/qtvme/
Author Contributions
S.C. Segerstrom and L.J. Crofford designed the DAHLiA Study. S.C. Segerstrom and T.G. Blevins conceived this study and wrote the manuscript. S.C. Segerstrom and R.G. Reed performed the data analysis. All authors contributed feedback on study conceptualization and the manuscript. All authors approved the final, submitted version.
Author Note
Correspondence concerning this article should be addressed to Suzanne C. Segerstrom, 125 Kastle Hall, Lexington, KY, 40506-0044. E-mail: [email protected]