Cognitive decline often accompanies natural aging, so younger adults outperform older adults, on average, on cognitive tasks that require skills such as attention, memory, or reasoning. This performance gap between age groups persists even after people train on these tasks, but it remains unclear whether the gap persists when individuals, rather than groups, are compared at different levels of training. In this paper, we analyzed 9,923 users between 18 and 90 years old (63% over 60) who performed a variety of cognitive tasks on an online cognitive training platform. We quantified an older adult’s potential to catch up to (i.e., perform as well as) a younger adult. We found that, across a variety of cognitive tasks, the probability of catching up to someone decades younger increases when older adults train more than younger adults. These findings suggest that age-related performance deficits can be overcome with additional training.
Introduction
“Practice makes perfect” is an old adage which applies to people of all ages. Whether someone is learning something for the first time, or rehearsing a skill once learned but unused for some time, practice or training can help people improve their current ability.
Older adults, in particular, have shown improvement and maintenance of new skills after training (Baltes et al., 1986; Dahlin et al., 2008). In some cases, such as on collaborative, visual discrimination, or implicit learning tasks, older adults and younger adults perform equally well after training (Derksen et al., 2014; Myers & Conner, 1992; Ratcliff et al., 2006). However, for other tasks, such as those that require memory, response inhibition, or task-switching, a noticeable performance gap exists. Despite potentially starting out at the same performance level, younger adults often outperform older adults over time (Dahlin et al., 2008; Davidson et al., 2003; Karbach & Kray, 2009; Kliegl et al., 1989).
Can training for a longer period of time help an older person improve enough to close such performance gaps? Baltes and Kliegl (1992) had younger and older adults train on a free-recall memory task for 38 sessions over a year, 18 more than in their previous study (Kliegl et al., 1989), and found that younger adults continued to outperform older adults. They also found that the older group never reached the level of performance that the younger group showed near the beginning of training. Similarly, Noack et al. (2013) found that younger adults outperformed older adults on a spatial and temporal memory task after training for 100 daily sessions. These results suggest that an average older adult is unlikely to reach the same level of performance as someone several years younger. However, lab training studies rarely last longer than 12 weeks, mainly due to resource limitations (Lampit et al., 2014; Nguyen et al., 2019). Perhaps this is not enough time for older adults to close the performance gap between themselves and younger adults.
We can circumvent the resource issues of traditional lab studies by using naturally occurring data that was collected online (Goldstone & Lupyan, 2016; Griffiths, 2015). In particular, we can use data from online cognitive training platforms to investigate the effect of extended training. One platform which provides this kind of data is Lumosity. Lumosity has a collection of more than 50 engaging games, some of which are based on common tasks used in lab studies, that target various cognitive processes. Each game is categorized by the cognitive domain being trained, such as attention or memory, and each gameplay lasts a few minutes. While Lumosity has yet to prove that training improvements transfer to other tasks outside of Lumosity (Simons et al., 2016), the detailed data collected on the platform is helpful for understanding how people learn and improve on these games.
Previously, Steyvers et al. (2019) used data from Lumosity to examine the effects of extensive practice on task switching. They found that a small sample of older adults who played a task-switching game over 1,000 times were able to match or exceed the performance of younger adults who played up to 60 times. Although this study demonstrates that older adults can bridge the performance gap with their younger counterparts, the extent of training needed and the degree of benefit gained remain unresolved questions.
In this paper, we investigate how much training older adults need to catch up to younger adults on a variety of cognitive tasks. We define “catch up” as an older adult matching or exceeding the score of a younger adult on one of the Lumosity games. Using the Lumosity data set from Steyvers and Schafer (2020), we analyze data from 9,923 users between 18 and 90 years old on 57 different games to examine catch up in different training situations.
One way catch up could occur is if younger adults’ performance reaches an asymptote much earlier in training than older adults’ performance does, but older adults eventually reach the same asymptote after extended training. To test this possibility, we, like previous studies, examine the scenario in which older and younger adults train for the same amount of time. However, our primary focus is the extent to which catch up occurs when older adults train longer than younger adults. Many older adults in our sample lag behind younger adults at the beginning of training, but by the time 200 games have been played, some older adults’ performance surpasses that of younger adults at an earlier point in training. This scenario is another way that catch up can occur.
The data set used in our analyses is well suited to address catch up by older adults for two main reasons: it contains data from thousands of users, 63% of whom are over the age of 60, and the training data span several years, with many users training on the same game well over a hundred times. However, as with any naturally occurring data, many users also drop out before training for very long. Since dropout is related to performance (Steyvers & Benjamin, 2018), we address this issue and its impact on the generalizability of our findings later in the paper. Despite this limitation, the data set enables us to accurately assess the degree of benefit that additional training imparts to various age groups among those who have trained for an extended period.
Methods
Participants
The data used for analysis is the same as that analyzed in Steyvers and Schafer (2020), which contained 36,297 English-speaking Lumosity users located in the United States, Canada, or Australia who primarily used the web version (as opposed to the mobile app). These users signed up between August 1st, 2013 and December 31st, 2016, and the data was collected between August 1st, 2013 and June 30th, 2019 (see Supplementary Information for further information on this data set). A subset of 9,923 Lumosity users between the ages of 18 and 90 at signup (under 40: 360 males, 239 females, 55 gender unavailable; 40-59: 1210 males, 1504 females, 276 gender unavailable; 60-79: 1989 males, 3056 females, 632 gender unavailable; 80 and over: 216 males, 311 females, 75 gender unavailable) were included in our analyses. No racial data was available. Of these users, 15% had a high school diploma or completed some high school, 19% had completed some college, 25% had a bachelor’s degree, 3% had an associate’s degree, 25% had a postgraduate degree, and the rest declined to specify their education level. Users were included in the subset if they had played at least one of Lumosity’s games at least 100 times. We chose 100 gameplays to ensure that users had trained for an extended period of time and to avoid noise in the data caused by dropout, that is, by users who play for a bit and then stop playing completely (Steyvers & Benjamin, 2018). The users in our sample played a median of 2,284 games in total. With nearly 10,000 users who each trained extensively, our sample is sufficient for investigating catch up on various tasks across the lifespan.
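The inclusion rule can be sketched as follows. This is a minimal illustration, not the authors’ code: it assumes a hypothetical log with one `(user_id, game)` record per gameplay, and it reads the rule as applying per game, so a user with many plays spread across games but fewer than 100 on every single game is excluded.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

MIN_PLAYS = 100  # inclusion threshold stated in the text


def eligible_users(
    plays: Iterable[Tuple[str, str]], min_plays: int = MIN_PLAYS
) -> Set[str]:
    """Return users who played at least one game `min_plays` times or more.

    `plays` is a hypothetical stream of (user_id, game) records, one per gameplay.
    """
    counts = Counter(plays)  # (user_id, game) -> number of gameplays
    return {user for (user, _game), n in counts.items() if n >= min_plays}
```

For example, a user with 100 plays of Ebb and Flow qualifies, while a user with 60 plays each of two games (120 in total) does not.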
Games
There are 57 games in the data set. The original data set labeled each Lumosity game by the cognitive domain it targets, and we kept these labels to observe trends within domains (Steyvers & Schafer, 2020). The six domains are attention (12 games), flexibility (6 games), memory (21 games), reasoning (7 games), language (6 games), and math (5 games). Previous results have shown that the math and reasoning domains have some internal consistency: scores on games within each of these domains are more correlated with one another than with scores on games in other domains (see Figure 3 of Steyvers & Schafer, 2020). However, the domain labels used by the Lumosity platform do not uniquely describe the cognitive processes involved in each game, as most games involve multiple types of cognitive processes.
Preprocessing
Lumosity games each have a unique scoring system, which is generally based on the user’s speed and accuracy but also involves game-specific factors, so scores from different games are on different scales. To compare performance across games with these different scoring systems, we normalized the scores of each game using a min-max transformation (Han et al., 2011). First, outliers more than 3 standard deviations above the mean were set equal to 3 standard deviations above the mean. We can do this because we care about the relative ranking of the scores rather than their actual values. This capped value becomes the maximum score $x_{\max}$ for the game, so after normalization anyone who achieved this score (or higher) has a normalized score of 1. Normalization then follows the formula $x' = (x - x_{\min}) / (x_{\max} - x_{\min})$, where $x$ is a raw score and $x_{\min}$ is the lowest score in the sample for that game. Scores close to 1 are therefore among the best in the whole sample, while scores close to 0 are among the worst. For example, if Mark plays Ebb and Flow and scores 21,400, the best score achieved by someone in our sample in that game is 35,000, and the lowest score is 1,000, then Mark’s score is normalized to (21,400 - 1,000) / (35,000 - 1,000) = 0.6. This means that his score lies 60% of the way from the lowest to the highest score in the sample; it does not by itself indicate what fraction of users he outscored.
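A minimal sketch of the clipping and min-max steps for a single game is shown below. Two details are assumptions not stated in the text: the standard deviation is the population SD (`statistics.pstdev`), and the game is assumed to have more than one distinct score so that the denominator is nonzero.

```python
import statistics
from typing import List, Sequence


def clip_upper_outliers(scores: Sequence[float], n_sd: float = 3.0) -> List[float]:
    """Cap scores lying more than `n_sd` standard deviations above the mean."""
    cap = statistics.mean(scores) + n_sd * statistics.pstdev(scores)
    return [min(s, cap) for s in scores]


def min_max_normalize(score: float, lo: float, hi: float) -> float:
    """Map a score onto [0, 1] given the game's minimum and (capped) maximum."""
    return (min(score, hi) - lo) / (hi - lo)


def normalize_game(scores: Sequence[float]) -> List[float]:
    """Clip upper outliers, then min-max normalize all scores from one game."""
    clipped = clip_upper_outliers(scores)
    lo, hi = min(clipped), max(clipped)
    return [min_max_normalize(s, lo, hi) for s in clipped]
```

With the numbers from Mark’s example, `min_max_normalize(21400, 1000, 35000)` gives 0.6. Clipping only the upper tail preserves the ranking of all scores below the cap, which is the property the analysis relies on.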
After applying the transformation to all game scores in the data set, we smoothed each user’s learning curve on each game so that scores would more accurately reflect the user’s current performance level and our results would be less susceptible to temporary score fluctuations. Finally, we added each user’s age at the time of gameplay (computed from their age at signup and the time elapsed since signup) to each of their gameplay records. Further details are in the Supplementary Information.
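The two remaining preprocessing steps can be sketched as below. The smoothing method actually used is described in the Supplementary Information, not here, so the trailing moving average (window of 5) is a hypothetical stand-in; the 365.25-day year used to convert elapsed days to years is likewise an assumption.

```python
from datetime import date
from typing import List, Sequence

DAYS_PER_YEAR = 365.25  # assumption: average year length including leap years


def age_at_gameplay(age_at_signup: float, signup: date, played: date) -> float:
    """Age at gameplay = age at signup + years elapsed since signup."""
    return age_at_signup + (played - signup).days / DAYS_PER_YEAR


def moving_average(scores: Sequence[float], window: int = 5) -> List[float]:
    """Hypothetical smoother: trailing mean of a user's scores on one game.

    Each smoothed value averages the current score and up to `window - 1`
    preceding scores, so it only uses gameplays that have already happened.
    """
    smoothed = []
    for i in range(len(scores)):
        recent = scores[max(0, i - window + 1): i + 1]
        smoothed.append(sum(recent) / len(recent))
    return smoothed
```

A trailing window is used here so that each smoothed score reflects only performance up to that gameplay; a centered window or a model-based smoother would be equally reasonable choices.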
Data Analysis Approach
We conducted two different analyses which together help us answer our questions related to catch up by older adults. The first is a group-level analysis which looks at the learning curves for each age group. The second analysis calculates catch up probabilities at an individual level. At the outset, we should note that none of the analyses involve curve fitting or computational modeling. Previous research has investigated the learning trajectories of individual users on the Lumosity platform using exponential and power law functions that only consider the amount of practice (Donner & Hardy, 2015; Steyvers & Benjamin, 2018) as well as more complex computational models that also take into account the effect of spacing and retention (Kumar et al., 2022). Given the aims of the current data analysis, we are not focused on explaining the underlying functional form of the learning process and instead use a much simpler approach of comparing performance levels at different levels of training.
Group performance analysis
To visualize the learning curves of each age group at different levels of training, we grouped users into age bins that were mostly five years wide. For example, one bin contained users 40-44 years old while the next contained users 45-49 years old. However, five-year bins were used only for ages between 40 and 89; bins at the extreme ends of the age range were merged because of insufficient data, yielding an 18-29 bin, a 30-39 bin, and a 90-95 bin. This binning procedure applies only to the data presented in Figure 1, which shows the mean performance for each of these age bins. All other results presented in the text follow the analysis procedure explained in the next section.
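The binning rule for Figure 1 can be written out as a short sketch. The `(age, gameplay_number, score)` record format is hypothetical, and the sketch simply averages scores within each bin at each gameplay number, as described above.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def figure1_bin(age: float) -> str:
    """Age bins for the group learning curves: mostly five years wide."""
    if age < 18:
        raise ValueError("the sample starts at age 18")
    if age < 30:
        return "18-29"
    if age < 40:
        return "30-39"
    if age >= 90:
        return "90-95"
    lower = int(age // 5) * 5  # 40-44, 45-49, ..., 85-89
    return f"{lower}-{lower + 4}"


def mean_learning_curves(
    records: Iterable[Tuple[float, int, float]]
) -> Dict[Tuple[str, int], float]:
    """Mean normalized score per (age bin, gameplay number).

    `records` holds hypothetical (age, gameplay_number, score) tuples.
    """
    groups: Dict[Tuple[str, int], List[float]] = defaultdict(list)
    for age, play, score in records:
        groups[(figure1_bin(age), play)].append(score)
    return {key: mean(scores) for key, scores in groups.items()}
```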
Catch up analysis
Our main analysis focused on catch up, that is, whether an older adult who has trained for a while can match or exceed the performance of a younger adult. For each game, we computed catch up probabilities for pairs of age groups. To calculate this probability, we made all individual pairwise comparisons between the two age groups: for each adult in the younger group, we calculated the proportion of older adults who had a higher score than that younger adult, and we then averaged these proportions over the younger group. Depending on the game and age groups, the number of pairwise comparisons ranged from 160 to over 4.6 million. The resulting probability is akin to the probability of randomly sampling one individual from each age group and observing that the older individual has a higher score. We chose to look at catch up in this manner, rather than comparing group mean performance, because there is considerable individual variability among people in the same age group.
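The pairwise computation can be sketched as follows; this is an illustration, not the authors’ implementation. Sorting the older group’s scores once turns each per-younger-adult count into a binary search, which keeps millions of comparisons cheap, and a brute-force version is included as a reference. The sketch counts strictly higher scores, as the paragraph above describes.

```python
from bisect import bisect_right
from typing import Sequence


def catch_up_probability(older: Sequence[float], younger: Sequence[float]) -> float:
    """Proportion of (older, younger) pairs in which the older adult scores higher.

    For each younger adult, count the older adults with a strictly higher
    score, convert the count to a proportion, and average the proportions
    over the younger group.
    """
    if not older or not younger:
        raise ValueError("both age groups need at least one score")
    older_sorted = sorted(older)
    n_old = len(older_sorted)
    total = 0.0
    for y in younger:
        n_higher = n_old - bisect_right(older_sorted, y)  # older scores > y
        total += n_higher / n_old
    return total / len(younger)


def catch_up_probability_bruteforce(
    older: Sequence[float], younger: Sequence[float]
) -> float:
    """Reference version that enumerates every pair explicitly."""
    wins = sum(o > y for o in older for y in younger)
    return wins / (len(older) * len(younger))
```

Replacing `bisect_right` with `bisect_left` would count ties as catch up, matching the “matching or exceeding” wording of the definition; how ties were handled in the original analysis is not stated, and with smoothed continuous scores they should be rare.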
We started calculating catch up probabilities at 20 gameplays, by which point users have reasonably learned how to play the game, and continued in 20-gameplay increments up to 100 gameplays, beyond which the analysis would suffer from insufficient data. This allowed us to relate the amount of training to catch up probability.
Catch up probabilities were calculated separately for each Lumosity game. Not all Lumosity games are equally popular, so games with fewer than ten users from an age group of interest were excluded from that age group’s catch up analysis. Thus, instead of all 57 games in the original data set, the number of games included in our catch up analysis ranged from 32 to 36, depending on the age comparison (10 attention, 4-5 flexibility, 8-10 memory, 4 reasoning, 2-4 language, and 3 math games).
To have enough data to compare two age groups directly, we grouped users into larger, ten-year age bins: 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, and 80-89. The exceptions are at the extreme ends of the age range: 18- and 19-year-old users are included with the 20-29-year-old users, and catch up results for users aged 90-95 are not reported, since only one game had at least ten users from this age group.
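The ten-year binning and the minimum-size rule from the preceding paragraphs can be sketched together. The function names are hypothetical, and the sketch leaves open which age (at signup or at the relevant gameplay) determines a user’s bin, since it simply takes an age as input.

```python
from typing import Dict, Optional

MIN_USERS = 10  # minimum users per age group for a game to be analyzed


def decade_bin(age: float) -> Optional[str]:
    """Map an age to the ten-year bins used for catch up ("20s" ... "80s").

    Users aged 18-19 are folded into the 20s; users 90 and older return None
    because their catch up results are not reported.
    """
    if age < 18:
        raise ValueError("the sample starts at age 18")
    if age >= 90:
        return None
    return f"{max(20, int(age // 10) * 10)}s"


def comparison_is_eligible(
    users_per_group: Dict[str, int], older: str, younger: str, min_users: int = MIN_USERS
) -> bool:
    """True if a game has enough users in both age groups being compared."""
    return (
        users_per_group.get(older, 0) >= min_users
        and users_per_group.get(younger, 0) >= min_users
    )
```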
Results
We present our results according to the two scenarios of catch up discussed in the introduction: the scenario where older and younger adults train an equal amount, and one where the amount of training is unequal, with older adults training longer.
To answer our first question, whether older adults can catch up with the same amount of training, we calculated the catch up probability of the older group when both the older and younger groups had trained for 100 gameplays. For our second question, concerning how different amounts of training affect catch up, we compared older adults who trained for 20, 40, 60, 80, and 100 gameplays to younger adults who trained for only 20 gameplays. After calculating catch up probabilities for each game, we looked for age equivalence, which we define as a 50% or greater chance that a randomly sampled older adult scores better than a randomly sampled younger adult.
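The unequal-training comparison and the age-equivalence threshold can be sketched for a single game and pair of age groups. The dictionary layout (gameplay number mapped to a list of scores) is a hypothetical input format; the catch up probability is computed by brute-force pairwise comparison so the block is self-contained.

```python
from typing import Dict, List, Sequence

TRAINING_LEVELS = (20, 40, 60, 80, 100)  # older group's gameplay counts
YOUNGER_LEVEL = 20  # younger group's gameplay count
EQUIVALENCE_THRESHOLD = 0.5  # "age equivalence" cut-off


def catch_up_probability(older: Sequence[float], younger: Sequence[float]) -> float:
    """Share of (older, younger) pairs in which the older adult scores higher."""
    wins = sum(o > y for o in older for y in younger)
    return wins / (len(older) * len(younger))


def unequal_training_curve(
    older_scores: Dict[int, List[float]], younger_scores: Dict[int, List[float]]
) -> Dict[int, float]:
    """Older adults at each training level vs younger adults at 20 gameplays.

    `older_scores[k]` holds the older group's (normalized, smoothed) scores
    at gameplay k; the layout is a hypothetical choice for this sketch.
    """
    baseline = younger_scores[YOUNGER_LEVEL]
    return {k: catch_up_probability(older_scores[k], baseline) for k in TRAINING_LEVELS}


def age_equivalent(p: float) -> bool:
    """Age equivalence: at least a 50% chance that the older adult scores higher."""
    return p >= EQUIVALENCE_THRESHOLD
```

The increase in catch up probability reported in the Results corresponds to the difference between the values at 100 and 20 gameplays, i.e. `curve[100] - curve[20]`.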
Throughout the rest of this paper, we will use the decade marker as a shorthand for each age bin. For example, “60s” refers to those users between 60 and 69 years of age while “70s” includes those between 70 and 79 years. The singular exception is the “20s” group which also includes 18 and 19 year old users along with those between 20 and 29 years.
Additionally, we report Bayes factors (BFs) for our analyses, which were computed using Pingouin (Vallat, 2018), since they are easier to interpret than p values (Kass & Raftery, 1995). Following the notation $BF_{10}$ for the alternative hypothesis (1) against the null (0), $BF_{10} > 1$ indicates evidence for the alternative hypothesis while $BF_{10} < 1$ indicates evidence for the null hypothesis. The larger the Bayes factor, the more strongly the data favor the alternative hypothesis over the null. For example, $BF_{10} = 10$ means that the data are 10 times more likely under the alternative hypothesis than under the null hypothesis. Generally, BFs between 3 and 10 indicate moderate evidence against the null hypothesis, and BFs greater than 10 indicate strong evidence against the null (Kass & Raftery, 1995).
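To make $BF_{10}$ concrete for the one-sample tests used below, here is a standard-library sketch of the JZS Bayes factor for a one-sample t-test (Rouder et al., 2009), which is the formula Pingouin’s `bayesfactor_ttest` is based on. The Cauchy prior scale `r = 0.707` is assumed to match Pingouin’s default, and the integral is approximated with Simpson’s rule on a log scale rather than an adaptive integrator, so values may differ slightly from Pingouin’s output; this sketch has not been checked against it.

```python
import math
import statistics
from typing import Sequence


def jzs_bf10(t: float, n: int, r: float = 0.707) -> float:
    """JZS Bayes factor BF10 for a one-sample t-test with n observations."""
    df = n - 1

    def integrand(s: float) -> float:
        # Substitute g = exp(s): the integral over g in (0, inf) becomes an
        # integral over s in (-inf, inf), with dg = g ds.
        g = math.exp(s)
        a = 1 + n * g * r**2
        likelihood = a**-0.5 * (1 + t**2 / (a * df)) ** (-(df + 1) / 2)
        # Inverse-gamma(1/2, 1/2) prior density on g.
        prior = (2 * math.pi) ** -0.5 * g**-1.5 * math.exp(-1 / (2 * g))
        return likelihood * prior * g

    # Composite Simpson's rule on a truncated range; the integrand is
    # negligible outside [-30, 30] for moderate t and n.
    lo, hi, steps = -30.0, 30.0, 6000  # steps must be even
    h = (hi - lo) / steps
    total = integrand(lo) + integrand(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(lo + i * h)
    marginal_h1 = total * h / 3
    marginal_h0 = (1 + t**2 / df) ** (-(df + 1) / 2)
    return marginal_h1 / marginal_h0


def one_sample_bf10(values: Sequence[float], mu: float = 0.0) -> float:
    """BF10 for the hypothesis that the population mean differs from `mu`."""
    n = len(values)
    t = (statistics.mean(values) - mu) / (statistics.stdev(values) / math.sqrt(n))
    return jzs_bf10(t, n)
```

Applied to, say, per-game increases in catch up probability, `one_sample_bf10(increases)` quantifies how strongly the data favor a nonzero mean increase over no increase.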
Performance gaps persist after equal training
First, we examined the scenario in which older and younger adults both train for an extended period, in case older adults simply need more time to catch up to younger adults.
When older and younger adults train for up to 100 gameplays, the mean score of nearly all age groups improves regardless of cognitive domain (BF > 10 on paired t-tests; Figure 1). The exceptions were people over 90 on attention, memory, language, and math games, and people in their 20s and 40s on math games (BF < 10 on paired t-tests; see Supplementary Information for training improvements). However, despite training for 100 gameplays, most older age groups continued to score lower on average than younger groups (Figure 1).
When examining the catch up probabilities, we observed age equivalence for one game, a vocabulary game called Taking Root, when comparing the 70s group to the 60s group. We found no age equivalence when we compared older adults in their 70s and 80s to individuals 20 years their junior at the same training level of 100 gameplays (Figure 2). However, the probability of catch up for these older adults was reliably greater than zero on attention, flexibility, memory, and reasoning games ( on one-sample Bayesian t-tests), suggesting that some older individuals can catch up to adults nearly twenty years younger.
Unequal training promotes catch up
Next, we shift our focus to the primary question concerning the impact of unequal training on the catch up potential of older adults. In this analysis, we assessed the performance of two groups of adults: a younger group after 20 gameplays and an older group after 20, 40, 60, 80, and 100 gameplays (corresponding to 0, 20, 40, 60, and 80 extra gameplays, respectively). Figure 3 shows the catch up probabilities for a subset of comparisons: older adults who trained for 80 more gameplays than younger adults (more detailed results are shown in the Supplementary Information). These results generally show that additional training makes it possible for the older group to catch up to the performance of slightly younger groups, although the potential for catch up is limited for the largest age differences.
When the amount of additional training increases from 0 to 80 extra gameplays, the probability that an older adult in the 60s, 70s, or 80s group catches up to someone in a group 20 years younger increases. Across all games, the increase in catch up probability for these groups is substantially different from 0 (; one-sample t-test to quantify the evidence for catch up). When comparing domains, there is substantial evidence for catch up on attention, flexibility, and memory games () as opposed to moderate evidence for math () and reasoning ( except for the 80s group). Evidence for catch up on language games was very weak (), though this may be due in part to the small number of language games included in the analysis.
For the 80s vs 60s, 70s vs 50s, and 60s vs 40s comparisons, the average increase in catch up probability ranges from 0.22 to 0.32 on attention, flexibility, memory, and reasoning games, and from 0.09 to 0.15 on language and math games. The catch up gains are greater when comparing adults to a group 10 years younger. For this comparison, the average improvement in catch up was 0.30 for attention, flexibility, memory, and reasoning, compared to 0.16 for language and math ( for attention, flexibility, memory; for reasoning, language, math).
Age equivalence also becomes more prevalent when we allow older adults to train more than younger adults. When we compared the 60s, 70s, and 80s groups to groups 20 years younger, equivalence was reached after 100 gameplays in 3 games for the 80s vs 60s comparison: a Stroop task (Color Match), a face-name recall task (Familiar Faces), and a planning task (Pet Detective). Equivalence was also observed in 9 games for the 70s vs 50s comparison and in 19 games for the 60s vs 40s comparison (see Supplementary Information).
Discussion
In this paper we used a data set with cognitive training scores from almost 10,000 people ages 18-90 to investigate whether a longer period of training helps to close the commonly observed performance gap between age groups. We were interested in how much extra training would help older adults catch up to younger adults on these cognitive training tasks. When older adults train more than younger adults, up to 100 training sessions, we found evidence of the performance gap diminishing as the older adults catch up to the younger adults. In some cases, with unequal training, the performance gap completely disappears.
Additionally, we looked at whether catch up would occur after equal amounts of extended training for both older and younger groups. As in Baltes and Kliegl (1992), our results showed that age differences persisted even after both age groups undertook extended practice on the task. Unlike that study, which compared people in their 70s to people in their 20s on a memory task and found that the older group never reached the younger group’s early-training performance, we found that when people in their 70s trained for up to 100 sessions, their mean group score overlapped with the mean score of people in their 20s who had trained for only 20 sessions on a memory game (see Figure 1). These discrepancies may be due to the greater number of extra practice sessions in our study (80 compared to 18) and our larger sample size. Additionally, in the unequal training case, we found instances where little to no age differences were observed between adults who were closer in age, consistent with Steyvers et al. (2019).
One possible reason we found robust increases in catch up probability for the attention, memory, and flexibility domains but not for the others is that the domain categories used here may not accurately represent the underlying cognitive processes that these games target. For example, many of the language and math games test vocabulary knowledge and simple arithmetic under time pressure, so performance on these games may be strongly influenced by response speed rather than domain knowledge. In addition, even though some of the games in the attention, memory, and flexibility categories are directly modeled after classic lab tasks used to assess these cognitive abilities, only the math and reasoning games were previously found to have high internal consistency (Steyvers & Schafer, 2020). Therefore, results at the domain level should be interpreted with care, and results at the level of individual games may be more informative.
One important limitation of this work is that it is not clear to what degree our results will generalize to the full population. To address potential concerns arising from comparing groups with unequal total training amounts, as noted by Steyvers and Benjamin (2018), we confined our sample to players who had completed a minimum of 100 gameplays. However, this subset is not fully representative of the full population of Lumosity players as the older players tend to persist longer. Consequently, our sample exhibits a bias towards older players: for example, our sample’s average age is 64.8 years, compared to the 61-year average age among players with a minimum of 20 game sessions. When working with longitudinal data that spans a few years, dropout is inevitable and limits the generalizability of the results. Thus, while we have been careful to control for the amount of practice in the sample, our results only hold for people who persist to 100 gameplays.
Furthermore, while we have used the number of gameplays as a measure of training, it is difficult to compare the amount of training done on the online platform directly with that conducted in lab-based cognitive training studies. Each gameplay lasts only a couple of minutes, and players are free to play whenever they like, often taking a month just to play a game 20 times, whereas participants in lab studies attend training sessions on a rigid schedule, with each session lasting anywhere from 15 minutes to 2 hours (Lampit et al., 2014; Reijnders et al., 2013). Nevertheless, our results are compatible with lab-based studies that find participants improve on the trained task over time (Anguera et al., 2013; Baltes & Kliegl, 1992; Kliegl et al., 1989; Verhaeghen et al., 1992).
While our analysis has demonstrated that it is possible for older adults to match younger adults on task performance, this is only a start. One extension of this work would be to use naturally occurring data for other tasks to build a model that predicts a person’s future performance from their current learning trajectory, similar to the work of Steyvers and Schafer (2020). Such a model could tell older adults how much more practice they need to reach a target performance level (which might be expressed in terms of a younger age group’s performance). In addition, future studies could analyze other factors that might affect an individual’s catch up rate, such as the frequency of their practice sessions (Kumar et al., 2022) or particular game features. Future lab-based training studies could also use our catch up probabilities to inform study design and estimate the magnitude of the expected results.
In conclusion, some older adults who persist in extended training have the potential to match younger adults on a subset of short cognitive tasks even when younger adults outperform them initially. The key seems to be for older adults to train for much longer than the younger adults. Additionally, we quantified the degree of benefit gained from different amounts of training. These results, along with future studies, can help us form a more complete picture of how age differences can be overcome with additional practice.
Contributions
PD and MS contributed to the conception and design. MS acquired the data. PD and MS contributed to the analysis and interpretation of the data, drafted and revised the article, and approved the submitted version for publication.
Competing Interests
The authors declare no competing interests.
Data Accessibility Statement
Data and analysis scripts can be found on this paper’s project page on OSF (https://osf.io/wbrq2/).