Re-exposure to the context that information was learned in facilitates its memory retrieval. However, the influence of context changes on the ability to learn new information is less well understood, which the present work investigated in two experiments with healthy participants (n = 40 per experiment; 20 female). In experiment 1, participants learned a list of word-pairs (A-B) in the morning, after which their memory for the word-pairs was immediately tested. In the evening, they learned and were tested on a second non-overlapping list (C-D), either in the same context or in a different context than the first list (between-subjects). We found that new learning is enhanced in the same context, and that new learning in the other context was decreased compared to baseline. In experiment 2, participants were exposed to both contexts in the morning, but only learned word-pairs in one of them. In the second learning session in the evening, this familiarization with the other context abolished differences between the same and other context group. These data point to context novelty interfering with new learning rather than context familiarity enhancing it. Importantly, the reduction of new learning in the other context in the first experiment, where the context was unfamiliar in both learning sessions, suggests mechanisms beyond attention processes that are bound by the novelty of the other context. Rather, the old context impairs the processing of the new context, possibly by biasing pattern completion and pattern separation trade-offs within the hippocampus.
1. Introduction
Memory is sensitive to context, meaning that the environments learning and testing take place in affect memory performance (Smith & Vela, 2001). The pioneering work of Godden & Baddeley (1975) demonstrated that a mismatch between the context during learning and recall results in impaired memory performance (for recent replication attempts, see Murre, 2021; Shin et al., 2021). Since then, a considerable amount of research has investigated this contextual dependence of memory, referring to the phenomenon that retrieving information is easier in the same context it was encoded in (Cairney et al., 2011; McKenzie & Tiberghien, 2004; Smith, 1979; Smith et al., 1978). For these context effects to occur, the context does not need to be extrinsic (e.g., different rooms for learning and testing), but can also be intrinsic (e.g., differences in the participants’ mood at encoding and retrieval, Bower et al., 1978; Robinson & Rollings, 2010). Smith (1979) demonstrated that the improvement in memory performance could not only be attributed to the reinstatement of the learning context at retrieval alone, but that familiarity with the retrieval context also contributed independently.
It is not clear whether context affects new learning in the same manner, i.e., whether new learning of information is enhanced when it is presented in a similar way as during the initial learning experience in this context. In their meta-analysis of context effects, Smith & Vela (2001) reported learning in different contexts to be beneficial – however, only seven experiments in their meta-analysis were about new learning across several contexts. Furthermore, in all but two experiments, the final memory test occurred in an entirely new, unfamiliar context. It is possible that people whose learning sessions were spread across several contexts only had an advantage because at the time of testing, they were more accustomed to changing contexts. Furthermore, different methodological choices increased the variance even within studies. For example, while Smith (1982) showed that changing contexts between learning different word lists increased new learning performance, different contextual features (e.g., location and appearance of the room, temporal separation) had different (positive or negative) effects on new learning in their study. Pastötter et al. (2008) let their participants study two different word lists, separated by a mental context change (participants had to mentally project themselves to their parents’ house), and found that the context change improved learning. However, in both studies, the learning episodes took place only minutes after each other, which means that they overlapped in their temporal context. These findings are contrasted by the experiments Cox et al. (2021) reported, where participants learned overlapping lists of word-pairs (day 1: AB, day 2: AC). Each word-pair was associated with a specific context (background image), and during new learning, the AC word-pair was either presented in the same or a different context than the corresponding AB word-pair. 
On the third day, memory for both the AB and the AC list was tested without any context. Under these conditions, new learning benefitted from overlapping contexts. It might be that matching contexts only benefit new learning when the learning episodes are already separated by other contextual features (e.g., time). Smith (1982) argued that contextual information during learning (e.g., spatial or temporal) may serve as landmarks, which can then be utilized during retrieval to separate the information learned in different contexts. When the learning episodes take place shortly after each other, other contextual features might be needed to separate the episodes. When the learning episodes already differ in their temporal context, additional changes to the context (i.e., not only a different time, but also a different room, decoration etc.) might not provide any added value, but simply additional information that needs to be processed. We hypothesise that new learning will be impaired by mismatching contexts, if they are sufficiently separated in time (i.e., several hours). We suggest that the temporal distance between learning episodes suffices to yield the beneficial effects usually associated with context changes and that additional changes to the context are then detrimental. There are further good theoretical reasons to believe that similar contexts can be beneficial for new learning. On the one hand, schemas have been shown to facilitate the assimilation of new congruent memories leading to superior new learning (Tse et al., 2007; van Kesteren et al., 2012), and schema memory can include contextual features of the learning situation (Ghosh & Gilboa, 2014).
In the present study, we prepared two different contexts to investigate their impact on new declarative learning after a delay. The contexts were designed to resemble a forest or a seaside environment, respectively, and were set up in two of our labs in different buildings on separate campuses. In the first experiment, participants learned word-pair lists in one of the contexts in the morning learning session and returned either to that same context or to the other context for a second session 12 h later in the evening to learn a novel list of word-pairs. We predicted that learning in the same context would improve learning performance, i.e., that participants whose learning sessions took place in the same context would perform better in the second learning session than participants who spent their learning sessions in two different contexts. We also predicted that participants who learned twice in the same context would improve their performance from the first to the second session, whereas performance would remain the same for participants who switched contexts.
2. Experiment 1
2.1. Materials and Methods
2.1.1. Participants
Forty participants completed the first experiment (20 male, 20 female). This sample size was determined by the usual sample size in comparable experiments (e.g., Murre, 2021, N = 16; Pastötter et al., 2008, N = 48; Cox et al., 2021, N = 39, 43, and 41), as well as available resources (Lakens, 2021). We conducted a sensitivity analysis to determine the smallest effect size we were able to detect with 80% power (1 – β) in our main analysis, an ANOVA on the first run of each session, with session (within-subjects; morning or evening) and context (between-subjects; same or different) as predictors. We used G*Power, version 3.1.9.7 (Faul et al., 2007), to calculate the sensitivity for an α-level of 0.05, our total sample size of 40 (number of groups = 2, number of measurements = 2) and a correlation of .7 between repeated measures (the correlation between the morning and evening session was r = .76 for experiment 1, and r = .72 for experiment 2). This analysis (which crucially should not be confused with a post-hoc power analysis) revealed that we had 80% power to detect effect sizes of f = 0.18 or larger, which we deemed sufficiently large to be of interest.
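As a rough cross-check of this sensitivity analysis, the noncentrality parameter of the interaction test can be written out by hand. The sketch below assumes G*Power's "repeated measures, within-between interaction" procedure with its default effect-size convention and a nonsphericity correction of 1; these settings are assumptions, since only the inputs listed above are reported. Here N is the total sample size, k the number of groups, m the number of measurements, and ρ the correlation between repeated measures.

```latex
\lambda = f^{2}\, N\, \frac{m}{1-\rho}, \qquad
df_{1} = (k-1)(m-1) = 1, \qquad
df_{2} = (N-k)(m-1) = 38

\lambda = 0.18^{2} \times 40 \times \frac{2}{1-0.7} \approx 8.64
```

Under these assumptions, power is the probability that a noncentral F variable with 1 and 38 degrees of freedom and noncentrality λ exceeds the critical F value at α = .05.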
Participants were healthy, non-smoking, native German-speaking men and women aged 18 to 30 years (for demographic data, see Table 1). Before starting the study, all participants were pre-screened via e-mail by answering questions about their age, height, weight, and general health condition (confirming that their age was between 18 and 30, that their BMI was below 26, and that they had no known chronic illness). If they met these initial criteria, a more detailed telephone screening was used to confirm this information and to exclude any participants with psychiatric, neurological, or endocrine diseases and participants who took regular or acute medication (except for oral contraceptives). The screening relied on a structured interview asking for current or past diagnosed conditions, and only healthy participants were included. In addition, participants who had taken part in other experiments using word-pairs or had been in any of the buildings used for running the experiments were excluded. Furthermore, we only included participants who reported a normal sleep-wake cycle and no shift work, night work, or intercontinental flights (> 4 hours time difference) for at least 6 weeks before the experiments.
Table 1. Demographic data.
Measure | same context | different context
Experiment 1 | |
Age | 24.30 (3.29) | 22.80 (3.04)
BMI | 21.77 (2.50) | 21.84 (1.55)
Hours of sleep | 8.03 (0.97) | 8.22 (0.44)
Experiment 2 | |
Age | 23.25 (2.79) | 24.21 (3.12)
BMI | 22.01 (1.69) | 22.16 (1.70)
Hours of sleep | 7.68 (0.71) | 7.65 (0.73)
Included participants were instructed to keep a regular sleep schedule in the week before the experiment (approximately sleeping from 23:00 to 7:00 each night) and to go to bed at 23:00 the night before experiments. They were asked to get up at 07:00 on experimental days and, during these days, not to drink caffeine-containing drinks and not to consume alcohol starting one day before the experiment. Furthermore, participants were instructed not to sleep between the two experimental sessions. Adherence to these rules was assessed with a general questionnaire at the very beginning of each experimental session and experiments were aborted and rescheduled if gross deviation from this plan was found. The experiment was approved by the local ethics committee. We obtained written informed consent from all participants before their participation.
2.1.2. Experimental Contexts
The experiments took place in two different contexts, “Seashore” and “Forest”, set up in different testing rooms. Importantly, these rooms were in two different buildings on different campuses roughly 1.5 km apart. The contexts consisted of distinct decorations (sand and seashells vs. trees), pictures on the walls (beach vs. forest), background screens on the testing computer (beach vs. forest), matching background noises (waves vs. birds) and odours (ice-cream vs. pine). Two female experimenters were each consistently assigned to one of these contexts and wore a coloured lab coat (red vs. green). Furthermore, one of the experimenters spoke German and the other spoke English to the participants.
2.1.3. Tasks
2.1.3.1. Word-pair learning Task
We used slightly associated word-pairs to test declarative learning performance in the two learning sessions. For this, we created two completely different lists of 120 word-pairs, which were similar in difficulty. The lists were assigned to the morning or the evening learning sessions using a balancing list. The learning procedure was designed to last approximately 90 minutes, and word-pairs were presented on a computer screen for 2 s each with a 1 s inter-stimulus interval (ISI). A picture of pinecones or a picture of seashells was used as the background of the word-pairs in the two contexts, respectively, and was shown for 15 s before learning to familiarize the participants with it. During the learning session, the 120 word-pairs were repeated in three consecutive runs, and each run was divided into three blocks of 40 word-pairs. The blocks were presented in the same order across the three runs, but the block order was varied across participants. After each block there was a short self-paced break. After each run, the participant’s memory was tested in a cued-recall procedure, which involved presenting only the first word of a pair (the cue) and asking the participant to produce the second word (the target). In the forest context, participants wrote the target word on a piece of paper, whereas in the seashore context they said the target word out loud, which was documented by the experimenter in addition to recording their voice. The number of correctly recalled word-pairs during each run of the cued-recall procedure was used as the measure of learning performance. The same procedure was followed in the evening session for learning the completely new list of 120 word-pairs. In the morning session, participants were told that the cued-recall procedure would be repeated in the evening session; they were not informed of the new learning that actually took place.
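The run and block structure described above can be sketched as follows. This is a minimal illustration, not the presentation software used in the study: the function names and the `block_order` argument are hypothetical, the order of pairs within a block is simply left as given (the text does not specify it), and the duration excludes breaks and the cued-recall tests.

```python
# Timing and list parameters taken from the task description.
PRESENTATION_S = 2   # each word-pair is shown for 2 s
ISI_S = 1            # 1 s inter-stimulus interval
N_PAIRS = 120        # word-pairs per list
BLOCK_SIZE = 40      # word-pairs per block
N_RUNS = 3           # the full list is repeated in three runs


def build_session(pairs, block_order):
    """Return the three runs of one session as lists of blocks.

    `block_order` is a participant-specific permutation of (0, 1, 2);
    the same block order is reused in every run.
    """
    if len(pairs) != N_PAIRS:
        raise ValueError("expected a list of 120 word-pairs")
    if sorted(block_order) != [0, 1, 2]:
        raise ValueError("block_order must be a permutation of 0, 1, 2")
    blocks = [pairs[i:i + BLOCK_SIZE] for i in range(0, N_PAIRS, BLOCK_SIZE)]
    ordered = [blocks[i] for i in block_order]
    return [list(ordered) for _ in range(N_RUNS)]


def presentation_seconds_per_run():
    """Nominal presentation time of one run, without breaks or testing."""
    return N_PAIRS * (PRESENTATION_S + ISI_S)


# Placeholder items standing in for real (cue, target) word-pairs.
example_pairs = [(f"cue{i}", f"target{i}") for i in range(N_PAIRS)]
session = build_session(example_pairs, block_order=[2, 0, 1])
```

With these parameters, the nominal presentation time is 360 s (6 min) per run.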
2.1.3.2. Control measures—vigilance, sleepiness, and mood ratings and test of encoding
Participants’ sleepiness and mood were assessed using self-report measures. The Stanford Sleepiness Scale (Hoddes et al., 1973) measures subjective sleepiness with one item and eight answer options ranging from 1 = “Feeling active, vital, alert, or wide awake” to 8 = “Asleep” (provided as an anchor). We assessed the participants’ mood using the multidimensional mood questionnaire (Hinz et al., 2012) at two time points, once in the morning session before starting the first learning session and once in the evening before starting the new learning session. This questionnaire produces the three scales “Positive mood” (high is positive), “Tiredness” (low is tired), and “Calmness” (high is calm). Objective vigilance was additionally tested twice using the psychomotor vigilance task (PVT, Dinges & Powell, 1985). This 5-min version of the PVT required pressing a button as fast as possible whenever a bright millisecond clock presented on a dark computer screen started counting upward. After the button press, this clock displayed the reaction time. In both experiment 1 and 2, one participant was excluded from the PVT analysis due to technical problems during the task. General capabilities of long-term memory retrieval were tested using two rounds of a word generation task (Regensburger Word Fluency Task, WFT). In the first round, participants had to produce as many words as possible starting with the letter P and in the second round, they had to generate as many hobbies as possible during a time of 2 min each (Aschenbrenner et al., 2000). At the end of the experiment participants were asked if they had any idea about the aim of the study.
2.1.4. Procedure
Participants visited the lab twice in one day (Figure 1). In the morning, they visited one of the two contexts for the first time, and in the evening, they either revisited the familiar context (same context group) or visited a new context (different context group). The starting context was balanced across participants. The morning learning session lasted from 9:00 to 11:00 and the evening session from 21:00 to 23:00, resulting in a 10 h interval between the end of the morning session and the start of the evening session (12 h between session onsets). In the morning session, participants filled in questionnaires (Stanford Sleepiness Scale (SSS), Multidimensional Mood Questionnaire (MDBF) and the arrival questionnaire) after arriving at one of the contexts. After receiving the instructions for the word-pair task and practicing on an example with the experimenter, they learned the word-pairs in three runs. After each run, performance was tested in a cued-recall procedure. Participants were told that they should avoid consciously rehearsing the word-pairs. After finishing the word-pair learning task, participants performed the psychomotor vigilance task (PVT). Participants then left the lab and returned for the evening learning session at 21:00, either in the same context or in a different context. When participants arrived, they again filled in questionnaires (SSS, MDBF). Then they learned a completely new list of 120 word-pairs using the same procedure as in the morning. After learning the new word-pairs, participants performed the word fluency task (WFT) and the psychomotor vigilance task (PVT).
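The session times above imply two different intervals, depending on whether the interval is measured between session onsets or from the end of the morning session to the start of the evening session. The following sketch makes this arithmetic explicit; the helper name `hours_between` is hypothetical.

```python
from datetime import datetime, timedelta


def hours_between(start, end):
    """Hours elapsed between two same-day clock times given as 'HH:MM'."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta / timedelta(hours=1)


# Session times of experiment 1 as (start, end).
MORNING = ("09:00", "11:00")
EVENING = ("21:00", "23:00")

onset_to_onset = hours_between(MORNING[0], EVENING[0])  # 12.0
end_to_start = hours_between(MORNING[1], EVENING[0])    # 10.0
```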
Figure 1. Participants completed three runs of a word-pair learning task in the morning in one context (e.g., forest) and then completed three runs of the same word-pair learning task (with new words) in either the same context or a different context (e.g., seashore) in the evening.
2.1.5. Data reduction and statistical analysis
Data were analysed using R, version 4.2.2 (R Core Team, 2022), with the tidyverse package collection, version 1.3.2 (Wickham et al., 2019), as well as the packages afex, version 1.2-0 (Singmann et al., 2022), and effectsize, version 0.8.2 (Ben-Shachar et al., 2020). Statistical analyses generally relied on analyses of variance (computed with afex) including the repeated-measures factor Session (Morning Session vs. Evening Session), as well as the between-subjects factor Context (Same vs. Different Context). We focused our main analysis on the first of the three runs in each session, because this is the only run that is not contaminated by effects of context on learning performance carried over from the previous run. In runs 2 and 3, where the word-pairs from run 1 are repeated, memory may be influenced by the learning and retrieval processes that occurred beforehand, since memory is tested after each run. For example, if memory is improved in the first run by learning in the same context, then in subsequent runs it is harder to show gains in additional items, as more difficult items remain and the ceiling of the task is closer. Whenever we compared the two experiments, we added the factor Experiment as a predictor. Greenhouse-Geisser correction of degrees of freedom was applied where necessary. Significant interactions were followed up by lower-level ANOVAs and post hoc t-tests. All data (https://doi.org/10.23668/psycharchives.12357) and analysis code (https://doi.org/10.23668/psycharchives.12358) are publicly available at the PsychArchives of the Leibniz Institute of Psychology (ZPID).
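To illustrate what the Session × Context interaction in the main analysis tests, the sketch below uses a standard equivalence: in a 2 (between) × 2 (within) mixed design, the interaction F equals the squared independent-samples t statistic computed on the evening-minus-morning change scores. This is not the authors' afex code; it is a plain-Python illustration, and the function name and the small data set are hypothetical.

```python
from statistics import mean, variance


def interaction_f(morning_a, evening_a, morning_b, evening_b):
    """Session x Context interaction F for a 2 x 2 mixed design.

    Computed as the squared pooled-variance t statistic on change scores
    (evening minus morning) of group a versus group b.
    Returns (F, (df1, df2)).
    """
    change_a = [e - m for m, e in zip(morning_a, evening_a)]
    change_b = [e - m for m, e in zip(morning_b, evening_b)]
    n_a, n_b = len(change_a), len(change_b)
    df_error = n_a + n_b - 2
    pooled_var = (
        (n_a - 1) * variance(change_a) + (n_b - 1) * variance(change_b)
    ) / df_error
    mean_diff = mean(change_a) - mean(change_b)
    f_value = mean_diff ** 2 / (pooled_var * (1 / n_a + 1 / n_b))
    return f_value, (1, df_error)


# Hypothetical first-run recall scores (NOT data from the study).
same_morning, same_evening = [10, 12, 14], [13, 14, 18]
diff_morning, diff_evening = [11, 13, 15], [10, 13, 13]

f_value, df = interaction_f(same_morning, same_evening,
                            diff_morning, diff_evening)
```

With 20 participants per group, the error degrees of freedom of this test are 38, matching the F(1,38) values reported for the first-run analyses.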
2.2. Results
2.2.1. Word-pairs
An ANOVA over the first run (Figure 2A) showed that participants learned significantly more word-pairs in the same context in the evening learning session (Session × Context interaction: F(1,38) = 10.63, p = .002, ηp² = .219). There were no significant main effects of Context (F(1,38) = 2.41, p = .129, ηp² = .060) or Session (F(1,38) = 0.55, p = .462, ηp² = .014) on performance. Intriguingly, the participants in the same context group learned more word-pairs in the first run of the evening learning session than in the morning learning session (t(19) = 2.47, p = .023, dz = 0.55, 95% CI [0.07, 1.02]), whereas participants in the different context group learned fewer word-pairs in the first run of the evening learning session than in the morning learning session (t(19) = -2.16, p = .044, dz = -0.48, 95% CI [-0.94, -0.01]).
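The reported effect sizes can be cross-checked against the test statistics with two standard conversions: dz = t / √n for a paired t-test and ηp² = F·df1 / (F·df1 + df2) for an ANOVA effect. The sketch below is only a consistency check on the rounded values above; the function names are hypothetical, and the effectsize package may have computed these quantities from the raw data instead.

```python
from math import sqrt


def cohens_dz(t, n):
    """Cohen's dz for a paired t-test with n participants."""
    return t / sqrt(n)


def partial_eta_squared(f, df1, df2):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return f * df1 / (f * df1 + df2)


# Paired t-tests, n = 20 per group (df = 19).
dz_same = round(cohens_dz(2.47, 20), 2)        # 0.55
dz_different = round(cohens_dz(-2.16, 20), 2)  # -0.48

# Session x Context interaction, F(1,38) = 10.63.
eta_interaction = round(partial_eta_squared(10.63, 1, 38), 3)  # 0.219
```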
Figure 2. The first two panels show the number of correctly recalled word-pairs in the same context condition versus the different context condition in the first out of three runs in experiment 1 (A), where participants were either exposed to the same or a different context during the evening session, and experiment 2 (B), where participants were additionally familiarized with the different context in the morning. The results for an independent t-test per session comparing the same versus different context condition are shown (asterisks indicate significance at α = .05). The last two panels show the number of correctly recalled word-pairs across all runs in the morning versus evening session of experiment 1 (C) and experiment 2 (D). The results for a dependent t-test per context comparing the morning versus evening session per condition and run are shown (asterisks indicate significance at α = .05). In all panels, the scattered points represent individual data points, randomly jittered across the x-axis. The lower and upper hinges of the boxplot correspond to the first and third quartiles (the 25th and 75th percentiles), and the middle line represents the median. The whiskers extend from the upper hinge to the largest value (no further than 1.5 * inter-quartile range from the hinge) and from the lower hinge to the smallest value (no further than 1.5 * inter-quartile range from the hinge). Data beyond the end of the whiskers are plotted in black (outliers). The larger diamonds indicate the mean.
When extending the analysis to all three runs in each session (Figure 2C), an ANOVA likewise showed that participants learned significantly more word-pairs in the same context in the evening learning session (Session × Context interaction: F(1,38) = 6.38, p = .016, ηp² = .144). In the same analysis we found that word-pair learning improved across the three runs (F(2,76) = 857.05, p ≤ .001, ηp² = .958). All other factors did not significantly affect performance (all F ≤ 2.77, p ≥ .104). An analysis of the separate sessions revealed that the benefit of the same context was only evident in the evening learning session (F(1,38) = 4.43, p = .042, ηp² = .104), but not in the morning learning session (F(1,38) = 1.04, p = .313, ηp² = .027).
2.2.2. Control Measures
Descriptive statistics for the control measures for experiment 1 can be found in Table 2. There were no significant effects of Session, Context or their interaction on the number of lapses or reaction speed (mean 1/rt) in the psychomotor vigilance task (all F ≤ 2.73, p ≥ .107). Subjective sleepiness measured by the Stanford Sleepiness Scale (SSS) was higher in the evening than in the morning (F(1,38) = 12.18, p = .001, ηp² = .243). The analysis of mood using the multidimensional mood questionnaire (MDBF) revealed no significant effects of Session, Context or their interaction for positive mood and calmness (all F ≤ 2.40, p ≥ .130). For the tiredness dimension, participants were significantly more tired in the evening session (Session: F(1,38) = 22.36, p ≤ .001, ηp² = .370). There was no significant difference between the context groups in verbal fluency, used as a measure of long-term memory retrieval ability, in the evening session (version “hobby”: t(38) = 1.44, p = .157, d = 0.46; version “letter”: t(38) = -0.04, p = .972, d = -0.01).
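The effect sizes of the two verbal fluency comparisons follow from the t values via the standard conversion for independent samples. The worked equation below assumes 20 participants per group, which is consistent with the 38 degrees of freedom, and uses the rounded t values, so it serves only as a consistency check.

```latex
d = t\,\sqrt{\frac{1}{n_{1}} + \frac{1}{n_{2}}}, \qquad n_{1} = n_{2} = 20

d_{\text{hobby}} = 1.44 \times \sqrt{0.1} \approx 0.46, \qquad
d_{\text{letter}} = -0.04 \times \sqrt{0.1} \approx -0.01
```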
Table 2. Descriptive statistics for the control measures in experiment 1.
Measure | same context | different context
PVT | |
Lapses morning | 2.15 (5.94) | 1.11 (1.20)
Mean 1/rt morning | 3.16 (0.24) | 3.12 (0.24)
Lapses evening | 0.55 (0.94) | 0.42 (0.51)
Mean 1/rt evening | 3.23 (0.18) | 3.16 (0.32)
SSS | |
Morning | 2.45 (0.60) | 2.35 (0.49)
Evening | 2.95 (1.19) | 3.10 (1.02)
MDBF | |
Good-bad mood morning | 17.80 (2.33) | 18.35 (1.18)
Calm-nervous morning | 17.05 (2.46) | 17.30 (1.84)
Awake-tired morning | 15.80 (2.59) | 16.60 (2.14)
Good-bad mood evening | 17.60 (2.80) | 17.60 (2.33)
Calm-nervous evening | 16.65 (2.50) | 17.30 (2.25)
Awake-tired evening | 14.15 (4.13) | 13.40 (2.52)
WFT | |
Hobbies | 17.00 (4.29) | 19.00 (4.90)
Words starting with P | 17.60 (5.15) | 17.60 (3.53)
2.3. Interim Discussion
We investigated the effect of context on new learning. Participants learned word-pairs in the morning and then learned a new set of word-pairs in either the same or a different context in the evening. In the evening session, more new word-pairs were learned in the same context than in the other context. This is in line with our hypothesis that new learning in the same context would be superior to new learning in a different context and mirrors the context-dependent enhancements that have traditionally been observed when contexts between learning and test are matched. Compared to the morning learning session, participants also enhanced their learning performance in the same context during the first run of the evening learning session, while participants’ performance in the different context condition declined across sessions.
Contrary to our results, other studies found a change of context rather than a similar context to be beneficial for new learning (Pastötter et al., 2008; Smith, 1982). However, in Pastötter et al. (2008), both retrieval sessions took place after the context change, which might have caused interference because the first list still had to be kept in memory across the context change. Crucially, the learning episodes in the two aforementioned studies were conducted immediately after each other, creating an overlap in their temporal context. Temporal landmarks can be used during retrieval to separate information learned in different contexts (Smith, 1982), which could explain why we found the opposite pattern with our learning episodes being separated by 12 h. In fact, Smith (1982) found that increased intervals between learning sessions can improve new learning performance, and argued that context effects on memory might not generalise across different time intervals. Likewise, Cox et al. (2021), who placed their two learning sessions 24 h apart, found that new learning benefits from matching contexts. Against this backdrop, our data raise the possibility that the degree of contextual overlap determines whether a change in context is beneficial for new learning. Context overlap might influence memory performance in the form of an inverted U-shape, where new learning in a very similar context is impaired due to too many overlapping “landmarks” (e.g., both spatial and temporal), new learning in a context with moderate differences is enhanced (e.g., only a temporal difference), and new learning in a very dissimilar context is impaired (e.g., temporospatial differences). In very different contexts, the complete lack of shared “landmarks” could lead to a failure to retrieve metacognitive strategies about the task.
Adding to the complexity of this question, research on memory reconsolidation suggests that when reminders of the first learning context are presented before learning a second list, memory for list 1 is disturbed by list 2, but critically, the reminders do not affect performance for list 2 (Hupbach et al., 2007). While reconsolidation processes by definition do not affect new learning, it is tempting to speculate that in their study, the reminder cues of the first context compensated for any negative effects the context change may otherwise have had on new learning.
3. Experiment 2
However, the benefit of matching contexts for new learning might be explained by the fact that participants in the same context condition were simply more familiar with the context by the second learning session. For example, when new learning occurs in the same context twice, fewer attentional resources may be needed the second time, which could explain why memory was improved in the same context condition. In this case, familiarization with the context prior to the second learning session should suffice to produce the context-related learning benefit. Therefore, we decided to control for effects of familiarity when manipulating the learning context. To do so, we conducted a second experiment that mirrored the first experiment, with the addition that participants were familiarized with the other context before the second learning session in the evening. Crucially, participants did not learn anything when they were exposed to the other context, but instead performed a letter counting task that did not involve memory processes. This means that in the evening learning session, all participants were familiar with both contexts, but had only learned word-pairs in one of them. If familiarity is not a relevant moderator of the results observed in experiment 1, we would expect that in experiment 2, new learning in a different but familiar context should be impaired compared to new learning in the same context.
3.1. Materials and Methods
3.1.1. Participants
Consistent with experiment 1, we recruited 40 participants for experiment 2 (20 male, 20 female; for demographic data, see Table 1). The same inclusion and exclusion criteria as in experiment 1 applied, and additionally, participants were excluded if they were familiar with Persian or Arabic, the languages used in the control task. The experiment was approved by the local ethics committee. We obtained written informed consent from all participants before their participation.
3.1.2. Tasks
3.1.2.1. Letter counting task
In experiment 2, a letter counting task was additionally administered in the morning while participants were familiarized with the other experimental context. This task, in which participants counted the letters of Persian words, was constructed to mimic the learning task in as many aspects as possible except for actual learning. The words were also presented in three runs and each run contained two different procedures. The first mimicked the learning procedure and presented 120 word-pairs per run in three blocks of 40. In this phase, participants were asked to continuously count and add up the letters of all the word-pairs in one block, which were displayed for 4 s each (1 s ISI). During the short breaks they wrote down the total number of letters for all 40 word-pairs on a piece of paper in the Forest context or said it aloud to the experimenter in the Seashore context. The second procedure mimicked the cued retrieval procedure and presented 120 single words in blocks of 40 words. Here, participants had to count the letters of each word and immediately afterwards write the number down on a piece of paper or say it aloud to the experimenter, depending on the context. Each word was displayed until participants answered. The letters used in these words were of course repeated, but each word was only shown once, which prevented learning in this task. We chose Persian words (which did not have any meaning for the participants), as this would prevent any implicit learning. As for the word-pair task, the sum of counted letters for each run was used for data analysis.
3.2. Procedure
Experiment 2 was designed to match experiment 1 as closely as possible. However, participants were additionally familiarized with the other context while performing a letter counting task (Figure 3). To control for unspecific effects of task order, participants were familiarized with the other context either before the first session of the word-pair learning task or afterwards (counterbalanced). In the morning, participants visited the first context from 8:30 to 10:30 and travelled to the second context via taxi to visit it from 10:45 to 12:30. The evening learning session was slightly postponed relative to experiment 1 and took place from 20:30 to 23:00, leaving a minimum of 8 h between the two learning sessions.
Participants completed three runs of a word-pair learning task in the morning in one context (e.g., forest) and were familiarized with the other context (e.g., seashore), where they had to complete a letter counting task. Familiarization with the other context occurred either before the word-pair learning task or afterwards. In the evening, they completed three runs of the same word-pair learning task (with new words), either in the same context in which the first word-pair task had taken place or in a different context (e.g., seashore).
3.3. Results
3.3.1. Word-pairs
In experiment 2, where participants were familiarized with the different context before the second learning session, an ANOVA (Figure 2B) over the first run in each session showed no significant difference between the two context conditions (Context main effect: F(1,38) = 0.01, p = .935, ηp² < .001; Context x Session interaction effect: F(1,38) = 0.05, p = .827, ηp² = .001). As in experiment 1, an ANOVA across all runs (Figure 2D) revealed that participants improved their learning performance across the three runs (F(1.47,55.80) = 676.10, p < .001, ηp² = .947). However, in contrast to experiment 1, there was no significant difference between the two contexts (Context main effect: F(1,38) = 0.10, p = .759, ηp² = .003; Context x Session interaction effect: F(1,38) = 0.09, p = .769, ηp² = .002).
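For readers who want to check the reported effect sizes, partial eta squared can be recovered from an F value and its degrees of freedom via ηp² = (F · df_effect) / (F · df_effect + df_error). The following is a minimal sketch of that standard conversion, not the authors' analysis code; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Convert an F statistic and its degrees of freedom to partial eta squared.

    Hypothetical helper for illustration; it uses the standard identity
    eta_p^2 = (F * df_effect) / (F * df_effect + df_error).
    """
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# Values taken from the ANOVA results reported in the text.
print(round(partial_eta_squared(676.10, 1.47, 55.80), 3))  # run effect across all runs
print(round(partial_eta_squared(0.05, 1, 38), 3))          # Context x Session, first run
print(round(partial_eta_squared(0.10, 1, 38), 3))          # Context main effect, all runs
```

Under this conversion the reported values are reproduced to three decimals, including the sphericity-corrected run effect.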
There was no significant difference in learning performance in the first run of the morning session between participants who were exposed to the different context before the word-pair learning session (M = 41.50, SD = 12.79), and those who were familiarized with it afterwards (M = 34.40, SD = 18.25), t(38) = -1.42, p = .162, d = -0.45, 95% CI [-1.08, 0.18]. Adding the order of context exposure to the ANOVA across the first runs did not change the previous pattern of results, and did not yield any significant effects (all F ≤ 1.37, p ≥ .249).
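The order comparison above can be approximately reproduced from the reported summary statistics alone. The sketch below computes a pooled-variance independent-samples t statistic and Cohen's d. It assumes 20 participants per order group (inferred from df = 38 and the counterbalanced design, not stated explicitly) and takes the difference as "afterwards minus before" to match the sign of the reported values. The function name is hypothetical.

```python
import math


def pooled_t_and_d(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples t (pooled variance) and Cohen's d from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    pooled_sd = math.sqrt(pooled_var)
    d = (m1 - m2) / pooled_sd
    t = (m1 - m2) / (pooled_sd * math.sqrt(1 / n1 + 1 / n2))
    return t, d, df


# Group 1: familiarized afterwards; group 2: familiarized before (n = 20 each is an assumption).
t, d, df = pooled_t_and_d(34.40, 18.25, 20, 41.50, 12.79, 20)
print(f"t({df}) = {t:.2f}, d = {d:.2f}")
```

With these inputs the sketch yields values that round to the reported t(38) = -1.42 and d = -0.45.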
3.3.2. Experiment 2 versus 1
In a direct comparison of the two experiments, we found that the different context negatively affected word-pair learning performance in the first run of the evening learning session only in experiment 1 (Experiment x Context x Session interaction: F(1,76) = 4.10, p = .046, ηp² = .051, Context x Session interaction: F(1,76) = 5.53, p = .021, ηp² = .068). Comparing the first run of the evening learning session across the two experiments (Figure 4), we found that participants in the other context learned fewer word-pairs in experiment 1 than in experiment 2 (t(38) = -2.07, p = .045, d = -0.66, 95% CI [-1.29, -0.01]), whereas learning was comparable in the same context (t(38) = 0.42, p = .679, d = 0.13, 95% CI [-0.49, 0.75]). Even though participants in the same context condition in experiment 1 showed significantly better performance in the evening than in the morning session, and no such effect was seen in experiment 2, there was no Experiment x Session interaction in an ANOVA run on the same context condition (F(1,38) = 1.78, p = .190, ηp² = .045).
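The effect sizes accompanying these between-experiment t-tests are approximately recoverable from the t values through d = t · √(1/n₁ + 1/n₂). The sketch below illustrates this conversion, assuming 20 participants per cell (consistent with df = 38); it is not the authors' code, and small deviations from the reported d are expected because the published t values are rounded.

```python
import math


def d_from_t(t_value, n1, n2):
    """Approximate Cohen's d from an independent-samples t value and the group sizes."""
    return t_value * math.sqrt(1 / n1 + 1 / n2)


# Other context, experiment 1 versus experiment 2 (n = 20 per group is an assumption).
print(d_from_t(-2.07, 20, 20))
# Same context, experiment 1 versus experiment 2.
print(d_from_t(0.42, 20, 20))
```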
The number of correctly recalled word-pairs in experiment 1, where participants were either exposed to the same or a different context during the evening session, versus experiment 2, where participants were additionally familiarized with the different context in the morning, for the first run of each session (morning and evening) and for both conditions (same and different context). The results for an independent t-test per session and condition comparing the first versus the second experiment are shown (asterisks indicate significance at α = .05). The scattered points represent individual data points, randomly jittered across the x-axis. The lower and upper hinges of the boxplot correspond to the first and third quartiles (the 25th and 75th percentiles), and the middle line represents the median. The whiskers extend from the upper hinge to the largest value (no further than 1.5 * inter-quartile range from the hinge) and from the lower hinge to the smallest value (no further than 1.5 * inter-quartile range from the hinge). Data beyond the end of the whiskers are plotted in black (outliers). The larger diamonds indicate the mean.
A comparison of all three runs between the two experiments showed a trend towards the different context negatively affecting word-pair learning in the evening learning session of experiment 1 (Experiment x Context x Session interaction: F(1,76) = 3.35, p = .071, ηp² = .042).
3.3.3. Control Measures
For experiment 2, descriptive data for the control measures can be found in Table 3. In the psychomotor vigilance task, participants showed significantly faster reaction speed in the same context condition (F(1,37) = 4.19, p = .048), but there was no effect of Session or a Context x Session interaction, nor was there an effect of Context and/or Session on the number of lapses (all F ≤ 1.37, p ≥ .250). Subjective sleepiness showed no effects of Session, Context or their interaction (all p ≥ .094). The analysis of mood showed that there were no significant effects of Session, Context or their interaction for positive mood and calmness (all F ≤ 2.99, p ≥ .092). There was, however, evidence for significantly higher tiredness in the evening session (F(1,38) = 11.91, p = .001). We did not find any significant differences between the contexts in long-term memory retrieval performance as measured by the word fluency task (version “hobby”: t(38) = 0.00, p > .999, d = 0.00, version “letter”: t(38) = 0.73, p = .467, d = 0.23). For the letter counting task, which participants completed while being exposed to the other context in the morning, they counted more letters across runs (continuous counting: F(1.95,74.01) = 26.91, p < .001, ηp² = .415; single counting: F(1.80,68.44) = 470.54, p < .001, ηp² = .925), but there was neither an effect of Context nor a Context x Run interaction (all F ≤ 1.61, p ≥ .210).
Measure | same context, M | (SD) | different context, M | (SD) |
PVT | ||||
Lapses morning | 1.37 | (1.16) | 1.15 | (1.14) |
Mean 1/rt morning | 3.30 | (0.37) | 3.12 | (0.23) |
Lapses evening | 1.00 | (0.94) | 1.34 | (1.34) |
Mean 1/rt evening | 3.32 | (0.28) | 3.17 | (0.29) |
SSS | ||||
Morning | 2.25 | (1.12) | 2.65 | (0.93) |
Evening | 2.65 | (1.09) | 2.95 | (1.10) |
MDBF | ||||
Good-bad mood morning | 17.20 | (2.53) | 18.35 | (1.14) |
Calm-nervous morning | 16.90 | (2.29) | 17.55 | (1.50) |
Awake-tired morning | 15.60 | (2.74) | 15.30 | (2.39) |
Good-bad mood evening | 17.60 | (2.78) | 17.95 | (1.96) |
Calm-nervous evening | 16.60 | (1.79) | 16.90 | (2.10) |
Awake-tired evening | 13.65 | (2.81) | 13.60 | (2.80) |
WFT | ||||
Hobbies | 17.20 | (3.40) | 17.20 | (4.16) |
Word starting with P | 16.00 | (4.72) | 17.00 | (4.31) |
Letter counting | ||||
Single letters run 1 | 756 | (6.33) | 752 | (3.78) |
Single letters run 2 | 759 | (6.21) | 758 | (5.96) |
Single letters run 3 | 780 | (3.93) | 779 | (6.57) |
Continuous run 1 | 1110 | (235) | 1049 | (69.90) |
Continuous run 2 | 1204 | (199) | 1132 | (89.20) |
Continuous run 3 | 1192 | (230) | 1138 | (89.60) |
4. General Discussion
We investigated the effect of context on learning in two experiments. In the first, participants learned word-pairs in the morning and then learned a new set of word-pairs in either the same or a different context in the evening. We found that participants performed better when context was kept constant across learning sessions. While participants in the same context condition improved across learning sessions, participants in the different context condition learned less in the second session. In a second experiment, participants were additionally familiarized with the other context before the second learning session. This was done by letting participants complete a task without learning in the other context before or after learning in the morning. The aim of experiment 2 was to test whether different contexts have the same impact on new learning when they are familiar. The familiarization with the other context abolished the differential effects found for the two context conditions in experiment 1. That is, no significant differences between the same and the different context condition emerged in experiment 2.
A comparison of the two experiments supported a role of the familiarization process in reducing the detrimental effect of changing contexts. For the first run of the evening learning session in the same context condition, performance in experiment 1 and experiment 2 did not differ. However, in the different context condition, learning performance during the first run of the evening learning session was significantly lower in experiment 1 than in experiment 2. While participants in the same context condition of experiment 1 showed significantly improved performance in the second learning session, no such effect was observed in experiment 2. Nevertheless, a direct statistical comparison of the same context condition across the two experiments showed that participants in experiment 1 did not improve significantly more across sessions than participants in experiment 2. This means we can neither conclude that the familiarization procedure disturbed the effect of improved learning in the same context, nor that it did not. It does suggest, however, that the familiarization procedure had a stronger effect on the context change condition, where the negative effect of changing contexts was abolished. It would be interesting to follow up this research to investigate whether the context familiarization procedure interferes with context consolidation in some way.
It is intriguing to speculate how our findings relate to formal memory models such as REM (retrieving effectively from memory; Shiffrin & Steyvers, 1997). According to REM, each item is stored in memory as a vector consisting of different feature values. These features refer both to the content and the context of the item. During recognition, a probe vector (of the item to be recognised) is compared to the vectors stored in memory to determine which item to retrieve. In a cued recall procedure, as in our experiments, paired items (e.g., word-pairs) are thought to be stored in a common vector. When the cue is presented during recall testing, it will partially match the vector representing the pair of items, which enables us to retrieve the target associated with the cue. Items that were learned in a certain context will have similar context features, so context cues can be used to group items into different lists. Against this background, one might expect that new learning in different contexts is advantageous: When participants in the different context condition retrieve items from the second learning session, the probability of accidentally activating items from the first learning session would be low, because the items from the second and first session differ in their context features. Participants in the same context condition on the other hand, might erroneously match items from the first session during retrieval in the second session, because the items learned in different sessions share some context features. However, our results show the opposite pattern. One might argue that this is because the physical context features are not relevant and are de-weighted during memory retrieval: There is no overlap between the lists studied in the two sessions (A-B; C-D), so participants do not need to rely on context information to disentangle the two lists. 
Furthermore, when the items studied fall into the domain of associative processing (e.g., word-pairs), effects of context manipulations may be less effective (Smith & Vela, 2001). Within the REM framework, the “environmental base rates” of the feature values within each item vector play a role during retrieval. That is, some feature values are more common than others, which is taken into account, when the match between the memory probe and the stored vectors is calculated. When a context is more familiar, this might mean that the features representing the familiar context are treated as more common. Importantly, the errors during the storage process are not independent of the environmental base rate. When a feature is not stored correctly, the feature is drawn randomly according to the environmental base rates of potential feature values. This makes it more likely that more plausible common feature values are stored. When a context is more familiar (i.e., the environmental base rate is higher), it might be more likely that the correct feature values are stored by accident in the case of a storage error. This would both explain why participants in the same context condition in experiment 1 improve across sessions, and why the familiarization procedure in experiment 2 abolishes the differences between the same and different context condition. However, while this explains the advantage of new learning in the same context, it does not explain why new learning in a different, unfamiliar context is detrimental. Furthermore, it is unclear whether a single exposure to the different context is enough to influence the environmental base rates of item features.
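The REM mechanisms referred to above (feature vectors, error-prone storage governed by environmental base rates, and probe-to-trace matching) can be made concrete with a toy simulation. The following is a simplified sketch loosely based on the storage and likelihood-ratio rules of Shiffrin and Steyvers (1997), not an implementation of the full model and not part of the present analyses. The parameter names (`g` for the base rate parameter of a geometric feature distribution, `u` for the probability that a feature is stored, `c` for the probability that it is copied correctly), their values, and the function names are illustrative assumptions.

```python
import random


def sample_feature(g, rng):
    """Draw a feature value v >= 1 with P(v) = g * (1 - g) ** (v - 1).

    Low values are common (high base rate), high values are rare.
    """
    v = 1
    while rng.random() >= g:
        v += 1
    return v


def store_trace(item, u, c, g, rng):
    """Store an incomplete, error-prone copy of an item vector (0 = nothing stored)."""
    trace = []
    for value in item:
        if rng.random() >= u:
            trace.append(0)                       # feature not stored at all
        elif rng.random() < c:
            trace.append(value)                   # feature copied correctly
        else:
            trace.append(sample_feature(g, rng))  # storage error: drawn from base rates
    return trace


def likelihood_ratio(probe, trace, c, g):
    """Evidence that the trace is a stored copy of the probe rather than of another item."""
    ratio = 1.0
    for p, t in zip(probe, trace):
        if t == 0:
            continue                              # missing features carry no evidence
        if p == t:
            base_rate = g * (1 - g) ** (t - 1)
            ratio *= (c + (1 - c) * base_rate) / base_rate
        else:
            ratio *= 1 - c                        # mismatching features count against a match
    return ratio


rng = random.Random(1)
item = [sample_feature(0.4, rng) for _ in range(20)]   # content and context features
trace = store_trace(item, u=0.5, c=0.7, g=0.4, rng=rng)
other = [sample_feature(0.4, rng) for _ in range(20)]  # an unrelated probe
print(likelihood_ratio(item, trace, c=0.7, g=0.4))
print(likelihood_ratio(other, trace, c=0.7, g=0.4))
```

In this sketch, a storage error on a feature with a high base rate is more likely to reproduce the correct value by chance, which corresponds to the mechanism discussed above. At the same time, a match on a common feature value yields less evidence than a match on a rare one, so the net effect of context familiarity in such a model would depend on the parameter values.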
Not only external contextual factors, but also internal contextual factors such as mood can affect memory performance (Bower et al., 1978; Robinson & Rollings, 2010). However, evaluating all of our control measures (vigilance, mood, sleepiness, letter counting and word fluency), the only difference that emerged between the two context conditions was the difference in vigilance in experiment 2. This difference was already evident in the morning, which means that it likely reflects a difference in trait rather than state. In addition, the reduced vigilance would have led to reduced learning in the other context group, which is not what we found. The classical context-dependence of memory can be at least partially cognitively controlled by the mental reinstatement of a context (Smith, 1979). In this scenario, the learner merely has to imagine the original encoding context to benefit recall performance. However, such reinstatement should have occurred equally in experiments 1 and 2, and it cannot explain why performance in the other context dropped compared to the morning learning session. Alternatively, the benefit of new learning in a familiar context may be due to facilitated access to meta-cognitive learning processes, cued by contextual features. More familiar cues are better at triggering memory retrieval (Robin & Moscovitch, 2014), which offers an elegant explanation of why familiarization with the other context in our second experiment abolished the differences between the same and other context condition.
Unfamiliar contexts might lead to the allocation of attentional resources to the processing of the novel environment, resources that are then missing for the processing of the learning task. While this may explain the improved encoding in the same context condition, it is not sufficient to explain the impaired encoding in the other context condition of experiment 1. In the morning, participants in both context conditions encoded in equally unfamiliar environments; thus, any processing resources bound by the changed context in the evening should also have been bound by processing the context in the morning. By statistically comparing encoding in the morning and the evening, we could show impaired performance in the changed context in the evening compared to the morning, thus ruling out the attentional processing account as a sufficient explanation. Furthermore, if the unfamiliar context had drawn attentional resources, this should also have been reflected in control measures such as the word fluency task. However, there was no difference between the same and different context groups in the control tasks. A further argument against an attentional processing deficit as an explanation for the negative effects of a context change arises from the analysis of context familiarization order in experiment 2 (before or after the first learning session). Exposure to the other context before learning did not affect learning performance. Descriptively, we even observed the opposite pattern: Participants who were exposed to a different context before the first learning session performed better than those who encountered the different context afterwards. This speaks against a deficit in attentional resources induced by a change of context. Nevertheless, it would be interesting to incorporate physiological measures of processing such as eye tracking in future studies to investigate how much attention participants devote to the task at hand and/or their environment.
The hippocampus is critically involved in processing the contextual information that a learning situation occurs in (Anagnostaras et al., 2001; Ekstrom & Ranganath, 2018; Maren, 2001; Myers & Gluck, 1994; Rugg et al., 2012). Cholinergic models of hippocampal function (Easton et al., 2012; Hasselmo et al., 1996; Meeter et al., 2004) suggest that higher levels of cholinergic activity, following exposure to novel contextual cues, may bias the hippocampus toward forming distinctive memories by prioritizing encoding over retrieval (Douchamps et al., 2013). In contrast, lower cholinergic levels occurring in familiar contexts (Giovannini et al., 2001) may bias the hippocampus toward memory reactivation, aiding pattern completion processes (Duncan et al., 2012). In our cued recall word-pair task, pattern completion processes were likely beneficial, because participants had to generate the matching target to the cue presented at test. Furthermore, the word lists across the two sessions did not overlap (A-B, C-D), so that reducing interference via pattern separation was less important. This may explain why new learning in a familiar context is superior to new learning in an unfamiliar context. Future experiments should resolve whether these effects reverse in an A-B, A-C paradigm. However, participants in the different context condition of experiment 1 decreased their performance across sessions, in both of which the context was equally unfamiliar. A possible explanation is that pattern completion and separation processes do not only affect the learned material itself, but also metacognitive knowledge about the learning task (such as strategies). In the evening session of the unfamiliar context condition of experiment 1, pattern separation processes might have prevented access to neuronal processing steps relevant for task execution that were associated with the morning context.
In the morning, when the context was also unfamiliar, but no pattern separation processes were initiated because the task was entirely novel as well, this access would not have been blocked, leading to the performance measured in the morning session. This explanation goes beyond novelty binding of attention resources since it assumes that access to processing resources is gated by context. This intriguing possibility should be systematically examined in further experiments.
A potential limitation of our study comes from the short interval between switching the contexts in the morning session of experiment 2. It is conceivable that participants did not encode the two contexts as distinct episodes but rather linked them in memory, so that the contexts formed a larger meta-context. In experiment 1, we tried to separate the contexts in time as well, but this was not possible in the same way for experiment 2. In future experiments it would be helpful to familiarize the participants with the contexts with at least one day between sessions and without any learning occurring initially. This would allow applying the same paradigm as in experiment 1 with both contexts being familiar. Furthermore, it would be interesting to see whether our results generalise to forms of memory testing other than cued recall. The meta-analysis by Smith and Vela (2001) suggests that context memory effects might be larger for free recall or recognition procedures (even though there was no meta-analytic effect of the type of memory test used).
In conclusion, while experiment 1 seemingly provided evidence that the classical context dependence of memory extends to new learning, this assumption was falsified by experiment 2. It seems that being familiar with a context is sufficient to learn new information without detriment, and that having learned a task in that precise context is unnecessary for this benefit. However, the reduction in new learning from morning to evening in the other context group of experiment 1 leaves open the question of which processes interfere with new learning in unfamiliar contexts when the learning procedure is familiar.
Funding and Disclosure
This research was supported by grants from the Deutsche Forschungsgemeinschaft (DFG) SFB 654 ‘Plasticity and Sleep’ and the European Research Council (ERC AdG 883098 SleepBalance) to J.B. as well as a DFG Emmy-Noether-Grant (FE 1617/2-1) to G.B.F. The authors declare no competing financial interests. The funding sources had no role in the study design; in the collection, analysis and interpretation of data; in the writing of the report; or in the decision to submit the article for publication.
Transparency and Openness
All data (https://doi.org/10.23668/psycharchives.12357) and analysis code (https://doi.org/10.23668/psycharchives.12358) are publicly available at the PsychArchives of the Leibniz Institute of Psychology (ZPID). This study’s design and its analysis were not pre-registered.
Author Contributions
MAA, GBF and JB developed the study concept and study design. Data collection was performed by MAA, SB and GN. MAA and JN performed the data analysis and interpretation under the supervision of GBF. MAA drafted the manuscript, and JN, GBF and JB provided critical revisions. All authors approved the final version of the manuscript for submission.
Acknowledgements
The authors would like to thank Michael Radloff for assisting with data collection.