Learning Habits: Does Overtraining Lead to Resistance to New Learning?

We explore the development of habitual responding within the colour-word contingency learning paradigm, in which participants respond to the colour of neutral words. Each word is most often presented in one colour. Learning is indicated by faster responses to the colour when the word is presented in the expected rather than in the unexpected colour. In Experiment 1, participants took part in two sessions, separated by one day. Critically, one set of words was trained across both days, and other new sets of words were introduced at various time points. Overall performance was faster on trials with overtrained words. Additionally, contingency effects were larger for overtrained words than for words introduced on Day 2. Removing the contingency had a similar impact on the learning effect for overtrained and new words. However, during a counterconditioning phase, where the words were made predictive of new colours, the previous contingency continued to influence performance for overtrained words but not for more recently introduced words. Relatedly, the new contingency was not acquired for the overtrained words. The reverse pattern was observed for recently-introduced words, with the newly-introduced contingency rapidly acquired and the influence of the old contingency quickly extinguished. In Experiments 2 and 3, however, both new and old learning effects were observed for both overtrained and recently-acquired contingencies. The net results suggest that while contingency learning effects are highly pliable during initial and subsequent learning, early-acquired contingency knowledge is maintained after removal of the contingency. Implications for models of learning are discussed.


Introduction
For any intelligent organism to be able to understand and interact effectively with its environment, it is necessary to learn the regularities between events and outcomes (Allan, 2005; Beckers, De Houwer, & Matute, 2007; Shanks, 2010). Knowledge of the meaning of words, the tastes of foods, and the likely results of our actions are all built on this contingency learning backbone. These regularities in the environment shape the behavioural repertoire of the organism. The present paper explores how contingency learning also helps to shape automatized or default responding to a stimulus. In particular, we ask how quickly this responding becomes stable and resistant to changes in contingencies, and we discuss potential relations to habit formation.
One useful paradigm for studying contingency learning is the colour-word contingency learning paradigm (Schmidt, Crump, Cheesman, & Besner, 2007; for related paradigms, see Carlson & Flowers, 1996; Levin & Tzelgov, 2016; Lewicki, 1985, 1986; Miller, 1987; Musen & Squire, 1993; Schmidt & De Houwer, 2012b; for a review, see MacLeod, 2019). In the typical version of this paradigm, participants respond to the print colour of words (or the reverse; Forrin & MacLeod, 2017) with a key press (for verbal variants, see Atalay & Misirlisoy, 2012; Forrin & MacLeod, 2017). Each word is presented most often in one colour (e.g., "find" most often in purple, "help" most often in orange, etc.). Learning of the word-response contingencies is indicated by faster and more accurate responses to high contingency trials, where the word is presented in its most frequent colour (e.g., "find" in purple), relative to low contingency trials, where the word is presented in an infrequent colour (e.g., "find" in orange). One useful feature of this paradigm is the robustness of the effect, with nearly 100% of participants showing a positive contingency effect with very short experiments (e.g., 5-10 minutes).
In previous work, it has been observed that the colour-word contingency learning effect appears almost instantly after the start of the task (Schmidt et al., 2007; Schmidt & De Houwer, 2012c, 2016b; Schmidt, De Houwer, & Besner, 2010; O. Y.-H. Lin & MacLeod, 2018). Within the very first block of trials, the contingency effect is typically already robust, even with blocks as small as 18 trials. Similarly quick acquisition is observed in other, related implicit learning paradigms (e.g., Lewicki, 1985; Nissen & Bullemer, 1987; Schmidt & De Houwer, 2012a). There is, in addition, a small but significant gradual increase in the contingency effect with increasing practice (O. Y.-H. Lin & MacLeod, 2018; Schmidt & De Houwer, 2016b), indicating that there is a cumulative learning effect (i.e., continued strengthening of learning over time).
Interestingly, work has also indicated that when the contingency is removed from the experiment (i.e., after some blocks of experiencing the contingency), the contingency effect very rapidly diminishes (O. Y.-H. Lin & MacLeod, 2018; Schmidt & De Houwer, 2016b; Schmidt et al., 2010). Though not necessarily eliminated entirely (and even persisting in much reduced form for hundreds of trials; Schmidt & De Houwer, 2016b), the effect does approach zero almost as quickly as the initial contingency learning effect appeared. This suggests that the contingency effect is heavily influenced by very recently-encountered events. The fact that contingency learning effects are so pliable in the colour-word contingency learning paradigm is intriguing. Though persistent effects of previously experienced contingencies are still observed after the contingencies are removed, and the contingency effect cannot be exclusively explained by word-colour conjunctions one to five trials back (Schmidt et al., 2010; but see Giesen, Schmidt, & Rothermund, 2020; Schmidt et al., in press), the results so far suggest that, for the most part, recent experience is what matters most.
Contingency learning effects like this are interesting in that they resemble a habit (for a review, see Wood & Rünger, 2016). Organisms tend to repeat behaviours in response to given stimuli, especially when these responses are rewarded (Thorndike, 1911). In the colour-word contingency learning paradigm this is reflected by the bias to repeat the frequently paired response to the (task-irrelevant) word stimulus. The best way to define a habit is not consistently agreed upon (De Houwer, 2019). On the one hand, a habit might be defined in a broad sense as roughly synonymous with "automaticity." In this sense, habitual responding is observed when a learned response is automatically evoked by a stimulus as the so-called "default" response (Evans & Stanovich, 2013). This definition focuses on the conditions under which responding occurs and does not specify the underlying mental mechanisms (whether the response is caused by stimulus representations, attitudes, or goals). A narrower definition of a habit, however, refers exclusively to the automatic priming of a specific response, and specifically excludes the operation of goals or attitudes (Wood & Rünger, 2016). Even more restrictively, Dickinson (1985; Heyes & Dickinson, 1990) defines habits as stimulus-driven responses (mediated by mental S-R associations) that have been installed via overtraining, and contrasts them with goal-directed actions that are driven by representations of the values and expectancies of the outcomes of these responses. By these more restrictive criteria, it can be quite difficult to determine when a behaviour is truly habitual, even in common metrics of habits like stimulus revaluation (e.g., De Houwer, Tanaka, Moors, & Tibboel, 2018; Moors, Boddez, & De Houwer, 2017).
Another way of thinking about contingency learning and habits (which is not necessarily incompatible with the view of Dickinson, 1985) is in terms of memory traces. According to an episodic memory perspective (Schmidt et al., 2010; Schmidt, De Houwer, & Rothermund, 2016), the colour-word contingency learning effect results from the storage and retrieval of episodic memories or exemplars (Logan, 1988). As more and more episodes linking a stimulus to a response are stored, presentation of said stimulus will more strongly bias retrieval in favour of the high contingency response.
In either case, the automatic biasing of the high contingency response to a word in the colour-word contingency learning paradigm might be regarded as a habit, though this might depend both on how one defines a habit and what assumptions are made about the mechanisms producing the effect (De Houwer, 2019). In the present manuscript, our primary interest is in exploring the automatic or default biasing of the high contingency response in the colour-word contingency learning paradigm, though we will return to potential implications for habit formation later.
The fact that contingency effects can be so pliable is interesting, as this suggests that recently-encoded events are particularly potent in their influence on performance, whereas older encoded events have minimal impact or rapidly become less potent (e.g., weakly retrievable). If so, then automatic responding might not be nearly as stable as previously thought. That is, it could be that a large chunk of what we regard as default responding (whether habitual in the restrictive sense or goal-mediated) is actually due to retrieval of only a very limited number of recently-encoded event memories (Giesen et al., 2020; Schmidt et al., in press). Note that persistence of an automatic response over time is not necessarily inconsistent with this notion: Repeatedly responding in a similar way to a stimulus as you did on your recent experiences with the same stimulus will tend to preserve the same behaviours long term.
It should be noted, on the other hand, that previous colour-word contingency learning experiments were relatively short in duration. They may therefore not reflect cases in which, after a considerable amount of practice, a stable representation of the meaningful connection between the stimulus and response may eventually emerge. For instance, over a lifetime one learns that the word "blue" is pronounced "blue." Even after considerable practice in a Stroop task where "blue" is presented equally often in all colours, the word "blue" will continue to interfere with naming of incongruent colours (Ellis & Dulaney, 1991; Gul & Humphreys, 2015; MacLeod, 1998). That is, the relation between the word "blue" and its verbalisation is not quickly forgotten simply because the word is no longer predictive of the "blue" verbal response within the context of the experiment. It could be argued, however, that training studies with Stroop stimuli like this should be interpreted differently. Because the (repeatedly-reinforced) goal is to name the print colour congruently throughout the task (e.g., saying "blue" to the print colour blue), the habitual links between colour word stimuli and verbal responses may be indirectly reinforced (e.g., the link between the concept "blue" and the verbalisation "blue" is reinforced during colour naming, which indirectly keeps the word "blue" linked to a "blue" verbalisation). The colour-word contingency learning paradigm is more neutral in this regard, given the lack of a meaningful, pre-existing relation between the predictive (non-colour word) and target (colour) dimensions.
Also potentially informative in this regard is a series of experiments by MacLeod and Dunbar (1988). In these experiments, participants were trained to name novel shapes as colours (e.g., naming one polygon as "blue," another as "pink," etc.). After practice in shape naming, the task was reversed and participants named the (actual) print colours of the shapes. This allowed assessment of both "congruent" (or high contingency) trials, in which the (now task-irrelevant) shape is presented in the high contingency colour (i.e., the colour the shape was named as during training), and "incongruent" (or low contingency) trials, in which the shape is presented in a low contingency colour. In contrast to the colour-word contingency learning paradigm, a congruency/contingency effect was not observed immediately in this training paradigm. Instead, it took multiple days of training before the shape-colour contingency began to influence performance when naming the colour of the shapes. This may have been due to the change in the task context (i.e., swapping the task-relevant and -irrelevant features) or some other factor. In any case, in their Experiment 2, a congruency/contingency effect was still observed three months after the end of training. Although it is impressive that this contingency effect persisted in the absence of continued training, they did not test to see how this contingency effect was influenced by direct changes in the contingencies. It is possible, for instance, that the contingency effect would be immediately obliterated after introduction of a new contingency (counterconditioning) in which the shapes were briefly retrained with new colour names.
Given the above considerations, it is not clear whether the automatic response tendencies developed during implicit learning procedures, such as the colour-word contingency learning paradigm, are stable. More generally, if such a training procedure does produce stable responding, then how long might one need to practice before the contingency effect is stable enough to, for instance, be resistant to unlearning (i.e., removal of the contingency) or counterconditioning (i.e., introduction of a new, conflicting contingency)? That is, how much training is needed for acquired contingency knowledge to form a stable automatic response that is strong enough to override the currently-experienced stimulus-response contingencies?
In our previous research, we already studied unlearning with the colour-word contingency learning paradigm, but the contingency was always removed after a relatively short training period (O. Y.-H. Lin & MacLeod, 2018; Schmidt & De Houwer, 2016b; Schmidt et al., 2010). In the present series of three experiments, we investigated whether a contingency effect for heavily overtrained stimuli diminished rapidly after the contingency was removed (unlearning) and whether the effect of the original contingency was either further maintained or rapidly changed when a new, different contingency was introduced (counterconditioning). These three types of procedures (implemented in different phases of the experiments) are illustrated in Figure 1. We also investigated to what extent a new contingency was quickly acquired for stimuli that were previously associated with other colours. In particular, one set of words was trained first for an extended period of time (e.g., for two days in Experiments 1 and 3 or in one long session in Experiment 2). After this training, the contingency was removed for the second to last phase, and finally an alternative contingency was introduced in the final phase. These effects for heavily overtrained stimuli were compared to the same effects for recently-introduced contingencies (i.e., with stimuli that were introduced only briefly before the unlearning and counterconditioning phases).

Figure 1: Three types of testing phases in the experiments with the stimuli that correspond to each condition. In the initial learning phase, a contingency was introduced. During the unlearning phase, the contingency was removed. During the counterconditioning phase, a different contingency for the same words was introduced. For overtrained words, the original learning phase comprised many blocks spread over either two days (Experiments 1 and 3) or one long session (Experiment 2). For other words, the original learning phase comprised only five small sub-blocks.
Two possible results could occur in this setup. The first possibility we term the recent-events-matter-most scenario. In this scenario, older events have minimal impact on performance, and performance is primarily determined by the stimulus-response bindings in the recently encountered events. That is, the learning mechanism is strongly "myopic" to events that were just experienced. This scenario is radically different from that predicted by traditional views of automatic responding discussed above, which assume that associations progressively strengthen over time, implying that frequency is more important than recency instead (of course, no one would argue that recent events do not influence behaviour at all). If the recent-events-matter-most scenario obtains, then the contingency effect will disappear rapidly when the contingency is removed, even for the heavily trained stimuli. For instance, participants will stop responding faster to "find" in purple very shortly after "find" is changed to be presented equally often in all colours. It should similarly be expected that a newly introduced contingency for the same stimuli (i.e., counterconditioning) is rapidly learned. For instance, if "find" is now presented most often in orange (instead of purple), then participants should rapidly begin responding faster to "find" in orange, even after considerable training with "find" in purple. At the same time, the "old" high contingency (e.g., "find" in purple) should no longer influence performance after an extended unlearning phase and subsequent introduction of a "new" high contingency (e.g., "find" in orange).
The second possibility we term the eventually-stable-habit scenario. In this scenario, while recently acquired contingency knowledge may be more pliable early on (i.e., as the recent-events-matter-most scenario suggests), the memory bias for an overtrained contingency is more stable after sufficient training. In other words, sufficiently repeated encoding of a stimulus-response binding into memory eventually makes it difficult for new bindings to "break" the overtrained habit. Thus, for the overtrained stimulus set, unlearning should be less rapid. In other words, we might expect the original contingency to "stubbornly persist" after the contingency is removed for overtrained stimuli (i.e., no unlearning). Similarly, acquisition of a new contingency during counterconditioning should be reduced when the original contingency was overtrained. For instance, if "find" is changed to be presented most often in orange (rather than purple), then speeded responses to "find" in orange should not emerge quickly (perhaps not at all). This would indicate that the find-purple habit is too strongly ingrained to be quickly overcome. At the same time, the "old" high contingency should continue to influence performance, that is, participants should continue to respond quickly to "find" in purple long after the find-purple contingency has been replaced by the find-orange contingency.
It should be noted in advance that the recent-events-matter-most and eventually-stable-habit scenarios are deliberately presented as extremes, the former proposing very myopic learning and the latter proposing stubborn habit persistence. The truth may equally well lie somewhere in between these two extremes. That is, there might be both some continued influence of older experiences (e.g., from overtraining) and influences of recently-acquired information. This would imply that we adapt quickly to newly-experienced events, but do not "catastrophically forget" everything that came before. To foreshadow our results, exactly this sort of mixed influence of both the new and old experiences was observed.

Participants
Fifty Ghent University undergraduates participated in the study in two separate 30-minute sessions one day apart in exchange for €10. Our sample size was determined a priori, but partially subjectively. In particular, as we had never previously studied counterconditioning with this procedure, we did not know how large an effect to expect (i.e., for an a priori power calculation). The current sample size seemed more than reasonable based on our prior experiences with the procedure on related topics. Two participants, however, did not show up for the second session and were therefore removed from the sample. Another participant had 16% incorrect responses in the main part of the experiment (i.e., excluding the practice block). This was over 2.5 standard deviations above the mean sample error rate. This participant was also removed from the sample. This participant contributed some notable noise to the sample, but inclusion of this participant did not influence the most critical results. Another participant had an empty cell (i.e., no correct response times) and was also removed.

Apparatus
The experiment utilized a standard PC. Stimulus and response timing were controlled with E-Prime 2 software (Psychology Software Tools, Pittsburgh, PA). The experiment files, along with the raw data, participant averaged data, and R scripts are available on the Open Science Framework (https://osf.io/7fwae/). Participants responded to purple, orange, and grey stimuli with the "J," "K," and "L" keys, respectively, on an AZERTY keyboard. Although we did not enforce specific fingers for the three keys, all participants in this and related studies defaulted to the standard keyboard resting position (i.e., right index on the J-key, right middle finger for the K-key, and right ring finger for the L-key).

Design
The structure of the experiment is presented in Figure 2. In each of the two testing days, participants were exposed to four larger mega-blocks, each with 5 sub-blocks of 36 trials (180 trials in each mega-block; 1440 trials total). In addition, on Day 1 participants also began with a practice block of 36 trials to familiarize participants with the colour-to-key mapping. To this end, trials in the practice block consisted of the stimulus "@@@@" presented in each of three colours (purple, orange, and grey) 12 times each and participants were instructed to respond as quickly and accurately as possible. For the four experimental mega-blocks of each day, four sets of three words (Sets A-D) were created from a list of 12 four letter, first person Dutch verbs. The verbs are presented in Table 1. Note that which words were a part of which set and which words were presented most often in which colour were counterbalanced by assigning words, in the order listed in Table 1, to lists offset by participant number (e.g., "vind," "help," and "weet" were the Set A words for Participants 1, 13, 25, etc., "help," "weet," and "denk" were the Set A words for Participants 2, 14, 26, etc.). The first three experimental mega-blocks of Day 1 constituted the training phase in which Set A (e.g., "vind," "help," and "weet") and Set B words (e.g., "denk," "roep," and "geef") were presented. In each of five sub-blocks of each mega-block, one word from each set was presented most often (i.e., four of six times: 67%) in purple (e.g., "vind" and "denk"), a second most often in orange (e.g., "help" and "roep"), and a third most often in grey (e.g., "weet" and "geef"). Each word was presented once (17%) in each of the remaining two colours per sub-block. Note that Set A stimuli were the "overtrained" words that were used throughout the entire experiment. Set B words were filler words included in the first training phase to keep the task similar throughout (i.e., with six words in three colours). 
Because they were not of interest for the main analyses, we do not report the analyses on Set B stimuli, though we note that comparisons with Set A stimuli revealed nothing problematic. Sets C and D served as the non-overtrained stimuli that appeared later in the experiment. In the fourth and final mega-block of Day 1, Set A stimuli remained, but Set B words were replaced by Set C words, which had a word-colour contingency manipulation identical to that for Sets A and B. In this final mega-block, it is possible to compare the magnitude of the learning effect of the heavily-overtrained Set A words with the newly-experienced Set C words.
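As a rough illustration of the design described above, the following Python sketch builds the word-to-set counterbalancing and the composition of one 36-trial training sub-block. It is not the authors' E-Prime implementation. The function names (`set_word_indices`, `sub_block_trials`) are hypothetical, and the wrap-around of the participant offset and the assignment of consecutive triplets to Sets B-D are assumptions extrapolated from the two examples given in the text. Trial order within a sub-block is not modelled.

```python
from collections import Counter

COLOURS = ["purple", "orange", "grey"]

def set_word_indices(participant, n_words=12, set_size=3):
    """Return, for each set A-D, the positions of its words in the Table 1 order.

    Assumption: the list is rotated by (participant - 1) positions, so that
    Participants 1, 13, 25, ... share the same assignment.
    """
    offset = (participant - 1) % n_words
    order = [(offset + i) % n_words for i in range(n_words)]
    return {name: order[k * set_size:(k + 1) * set_size]
            for k, name in enumerate("ABCD")}

def sub_block_trials(words):
    """Build one training sub-block for six words (two sets of three).

    Within each set, the first word is high contingency for purple, the second
    for orange, the third for grey: 4 of 6 presentations in the high
    contingency colour and 1 in each of the two remaining colours.
    """
    trials = []
    for i, word in enumerate(words):
        high = COLOURS[i % 3]
        for colour in COLOURS:
            reps = 4 if colour == high else 1
            trials.extend([(word, colour)] * reps)
    return trials

# Example with the Set A and Set B words named in the text for Participant 1.
trials = sub_block_trials(["vind", "help", "weet", "denk", "roep", "geef"])
counts = Counter(trials)
print(len(trials), counts[("vind", "purple")], counts[("vind", "orange")])
```

Six words with six presentations each give the 36 trials per sub-block stated in the Design, and five such sub-blocks give the 180 trials of one mega-block.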
Figure 2: Composition of the phases, mega-blocks, and sub-blocks in the two testing days, with the sub-blocks of each mega-block indicated as separate squares. The two stimulus sets (A-D) in a given mega-block are indicated along with the contingency (in percentage) for the high contingency response. Changes in stimulus sets and contingency percentages from the prior mega-block are presented in bold. Note that in the final mega-block the "old" contingency is weakened and a "new" contingency is introduced.

On Day 2, the first mega-block was identical to the last one experienced on Day 1 (i.e., with Set A and Set C words), allowing a comparison of Set A and Set C again, but after a night of consolidation. The next mega-block again maintained the Set A words, but replaced the Set C words with Set D words (i.e., another newly trained set). In the third mega-block to follow, the same Set A and Set D words were again presented, but the words were no longer predictive of the colour response. In particular, each of the words was presented two out of six times (33%) in each of the three colours. Thus, this mega-block constitutes the unlearning phase as it allows comparing the rate of unlearning for the heavily-overtrained Set A words with the newly-learned Set D words. In a final mega-block, which constitutes the counterconditioning phase, the same Set A and Set D words were again presented, but the contingencies were now changed. In particular, the word that used to predict purple was now presented most often in orange, the word that used to predict orange was presented most often in grey, and the word that used to be presented most often in grey was presented most often in purple. Thus, this mega-block allowed us to assess to what extent the new contingency is learned and whether the heavily-overtrained Set A contingencies are more resistant to a change in contingency than the newly-experienced Set D contingencies. For each of these sets (A and D), we can compare old high contingency items, which are pairings that previously had a high contingency but not anymore, to new high contingency items, which are pairings that currently have a high contingency but did not before. Both of these can further be compared to low contingency items, which are pairings that never had high contingency. Note that for most of the above-mentioned contrasts we might not only expect larger contingency effects for overtrained stimuli, but also potential main effects of word type. This is because both the high and the low contingency pairings with overtrained stimuli have been experienced more frequently than the high and low contingency pairings with recently-acquired contingencies (for further discussion of the frequency versus proportion distinction, see Schmidt & De Houwer, 2016a). 
In other words, we might expect that high contingency Set A items will be responded to faster than high contingency Set D items and, similarly, that low contingency Set A items will be responded to faster than low contingency Set D items.
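The unlearning and counterconditioning manipulations described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `SHIFT`, `classify_pairing`, and `unlearning_trials` are hypothetical, and the exact presentation proportions in the counterconditioning phase are deliberately not modelled because the text specifies only which colour became most frequent.

```python
COLOURS = ["purple", "orange", "grey"]

# Colour rotation applied in the counterconditioning phase, as described in
# the Design: purple -> orange, orange -> grey, grey -> purple.
SHIFT = {"purple": "orange", "orange": "grey", "grey": "purple"}

def classify_pairing(old_high, colour):
    """Label a word-colour pairing in the counterconditioning phase.

    old_high is the colour the word predicted during initial training.
    """
    if colour == SHIFT[old_high]:
        return "new high"
    if colour == old_high:
        return "old high"
    return "low"  # never the high contingency colour in any phase

def unlearning_trials(words):
    """One unlearning sub-block: each word appears 2 of 6 times in each colour."""
    return [(word, colour) for word in words for colour in COLOURS for _ in range(2)]

# A word that originally predicted purple:
for colour in COLOURS:
    print(colour, classify_pairing("purple", colour))
```

With three colours, every word has exactly one old high, one new high, and one low contingency pairing in the final phase, which is what makes the three-way comparison described above possible.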

Procedure
Stimuli were presented in the center of a black (0,0,0) screen, and presented in bold, 18 pt. Courier New font. On each trial, the participant was first presented a white (255,255,255) fixation "+" for 150 ms. This was followed by the word (or @'s during practice) in a neutral brown (255,183,113) for 150 ms, which was then colourized in one of the three target colours: purple (128,0,128), orange (255,165,0), or a light grey (192,192,192), which correspond to "purple," "orange," and "silver" in the standard E-Prime/HTML colour palette. This word preview was used because it is known to boost contingency effects (Schmidt & De Houwer, 2016b), likely because the word has more time to influence colour identification. The stimulus remained on the screen until either a response was made or 1500 ms elapsed. Following correct responses, the next trial began immediately. Following an incorrect response or 1500 ms without a response "XXX" was presented in white for 1000 ms before the next trial. Participants were instructed to try to respond as quickly and accurately as possible.

Results
Analyses focused on mean correct response times during the main phases of the experiment (i.e., practice phase excluded). Trials for which participants did not respond before the 1500 ms deadline were excluded, but no other response time trims were performed (as has been our standard practice with this paradigm). Error data are not reported here given that they were far too noisy to produce anything meaningful and that the general length of the reported analyses was already long, but there were no speed-accuracy trade-offs and the error data are available for download (along with the response time data and R scripts for both dependent measures) on the Open Science Framework (https://osf.io/7fwae/). In all analyses, sub-block was treated as a linear factor, which is more sensible than treating the sub-block as an unordered factor. Indeed, a linear factor allows for inferences about increases or decreases across sub-blocks (whether for the main effect or interactions involving sub-block), whereas the same is not true when treating sub-block as a categorical factor (e.g., a significant sub-block effect could hypothetically emerge due to abnormally fast or slow responses in one of the middle blocks). Linear factors should generally be used for any interval or scale factor. In the interest of brevity, only the theoretically interesting contrasts are reported (whether significant or non-significant). We do not report the less interesting contrasts, unless p < .1 (thus, the reader can correctly assume that any non-reported factor or interaction is not significant). Note that we did not preregister, but all data analyses for this and the following two experiments were planned in advance and similar to those in our past reports with this task. The only exception was the addition of some Bayes tests, which we added in response to editor feedback. 
All Bayes tests were conducted with the BayesFactor package in R with the default Cauchy prior and 100,000 recomputes to increase precision. The contingency effect as a function of the mega-blocks is presented in Figure 3 and the sub-block means and standard errors are presented in Table A1 for both response times and errors (see also the R scripts).
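To make the dependent measure and the role of the linear sub-block factor concrete, here is a minimal Python sketch. It is not the authors' R analysis (which used ANOVAs and the BayesFactor package). The function names and the trial tuple layout are hypothetical, and the `ok` flag is assumed to be false for both errors and missed deadlines. The ordinary least-squares slope is used only to illustrate what a linear trend across sub-blocks captures.

```python
from statistics import mean

def contingency_effect(trials):
    """Low minus high contingency mean RT (ms), using correct trials only.

    trials: iterable of (contingency, rt_ms, ok) tuples, where contingency is
    "high" or "low" and ok is False for errors and missed deadlines.
    """
    trials = list(trials)
    high = [rt for cond, rt, ok in trials if ok and cond == "high"]
    low = [rt for cond, rt, ok in trials if ok and cond == "low"]
    return mean(low) - mean(high)

def linear_slope(values):
    """Least-squares slope of values against sub-block position 1..n.

    A positive slope means the measure increases across sub-blocks; a single
    aberrant middle sub-block contributes little to this slope.
    """
    xs = list(range(1, len(values) + 1))
    x_bar, y_bar = mean(xs), mean(values)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den

# Made-up illustrative values, not data from the experiment.
demo = [("high", 600, True), ("high", 620, True), ("low", 640, True), ("low", 700, False)]
print(contingency_effect(demo))
print(linear_slope([10, 20, 30, 40, 50]))
```

Treating sub-block as an ordered predictor in this way is what licenses statements about increases or decreases across sub-blocks, in contrast to an unordered factor.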

Day 1, Sets A and C
First, we compared Set A (which had already been trained for 15 sub-blocks) with the newly added Set C using a sub-block (16-20) by contingency (high vs. low) by set (A vs. C) ANOVA to test for potential differences between the more heavily trained Set A stimuli over the new Set C stimuli. indicating faster overall performance for the overtrained Set A stimuli. In particular, Set A high contingency trials (mean = 612 ms, SE = 11) were marginally faster than Set C high contingency trials (mean = 622 ms, SE = 11), F(1,45) = 3.493, MSE = 2986, p = .068, $\eta_p^2 = .07$, and Set A low contingency trials (mean = 626 ms, SE = 12) were significantly faster than Set C low contingency trials (mean = 639 ms, SE = 13), F(1,45) = 4.852, MSE = 4189, p = .033, $\eta_p^2 = .10$. Thus, an overall advantage was evident for the overtrained stimuli, albeit only significantly so for the infrequent pairings. Globally, this is consistent with the notion that the frequency of co-occurrences between a stimulus and response is important, not just the proportion (Schmidt & De Houwer, 2016a). However, the contingency effect (i.e., the difference between high and low contingency trials) was not significantly different between Set A and C stimuli,   . The fact that the contingency effect was still robust in the absence of a contingency is also not consistent with the recent-events-matter-most hypothesis. There was also a marginal main effect of sub-block, F(1,45) = 3.483, MSE = 3324, p = .069, $\eta_p^2 = .07$, again hinting at fatigue.

Day 2, counterconditioning
Finally and most critically, we considered the counterconditioning blocks where we directly pitted an old (recent or overtrained) high contingency against a newly-introduced inconsistent high contingency. Most importantly, we began by considering whether reduction in the effect of the old high contingency and learning of the new high contingency was faster with the new Set D stimuli than with the overtrained Set A stimuli. For this, we began by comparing the trials in which the word was presented with the colour that was high contingency during initial training (old high contingency) with trials in which the word was presented with the colour that is currently high contingency (new high contingency). Trials in which the word was presented in a colour that was low contingency in all phases will be considered afterwards. Thus, we first conducted a sub-block (36-40) by contingency (old high vs. new high) by set (A vs. D) ANOVA, and followed this with the relevant contrasts. There was no main effect of contingency, F(1,45) = 0.002, MSE = 3438, p = .969, $\eta_p^2 < .01$. Note, of course, that this is not a test of the contingency effect per se, but of the comparison between the new versus old contingency. Thus, this should not be interpreted as no evidence of learning. There was a robust crossover interaction between contingency and set, F(1,45)

Discussion
Experiment 1 produced results that did not overwhelmingly support either of the two strong views mentioned in the Introduction. For some contrasts, there was no robust evidence of a difference between overtrained and newly-acquired contingencies. For instance, no differences were observed between Set A and Set D stimuli in the unlearning phase, where the contingency effect for overtrained Set A stimuli marginally decreased (i.e., did not "stubbornly persist" in an unchanged magnitude). Findings such as this could be considered as evidence for the recent-events-matter-most scenario, which predicts no differences between heavily-overtrained and newly-learned contingencies. On the other hand, overall response times were faster to the overtrained Set A stimuli relative to newly-added stimuli during the acquisition phases for Sets C and D (but only on Day 1 for the former). That is, there was a general speedup of responses to the overtrained stimuli, both high and low contingency, relative to recently-introduced stimuli. This is consistent with the notion that practice with the co-occurrence of stimuli (even if task irrelevant) and responses benefits performance (Lemercier, 2009;Schmidt & De Houwer, 2016a;Schmidt et al., 2016). That is, even though the proportions of high versus low contingency pairings are equivalent for overtrained and recently-introduced contingencies (i.e., 4/6 high contingency pairings, and 1/6 for each low contingency pairing), participants have more frequently observed each compound stimulus for the overtrained Set A stimuli (Y.-H. Lin, 2015). For instance, by the end of the acquisition phase for Set D, participants had seen each high contingency pairing 120 times for Set A, but only 20 times for Set D. Similarly, they had seen each low contingency pairing 30 times for Set A, but only 5 times for Set D.
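The distinction between proportion and frequency in the preceding paragraph can be made concrete with a small calculation. The sketch below assumes that each word appears four times in its high contingency colour and once in each low contingency colour per sub-block, and that Set A and Set D had been trained for 30 and 5 sub-blocks, respectively, by the end of Set D acquisition. These per-sub-block counts are inferred from the 4/6 and 1/6 proportions and the 20 versus 5 exposures given above, not taken from the Method; all names are hypothetical.

```python
# Assumed presentations of one word within a single sub-block (inferred, see lead-in).
HIGH_PER_SUBBLOCK = 4  # word shown in its high contingency colour
LOW_PER_SUBBLOCK = 1   # word shown in each of its low contingency colours

def exposures(n_subblocks):
    """Cumulative exposures to each word-colour pairing after n_subblocks."""
    return {
        "high": HIGH_PER_SUBBLOCK * n_subblocks,
        "low_each": LOW_PER_SUBBLOCK * n_subblocks,
    }

def high_proportion():
    """Proportion of a word's trials in its high contingency colour (two low colours)."""
    return HIGH_PER_SUBBLOCK / (HIGH_PER_SUBBLOCK + 2 * LOW_PER_SUBBLOCK)

set_a = exposures(30)  # assumed: Set A trained for 30 sub-blocks
set_d = exposures(5)   # assumed: Set D trained for 5 sub-blocks

print("Set A:", set_a)
print("Set D:", set_d)
print("High contingency proportion (both sets):", high_proportion())
```

The proportion is identical for both sets, whereas the raw exposure counts differ by a factor of six, which is the contrast the frequency account relies on.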
However, the contingency effect (i.e., difference between high and low contingency trials) was only (robustly) larger in the Set A versus Set D comparison. For Set C, it is possible that the overall advantage (i.e., main effect) for the overtrained stimuli worked against the contingency effect, in line with previous findings that effects tend to scale up with mean response time (Stevens et al., 2002;Urry, Burns, & Baetu, 2015;Schmidt & De Houwer, 2016b;Schmidt et al., 2016). That is, even though contingency knowledge may be stronger for the overlearned stimuli, there is less time for this contingency knowledge to be expressed (i.e., to influence colour decisions) as overall responding was quicker to overlearned stimuli. Stated differently, even though the contingency knowledge might be stronger for Set A, the difference between high and low contingency trials might not be notably larger in this set because the fast overall response speed to Set A stimuli precluded the influence of the contingency knowledge on the response times. During counterconditioning, there was a trend for faster responses to both the old and new high contingency trials relative to low contingency trials, though only the old contingency effect was robust for the overtrained Set A and only the new contingency effect was robust for newly added Set D items. The interaction between set (A vs. D) and contingency (new vs. old) was robust. This suggests that the old contingency persists to a greater extent with overtraining, and the new contingency is learned more quickly with newly-acquired contingencies. 
For all comparisons between overtrained and recently-acquired contingencies, no results pointed in the direction opposite to what the eventually-stable-habits account would suggest (e.g., smaller effects of the old contingency for Set A), though an overwhelming difference between overtrained and recently-acquired contingencies does not seem apparent (i.e., for many contrasts there were no differences, and for others only small differences). Together, the results might suggest some carryover influence of overtraining, but a remaining potent influence of recent events (i.e., even with overtraining).

Experiment 2
Results from Experiment 1 suggest that neither of the two extreme views we discussed in the Introduction is correct. That is, it was not the case that only recent events had an influence on performance (which would have predicted no differences at all between Set A and Sets C and D), and it was also not the case that overlearning completely prevented any new learning (which would have predicted, for instance, no unlearning at all for Set A). Instead, the results supported an intermediate view, with some findings suggesting a lasting (though perhaps subtle) influence of overtraining, but with a robust influence of very recent experiences. Experiment 2 aimed to provide a conceptual replication of Experiment 1. The most important change was that we dropped the two-day design and instead used a longer single-day design (with 6 mega-blocks in one day, instead of 4 per day). We also dropped the Set C stimuli. Put differently, Experiment 2 was identical to Experiment 1 in all respects except that sub-blocks 16-25 (the fourth and fifth mega-blocks in Figure 2) were dropped and the remaining 30 sub-blocks (6 mega-blocks) were tested in one day. This does imply, however, that the initial training duration for Set A is shorter in Experiment 2 and does not include a night of sleep consolidation. Also note that we continue to label the last-introduced words as "Set D" for consistency with the prior experiment even though there was no longer a Set C.

Participants
Sixty-one Ghent University undergraduates participated in the study in one 30-minute session in exchange for €5. A slightly larger sample was collected because we supposed that any potential differences between overtrained and recently-learned contingencies might be smaller with a shorter training period. Using the same exclusion criteria as the prior experiment, only one participant was excluded due to an empty cell.

Apparatus, Design, and Procedure
The apparatus, design, and procedure of the current experiment were identical to Experiment 1 with the following exceptions. The two mega-blocks with Set C stimuli from Experiment 1 (see Figure 2) were dropped and the remaining six mega-blocks were run in one day (i.e., 3 A+B learning mega-blocks, followed by A+D learning, unlearning, and counterconditioning blocks; 1080 trials in total).

Results
Data were analysed in the same manner as in the prior experiment. The contingency effect as a function of the mega-blocks is presented in Figure 5. The means and standard errors for the response times and errors are presented in Table A2 in the Appendix (see also the R scripts). Note again that the comparison between old and new high contingency trials during counterconditioning is not a test of learning, but rather a comparison of old versus new learning. In contrast to Experiment 1, however, the contingency by set interaction was not significant, F(1,59) = 0.347, MSE = 3756, p = .558, $\eta_p^2 < .01$, $\mathrm{BF}_{01} = 9.64$. The results indicated both some preservation of the old contingency and acquisition of the new one.

Discussion
Like Experiment 1, Experiment 2 produced some results indicating an overtraining advantage for Set A stimuli, but this time only in the comparison of the contingency effect for the just-introduced Set D stimuli relative to the overtrained Set A stimuli. In particular, there was a larger contingency effect for Set A. However, there was again no difference between the two sets during unlearning. Further, there was evidence of both a preservation of the old contingency and acquisition of the new contingency during the counterconditioning phase. Unlike Experiment 1, however, this did not seem to be influenced markedly by set. The old contingency effect does, on its own, indicate a persisting influence of older experiences: Even after 180 trials of unlearning and a subsequent introduction of an opposing contingency during counterconditioning, the originally-trained regularity continued to influence behaviour. However, even extensive overtraining (i.e., for Set A) did not seem to altogether prevent acquisition of a new contingency and this is not consistent with the strong view that heavy overtraining prevents acquisition of new knowledge.

Experiment 3
Results from Experiments 1 and 2 both provide some hints of an effect of overtraining, but not evidence for an overwhelming effect. For instance, the original contingency for Set A stimuli did not "stubbornly persist" through unlearning, instead reducing substantially as with the Set D stimuli. Further, Experiment 2 did not replicate the interaction between set (A vs. D) and contingency (old high vs. new high), instead showing significant effects for both new and old high contingencies (relative to low contingency) across sets, indicating both persistence of old contingency knowledge (inconsistent with the recent-events-matter-most scenario) and acquisition of new contingency knowledge (inconsistent with the eventually-stable-habits scenario). This might indicate that the amount of initial training is less important for persistence of the old contingency through counterconditioning than Experiment 1 suggested. Alternatively, it might be that the shorter overtraining phase was responsible for the lack of an interaction in Experiment 2. We therefore decided to run a third experiment as a conceptual replication of Experiment 1, that is, with sessions on two separate days, but with an even longer training phase.

Participants
Fifty Ghent University undergraduates participated in the study in two sessions in exchange for €10, as in Experiment 1. Using the same exclusion criteria as the prior experiments, no participants were excluded.

Apparatus, Design, and Procedure
The apparatus, design, and procedure of the current experiment were identical to Experiment 1 with the following exceptions. There were six mega-blocks per day instead of four, but we again excluded Set C stimuli and ran nine mega-blocks of A+B learning (i.e., 6 on Day 1 and 3 on Day 2), followed by the same A+D learning, unlearning, and counterconditioning mega-blocks as in the prior two experiments (2160 trials total).
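The trial totals reported for Experiments 2 and 3 follow from the block structure. As a rough check, the sketch below assumes five sub-blocks per mega-block (from the 30 sub-blocks in 6 mega-blocks of Experiment 2) and derives 36 trials per sub-block from the 1080-trial total; the 36-trial figure is an inference rather than a value stated in the text, and the constant and function names are hypothetical.

```python
SUBBLOCKS_PER_MEGABLOCK = 30 // 6   # Experiment 2: 30 sub-blocks in 6 mega-blocks
TRIALS_PER_SUBBLOCK = 1080 // 30    # inferred from the Experiment 2 total (36)

def total_trials(n_megablocks):
    """Number of trials in a run of n_megablocks mega-blocks."""
    return n_megablocks * SUBBLOCKS_PER_MEGABLOCK * TRIALS_PER_SUBBLOCK

# Experiment 2: 3 A+B mega-blocks plus A+D learning, unlearning, counterconditioning.
exp2_trials = total_trials(3 + 3)
# Experiment 3: 9 A+B mega-blocks plus the same three final mega-blocks.
exp3_trials = total_trials(9 + 3)
# A single phase (e.g., unlearning) spans one mega-block.
phase_trials = total_trials(1)

print(exp2_trials, exp3_trials, phase_trials)
```

Under these assumptions, the same sub-block size accounts for both reported totals and for the 180-trial length of the unlearning and counterconditioning phases.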

Results
Data were analysed in the same manner as in the prior experiments. The contingency effect as a function of the larger blocks is presented in Figure 7. The means and standard errors for response times and errors are presented in Table A3 in the Appendix (see also the R scripts).

Sets A and D
First, we compared the overtrained Set A with a newly-introduced Set D using a sub-block (46-50) by contingency (high vs. low) by set (A vs. D) ANOVA. The main effect of contingency was significant, F(1,49) [...] a tendency for the old contingency effect to decrease and the new contingency effect to increase across sub-blocks. Relative to low contingency trials (mean = 591 ms, SE = 9 in Set A and mean = 589 ms, SE = 9 in Set D), the contingency effect (across sets) was significant for the old contingency trials (mean = 572 ms, SE = 9 in Set A and mean = 573 ms, SE = 9 in Set D; i.e., old vs. low), F(1,49) = 6.087, MSE = 10300, p = .017, $\eta_p^2 = .11$, and for the new contingency trials (mean = 576 ms, SE = 8 in Set A and mean = 579 ms, SE = 7 in Set D; i.e., new vs. low), F(1,49) = 8.001, MSE = 5310, p = .007, $\eta_p^2 = .14$.

Discussion
As in the prior two experiments, Experiment 3 produced some results consistent with a benefit from overtraining. In particular, contingency effects were larger for Set A stimuli relative to Set D during initial acquisition of Set D contingencies. Similarly, response times were overall faster to Set A stimuli during unlearning. However, learning of the new contingency was observed for both sets of stimuli during counterconditioning. There was, nevertheless, again a persistence of the old contingency in the counterconditioning phase, which did not differ notably between the overtrained and recently-acquired stimuli. Together, the results again suggest a very strong influence of recent experiences, including for overtrained stimuli, as the recent-events-matter-most hypothesis would suggest. However, the results again suggest that learning is not completely "myopic" to only very recent experiences.

General Discussion
In the present series of experiments, we asked whether overlearning of contingencies either over two days (Experiments 1 and 3) or a single longer session (Experiment 2) would lead to a more stable learning effect, resistant to unlearning or counterconditioning, which we referred to as the eventually-stable-habit view. Alternatively, we considered the idea that learning might be rather "myopic" to recent events, whereby "habits" are maintained merely due to the continued repetition of recently-executed behaviours. This alternative view would suggest no observable differences between overtrained and recently-acquired contingencies. Our data are consistent with a more intermediate view: both lasting influences of older experiences and marked sensitivity to recent ones.
Some influences of older events were clearly observed. For instance, in all three experiments there was a larger contingency effect and/or an overall speed-up (i.e., a main effect of set) for Set A (overtrained) stimuli relative to newly-introduced Set C or D stimuli. This is also consistent with past findings that contingency effects, while acquired quite quickly, do tend to slowly increase with further training (Schmidt & De Houwer, 2016b). Similarly, contingency effects remained significant within the 180-trial unlearning phase, indicating persistence of a contingency when the regularity no longer applied. On the other hand, the contingency was not "stubbornly resistant" to the extent that no reductions in the learning effect were observed. Perhaps most interesting was the newly-introduced counterconditioning phase. In all three experiments, the trend was for persistence of the old contingency effect, albeit in a reduced form, despite the preceding 180-trial unlearning phase and the introduction of a new, competing contingency. There was also a trend for acquisition of the new contingency effect in the counterconditioning phase, however. In Experiment 1, there was a significant interaction between set and contingency (old vs. new). In particular, while the old contingency effect was not robust (though numerically trending) and the new contingency effect was significant for the recently-introduced stimuli (Set D), the reverse was true for overlearned stimuli: The old contingency effect persisted and the newly-changed contingency was non-significant (though again trending in the correct direction). This pattern did not replicate in Experiments 2 and 3, however, where both the old and new contingencies influenced performance regardless of set. The reason for this discrepancy is uncertain. The significant interaction in Experiment 1 could have been a Type I error. Alternatively, a true but very small effect might exist that was detected only in one experiment. Indeed, as a general trend, "hints" in favour of some overtraining effect were rather systematically observed in our experiments, but only some findings were significant and no overtraining effects were overwhelmingly large. Globally, then, the results are again not consistent with the strong idea that overtrained stimuli are inflexibly resistant to new learning: Some lasting influence of older events is observed, but new contingencies are picked up even for overtrained stimuli.
It is further noteworthy that there were a number of contrasts between overtrained and recently-acquired stimuli that were performed (i.e., two main ones per phase if we consider both the main effect of set and the interaction between set and contingency). Only some of these came out as significant and most of the observed differences were not substantial. We also note that we did not make corrections for multiple comparisons and there were clearly some noise and some inconsistencies across the many comparisons, so more targeted replications of specific findings seem warranted. However, we do note that all significant cross-set comparisons were in the direction predicted by the eventually-stable-habit view. Still, any lasting influences from overtraining seem rather underwhelming. It is perhaps important to stress that these findings were not, however, unclear, as the results revealed significant effects that are inconsistent with each of the "extreme" positions we discussed in the Introduction. For instance, the eventually-stable-habits view clearly should have predicted, for overtrained stimuli: (a) no reduction in contingency effects with unlearning (but this was observed), and (b) no acquisition of new contingencies during counterconditioning (but this was also observed). Similarly, the results also are not consistent with the extreme view that learning is myopic to only very recently occurring events, as the persistence of the old contingency through unlearning and counterconditioning clearly demonstrates. Thus, robust effects argue against both extreme views and therefore suggest that a more moderate view is necessary.
Collectively, the results of the present three experiments are consistent with the idea that both frequency and recency influence the quality of representations (in this case, representations of contingency; see Moors, 2016). Our results might be coherently explained by a single memory mechanism with a high learning rate (e.g., Logan, 1988;Schmidt et al., 2016). According to this view, the individual impact of a given experience on current behaviour is related to how long ago the past experience occurred. Recently-experienced events have a particularly potent influence on behaviour, whereas older and older memory traces have increasingly smaller (but non-zero) influences on current behaviour. This notion has often been referred to as the power law of practice (Logan, 1988;Newell & Rosenbloom, 1981). The actual form of acquisition may be exponential (Heathcote, Brown, & Mewhort, 2000;Myung, Kim, & Pitt, 2000), which, averaged across participants, appears more like a power function. In any case, the rough notion is illustrated in Figure 9. If the participant is, for instance, currently partway through a counterconditioning phase, then learning of a "new" contingency can be explained by the potent influence of recent events. However, there will also be continued influence of older events from the prior unlearning and acquisition phases, which can explain the persistence of an old contingency. Differences between overtrained (Set A) and recently-acquired (Set D) stimuli can be explained by the extra traces encountered for the former stimuli. The reason for relatively weak overtraining effects can similarly be explained by the decay of much older memory traces. Similar notions have been forwarded to explain skill acquisition (Logan, 1988) and repetition priming (Grant & Logan, 1993;Logan, 1990).
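The recency-weighted account sketched in this paragraph can be illustrated with a deliberately simple toy calculation. It is not the PEP model or any model fitted to the present data: it assumes that each stored trace contributes to its colour with a weight that falls off as a power function of its age in sub-blocks, that the decay exponent is 1, and that a word is shown 4/1/1 times per sub-block during acquisition and counterconditioning and 2/2/2 times during unlearning. All names, the exponent, and the per-sub-block counts are illustrative assumptions.

```python
def trace_weight(age, decay=1.0):
    """Power-function weight of a trace that is `age` sub-blocks old (age >= 1)."""
    return age ** -decay

def evidence(phases, decay=1.0):
    """Summed, recency-weighted evidence per colour at the end of training.

    phases: list of (n_subblocks, {colour: presentations per sub-block}), oldest first.
    """
    subblocks = [counts for n, counts in phases for _ in range(n)]
    totals = {}
    # The most recent sub-block has age 1, the one before it age 2, and so on.
    for age, counts in enumerate(reversed(subblocks), start=1):
        weight = trace_weight(age, decay)
        for colour, n in counts.items():
            totals[colour] = totals.get(colour, 0.0) + n * weight
    return totals

# Presentations per sub-block for one word (assumed values, see lead-in).
ACQUISITION = {"old": 4, "new": 1, "low": 1}
UNLEARNING = {"old": 2, "new": 2, "low": 2}
COUNTERCONDITIONING = {"old": 1, "new": 4, "low": 1}

# Overtrained word (30 acquisition sub-blocks) vs. recently-introduced word (5).
set_a = evidence([(30, ACQUISITION), (5, UNLEARNING), (5, COUNTERCONDITIONING)])
set_d = evidence([(5, ACQUISITION), (5, UNLEARNING), (5, COUNTERCONDITIONING)])

for name, ev in (("Set A", set_a), ("Set D", set_d)):
    old_effect = ev["old"] - ev["low"]
    new_effect = ev["new"] - ev["low"]
    print(f"{name}: old effect = {old_effect:.2f}, new effect = {new_effect:.2f}")
```

In this toy version both words end counterconditioning with an advantage for the new colour and a residual advantage for the old colour, and the residual old-colour advantage is larger for the overtrained word. The new-colour advantage is identical across the two words only because evidence is summed linearly here; a different retrieval rule could change that.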
This theoretical account may also explain why more notable differences were observed between Set A and Set D stimuli during acquisition, but less clearly during unlearning and counterconditioning. In particular, the extra learning experiences for Set A stimuli were more recent during acquisition, but more distant in time when the unlearning and subsequent counterconditioning phases eventually began. In that vein, somewhat larger effects of overlearning might be observable during counterconditioning if the counterconditioning directly follows acquisition.
Related to the above discussion, the present results might prove informative in constraining conceptual and modelling accounts of learning. For instance, the Parallel Episodic Processing (PEP) model of Schmidt and colleagues (2016) has been used to simulate a range of findings from the colour-word contingency learning paradigm, along with a number of other binding, timing, attentional, and control phenomena. Similar to what we described above, the PEP stores traces of individual events and similarity-based retrieval of these "exemplars" produces learning effects. It is already the case in the PEP model that recently-encoded events are more strongly retrieved from memory than older events. This allows the model to learn quickly and simulate (with the same mechanism) more transitory influences on behaviour, such as distracter-response binding effects (Frings, Rothermund, & Wentura, 2007). Future modelling work, however, could aim to see whether the PEP (or other models) is able to simulate both the lasting and dynamic adaptations to contingencies as observed in the present report.
The current research introduces interesting new avenues for future research. For instance, we observed that an originally-trained contingency that was acquired over two sessions on two days persisted through 180 unlearning trials and even through another 180 counterconditioning trials. But how much longer would this effect persist? Various permutations of the learning and counterconditioning phase lengths might clarify this issue further. Presumably, the new contingency will overwhelm the old contingency eventually, and the 180-trial counterconditioning phase in the current report may have simply been too short. Future work with longer counterconditioning phases might therefore explore whether an old contingency does eventually extinguish entirely. Similarly, it may be the case that the old contingency eventually does become stable and completely prevents acquisition of a new contingency. Future research might aim to extend the initial training over much longer durations (e.g., weeks) before introducing counterconditioning.
More globally, the present results might suggest that much of what we consider to be a habit is driven by recent experiences. If recent experiences are, indeed, more influential on the maintenance of an automatic behaviour than is typically assumed, then this might have interesting implications. Although certainly speculative, to change automatic behaviour (e.g., undesirable habits), it might suffice to force a change in a limited number of recent experiences. That is, the default response to a stimulus could change even if only the most recent experiences are different from many old ones. At the same time, our data show that old memory traces will persist to some degree, which could explain relapse of old automatic behaviours, including undesirable habits. Such relapse might be particularly likely as time elapses. In this case, events inconsistent with the original default response (e.g., counter-habitual behaviours) will lose their advantage of recency so that traces of old automatic behaviours can resurface. This is related to explanations for spontaneous recovery, that is, the re-emergence of a previously-extinguished behaviour (Briggs, 1954). It would thus be interesting to rerun our studies but add a delay between counterconditioning and a subsequent unlearning test phase. It might be the case that the old contingency resurfaces more strongly with delay (Pavlov, 1927).
Relatedly, two of our three experiments included two days of training. This was largely done with the aim of breaking up the lengthy training. However, the two-day training did involve an intermediate night of sleep consolidation. Future work might explore the role of consolidation more directly. Some existing work already suggests that consolidation does play some role in the strength of learning. For instance, Geukes, Gaskell, and Zwitserlood (2015) trained participants with novel words and colour words as pairs and observed Stroop-like (or learning) effects when intermixed with colour word distracters. Without colour word distracters, however, the Stroop effect was only observed on Day 2 (i.e., after consolidation). Similarly, the role of sleep or simply time in consolidation could be explored (Lindsay & Gaskell, 2013).
Another factor that might be interesting to explore in the current design is context dependency. It is likely that, in real life, automatic (e.g., habitual) behaviours have been emitted in many contexts whereas recent (e.g., counter-habitual) behaviours inconsistent with the original learning have been emitted in more restricted contexts (e.g., therapy). Also, from learning psychology we know that first learning is less context dependent than new learning (for a review, see Bouton, 2004). Hence, old memory traces might have more impact in new contexts than in those contexts specifically used for unlearning or counterconditioning. To explore this notion with the present materials, one might therefore imagine introducing contextual changes, for instance, after counterconditioning to see whether this leads to a re-emergence of the old contingency.
Figure 9: An illustration of a power law influence on behaviour. Recently-encoded events have a particularly strong influence on behaviour, whereas older and older memories have ever diminishing influences. In this example, the counterconditioning phase is marked in dark grey, the unlearning phase in light grey, and acquisition in white. Markers in the figure indicate the first Set A trial and the first Set D trial.
We also remind the reader that "habit" is inconsistently defined in the literature (see the Introduction). Thus, depending on how one defines a habit, our lengthy acquisition phase may or may not be considered sufficient to establish a habit. It is similarly not clear whether the incidental learning effects explored in the present report are due to goal-free stimulus-response learning or whether the learning also includes goal-directed actions. Independently of whether the absence of goals should be considered a defining feature of a habit (see De Houwer, 2019, for concerns with this perspective), it remains an interesting open question to explore in future research (e.g., by conducting similar studies with or without deliberate learning goals).