In this paper, we draw connections between reward processing and cognition by behaviourally testing the implications of neurobiological theories of reward processing for memory. Single-cell neurophysiology in non-human primates and imaging work in humans suggest that the dopaminergic reward system responds to different components of reward: expected value; outcome or prediction error; and uncertainty of reward (Schultz et al., 2008). The literature on both incidental and motivated learning has focused on understanding how expected value and outcome, both linked to increased activity in the reward system, lead to consolidation-related memory enhancements. In the current study, we additionally investigate the impact of reward uncertainty on human memory. The contribution of reward uncertainty (the spread of the reward probability distribution irrespective of the magnitude) has not been previously examined. To examine the effects of uncertainty on memory, a word-learning task was introduced, along with a surprise delayed recognition memory test. Using Bayesian model selection, we found evidence only for expected value as a predictor of memory performance. Our findings suggest that reward uncertainty does not enhance memory for individual items. This supports emerging evidence that an effect of uncertainty on memory is only observed in high compared to low risk environments.
Introduction
We are constantly encoding information; however, relatively little of that information is eventually consolidated into memory. In order to be adaptive, memory must be selective and prioritise information that is likely to be relevant to future decisions. There are many ways in which reward can affect the consolidation of newly learned material. For example, students studying for exams will (ideally) be actively focusing their attention and resources to promote memory for information that is likely to be tested on an exam (motivated learning). In other situations, value may be more incidental to information that might be later remembered; for example, a child might enjoy interacting with a new object and therefore be more likely to remember its name (incidental learning).
The effects of reward on learning have been studied in the context of both motivated and incidental learning. Neurobiological mechanisms have been proposed to account for both types of learning (Adcock, Thangavel, Whitfield-Gabrieli, Knutson, & Gabrieli, 2006; Shohamy & Adcock, 2010; Wittmann, Dolan, & Düzel, 2011; Wittmann et al., 2005). Research has focused on the role of the neurotransmitter dopamine in increased hippocampal consolidation of reward-based memories (Lisman & Grace, 2005; Shohamy & Adcock, 2010). Additionally, salient and emotional episodic memory enhancements have been linked to increased activity in the locus-coeruleus-norepinephrine (LC-NE) system (Clewett & Mather, 2014; Clewett, Sakaki, Nielsen, Petzinger, & Mather, 2017; Preuschoff, ‘t Hart, & Einhauser, 2011). Given the difficulty of individually isolating motivational factors such as reward, emotion and arousal, it is likely that multiple neurobiological systems support increased hippocampal encoding (Madan, 2017; Shaikh & Coulthard, 2013; Shohamy & Adcock, 2010; Takeuchi et al., 2016). Furthermore, reward-based learning may be supported either by synaptic (Lisman & Grace, 2005) or systems level consolidation processes (Braun, Wimmer, & Shohamy, 2018; Murty, DuBrow, & Davachi, 2018; Studte, Bridger, & Mecklinger, 2016).
Reward-based learning is often considered within the context of reinforcement learning models (Diederen et al., 2017; Sutton & Barto, 1998). In these models the reward signal comprises the expected value, the actual reward outcome, and the prediction error. Prediction errors are used to update the current belief about the value of different actions in order to maximise future rewards. It has been suggested that neurons in the dopaminergic system encode the prediction error term of these models (Schultz, 1998). Such models account for learning and decision-making, but the precise relationship between the reward signal and individual episodic memories is less clear (Bornstein & Norman, 2017; Diederen et al., 2017; Lengyel & Dayan, 2007). The signal that is effective for memory might be expected value, reward outcome or prediction error. Earlier studies did not clearly distinguish between anticipated rewards and actual outcomes (Wittmann et al., 2005); more recently the focus has shifted to the relationship between reward cue and reward outcome (Bialleck et al., 2011; Bunzeck, Dayan, Dolan, & Duzel, 2010; Mason et al., 2017b; Mather & Schoeke, 2011). There is evidence to suggest that memory enhancements could be attributed to either reward anticipation or a post-encoding enhancement of items after reward delivery (Gruber, Ritchey, Wang, & Doss, 2016; Murayama & Kitagami, 2014; Patil, Murty, Dunsmoor, Phelps, & Davachi, 2017).
Reward uncertainty is another important, but often ignored, signal that refers to the predictability of the outcome of an event. It tells us the spread of the reward probability distribution irrespective of the magnitude (Tobler, O’Doherty, Dolan, & Schultz, 2007). In the case where there are two possible outcomes (e.g. reward vs. no reward), expected value increases linearly with the probability of receiving a reward, whereas uncertainty follows an inverted U-shaped function of probability of reward, and is maximal at p = 0.5. A common measure of uncertainty is entropy. Entropy is calculated as the negative weighted sum of the logarithm of the probabilities of each possible outcome, $-\sum_{O} P_O \log_2 P_O$, where $P_O$ is the probability of outcome $O$ (reward or no reward). Reward uncertainty is likely to be signalled by multiple systems. It has been associated both with changes in activity in the dopaminergic reward system and the LC-NE system, which also signals arousal and surprise (Clewett & Mather, 2014; Clewett et al., 2017; Kempadoo, Mosharov, Choi, Sulzer, & Kandel, 2016; Preuschoff et al., 2011). fMRI studies in humans have demonstrated distinct coding of reward expected value and uncertainty (D’Ardenne, Mcclure, Nystrom, & Cohen, 2008; Glimcher, 2011; Hsu, Krajbich, Zhao, & Camerer, 2009; Liu, Hairston, Schrier, & Fan, 2011; Ludvig, Sutton, & Kehoe, 2008; Preuschoff, Bossaerts, & Quartz, 2006; Preuschoff, Quartz, & Bossaerts, 2008; Schultz et al., 2008; Tobler, Fiorillo, & Schultz, 2005; Tobler et al., 2007). Tobler et al. (2007) found that stimuli associated with increases in expected value elicited monotonically increasing activation in the striatum, whereas stimuli associated with higher variance led to increased activation in the orbitofrontal cortex.
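The contrast between the linear expected value function and the inverted U-shaped entropy function can be illustrated with a short sketch. This is only an illustration of the definitions given above, not code from the study; the function names and the unit reward magnitude are assumptions made for the example.

```python
import math


def binary_entropy(p_reward: float) -> float:
    """Entropy in bits of a two-outcome event (reward vs. no reward)."""
    total = 0.0
    for p in (p_reward, 1.0 - p_reward):
        if p > 0.0:  # by convention, 0 * log2(0) contributes nothing
            total -= p * math.log2(p)
    return total


def expected_value(p_reward: float, magnitude: float = 1.0) -> float:
    """Expected value: probability of reward times reward magnitude.

    The default magnitude of 1.0 is an arbitrary choice for illustration.
    """
    return p_reward * magnitude


# Expected value rises steadily with p, while entropy peaks at p = 0.5.
for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"p={p:.2f}  EV={expected_value(p):.2f}  H={binary_entropy(p):.3f}")
```

Because entropy is symmetric around p = 0.5, two probabilities that are equidistant from 0.5 carry the same uncertainty but different expected values, which is what allows the two signals to be dissociated.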
Other studies have indicated that the reward signal comprises temporally distinct linear and quadratic responses to expected value and uncertainty within dopaminergic brain regions such as the striatum (Cooper & Knutson, 2008; Dreher, Kohn, & Berman, 2006; Rolls, McCabe, & Redoute, 2008).
The link between dopaminergic activity and uncertainty on the one hand, and dopaminergic activity and memory enhancement on the other, suggests that we should expect to see a behavioural relationship between reward uncertainty and memory performance. This has only recently received attention in the literature. Rouhani, Norman, and Niv (2018) examined the effects of reward uncertainty on recognition memory and found that participants remembered items that occurred within a high-risk context (large variance in reward distribution) better than items in a low-risk context. They also found that, across risk contexts, surprise (unsigned prediction error) was the best predictor of memory for individual items (see also De Loof et al., 2018). The authors suggested that uncertainty experienced in high-risk reward environments may improve memory in these contexts (Duncan, Sadanand, & Davachi, 2012; Mather, Clewett, Sakaki, & Harley, 2015).
What isn’t clear is whether the relationship between reward uncertainty and memory holds at finer time scales within the experimental context. Here, we ask whether variations in reward uncertainty during the experiment are linked to variations in recognition memory accuracy. In motivated learning we (Mason et al., 2017a) tested the effects of reward components on episodic memory encoding. On each trial, participants were presented with a reward probability followed by the to-be-remembered item. They were then presented with the reward outcome, but earning this was contingent upon correctly recognising the item at a delayed memory test. For each item that participants were presented with we were able to test the influence of reward expected value, prediction error, outcome and uncertainty. Across four behavioural studies we found consistent evidence against an effect of reward uncertainty on memory, and only found evidence favouring an effect of reward outcome on memory, with higher reward outcomes leading to better memory than lower outcomes or an absence of a reward.
In principle, it is possible that rewards act differently on memory when items are studied under incidental or motivated learning conditions (Cohen, Rissman, Hovhannisyan, Castel, & Knowlton, 2017; Spaniol, Schain, & Bowen, 2013). During motivated learning participants engage in different strategies to enhance encoding: these include selective attention and differential resource allocation (Ariel & Castel, 2014; Castel, 2008; Castel, Benjamin, Craik, & Watkins, 2002; Eysenck & Eysenck, 1982; Loftus & Wickens, 1970; Stefanidi, Ellis, & Brewer, 2018), and directed forgetting (Fawcett & Taylor, 2008; Friedman & Castel, 2011; Hayes, Kelly, & Smith, 2013; Lehman & Malmberg, 2009; Wylie, Foxe, & Taylor, 2008). The learner also has the expectation at encoding that reward outcomes depend upon successful memory performance at test (Adcock et al., 2006). In contrast, in incidental learning paradigms the rewards are delivered at the time of learning and are found to increase recognition and recall of items associated with the rewards (Mather & Schoeke, 2011; Wittmann et al., 2011). Given the difference in reward delivery, it is conceivable that incidental learning relies to a greater extent on neurobiological mechanisms such as dopaminergic consolidation, and we may see a stronger coupling between the reward signals identified in the neurobiological literature and memory performance. Given the potential for different behavioural and neurobiological contributions under incidental versus motivated learning, it is possible that a relationship between reward uncertainty and memory might exist for individual items under incidental conditions.
Current Experiment
Accordingly, we conducted a behavioural experiment to assess the contribution of reward factors during incidental episodic memory encoding. The purpose of this paper is to test whether reward uncertainty influences memory on a trial-by-trial basis under incidental learning conditions. In addition, we will examine the influence of other reward predictors in order to identify the reward signals that drive memory performance at the behavioural level.
The reward task used in this experiment was developed by Preuschoff and colleagues and has been used both to examine dopaminergic reward signalling of uncertainty and to dissociate uncertainty and surprise (Preuschoff et al., 2006, 2011). In addition, the manipulation used in this experiment has been shown to induce a clear neural signature of reward uncertainty in the striatum (Preuschoff et al., 2006, 2008). To examine the effects of uncertainty on memory, a delayed recognition memory test was used to probe memory for words that were originally paired with rewards. The neuroimaging results from the Preuschoff et al. (2006) experiment indicated time-dependent encoding of value and risk in the ventral striatum. Risk-related activity followed an inverted U-shaped function of probability, whereas the relationship between value and probability was linear. These findings are supported by evidence from several other studies exploring the neural correlates of risk (Cooper & Knutson, 2008; Dreher et al., 2006; Rolls et al., 2008; Tobler et al., 2007). The question is to what degree, if at all, reward uncertainty enhances memory. Furthermore, the aim of this experiment was to provide a comparison of the different components of reward (expected value, outcome, prediction error, uncertainty of reward, and surprisal) as motivated by extensive research in single-cell neurophysiology in non-human primates and imaging work in humans (Cromwell & Schultz, 2003; Fiorillo, Tobler, & Schultz, 2003; Hollerman & Schultz, 1998; Schultz, 1998, 2002; Schultz et al., 2008; Tobler et al., 2005).
Methods
We pre-registered the experiment at https://aspredicted.org/rn2hy.pdf. The data is available on Open Science Framework https://osf.io/xkpfz/. All participants provided informed consent and the study was approved by the UWA Human Research Ethics Office.
Participants
Fifty students were recruited from the University of Western Australia undergraduate participant pool and were reimbursed with course credits. Sample size was based on anticipated effects from our previous studies examining reward-related learning. We used Bayesian statistics as our inferential framework, which allows us to competitively test models and explicitly calculate a strength of evidence for these models. Participants had the chance to earn a maximum of $5.00 (all dollar values are in AUD), with an average of $2.77 (SD = 0.53). One participant was excluded from the analysis as their data did not save due to a network issue. This left a sample size of 49 participants (female = 32, M age = 21.14, SD = 5.63).
Experiment task
Stimuli
The stimuli for the recognition task were English words. A total of 216 words were used, taken from a pool of 400 words used in Mason et al. (2017a; obtained in turn from Oberauer, Lewandowsky, Farrell, Jarrold, and Greaves, 2012). All words were concrete nouns, and were chosen to refer to common objects that are larger or smaller than a soccer ball, with the pool consisting of 108 objects rated as larger and 108 rated as smaller. The words had an average length of 5.77 letters (SD = 1.84). The experiment was programmed and presented using the Psychophysics Toolbox for MATLAB version 2.54 (Brainard, 1997) on a standard desktop computer.
Learning Task
For each participant the experiment was conducted in two sessions occurring on different days. In the first session participants were exposed to a series of words, each word associated with a reward value with varying degrees of probability. There were three levels of probability (0.125, 0.5, 0.875) and two levels of uncertainty (low, high, and low respectively). We decided to test the condition where reward uncertainty was greatest (0.5) and two comparison points that had the same uncertainty but different expected values. Studies in the reward-memory literature do not often detect fine-grained effects (Bunzeck et al., 2010; Mason et al., 2017b; Wittmann et al., 2011) and we wanted to maximise the chances of detecting the effect.
On each trial participants placed a bet, following which they could either win or lose $0.15. The “betting” task was a simple task with simulated playing cards. Two cards were drawn without replacement by the computer from a simulated set of playing cards (ace to 9, where ace was low). The first card was drawn at random from a subset of cards (2, 5 or 8). The second was then drawn from the remaining 8 cards. Participants were to bet on whether the second card drawn would be higher or lower than the first card. When the bet was placed participants had not seen either card so they always had a 50% chance of winning (vs losing) the bet. Once the first card was drawn the probability of winning was known to the participants. The outline of a trial is shown in Figure 1.
If, for example, the participant bet on the second card being higher, then the probability of winning was equal to the total number of cards in the deck (9) minus the number displayed on the first card drawn ($C_1$), divided by the number of remaining cards in the deck (8): $P_{\text{win}} = (9 - C_1)/8$. The first card was always a 2, 5 or 8, which meant the probability of winning was either 0.125, 0.5 or 0.875. The reward value was kept constant on each trial, and so the expected reward and risk varied directly as a function of probability of winning.
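The win probabilities can be checked by enumerating the cards that remain after the first draw. The sketch below is an illustrative check rather than the experiment code (the task itself was programmed in MATLAB); the function name and the use of exact fractions are choices made for this example.

```python
from fractions import Fraction

DECK = range(1, 10)  # simulated cards: ace (1, low) to 9


def win_probability(first_card: int, bet_higher: bool) -> Fraction:
    """Probability of winning the bet once the first card is known.

    The second card is drawn without replacement, so it is one of the
    8 cards that differ from the first card.
    """
    remaining = [card for card in DECK if card != first_card]
    if bet_higher:
        winning = [card for card in remaining if card > first_card]
    else:
        winning = [card for card in remaining if card < first_card]
    return Fraction(len(winning), len(remaining))


for first_card in (2, 5, 8):
    p_higher = win_probability(first_card, bet_higher=True)
    p_lower = win_probability(first_card, bet_higher=False)
    print(first_card, float(p_higher), float(p_lower))
```

For a "higher" bet the enumeration agrees with (9 - C1)/8, and for a "lower" bet the probabilities are the complements, so every trial falls into one of the three probability levels regardless of the bet placed.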
On each trial, a word was shown after card one and before card two. In the task used by Preuschoff et al. (2006) card one was displayed for 1.5 seconds, followed by an anticipatory period of 5.5 seconds before card two was presented. In the current experiment, card one was displayed for 1.5 seconds, followed by a fixation cross for 500 ms. The target word was then displayed for 4 seconds. To ensure that the words presented were attended to, participants were required to indicate whether the object was smaller or larger than a soccer ball. Participants used the left and right arrow keys (with the index and middle fingers of their dominant hand) to input their response. The target word remained on screen after this response. At the end of the 4 second period, a fixation cross was displayed for 500 ms, before card two was displayed for 1500 ms. Participants then had 2000 ms to select one of two boxes indicating whether or not they had won the bet, to make sure the trial events were attended to and understood. If participants failed to correctly report the outcome of the bet, a penalty of $0.05 was deducted. If no bet was placed, the bet was lost. There was an inter-trial interval of 500 ms.
Before beginning the experiment, the process was explained to participants and worked examples were given for each of the possible bets and card outcomes. Participants completed 10 practice trials during the first session. The experiment was run as a series of three blocks, with 36 trials in each block. At the end of the three blocks the participants randomly selected which block’s earnings they would keep. The lowest overall bonus payment a participant could earn was $0.
Recognition memory test
The second session always occurred the day after the first session. This was usually exactly 24 hours later and always a minimum of 12 hours. In the second session, participants completed a recognition test on the words shown in the first session. Each of the 108 old words was shown, randomly intermixed with 108 new words. Participants were required to make an old/new judgement using the left and right arrow keys.
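To make the recognition measures concrete, the following sketch computes a hit rate and a false alarm rate from old/new judgements. It is a minimal illustration with made-up responses, not the study's analysis code; the function name and the tuple layout of the trial records are assumptions.

```python
from typing import Iterable, Tuple


def recognition_rates(trials: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Return (hit rate, false alarm rate).

    Each trial is a pair (is_old, responded_old):
      hit         = old word judged "old"
      false alarm = new word judged "old"
    """
    hits = old_total = false_alarms = new_total = 0
    for is_old, responded_old in trials:
        if is_old:
            old_total += 1
            hits += responded_old
        else:
            new_total += 1
            false_alarms += responded_old
    return hits / old_total, false_alarms / new_total


# Made-up example: 4 old words (3 judged "old"), 4 new words (1 judged "old").
example_trials = [
    (True, True), (True, True), (True, True), (True, False),
    (False, True), (False, False), (False, False), (False, False),
]
hit_rate, false_alarm_rate = recognition_rates(example_trials)
print(hit_rate, false_alarm_rate)
```

A hit rate that exceeds the false alarm rate indicates that old words are discriminated from new words above chance.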
Data Analysis
Data Exclusion
During the first session, participants were asked to report whether or not they had won the bet on each trial. This was included in the experimental design to assess whether participants were maintaining attention during the task. It was assumed that participants who reported their bet outcome correctly at least 80% of the time performed well in this task, and were likely paying attention. Eight participants were excluded for not meeting the reporting requirement, leaving a total sample size of 41.
Results
Model Comparisons
The dependent variable was each participant’s mean hit rate in each of the 6 conditions: reward probability (0.125, 0.5, 0.875) crossed with reward outcome (0 or 0.15); see Figure 2. The false alarm rate was 0.23 (SE = 0.02), which is comparable to previous studies and indicates that participants were performing above chance (Mason et al., 2017a). We then conducted a mixed-effects regression. This allowed us to accommodate individual differences, at least in overall performance levels (by way of a random subject factor). A Bayesian model comparison approach was used to assess the unique contribution of different predictors. For each of the 6 experimental conditions we were able to test the following theoretically relevant predictors: expected value, prediction error, reward outcome, reward uncertainty and surprisal. Definitions of these predictors are listed in Table 1. We tested only the individual predictors, i.e. we did not include the interaction terms. However, the interaction of interest between reward probability and reward outcome is effectively captured by the predictor surprisal.
Predictor | Description |
---|---|
Expected Value (EV) | Probability of obtaining a reward multiplied by the reward magnitude |
Reward Outcome (O) | Magnitude of the reward obtained |
Prediction Error (PE) | Expected value of the reward minus the reward outcome |
Reward Uncertainty (U) | The entropy, $-\sum_{O} P_O \log_2 P_O$ |
Surprisal (S) | Information gained from observing an outcome $O$, $-\log_2 P_O$, where $P_O$ is the probability of that outcome (reward or no reward) |
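The definitions in Table 1 can be applied to the six experimental conditions as in the sketch below. This is an illustration under stated assumptions, not the authors' code: outcomes are coded as 0 or 0.15 as in the model comparison, prediction error follows the sign convention given in Table 1 (expected value minus outcome), and the function and key names are hypothetical.

```python
import math

REWARD_MAGNITUDE = 0.15            # AUD, outcome coding used in the analysis
PROBABILITIES = (0.125, 0.5, 0.875)


def predictors(p_reward: float, rewarded: bool,
               magnitude: float = REWARD_MAGNITUDE) -> dict:
    """Compute the Table 1 predictors for one condition."""
    outcome = magnitude if rewarded else 0.0
    expected_value = p_reward * magnitude
    # Probability of the outcome that actually occurred.
    p_outcome = p_reward if rewarded else 1.0 - p_reward
    entropy = -sum(p * math.log2(p)
                   for p in (p_reward, 1.0 - p_reward) if p > 0.0)
    return {
        "EV": expected_value,
        "O": outcome,
        "PE": expected_value - outcome,  # sign convention as in Table 1
        "U": entropy,
        "S": -math.log2(p_outcome),
    }


for p in PROBABILITIES:
    for rewarded in (False, True):
        values = predictors(p, rewarded)
        row = "  ".join(f"{name}={value:.3f}" for name, value in values.items())
        print(f"p={p:.3f} rewarded={rewarded!s:5}  {row}")
```

The output makes the design logic visible: the 0.125 and 0.875 conditions share the same entropy but differ in expected value, while surprisal depends jointly on probability and outcome.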
Models were fit using the “lmer” function in the lme4 package (Bates, Mächler, Bolker, & Walker, 2015). The Bayesian Information Criterion (BIC) values can be converted to an approximation of a Bayes factor (assuming the unit information prior) according to the following rule (Raftery, 1995): $BF_{M_1 M_2} = \exp\left(-0.5\,(\mathrm{BIC}_{M_1} - \mathrm{BIC}_{M_2})\right)$. The prior assumed by the BIC is relatively uninformative and tends to be conservative (i.e., it can favour the null hypothesis more than an informed prior would; Weakliem, 1999).
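As a worked illustration of this conversion, the sketch below applies the rule to the BIC values reported for the expected value model (–75.65) and the base model (–71.96). The function name is an assumption; the models themselves were fit in R, and this snippet only reproduces the arithmetic.

```python
import math


def bic_to_bayes_factor(bic_m1: float, bic_m2: float) -> float:
    """Approximate Bayes factor for model M1 over model M2 from their BICs.

    Values above 1 favour M1 (the model with the lower BIC).
    """
    return math.exp(-0.5 * (bic_m1 - bic_m2))


bic_ev = -75.65    # expected value model
bic_base = -71.96  # base model

bf_ev_over_base = bic_to_bayes_factor(bic_ev, bic_base)
print(f"BF (EV over Base) = {bf_ev_over_base:.2f}")
```

Computed from the rounded BICs, the result is approximately 6.3, which agrees with the tabulated Bayes factor up to rounding. Because the Bayes factor depends exponentially on the BIC difference, small rounding differences in the BICs produce visible differences in the larger Bayes factors.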
For our model comparisons we first selected the model with the lowest BIC value and then compared each of the other models to this model (see Table 2). For each comparison, the Bayes factor provides relative evidence for each of the models conditional on the data. It informs us how much our prior beliefs should shift in response to the data obtained. Although there are no strict cut-offs, according to Jeffreys (1961) we can interpret odds greater than 3 as some evidence, odds greater than 10 as strong evidence, and odds greater than 30 as very strong evidence for a particular hypothesis compared to an alternative (see also Wagenmakers, 2007). In addition, to illustrate the goodness of fit we plot the predictions of the best model (the model with the lowest BIC) alongside the data.
Label | Model | BIC | Bayes Factor |
---|---|---|---|
LMEV | EV | –75.65 | 1.00 |
LMBase | Base | –71.96 | 6.32 |
LMEVUn | EV & U | –67.12 | 70.94 |
LMPEOut | PE & O | –64.91 | 214.10 |
LMEVOut | EV & O | –64.91 | 214.10 |
LMPEEV | PE & EV | –64.91 | 214.10 |
LMUn | U | –63.43 | 448.92 |
LMPE | PE | –61.73 | 1,052.40 |
LMOut | O | –61.25 | 1,339.36 |
LMSup | S | –59.63 | 2,999.18 |
LMPEUnOut | PE & U & O | –56.39 | 15,189.49 |
LMOutUnEV | O & U & EV | –56.39 | 15,189.49 |
LMPEUnEV | PE & U & EV | –56.39 | 15,189.49 |
LMUnSupEV | U & S & EV | –54.52 | 38,591.64 |
LMPEU | PE & U | –53.20 | 74,723.47 |
LMOutUn | O & U | –52.72 | 95,100.54 |
LMPESupOut | PE & S & O | –52.57 | 102,356.60 |
LMPESupEV | PE & S & EV | –52.57 | 102,356.60 |
LMEVSupO | EV & S & O | –52.57 | 102,356.60 |
LMUnSup | U & S | –50.86 | 240,525.53 |
LMPES | PE & S | –49.40 | 499,221.12 |
LMOutSup | O & S | –48.92 | 635,056.22 |
LMPEUnSupEV | PE & U & S & EV | –43.79 | 8,256,740.34 |
LMEVUnSupOut | EV & U & S & O | –43.79 | 8,256,740.34 |
LMPESupUnEV | PE & S & U & EV | –43.79 | 8,256,740.34 |
LMPESUn | PE & S & U | –40.64 | 40,036,172.54 |
The results indicated that the best model was the Expected Value only model. The Bayes factor model comparisons indicate some evidence (BF = 6.32) that this model accounts for the data better than the base model containing no effects of probability (LMBase). Critically, the Expected Value model was very strongly favoured over all other models (all BF ≥ 70.94), including all models incorporating an effect of reward uncertainty.
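The verbal labels used here follow the Jeffreys (1961) thresholds described earlier. The sketch below applies those thresholds to three Bayes factors taken from Table 2; the function name is hypothetical, and the label "inconclusive" for odds of 3 or less is an assumption added for completeness rather than a category used in the text.

```python
def evidence_label(bayes_factor: float) -> str:
    """Map a Bayes factor (best model over a comparison model) to a label."""
    if bayes_factor > 30:
        return "very strong evidence"
    if bayes_factor > 10:
        return "strong evidence"
    if bayes_factor > 3:
        return "some evidence"
    return "inconclusive"  # assumed label for odds of 3 or less


# Bayes factors for the expected value model over selected models (Table 2).
comparisons = {"Base": 6.32, "EV & U": 70.94, "U": 448.92}

labels = {model: evidence_label(bf) for model, bf in comparisons.items()}
for model, label in labels.items():
    print(f"EV vs {model}: {label}")
```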
Discussion
In this experiment we compared how a range of reward-related predictors influence incidental memory performance. Using a behavioural task developed to elicit reward uncertainty during encoding, we found that the expected value of a reward was the best predictor of memory for the words temporally linked to rewards. In our task participants were presented with a word between reward cue (which predicted the reward outcome with greater or lesser certainty) and reward outcome. We used mixed-effects modelling to compare how different reward factors predicted recognition memory performance in a delayed surprise memory test. Our study is the first to directly compare different reward-related predictors (expected value, reward outcome, prediction error, uncertainty and surprisal) in their effect on incidental memory.
The results from our experiment, showing a specific effect of expected value, contribute to the growing body of evidence that signals related to reward prediction error, reward outcomes (Mason et al., 2017b) and expected value (Jang, Nassar, Dillon, & Frank, 2018) affect reward-based memory consolidation. There has been extensive research on both the role of prediction errors in learning and decision-making (Diederen et al., 2017; Rouhani et al., 2018) and the potential relationship between prediction errors and episodic memory formation on a trial-by-trial basis (Bunzeck et al., 2010; Ergo, De Loof, Janssens, & Verguts, 2019; Jang et al., 2018; Mason et al., 2017b; Rouhani et al., 2018; Wimmer, Braun, Daw, & Shohamy, 2014). A few studies have found evidence in favour of such a relationship; however, there appears to be more consistent evidence that reward outcomes are a strong predictor of memory in incidental learning (Bunzeck et al., 2010; Mason et al., 2017b; Mather & Schoeke, 2011; Murayama & Kitagami, 2014). Although evidence generally emerges for these signals as predictors, not all studies have provided consistent evidence for effects of all of them on memory. While this may be partly due to sampling variability, it may also be the case that different experimental procedures lead to one of these signals becoming more salient and influencing memory to a greater degree than the others. For example, in the current study participants were explicitly told the expected value of each reward cue, and the reward outcome was revealed later in the trial. In other studies, the cue and outcome appear closer in time, which may serve to emphasise their relationship (Bunzeck et al., 2010; Mason et al., 2017b; Mather & Schoeke, 2011). Another potential objection is that the majority of studies, including our own, provide participants with small financial incentives on each trial.
In the current study, it does appear that people were responsive to the incentives, as we observed an effect of expected value on memory. However, we know that people are motivated by factors other than money (Deci, Koestner, & Ryan, 1999) and that rewards of different magnitudes affect risk-seeking behaviour and potentially memory (Konstantinidis, Taylor, & Newell, 2017; Ludvig, Madan, Mcmillan, & Spetch, 2018). Therefore, future studies in this area may benefit from using a points-based incentivisation scheme.
An additional issue worth considering is the relationship between the reward and the memory stimulus. Murayama and Kitagami (2014) found that rewards promoted memory for items presented after an unrelated reward task. In our experiment, the to-be-remembered item was not directly linked to earning a reward, but was instead presented for encoding between the reward cue and outcome, and so was still embedded within the reward task (Mather & Schoeke, 2011). Arguably, these designs mean that even under incidental learning the rewards are motivationally linked to the memory stimuli, suggesting that we need to be aware of motivational influences more broadly (Madan, 2017).
There has been broad interest in the functional link between the mesolimbic system and episodic memory formation. Activation of the mesolimbic reward system during encoding has been consistently shown to increase hippocampal consolidation. Early studies focused on reward-related activation of the mesolimbic system. A variety of factors related to motivation have been associated with this functional link, including value, reward anticipation (Adcock et al., 2006), active decision-making (Murty et al., 2018), and curiosity (Gruber, Gelman, & Ranganath, 2014; Marvin & Shohamy, 2016). In many situations and experimental designs several of these factors are likely to interact to influence memory encoding, which may contribute to discrepancies in findings within the literature.
We found evidence against an effect of reward uncertainty on memory for individual items. This supports findings from our recent study examining reward uncertainty in motivated learning (Mason et al., 2017a). We predicted that if reward uncertainty does influence episodic memory encoding, the effects would be larger during incidental learning, when the conditions of learning do not promote strategic learning. The evidence from that study and the current one supports the overall conclusion that reward uncertainty related to individual items does not enhance episodic memory performance. This finding is of interest in itself, but also in the context of a growing interest in the potential contribution of environmental risk to learning and memory (Diederen et al., 2017; Rouhani et al., 2018). Rouhani et al. (2018) present the first study to directly compare memory encoding under high and low risk reward environments and demonstrate a benefit of high-risk contexts for learning. These findings may explain why previous studies looking at uncertainty and learning in classrooms have supported the notion that uncertainty improves learning (Howard-Jones, Jay, Mason, & Jones, 2016; Ozcelik, Cagiltay, & Ozcelik, 2013). For example, Howard-Jones et al. (2016) demonstrated that learning through a quiz-based game in which rewards were delivered probabilistically, compared to completing multiple choice questions in return for a fixed number of points, led to better memory performance in a subsequent test. Overall, there appears to be growing support for the idea that environmental reward uncertainty promotes learning, which could be linked to increased arousal (Miendlarzewska, Bavelier, & Schwartz, 2016; Rouhani et al., 2018).
Similarly, there is evidence from the decision-making literature that memory may underpin risk-seeking behaviours (Madan, Ludvig, & Spetch, 2014). In these studies participants show better memory for extreme outcomes associated with a risky option, and presumably it is the expected value of the extreme that is driving the better memory and the risk-seeking behaviours. However, it would be interesting to see if our findings changed as a function of making risky choices. The current design asked participants to place bets on each trial where they could either win a small amount or not win. Future studies could examine memory when participants are required to place bets intermittently for both gains and losses.
Our findings suggest that there is not a necessary link between uncertainty and memory encoding. One explanation could be that we did not observe an effect of uncertainty on memory because our manipulation did not induce a sufficient state of uncertainty, and did not produce the assumed dopaminergic signal changes (we do not have a physiological measure of uncertainty). We adapted the behavioural task used by Preuschoff and colleagues (Preuschoff et al., 2006, 2011), who found clear evidence of a direct relationship between reward uncertainty and dopaminergic activity. Given that our task was very similar to that of Preuschoff and colleagues, there is little reason to think that we did not induce a state of uncertainty at encoding. It should also be recognised that despite our null finding, there are several potential mechanisms by which activity related to reward uncertainty could nonetheless promote memory encoding and consolidation. Shohamy and Adcock (2010) suggested that tonic dopamine associated with reward uncertainty may increase the number of disinhibited neurons, thereby increasing the likelihood that dopamine neurons would burst in response to individual events when there is high environmental uncertainty. It is plausible, and consistent with our results, that such a mechanism was at play during the experiment. However, our finding that the expected value of reward influences memory is most consistent with phasic activity of dopamine neurons enhancing hippocampal activity. We did not find evidence that prediction errors or surprise, usually associated with activity in the LC-NE system, enhanced memory performance (Clewett et al., 2017); however, it is likely that there are additional neurobiological mechanisms at play when learning occurs in complex reward-based environments.
The current study adds weight to several previous studies indicating that the relationship between reward and individual items in episodic memory is modulated by reward value (Mason et al., 2017a; Murayama & Kitagami, 2014; Wittmann et al., 2011). Our findings, in combination with previous studies, highlight that the precise relationship is sensitive to the reward cues and outcomes used in the experimental task. Nonetheless, there is clear evidence that reward uncertainty on individual trials does not improve memory and learning.
Data Accessibility Statement
The data is available on Open Science Framework https://osf.io/xkpfz/.
Funding Information
This research was supported by an Australian Research Council Discovery Project (DP160101752).
Competing Interests
The authors have no competing interests to declare.
Author Contributions
Contributed to conception and design: AM, AL, SF
Contributed to acquisition of data: AL
Contributed to analysis and interpretation of data: AM, AL, SF
Drafted and/or revised the article: AM, AL, SF
Approved the submitted version for publication: AM, AL, SF
Peer Review Comments
The author(s) of this paper chose the Open Review option, and the peer review comments are available at: http://doi.org/10.1525/collabra.217.pr