A variety of psychological effects have recently been replicated in studies where participants merely received information describing experimental tasks, whereas participants in the original studies actually experienced these tasks. We argue that these successful instruction-based replication studies raise challenging questions for contemporary psychological research: (1) What does psychological science tell us about effects beyond common knowledge? (2) Does performing the experienced version of the task add to the effect, how much so, and why? (3) Should the effect be considered an experimental demand artifact? Throughout the article, we discuss methodological challenges and solutions associated with these questions. We conclude that instruction-based replication studies offer opportunities for theoretical, methodological, and empirical development in psychological science.
Introduction
Participants taking part in psychological experiments typically experience procedures, the nature of which varies across conditions. For instance, in a typical approach-avoidance experiment, participants may be requested to physically approach or to physically avoid stimuli. At a later stage of the experiment, they may be asked to report how much they like these stimuli. If they report liking the physically approached stimuli better than the physically avoided ones, an approach-avoidance effect is established (e.g., Cacioppo et al., 1993).
In the present article, we discuss questions that arise when effects originally established in such procedures are successfully replicated in studies where participants are merely instructed about the procedures (rather than experiencing them). For example, would participants evaluate stimuli more positively if they were merely told that these stimuli would be approached rather than avoided? Recent evidence suggests that effects originally found in experienced procedures often replicate when these procedures are merely described to the participants; that is, in instruction-based replication studies (IBRs, for short).
Successful instruction-based replications of effects originally found in experienced procedures have been reported for a diverse range of phenomena. Merely providing participants with instructions about original procedures was enough to replicate the approach-avoidance effect (i.e., a preference for approached relative to avoided stimuli; Hütter & Genschow, 2020; Van Dessel et al., 2015, 2016, 2018), the mere exposure effect (i.e., higher liking as a result of repeated exposure; Van Dessel et al., 2017), the evaluative conditioning effect (i.e., a change in liking after pairing neutral stimuli with stimuli of affective valence; Béna, Mauclet, et al., 2022; Corneille et al., 2019; De Houwer, 2006; De Houwer & Hughes, 2016; Gast & De Houwer, 2012; Hütter & De Houwer, 2017; Kurdi & Banaji, 2017, 2019; Mattavelli et al., 2021; Moran et al., 2021), the truth effect (i.e., higher truth judgments as a result of repeated exposure; Mattavelli et al., 2022), and the rubber hand illusion (i.e., an enhanced feeling of ownership reported for a fake hand brushed in synchrony rather than in asynchrony with a participant’s real hand concealed from their view; Lush, 2020; Reader, 2022). To illustrate, in a recent instruction-based replication of the mere exposure effect, participants were merely instructed that two words would be presented to them either rarely (e.g., LOKANTA) or frequently (e.g., FEVKANI). Replicating the standard mere exposure effect, merely being instructed about (rather than experiencing) higher stimulus exposure increased stimulus liking (Van Dessel et al., 2017).
IBRs share two distinctive features. First, they provide participants with mere information about the original procedure, but they do not actually implement that procedure, so that participants do not experience the procedure that is described to them. Second, they examine whether effects originally found in their “experienced” counterpart replicate. This definition excludes most “vignette” and verbal suggestion studies; these studies also produce effects based on mere verbal information, but this information is generally not concerned with procedures used in previous studies.
IBRs are conceptual replication studies. The manipulation of the independent variable (IV) is necessarily altered from an experiential manipulation (e.g., repeated exposure) to a verbal one (e.g., stated repeated exposure). Given functional definitions of effects (a change in the dependent variable [DV] due to an IV), altering the IV precludes an exact replication of the effect. The measure, in contrast, can be identical to the original one. However, it may also depart from it, by asking participants either to predict or to simulate responses on the original DV. Here, we distinguish three cases: IBRs that rely on the original measure (IBR/same; only the IV is changed), IBRs that ask participants to predict responses on the original measure (IBR/predict; both the IV and the DV are changed), and IBRs that ask participants to simulate responses on the original measure (IBR/simulate; both the IV and the DV are changed; see Figure 1).1
Note. “+”: suggests; “++”: strongly suggests.
A classic example of an IBR in which both the IV and the DV were changed is Bem’s (1967) study. Bem argued that dissonance effects are driven by cold inferential cognition rather than by a motivation to reduce physiologically aversive states of cognitive dissonance elicited after performing an incongruent behavior. To test this alternative account, Bem (1967) carried out a conceptual replication of Festinger and Carlsmith’s (1959) study, in which participants were found to report more positive attitudes about an obnoxious task after agreeing, for a small rather than a large reward, to persuade another participant that the task was enjoyable. In Bem’s study, by contrast, participants were merely provided with information about the original study, and they had to predict how a participant in the original study would have completed the measure. Hence, the IV was changed from an experienced to an instruction-based manipulation, and the DV was changed from responses on an attitude scale to interpersonal predictions of responses on the same attitude scale.
Bem (1967) reasoned: “If this analysis of the findings is correct, then it should be possible to replicate the inverse functional relation between amount of compensation and the final attitude statement by actually letting an outside observer try to infer the attitude of (the subjects) in the original study. Conceptually, this replicates the Festinger-Carlsmith experiment (…)” (p. 188). It is worth noting that Bem’s understanding of “replication” is consistent with both our definition and contemporary definitions of this notion as “a study for which any outcome would be considered diagnostic evidence about a claim from prior research” (Nosek & Errington, 2020, p. 2). This understanding “reduces emphasis on operational characteristics of the study and increases emphasis on the interpretation of possible outcomes” (Nosek & Errington, 2020, p. 2). Because the replication was successful, Bem’s study challenged the dominant account of dissonance effects: experiencing dissonance and being motivated to reduce it were not necessary for producing the effect; drawing cold inferences was enough.
We argue here that successful IBRs raise challenging questions about (1) the extent to which psychological researchers know more about the effect of interest than the layperson, (2) the distinct contribution of task experience posited in prominent psychological theories, and (3) the role of demand effects in the studied phenomenon. We consider these questions in turn. In the concluding sections, we summarize the main questions discussed in this contribution and their theoretical and practical implications, and we call for more IBR research.
1. What Does Psychological Science Tell Us About Effects Beyond Common Knowledge?
When participants can simulate (i.e., IBR/simulate) or predict (i.e., IBR/predict) effects based on the mere description of a procedure, this suggests that they consciously hold causal knowledge (in the sense of beliefs relating the IV to the DV) about whether and in what direction procedures influence responses. For instance, if participants can predict that stimuli will be liked better after repeated exposure (i.e., the mere exposure effect), there is indication that they consciously hold causal knowledge relating repeated exposures to increased liking. When a successful replication is found in IBRs requesting participants to directly complete the original measure of interest (i.e., IBR/same), this also suggests that participants hold causal knowledge relating procedures to responses. In this case, however, there is no indication that this knowledge is consciously held or consciously used by the participant, nor is there indication that it is not (see Figure 1).
If participants hold knowledge of this sort, then this blurs the boundaries between “naïve” and scientific knowledge. This, in turn, raises disquieting questions. As Houston (1983) pointed out four decades ago in a study finding convergence between lay and scientific psychological knowledge: “If chemistry, biology, or physics suddenly evaporated, the effects would be immediate and dramatic. It would probably be centuries before the lost information could be regained. But what would happen if psychology suddenly disappeared? Would humanity suffer a severe lesion in stored information, or would the basic principles of psychology be safe in the collective knowledge of the lay population?” (p. 203; see also Houston, 1985).
How, then, do psychological and naïve knowledge differ? A first possibility is that, like researchers, participants hold accurate causal knowledge relating procedures to responses but that, unlike researchers, they remain unaware of the underlying mechanisms. After all, people have prior knowledge regarding their own and others’ behaviors in a wide range of contexts and may use it when it is cued by mere descriptions of situations. By analogy, most people can predict that a released stone will fall to the ground, although explaining exactly why is much more difficult. Psychological researchers, however, do not always agree on underlying processes, or their consensus may only be temporary. For instance, after decades of research on the evaluative conditioning effect and on the mere exposure effect (two effects successfully replicated in IBRs), there is still no scientific consensus on the mental mechanisms underlying these effects. When scientific consensus is lacking, participants’ knowledge may or may not be in line with the prevailing scientific explanations of the effects (if any).
A second possibility is that researchers, contrary to participants, hold special knowledge about factors that modulate the effect (i.e., the moderators of an effect). For instance, researchers may know under which conditions a larger time interval between exposure and judgment will increase the impact of repeating statements on their perceived truth, while this knowledge may not be part of common sense. However, one should not assume that participants necessarily lack such knowledge. Instead, participants’ knowledge of moderation effects should also be tested in IBRs (by providing information about procedures that include moderators). To our knowledge, such IBRs are currently lacking.
Finally, researchers may not always be clear about the magnitude or even the direction of causal relations. For instance, a recent study found large disagreements among happiness experts regarding how various policies may contribute to life satisfaction (Buettner et al., 2020). In such cases, participants’ and scientific knowledge cannot possibly converge, as the latter is ill-defined or contradictory.
Turning to opportunities, successful IBRs suggest that “naïve” knowledge may be a rich source of theoretical development for psychological science. First, one may examine how this knowledge was acquired. For instance, how can participants anticipate that physical approach will increase liking relative to physical avoidance? Do they rely on personal experience, on observational learning, or on scientific knowledge acquired through educational training? Second, successful IBRs suggest that researchers might consider directly asking naïve participants about causal relations involved in psychological phenomena to gain insights from them. Third, unsuccessful IBRs may also inform when, for which effects, and ultimately why participants get this psychological knowledge right or wrong (for a thorough discussion of the interplay between commonsense and scientific psychology, see Kelley, 1992).
This analysis calls for enhanced self-reflection and humility by highlighting that psychological science does not necessarily know much better than the layperson. It also points to unsuspected knowledge-building opportunities that could make use of lay psychological knowledge.
2. Does Performing the Experienced Version of the Task Add to the Effect, How Much So, and Why?
Experience-based procedures involve performing a task and processing verbal information communicated about it in the informed consent and task instructions. Therefore, whenever a successful IBR is achieved, one is left to wonder to what extent experiencing the manipulations contributed to the original effect over and above processing the verbal information provided in the experiment. Identifying this distinct contribution is theoretically important because prominent psychological theories typically assume that experiences arising during task performance add to the effect, and they often assume that they do so via a distinct set of processes.
For instance, task experiences are assumed to bring a distinct contribution to the following effects: the physical experience of approach/avoidance for embodiment accounts of the approach/avoidance effect (Nuel et al., 2022; Rougier et al., 2018), the experience of repetition-induced fluency for fluency accounts of the mere exposure effect (Leynes & Addante, 2016) and of the truth effect (Unkelbach et al., 2019; Unkelbach & Stahl, 2009), the integration of information coming from different sensory modalities for multisensory accounts of the rubber hand illusion (Ehrsson, 2020; Golaszewski et al., 2021; Kilteni et al., 2015; Tsakiris, 2010), and the experience of spatiotemporal pairings of stimuli for associative or dual-learning accounts of the evaluative conditioning effect (Gawronski et al., 2017).
Several general strategies may be used to assess the distinct contribution of task experiences. We identified four such strategies. The first two involve comparisons between instruction-based and experience-based procedures; the last two can be implemented within experience-based procedures only. As we argue below, each strategy comes with significant shortcomings.
Magnitude test: A first strategy consists of randomly assigning participants to instruction-based vs. experience-based versions of an experiment and comparing the magnitude of the effect established in these procedures (Forster et al., 2022; Kasran et al., 2022; Kurdi & Banaji, 2017; Lush, 2020; Mattavelli et al., 2022; Moran et al., 2021; Rougier et al., 2021; Van Dessel, De Houwer, et al., 2020). If a larger effect is observed in the experience-based procedure, this may suggest an additional contribution of performing the task (and, possibly, the existence of distinct processes underlying this additional contribution). A refined variation of this strategy consists of modeling and comparing estimates of the processes assumed to contribute to responses produced on the measure of interest (for instance, using multinomial processing tree modeling; Corneille et al., 2019; Hütter & De Houwer, 2017; Smith et al., 2020).
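To make the comparative logic concrete, the sketch below simulates a magnitude test in Python, assuming a hypothetical two (procedure: experience-based vs. instruction-based) by two (condition: approached vs. avoided stimuli) between-subjects design. All effect sizes, sample sizes, and variable names are illustrative assumptions rather than estimates from the studies cited above; the quantity of interest is the procedure-by-condition interaction, which estimates the additional contribution of actually performing the task.

```python
# Minimal sketch of a "magnitude test" (hypothetical numbers throughout):
# is the approach-avoidance effect larger when the procedure is
# experienced than when it is merely described?
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(seed=42)
n_per_cell = 200  # participants per cell of the 2 x 2 design

rows = []
for procedure in ("experience", "instruction"):
    for condition in ("approach", "avoid"):
        # Assumed true effects: approach raises liking by 0.50 SD in the
        # experience-based procedure, by 0.30 SD under mere instructions.
        effect = {"experience": 0.50, "instruction": 0.30}[procedure]
        mean = effect if condition == "approach" else 0.0
        for y in rng.normal(loc=mean, scale=1.0, size=n_per_cell):
            rows.append({"procedure": procedure,
                         "condition": condition,
                         "liking": y})

df = pd.DataFrame(rows)

# The procedure x condition interaction term estimates the *additional*
# contribution of experiencing (vs. being told about) the manipulation.
model = smf.ols("liking ~ C(procedure) * C(condition)", data=df).fit()
print(model.summary().tables[1])
```

A credible version of this test presupposes, of course, the structural fit discussed next.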
This comparative strategy, however, requires that information be held constant across the experienced vs. instruction-based versions of the procedures, so that these procedures vary only in that one is performed and the other is merely described. This structural fit (i.e., introducing a difference between tasks only on the dimension of interest) is often difficult to achieve. In particular, researchers running an IBR face the question of how detailed the description of the experienced procedure (including stimulus materials) should be. Structural fit in IBRs varies widely, ranging from IBRs that closely match their experience-based counterpart (e.g., Bem, 1967; Forster et al., 2022; Lush, 2020) to IBRs that match it only loosely.
In addition, outcomes may vary as a function of the procedural operationalization selected. For instance, instruction-based approach-avoidance effects have been robustly replicated in several procedures, except for one specific procedure crossing a new approach-avoidance manipulation with a new indirect measure of approach-avoidance effects (Rougier et al., 2021), and they have not been found to change evaluations about familiar social groups (Van Dessel, De Houwer, et al., 2020).
Functional test: A second strategy is to test functional similarities and differences between experience-based and instruction-based procedures. Based on the logic of functional dissociations, the rationale is the following: if effects observed in these procedures are driven by the same process(es), then they should be similarly moderated by the same (experimental or individual) factors (e.g., Hu et al., 2017; Kurdi & Banaji, 2019; Moran et al., 2021; Olsson & Phelps, 2007). For instance, both experience-based and instruction-based fear conditioning influence physiological reactions to a conditioned stimulus when it is presented supraliminally. However, only experience-based fear conditioning may influence physiological reactions to a subliminal stimulus (Olsson & Phelps, 2007). This dissociation suggests that distinct processes may be involved in verbal fear conditioning.
Unfortunately, this second comparative strategy faces the same limitations as the previous one, including the difficulty of equating verbal information across conditions (Wisniewski et al., 2022). It is also more challenging because appropriate tests of the higher-order interaction (i.e., whether the moderation of the effect is itself moderated by the experienced vs. instruction-based nature of the procedure) are likely to require large sample sizes to reach adequate statistical power (depending on the size of the higher-order interaction effect), as illustrated in the sketch below.
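The sample-size problem can be made concrete with a quick Monte Carlo power estimate. The sketch below assumes a hypothetical scenario in which a moderator shifts the effect by 0.4 standard deviations in the experience-based procedure but only by 0.2 in the instruction-based one, so that the higher-order interaction contrast equals 0.2 standard deviations; all numbers are illustrative assumptions, not estimates from published studies.

```python
# Monte Carlo power estimate for the higher-order interaction:
# does the moderator's effect differ between experience-based and
# instruction-based procedures? All effect sizes are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)

def interaction_p(n_per_cell, delta=0.2, sd=1.0):
    """One simulated 2 x 2 study; returns the p-value of the
    difference-in-differences (the higher-order interaction).
    For simplicity, sd is treated as known (a z-test)."""
    # Assumed moderator effect: 0.4 SD under experience, 0.2 SD under
    # instructions, so the interaction contrast equals delta = 0.2 SD.
    means = {"exp_mod": 0.4, "exp_ctrl": 0.0,
             "ins_mod": 0.4 - delta, "ins_ctrl": 0.0}
    cells = {k: rng.normal(m, sd, n_per_cell) for k, m in means.items()}
    contrast = (cells["exp_mod"].mean() - cells["exp_ctrl"].mean()
                - cells["ins_mod"].mean() + cells["ins_ctrl"].mean())
    se = sd * np.sqrt(4 / n_per_cell)  # SE of the four-cell contrast
    return 2 * stats.norm.sf(abs(contrast / se))

for n in (100, 250, 500, 1000):
    power = np.mean([interaction_p(n) < .05 for _ in range(2000)])
    print(f"n per cell = {n:4d}: estimated power = {power:.2f}")
```

Under these illustrative assumptions, even several hundred participants per cell leave the interaction test underpowered, which is why such functional tests are demanding in practice.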
Functional tests may also involve comparing the effects of instruction-based vs. experience-based procedures across measures (e.g., across more and less controllable measures of liking; Hu et al., 2017; Kurdi & Banaji, 2019; Moran et al., 2021). Here, the test of interest is whether differences observed across measures are constant across experience-based and instruction-based procedures. A sound interpretation of such studies, however, requires a structural fit in the measures in addition to a structural fit in the procedures: the two measures should be similar, except for the theoretical factor of interest (Payne et al., 2008; Van Dessel, Cone, et al., 2020). For instance, if the controllability of the measure is the theoretical factor of interest for interpreting a difference between experience-based and instruction-based procedures, then the two measures should differ in their degree of controllability only.
Unfortunately, structural fit can be difficult to achieve, and it is rarely implemented in dissociation tests comparing effects across measures outside, for instance, memory research (e.g., Merikle & Reingold, 1991; Schacter et al., 1989). When comparing performance between structurally equated tasks, conclusions may diverge from conclusions drawn from comparisons between tasks that vary on more than one dimension (Béna, Melnikoff, et al., 2022). Of note too, finding no difference in outcomes between unfitted tasks for experience-based vs. instruction-based manipulations would not support the operation of identical processes. This is because these procedures may result in the same outcome via different processes. When outcomes on unfitted measures are compared, the interpretational ambiguity can only be worsened, as it becomes virtually impossible to isolate contributing factors.
Deterioration test: Related to the functional test approach, a third strategy assumes that one can distinguish between two categories of processes, one of which is more automatic than the other. For instance, inferential processes drawing from symbolic information (e.g., drawing inferences about the meaning of approach and avoidance) have sometimes been conceptualized as non-automatic; in contrast, embodied/sensory/affective/impulsive/experiential/associative processes (e.g., activating sensory-motor representations while physically approaching/avoiding a stimulus) are generally conceptualized as automatic. If that is the case, (part of) the effect found in an experience-based procedure should remain under conditions thought to compromise the inferential processes but not the other one(s). For instance, when “automatic” refers to efficiency (i.e., relative independence from cognitive resources), the attentional burden of a concurrent task should disrupt “non-automatic” inferences but leave the other “automatic” process(es) unaffected.
Although intuitive, this strategy is problematic in at least two regards. First, processing may be compromised to such a degree that the original experience is fundamentally changed. Second, this strategy rests on the questionable assumption that inferences cannot be automatic (in our current example, that they cannot be efficient; see, e.g., De Houwer, 2018b; Reyna & Brainerd, 1995; Shimizu et al., 2017).
Reversal test: A fourth strategy is to experimentally induce opposite causal inferences within an experience-based procedure. If the effect fails to reverse completely when inferences are reversed, this suggests that inferences are not the only driver of the effect. However, this strategy requires successfully reversing participants’ causal inferences, which may not always be possible. For instance, a recent study could not persuade participants that asynchronous brushing would produce a larger rubber hand illusion than synchronous brushing (Lush et al., 2020). Likewise, researchers recently failed to manipulate expectancies about the efficacy of a clinical treatment (i.e., Eye Movement Desensitization and Reprocessing therapy; Mertens et al., 2021). Incidentally, this illustrates that sensitive and diagnostic manipulation checks can be important in experimental research (Fiedler et al., 2021). Had the manipulation checks not been collected, the absence of complete reversal might have been wrongly interpreted as evidence for the contribution of non-inferential processes.
It is important to stress that successful IBRs do not imply that “experiential” theories are wrong. However, they challenge these theories to bring evidence for the distinct contribution they posit. To make things even more challenging, this contribution may come in two forms: an independent contribution or an interactive one. We believe (1) that these contributions may largely vary across phenomena, (2) that, for each concerned phenomenon, it would be important to know better how large these contributions are and whether they are driven by distinct processes, and (3) that one reason we do not know better yet is that answering these questions is methodologically challenging (see above). This calls for more IBR research, and for methodological (e.g., designing structurally fitted procedures) and modeling (testable formal models such as multinomial processing tree models) advances.
Figure 1 summarizes the implications of the three categories of IBRs for our current question. In the case of IBR/same studies, the challenge to experiential theories is high because the DV is kept constant. In the case of IBR/predict and IBR/simulate studies, the challenge is weaker because the DV is altered. It is possible, for instance, that simulations or predictions involve processes that are not engaged in the responses produced in the original procedures. In all three cases, the challenge holds only insofar as the methodological approach is sound. As discussed here, IBRs can vary in this regard, and we suspect that any method will come with its own set of limitations. Using several methods, to build on their strengths and overcome their specific shortcomings, would help shed light on the questions just discussed.
3. Should the Effect Be Considered an Experimental Demand Artifact?
Successful IBRs suggest that participants can draw causal inferences relating procedural information to responses, and that these inferences may contribute to effects. Of note, participants may draw broader inferences in a testing situation, including inferences about their role in that situation. In particular, they may be motivated to produce responses consistent with their causal inferences (i.e., to produce the expected response) because doing so feels kind to the experimenter, or feels thoughtful, sensible, easy, interesting, or pleasant in the testing situation. Alternatively, participants may want to affirm their sense of autonomy by producing responses opposite to their causal inferences (e.g., reactant participants; Rosenberg & Siegel, 2018), or to compromise the experiment because doing so feels fun (e.g., survey trolls; Lopez & Hillygus, 2018).
Many IBRs (called quasi-controls; Orne, 2009) were conducted in the sixties to deal with this general issue of demand characteristics, which can elicit demand effects. In these studies, participants received detailed information about an experimental procedure and were asked to complete measures or simulate behaviors as if they were subjected to the actual experimental treatment. For instance, behavioral responses to hypnosis were interpreted at the time as a demand effect because participants could convincingly simulate hypnotic behavior based on procedural information received about hypnotic treatments.
Being aware of the hypothesis tested in a study and being able to produce behaviors consistent with it is not inherently problematic. Consider for instance the Stroop effect: it is conceivable that participants may anticipate that it is easier to name the color of a word appearing in blue when this word is “Blue” than “Red”. This hypothesis-awareness, however, would not rule out that the Stroop effect is produced by the operation of conflicting processes, nor would it imply that this effect is an experimental demand artifact.
Successful IBRs, however, raise the possibility that the effects of interest may be caused by demand characteristics. This likely arises when cues in the situation that are uncontrolled by the experimenter influence participants’ interpretation of what responses are expected from them in the testing situation, as well as their motivation to produce responses consistent with this interpretation.
Consider the case of evaluative conditioning research. To remind the reader, an evaluative conditioning effect is found when the evaluation of a conditioned stimulus changes after its pairing with an affective stimulus (e.g., a formerly neutral kanji will elicit more positive evaluations after being paired with an affectively positive rather than negative stimulus). From the perspective of a participant taking part in an evaluative conditioning procedure (be it experienced or instructed), it should not be too difficult to figure out that evaluations are expected, and which ones. Further, participants may be tempted to form impressions consistent with these expectations because producing them appears appropriate in the experimental setting.
For instance, in a study by Kurdi and Banaji (2017), “participants were informed that they would see pairings of names with pictures such that one target group (e.g., Laapians) would always be paired with pictures of pleasant things and the other target group (e.g., Niffians) would always be paired with pictures of unpleasant things (…)” (Kurdi & Banaji, 2017, p. 198). When receiving such instructions, participants may easily figure out that the experiment requires them to start liking Laapians more than Niffians. When they form evaluations of the stimuli in order to comply (or react) with this perceived requirement of the testing situation, a demand effect is created.
Because compliance is key in characterizing demand effects, it is important to understand how it can operate and how it may be controlled for. In a recent theoretical review on demand characteristics (Corneille & Lush, 2023), the authors identified three compliance strategies: faking (intentional and conscious, involving no genuine experience), imagination (intentional and conscious, involving a genuine experience), and phenomenological control (akin to hypnotic suggestion; intentional and unconscious, involving a genuine experience; Dienes, Lush, & Palfi, 2022; Dienes, Lush, Palfi, et al., 2022; Dienes & Lush, 2023). From that perspective, hypothesis-aware participants may produce either counterfeit or genuine responses because they want to produce them in the testing situation to meet its perceived requirements, sometimes without even being aware that this is their intention (Dienes, Lush, & Palfi, 2022).
The case of phenomenological control is probably the most intriguing. To illustrate, participants taking part in a rubber hand illusion study may infer that the synchronous brushing of the fake hand with their real hand (concealed from their view) is expected to produce the illusion that the rubber hand is their own. Once aware of this experimental hypothesis, they may further want to produce responses consistent with it (i.e., to experience the illusion that the rubber hand is their own) because they think this experience is called for in the testing circumstances and they want to comply with this understanding. However, there is a twist: because experiencing the illusion implies the phenomenological experience of producing it involuntarily, the illusion can be established only if participants inhibit their subjective experience of producing it voluntarily. When participants can achieve that, the illusion is intentionally produced to comply with the requirements of the situation, but without the accompanying experience of intentionality.
Phenomenological control, as a trait, is defined as the capacity to strategically experience perceptions, actions, or beliefs as non-intentional in order to meet the requirements of the situation. Individuals vary in their ability to play these meta-cognitive tricks on themselves, and this ability can be assessed with questionnaires of hypnotic suggestibility or instruments derived from them, such as the Phenomenological Control Scale (Lush, Scott, et al., 2021) or the Sussex-Waterloo Scale of Hypnotisability (Lush et al., 2018). To sum up, when participants can exert phenomenological control over their subjective experiences, they can comply with the experimental hypothesis by producing responses consistent with it without being aware that this was their intention. This particularly tricky form of compliance generates genuine experiences without awareness of intentionality (for a more extensive treatment of how phenomenological control relates to compliance and experimental demands, see Corneille & Lush, 2023).
Controlling for compliance with the experimental hypothesis, in its various forms, is critical to sound research. Such compliance is a major threat not just to the external validity of a study (i.e., will the effect generalize to other situations?) but also to its internal validity (i.e., is the interpretation of the effect correct?). For instance, participants may evaluate meaningless stimuli more positively when asked to approach rather than avoid them because they think this is the soundest behavior called for by the testing condition. If so, the effect of approach and avoidance on liking may not survive changes to situational cues that decrease participants’ compliance motivation or alter their interpretation of the behavior called for by the testing situation (an external validity issue). If the effect reflects compliance with the experimental hypothesis, this additionally suggests that it may be driven by mechanisms that differ from those assumed. As long as uncontrolled cues in the experimental situation instill goals in participants to produce the effect (or to be reactant to it), internal validity issues arise that contaminate the interpretation of the findings.
Different strategies may be considered for assessing or preventing compliance but, here too, they come with important caveats. Researchers may directly ask participants whether they complied. However, this approach would only identify strategies based on imagination: phenomenological control is supposed to be implemented unconsciously and may therefore go unreported, and, by definition, the responses of “faking” participants cannot be trusted. Researchers may also use measures less amenable to strategic control. This may work to some extent for faking, keeping in mind that both indirect behavioral measures (e.g., Cummins & De Houwer, 2021; Fiedler & Bluemke, 2005) and physiological measures (Damaser et al., 1963) can be faked. As for imagination and phenomenological control, they produce genuine responses, including physiological ones (Hägni et al., 2008); therefore, less controllable measures may not help (Corneille & Hütter, 2020; Corneille & Lush, 2023).
In the case of phenomenological control, we have seen above that it can be measured with individual difference questionnaires. Responses to phenomenological control questionnaires have recently been shown to predict a large diversity of effects, including the rubber hand illusion and mirror-touch synesthesia (Lush et al., 2020), as well as visually evoked auditory responses and the autonomous sensory meridian response (ASMR; Lush et al., 2022). Interestingly, they do not predict the Müller-Lyer illusion (Lush et al., 2022). It is less clear, however, how the tendency or ability to engage in faking and imagination as compliance strategies could be validly measured with individual difference questionnaires.
Finally, one may vary incentives for compliant behavior and examine how this influences responses (De Houwer, 2022). This is a sound strategy, provided the (dis)incentives (1) are strong enough to override competing goals, (2) do not paradoxically exacerbate compliance or reactance, and (3) do not disrupt effects for reasons alien to compliance. For instance, promising big money to participants for taking the task seriously may induce faking (if one wants to be nice to the experimenter in return for the money) or reactance (if one feels bribed); it may also elicit task-related inferences that would otherwise not be drawn; and it may increase attention to the study materials.
More research is needed on compliance with experimental demand, and we are pleased to see a renewal of interest in this “scarecrow” (Coles et al., 2022; Corneille & Lush, 2023; Doyen et al., 2012; Forster et al., 2022; Klein et al., 2012; Lush, 2020; Lush et al., 2021; Roseboom & Lush, 2022; Sharpe & Whelton, 2016). The lively debate currently taking place in research on bodily illusions illustrates the theoretical and methodological advances that research around these questions can generate (Chancel et al., 2022; Corneille & Lush, 2023; Ehrsson et al., 2022; Lush et al., 2020; Lush, Seth, et al., 2021; Lush & Seth, 2022; Reader, 2022; Roseboom & Lush, 2022; Slater & Ehrsson, 2022). Psychological reactance should not be overlooked either. For instance, recent research on the truth effect indicates that a sizeable portion of participants do not show the effect, and sometimes even show a reversed effect (i.e., lower truth judgments for repeated statements; e.g., Henderson et al., 2021; Schnuerch et al., 2021). This may signal measurement error. But it may also indicate that some participants drew reverse inferences, or were reactant to the veridical inferences they drew.
4. Looking Forward
IBRs raise outstanding questions for contemporary psychological research (for a summary, see Table 1). In this article, we asked: (1) To what extent does psychological knowledge go beyond common knowledge? (2) Does experiencing a task add to the effect of verbal instructions and, if so, is it via distinct processes? (3) Are researchers studying phenomena artifactually inflated (or weakened) by participants’ compliance with (or reactance to) the perceived experimental hypothesis? We believe these questions represent opportunities for theoretical, methodological, and empirical development in psychological science.
Table 1. Summary of the main questions raised by successful IBRs.
- What does psychological science tell us about effects beyond common knowledge?
- Does performing the experienced version of the task add to the effect, how much so, and why?
- Should the effect be considered an experimental demand artifact?
Beyond advancing psychological theories and methods, IBRs can also inform practice. Whenever the effect of an experience-based (e.g., clinical) treatment is replicated based on mere verbal information, this points to a possible beneficial role of verbal suggestion. For instance, research suggests that fears acquired through Pavlovian conditioning can be reversed through verbal instruction about contingencies (Mertens et al., 2018; Mertens & De Houwer, 2016). Likewise, social impressions thought to reflect deeply rooted mental associations can be quickly changed based on mere verbal information (Cone et al., 2021; Cone & Calanchini, 2021; Kurdi et al., 2022; Van Dessel, Ye, et al., 2019).
IBR research highlights the powerful role of causal inferences and expectancies in how people interpret their environment, including the perception of their self. In doing so, it connects to a long tradition of research on placebo effects (De Houwer, 2018a; Kirsch, 2018), hypnosis (Lynn et al., 2019), therapeutic compliance (Kanter et al., 2002, 2004), instruction-based learning (Kang et al., 2022), and contemporary research on predictive processing (Chancel et al., 2022; Clark, 2016; Martin & Pacherie, 2019; Van Dessel, Hughes, et al., 2019). IBR research may help identify cases where nondeceptive placebo treatments are effective. In nondeceptive placebo studies, patients receive verbal information about placebo effects and are explicitly told that they will undergo a placebo treatment (for a recent discussion, see Colloca & Howick, 2018). These treatments alleviate ethical concerns inherent in placebo treatments, and their effectiveness has been supported in studies relying on self-reports and at least one neurophysiological measure (Guevarra et al., 2020). Imaginative suggestion has also long been used in clinical treatment (Pintar & Lynn, 2008). Comparing effect sizes between instruction-based and experience-based treatments may help identify when the latter should be preferred over the former.
5. Concluding Remarks
Successful IBRs have been documented for a variety of effects, and they raise outstanding questions. However, we currently have little systematic information about which experience-based effects they do and do not replicate, and about whether replication outcomes depend on the specific instantiation of the IBR (e.g., simulation, prediction, or completion of the original measures; prediction for self or for others; direct or indirect, including physiological, measures). In addition, among documented replications, information is scarce about how effects compare between experience-based and instruction-based procedures, and this comparison is challenging. How compliance effects can best be addressed is also a thorny issue. This calls for more IBR research, and it is our hope that the ideas discussed here help promote it.
Author Contributions
Olivier Corneille: Conceptualization, Writing – original draft, Project administration, Writing – review and editing
Jérémy Béna: Conceptualization, Writing – review and editing
Competing Interests
The authors have no competing interests to declare.
Acknowledgments
Preparation of this article was supported by an FRS-FNRS grant (T.0061.18) awarded to Olivier Corneille, and by an FSR Incoming Postdoctoral Fellowship [IPSY FSR22 MOVE] awarded to Jérémy Béna. We are grateful for the comments we received from Tal Moran, Peter Lush, Patrice Terrier, Christoph Stahl, Marine Rougier, Florence Stinglhamber, and Christian Unkelbach on a previous version of this manuscript, and for extensive email exchanges with Jan De Houwer. We thank Adam Hahn for giving us the impetus to write this article.
Footnotes
The measure used in an experience-based study may itself be a prediction or a simulation measure. In that case, one may conceive of an IBR that provides participants with verbal instructions about that procedure and asks them to predict or simulate these predictions or simulations. These cases are not discussed here, as we are unaware of successful IBRs that carried out such a test.