Phenomenological control is the ability to alter experience according to goals, measured by responses to imaginative suggestion. The capacity to create vivid mental imagery has been thought to contribute to generation of experiences following suggestions during hypnosis, but evidence is mixed. Moreover, no study has yet investigated this link outside of hypnosis. Across two studies (n = 639 and n = 66), we tested the relationship between imagery vividness and both non-hypnotic and hypnotic imaginative suggestion scales. Our samples included, for the first time, people lacking voluntary imagery (aphantasics). We also assessed this relationship following commonly used approaches to studying aphantasia (direct recruitment of self-reported aphantasics), versus those designed to minimize demand characteristics effects. We observed a weak positive relationship between imagery vividness and phenomenological control, with aphantasics being slightly lower in this trait than non-aphantasics. However, this effect was exaggerated among self-reported aphantasics, who scored lower on phenomenological control than samples experiencing imagery as well as aphantasics assessed via single-blind procedure. Our results have been certified as computationally reproducible by an independent statistician. We conclude that, while imagery may contribute to responding to imaginative suggestions, it is crucial to follow single-blind recruitment to reduce the effect of demand characteristics.
Introduction
Within the general population, there is a great variation in the ability to alter conscious experience and behaviour to meet goals and expectations. People are, to various degrees, capable of generating experiences in response to imaginative suggestions (e.g., in hypnosis), such as perceiving something imagined as real, voluntary movement as involuntary (Polito et al., 2013), or feeling pretence as beliefs (Nash & Barnier, 2008). We refer to individual differences in this ability as trait phenomenological control (Lush, Scott, Seth, et al., 2021). The term phenomenological control was introduced to avoid potential sources of confusion and to better reflect scientific consensus than existing terms (Dienes et al., 2022; Lush, Scott, Seth, et al., 2021). First, hypnosis involves a specific context which is associated with many potentially misleading myths (Lynn et al., 2020) and refers to sleep, which is not related to phenomenological control (e.g., Banyai & Hilgard, 1976). Second, suggestibility can be confused with other concepts which share this label, but which show little relation to phenomenological control (e.g., social compliance; Coe et al., 1973; Moore, 1964). Misconceptions arising from these terms present challenges when communicating ideas about these phenomena. See Dienes et al. (2022), Lush, Dienes & Seth (2023) and Lush et al. (2021) for detailed discussion of phenomenological control and hypnosis. Note that control here refers to the ability to change one’s experience, rather than to resist suggestions (contrary to popular belief, response to imaginative suggestion does not involve a loss of control; Lynn et al., 2020; Spanos et al., 1985).
Trait phenomenological control can be measured by response to a series of direct verbal imaginative suggestions (e.g., Lush, Scott, Seth, et al., 2021; Oakley et al., 2021). Some direct verbal imaginative suggestions invite participants to imagine a counterfactual situation (e.g., that there is a magnetic force positioned between one’s outstretched hands) and suggest something that will happen as a consequence, rather than require an intentional response (e.g., that one’s hands will be drawn together, as if by a magnetic force). Other imaginative suggestions contain no direct appeal to imagine (e.g., that a hand feels too heavy to move, or that the participant cannot remember anything). Although here we focus on responses to direct imaginative suggestions, phenomenological control can also occur for indirect and non-verbal, implicit suggestions, such as in ‘mesmerism’ (a historical precursor to hypnosis; Gauld, 1992), or for beliefs arising from cues surrounding experimental situations (demand characteristics – Corneille & Lush, 2023; Orne, 1962).
Historically, phenomenological control has been mostly investigated within the hypnotic context, during a procedure that involves a hypnotic induction (commonly relaxation and suggestions to gradually enter “a state” of hypnosis; Woody & Barnier, 2008) and followed by a series of direct verbal suggestions. Hypnosis can be considered just one particular context for phenomenological control, in which the focus is on direct, verbal suggestion (Dienes et al., 2022). In the hypnotic context, individual differences in response to imaginative suggestion are referred to as hypnotic suggestibility or hypnotizability (Hilgard, 1965). Investigation of this trait within experimental settings commonly employs standardized scales to identify people who are highly and lowly hypnotizable. Hypnotizability scales typically include a set of imaginative suggestions for a variety of experiences, including changes in sensory experience (e.g., auditory or tactile hallucinations), experiences of apparently involuntary movement or paralysis, and cognitive effects (e.g., amnesia). Research using these scales showed that this trait has high stability over long periods (e.g., 25 years; Piccione et al., 1989). People who score high on these scales can experience visual hallucinations following hypnotic suggestions (Spanos et al., 1973) or even “unsee” a stimulus when it is presented after receiving the suggestion that there is nothing in front of them (negative visual hallucination - Spiegel, 2003). These reported experiences are not considered merely faking (Lynn et al., 2020), and neuroscientific evidence showed that they produce the physiological and neural changes that typically accompany genuine experience (Kosslyn et al., 2000; McGeown et al., 2012; Oakley & Halligan, 2013).
Although historically most investigations have focused on hypnosis, responses to direct imaginative suggestion do not require a special state and can occur outside the hypnotic context (e.g., when there is no ‘hypnotic induction’; Barber & Glass, 1962; Braffman & Kirsch, 1999; Hull, 1933). The recently developed Phenomenological Control Scale (PCS; Lush, Scott, Seth, et al., 2021) measures response to imaginative suggestion outside the hypnotic context (Lush, Scott, Seth, et al., 2021; see Oakley et al., 2021 for other recent work on non-hypnotic suggestion). Both PCS and hypnotizability scale scores predict reports of experience in psychological experiments when demand characteristics are not sufficiently controlled for, suggesting that phenomenological control may be a mechanism by which the reported experiences arise (e.g., autonomous sensory meridian response, visually-evoked auditory response, mirror-touch synaesthesia, vicarious pain, and the rubber hand illusion, see Lush, 2020; Lush et al., 2020, 2022).
Another ability that exerts top-down influences over our experience of the world (Dijkstra et al., 2019, 2021; Pearson et al., 2008; Perky, 1910; Segal & Nathan, 1964) is mental imagery, the capacity to generate and manipulate mental percepts without sensory stimulation (Kosslyn et al., 2001). Unlike experiences generated by responding to hypnotic suggestions, mental images are usually perceived as voluntary (although their onset can be involuntary) and not real. A wide variation in the ability to generate mental images has been found in both non-clinical (Galton, 1880; Marks, 1973), and clinical populations (Pearson et al., 2015; Sack et al., 2005). For some people, mental images are experienced as vividly as visual perception, a condition known as hyperphantasia (Zeman et al., 2015). In contrast, recently it has been shown that a small percentage of the population (3.9%, Dance et al., 2022) reports being unable to form any mental image at all, or only imagery that is markedly dim, vague or fleeting. This condition has been named “aphantasia” (Zeman et al., 2015). Notably, although aphantasics are usually assessed based on the scores of a visual imagery questionnaire (Vividness of Visual Imagery Questionnaire – VVIQ; Marks, 1973), they generally have low imagery in at least one other sensory modality (97% out of 164 in Dance, Ward, et al., 2021).
Imagery has long been considered to play an important role in generating responses to suggestions within the hypnotic context (Kunzendorf et al., 1996; Spanos, 1990). Suggestion-based phenomena are widely referred to as imaginative (Kihlstrom, 2008), as they often (though not always) explicitly involve asking participants to imagine counterfactual states as if they were true. For instance, item 2 of the Stanford Hypnotic Susceptibility Scale, Form C (Weitzenhoffer & Hilgard, 1962) reads: “Imagine a force acting on your hands to push them apart, as though one hand were repelling the other”, while item 4 reads: “Now I want you to think of something sweet in your mouth. Imagine that you have something sweet tasting in your mouth, like a little sugar…”. Such “goal-directed fantasies” invite participants to imagine the suggested state to experience a suggested effect (Spanos & Barber, 1972).
Based on this, several studies investigated whether an individual’s ability to generate vivid mental imagery is associated with higher levels of hypnotizability (Kunzendorf et al., 1996). However, the available evidence is mixed: while some studies reported hypnotizability to be positively associated with greater vividness (Marucci & Meo, 2000) and control of visual imagery (Coe et al., 1980), imaginative involvement (Glisky et al., 1991), imagery of scenes (Farthing et al., 1983) and superior performance on visual imagery tasks (Sheehan & Robertson, 1996), others reported no significant relationship (e.g., Glisky et al., 1995; Hargadon et al., 1995; Kogon et al., 1998; Perry, 1973). There is also evidence that the utilisation of imagery strategies is not required to generate experiences in response to imaginative suggestions. For instance, Comey & Kirsch (1999) found that removing goal-directed imagery from an imaginative suggestion script increased response. Zamansky & Clark (1986) showed that participants could still respond to hypnotic suggestion while sustaining conflicting imagery, and Hargadon et al. (1995) reported that the intentional use of counter-pain imagery during hypnotic analgesia did not enhance hypnotic responding, compared to its proscription.
The picture is complicated by methodological limitations in the available studies. In particular, those correlating various imagery measures and hypnotic suggestibility were either characterized by small sample sizes (e.g., N = 28 in Marucci & Meo, 2000), which could have led to spurious correlations (Anderson et al., 2001), or collected measures of imagery and hypnotizability together (e.g., Crawford & Allen, 1983; Glisky et al., 1995; Marucci & Meo, 2000; Spanos et al., 1988). Indeed, when scales of hypnotizability are administered alongside other self-report measures, correlations tend to be high (Council et al., 1986), whereas when the measures are assessed in separate testing contexts, correlations tend to be reduced to near zero (Green et al., 1991). This is an example of a context effect, which refers to the reactive effects of measurements made in the same experimental situation (Tourangeau & Rasinski, 1988). As described in a review by Laurence et al. (2008), context effects represent a variation of demand characteristics (Orne, 1962): cues in and surrounding the experimental situation which influence participants’ beliefs about experimental aims. When participants complete questionnaires following a hypnosis screening procedure, they may understand that the researchers expect these questionnaires to be related to their hypnotic performance. This knowledge may influence their responses (Laurence et al., 2008). Such effects have been shown for several measures of constructs historically believed to be highly related to hypnotic response: for instance, absorption (Council et al., 1986; Milling et al., 2000; Oakman et al., 1996), the Perceptual Alteration Scale (PAS; Green et al., 1991), Dissociative Experience Scale (DES; Silva & Kirsch, 1992), and imagery scales (Kogon et al., 1998, but note the small sample). 
Minimizing demand characteristic effects by collecting measures separately is therefore essential when assessing the relationship between hypnotic suggestibility and other traits (Laurence et al., 2008).
Beyond these methodological considerations, the present work aims to address two important gaps in the literature. First, to our knowledge, the role of imagery abilities in response to imaginative suggestions outside the hypnotic context (as in the PCS) has not been tested yet. The relationship between these two traits may be stronger than between imagery and hypnotizability, considering that the PCS does not involve a hypnotic induction (which may cause reactance in some participants, Lush, Scott, Seth, et al., 2021) and emphasizes the use of imagination to generate experiences (which might encourage the voluntary use of imagery abilities). Hence, if we consider responses to suggestions as imaginative phenomena, mental imagery could be involved in generating responses or act as a moderator that could increase the vividness and perception-like features of changes in experiences. Consequently, people who report experiencing very vivid imagery should be characterized by higher phenomenological control abilities than those with average or dim imagery.
Second, no study has yet addressed whether this trait can emerge in aphantasia. Indeed, given their apparent inability to evoke mental images voluntarily, aphantasics provide a direct way of testing if imagery is necessary for generating experiences in response to imaginative suggestions. If imagery is necessary, they should display lower levels of phenomenological control compared to non-aphantasics.
To address these gaps, in a first study we investigated the relationship between individual differences in trait vividness of imagery and phenomenological control in and out of the hypnotic context by collecting measures of interest separately in a large sample of participants. We also compared phenomenological control scores between people scoring in the VVIQ aphantasia range and people who reported experiencing mental imagery. In a second study, we sought to address the impact of demand characteristics on responses in both hypnotic and non-hypnotic imaginative suggestion scales by directly recruiting a sample of self-reported aphantasics and comparing their scores with those of aphantasics and non-aphantasics tested by means of a single-blind procedure.
Study 1
Study 1 investigated the relationship between trait vividness of imagery and the ability to alter conscious experience in and out of the hypnotic context while controlling for context effects. To measure trait differences in the ability to alter conscious experience, we used the Sussex-Waterloo Scale of Hypnotizability (SWASH; Lush et al., 2018) and the PCS (Lush, Scott, Seth, et al., 2021), an adaptation of the SWASH which neither involves a hypnotic induction nor contains any reference to hypnosis. To assess individual differences in mental imagery, we employed the VVIQ (Marks, 1973), in which participants rate the subjective vividness of voluntarily generated mental images for a set of different real-life scenarios. The VVIQ is the most widely used measure for identifying imagery extremes (aphantasia and hyperphantasia, e.g., Dance, Jaquiery, et al., 2021; Dance, Ward, et al., 2021; Liu & Bartolomeo, 2023; Pounder et al., 2022; Zeman et al., 2020). Although the VVIQ is focused on visual imagery vividness, recent studies reported that scores on this measure reflect an overall imagery ability that encompasses other sensory modalities (e.g., motor, auditory, see Floridou et al., 2022).
We measured these traits in two separate cohorts of psychology students at the University of Sussex (one to assess the SWASH/VVIQ relationship and the other the PCS/VVIQ relationship). To minimize context effects, our measures of interest were collected in separate studies, and neither the study invitation nor the instructions mentioned the other measure being investigated, making our participants blind to the experimental goal. Participants’ VVIQ scores were subsequently matched to scores on a pre-existing PCS and SWASH database. We predicted that if responses to imaginative suggestions in and out of the hypnotic context were driven by the ability to generate sensory imagery, these measures should be positively related.
Finally, we compared trait phenomenological control in aphantasic participants (identified by a common criterion in the imagery literature: mean VVIQ score equal to or lower than 2, e.g., in Dance et al., 2022; Dance, Jaquiery, et al., 2021; Kay et al., 2022; Monzel et al., 2021) and the rest of the sample (non-aphantasics). We expected that, in light of their markedly impaired imagery, aphantasic participants would display lower levels of phenomenological control than non-aphantasics.
Methods
Participants
For the SWASH and VVIQ relationship, we matched the scores of two databases (one consisting solely of VVIQ scores and one solely of SWASH scores) that were collected from the same pool of Psychology students at the University of Sussex (n = 131, age 18-27, Mage = 19.09, SDage = 1.28, 97 females, 34 males). Similarly, for the VVIQ and PCS relationship, we combined databases generated from another pool of Psychology students at the University of Sussex, one consisting of VVIQ scores and the other consisting of PCS scores (n = 508, age 18-29, Mage = 19.48, SDage = 1.71, 418 females, 67 males, 23 prefer not to say). In both cases, we matched scores from the two databases by using the email address participants provided. We retained this identifying information until data collection was complete.
In both samples, the PCS or SWASH database was collected during an undergraduate Psychology laboratory practical session (two different first-year cohorts), whilst the database of VVIQ scores was collected online by recruiting participants via the Psychology Human Participant Pool System (SONA) of the University of Sussex. The VVIQ and PCS/SWASH databases were collected by different researchers belonging to different labs over three term periods. Our final N for PCS and SWASH was determined by the maximum number of participants available in each database and by how many participants we could match with the VVIQ database within the three-term period. In particular, the final sample size for SWASH was limited by one of the two research groups interrupting data collection for this measure. VVIQ data collection started at the end of October, after the PCS/SWASH measures, which were collected early in the academic year (mid-October). We tested for sensitivity at a conventional criterion (Bayes Factor > 3). Approval was received from the University of Sussex Sciences & Technology Cross-Schools Research Ethics Committee (ER/GC337/1, ER/GC337/22) and participants gave informed consent for the study.
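The matching step described above can be illustrated with a minimal Python sketch. This is not the authors' code (their analyses were run in R); the function name, the example addresses and scores, and the whitespace/case normalisation of email addresses are all assumptions made for illustration.

```python
def match_by_email(vviq_by_email, scale_by_email):
    """Pair VVIQ and PCS/SWASH scores for participants present in both databases.

    Both arguments map an email address to a score. Addresses are normalised
    (trimmed, lower-cased) before matching; this normalisation is an assumption,
    not something the paper reports. Only score pairs are returned, so the
    identifying email can be discarded afterwards.
    """
    def norm(email):
        return email.strip().lower()

    vviq = {norm(e): s for e, s in vviq_by_email.items()}
    scale = {norm(e): s for e, s in scale_by_email.items()}
    shared = sorted(vviq.keys() & scale.keys())
    return [(vviq[e], scale[e]) for e in shared]


# Hypothetical records, invented purely for illustration.
vviq_db = {"A@example.org ": 2.5, "b@example.org": 3.0}
pcs_db = {"a@example.org": 1.8, "c@example.org": 0.9}
matched = match_by_email(vviq_db, pcs_db)
```

Participants who appear in only one database are dropped, which is why the matched sample can be smaller than either source database.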
Materials
Sussex-Waterloo scale of hypnotisability (SWASH; Lush et al., 2018). The SWASH (Lush et al., 2018) consists of ten imaginative suggestions for a range of experiences (including auditory and gustatory hallucination, amnesia and paralysis). As described by Lush et al. (2018), SWASH establishes a hypnotic context via an initial induction in which participants are asked to relax and gradually enter a state of hypnosis. The SWASH was administered by computer, with the induction and suggestions scripts delivered by audio recording, as in Lush et al. (2021). Participants rate each of the 10 items on a dichotomous objective scale and a subjective scale ranging between 0 (denoting absence of experience) and 5 (strong presence or alteration of experience). For example, for Item 2 of the subjective scale, “moving hands together” the following instruction is given for the subjective response: “On a scale from 0 to 5, how strongly did you feel a force between your hands, where 0 means you felt no force at all and 5 means you felt a force so strong it was as if your hands were real magnets?” (Lush et al., 2018). We calculated the SWASH score from mean response to the ten SWASH items of the subjective scale, and the resulting score could range from 0 to 5. We used only scores from the items of the subjective scale in our study.
Phenomenological Control Scale (PCS; Lush, Scott, Seth, et al., 2021). The PCS (Lush, Scott, Seth, et al., 2021) is an adaptation of the SWASH (Lush et al., 2018) with all references to hypnosis removed. It is otherwise identical to the SWASH and consists of the same ten items with experience reported on a subjective scale ranging from 0 to 5 and a dichotomous objective scale. As with SWASH, the objective scale of the PCS was not used in this study. The PCS score was calculated from mean response to the ten PCS items of the subjective scale (0-5 scale).
Vividness of Visual Imagery Questionnaire (VVIQ; Marks, 1973). In this questionnaire, participants are asked to generate visual images for 16 different scenarios and rate their vividness on a Likert scale from 1 (“No image at all, you only ‘know’ that you are thinking of the object”) to 5 (“Perfectly clear and as vivid as normal vision”). The questionnaire was scored by calculating the mean of the 16 scenario scores, resulting in a score ranging from 1 to 5, with higher scores indicating more vivid self-reported imagery. To facilitate interpretation of the intercept, we rescaled the VVIQ to range from 0 to 4 (instead of 1 to 5) across all our studies.
Since researchers in the field of mental imagery typically report VVIQ using total scores (sum of all items), we calculated the total score in our sample for comparison with other studies. The mean total scores of the VVIQ sample matched with PCS (M = 55.7) and of the VVIQ sample matched with SWASH (M = 53.73) are comparable to those reported in other studies in the field (e.g., Milton et al., 2021 report M = 56.65; Liu & Bartolomeo, 2023 report M = 57.38; Zeman et al., 2015 report M = 57.92). While alpha is less frequently reported, Cronbach’s alpha of 0.95 for the VVIQ sample matched with PCS and of 0.91 for the VVIQ sample matched with SWASH is also consistent with other studies that reported alpha (e.g., Floridou et al., 2022, alpha of 0.93; Burton & Fogarty, 2003, alpha of .95; Nelis et al., 2014, alpha 0.89).
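The three VVIQ scorings used here (item mean, rescaled mean, and total) are linear transformations of one another, which the following Python sketch makes explicit. It is an illustrative sketch only: the function name and the returned field names are assumptions, and the aphantasia flag applies the cut-off used in this paper (item mean of 2 or lower, i.e. rescaled mean of 1 or lower).

```python
def score_vviq(item_ratings):
    """Score one participant's 16 VVIQ ratings (each an integer from 1 to 5)."""
    if len(item_ratings) != 16:
        raise ValueError("the VVIQ has 16 items")
    if any(r not in (1, 2, 3, 4, 5) for r in item_ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    total = sum(item_ratings)          # range 16 to 80
    mean = total / 16                  # range 1 to 5
    return {
        "total": total,
        "mean": mean,
        "rescaled": mean - 1,          # range 0 to 4, as used in the analyses
        # Cut-off checked on the integer total (32 = 16 items x rating 2)
        # to avoid any floating-point comparison.
        "aphantasia_range": total <= 32,
    }


borderline = score_vviq([2] * 16)      # exactly at the cut-off
typical = score_vviq([3] * 16)
```

Because total = 16 × (rescaled + 1), the rescaled sample means reported in the Results (2.48 and 2.36) correspond to totals of roughly 55.7 and 53.8, in line with the totals quoted above.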
Analyses
All results reported in this paper have been reproduced by an independent statistician from the University of Sussex, and their report can be viewed here (OSF link: https://osf.io/ke4n7). The data and code used to produce all results and figures are available at: https://osf.io/aeyhw/.
Analyses were carried out in R version 4.2.2 (R Core Team, 2022). Bayes Factors and robustness regions were calculated with the R package “bfrr” (available at https://github.com/debruine/bfrr; DeBruine & Dienes, 2020). As no aspects of the study were preregistered, this study is exploratory. Our analyses followed the approach reported by Lush et al. (2020). We tested the relationship between VVIQ and SWASH or PCS by using a simple linear regression of mean scores. We calculated Bayes Factors to assess the strength of evidence for the alternative hypothesis (H1) relative to the null hypothesis (H0; Dienes, 2014; Wagenmakers et al., 2018). A Bayes Factor greater than 3 is considered to indicate moderate evidence for H1 over H0, whereas a Bayes Factor less than 1/3 indicates moderate evidence for H0 over H1. A Bayes Factor between 1/3 and 3 indicates that the data do not sensitively distinguish between H0 and H1 (Dienes, 2014). We note that although we also calculated and reported p-values, they have no bearing on the inferential conclusions.
A Bayes Factor for the raw slope of the PCS or SWASH score regressed onto VVIQ score was modelled by considering what the maximum slope could be, given the VVIQ mean, and the SWASH/PCS mean, following the ratio-of-means heuristic described in Dienes (2019). A plausible maximum slope would obtain if people scoring 0 on the VVIQ also scored 0 on the SWASH/PCS; i.e. a plausible maximum slope can be estimated as the line from (0, 0) to (mean VVIQ, mean SWASH/PCS). The maximum slope was estimated as 0.68 SWASH units per VVIQ Likert unit. H1 was therefore modelled using a half-normal distribution with SD equal to 0.34 SWASH Likert units per VVIQ Likert units, half the maximum slope of 0.68. With respect to PCS, the maximum slope was estimated as 0.74 PCS units per VVIQ Likert unit. H1 was modelled using a half-normal distribution with an SD equal to 0.37 PCS Likert units per VVIQ Likert units, half the maximum slope of 0.74.
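To make the modelling of H1 concrete, the sketch below computes a Bayes Factor for a half-normal H1 against a point null. It is a Python illustration under stated assumptions, not the bfrr implementation: the likelihood is taken to be normal around the estimate with the reported standard error, and the inputs are the rounded slopes and standard errors from the Results. With rounded inputs and this likelihood, the output will not match the reported Bayes Factors exactly, although it should fall on the same side of the 1/3 and 3 thresholds.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def bf_half_normal(estimate, se, prior_sd):
    """Bayes Factor for H1: effect ~ half-normal(0, prior_sd) versus H0: effect = 0.

    Assumes a normal likelihood for the estimate (an assumption of this sketch).
    The marginal likelihood under H1 has a closed form for this prior.
    """
    total_var = se ** 2 + prior_sd ** 2
    post_mean = estimate * prior_sd ** 2 / total_var
    post_sd = math.sqrt(se ** 2 * prior_sd ** 2 / total_var)
    marginal_h1 = (2 * normal_pdf(estimate, 0, math.sqrt(total_var))
                   * normal_cdf(post_mean / post_sd))
    marginal_h0 = normal_pdf(estimate, 0, se)
    return marginal_h1 / marginal_h0


# Ratio-of-means heuristic for PCS: maximum slope = mean PCS / mean rescaled VVIQ.
pcs_prior_sd = round((1.84 / 2.48) / 2, 2)        # half the maximum slope

bf_swash = bf_half_normal(0.05, 0.11, 0.34)       # rounded SWASH slope and SE
bf_pcs = bf_half_normal(0.14, 0.04, pcs_prior_sd) # rounded PCS slope and SE
```

The half-normal form encodes a directional prediction (a positive slope) in which small effects are more plausible than large ones, with the plausible maximum lying about two prior SDs from zero.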
We tested for the difference in the strength of correlations between VVIQ and PCS, VVIQ and SWASH using Fisher’s z (1921, p. 26) and 95% CI using Zou’s confidence interval (2007) as implemented in the package cocor in R (Diedenhofen & Musch, 2015). A Bayes Factor for this test was calculated using a half-normal distribution, with an SD of .12, corresponding to the maximum difference estimated from the 90% CI upper correlation limit reported by a previous paper which tested the relationship between VVIQ and HGSHS:A in a large sample of n = 722 (r = .08, 90% CI [0.02, 0.14]; Glisky et al., 1995).
Finally, because in our PCS sample we had 30 participants scoring within the aphantasia range (rescaled VVIQ score lower than or equal to 1, corresponding to the cut-off of a mean VVIQ score lower than or equal to 2), we were able to address the question of whether, due to their lack of imagery, aphantasics’ scores are lower than non-aphantasics’ scores (rescaled VVIQ greater than 1) by conducting an independent samples Welch t-test to assess the difference in mean PCS scores between the two groups. Following the room-to-move heuristic proposed by Dienes (2019), H1 for the Bayes Factor was modelled using a half-normal distribution with SD equal to half the mean of the non-aphantasic group on PCS (0.93). Comparison between aphantasics and non-aphantasics was not possible for the SWASH because only 4 participants in this sample had VVIQ scores within the aphantasia range.
Results
Relationship between SWASH and VVIQ
Mean subjective SWASH score was 1.59 (SD = 0.89). The mean score of the rescaled VVIQ was 2.36 (SD = 0.74). There was not much evidence for or against a relationship between these two measures (Figure 1), b = 0.05 SWASH Likert units per VVIQ Likert units, SE = 0.11, t(129) = 0.43, p = .67, 95% CI [-0.17, 0.26], R2 = -0.01, BH(0,0.34) = 0.43, RR1/3<BF<3 [0, 0.44].
Relationship between PCS and VVIQ
Mean subjective PCS score of our sample was 1.84 (SD = 0.67). The mean score of the rescaled VVIQ was 2.48 (SD = 0.83). As shown in Figure 2, VVIQ predicted subjective PCS score, b = 0.14 PCS Likert units per VVIQ Likert units, SE = 0.04, t(506) = 4.05, p < .001, 95% CI [0.07, 0.21], R2 = 0.03, BH(0,0.37) = 562.10, RRBF>3 [0.01, 0.74].
Comparison of the strength of the correlation between SWASH/VVIQ and PCS/VVIQ
There was no evidence either way for a difference in the strength of correlations between SWASH/VVIQ and PCS/VVIQ, z = 1.43, p = .15, 95% CI [-0.05, 0.33], BH(0,0.12) = 2.02, RR1/3<BF<3 [0, 0.24].
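The z statistic above can be approximately recovered from values already reported in this section. The Python sketch below is illustrative rather than a reproduction of the cocor analysis: it derives each correlation from the reported regression t value and degrees of freedom (using r = t / sqrt(t² + df), which holds for simple regression) and then applies Fisher's z test for two independent correlations. The function names are assumptions.

```python
import math


def r_from_t(t, df):
    """Correlation implied by the t statistic of a simple regression slope."""
    return t / math.sqrt(t * t + df)


def fisher_z_difference(r1, n1, r2, n2):
    """z statistic and two-sided p for the difference between independent correlations."""
    diff = math.atanh(r1) - math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = diff / se
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p


r_pcs = r_from_t(4.05, 506)      # PCS/VVIQ, n = 508
r_swash = r_from_t(0.43, 129)    # SWASH/VVIQ, n = 131
z, p = fisher_z_difference(r_pcs, 508, r_swash, 131)
```

Because the inputs are rounded t values, small discrepancies from the reported statistics are to be expected.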
Comparison between non-aphantasics and aphantasics
The mean rescaled VVIQ score was 0.47 (SD = 0.42) for aphantasic participants (identified with rescaled VVIQ scores of 1 and below) and 2.61 (SD = 0.68) for the non-aphantasic group (rescaled VVIQ scores greater than 1). Mean PCS score was higher for non-aphantasics (M = 1.86, SD = 0.66) than for aphantasics (M = 1.49, SD = 0.71), t(32.26) = -2.79, SE = 0.13, p = .01, Hedges’ g = -0.56, 95% CI [-0.64, -0.10], BH(0,0.93) = 9.57, RRBF>3 [0.07, 1.86], as shown in Figure 4.
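As a rough check on how these statistics fit together, the Python sketch below recomputes the Welch t statistic, its Welch-Satterthwaite degrees of freedom, and Hedges' g from the rounded group summaries. It is an illustration only: the non-aphantasic group size of 478 is inferred here as 508 minus the 30 aphantasics, and the function name is an assumption. Rounded inputs mean the results agree with the reported values only approximately.

```python
import math


def welch_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Welch t, Welch-Satterthwaite df, and Hedges' g from group summaries."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    se = math.sqrt(v1 + v2)
    t = (m1 - m2) / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    # Hedges' g: pooled-SD standardised difference with small-sample correction.
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                          / (n1 + n2 - 2))
    g = (m1 - m2) / pooled_sd * (1 - 3 / (4 * (n1 + n2 - 2) - 1))
    return t, df, se, g


# Aphantasics first, so a negative t means aphantasics scored lower.
t, df, se, g = welch_from_summary(1.49, 0.71, 30, 1.86, 0.66, 478)
```

The large imbalance in group sizes is why the degrees of freedom are close to those of the smaller group rather than to the full sample size.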
Discussion
We found no evidence for the absence or presence of a relationship between hypnotisability and self-reported mental imagery vividness. We note that, due to the positively skewed distribution of VVIQ scores (McKelvie, 1995), our sample contained only four participants falling in the aphantasia range (rescaled VVIQ scores equal to or below 1).
In a larger sample of N = 508 we found a positive relationship between PCS and VVIQ. Specifically, VVIQ (5-point scale) predicted PCS (6-point scale) by approximately 0.14 PCS scale points per VVIQ Likert unit increase. While this result is in line with our prediction that more vivid imagery is related to a greater ability to control conscious experience, the observed relationship is small. Our comparison of the strength of relationship between VVIQ/SWASH and VVIQ/PCS was insensitive and will therefore not be discussed.
Finally, while there was evidence that aphantasics were lower in PCS than non-aphantasics, the difference between the two groups was only about one-third of a PCS scale point. We note that, out of the 30 aphantasics identified in this sample, 27% were above the mean PCS of the full sample (M = 1.84). This suggests that, in spite of their self-reported lack of visual imagery, aphantasics can generate experiences in response to imaginative suggestions (albeit to a lesser extent than non-aphantasics), which (except for one item which requires being unable to see a picture of a ball) do not necessarily involve visual experiences. Ultimately, this result provides evidence that visual imagery is not a prerequisite for generating responses to imaginative suggestions, at least for imaginative suggestion scales that do not require participants to generate visual hallucinations (the one exception in the PCS and SWASH being the negative visual hallucination item just mentioned). This appears to contrast with previous findings by Sutcliffe, Perry & Sheehan (1970), and Perry (1973), who reported that non-hypnotizable participants were consistently lower in imagery abilities than hypnotizable participants (hence supporting a non-linear relationship between the two traits). However, as measures of imagery and hypnotizability were collected together in these earlier studies, we suggest that the reported evidence of non-linearity may be attributable to context effects.
Study 2
In recent years, there has been an increasing interest in aphantasia due to its potential value for addressing questions related to the functioning of imagery in many cognitive tasks (e.g., Dance, Jaquiery, et al., 2021; Dance, Ward, et al., 2021; Jacobs et al., 2018; Keogh & Pearson, 2018; Liu & Bartolomeo, 2023; Palermo et al., 2022; Pounder et al., 2022; Zeman et al., 2015). Aphantasics are rapidly becoming a frequently investigated special population in the imagery field, attracting more research than the opposite end of the spectrum, hyperphantasics. Due to the very low prevalence of this imagery extreme (3.9% - Dance et al., 2022), researchers have mainly opted for direct recruitment of self-reported aphantasics (e.g., via email invitations from existing databases, via posts on specific Facebook or Reddit groups such as “Aphantasia support group” or “r/Aphantasia”, or via self-referral to the research group).
Zeman and colleagues (2020) discussed the possibility that direct recruitment of self-selected aphantasics and hyperphantasics might affect the experimental data collected, but to date no study has directly addressed this issue. As discussed earlier, blind testing is important to reduce the effects of demand characteristics but is particularly challenging and time-consuming in rare special populations. To investigate the potential impact of demand characteristics, here we tested whether direct recruitment of a special population (aphantasics) resulted in different PCS and SWASH scores compared to a sample that was not directly recruited.
First, we tested whether we could replicate our Study 1 result for PCS (aphantasics being lower in this trait than non-aphantasics) and whether this finding extended to SWASH, by assessing whether the scores of the present study’s aphantasic group on both scales were lower than those of non-aphantasics tested in Study 1. Second, we compared scores of participants with aphantasia in Study 1’s out-of-context group (“blinded” aphantasics) with the present study’s self-reported aphantasics tested on PCS or SWASH through direct selection. Based on the existing literature suggesting that context effects inflate the relationship between the imaginative suggestion scales and other measures (Laurence et al., 2008), we predicted lower PCS or SWASH scores in self-reported aphantasics tested in the present study, compared to aphantasics tested via single-blind recruitment in Study 1.
Method
Participants
We screened two samples of aphantasic participants, one on the online version of the SWASH (N = 35) and the other on the online version of PCS (N = 33). We also administered the VVIQ to verify that self-reported aphantasic participants scored at or below the moderate aphantasia cut-off (rescaled VVIQ less than or equal to 1). Based on this, one participant from the SWASH sample and one participant from the PCS sample were removed because they scored above the cut-off. Our final sample for SWASH was composed of 34 self-reported aphantasic participants (age 18-40, Mage = 26.5, SDage = 5.1; 19 females, 15 males), while for PCS we included 32 self-reported aphantasics (age 18-35, Mage = 24.9, SDage = 4.77; 19 females, 13 males). Age differed between the Study 1 and Study 2 SWASH/PCS samples (SWASH: t(34.08) = -8.43, p < .001; PCS: t(31.51) = -6.45, p < .001).
Self-reported aphantasics were recruited via email invitation from the University of Sussex’s Imagery Lab – Aphantasia Cohort database. The database was generated by recruiting aphantasics through multiple sources – those that self-referred to the Imagery lab to volunteer for research, those who were recruited via advertising on online forums and social media (e.g., aphantasia Facebook groups, Reddit), and by screening the student body at the University of Sussex. Within the database, participants were classified as aphantasic if they scored between 0 and 1 (tending to rate their visual imagery as absent, or vague/dim) on the mean rescaled VVIQ.
We collected as many participants as we could in one term and, as for Study 1, we tested for sensitivity for the main analyses at the conventional Bayes Factor > 3. Approval was received from the University of Sussex Sciences & Technology Cross-Schools Research Ethics Committee (ER/GC337/16) and participants gave informed consent for the study.
Materials
Materials were the same as in Study 1.
Analyses
As in Study 1, there was no preregistration, therefore analyses are exploratory. First, to test if we could replicate results concerning the comparison between aphantasics and non-aphantasics found in Study 1, we conducted an independent samples Welch t-test to assess the difference in mean SWASH and PCS scores between non-aphantasics from Study 1 and self-reported aphantasics from Study 2. H1s for Bayes Factors were modelled using a uniform distribution from 0 to the mean SWASH score (1.60) or PCS score (1.86) of the non-aphantasic sample from Study 1.
Second, to assess the role of demand characteristics (beyond that of imagery abilities), we tested whether differences in PCS and SWASH scores would emerge between self-reported aphantasics (N = 32 and N = 34 respectively) and “blinded” aphantasics assessed from the control sample of Study 1 (who were not recruited in light of their aphantasia and were tested on VVIQ / PCS / SWASH separately). We addressed this question in two ways. First, considering the linear relationship observed for PCS and VVIQ, we used the intercept of the linear model from Study 1 (PCS predicted by VVIQ) to define estimates for the “blinded” aphantasics. The intercept corresponds to the predicted PCS score when VVIQ (rescaled) is 0, which corresponds to the extreme aphantasia cut-off. By using all the data to estimate the PCS mean score for this group, this approach provides a more precise estimate than the group mean that would be obtained from sub-selecting Study 1 data by using arbitrary cutoffs (e.g., literature-based cutoff or matching VVIQ mean). We also tested this comparison with respect to SWASH, assuming that for this measure a linear relationship with the VVIQ would have emerged in a larger sample (we note that other than containing no reference to hypnosis, the PCS is identical to the SWASH). Our H0 held that only visual imagery abilities would account for any difference we would find between self-reported aphantasics and non-aphantasics (hence we predicted no difference between the intercept PCS/SWASH score and that of self-reported aphantasics). H1 held that self-reported aphantasics’ PCS or SWASH scores would be attributable to a mix of imagery ability and demand characteristics. To carry out the t-test between the Study 1 model intercept and the Study 2 self-reported aphantasics, we first calculated the mean difference between the intercept value and the mean of self-reported aphantasics (e.g., the mean difference was 1.02 for PCS).
The standard error (SE) of the mean difference was calculated as equal to the square root of the sum of the squared SEs for each estimate (square of SE of Study 1 intercept + square of SE of Study 2 self-reported aphantasics). The t-value was thus calculated as equal to mean difference / SE of mean difference. The degrees of freedom (df) were calculated as df for Study 1 intercept + df for Study 2 self-reported aphantasics. We calculated the Bayes Factor using a uniform distribution with the lower boundary being 0 and the upper boundary corresponding to the intercept of the linear model of the full sample from Study 1 (PCS/SWASH when rescaled VVIQ = 0) which was 1.49 for SWASH and 1.49 for PCS.
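As an illustration only, the t-test calculation described above can be sketched in Python from summary statistics. The function name and the Study 1 intercept degrees of freedom (506, i.e., n = 508 minus two regression parameters) are assumptions introduced here; the other inputs are the rounded PCS values reported in the Results, so the output only approximates the reported statistics.

```python
import math

def intercept_vs_group_t(intercept, se_intercept, df_intercept,
                         group_mean, group_sd, group_n):
    """Compare a regression intercept with an independent group mean.

    Hypothetical helper: SE of the difference is the root sum of the
    squared SEs, t is the difference divided by that SE, and df is the
    sum of the two component dfs, as described in the text.
    """
    se_group = group_sd / math.sqrt(group_n)            # SE of the group mean
    mean_diff = intercept - group_mean                  # mean difference
    se_diff = math.sqrt(se_intercept ** 2 + se_group ** 2)
    t_value = mean_diff / se_diff
    df = df_intercept + (group_n - 1)
    return mean_diff, se_diff, t_value, df

# PCS: Study 1 intercept = 1.49 (SE = 0.09); Study 2 M = 0.47, SD = 0.63, N = 32.
# df_intercept = 506 is an assumption, not a value taken from the paper.
mean_diff, se_diff, t_value, df = intercept_vs_group_t(
    1.49, 0.09, 506, 0.47, 0.63, 32
)
print(f"diff = {mean_diff:.2f}, SE = {se_diff:.2f}, t({df}) = {t_value:.2f}")
```

Because the inputs are rounded, small discrepancies from the reported t-value and degrees of freedom are expected.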
Second, as an additional analysis, we used the cut-off for moderate aphantasia established by the literature (rescaled VVIQ less than or equal to 1) in both groups (blinded aphantasics from Study 1 and self-reported aphantasics from the present study) and compared the two groups’ mean PCS scores. H1 for Bayes Factors was modelled using a uniform distribution from 0 to the mean PCS score (1.49) of Study 1 aphantasics. We were able to carry out this final analysis only for PCS and not SWASH, because of the low sample size of aphantasics in the SWASH Study 1 sample.
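To make the Bayes factor specification concrete, the following is a minimal Python sketch of a Bayes factor that compares H1, modelled as a uniform distribution over the plausible effect sizes, against a point null of no difference. It assumes a normal likelihood for the observed mean difference, which is a simplification introduced here and may differ from the likelihood used in the actual analyses; the function names are hypothetical. The example inputs are the rounded SWASH intercept comparison values (mean difference 0.8, SE 0.3, upper bound 1.49).

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def bayes_factor_uniform(mean_diff, se, lower, upper):
    """Bayes factor for H1: effect ~ Uniform(lower, upper) vs H0: effect = 0.

    Assumes the observed mean difference is normally distributed around
    the true effect with standard deviation `se` (an approximation).
    """
    # Marginal likelihood under H1: the normal likelihood averaged over
    # the uniform prior, which has a closed form via the normal CDF.
    likelihood_h1 = (
        normal_cdf((upper - mean_diff) / se) - normal_cdf((lower - mean_diff) / se)
    ) / (upper - lower)
    # Likelihood of the observed difference under the point null.
    likelihood_h0 = normal_pdf(mean_diff / se) / se
    return likelihood_h1 / likelihood_h0

# Rounded SWASH intercept comparison: difference 0.8, SE 0.3, H1 uniform on [0, 1.49].
bf_swash = bayes_factor_uniform(0.8, 0.3, 0.0, 1.49)
print(f"B = {bf_swash:.1f}")
```

Under this convention, values above 3 are read as evidence for H1 and values below 1/3 as evidence for H0; an observed difference of zero with the same SE and prior gives a value below 1.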
Results
SWASH
Comparison with mean SWASH in the non-aphantasics sample. The mean rescaled VVIQ score in self-reported aphantasic participants was 0.03 (SD = 0.07), while for the non-aphantasic group it was 2.43 (SD = 0.63). Mean SWASH score was higher for the non-aphantasic group (M = 1.60, SD = 0.89) than for self-reported aphantasics (M = 0.69, SD = 0.79), t(57.48) = 5.84, SE = 0.16, p < .001, Hedges’ g = 1.05, 95% CI [0.60, 1.23], BH(0,1.6) = 6.50 × 10^5, RRBF>3 [0.07, 1.6], as shown in Figure 3.
Comparison with intercept of the full sample (SWASH when rescaled VVIQ = 0). Mean SWASH score of self-reported aphantasics was lower than Study 1 linear model intercept (1.49, SE = 0.26), t(163) = 2.7, SE = 0.3, p = .004, 95% CI [0.2, 1.4], BH(0,1.49) = 17.10, RRBF>3 [0.25, 1.49].
PCS
Comparison with mean PCS in the non-aphantasics sample. The mean rescaled VVIQ score was 0.05 (SD = 0.17) for self-reported aphantasics and 2.61 (SD = 0.68) for non-aphantasics. As shown in Figure 4, mean PCS score was higher for the non-aphantasic group (M = 1.86, SD = 0.66) than for self-reported aphantasics (M = 0.47, SD = 0.63), t(35.76) = 12.08, SE = 0.12, p < .001, Hedges’ g = 2.10, 95% CI [1.16, 1.63], BH(0,1.86) = 7.10 × 10^18, RRBF>3 [0.04, 1.86].
Comparison with intercept of the full sample (PCS score when rescaled VVIQ = 0). Mean PCS score of self-reported aphantasics was significantly lower than Study 1 linear model intercept (1.49, SE = 0.09), t(538) = 7.05, SE = 0.14, p < .001, 95% CI [0.72, 1.31], BH(0,1.49) = 1.66 × 10^8, RRBF>3 [0.06, 1.49].
Comparison with blinded aphantasics based on cut-off of rescaled VVIQ <= 1. As shown in Figure 4, mean PCS score was higher (M = 1.49, SD = 0.71) for the “blinded” aphantasics from Study 1 sample than self-reported aphantasics from Study 2, t(58.04) = 5.98, p < .001, SE = 0.17, Hedges’ g = 1.51, 95% CI [0.68, 1.36], BH(0,1.49) = 1.44 × 10^6, RRBF>3 [0.07, 1.49].
Discussion
Self-reported aphantasics scored lower than the control group of non-aphantasics on both SWASH and PCS, which appears in line with the results of Study 1. However, their scores were also substantially lower than those of “blinded” aphantasics identified in Study 1, when considering either group scores for a literature-based cut-off (rescaled VVIQ score lower than or equal to 1) or the extreme cut-off for aphantasia (estimated by the intercept of the linear model of Study 1, which corresponds to a mean rescaled VVIQ of 0). Specifically, we observed a drop of more than 1 point in PCS (1.02) and of 0.8 points in SWASH in self-reported aphantasics of Study 2 compared to the intercept of the linear model of Study 1. We note that in the latter test, we used the cut-off of rescaled VVIQ = 0 for indexing the blinded aphantasics sample of Study 1 while keeping the self-reported aphantasics of Study 2 at the moderate, literature-based cut-off, as this would provide the strongest test for assessing a difference between the two groups. However, we also reported in Supplementary Material (link: https://osf.io/aeyhw/) the same comparison using the group mean for PCS with the extreme cut-off (rescaled VVIQ = 0) applied to both groups, in which we again found self-reported aphantasics to score lower in phenomenological control than “blinded” aphantasics (Figure S1).
Based on this finding, we propose that the lower scores observed in the self-reported aphantasic group of Study 2 may be attributable to demand characteristics arising from the different recruitment (direct recruitment vs. single-blind procedure).
General Discussion
In a first study, we investigated the link between self-reported variation in imagery vividness and the ability to control conscious experience either in or out of hypnotic context, with each trait measure being tested separately to minimize context effects. We found evidence for a small positive relationship between visual imagery vividness and phenomenological control measured outside the hypnotic context. Participants scoring in the VVIQ aphantasia range were lower on PCS than non-aphantasic participants. No evidence was found for or against a relationship between trait visual imagery and hypnotisability. In a second study, to address demand characteristic effects arising from a lack of blinding, we tested self-reported aphantasics, recruited directly via email invitation, on both the PCS and SWASH scales. This group had lower scores for SWASH and PCS than non-aphantasic participants. However, self-reported aphantasics also showed substantially lower scores than the control group of aphantasics identified by commonly employed VVIQ criteria in the Study 1 sample, which may reflect differing demand characteristics between the two groups.
To our knowledge, this is the first study investigating the relationship between individual differences in visual mental imagery and the ability to generate responses to direct imaginative suggestions outside the hypnotic context. It was also the first time that the relationship between VVIQ and a measure assessing trait response to imaginative suggestion was assessed in a large sample (n = 508) while simultaneously controlling for context effects. Similarly, albeit insensitive, the SWASH / VVIQ sample was larger than previous studies that collected the two measures separately (e.g., Kogon et al., 1998; Perlini et al., 1992).
While these results are correlational and other causal explanations are therefore possible, our finding of a weak but positive relationship between VVIQ and PCS is in line with the proposal that vividness of imagery has a small moderating effect on the generation of responses to imaginative suggestions. Notably, despite their absent or dim visual imagery, blinded aphantasics had only slightly lower PCS scores than non-aphantasics (and were not bound to zero), indicating that they were capable of generating experiences in response to imaginative suggestions. This suggests that the ability to generate voluntary visual imagery, while potentially acting as a facilitator in the elaboration of more vivid subjective experience, is not essential to generate experiences in response to imaginative suggestions.
It must be noted that although many PCS suggestions involve sensory imagery, none of them directly involves visual imagery (though one item required participants to become unable to see a stimulus). It has been previously argued that responses to different types of suggestions may represent partially distinct abilities (Woody & Barnier, 2008). Although visual imagery vividness has been linked to high vividness in other sensory modalities (e.g., motor, auditory, see Floridou et al., 2022), abilities indexed by the VVIQ might have a stronger relationship with responses to suggestions involving visual hallucinations. For instance, it has been reported that imagery abilities are predictive of pseudo-hallucinations and anomalous perception (Königsmark et al., 2021; Reeder, 2022; Salge et al., 2020). Although imaginative suggestion scales do not commonly include direct visual suggestions, there are potential candidates in pre-existing research. For instance, a study by Mazzoni and colleagues (2009) found that highly hypnotizable participants reported being able to see colours on a grey-scale picture and see colour pictures as grey. Future research developing imaginative suggestion scales of visual experience will be required to address whether VVIQ scores have a stronger relationship with visual hallucination suggestions. Furthermore, in light of the multisensory nature of the available imaginative suggestion scales, we suggest that future studies should take into account other questionnaires which allow assessing subjective imagery abilities in other sensory modalities, such as the Plymouth Sensory Imagery Questionnaire (PSI-Q; Andrade et al., 2013) to gather a more comprehensive picture of the relationship between this trait and phenomenological control.
As mentioned earlier, the key signature of experiences generated in response to imaginative suggestion is that although they are voluntary (Nash & Barnier, 2008), they are perceived as being involuntary and effortless. According to the cold control theory of phenomenological control (Dienes et al., 2022), an individual’s ability to experience a voluntary act as involuntary is central to response to imaginative suggestion (Dienes et al., 2022; Dienes & Perner, 2007). This ability is not captured by the VVIQ, which is focused on assessing self-reported voluntary imagery vividness. Therefore, phenomenological control might have a stronger relationship with the tendency to experience involuntary imagery, as also proposed by Terhune et al. (2020). This might also partially explain why aphantasics, who are considered to have a specific deficit towards the voluntary generation of imagery (Zeman et al., 2015), can generate experiences in response to imaginative suggestions. Indeed, as shown in Zeman et al. (2015) and Dawes et al. (2020), many people with aphantasia still report some forms of involuntary imagery, such as visual dreams or flashbacks, suggesting they might be capable of experiencing imagery when the requirement for its voluntary generation is removed (Zeman et al., 2020). It could be that responses to suggestions are facilitated by imagery, but in the case of aphantasics (or even non-aphantasics), strategic use of imagery is implemented outside of responders’ awareness or control. Hence, addressing the role of involuntary imagery abilities in imaginative suggestions represents another important goal for future research.
The second key contribution of this work consists of revealing for the first time a possible impact of demand characteristics on effects attributed to individual differences in imagery vividness, including aphantasia. In Study 2, self-reported aphantasics tested via direct recruitment scored substantially lower on PCS and SWASH than aphantasics assessed with single-blind recruitment. This result suggests that differing demand characteristics between the two groups accounted for the extremely low PCS and SWASH scores (bound to zero) in Study 2, beyond their lack of imagery abilities. Specifically, while only 10% of aphantasics’ PCS scores in Study 1 were below or equal to 0.5, in Study 2, 66% of aphantasics had scores below or equal to 0.5. Awareness of their aphantasic status and the direct selection process likely influenced interpretation, motivation, and expectations of experimental aims in this group. Indeed, having the ability to generate an experience does not ensure that someone will respond to imaginative suggestions; participants’ pre-existing beliefs and attitudes toward imaginative responding also play an important role in generating responses to suggestions (Dienes et al., 2022; Dienes & Perner, 2007). Low scores could arise from, for example, unconscious or conscious attempts to comply with assumed experimental aims (compliance) or from reactance at being asked, as someone who is aware of their aphantasic status, to engage in an “exercise of imagination” (Corneille & Lush, 2023). It is also striking that 34% of the sample in Study 2 scored zero in PCS. This is an unusually low score: of the 244 participants in the PCS norming study (Lush, Scott, Seth, et al., 2021), none had a score of zero, and none scored zero in our Study 1 PCS sample. Such a low score may indicate reactance (Lush, Scott, Seth, et al., 2021).
A limitation is that our study design does not rule out a role of demographic differences between groups, such as age, level of education and familiarity with psychological experiments. Of these factors, we measured only age, for which we observed a difference. However, we note that mean scores on imaginative suggestion scales (in the hypnotic context) are fairly stable across age (M at 19.5 years old = 5.9, M after 10 years = 6.0, M after 25 years = 6.5, Piccione et al., 1989). While there is some variation in scores across hypnotizability scale samples for scale translations in different countries (e.g., de Saldanha da Gama et al., 2012), we are not aware of any potential demographic differences which could account for the size of the difference in scale scores between groups in Study 2. Regarding the Phenomenological Control Scale, we report in the supplementary material comparisons with a sample of the general public (n = 240) recruited and tested online on a German translation of the PCS. This German sample of internet volunteers (mean age 33.42, SD = 12.31, https://osf.io/b7yfx/; see supplemental material for descriptives and analyses) showed an average PCS score of 1.9 (SD = 0.7). Like the Study 2 sample, the German sample is older than the Study 1 sample (mean difference of 13.5 years). Despite the difference in age and recruitment and other potential demographic differences, the German internet sample PCS score is similar to that of the student sample in Study 1. In contrast, the Study 2 self-reported aphantasic PCS score is lower than that of the German sample, with a mean difference of 1.41. As such, general demographic differences linked to online recruitment from the general population and age cannot easily explain the lower PCS score in the Study 2 sample. Therefore, while we cannot exclude the possibility that uncontrolled factors are involved, we consider systematic differences in beliefs arising from demand characteristics to be a more plausible explanation.
Our conclusions regarding relationships between trait phenomenological control and visual imagery are based on the assumption that VVIQ scores can be interpreted as evidence of visual imagery ability. While we acknowledge this assumption, we note that our results are potentially informative for any research that employs the VVIQ as a measure of aphantasia. At present, within the field of mental imagery, the VVIQ is the standard measure used to identify aphantasic participants (e.g., Bainbridge et al., 2021; Dance et al., 2022; Dance, Ward, et al., 2021; Dawes et al., 2020; Kay et al., 2022; Keogh et al., 2021; Keogh & Pearson, 2018; Liu & Bartolomeo, 2023; Milton et al., 2021; Pounder et al., 2022; Wicken et al., 2021; Zeman et al., 2020). We note that most of the aforementioned studies also recruited aphantasics from aphantasia-related forums, thereby confirming the self-reported aphantasics via VVIQ scores. Indeed, aphantasia was defined with regard to a specific range of VVIQ scores in the paper in which the term was introduced (Zeman et al., 2015). Whether or not aphantasia can be validly assessed by the administration of VVIQ, it is a routine procedure in the field to do so.
By characterizing the effects of demand characteristics when directly recruiting a self-selected special population, our results may have important methodological implications for imagery studies, especially those concerning imagery extremes (Zeman et al., 2020). This issue has also recently been raised in meditation (Davidson & Kaszniak, 2015) as well as synaesthesia research (Simner, 2013; Ward et al., 2018). Notably, by using a double-blind procedure, Brang & Ahn (2019) failed to replicate the higher VVIQ scores found among synaesthetes in previous studies which directly recruited this special population (e.g., Barnett & Newell, 2008). Similarly, by screening participants from the general population (single-blind procedure), Spiller et al. (2019) failed to find evidence that scores on a task used to identify grapheme-colour synaesthetes were related to VVIQ scores.
We therefore suggest that to reduce the risk of confounding demand characteristic effects, imagery extremes should preferably be recruited using a single-blind procedure as in Study 1 (e.g., identifying them from a pool of participants and subsequently relating their scores to a measure of interest collected separately). Concerning aphantasia, unfortunately, this is particularly challenging to achieve given the low prevalence of this group in the general population (3.9%, Dance et al., 2022). Where possible, it would be ideal to adopt blinded recruitment of special populations. For example, in undergraduate samples this may be achieved by pre-screening entire cohorts on the VVIQ. However, this may not be practical in all cases (e.g., for labour-intensive and time-consuming in-person procedures). In such cases, potential demand effects should be acknowledged.
In conclusion, the present work adds to the literature on relationships between phenomenological control in the hypnotic context and visual imagery ability by extending investigation to the non-hypnotic context. Moreover, we addressed the methodological limitations of previous studies by testing our measures of interest separately to control for context effects and in a large sample, which included people with aphantasia. Finally, our second study highlighted that demand characteristics may confound the interpretation of special population studies in imagery research. It is important to note that this second methodological result does not necessarily invalidate previous findings: here, context effects inflated a group difference but did not completely change the pattern of results. While the degree to which individual results are confounded by demand characteristics arising from non-blinded studies requires investigation for each case, future research should give careful consideration to controlling for context effects when recruiting imagery extremes.
Competing Interests
We declare we have no competing interests.
Author Contributions
G.C.: conceptualization, data curation, formal analysis, investigation, project administration, methodology, visualization, software, writing—original draft, writing—review and editing; C.D.: resources, writing—review and editing; Z.D.: methodology, writing—review and editing; J.S.: resources, writing—review and editing; S.F.: project administration, supervision, writing—review and editing; P.L.: investigation, resources, methodology, writing—review and editing.
Funding
This work was supported by Sussex Neuroscience (School of Life Sciences, University of Sussex); the Economic and Social Research Council [ES/W007320/1]; and a donation of the Dr. Mortimer and Sackler Foundation to the Sackler Centre for Consciousness Science at the University of Sussex.
Acknowledgments
We would like to thank editors and reviewers for their comments on the manuscript.
Data Accessibility Statement
The data and code used to produce all results and figures are available at: https://osf.io/aeyhw/. All results reported in this paper have been reproduced by an independent statistician from the University of Sussex, and their report can be viewed here (OSF link: https://osf.io/ke4n7).