Adaptive behavior requires the rapid extraction of behaviorally relevant information from the environment, with particular emphasis on emotional cues. However, the speed of emotional feature extraction from complex visual environments is largely undetermined. Here we use objective electrophysiological recordings in combination with frequency tagging to demonstrate that the extraction of emotional information from neutral, pleasant, or unpleasant naturalistic scenes can be completed at a presentation time of ~167 ms per image (i.e., a 6 Hz presentation rate) under high perceptual load. Emotional compared to neutral pictures evoked enhanced electrophysiological responses with distinct topographical activation patterns originating from different neural sources. Cortical facilitation in early visual cortex was also more pronounced for scenes with pleasant compared to unpleasant or neutral content, suggesting a positivity offset mechanism dominating under conditions of rapid scene processing. These results significantly advance our knowledge of complex scene processing by demonstrating rapid integrative content identification, particularly for emotional cues relevant for adaptive behavior in complex environments.
Introduction
The prioritization of emotional stimuli is pivotal for fast behavioral reactions, for example in the case of threat or danger when the decision for fight-or-flight needs to be made almost instantaneously in order to survive. Appetitive and defensive neural systems, evolved to ensure survival and the continuation of the species (Lang & Bradley, 2010), allow the organism to appraise environmental stimuli on the basis of common motivational parameters, such as valence and arousal (Cacioppo & Berntson, 1994; Cacioppo & Gardner, 1999; Öhman, Hamm, & Hugdahl, 2000; Russell, 1980). In the cognitive electrophysiological literature, many studies have consistently reported enhanced electrical brain activity in response to emotional compared to neutral visual scenes (Olofsson, Nordin, Sequeira, & Polich, 2008), starting at around ~200 ms post-stimulus onset (Costa et al., 2014; Junghöfer, Bradley, Elbert, & Lang, 2001; Schupp, Junghöfer, Weike, & Hamm, 2003b) and presumably subtended by attention-dependent enhancement of sensory input strength (Desimone & Duncan, 1995; Reynolds & Heeger, 2009).
While electrophysiological enhancement is typically interpreted as a consequence of the high relevance of valence and arousal features for adaptive behavior and survival (Frijda, 2016; Lang & Bradley, 2010), the speed at which these emotional cues are extracted from complex visual scenes is still unresolved. Most electrophysiological studies in the affective neuroscience literature have displayed visual stimuli on screen for a relatively long time – e.g., from 333 milliseconds (Schupp, Junghöfer, Weike, & Hamm, 2003a) up to 6 seconds (Hajcak, Dunning, & Foti, 2009) –, a methodological choice that gives the visual system ample opportunity to thoroughly process semantic content but, on the other hand, likely overestimates the time required for initial emotional cue extraction. One obvious solution is to shorten the presentation time of each individual stimulus, thus creating a train of identical images in rapid succession (rapid serial visual presentation; RSVP). Such a presentation typically elicits steady-state visually evoked potentials (SSVEPs), oscillatory posterior brain responses considered a continuous marker of stimulus processing in early visual cortex (Norcia, Appelbaum, Ales, Cottereau, & Rossion, 2015; Regan, 1977; Vialatte, Maurice, Dauwels, & Cichocki, 2010; Wieser, Miskovic, & Keil, 2016). Previous studies (A. Keil et al., 2003, 2009, 2008) used this technique by presenting identical images ten times per second (i.e., 10 Hz) and found larger SSVEP amplitudes for emotional compared to neutral scenes, suggesting once again attention-dependent sensory enhancement for emotional material (Kim, Grabowecky, Paller, Muthu, & Suzuki, 2007; M. M. Müller & Hübner, 2002; M. M. Müller, Malinowski, Gruber, & Hillyard, 2003). However, flickering the exact same picture at a specific frequency over the course of a trial makes it difficult to assess whether the observed increased occipital activity is driven by the extraction of the emotional content from each individual picture onset or a consequence of the integration of emotional information across the entire trial. To overcome this limitation, one can present a different image at every cycle (Alonso-Prieto, Belle, Liu-Shuang, Norcia, & Rossion, 2013; Bekhtereva & Müller, 2015; Retter & Rossion, 2016), with the important corollary that each image serves as a forward mask for the subsequent stimulus and as a backward mask for the preceding one (Keysers & Perrett, 2002). For instance, Bekhtereva and Müller (2015) employed an RSVP paradigm in which different images were presented at each cycle, while the emotional content of the stimuli switched only once per trial (e.g., from streams of neutral to unpleasant scenes and vice versa). Emotion-dependent amplitude modulations of occipito-temporal SSVEPs were observed when each stimulus was presented every ~167 ms (i.e., 6 Hz), whereas no statistical difference was reported at a faster presentation rate (i.e., ~67 ms/image, 15 Hz). This is consistent with previous findings showing that the visual system is capable of extracting the emotional content of complex scenes from images presented as briefly as 80 ms in a backward-masking paradigm (e.g., Codispoti, Mazzetti, & Bradley, 2009).
In the present study, we investigated whether the extraction of emotional features from each picture in the stream can be achieved in situations of greater variability. In other words, is the visual system able to reliably extract emotional information from rapidly presented complex images whose semantic content varies in a quick and unpredictable way? Consider this approach similar to “stress tests” performed on computer hardware or software to assess robustness and error handling under a heavy perceptual load, in order to ensure stability in normal environments. Capitalizing on the results of Bekhtereva and Müller (2015), we flickered different images every ~167 ms (i.e., 6 Hz). Pleasant, unpleasant, or neutral content was presented in a pseudorandom fashion, to exclude the possibility that cortical facilitation could be based on the extraction of general valence rather than the processing of each individual image. The key manipulation was that, within each picture stream, a neutral, pleasant, or unpleasant picture was regularly presented once every three stimuli, thereby creating a semantic regularity at 2 Hz. In addition, we controlled for the influence of low-level visual properties by using the very same stimulation protocol with scrambled versions of the test images (see Materials and Methods for details). Given our previous findings, we hypothesized that, if the visual system were able to extract the general emotional content from images presented as briefly as ~167 ms, the regularity of emotional content within the presented stream of images should lead to a discernable neural signal of this extraction. Specifically, we predicted increased SSVEP amplitudes at 2 Hz only for original (non-scrambled) pictures when the regularity is conveyed by emotional compared to neutral scenes.
Results
Participants were required to keep fixation on a central cross and passively view an RSVP stream of pictures flickering at 6 Hz. A different picture was presented at each cycle, corresponding to ~167 ms presentation time per individual image. In the irregular condition, neutral, pleasant, and unpleasant scenes alternated in pseudorandom order. In the original regular conditions, a neutral, pleasant, or unpleasant picture was systematically presented once every 3 stimuli during the stream, i.e., at 2 Hz. This regularity was strengthened by two additional methodological constraints: (i) we never used the same picture more than once within a trial; (ii) the order of the non-regular fillers (i.e., in between cycles of regular pictures) was pseudorandomized (e.g., unpleasant – pleasant – neutral – unpleasant – neutral – pleasant – unpleasant – …), thus creating completely unpredictable sequences. Scrambled regular conditions were additionally used to control whether any SSVEP amplitude enhancement at 2 Hz could be ascribed to the extraction of low-level features (e.g., color, contrast, or spatial frequency content) (see Figure 1). Scrambled images were preferred as a control condition over picture inversion because identification times for upright and inverted scenes have been found to be similar (Rieger, Köchy, Schalk, Grüschow, & Heinze, 2008; Rousselet, Macé, & Fabre-Thorpe, 2003; Vuong, Hof, Bülthoff, & Thornton, 2006), presumably because the main constituents of inverted scenes can quickly be identified through some “flipping compensation process” (J. E. Murray, 1997) matching the current object with memory templates (De Caro & Reeves, 2000).
Increased signal-to-noise ratio for original regular scenes
A necessary first step was to verify that 2 Hz amplitude during the presentation of original scenes was reliably larger than noise, i.e., amplitude at neighboring frequencies (Ding, Sperling, & Srinivasan, 2006; Gundlach & Müller, 2013; Sutoyo & Srinivasan, 2009). Therefore, we calculated the signal-to-noise ratio (SNR) of each condition by dividing amplitude at 2 Hz by the averaged amplitude at the following frequencies (in Hz): 1.00, 1.14, 1.29, 1.43, 1.57, 1.71, 1.86, 2.14, 2.29, 2.43, 2.57, 2.71, 2.86, 3.00. SNR values close to 1 would indicate close similarity between signal and noise. As can be seen in Table 1, SNR to original regular trials was consistently above 1, whereas SNR to irregular and scrambled regular pictures was close to (or below) 1.
Table 1

| condition | SNR | amplitude |
| --- | --- | --- |
| irregular | 0.99 (0.10) | 0.17 (0.02) |
| original regular neutral | 1.85 (0.19) | 0.33 (0.03) |
| original regular pleasant | 2.46 (0.34) | 0.46 (0.04) |
| original regular unpleasant | 1.70 (0.14) | 0.27 (0.02) |
| scrambled regular neutral | 0.65 (0.05) | 0.12 (0.01) |
| scrambled regular pleasant | 0.98 (0.12) | 0.19 (0.01) |
| scrambled regular unpleasant | 1.00 (0.10) | 0.18 (0.01) |
Note: 20% trimmed means and standard errors (in parentheses).
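As an illustration of the SNR computation described above, the following R sketch (hypothetical inputs and simulated spectra; not the original analysis script) divides the amplitude at the 2 Hz bin by the mean amplitude at the 14 neighboring frequency bins.

```r
# Minimal sketch of the SNR computation (hypothetical inputs).
# `amps` is a numeric vector of FFT amplitudes; `freqs` holds the corresponding frequencies.

snr_2hz <- function(freqs, amps, signal_freq = 2,
                    noise_freqs = c(1.00, 1.14, 1.29, 1.43, 1.57, 1.71, 1.86,
                                    2.14, 2.29, 2.43, 2.57, 2.71, 2.86, 3.00)) {
  # pick the FFT bin closest to each requested frequency
  nearest <- function(f) which.min(abs(freqs - f))
  signal <- amps[nearest(signal_freq)]
  noise  <- mean(sapply(noise_freqs, function(f) amps[nearest(f)]))
  signal / noise   # values near 1 indicate that signal and noise are comparable
}

# Example with a simulated spectrum (for illustration only):
freqs <- seq(0, 10, by = 1/7)                  # arbitrary frequency grid
amps  <- runif(length(freqs), 0.1, 0.2)        # flat "noise" spectrum
amps[which.min(abs(freqs - 2))] <- 0.4         # add a peak at 2 Hz
snr_2hz(freqs, amps)                           # > 1, i.e., signal exceeds noise
```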
One-tailed one-sample Wilcoxon signed-rank tests confirmed that the SSVEP signal at 2 Hz was statistically different from noise in response to regular original neutral (Z = 329, p = 3.99 × 10–5, Hedges’ g = 0.94, CI95% [0.58, 1.28]), original pleasant (Z = 345, p = 1.46 × 10–6, g = 1.11, CI95% [0.80, 1.45]), and original unpleasant scenes (Z = 340, p = 4.92 × 10–6, g = 0.97, CI95% [0.69, 1.26]). No statistically significant differences were observed for irregular (Z = 168, p = .999, g = –0.03, CI95% [–0.43, 0.37]), regular scrambled neutral (Z = 14, p = .999, g = –1.27, CI95% [–1.95, –0.63]; here noise is higher than signal, hence the large effect size but non-significant p-value), scrambled pleasant (Z = 178, p = .999, g = 0.14, CI95% [–0.25, 0.50]), or scrambled unpleasant conditions (Z = 177, p = .999, g = 0.12, CI95% [–0.29, 0.44]). Complementary one-tailed Bayesian t-tests against 1 (Rouder, Speckman, Sun, Morey, & Iverson, 2009) confirmed that the observed SNR was better explained by the alternative hypothesis (H1) for regular original neutral (BF10 = 1,182.36 ± 0.00%), original pleasant (BF10 = 9,662.62 ± 0.00%), and original unpleasant conditions (BF10 = 1,609.18 ± 0.00%), whereas the null hypothesis (H0) ought to be preferred for SNR in response to irregular (BF10 = 0.19 ± 0.00%), regular scrambled neutral (BF10 = 0.02 ± 0.00%), scrambled pleasant (BF10 = 0.40 ± 0.03%), and scrambled unpleasant conditions (BF10 = 0.37 ± 0.00%) (see Table 2). For a specification of H1 and H0, see Materials and Methods.
Table 2

| comparison | condition | BF10 (r = 1) | % pe | BF10 (r = .707) | % pe | BF10 (r = .5) | % pe |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2 Hz amplitude vs. noise | irregular | 0.14 | ±0.01 | 0.19 | ±0.00 | 0.25 | ±0.01 |
| | regular neutral | 1,203.13 | ±0.00 | 1,182.36 | ±0.00 | 1,053.77 | ±0.00 |
| | regular pleasant | 10,404.17 | ±0.00 | 9,662.62 | ±0.00 | 8,182.60 | ±0.00 |
| | regular unpleasant | 1,652.19 | ±0.00 | 1,609.17 | ±0.00 | 1,422.10 | ±0.00 |
| | scrambled regular neutral | 0.02 | ±0.00 | 0.02 | ±0.00 | 0.01 | ±0.00 |
| | scrambled regular pleasant | 0.30 | ±0.00 | 0.40 | ±0.03 | 0.51 | ±0.00 |
| | scrambled regular unpleasant | 0.27 | ±0.02 | 0.37 | ±0.00 | 0.48 | ±0.00 |
| scrambled vs. original | neutral | 115,028.63 | ±0.00 | 100,128.86 | ±0.00 | 80,480.92 | ±0.00 |
| | pleasant | 6,601.58 | ±0.00 | 6,096.65 | ±0.00 | 5,138.10 | ±0.00 |
| | unpleasant | 62.41 | ±0.00 | 65.88 | ±0.00 | 63.12 | ±0.00 |
| 2 Hz amplitude, main effect | emotion | 10.04 | ±0.01 | 13.45 | ±0.00 | 15.66 | ±0.01 |
| pairwise comparisons (original) | neutral vs. pleasant | 13.55 | ±0.00 | 15.12 | ±0.00 | 15.43 | ±0.00 |
| | neutral vs. unpleasant | 0.26 | ±0.00 | 0.35 | ±0.03 | 0.45 | ±0.01 |
| | unpleasant vs. pleasant | 224.47 | ±0.00 | 227.19 | ±0.00 | 208.40 | ±0.00 |
A direct comparison of SNR in regular original and scrambled conditions provides compelling evidence that the SSVEP signal at 2 Hz reliably reflects cognitive processes not exclusively related to the extraction of low-level visual properties of the stimuli. Two-tailed paired-sample Wilcoxon signed-rank tests confirmed larger SNR in response to original vs. scrambled neutral (Z = 348, p = 4.47 × 10–7, g = 1.37, CI95% [0.98, 1.76]), pleasant (Z = 343, p = 1.49 × 10–6, g = 1.13, CI95% [0.76, 1.48]), and unpleasant scenes (Z = 316, p = 1.26 × 10–4, g = 0.76, CI95% [0.26, 1.12]). Bayesian t-tests corroborated the NHST results: the difference between original and scrambled conditions was more likely under H1 for neutral (BF10 = 100,128.86 ± 0.00%), pleasant (BF10 = 6,096.65 ± 0.00%), and unpleasant scenes (BF10 = 65.88 ± 0.00%) (see Table 2).
These results show that the 2 Hz signal in response to original regular pictures was reliably larger than the signal in response to irregular or scrambled pictures and likely reflected post-perceptual brain processes. As 2 Hz signals for irregular and scrambled conditions did not reliably differ from noise, they were not further analyzed.
Amplitude modulations depend on emotional content
Our main prediction was that SSVEP amplitude at 2 Hz should be larger in response to emotional relative to neutral scenes. A one-way robust repeated measures ANOVA (rbANOVARM) (Field, Miles, & Field, 2012; Field & Wilcox, 2017) revealed a significant main effect of emotion (F(1.68, 25.13) = 17.00, p = 4.79 × 10–5, ξ = .51, CI95% [.14, .82]). A complementary Bayesian ANOVARM (Rouder, Engelhardt, McCabe, & Morey, 2016; Rouder, Morey, Speckman, & Province, 2012; Rouder, Morey, Verhagen, Swagman, & Wagenmakers, 2017) confirmed that these amplitude values were more likely to be explained by the model with the main effect of emotion compared to the null model (BF10 = 13.45 ± 0.01%). We proceeded to qualify the direction of this difference by means of two-tailed paired-sample Wilcoxon signed-rank tests. Amplitude in response to pleasant scenes was reliably larger compared to neutral (Z = 60, p = .005, g = –0.64, CI95% [–1.08, –0.15]) and unpleasant scenes (Z = 43, p = .001, g = –0.87, CI95% [–1.36, –0.45]). The difference between neutral and unpleasant conditions was not statistically significant (Z = 219, p = .280, g = 0.20, CI95% [–0.20, 0.62]). Bayesian t-tests confirmed that H1 should be preferred when comparing pleasant to neutral (BF10 = 15.12 ± 0.00%) and unpleasant conditions (BF10 = 227.19 ± 0.00%), whereas the difference between neutral and unpleasant conditions leaned towards H0 (BF10 = 0.35 ± 0.03%) (see Table 2 and Figure 2).
We also found differences in the topographical distribution of the 2 Hz signal for the different emotion conditions (Lehmann & Skrandies, 1984; Michel & Murray, 2012; M. M. Murray, Brunet, & Michel, 2008). While there was a significant dissimilarity between 2 Hz topographies for regularly presented neutral and pleasant images (Global Map Dissimilarity, GMD = 0.638, p < .001) as well as neutral and unpleasant images (GMD = 0.587, p = .033), there was no significant dissimilarity between regularly presented unpleasant and pleasant images (GMD = 0.357, p = .317). These results point towards statistically similar topographic representations of the 2 Hz signal when emotional images are presented regularly, which is different from that of regularly presented neutral images (see Figure 3A).
This pattern was mirrored in the source estimates of the 2 Hz signals for the different emotional conditions (Friston et al., 2008; Litvak et al., 2011). In two clusters, source power estimates differed significantly as a function of the emotional content of the regularly presented images (pvoxel < .001, pcluster < .05, whole-brain FWE-corrected; see Figure 3B). In one cluster, comprising areas in right inferior temporal gyrus and fusiform gyrus, source power estimates were highest when neutral images were presented regularly and lower when emotional images were presented regularly (MNI coordinates of peak voxel: x = 52, y = –44, z = –26, peak-voxel F = 39.86, 704 voxels in cluster). Source power estimates in a second cluster – including more anterior areas in right fusiform and right inferior temporal gyrus, as well as right hippocampal regions and parts of the right amygdala (MNI coordinates of peak voxel: x = 38, y = –10, z = –30, peak-voxel F = 38.17, 534 voxels in cluster) – were higher when emotional images were presented regularly as compared to regularly presented neutral images.
Discussion
The results of the “perceptual stress test” presented here demonstrate that the human brain is able to extract, within 170 ms post-stimulus onset, diagnostic cues from complex naturalistic scenes. Importantly, we show that the extracted information is not limited to basic semantic features, e.g., the presence of living or non-living objects (Grill-Spector, Kushnir, Hendler, & Malach, 2000; Johnson & Olshausen, 2003; Liu, Agam, Madsen, & Kreiman, 2009; Thorpe, Fize, & Marlot, 1996; VanRullen & Thorpe, 2001a), but encompasses motivationally relevant cues such as emotional valence and arousal (Lang & Bradley, 2010). The continuous stream of rapidly changing stimuli is known to produce clear masking effects, thereby making visual information available only for the presentation time of each image (Alonso-Prieto et al., 2013; Bekhtereva & Müller, 2015; Retter & Rossion, 2016), which was still sufficient for the extraction of emotional cues even during continuous presentation over 7 seconds. Of note, our stimulation protocol was different from recent studies that used regular presentations of faces, body parts, or houses within a stream of natural objects (Jacques, Retter, & Rossion, 2016; Retter & Rossion, 2016; Rossion, Torfs, Jacques, & Liu-Shuang, 2015) or expressive faces in a stream of neutral faces (Dzhelyova, Jacques, & Rossion, 2017), in that it conveyed the regularity using the same stimulus type as the fillers, i.e., naturalistic scenes matched with respect to apparent contrast, subjective and objective complexity, and proportion of living/non-living objects (see Supplementary Materials for details). This strategy allowed us to circumvent several caveats associated with the exclusive use of faces to elicit regularity in the visual system. For instance, faces are perceptually simpler (M. S. Keil, 2008; VanRullen, 2006), over-trained (Bukach, Gauthier, & Tarr, 2006; Gauthier & Nelson, 2001; Tanaka, 2001; Tarr & Gauthier, 2000), and more salient – because they typically convey important social signals (Calder & Young, 2005; Frith, 2009; Said, Haxby, & Todorov, 2011) – compared to non-face objects or naturalistic scenes. Even more relevant for the current study, emotional cue extraction occurs earlier for faces than scenes (Bekhtereva, Craddock, & Müller, 2015) and may be subserved by partially segregated neural circuits (Britton, Taylor, Sudheimer, & Liberzon, 2006; Haxby, Hoffman, & Gobbini, 2000). Finally, this study complements existing literature suggesting rapid extraction of emotional information after short stimulus presentation (e.g., Codispoti et al., 2009) by focusing on early visual cortex activity instead of late electrophysiological components, which are the by-product of several perceptual and cognitive processes (Hajcak, MacNamara, & Olvet, 2010).
Pleasant information is prioritized in early visual cortex
Nonparametric and Bayes factor analyses converged to show a robust 2 Hz SSVEP response to the regularly presented neutral, unpleasant, and pleasant pictures, but not their scrambled versions. This result excludes the possibility that our 2 Hz response for original scenes could solely be ascribed to the rapid extraction of low-level features such as spatial frequency, color, or contrast, which were matched between original and scrambled pictures.
Crucially, our paradigm highlighted robust amplitude differences between emotional conditions, with pleasant scenes eliciting larger 2 Hz amplitude compared to unpleasant and neutral pictures. These findings are consistent with a wealth of studies showing privileged processing of emotional visual stimuli (Carretié, 2014; Frijda, 2016; Lang & Bradley, 2010; Pessoa, 2008; Pourtois, Schettino, & Vuilleumier, 2013; Vuilleumier, 2005). Surprisingly, 2 Hz activity was specifically enhanced for pleasant scenes, suggesting preferential attentional capture for intrinsically hedonic stimuli. One plausible post-hoc explanation could be that, during such an effortless task (passive viewing) that would eventually lead to a reward (i.e., monetary compensation for participation), observers would be able to engage their cognitive resources towards a thorough exploration of the RSVP streams and focus on the rewarding information conveyed by pleasant scenes. This positivity offset, as termed within the Evaluative Space Model framework (Cacioppo, Gardner, & Berntson, 1997; Ito, Cacioppo, & Lang, 1998; Norris, Gollan, Berntson, & Cacioppo, 2010), refers to a stronger motivation to approach than avoid unfamiliar but non-threatening contexts. From an evolutionary perspective, this activation function enables organisms to explore novel environments, with the ultimate goal to find additional sources of nourishment and protection as well as occasions for mating and reproduction (Cacioppo & Berntson, 1994; Cacioppo & Gardner, 1999). Research has shown that positivity offset (as well as negativity bias) can be generalized across different kinds of stimuli (Norris, Larsen, Crawford, & Cacioppo, 2011), is temporally stable and trait-like consistent (Ito & Cacioppo, 2005), and can be used as a theoretical construct to interpret serotonergic function in healthy and clinical populations (Ashare, Norris, Wileyto, Cacioppo, & Strasser, 2013; Carver, Johnson, & Joormann, 2009; Gollan et al., 2016).
During our task, positivity offset may have outweighed negativity bias because of the absence of proximal environmental danger which, in turn, promoted exploration (Ito & Cacioppo, 2005; Schettino, Loeys, Bossi, & Pourtois, 2012; Schettino, Loeys, Delplanque, & Pourtois, 2011; Schettino, Loeys, & Pourtois, 2013). Furthermore, participants were probably inclined to thoroughly examine images carrying intrinsically rewarding information also because these pictures likely matched their motivational dispositions (Byrne & Eysenck, 1993; Niedenthal, Halberstadt, & Setterlund, 1997; Niedenthal & Setterlund, 1994; Raila, Scholl, & Gruber, 2015). Broadly speaking, positivity offset processes could underpin attentional biases towards pleasant stimuli frequently observed in healthy people (Pool, Brosch, Delplanque, & Sander, 2016; Sennwald et al., 2016). Intriguingly, positive stimuli seem to have their major impact during the initial stages of visual processing (Pool et al., 2016), when attentional shifts are more likely to be driven by stimulus properties rather than top-down goals (Theeuwes, 1994). Thus, pleasant images might show strongest attentional capture – and, consequently, largest electrophysiological effects – when briefly presented, like in our RSVP streams.
As we have shown in previous work (Bekhtereva & Müller, 2015), the modulation of SSVEPs for emotional as compared to neutral, rapidly presented visual images requires sufficient processing time for emotional cue extraction in the time range of the EPN (Junghöfer et al., 2001; Schupp, Flaisch, Stockburger, & Junghöfer, 2006), thus potentially linking SSVEP modulations and the EPN. Intriguingly, the EPN seems to be modulated by stimulus valence in a similar manner, showing a larger deflection for pleasant as compared to unpleasant stimuli (Schupp, Junghöfer, Weike, & Hamm, 2004). Only recently we could show that the SSVEP amplitude modulation by the emotional content of the image stream differed for various stimulation frequencies and that these differences may be explained by a model that posits the SSVEP as a superposition of ERPs (Bekhtereva, Pritschmann, Keil, & Müller, 2018). It is thus tempting, but highly speculative, to link the modulations of SSVEP found here to known ERP modulations of the EPN by the emotional picture content.
Is the 2 Hz response a consequence of stimulus predictability?
Given the regular presentation of pictures of the same valence category, one could wonder whether the observed 2 Hz SSVEP response is a consequence of predictability rather than perceptual categorization. Indeed, recent studies have shown that temporal expectations could aid visual facilitation of regularly presented items (Breska & Deouell, 2014; Cravo, Haddad, Claessens, & Baldo, 2013), and that perceptual expectations may influence recognition processes at different stages (Carlson, Grol, & Verstraten, 2006; Ploran, Tremel, Nelson, & Wheeler, 2011; Summerfield & de Lange, 2014; Summerfield & Egner, 2009) depending on the emotional connotation of the stimuli (Barrett & Bar, 2009). Temporal expectations might indeed have had an impact on previous studies (Peyk, Schupp, Keil, Elbert, & Junghöfer, 2009) reporting emotional cue extraction up to 12 Hz (e.g., 83 ms per image), because of the presentation of simple and predictable stimulus sequences (e.g., pleasant – neutral – pleasant – neutral – …). Nonetheless, we consider the influence of temporal and perceptual expectations highly unlikely in our paradigm, for several reasons. First, in the regular conditions the pseudorandom stimulus presentation created completely unpredictable sequences (see Results section). Second, the regularity was always elicited by different pictures, which had little in common except emotional valence. Third, participants were not aware of any regularities in the RSVP stream, as post-experiment verbal reports confirmed. Fourth, a recent paper (Quek & Rossion, 2017) provided additional evidence that category-specific responses in RSVP streams are unlikely to be solely influenced by temporal expectations.
Distinct patterns of cortical activation for emotional vs. neutral content
The reported electrophysiological responses to emotional and neutral scenes also differed in the topographical distribution and estimated cortical sources of the 2 Hz SSVEP signal. Neutral images elicited higher activity in right inferotemporal and fusiform gyri, whereas more anterior portions of the right fusiform and temporal gyri as well as hippocampal regions responded more strongly to emotional scenes. Of note, no statistical differences were observed between pleasant and unpleasant scenes, presumably because these stimuli recruit the same neural circuits while modulating their intensity. Overall, these results point towards a general involvement of scene-selective cortical regions in the lateral occipital cortex (Grill-Spector, Kourtzi, & Kanwisher, 2001; Nasr et al., 2011). Emotional scenes seem to elicit enhanced activity in anterior occipitotemporal areas (Bradley et al., 2003; Sabatinelli et al., 2011) and even the amygdala, a subcortical structure classically implicated in emotion detection and recognition (Adolphs, 2002; LeDoux, 2007;Phelps & LeDoux, 2005; Pourtois et al., 2013; Vuilleumier, 2005). However, we prefer not to provide a strong interpretation of these findings, not only because of the exploratory nature of our analyses but, most importantly, due to the characteristics of the electrophysiological signal recorded on the scalp as well as the poor spatial resolution of source localization algorithms without individual co-registration (Belardinelli, Ortiz, Barnes, Noppeney, & Preissl, 2012; Grech et al., 2008; López, Litvak, Espinosa, Friston, & Barnes, 2014; López, Penny, Espinosa, & Barnes, 2012). Future studies using imaging techniques suited to precisely pinpoint activity in specific brain areas could provide more compelling evidence of the recruitment of deep brain structures in this experimental paradigm.
Conclusions
The present study significantly advances our knowledge of complex scene processing by showing that the human brain can rapidly extract motivationally relevant information within a perceptually challenging temporal succession of pictures. Despite the limited visibility of each individual scene due to forward and backward masking, emotional cues could quickly be extracted within the first 167 ms post-stimulus onset. Greater neural facilitation for pleasant compared to neutral and unpleasant images may be due to a positivity offset that could dominate under such rapid categorization demands in non-threatening contexts. Whether a presentation time of 167 ms is the lower limit for semantic categorization of complex naturalistic scenes is a subject for future studies but, on the basis of current models of object processing (Fabre-Thorpe, Delorme, Marlot, & Thorpe, 2001; Grill-Spector et al., 2000; Johnson & Olshausen, 2003; VanRullen & Thorpe, 2001b) as well as other studies employing a similar paradigm (Alonso-Prieto et al., 2013; Bekhtereva & Müller, 2015), presentation frequencies around 6 Hz seem to be the upper bound for efficient semantic categorization.
Materials and Methods
Participants
Twenty-six Caucasian individuals (13M/13F, 25 right-handed; median age 24 years, range 19–38) were recruited from the student population of the University of Leipzig and among the general public. An equal number of male and female participants was planned to account for possible gender-specific differences in emotional reactions and evaluations (Lithari et al., 2010; Proverbio, Adorni, Zani, & Trestianu, 2009; Sabatinelli, Flaisch, Bradley, Fitzsimmons, & Lang, 2004). All volunteers were German speaking, had normal or corrected-to-normal vision, and reported no history of neurological or psychiatric disorders. This sample size was chosen based on available time and economic resources (no a priori statistical power analysis was conducted).
The experimental protocol was approved by the ethics committee of the University of Leipzig (ethical approval #415/17-ek). The study was conducted in accordance with the guidelines of the ethics committee of the University of Leipzig and the Code of Ethics of the World Medical Association. All volunteers gave written informed consent prior to participation. At the end of the experiment, they were fully debriefed and received 12 € (6 € per hour).
Stimuli
One-hundred and fifty pictures were selected from the IAPS (Lang, Bradley, & Cuthbert, 2008) and EmoPics (Wessa et al., 2010) databases, equally divided into three emotion categories – neutral, unpleasant, and pleasant – according to their normative valence and arousal ratings. These stimuli were resized to 419 × 314 pixels (to minimize eye movements) and comparable with respect to a number of low-level visual properties (for a complete description, see the Supplementary Materials). A separate set of scrambled images was additionally created: each original picture was modified by applying a spatial discrete Fourier transform to each RGB-color channel, replacing the phase spectrum with random values, and reconstructing the image by applying an inverse Fourier transform. This procedure disrupts picture content while keeping color, contrast, and spatial frequency content intact (Hindi Attar, Andersen, & Müller, 2010; Hindi Attar & Müller, 2012; M. M. Müller, Andersen, & Hindi Attar, 2011; Schettino, Keil, Porcu, & Müller, 2016). In sum, a total of 300 pictures were used, 50 for each stimulus type: (i) original neutral; (ii) original pleasant; (iii) original unpleasant; (iv) scrambled neutral; (v) scrambled pleasant; (vi) scrambled unpleasant. An example of each stimulus type is provided in Figure 1.
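For illustration, the phase-scrambling procedure can be sketched in R as follows. This is one common implementation (imposing the phase spectrum of a white-noise image, which yields random phases with the conjugate symmetry required for a real-valued result); the `png` package, the file names, and the function name are assumptions for illustration, and the original stimuli were not generated with this script.

```r
library(png)  # assumption: images are available as PNG files; readPNG() returns values in [0, 1]

phase_scramble <- function(img) {
  # assumes a height x width x 3 RGB array;
  # the random phase is taken from the FFT of a white-noise image so that the
  # scrambled spectrum stays conjugate-symmetric and the inverse FFT is real
  noise <- array(runif(prod(dim(img)[1:2])), dim = dim(img)[1:2])
  rand_phase <- Arg(fft(noise))
  out <- img
  for (ch in seq_len(dim(img)[3])) {                 # loop over RGB channels
    spec <- fft(img[, , ch])                          # 2-D FFT of one channel
    scrambled <- matrix(complex(modulus = Mod(spec),  # keep amplitude spectrum
                                argument = rand_phase),
                        nrow = nrow(spec))
    out[, , ch] <- Re(fft(scrambled, inverse = TRUE)) / length(spec)
  }
  (out - min(out)) / (max(out) - min(out))            # rescale into [0, 1] for saving
}

# usage (hypothetical file names):
# img <- readPNG("original_scene.png")
# writePNG(phase_scramble(img), "scrambled_scene.png")
```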
Procedure
Upon arrival at the laboratory, participants signed the informed consent, had EEG sensors placed on the scalp, and were seated in a dimly lit Faraday cage at approximately 80 cm from a 19” CRT monitor (Samsung Samtron 98PDF (L) L, 1024 × 768 pixels screen resolution, 16-bit color, 60 Hz refresh rate) connected to a PC running Matlab v7.5.0 (The Mathworks, Inc, Natick, MA) and the Cogent toolbox (v1.32; http://www.vislab.ucl.ac.uk/cogent.php).
The experimenter ensured that the task was understood by providing verbal and written instructions. After a practice session with 14 trial pictures (not included in the experimental picture set), the main experiment started. On each trial, a central white fixation cross (1° × 1° of visual angle) was presented on a gray background for 250 ms. The stimuli (10.5° × 7.9°) were subsequently presented in the center of the screen – time-locked to the refresh rate of the monitor – as an RSVP stream flickering at 6 Hz (i.e., ~167 ms on-presentation time per image). Participants were asked to simply focus on the content of each picture. In the main experimental conditions, a neutral, pleasant, or unpleasant picture (regular neutral, regular pleasant, and regular unpleasant conditions, respectively) was regularly presented once every 3 stimuli during the stream, i.e., at 2 Hz. The second and third images of each triplet were randomly drawn from the two remaining emotional content categories, so that each category was presented exactly once within each triplet. Please note that, to create such regularity, we never used the same picture more than once within a stream: every 2 Hz cycle contained a different scene, and the only common feature of all images within the other cycles was the emotional content. To verify that the hypothesized enhancement of the electrophysiological signal was not due to the influence of low-level features (e.g., color, contrast, or spatial frequency content), the same regular presentation scheme was employed for scrambled images (scrambled regular neutral, scrambled regular pleasant, and scrambled regular unpleasant conditions). In the last condition (irregular), the original pictures were presented randomly, i.e., no regularity at 2 Hz was imposed (see Figure 1 for examples). Each trial lasted 7,000 ms, with 42 pictures shown at the driving frequency of 6 Hz and 14 presentations at 2 Hz in the regular conditions. During the inter-trial interval, randomly varying between 1,500 and 2,000 ms, a white ‘X’ signaled that participants could blink. The experiment consisted of 490 trials subdivided into 14 blocks (i.e., 35 trials per block), with 70 trials per condition.
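As an illustration of these sequence constraints, the following R sketch (hypothetical function and category labels; the original experiment was programmed in Matlab/Cogent) generates the category order for one regular trial: 14 triplets of 6 Hz cycles, with the regular category at every third position (here placed first in each triplet; the exact phase within the triplet is irrelevant for the 2 Hz tag) and the two fillers shuffled within each triplet.

```r
# Sketch of the trial-sequence logic (illustration only; not the original presentation code).
# In the actual experiment, a different picture exemplar was drawn for every position;
# only the category labels are shown here.

make_trial_sequence <- function(regular = "pleasant",
                                categories = c("neutral", "pleasant", "unpleasant"),
                                n_triplets = 14) {
  fillers <- setdiff(categories, regular)
  triplets <- replicate(n_triplets,
                        c(regular, sample(fillers)),   # regular category, then shuffled fillers
                        simplify = FALSE)
  unlist(triplets)                                     # 42 category labels = one 7-s trial
}

seq_example <- make_trial_sequence("pleasant")
head(seq_example, 9)      # first three triplets; filler order varies from triplet to triplet
table(seq_example)        # 14 presentations of each category per trial
```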
To ensure that our pre-selected pictures were processed by participants in accordance with our categorization, valence and arousal ratings were collected at the end of the main task via the Self-Assessment-Manikin (Bradley & Lang, 1994), ranging from 1 (low arousal – unpleasant valence) to 9 (high arousal – pleasant valence) (see results in the Supplementary Materials). Responses were given on the numeric pad of a standard QWERTZ keyboard connected via USB.
EEG recording and pre-processing
Electroencephalographic (EEG) activity was recorded with an ActiveTwo amplifier (BioSemi, Inc., The Netherlands) at a sampling rate of 512 Hz. Sixty-four Ag/AgCl electrodes were fitted into an elastic cap, following the international 10/10 system (Oostenveld & Praamstra, 2001), except electrodes T7 and T8, which were moved to positions I1 and I2 to increase spatial resolution at occipital sites. The common mode sense (CMS) active electrode and the driven right leg (DRL) passive electrode were used as reference and ground electrodes, respectively. Horizontal and vertical electrooculograms (EOG) were monitored using four facial bipolar electrodes placed on the outer canthi of each eye and in the inferior and superior areas of the left orbit.
Data preprocessing was performed offline with custom MATLAB scripts and functions included in EEGLAB v14.0.0b (Delorme & Makeig, 2004), FASTER v1.2.3b (Nolan, Whelan, & Reilly, 2010), and SPM12 (http://www.fil.ion.ucl.ac.uk/spm/software/spm12/) toolboxes. After subtracting the mean value of the waveform (DC offset), the continuous EEG data were epoched between 0 and 7,000 ms, corresponding to the beginning and end of the picture streams, respectively. After referencing to Cz, FASTER functions were used to label data exceeding a z-score of ±3 standard deviations as contaminated by artifacts (for details, see our script at https://osf.io/vb8a4/). Noisy channels were interpolated via a spherical spline procedure (Perrin, Pernier, Bertrand, & Echallier, 1989). Epochs containing artifacts and/or more than 12 interpolated channels were discarded. After preprocessing, the average number of interpolated channels was 3.12 (SD = 1.53, range 0–7) and the mean percentage of rejected epochs was 5.81% (SD = 3.81, range 1.02–15.51). After re-referencing to the average amplitude of all scalp electrodes, 7 grand-averages were computed: (i) irregular (number of averaged trials: M = 65.54, SD = 2.77); (ii) regular neutral (M = 64.85, SD = 2.77); (iii) regular pleasant (M = 65.69, SD = 2.38); (iv) regular unpleasant (M = 65.92, SD = 2.68); (v) scrambled regular neutral (M = 65.08, SD = 3.07); (vi) scrambled regular pleasant (M = 65.65, SD = 2.48); (vii) scrambled regular unpleasant (M = 66.23, SD = 2.52).
Confirmatory analysis: Amplitude of SSVEP signal
Amplitude at 6 Hz was analyzed only to confirm that the main stimulation frequency elicited a robust entrained signal in the brain. The results can be found in the Supplementary Materials.
With respect to 2 Hz, electrodes with maximum SSVEP amplitudes were identified by calculating isocontour voltage maps based on grand-averaged data collapsed across all conditions. As shown in Figure 2, activity was mainly localized at right occipito-temporal channels (e.g., PO10, PO8, P8). To account for inter-individual variations in topographical SSVEP amplitude distributions, we identified and averaged activity from the four electrodes displaying, for each participant, the largest frequency-specific amplitude. After removing linear trends, we extracted SSVEP amplitude at 2 Hz from each individual electrode cluster, separately for each condition (averaged across trials). A Fast Fourier Transform was applied to the EEG signal in a time window from 500 ms (to exclude the typically strong phasic visual evoked response to picture onset) to 7,000 ms after stimulus onset, and amplitudes were obtained by extracting the absolute values of the resulting complex Fourier coefficients.
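A minimal R sketch of this amplitude-extraction step, using simulated single-channel data rather than the actual trial-averaged electrode-cluster signals, could look as follows.

```r
# Sketch of the 2 Hz / 6 Hz amplitude extraction (simulated data; not the original pipeline).
fs <- 512                                  # sampling rate in Hz
t  <- seq(0, 7, by = 1/fs)[-1]             # 7-s epoch
x  <- 0.5 * sin(2 * pi * 6 * t) +          # 6 Hz steady-state response
      0.2 * sin(2 * pi * 2 * t) +          # embedded 2 Hz response
      rnorm(length(t), sd = 0.3)           # noise

win <- x[t >= 0.5]                         # discard the first 500 ms (onset response)
win <- residuals(lm(win ~ seq_along(win))) # remove the linear trend
n   <- length(win)

amps  <- Mod(fft(win))                     # absolute values of the complex Fourier coefficients
freqs <- (seq_len(n) - 1) * fs / n         # frequency axis of the FFT bins

amp_2hz <- amps[which.min(abs(freqs - 2))] # amplitude at the bin nearest to 2 Hz
amp_6hz <- amps[which.min(abs(freqs - 6))] # amplitude at the bin nearest to 6 Hz
c(amp_2hz = amp_2hz, amp_6hz = amp_6hz)
```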
Given that the data in some conditions violated the assumption of normality (verified via Shapiro-Wilk and Anderson-Darling tests), we compared SSVEP amplitudes across conditions using robust nonparametric methods (Field et al., 2012; Field & Wilcox, 2017). When comparing more than two conditions, we employed repeated measures ANOVAs on 20% trimmed means (rbANOVARM), accompanied by an explanatory measure of effect size ξ and its respective 95% bootstrapped confidence intervals (5,000 samples). One-sample and paired-sample comparisons were conducted via Wilcoxon signed-rank tests, complemented by bootstrapped Hedges’ g (and respective 95% confidence intervals) as a measure of effect size (Fritz, Morris, & Richler, 2012; Lakens, 2013). The significance level for all tests was set at p = .05, corrected for multiple comparisons via the Bonferroni-Holm procedure (Holm, 1979).
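The following R sketch illustrates these frequentist analyses on a simulated long-format data set (hypothetical data frame `dat`; the actual analysis scripts are available on the OSF repository, and the bootstrapped Hedges’ g step via bootES is omitted here for brevity).

```r
# Hypothetical long-format data: 26 participants x 3 conditions (simulated for illustration).
set.seed(1)
dat <- expand.grid(participant = factor(1:26),
                   condition   = factor(c("neutral", "pleasant", "unpleasant")))
dat$amp <- 0.3 + 0.1 * (dat$condition == "pleasant") + rnorm(nrow(dat), sd = 0.05)

library(WRS2)
# robust one-way repeated measures ANOVA on 20% trimmed means
rmanova(y = dat$amp, groups = dat$condition, blocks = dat$participant, tr = 0.2)

# pairwise two-tailed Wilcoxon signed-rank tests, Bonferroni-Holm corrected
# (split() keeps the participant order within each condition because expand.grid
#  varies participant fastest, so the paired values align)
amp <- split(dat$amp, dat$condition)
p_raw <- c(
  neutral_vs_pleasant    = wilcox.test(amp$neutral,    amp$pleasant,   paired = TRUE)$p.value,
  neutral_vs_unpleasant  = wilcox.test(amp$neutral,    amp$unpleasant, paired = TRUE)$p.value,
  unpleasant_vs_pleasant = wilcox.test(amp$unpleasant, amp$pleasant,   paired = TRUE)$p.value
)
p.adjust(p_raw, method = "holm")
```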
Given the problems inherent in accepting the null hypothesis with classical frequentist procedures (Kruschke, 2010; Morey, Romeijn, & Rouder, 2016; Rouder, Morey, Verhagen, Province, & Wagenmakers, 2016; Wagenmakers, 2007), we additionally calculated Bayes Factors (Jeffreys, 1961; Kass & Raftery, 1995). With Bayesian ANOVAs, Bayes Factors (BF10) were estimated – using Markov-chain Monte Carlo sampling (100,000 iterations) – to quantify the evidence in favor of each model of interest relative to the null model (Rouder, Engelhardt, et al., 2016; Rouder et al., 2012, 2017). With respect to Bayesian t-tests, BF10 were calculated to estimate the degree of evidence in favor of a model assuming differences between two specified conditions relative to a model assuming no differences (Rouder et al., 2009). For all analyses, the null hypothesis was specified as a point-null prior placed on standardized effect size (i.e., δ = 0) whereas, for the alternative hypothesis, Jeffreys-Zellner-Siow (JZS) priors were used, i.e., a folded Cauchy distribution centered around δ = 0 with various scaling factors (r = 1, r = 0.707, r = 0.5). This sensitivity analysis is useful to verify the robustness of the results regardless of changes in the prior (Schönbrodt, Wagenmakers, Zehetleitner, & Perugini, 2017). The BF10 obtained with the JZS prior Cauchy(0, 0.707) is reported in the main text, but the reader is referred to the corresponding tables for the other prior widths.
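Continuing the simulated `dat` from the sketch above, the following lines illustrate how JZS Bayes factors across the three prior scales could be obtained with the BayesFactor package (a sketch, not the original script).

```r
library(BayesFactor)

# paired difference scores for one exemplary contrast (pleasant vs. neutral)
amp_by_cond  <- split(dat$amp, dat$condition)
diff_ple_neu <- amp_by_cond$pleasant - amp_by_cond$neutral

# JZS Bayes factors for this contrast under the three prior scaling factors
# (one-sided tests, e.g., SNR > 1, could additionally use the nullInterval argument of ttestBF)
sapply(c(1, 0.707, 0.5), function(r)
  extractBF(ttestBF(x = diff_ple_neu, mu = 0, rscale = r))$bf)

# Bayesian repeated measures ANOVA: model with the main effect of emotion
# (participant as random factor) relative to the null model
anovaBF(amp ~ condition + participant, data = dat,
        whichRandom = "participant", rscaleFixed = 0.707)
```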
Exploratory analysis: Topographic distribution of SSVEP signal
To quantify the observed differences in the topographic distribution of SSVEP amplitude values between conditions of interest, we used the Global Map Dissimilarity index (GMD). Mean topographic distributions for each condition were first normalized by the Global Field Power (GFP), a time-dependent measure of the electric field strength that reflects the amount of synchronized activity across all electrodes (Michel & Murray, 2012). The dissimilarity between any two normalized topographies was then calculated as the square root of the mean squared amplitude difference between these two topographies across channels (Lehmann & Skrandies, 1984; M. M. Murray et al., 2008). The resulting GMD – ranging from 0 (topographic homogeneity) to 2 (maximal topographic inversion) – was then statistically evaluated with a permutation approach. Specifically, topographies across the two compared conditions were treated as if they belonged to the same distribution, and condition labels were randomly reassigned to the topographies to generate a distribution of chance GMDs. By comparing the actual GMD with this permutation-based chance distribution, a Monte Carlo p-value was calculated, representing the likelihood of obtaining a GMD as large as the one observed purely by chance. For a p-value < .05, one would conclude that the topographies of the two conditions belong to two different distributions rather than one, i.e., that they are significantly different from each other.
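A minimal R sketch of the GMD and its permutation test under these definitions is given below (the actual analysis was carried out in Matlab; variable names and the per-participant label-swapping scheme are illustrative assumptions).

```r
# Sketch of the Global Map Dissimilarity (GMD) analysis (illustration only).
# `topo_a` and `topo_b`: participants x electrodes matrices of 2 Hz amplitudes
# for the two conditions being compared.

gfp <- function(v) sd(v)                         # global field power of one map
gmd <- function(u, v) {
  u <- (u - mean(u)) / gfp(u)                    # average-reference and GFP-normalize each map
  v <- (v - mean(v)) / gfp(v)
  sqrt(mean((u - v)^2))                          # root-mean-square map difference, range 0-2
}

gmd_perm_test <- function(topo_a, topo_b, n_perm = 5000) {
  observed <- gmd(colMeans(topo_a), colMeans(topo_b))
  null_gmd <- replicate(n_perm, {
    swap <- runif(nrow(topo_a)) < 0.5            # randomly swap condition labels per participant
    perm_a <- topo_a; perm_b <- topo_b
    perm_a[swap, ] <- topo_b[swap, ]
    perm_b[swap, ] <- topo_a[swap, ]
    gmd(colMeans(perm_a), colMeans(perm_b))
  })
  list(gmd = observed,
       p   = mean(null_gmd >= observed))         # Monte Carlo p-value
}
```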
Exploratory analysis: Source localization of SSVEP signal
Sources of SSVEPs were modelled using source reconstruction algorithms implemented in SPM12. A standardized forward model was constructed using a template cortical mesh with 8,196 vertices co-registered to standard EEG positions. The lead field for the forward model was computed using the three-shell BEM EEG head model implemented in SPM12. The preprocessed EEG data were filtered with a 1.5–2.5 Hz Hamming windowed-sinc FIR filter (filter order 4096), and trials were averaged for each condition. Sources were estimated using multiple sparse priors (Friston et al., 2008). For each participant and condition, three-dimensional source power estimates were extracted. These source power images could then be statistically analyzed on a group level using conventional SPM t-tests and regression statistics. Statistical parametric maps were thresholded at p < .001 on a voxel level and corrected on the cluster level using random field theory (pcluster < .05, whole-brain FWE corrected).
To estimate potential source power differences for the processing of images of different emotional quality, the source power images for the 2 Hz signal of the regularly presented original images were tested with a one-way repeated measures ANOVA (ANOVARM) with factor emotion (neutral, pleasant, unpleasant). The model comprised only regular conditions because 2 Hz signals for these conditions were different from noise (see Results section). For an estimation of the effects, the averaged beta coefficients of the effects of interest from the ANOVARM model were extracted for each significant cluster with the help of the MarsBaR toolbox (Brett, Anton, Valabregue, & Poline, 2002). To relate results to cytoarchitectonic references, the SPM anatomy toolbox (Eickhoff et al., 2005) was used whenever possible and images for publication were created using MRIcron (Rorden & Brett, 2000).
Software
Data visualization and statistical analyses were performed using R v3.4.3 (R Core Team, 2017) via RStudio v1.1.419 (RStudio Team, 2015). We used the following packages (and their respective dependencies):
data manipulation: tidyverse v1.2.1 (Wickham, 2017b), magrittr v1.5 (Bache & Wickham, 2014), broom v0.4.3 (Robinson, 2017);
statistical analyses: Rmisc v1.5 (Hope, 2013), nortest v1.0-4 (Gross & Ligges, 2015), WRS2 v0.9-2 (Mair, Schönbrodt, & Wilcox, 2017), bootES v1.2 (Gerlanc & Kirby, 2015), BayesFactor v0.9.12-2 (Morey & Rouder, 2015), userfriendlyscience v0.7 (Peters, 2017);
visualization: ggplot2 v2.2.1 (Wickham, 2009), ggthemes v3.4 (Arnold, 2017), akima v0.6-2 (Akima & Gebhardt, 2016), scales v0.5 (Wickham, 2017a), mgcv v1.8-23 (Wood, 2017), gridExtra v2.3 (Auguie, 2017), yarrr v0.1.5 (Phillips, 2017);
report generation: pacman v0.4.6 (Rinker & Kurkiewicz, n.d.), knitr v1.19 (Xie, 2018), here v0.1 (K. Müller, 2017), kableExtra v0.7 (Zhu, 2018).
Amplitude spectra and EEG topographies were created by adapting the scripts by Dr. Matt Craddock (https://github.com/craddm/eegUtils). Topographic and source estimation analyses were carried out in Matlab v7.5.0 (The Mathworks, Inc, Natick, MA).
Data Accessibility Statement
Raw and pre-processed data, materials, and analysis scripts are available on https://osf.io/9dcsm/.
Acknowledgments
We would like to thank Renate Zahn for her help with data collection.
Funding Information
This work is supported by grants from Deutsche Forschungsgemeinschaft (MU972/22-1, MU972/22-1/2) and Ghent University (BOF14/PDO/123). The funding sources had no involvement in the study design; collection, analysis, and interpretation of data; writing of the report; and decision to submit the article for publication.
Competing Interests
The authors have no competing interests to declare.
Author Contributions
AS, CG, and MMM conceived the study. AS and CG programmed the experimental paradigm. AS and CG analyzed the data. AS, CG, and MMM contributed reagents/materials/tools. AS wrote the main manuscript text. AS, CG, and MMM reviewed and critically revised the manuscript.
Peer Review Comments
The author(s) of this paper chose the Open Review option, and the peer review comments are available at: http://doi.org/10.1525/collabra.226.pr