Observations of behavior are of central importance in social-personality psychology. A major downside of such observations, however, is that they are highly costly in terms of logistics, time, financial resources, and effort. The current research tested whether accurate behavioral observations could be obtained far more economically by analyzing behavior during online video calls rather than in the laboratory. To this end, it examined whether classic laboratory findings on accurate personality impressions and on the link between personality and behavior could be replicated when behavior is observed during online video calls. Participants first completed an online survey that assessed various personality traits (Big Five, narcissism, self-esteem, motives) and intelligence. They were then interviewed online, and these interviews were recorded and coded by external observers. Overall, the pattern of results and the effect sizes closely matched previous findings from offline laboratory settings. Most personality traits could be accurately detected by observers (mean r = .29) and correlated with the predicted behaviors (mean r = .22). Thus, behavior observations based on recordings of online video calls can yield valid results, indicating that online environments offer a suitable assessment context for behavior measurement.

In everyday life, people routinely try to infer others’ personality from their behavior. For example, we cannot help but spontaneously judge whether someone is nice, trustworthy, ruthless, or malicious. Such personality impressions form very quickly and are highly consequential for decisions in the social world, such as whom to cooperate with or befriend, whom to fear, or whom to avoid altogether (Todorov, 2017). A large body of evidence indicates that even at minimal acquaintance, people are surprisingly good at estimating others’ personality (e.g., Back & Nestler, 2016; Borkenau et al., 2004; Connelly & Ones, 2010; Connolly et al., 2007; Letzring & Spain, 2021; Vazire & Carlson, 2010; Wu et al., 2023).

The Accuracy of Personality Judgments

In accuracy research, observer-ratings are compared with some kind of accuracy criterion. When the accuracy of personality impressions is studied, the criterion most often used is the target persons’ personality self-reports (“self-other agreement”; Funder, 2012), and this was also the case in the current research. On the observers’ side, two approaches can be distinguished: aggregate-observer accuracy (also called “group” or “pooled” accuracy) and single-observer accuracy (e.g., Back et al., 2010; Bernieri, 2001; Carney et al., 2007). Aggregate-observer accuracy represents the level of accuracy achieved by a group of raters (in which the idiosyncrasies of single raters have been averaged out; Block, 1961), whereas single-observer accuracy reflects how accurate a single, “typical” observer would be (Naumann et al., 2009). An advantage of the aggregate-observer approach is that, owing to the aggregation principle (Epstein, 1983), the ratings are more reliable, which is why we used this approach in the current research.
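The difference between the two accuracy approaches can be illustrated with a short Python sketch. It assumes a hypothetical observers × targets matrix of ratings for one trait and one self-report score per target. Aggregate-observer accuracy correlates the per-target mean rating with the self-reports; single-observer accuracy is computed here, as one common option, by correlating each observer’s ratings with the self-reports and averaging these correlations via Fisher’s z. The function names and toy data are illustrative and are not taken from the study’s analysis scripts.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def aggregate_accuracy(ratings, self_reports):
    """Aggregate-observer accuracy: correlate the per-target mean rating
    (averaged across observers) with the targets' self-reports.

    ratings: one list per observer, each holding one rating per target.
    """
    pooled = [mean(target_ratings) for target_ratings in zip(*ratings)]
    return pearson_r(pooled, self_reports)


def single_observer_accuracy(ratings, self_reports):
    """Single-observer accuracy: correlate each observer's ratings with the
    self-reports, then average the correlations via Fisher's r-to-z."""
    zs = [math.atanh(pearson_r(obs, self_reports)) for obs in ratings]
    return math.tanh(mean(zs))


# Hypothetical toy data: 3 observers rating 5 targets on one trait.
self_reports = [1, 2, 3, 4, 5]
ratings = [
    [2, 1, 3, 5, 4],
    [1, 3, 2, 4, 5],
    [2, 2, 4, 3, 5],
]
print(f"aggregate: {aggregate_accuracy(ratings, self_reports):.2f}")
print(f"single:    {single_observer_accuracy(ratings, self_reports):.2f}")
```

With these toy data, the pooled ratings correlate more strongly with the self-reports than a typical single observer does, which reflects the reliability gain from aggregation; real accuracy correlations are, of course, far smaller.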

How can even strangers infer personality accurately? This process is described by the Realistic Accuracy Model (RAM; Funder, 1995), which builds on Brunswik’s (1956) lens model. Both models propose that personality aspects must manifest in observable behaviors (i.e., cues), which must then be detected and utilized by observers to derive personality judgments. The RAM specifies further conditions that must be met for accurate personality judgments to occur: the situational context must be relevant to the trait being judged, trait-related behavioral cues must be available to the observers, and the observers must detect and correctly interpret these cues. If all conditions are met, the overall degree of accuracy is thought to depend on how easy a target person is to judge (“good target”), how visible an attribute is (“good trait”), how much and what kind of information is provided (“good information”), and how skilled observers are at rating others (“good judge”; Funder, 2012).

Laboratory studies have identified several traits that can be accurately detected by strangers (e.g., Carney et al., 2007; Letzring et al., 2021; Letzring & Spain, 2021; Vazire & Carlson, 2010), including the Big Five (i.e., extraversion, conscientiousness, openness, emotional stability, and agreeableness; Connelly & Ones, 2010; Connolly et al., 2007), self-esteem (Hirschmüller et al., 2018; Naumann et al., 2009), and motive dispositions (i.e., the achievement, affiliation, and power motives; Bassler et al., 2023; Dufner & Krause, 2023). However, researchers suggest that some traits are easier to detect than others (for an overview, see Connelly & Ones, 2010). Extraversion is considered a “good trait” because of its high visibility. Highly visible traits involve tendencies that are expressed externally (mainly social behavior), which makes them easier to perceive (Zillig et al., 2002). In contrast, traits low in visibility, such as emotional stability or openness to experience, involve more internal tendencies (e.g., thoughts or affective states), so that additional information from the target is required for them to be detectable by others. Therefore, emotional stability is often challenging for unacquainted observers to judge accurately (Hirschmüller et al., 2015). Furthermore, highly evaluative traits, such as agreeableness (John & Robins, 1993), are often hard to detect because targets may employ various self-presentation strategies to create a positive impression of themselves.

Personality and Behavior

Another line of research has focused specifically on the association between personality and social behavior. Two traits whose behavioral correlates have been well studied are intelligence and narcissism. Studies have linked intelligence to paralinguistic cues such as fluent speaking, a pleasant speech style, and a faster speech rate (Borkenau & Liebler, 1993, 1995; Breil et al., 2021; Murphy et al., 2003; Reynolds & Gifford, 2001). Results on the behavioral correlates of grandiose narcissism indicate that admiration, which represents the assertive self-enhancement aspect of grandiose narcissism, goes along with agentic behavior (e.g., self-assuredness, activity level), whereas rivalry, which represents its antagonistic self-protection aspect, goes along with attenuated communal behavior (e.g., less warmth and friendliness; Back et al., 2013).

Challenges of Conventional Research Designs Studying Personality Impressions or the Link between Personality and Behavior

In order for personality impressions to be gathered or for behavior to be coded, participants typically have to attend laboratory sessions, where they perform tasks or interact socially (Borkenau & Liebler, 1995). Typically, participants’ behavior is videotaped, and the recordings are then judged by unacquainted observers (Bakeman & Quera, 2012; Borkenau & Liebler, 1995).

A major disadvantage of this approach, however, is that it is rather inefficient, time-consuming, and costly (Furr & Funder, 2007). Whereas online studies are widespread in many research domains (Gosling & Mason, 2015), studies on personality impressions and on the relations between personality and behavior still require a physical laboratory that is equipped with video cameras and available for at least several weeks. Furthermore, visiting a laboratory is burdensome for research participants, which makes it hard to gather sufficiently large samples and can lead to biased samples. After all, laboratory assessments are limited in reach and inclusion (Perry et al., 2021); visiting a laboratory can, for example, be difficult for persons with impaired mobility.

Online Settings as an Alternative to Laboratory Research

Over the last few years, digitalization, that is, the use of digital technology for data collection, generation, and analysis to create value and enable innovation (Cappa et al., 2021; Goduscheit & Faullant, 2018), has progressed rapidly. At least since the COVID-19 pandemic, online formats for text chatting, talking, or meeting with other people have been finding their way into private life as well as the professional world (Nguyen et al., 2020). Digitalization also affects almost every scientific discipline, including psychological research. Digital solutions exist for data collection (e.g., online surveys, real-time experience sampling, assessment of behavioral parameters in virtual interaction games), for data analysis (algorithms), and for more interactive applications involving human-machine communication (Ostermann et al., 2021). Digitalization adds to the toolbox of psychological research and opens up novel avenues for testing theories and concepts (Kende, 2014), including those referring to personality judgments and behavior observations (Letzring et al., 2021). Moreover, it offers great potential for interdisciplinary collaboration and points the way toward a more inclusive, reliable, and generalizable psychological science (Moshontz et al., 2018). Given the growing acceptance and ease of use of online formats, a potential solution to the problems of conventional laboratory studies could be to study personality impressions and behavior based on online interactions.

Some initial research has already begun to explore the potential of online behavioral observations. Wang and Chanel (2021) provided first evidence that it might be possible to infer target persons’ warmth and competence with some degree of accuracy after seeing them interact in a video call. In a relationship study, Perry et al. (2021) observed couples’ behavior during a video call and found that the behavioral reactions observed online were very similar to those observed in prior laboratory studies. Diana et al. (2023) found that mimicry of multiple behaviors and perceptions of trustworthiness did not differ systematically across face-to-face, video call, and pre-recorded video interactions. Finally, in a study by Tissera et al. (2023), participants interacted either via video call or face-to-face, and their meta-perceptions (i.e., estimates of how they were seen by their interaction partners) were then assessed. The accuracy of these meta-perceptions was similar across the two formats.

Taken together, these studies suggest that observing behavior during video call interactions might yield results similar to observing behavior during face-to-face interactions, with regard to both personality impressions and links between personality and behavior. However, no research has thus far tested systematically and comprehensively whether classic results from the literature on the accuracy of personality impressions and on the links between personality and behavior also emerge when assessments are based on online interactions. The current research aimed to fill this gap. If the pattern of results proves comparable to that of offline investigations, the use of online interactions might become standard practice in personality impression research.

Aims of the Study

Personality judgments and behavior observations are important to study for both theoretical and practical reasons (Funder & West, 1993). With regard to theory, none of the hypotheses discussed above concerning the processes linking personality and social evaluations, or the role of moderating factors, can be tested without assessments of behavior and observer evaluations. This also includes theoretical claims concerning the interplay between the person and the situation in the generation of social behavior (Furr & Funder, 2018). With regard to practice, personality judgments based on behavior observations play an important role in settings such as personnel selection or relationship initiation, which increasingly also take place online.

Research involving personality assessments on the one hand and personality impressions or behavior observations on the other is flourishing. Yet, a major obstacle is the reliance on laboratory designs. An online approach that allows for economical and efficient assessment of personality impressions and behavior observations would therefore be highly beneficial. The aim of the current research was to test whether such an approach is viable. To be clear, the aim was neither to shed new light on theoretical questions concerning accuracy or the link between personality and behavior nor to introduce a new method for assessing personality. Instead, we aimed to examine whether valid behavior observations can be made in the context of online video calls.

Specifically, the study tested whether results from prominent offline studies on the accuracy of personality impressions and on the link between personality and behavior can be replicated in an online video call format. Video recordings of online calls are highly similar to recordings of typical laboratory interactions. In laboratory research, participants are often filmed from the front with at least the face and upper body visible, so that facial expressions and gestures can be coded well. Often, the experimenter is seen from the back or not at all. The camera angle is therefore very similar to the view in online video calls. Furthermore, both laboratory and online interactions typically occur synchronously in real time, enabling the largely automatic, moment-to-moment tuning of emotional signals (Diana et al., 2023). It therefore seemed likely that results based on observations during online calls would be similar to those reported for laboratory studies.

As findings to be replicated, we selected relatively robust results on either the accuracy of personality impressions or the relation of a given trait with a specific behavioral outcome. With regard to the accuracy of personality judgments, we focused on the following traits, all of which can be inferred with above-chance accuracy in newly met others in laboratory settings: the Big Five (extraversion, neuroticism, conscientiousness, openness, agreeableness; Connelly & Ones, 2010; Connolly et al., 2007), self-esteem (Hirschmüller et al., 2018; Naumann et al., 2009), motive dispositions (Bassler et al., 2023), intelligence (Murphy et al., 2003; Reynolds & Gifford, 2001), and narcissism (Giacomin & Jordan, 2019; Lukowitsky & Pincus, 2013). With regard to the links between personality and observed behavior, we examined cues that had been linked to the respective trait in previous offline studies with designs very similar to ours. We hypothesized that intelligence (general and verbal) would be related to acoustic cues such as way of speaking (i.e., clearer, more fluent, less hectic; Borkenau & Liebler, 1995); that narcissistic admiration would be positively related to agentic behavior (Back et al., 2013); that narcissistic rivalry would be negatively related to communal behavior (Back et al., 2013); and that vulnerable narcissism, which is characterized by feelings of self-importance and entitlement paired with feelings of insecurity and inferiority (Miller et al., 2017), would be positively related to nervous behavior.

A second aim of the current study was to develop a paradigm that could be used in future research on personality perception. Consistently using a specific interactive paradigm helps ensure the comparability of research findings, which is crucial for a cumulative science (Allen et al., 2017). Some research fields already have such standard paradigms, for example, the Trier Social Stress Test (TSST; Kirschbaum et al., 1993) in stress research, the Cyberball paradigm (Williams & Jarvis, 2006) in research on social exclusion, and the noise blaster paradigm (McCullagh et al., 1994) in aggression research. For a trait to manifest itself in a specific behavior, the situation needs to be trait-relevant (Funder, 1995; Hirschmüller et al., 2015; Tett & Guterman, 2000). For example, intelligence will only be linked to specific cues if the context is somewhat cognitively demanding. Accordingly, we implemented an interactive paradigm that included different questions and tasks that we considered relevant for intelligence and for different aspects of personality. The aim was to make the situation varied enough that as many traits as possible (extraversion, self-esteem, motive dispositions, etc.) would become visible in social behavior. We reasoned that if the results turned out as expected, our procedure could serve as a standard paradigm in future studies on personality impressions and on the relation between personality and behavior.

Open Practices Statement

This study was not preregistered. Materials, participant data, and analysis scripts are available at the Open Science Framework: https://osf.io/8utzk/

Participants

The study took place between December 2022 and January 2023 and was approved by the ethics commission of Witten/Herdecke University. Participants were recruited via an online platform hosted by the university, online social networks, and an external panel provider. Participants were required to be at least 18 years old. Participants recruited via the panel provider received monetary compensation (€17.70). Among the remaining participants, five €10 Amazon vouchers were raffled off, and psychology students had the opportunity to receive course credit.

A total of 189 persons participated (99 female, 90 male), with a mean age¹ of 38.34 years (SD = 15.08, range: 18–74 years). Of the participants, 24.3% had a master’s degree, 18% a bachelor’s degree, 42.3% a high school degree, and 15.3% a lower school diploma. A total of 118 participants had been recruited via the panel provider. To take part in the study, 128 participants used a laptop, 32 used a PC, 15 used a tablet, and 12 used a smartphone.

Design

The study included an online survey in which participants completed intelligence tests and personality self-report questionnaires. (The exact order of the survey measures is given in Table S1 of the Supplemental Online Material.) The survey was programmed using the software formr (Arslan et al., 2020) and took approximately 30 minutes. After completing the online survey, participants booked an appointment for an online interview. (If participants did not book an appointment or failed to attend the scheduled appointment, their online survey data were not included. Apart from this criterion, no data were excluded.) Five undergraduate students who were not acquainted with the participants conducted these online interviews using the online platform Zoom. The interviews lasted 5–6 minutes on average (M = 338 seconds, SD = 123 seconds), and each undergraduate student interviewed 38 participants on average.

Measures

Intelligence

General intelligence was assessed with the 16-item short form of the International Cognitive Ability Resource (ICAR16; Condon & Revelle, 2014), a brief measure of cognitive abilities comprising letter and number series, matrix reasoning, verbal reasoning, and three-dimensional rotation items. Each item offers seven answer options, including the possibility that none of the displayed options is correct. For each item, we coded whether it was correctly solved.

As an additional measure that taps specifically into verbal aspects of intelligence, we administered the German version of the Multiple-Choice Vocabulary Intelligence Test (MWT-B; Lehrl et al., 1995). In each item of this test, participants are presented with five letter strings, one of which is a real word, and their task is to identify that word. A total of 37 items were presented in ascending order of difficulty. We again coded for each item whether it was correctly solved.

Personality

Narcissism. The German version of the Narcissistic Admiration and Rivalry Questionnaire (NARQ; Back et al., 2013) was used to measure the two subdimensions of grandiose narcissism, admiration and rivalry. The questionnaire consists of 18 items (1 = not agree at all to 6 = agree completely), 9 for each dimension. Vulnerable narcissism was assessed by aggregating across three dimensions of the German version (Morf et al., 2017) of the Pathological Narcissism Inventory (PNI; Pincus et al., 2009): Contingent Self-esteem, Hiding the Self, and Devaluing. The 26 items were rated from 1 = not at all like me to 5 = very much like me.

Big Five. To measure the Big Five personality traits, the German version (Rammstedt et al., 2018) of the short version of the Big Five Inventory-2 (BFI-2-S; Soto & John, 2017) was used. Thirty items were rated on a 5-point scale (1 = disagree strongly to 5 = agree strongly), with six items per trait.

Self-Esteem. Self-esteem was assessed with the German version (von Collani & Herzberg, 2003) of the Rosenberg Self-Esteem Scale (RSES; Rosenberg, 1965), a well-established measure of global self-worth, and additionally with the Multidimensional Self-Esteem Scale (MSES; Schütz & Sellin, 2006), which measures different dimensions of self-worth. The MSES is an adapted German version of Fleming and Courtney’s (1984) multidimensional scale. For the current purposes, we used the general self-esteem score based on all items of the MSES. The RSES consists of 10 items (1 = strongly disagree to 4 = strongly agree), and the MSES consists of 32 items (1 = never to 7 = always).

Motive Dispositions. The short version of the Unified Motive Scales (UMS-6; Schönbrodt & Gerstenberg, 2012) was used to measure the affiliation, achievement, and power motives. The scale consists of 30 items (0 = strongly disagree/not important to me to 5 = strongly agree/extremely important to me).

Online Interview

The online interviews were conducted via the video communication platform Zoom (version 5.13.3 [11494]; Zoom Video Communications, Inc.). Gallery view was used so that participant and interviewer appeared in equal size on the screen, had the same perspective, and remained in the same position on the screen for the entire interview. Participants were instructed to position themselves directly in front of their camera so that only their face and shoulders were visible. Before the interview started, the interviewer made sure that all participants were seated in the same position and that both camera and microphone were functioning correctly.

The online interview consisted of nine questions and two short tasks. It was standardized so that each interviewer asked the questions in a fixed order. First, the interviewer gave a short introduction to the interview:

“Now here is a short interview. I will ask you some questions. Please answer honestly, there are no right or wrong answers. During the interview, I will record the picture and sound. You don’t need to pay attention to this. When you are ready, we will start.”

Then, the interviewer asked nine personal questions (listed below). We adopted Questions 1 to 8 from a study on self-esteem and self-confident behavior by Krause et al. (2016) because this set of questions had performed very well in providing a trait-relevant situation for detecting self-esteem and also seemed trait-relevant for other personality aspects we focused on, such as the Big Five, motive dispositions, and narcissism. The questions referred to different aspects of participants’ lives and were intended to elicit a wide range of emotions and behaviors that might therefore be informative for a multitude of personality aspects. We added the question “How would you describe your hobbies?”, which might be relevant for traits such as openness to experience (Borkenau et al., 2004).

After the personal questions, participants completed two tasks that were primarily aimed at eliciting intelligence-related behavior: (a) explaining the word “symmetry” and (b) reading a short weather forecast aloud. These tasks were meant to make visible those behavioral cues that are linked to individual differences in intelligence. In a study by Borkenau et al. (2004), intelligence ratings by unacquainted observers based on both tasks were diagnostic of measured intelligence.

To explore participants’ affective experience, the experimenter asked each participant at the beginning and at the end of the session to indicate their current mood on a scale from 1 = very bad to 5 = very good. Given time constraints, the fact that mood effects were not central to our research question, and previous findings indicating that the valence dimension of mood can be validly assessed with a single item (Weidman et al., 2017), we considered single-item measurement appropriate in our case. Mood data were missing for one participant because the interviewer forgot to ask the mood question.

Questions and tasks:

  1. “Please consider what was a major success in your life?”

  2. “When you look back on your life, do successes or failures predominate?”

  3. “Where do your talents lie?”

  4. “What would you like to change about yourself?”

  5. “What do you believe makes you unique?”

  6. “How would you rank yourself in comparison to your colleagues/fellow students/friends regarding your achievements and competencies?”

  7. “How do you deal with negative emotions?”

  8. “What makes a person particularly likable?” (pause) “Would you describe yourself as a likable person?”

  9. “How would you describe your hobbies?”

  10. “Explain the term ‘symmetry’.”

  11. “Please read this short text aloud”: On Friday, in the eastern half, friendly periods will develop after the dissipation of fog fields. Otherwise, it is often cloudy, but initially, it remains mostly dry. Rain moves in from the west from noon. 13 to 19 degrees. Overnight into Saturday, temperatures drop to 15 degrees in the Upper Rhine region and down to 9 degrees in the Sauerland and the Erzgebirge. On Saturday, temperatures rise to 14 degrees on the North Frisian coast and up to 20 degrees in the Upper Rhine region. (For the original German version, see Supplemental Online Material)

Rating Measures

To obtain ratings of participants’ behavior and personality impressions, nine observers (four male and five female graduate students who were not acquainted with the target persons) watched the video recordings of the online interviews. The observers were instructed to focus solely on the participant (and to ignore the experimenter). Because of the large number of ratings to be made, two independent groups of observers were formed. Group 1 (two male and two female observers) rated personality impressions (impressions of the Big Five, self-esteem, and motives), whereas Group 2 (two male and three female observers) rated the presumed behavioral correlates of intelligence and narcissism. To ensure reliable behavioral assessments, each observer completed a two-hour online training session together with the other observers of their group. At the beginning of the training session, all observers received general information about the rating procedure. Then, the two groups were formed, and each group received specific information about the constructs to be rated. The aim of these sessions was for the observers to develop a shared understanding of their constructs. Subsequently, they rated two target persons and discussed their ratings afterward (no ratings were changed post hoc). After the training, observers had one month to code all online interviews and were required to make their ratings independently of the other observers.

Personality impressions were assessed with the same or slightly adapted versions of items that have been used in offline zero-acquaintance studies on the accuracy of personality impressions. A complete list of all items can be found in Table 1. Specifically, impressions of the Big Five were assessed with the same items as in Borkenau et al. (2004), and impressions of motive dispositions were assessed with the items by Bassler et al. (2023; also used by Dufner & Krause, 2023). Impressions of self-esteem were assessed with one item used by Krause et al. (2016), as molar-level assessments are suitable for capturing behavioral self-confidence on a psychologically meaningful level (see Funder & Colvin, 1991). As potential behavioral correlates of intelligence, we selected paralinguistic cues, such as ease of understanding or fluent speaking, that have been linked to intelligence in a laboratory study with a design very similar to ours (i.e., an interview situation; Borkenau & Liebler, 1995). As indicators of agentic and communal behavior, which should be linked to narcissism, we used items from the Interpersonal Adjectives List (IAL; Jacobs & Scholl, 2005), as pilot research from our own laboratory indicated that behavior codings using the IAL are reliable and indeed show the expected links to the two narcissism dimensions (Schmaliy, 2020). All ratings were made on a 6-point Likert scale ranging from 1 = not at all to 6 = very much.
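Table 1 lists a positively and a negatively keyed item for each Big Five impression. The text does not state how the two items of a pair were combined, so the following Python sketch rests on an assumption: the negative item is reverse-scored on the 1 to 6 scale and averaged with the positive item. The item pairs follow Table 1; the scoring rule, function names, and example ratings are hypothetical.

```python
SCALE_MIN, SCALE_MAX = 1, 6  # 6-point rating scale described in the text

# Positive and negative pole for each Big Five impression, as listed in Table 1.
IMPRESSION_PAIRS = {
    "extraversion": ("gregarious", "aloof"),
    "neuroticism": ("nervous", "composed"),
    "conscientiousness": ("conscientious", "careless"),
    "openness": ("witty", "unwitty"),
    "agreeableness": ("friendly", "unfriendly"),
}


def reverse_score(x, lo=SCALE_MIN, hi=SCALE_MAX):
    """Mirror a rating on the scale, e.g., 1 -> 6 and 6 -> 1."""
    return lo + hi - x


def score_impressions(item_ratings):
    """Combine each item pair into one trait impression (ASSUMED rule:
    mean of the positive item and the reverse-scored negative item)."""
    return {
        trait: (item_ratings[pos] + reverse_score(item_ratings[neg])) / 2
        for trait, (pos, neg) in IMPRESSION_PAIRS.items()
    }


# Hypothetical ratings of one target by one observer.
example = {
    "gregarious": 5, "aloof": 2, "nervous": 2, "composed": 5,
    "conscientious": 4, "careless": 3, "witty": 3, "unwitty": 4,
    "friendly": 6, "unfriendly": 1,
}
print(score_impressions(example))
```

Under this rule, a target rated 5 on “gregarious” and 2 on “aloof” receives an extraversion impression of 5.0; the per-observer scores could then be averaged across observers for aggregate-observer accuracy.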

Table 1.
Items that were used by the Observers to Rate Personality Impressions and Behavior
Domain | Construct | Rating items
Big Five | Extraversion | Gregarious; Aloof
Big Five | Neuroticism | Nervous; Composed
Big Five | Conscientiousness | Conscientious; Careless
Big Five | Openness | Witty; Unwitty
Big Five | Agreeableness | Friendly; Unfriendly
Self-esteem | – | General impression of self-esteem
Motives | – | Affiliation; Power; Achievement
Intelligence | – | Self-assured appearance; Easy to understand (clear way of speaking); Fluent way of speaking; Hectic way of speaking; Use of high-level language; Effort in reading; Reading time (in seconds); Reading mistakes
Intelligence | Symmetry question | Correct explanation; Efficient explanation
Narcissism | Agency | Assertive; Self-assured; Shy; Insecure
Narcissism | Communion | Empathy; Warm; Hostile; Ruthless

Sample Size Calculation

Using the software G*Power (version 3.1.9.4; Faul et al., 2007), we determined the sample size required to attain sufficient statistical power for our main statistical tests (i.e., Pearson correlations between the self-report measures and the observer-ratings based on the recordings of the online interviews). We looked up correlation coefficients reported in offline laboratory studies with designs very similar to ours on the accuracy of personality impressions and on the relation between personality and behavior (the individual coefficients can be found in Tables S2 and S3). We averaged the coefficients across samples using Fisher’s r-to-z transformation and then transformed the average back into a Pearson correlation. The average correlation across all relevant effect sizes was r = .28 (note that the meta-analytic effect size reported by Vazire & Carlson, 2010, for associations between personality and behavior more generally has the identical value). Based on this value, 95 participants would have been necessary to detect the effect with a power of 80% (α = .05, two-tailed), which has been proposed as a sufficient benchmark for behavioral science (Cohen, 1992). However, we also considered the possibility that this effect size estimate might be too optimistic, as some previous findings might have been inflated by publication bias. We therefore aimed for a larger sample. Our resources allowed us to test 189 participants; with this sample size, the power to detect an effect of r = .28 was .98, and the power to detect an effect of r = .20 (which is considered “medium” in social-personality psychology; Funder & Ozer, 2019) was .80. Following the recommendations by Funder and Ozer (2019), we interpret effect sizes of r = .10 as small, r = .20 as medium, r = .30 as large, and r = .40 as very large.
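The averaging and power steps can be approximated in a few lines of Python. This sketch uses Fisher’s z with the usual normal approximation (standard error 1/√(n − 3)) rather than the exact routine implemented in G*Power, so its results can differ slightly from the values reported above; for r = .28 it yields 98 rather than 95 required participants. The input correlations in the usage example are placeholders, not the coefficients from Tables S2 and S3.

```python
import math
from statistics import NormalDist, mean

_STD_NORMAL = NormalDist()  # standard normal distribution (mu=0, sigma=1)


def fisher_mean_r(rs):
    """Average correlations via Fisher's r-to-z, then back-transform."""
    return math.tanh(mean([math.atanh(r) for r in rs]))


def power_correlation(r, n, alpha=0.05):
    """Approximate two-tailed power of a test of H0: rho = 0,
    using z = atanh(r) with standard error 1 / sqrt(n - 3)."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    delta = abs(math.atanh(r)) * math.sqrt(n - 3)
    return _STD_NORMAL.cdf(delta - z_crit) + _STD_NORMAL.cdf(-delta - z_crit)


def required_n(r, power=0.80, alpha=0.05):
    """Smallest n reaching the requested power under the same approximation
    (ignoring the negligible opposite rejection tail)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) / math.atanh(r)) ** 2 + 3)


# Placeholder coefficients (NOT the values from Tables S2 and S3).
print(round(fisher_mean_r([0.20, 0.28, 0.35]), 2))
print(required_n(0.28))  # approximation; G*Power's exact test reported 95
print(round(power_correlation(0.28, 189), 2))
print(round(power_correlation(0.20, 189), 2))
```

Averaging in the z metric rather than averaging raw correlations avoids the downward bias that arises because the sampling distribution of r is skewed for nonzero population correlations.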

Concerning the number of observers, an analysis of interrater agreement in the relevant past studies shows an average ICC of .36 for a single observer (for details, see Tables S2 and S3). Given the available resources, we were able to employ groups of four or five observers (see above). Using the Spearman-Brown formula (de Vet et al., 2017), we estimated that four observers would yield a mean ICC of .69 and five observers a mean ICC of .74, which can be considered moderate to good (Koo & Li, 2016).
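The Spearman-Brown step can be checked with a few lines of code. The function below is a minimal sketch of the standard prophecy formula, ICC_k = k·ICC_1 / (1 + (k − 1)·ICC_1), applied to the single-observer ICC of .36 from past studies; the function name is ours.

```python
def spearman_brown(icc_single, k):
    """Projected reliability of the mean of k raters from single-rater reliability."""
    return k * icc_single / (1 + (k - 1) * icc_single)


for k in (1, 4, 5):
    print(k, round(spearman_brown(0.36, k), 2))
# 1 0.36
# 4 0.69
# 5 0.74
```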

We present only the main results here; more detailed results (on observer agreement, accuracy, experimenter effects, and effects of the device used during the online conversations) can be found in the Supplemental Online Material (SOM).

Descriptive Statistics and Reliabilities

Descriptive statistics and reliabilities for the intelligence tasks and the self-report measures, as well as their intercorrelations, can be found in Table 2. As can be seen, the reliabilities of all measures except openness (α = .68) were at least .70 and thus satisfactory to good. Furthermore, different measures of the same overarching construct (i.e., the two intelligence measures and the two self-esteem scores) were significantly positively correlated.

Table 2.
Descriptive Statistics, Reliabilities and Intercorrelations of and between Intelligence and Personality Self-Report Measures
Intercorrelations
Measures M (SD) α 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 
Intelligence                  
1. General (ICAR) 0.46 (0.23) .80 −               
2. Verbal (MWT-B) 0.81 (0.09) .70 .25*** −              
Narcissism                  
3. Admiration 2.97 (0.85) .85 -.01 -.05 −             
4. Rivalry 1.98 (0.86) .90 -.12 -.26*** .24** −            
5. PNI – Vulnerable 2.44 (0.98) .96 -.04 -.24** .00 .64*** −           
Big Five                  
6. Extraversion 3.33 (0.73) .76 .13 -.01 .42*** -.16* -.18* −          
7. Neuroticism 2.46 (0.80) .86 -.11 -.16* -.26*** .39*** .55*** -.19** −         
8. Conscientiousness 3.80 (0.65) .74 -.05 .14 .17* -.35*** -.33*** .19* -.39** −        
9. Openness 3.67 (0.67) .68 .12 .09 .31*** -.17* -.07 .40*** -.16* .22*** −       
10. Agreeableness 3.87 (0.60) .72 .05 -.01 .18* -.50*** -.29*** .27*** -.42*** .36*** .30*** −      
Self-esteem                  
11. RSES 3.19 (0.57) .89 .11 .20** .27*** -.51*** -.73*** .31*** -.72*** .42*** .20** .36*** −     
12. MSES 4.87 (0.99) .95 -.01 .15* .27*** -.46*** -.76*** .37*** -.72*** .42*** .16* .32*** .82*** −    
Motives                  
13. Affiliation 3.84 (1.07) .87 .10 -.09 .36*** -.13 -.13 .55*** -.25*** .07 .18* .38*** .28*** .32*** −   
14. Power 3.10 (1.01) .84 .18* -.09 .53*** .28*** .11 .49*** -.07 .01 .28*** -.03 .08 .05 .29*** −  
15. Achievement 4.01 (0.99) .87 .23** .07 .35*** -.09 -.05 .41*** -.17* .36*** .35*** .23** .19** .17** .34*** .51*** − 

Note. α = Cronbach’s alpha; ICAR = International Cognitive Ability Resource; MWT-B = Multiple-Choice Vocabulary Intelligence Test; PNI = Pathological Narcissism Inventory; RSES = Rosenberg Self-Esteem Scale; MSES = Multidimensional Self-Esteem Scale. *p < .05, **p < .01, ***p < .001

Descriptive statistics and observer agreement for the personality ratings can be found in Table 3, and those for the behavior judgments in Table 4 (their intercorrelations are shown in Tables S4 and S5 of the SOM). To estimate observer agreement for each rating, we calculated intraclass correlation coefficients (ICC [2, k]). For ratings with more than one item (e.g., agency), we first computed each observer’s mean score across the items (e.g., the mean across all agency items) and then used these mean scores to calculate the intraclass correlation across observers (the ICCs for every single item can be found in Table S6). With the exception of a few intelligence-specific cues, observer agreement was satisfactory to good. It should be noted, however, that communion was ultimately computed from two instead of four items: the reversed items “hostile” and “ruthless” showed poor agreement (ICC [2, k] = .07 and .17, respectively), which can most likely be explained by floor effects (hostile: M = 1.52, SD = 0.30; ruthless: M = 1.55, SD = 0.33). We therefore excluded these two items from the communion composite.
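For readers who want to reproduce this kind of agreement analysis, the following is a minimal sketch of ICC(2, k) (two-way random effects, absolute agreement, average of k observers) computed from a targets × observers matrix, including the preceding step of averaging each observer’s items into a composite. It assumes complete data (every observer rated every target); the data layout and function names are illustrative assumptions, and dedicated statistics packages may handle missing ratings differently.

```python
from statistics import mean


def observer_composites(item_ratings):
    """item_ratings[target][observer] is a list of item scores for one rating
    (e.g., all agency items); returns the targets x observers matrix of means."""
    return [[mean(items) for items in per_observer] for per_observer in item_ratings]


def icc_2k(matrix):
    """ICC(2, k): two-way random effects, absolute agreement, mean of k observers.

    matrix[i][j] is observer j's score for target i (complete data assumed).
    """
    n, k = len(matrix), len(matrix[0])
    grand = sum(sum(row) for row in matrix) / (n * k)
    row_means = [sum(row) / k for row in matrix]
    col_means = [sum(row[j] for row in matrix) / n for j in range(k)]
    ss_targets = k * sum((m - grand) ** 2 for m in row_means)
    ss_observers = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in matrix for x in row)
    ss_error = ss_total - ss_targets - ss_observers
    ms_targets = ss_targets / (n - 1)
    ms_observers = ss_observers / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_targets - ms_error) / (ms_targets + (ms_observers - ms_error) / n)


# Hypothetical data: 3 targets, 2 observers, 2 items per rating.
items = [
    [[1, 1], [2, 2]],
    [[2, 2], [3, 3]],
    [[3, 3], [4, 4]],
]
print(round(icc_2k(observer_composites(items)), 2))  # 0.8
```

In this toy example the second observer rates every target exactly one point higher, so the rank order agrees perfectly, but the constant offset lowers the absolute-agreement ICC below 1.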

Accuracy of Personality Impressions

Table 3 shows the results for personality impressions. Accuracy was determined by correlating the aggregated observer ratings with the self-ratings (accuracy correlations for single observers are presented in Table S7). Each of the Big Five traits was positively correlated with the respective observer judgment, and the strongest of these accuracy correlations was found for extraversion, which is also the most visible of the Big Five. For extraversion, conscientiousness, and openness, the observer rating from the respective domain was the strongest correlate of the self-rated trait (e.g., self-rated extraversion correlated most strongly with observer-rated extraversion); for neuroticism and agreeableness, some ratings from other domains correlated more strongly with the self-rating (see Table 3). Regarding self-esteem, both measures were significantly positively correlated with observer judgments of self-esteem. Again, observer-rated self-esteem was the strongest correlate of both self-esteem measures. For motive dispositions, we found the same pattern for the power and achievement motives: in each domain, self-reports and observer ratings were positively correlated, and the rating in the respective domain was the strongest correlate of the self-rating. Surprisingly, affiliation self-reports did not correlate significantly with affiliation observer ratings.

Table 3.
Descriptive Statistics and ICCs for Personality Impressions as well as their Correlations with Self-Reported Personality Traits. Values in square brackets indicate 95% confidence intervals
Self-report measures: Big Five (E, N, C, O, A), self-esteem (RSES, MSES), motives (Affiliation, Power, Achievement)
Ratings M SD ICC E N C O A RSES MSES Affiliation Power Achievement
Big Five
Extraversion 2.93 0.71 .79 .38*** [.24, .50] -.20** .22** .25** .26*** .25*** .26*** .21** .11 .13
Neuroticism 2.49 0.56 .55 -.10 .18* [.05, .32] .12 -.04 -.05 -.17* -.20** -.06 -.03 .07
Conscientiousness 3.69 0.65 .75 .06 -.10 .36*** [.23, .48] .15* .13 .10 .11 .04 .12 .23***
Openness 2.92 0.61 .73 .19* -.15* .05 .31*** [.16, .45] .11 .16* .13 .04 .17* .09
Agreeableness 4.43 0.48 .59 .01 -.06 .03 .15* .15* [.04, .30] .14 .06 .09 -.07 .05
Self-esteem
Impression 4.14 0.61 .70 .31*** -.24** .21** .24** .21** .36*** [.24, .48] .36*** [.23, .47] .16* .11 .10
Motives
Affiliation 3.78 0.79 .72 .02 -.07 .10 .04 .32** .10 .10 .11 [-.02, .25] -.18* -.12
Power 2.38 0.67 .66 .30*** -.17* .23** .06 .02 .20** .25*** .06 .35*** [.20, .48] .20**
Achievement 3.82 0.81 .75 .21** -.06 .16* .17* -.06 .09 .07 .05 .27*** .33*** [.18, .47]

Note. ICC = observer agreement, calculated as ICC (2, k). E = Extraversion, N = Neuroticism, C = Conscientiousness, O = Openness, A = Agreeableness; RSES = Rosenberg Self-Esteem Scale; MSES = Multidimensional Self-Esteem Scale. Major results are printed in bold type. *p < .05, **p < .01, ***p < .001

Relations between Personality and Behavior

Table 4 shows the correlations of intelligence and narcissism with behavioral cues (correlations for single observers are presented in Tables S8 and S9). In line with our assumptions, intelligence, as assessed with the ICAR and the MWT-B, was significantly correlated with almost every predicted cue. For narcissism, the correlational pattern was also as expected: admiration was significantly positively correlated with ratings of agentic behavior, rivalry was significantly negatively correlated with ratings of communal behavior, and vulnerable narcissism was significantly positively correlated with ratings of nervous behavior.

Table 4.
Descriptive Statistics and ICCs for Ratings of Behavior as well as their Correlations with Intelligence and Personality Self-Reports. Values in square brackets indicate 95% confidence intervals
Measures: intelligence (General [ICAR], Verbal [MWT-B]); narcissism (Admiration, Rivalry, PNI – Vulnerable)
Ratings M SD ICC General (ICAR) Verbal (MWT-B) Admiration Rivalry PNI – Vulnerable
Intelligence
Self-assured appearance 3.71 0.61 .68 .13 [-.00, .26] .24** [.09, .39] .23** -.32*** -.39***
Easy to understand 4.41 0.49 .80 .23** [.10, .35] .32*** [.15, .81] .12 -.29*** -.27***
Fluent way of speaking 4.02 0.52 .49 .22** [.06, .36] .23** [.08, .37] .13 -.25** -.24**
Hectic way of speaking 2.55 0.64 .66 -.22** [-.36, -.07] -.10 [-.25, .07] .01 -.23** .08
Use of high-level language 4.37 0.45 .65 .25*** [.12, .36] .21** [.05, .36] .08 -.10 -.05
Effort in reading 4.02 0.53 .70 .20** [.05, .35] .15* [.02, .28] -.03 -.20* -.26***
Reading time (in seconds) 31.63 5.24 .98 -.20** [-.35, -.06] -.20** [-.38, .01] -.03 .08 .01
Reading mistakes 1.25 1.12 .89 -.23** [-.36, -.10] -.15* [-.31, .01] .11 .13 .11
Correct explanation 1.10 0.61 .91 .44*** [.33, .55] .35*** [.21, .48] .00 -.04 -.07
Efficient explanation 1.56 0.37 .68 .23*** [.15, .32] .17* [.03, .29] .05 .24 .19
Narcissism
Agency 3.59 0.61 .78 .00 .17* .30*** [.16, .44] -.21** -.32***
Communion 3.82 0.72 .75 .04 .04 .06 -.15* [-.28, .00] -.01
Nervous 2.49 0.56 .55 -.14 -.14 .04 .18* .15* [.01, .29]

Note. ICC = observer agreement, calculated as ICC (2, k). ICAR = International Cognitive Ability Resource; MWT-B = Multiple-Choice Vocabulary Intelligence Test; PNI = Pathological Narcissism Inventory. Major results are printed in bold type. *p < .05, **p < .01, ***p < .001

Additional Exploratory Analyses

We first examined participants’ mood. Compared to before the online interview (M = 3.95, SD = 0.76), affect scores were significantly more positive after the interview (M = 4.14, SD = 0.68), t(188) = -4.43, p < .001, d = 0.32, indicating that participants’ mood was actually better after the online interaction.
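The reported effect size can be recovered from the test statistic. Assuming (our inference; the text does not state it) that d is the within-person effect size for a paired design, d_z = |t| / √n, the values t(188) = −4.43 and n = 189 give d ≈ 0.32. By contrast, standardizing the mean change by the root mean square of the two standard deviations would give a smaller value of about 0.26. Function names are ours.

```python
import math


def cohens_dz_from_t(t, n):
    """Within-person effect size for a paired t-test: d_z = |t| / sqrt(n)."""
    return abs(t) / math.sqrt(n)


def cohens_d_av(m_pre, sd_pre, m_post, sd_post):
    """Mean change standardized by the root mean square of the two SDs."""
    return (m_post - m_pre) / math.sqrt((sd_pre ** 2 + sd_post ** 2) / 2)


print(round(cohens_dz_from_t(-4.43, 189), 2))         # 0.32
print(round(cohens_d_av(3.95, 0.76, 4.14, 0.68), 2))  # 0.26
```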

Then, we checked our experimental paradigm for possible experimenter effects. Using analyses of variance, we tested for each personality impression and behavior coding whether it varied systematically across experimenters. With one exception (easy to understand; p = .014), we found no significant differences, indicating that there were no substantial experimenter effects.
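The experimenter check amounts to one one-way ANOVA per rating, with experimenter as the grouping factor. A minimal sketch of the F statistic follows; the grouping and data are hypothetical, and the p value would then come from the F distribution with (g − 1, N − g) degrees of freedom (for example via `scipy.stats.f.sf`, if SciPy is available).

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: one list of scores per experimenter (hypothetical layout).
    """
    scores = [x for g in groups for x in g]
    grand = sum(scores) / len(scores)
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


# Hypothetical ratings of targets interviewed by three experimenters.
f, df1, df2 = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f, df1, df2)  # 3.0 2 6
```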

We next explored potential effects of the device used during the online conversations. Using independent-samples t-tests, we compared the mean personality impressions and behavior codings of participants using a PC or laptop computer (n = 160) with those of participants using a smartphone or tablet (n = 27; detailed results can be found in the SOM). Group differences for most variables were non-significant, yet participants using a PC/laptop showed significantly more use of high-level language, whereas reading time and reading mistakes were higher among participants using a smartphone/tablet. A potential explanation for the latter two findings might be that the text participants were asked to read appeared smaller, and was therefore harder to read, on smartphones and tablets. Using Steiger’s Z test, we also explored whether the expected accuracy and personality-behavior correlations differed significantly between the two groups. For the behavioral correlates of intelligence, there were several cases in which the correlations were significantly higher for the smartphone/tablet group than for the PC/laptop group, but only when intelligence was measured with the MWT-B.
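For two independent groups, comparisons of correlations are commonly based on the difference between Fisher-z-transformed coefficients. The sketch below implements that independent-groups z test; whether it matches the exact Steiger’s Z implementation used in these analyses is an assumption, and the correlations in the example are hypothetical. It also shows why the device comparison is imprecise: with n = 27, the standard error is dominated by the small group.

```python
import math
from statistics import NormalDist


def compare_independent_r(r1, n1, r2, n2):
    """Two-tailed z test for the difference between correlations from independent groups."""
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (math.atanh(r1) - math.atanh(r2)) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical correlations for PC/laptop (n = 160) vs. smartphone/tablet (n = 27).
z, p = compare_independent_r(0.10, 160, 0.50, 27)
print(round(z, 2), round(p, 3))
```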

Investigating the formation of personality impressions and testing the association between personality and behavior play a central role in personality and social-psychological research. Yet to date, virtually all of the relevant research has taken place in offline laboratories, which is costly, time-consuming, and non-inclusive. In the present study, we investigated the ecological validity of personality judgment findings outside the laboratory. The results indicate that collecting observational behavior data in an online setting seems to be a viable alternative.

Comparability of Results to Previous Offline Laboratory Studies

The levels of observer agreement were good for both personality impressions and behavioral judgments, and similar in magnitude to agreement scores found in previous laboratory studies (Borkenau et al., 2004; Borkenau & Liebler, 1995; Krause et al., 2016). Thus, observer ratings based on online situations seem to be just as coherent as those based on laboratory situations. There was one exception: the two reversed communion items “hostile” and “ruthless” showed substantially poorer observer agreement than in a previous offline study (Schmaliy, 2020). However, this difference was likely due not to the digital format itself but to the specific interactive paradigm we used (for details, see below).

Most importantly, almost without exception, the correlational patterns reported in offline studies, both for personality impressions and for the link between personality and behavior, could be replicated using the online format. As in previous studies, we found significant levels of self-other agreement for the Big Five (Borkenau et al., 2004; Borkenau & Liebler, 1993; Hirschmüller et al., 2015), self-esteem (Hirschmüller et al., 2018; Naumann et al., 2009), and the achievement and power motives (Bassler et al., 2023; Dufner & Krause, 2023). Regarding the Big Five, our findings align with the self-other correlation patterns observed in the meta-analyses by Connelly and Ones (2010) and Connolly et al. (2007): self-other correlations were highest for extraversion, followed by conscientiousness and openness, and finally neuroticism and agreeableness. Extraversion, being a highly visible trait, offers numerous easily detectable cues. Conversely, neuroticism is more difficult to detect because it encompasses tendencies in internal affective states with low visibility (Zillig et al., 2002). The low accuracy correlation for agreeableness likely stems from the trait’s high evaluativeness: since agreeableness is valued by many, people often engage in self-presentation in order to appear agreeable, making it challenging to observe their genuine character accurately (Connelly & Ones, 2010; John & Robins, 1993). A similar explanation may apply to the affiliation motive, which is also socially valued in terms of establishing and maintaining relationships with others.

With regard to the behavioral correlates of intelligence, we could replicate previous findings showing that visual as well as acoustic behavioral cues correlate significantly with measures of intelligence (Borkenau & Liebler, 1995). Moreover, as hypothesized, both intelligence measures were associated with the coded behavior. This shows that, even for verbal cues, not only verbal intelligence tests such as the MWT-B but also general intelligence tests such as the ICAR possess predictive validity. One might wonder about the low (but still significant) correlation between the ICAR and the MWT-B. Past studies have, in fact, reported comparatively low relations between the ICAR and verbal aspects of intelligence such as vocabulary (e.g., Young & Keith, 2020), which can be explained by the fact that the ICAR aims to measure intelligence broadly rather than just its verbal aspects.

For narcissism, we could replicate earlier results indicating that the admiration dimension predicts agentic behaviors, whereas the rivalry dimension predicts a lack of communal behaviors (Back et al., 2013; Schmaliy, 2020). However, one should keep in mind that we deployed adjectives as macro-level, holistic judgments of agentic and communal behavior, rather than specific behavioral cues such as “self-assured voice” or “authentic smile” (Back et al., 2013). Future research should additionally assess micro-level behaviors in an online format. With regard to vulnerable narcissism, researchers have speculated that it goes along with shy and neurotic behavior (e.g., Van der Linden et al., 2010), even though we are not aware of any research including direct behavior observations. The present results indicate that vulnerable narcissism is indeed positively associated with observer ratings of nervousness.

Accuracy of Personality Judgments in the Online Setting

Our results indicated that, for both personality impressions and the links between personality and behavior, results based on video observations of online calls closely resemble those of similar studies that used offline laboratory designs. This was true not only for the statistical significance of the correlations but also for their magnitudes. In almost all cases, the effect sizes reported in the literature (for an overview, see Tables S2 and S3) fell within the confidence intervals of the current correlations and were similar in size. Specifically, for the accuracy of personality impressions, the average effect size in comparable offline laboratory studies is r = .24, whereas we found an average effect size of r = .29. Regarding the relationship between personality and behavior, the effect sizes reported in offline laboratory studies average r = .26, compared to our average effect size of r = .22. In each case, the effect sizes can be considered medium to large (Funder & Ozer, 2019). Concerning consensus (i.e., the degree to which observers agreed in their evaluation of a given target person; Funder, 1995), the results also resembled past laboratory research: the most relevant past laboratory studies show an ICC of .36 for a single observer, and in our study the average value was highly similar (ICC = .39 for a single observer). For the scores aggregated across the four to five observers, the ICCs were considerably higher and mostly in a sufficient range, which indicates that sufficient agreement can be attained in the current design even with relatively few observers.
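A simple way to check whether a published effect size lies within the confidence interval of a current correlation is sketched below, using a Fisher-z interval. This illustration rests on that assumption: the intervals reported in Tables 3 and 4 may have been computed differently, and the published value in the example is hypothetical. Assuming n = 189, the sketch yields [.23, .48] for r = .36, the interval shown for conscientiousness in Table 3.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for a Pearson correlation via Fisher's z."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


lo, hi = fisher_ci(0.36, 189)  # e.g., the conscientiousness accuracy correlation
print(round(lo, 2), round(hi, 2))  # 0.23 0.48
published_r = 0.30  # hypothetical literature value
print(lo <= published_r <= hi)  # True
```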

In summary, previous research has indicated that accurate personality judgments can be made in various virtual settings (see Hinds & Joinson, 2019), such as text-based interactions (Tskhay & Rule, 2014), social network sites (Azucar et al., 2018; Liu & Campbell, 2017; Settanni et al., 2018), or personal websites (Vazire & Gosling, 2004). The current results further reveal the benefits of digital assessment by showing that, in terms of accuracy, results on personality expression based on online video calls are highly similar to those based on laboratory studies. Regarding the moderators of accurate personality judgment (Funder, 2012), the current findings indicate that online video calls can offer “good information” as a basis for accurate personality judgments.

Evaluation of the Interactive Paradigm

The current research provides an interactive paradigm for the study of personality impressions and the link between personality and behavior. All that is needed on the side of the experimenter and the participant is an Internet-capable device with a camera and a microphone. With a duration of five to six minutes, the procedure is highly economical, and it is generally experienced as pleasant, as indicated by participants’ increase in positive affect from pre- to post-assessment. The fact that we found the predicted results for so many different personality constructs indicates that we accomplished our goal of including questions and tasks with trait relevance for different personality aspects (although it might be interesting for future research to examine systematically whether some questions possess more diagnostic value than others). An exception might be researchers interested in studying antisocial behavior, as hostile and ruthless behavior hardly ever occurred (see above). If such behaviors are the focus, more competitive interpersonal situations might be better suited.

In many other cases, researchers who aim to study the effects of any specific personality trait on personality impressions or social behavior and who have an interest in feasibility, reach, inclusion, and comparability might consider using the paradigm (or an adapted version of it).

One should keep in mind, however, that in an interview situation in which participants are also asked personal questions about themselves, verbal information is likely to play a key role. To test whether this is indeed the case, future research could vary the modality of the information provided to the judges who rate the targets’ personality or behavior (e.g., with some judges receiving only visual information and others only auditory information; Borkenau et al., 2004). If one wants to focus exclusively on how personality is linked to nonverbal micro-level behavior, such as the frequency of frowning or body posture, other tasks that are likely to trigger the specific behaviors of interest (e.g., stress tests) might be more suitable.

Challenges of Online Behavioral Observations and Recommendations

Despite the promising results of the current research, online studies including behavior observations face several challenges. In the following, we outline these challenges and give guidance on how to deal with them (for a related list of recommendations, see Perry et al., 2021).

Online approaches necessarily rely on technological capabilities (e.g., Internet connection, audio and video quality, reliability of the software used for the video calls). Whereas researchers can do their best to optimize these factors on their side, they have little control over the technical equipment used by the participants. Our recommendation is to at least conduct a quick check of audio and video quality before the actual recording is started. Which kind of device should participants use in future studies with the online setting? In our exploratory analyses, we found few differences between users of different kinds of devices. The main exception concerned the behavioral correlates of intelligence, some of which seemed to differ depending on whether participants used a laptop/PC or a smartphone/tablet. Yet this conclusion must remain tentative given the exploratory nature of the analyses and the modest cell size of the smartphone/tablet group. Such potential sources of noise can easily be prevented by asking all participants in a given study to use the same kind of device.

Additionally, technical disturbances such as a slow Internet connection or a freezing camera can occur at any time and can substantially reduce the quality of the observation. We recommend that researchers develop clear guidelines for such disturbances so that they can be handled with maximum efficiency and consistency across experimenters. Moreover, participants may have concerns about data storage and privacy when transmitting sensitive information about themselves over the Internet, even if the experimenters have assured anonymity and data protection. Researchers should only employ online communication software and data storage practices that comply with current privacy policies, and they should explicitly inform participants about this practice. In addition, researchers should rely on server-focused technologies (i.e., no special software or hardware should need to be installed; Perry et al., 2021) and on mobile-friendly platforms that allow participation via tablet or smartphone.

In general, the online format might deter people with relatively little digital experience (e.g., some older adults) or people who lack constant access to an Internet-capable device. This could result in non-representative samples and disadvantage some groups. On the other hand, Internet and smartphone usage continues to increase (Silver et al., 2019), so these issues are likely to become less pressing in the future. Additional challenges arise when couples or families participate: online video calls for behavioral observation do not eliminate the inherent logistical effort required to align partners’ or family members’ schedules to complete study procedures (Perry et al., 2021).

Furthermore, even though conducting studies via online conversations may be more inclusive for persons with physical disabilities, one should keep in mind that such persons must have the appropriate technical equipment, such as large screens, large-character keyboards, or screen readers for voice output.

Another challenge for researchers conducting behavior observations online is that participants should be in a comfortable and private space while taking part in the study. Hence, before the start of the study, researchers should ensure that participants are alone at home or in a private location where they will not be disturbed and can act freely, which can be challenging for people living with their families or in shared flats.

Open Questions and Avenues for Future Research

As is usual in studies on the accuracy of personality impressions, observer ratings were limited to one relatively short interview situation. This practice is not ideal if one is interested in a target person’s typical behavior (Funder & Colvin, 1991). Longer observations, observations in different situational contexts, or observations made repeatedly over time in an online setting would most likely enable even higher levels of accuracy. With such an approach based on behavior aggregation (Epstein, 1983), it might even be possible to use behavior observations not only as an outcome variable predicted by traditional personality assessments but as an independent personality assessment in its own right (Tackett et al., 2016).

Another interesting challenge for future research would be to automate the procedure further. For example, rather than actually interacting with the experimenter, participants could receive instructions and tasks via e-mail with the request to videotape their behavior. Such an approach would, of course, be even more economical than the current paradigm. The level of automation could also be increased on the side of personality ratings and behavior codings. In emotion research, software that recognizes emotions from facial expressions or speech signals is already widely used (for an overview, see Wani et al., 2021), and some artificial intelligence models are already being trained to infer target persons’ personality (Cannata et al., 2022). However, personality detection frequently raises ethical dilemmas concerning the appropriate interpretation and use of assessment techniques (Mukherjee & Kumar, 2016). The growing integration of artificial intelligence across various domains of psychological research in recent years necessitates the development of, and adherence to, specific legal and ethical guidelines.

The online setting appears promising for studying personality impressions and the relation between personality and behavior. However, the current study only demonstrated that the pattern of results and the effect sizes are similar to those reported for offline settings, without precisely delineating whether and how online settings differ from classical laboratory observations. To address this question, future research might test the same participants in both settings and directly compare the results (for a similar approach, see Kaurin et al., 2018). Such a study could also address the possibility that impression management concerns are attenuated in online assessments, which typically take place in people’s private rooms, compared with laboratory settings, which typically feel unfamiliar and might even feel slightly threatening.

Even though the contribution of the present research is a methodological one, the findings also raise substantive research questions. Whereas most previous studies focused on either a single personality trait or a small number of traits on the predictor side, and on a single judgment or a small number of judgments on the outcome side, we simultaneously assessed a relatively large number of personality constructs and observer judgments. This approach showed that observer judgments were often linked not only to the hypothesized personality predictor but to other traits as well. For example, observer-rated neuroticism was almost as strongly correlated with extraversion as with neuroticism. Such a pattern shows that even though observer judgments of personality have convergent validity, their discriminant validity can be low in many cases. Likely, some behavioral cues are systematically misinterpreted (e.g., a behavioral correlate of extraversion is interpreted as an indicator of low neuroticism). Future research should examine this issue more systematically using lens model analyses (Brunswik, 1956), considering both shared and unique effects.

In doing so, researchers should also give deeper thought to the presumed ontological status of personality constructs. In research on personality and behavior, a personality trait is typically (either explicitly or implicitly) considered a causal entity that resides inside a person and makes them likely to display a specific behavior (e.g., extraversion causing outgoing behavior). However, personality and interrelated behaviors can also be viewed from a formative perspective (targets who show behaviors A and B may be called “C”) or from a dynamic network perspective (descriptions of A and B might correlate, and thus form a factor, because A causes B; for an overview, see Leising & Borgstede, 2020). Future researchers should specify their theoretical perspective and ideally test it using a formalized modelling approach.

Limitations

Some limitations of the present research should be noted. First, our study was not pre-registered. In hindsight, this was unfortunate, given that pre-registration enhances transparency, reduces researcher bias, and fosters credibility within the scientific community (Simmons et al., 2021). We acknowledge the benefits of adopting this practice in future research endeavors.

Second, and relatedly, given the large number of statistical tests, we cannot rule out that some results are false positives. Cautious readers may wish to interpret only effects with p ≤ .01 or, more conservatively still, p ≤ .005 (Benjamin et al., 2018), which makes false positives considerably less likely. Even at these conservative thresholds, however, most effects remain significant, which speaks in favor of the online paradigm.
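Stricter thresholds translate into larger minimum correlations. As a rough sketch, the snippet below uses the Fisher z-approximation (not the exact t-based test that is usually reported) and a hypothetical sample size of n = 150, which is not the study’s actual N, to show the smallest |r| that reaches each threshold.

```python
import math
from statistics import NormalDist


def critical_r(n: int, alpha: float) -> float:
    """Smallest |r| reaching two-tailed significance at alpha (Fisher z approximation)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return math.tanh(z_crit / math.sqrt(n - 3))


def approx_p(r: float, n: int) -> float:
    """Approximate two-tailed p-value for a Pearson r via Fisher's z-transformation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


n = 150  # hypothetical sample size, for illustration only
for alpha in (0.05, 0.01, 0.005):
    print(f"alpha = {alpha}: |r| must be at least {critical_r(n, alpha):.3f}")
```

An observed correlation can then be checked with `approx_p(r, n)` against whichever threshold a reader prefers.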

Third, we used undergraduate psychology students as observers, a choice made primarily for logistical reasons (and one that many earlier studies have also made). An upside of using students as observers is their familiarity with the to-be-judged constructs, requiring minimal explanation or training. A downside of student observers, however, is their demographic homogeneity (i.e., predominantly young and female), potentially leading to agreement biases stemming from shared stereotypes (Kenny, 2004; Letzring et al., 2021). Furthermore, it is conceivable that due to the students’ interests and training in psychology, accuracy correlations might be higher than in the general population. Especially if the goal is to draw conclusions about personality impressions in real life, using a more representative set of observers would be advisable.

Fourth, we cannot rule out that the accuracy correlations were influenced by stereotypes unrelated to actual behavior, a concern that would remain even with more heterogeneous observer samples (see, e.g., Smartt et al., 2022). To address this possibility, future studies should code behavioral cues for all traits and test whether these cues mediate the accuracy effects.
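One way to run the suggested test is a simple mediation model in which a coded behavioral cue (M) carries the effect of the target’s trait (X) on the observer judgment (Y). The sketch below estimates the indirect effect a × b with ordinary least squares and a percentile bootstrap interval. It uses simulated data; the path values and all names are hypothetical, and a real analysis would more likely use established SEM software and include all coded cues simultaneously.

```python
import random
from statistics import fmean


def _centered(values):
    mean = fmean(values)
    return [v - mean for v in values]


def indirect_effect(x, m, y):
    """Indirect effect a*b: a from OLS of m on x, b from OLS of y on x and m."""
    xc, mc, yc = _centered(x), _centered(m), _centered(y)
    sxx = sum(u * u for u in xc)
    smm = sum(u * u for u in mc)
    sxm = sum(u * v for u, v in zip(xc, mc))
    sxy = sum(u * v for u, v in zip(xc, yc))
    smy = sum(u * v for u, v in zip(mc, yc))
    a_path = sxm / sxx
    b_path = (sxx * smy - sxm * sxy) / (sxx * smm - sxm**2)  # slope of m, controlling x
    return a_path * b_path


def bootstrap_ci(x, m, y, reps=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for the indirect effect (resampling targets)."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    lower = estimates[int((1 - level) / 2 * reps)]
    upper = estimates[int((1 + level) / 2 * reps) - 1]
    return lower, upper


# Hypothetical simulated targets (path values are arbitrary assumptions)
rng = random.Random(7)
n = 300
trait = [rng.gauss(0, 1) for _ in range(n)]           # X: target's trait score
cue = [0.5 * t + rng.gauss(0, 1) for t in trait]      # M: coded behavioral cue
judgment = [0.4 * c + 0.1 * t + rng.gauss(0, 1)       # Y: observer judgment
            for c, t in zip(cue, trait)]

estimate = indirect_effect(trait, cue, judgment)
lower, upper = bootstrap_ci(trait, cue, judgment)
print(f"indirect effect = {estimate:.3f}, 95% CI [{lower:.3f}, {upper:.3f}]")
```

If the interval excludes zero, the accuracy effect runs at least partly through observable behavior rather than through stereotypes alone.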

Finally, because mood was assessed with only a single item, the assessment is rather unspecific and its reliability is unknown. A nuanced, multi-item measure would have avoided these issues.
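With a single item, internal consistency cannot be estimated at all. The Spearman–Brown prophecy formula (see de Vet et al., 2017) shows why adding items helps: assuming parallel items, reliability rises with test length. The single-item reliability of .50 below is purely hypothetical.

```python
def spearman_brown(reliability: float, k: float) -> float:
    """Predicted reliability of a measure lengthened k-fold with parallel items."""
    return k * reliability / (1 + (k - 1) * reliability)


def lengthening_factor(reliability: float, target: float) -> float:
    """Factor by which a measure must be lengthened to reach a target reliability."""
    return target * (1 - reliability) / (reliability * (1 - target))


single_item = 0.50  # hypothetical reliability of one mood item
for k in (1, 2, 4, 6):
    print(f"{k} item(s): predicted reliability = {spearman_brown(single_item, k):.2f}")
print(f"factor needed for .80: {lengthening_factor(single_item, 0.80):.1f}")
```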

Behavioral observations offer a valuable and irreplaceable source of information in the study of personality. However, laboratory studies are costly in terms of time, money, and effort, and they are also non-inclusive. The present research demonstrates that observations of video calls yield highly similar results to those made in the laboratory when it comes to linking personality to personality impressions or behavior judgments. Our research also provides a specific paradigm for online behavioral research that can be easily implemented, is evaluated positively by participants, and provides valid results. Video calls therefore not only connect us, but also open a new lens for exploring the nuances of personality.

Contributed to conception and design: MS, MD

Contributed to acquisition of data: MS

Contributed to analysis and interpretation of data: MS

Drafted and/or revised the article: MS, MD

Approved the submitted version for publication: MS, MD

We thank Alina Klute, Annelie Schnabel, Can Aygün, Zeza Arckel, and Lennart Kraume for their help in study implementation and data collection.

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

The authors declare no potential conflicts of interest concerning the research, authorship, or publication of this article.

Supplemental material can be found online at the Open Science Framework: https://osf.io/8utzk/

This study was not preregistered. Materials, participant data, and analysis scripts are available at the Open Science Framework: https://osf.io/8utzk/

1. Due to a programming error, age data were not available for 28 participants.

Allen, A. P., Kennedy, P. J., Dockray, S., Cryan, J. F., Dinan, T. G., & Clarke, G. (2017). The Trier Social Stress Test: Principles and Practice. Neurobiology of Stress, 6, 113–126. https://doi.org/10.1016/j.ynstr.2016.11.001
Arslan, R. C., Walther, M. P., & Tata, C. S. (2020). formr: A study framework allowing for automated feedback generation and complex longitudinal experience-sampling studies using R. Behavior Research Methods, 52, 376–387. https://doi.org/10.3758/s13428-019-01236-y
Azucar, D., Marengo, D., & Settanni, M. (2018). Predicting the Big 5 personality traits from digital footprints on social media: A meta-analysis. Personality and Individual Differences, 124, 150–159. https://doi.org/10.1016/j.paid.2017.12.018
Back, M. D., Küfner, A. C. P., Dufner, M., Gerlach, T. M., Rauthmann, J. F., & Denissen, J. J. A. (2013). Narcissistic admiration and rivalry: Disentangling the bright and dark sides of narcissism. Journal of Personality and Social Psychology, 105(6), 1013–1037. https://doi.org/10.1037/a0034431
Back, M. D., & Nestler, S. (2016). Accuracy of judging personality. In J. A. Hall, M. Schmid Mast, & T. V. West (Eds.), The social psychology of perceiving others accurately (pp. 98–124). Cambridge University Press. https://doi.org/10.1017/CBO9781316181959.005
Back, M. D., Stopfer, J. M., Vazire, S., Gaddis, S., Schmukle, S. C., Egloff, B., & Gosling, S. D. (2010). Facebook Profiles Reflect Actual Personality, not Self-Idealization. Psychological Science, 21(3), 372–374. https://doi.org/10.1177/0956797609360756
Bakeman, R., & Quera, V. (2012). Behavioral observation. In H. Cooper, P. M. Camic, D. L. Long, A. T. Panter, D. Rindskopf, & K. J. Sher (Eds.), APA handbook of research methods in psychology, Vol. 1. Foundations, planning, measures, and psychometrics (pp. 207–225). American Psychological Association. https://doi.org/10.1037/13619-013
Bassler, P., Dufner, M., & Denissen, J. (2023). Motive Perception at First Impressions: On the Relevance of Targets’ Explicit and Implicit Motive Dispositions. Personality Science, 4(1). https://doi.org/10.5964/ps.10753
Benjamin, D. J., Berger, J. O., Johannesson, M., Nosek, B. A., Wagenmakers, E. J., Berk, R., … Johnson, V. E. (2018). Redefine statistical significance. Nature Human Behaviour, 2(1), 6–10. https://doi.org/10.1038/s41562-017-0189-z
Bernieri, F. J. (2001). Toward a taxonomy of interpersonal sensitivity. In J. A. Hall & F. J. Bernieri (Eds.), Interpersonal sensitivity (pp. 17–34). Psychology Press.
Block, J. (1961). Ego identity, role variability, and adjustment. Journal of Consulting Psychology, 25(5), 392–397. https://doi.org/10.1037/h0042979
Borkenau, P., & Liebler, A. (1993). Convergence of stranger ratings of personality and intelligence with self-ratings, partner ratings, and measured intelligence. Journal of Personality and Social Psychology, 65(3), 546–553. https://doi.org/10.1037/0022-3514.65.3.546
Borkenau, P., & Liebler, A. (1995). Observable Attributes as Manifestations and Cues of Personality and Intelligence. Journal of Personality, 63(1), 1–25. https://doi.org/10.1111/j.1467-6494.1995.tb00799.x
Borkenau, P., Mauer, N., Riemann, R., Spinath, F. M., & Angleitner, A. (2004). Thin Slices of Behavior as Cues of Personality and Intelligence. Journal of Personality and Social Psychology, 86(4), 599–614. https://doi.org/10.1037/0022-3514.86.4.599
Breil, S. M., Osterholz, S., Nestler, S., & Back, M. D. (2021). Contributions of nonverbal cues to the accurate judgment of personality traits. In T. D. Letzring & J. S. Spain (Eds.), The Oxford handbook of accurate personality judgment (pp. 195–218). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780190912529.013.13
Brunswik, E. (1956). Perception and the representative design of psychological experiments. University of California Press. https://doi.org/10.1525/9780520350519
Cannata, D., Breil, S. M., Back, M. D., Lepri, B., & O’Hora, D. (2022). Toward an Integrative Approach to Nonverbal Personality Detection: Connecting Psychological and Artificial Intelligence Research. Technology, Mind, and Behavior. https://doi.org/10.31234/osf.io/gm7x4
Cappa, F., Oriani, R., Peruffo, E., & McCarthy, I. (2021). Big data for creating and capturing value in the digitalized environment: unpacking the effects of volume, variety, and veracity on firm performance. Journal of Product Innovation Management, 38(1), 49–67. https://doi.org/10.1111/jpim.12545
Carney, D. R., Colvin, C. R., & Hall, J. A. (2007). A thin slice perspective on the accuracy of first impressions. Journal of Research in Personality, 41(5), 1054–1072. https://doi.org/10.1016/j.jrp.2007.01.004
Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155–159. https://doi.org/10.1037/0033-2909.112.1.155
Condon, D. M., & Revelle, W. (2014). The international cognitive ability resource: Development and initial validation of a public-domain measure. Intelligence, 43, 52–64. https://doi.org/10.1016/j.intell.2014.01.004
Connelly, B. S., & Ones, D. S. (2010). An other perspective on personality: Meta-analytic integration of observers’ accuracy and predictive validity. Psychological Bulletin, 136(6), 1092–1122. https://doi.org/10.1037/a0021212
Connolly, J. J., Kavanagh, E. J., & Viswesvaran, C. (2007). The Convergent Validity between Self and Observer Ratings of Personality: A meta-analytic review. International Journal of Selection and Assessment, 15(1). https://doi.org/10.1111/j.1468-2389.2007.00371.x
de Vet, H. C., Mokkink, L. B., Mosmuller, D. G., & Terwee, C. B. (2017). Spearman–Brown prophecy formula and Cronbach’s alpha: different faces of reliability and opportunities for new applications. Journal of Clinical Epidemiology, 85, 45–49. https://doi.org/10.1016/j.jclinepi.2017.01.013
Diana, F., Juárez-Mora, O. E., Boekel, W., Hortensius, R., & Kret, M. E. (2023). How video calls affect mimicry and trust during interactions. Philosophical Transactions of the Royal Society B, 378(1875), 20210484. https://doi.org/10.1098/rstb.2021.0484
Dufner, M., & Krause, S. (2023). On How to Be Liked in First Encounters: The Effects of Agentic and Communal Behaviors on Popularity and Unique Liking. Psychological Science, 34(4), 481–489. https://doi.org/10.1177/09567976221147258
Epstein, S. (1983). Aggregation and beyond: Some basic issues on the prediction of behavior. Journal of Personality, 51(3), 360–392. https://doi.org/10.1111/j.1467-6494.1983.tb00338.x
Faul, F., Erdfelder, E., Lang, A. G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175–191. https://doi.org/10.3758/BF03193146
Fleming, J. S., & Courtney, B. E. (1984). The dimensionality of self-esteem: II. Hierarchical facet model for revised measurement scales. Journal of Personality and Social Psychology, 46(2), 404–421. https://doi.org/10.1037/0022-3514.46.2.404
Funder, D. C. (1995). On the accuracy of personality judgment: A realistic approach. Psychological Review, 102(4), 652–670. https://doi.org/10.1037/0033-295X.102.4.652
Funder, D. C. (2012). Accurate Personality Judgement. Current Directions in Psychological Science, 21(3), 177–182. https://doi.org/10.1177/0963721412445309
Funder, D. C., & Colvin, C. R. (1991). Explorations in behavioral consistency: Properties of persons, situations, and behaviors. Journal of Personality and Social Psychology, 60(5), 773–794. https://doi.org/10.1037/0022-3514.60.5.773
Funder, D. C., & Ozer, D. J. (2019). Evaluating effect size in psychological research: Sense and nonsense. Advances in Methods and Practices in Psychological Science, 2(2), 156–168. https://doi.org/10.1177/2515245919847202
Funder, D. C., & West, S. G. (1993). Consensus, self-other agreement, and accuracy in personality judgment: An introduction. Journal of Personality, 61(4), 457–476. https://doi.org/10.1111/j.1467-6494.1993.tb00778.x
Furr, R. M., & Funder, D. C. (2007). Behavioral observation. In R. W. Robins, R. C. Fraley, & R. F. Krueger (Eds.), Handbook of research methods in personality psychology (pp. 273–291). Guilford Press.
Furr, R. M., & Funder, D. C. (2018). Persons, situations, and person-situation interactions. In O. P. John & R. W. Robins (Eds.), Handbook of personality: Theory and research (pp. 667–685). Guilford Press.
Giacomin, M., & Jordan, C. H. (2019). Misperceiving grandiose narcissism as self-esteem: Why narcissists are well liked at zero acquaintance. Journal of Personality, 87(4), 827–842. https://doi.org/10.1111/jopy.12436
Goduscheit, R. C., & Faullant, R. (2018). Paths Toward Radical Service Innovation in Manufacturing Companies—A Service-Dominant Logic Perspective. Journal of Product Innovation Management, 35(5), 701–719. https://doi.org/10.1111/jpim.12461
Gosling, S. D., & Mason, W. (2015). Internet Research in Psychology. Annual Review of Psychology, 66, 877–902. https://doi.org/10.1146/annurev-psych-010814-015321
Hinds, J., & Joinson, A. (2019). Human and Computer Personality Prediction From Digital Footprints. Current Directions in Psychological Science, 28(2), 204–211. https://doi.org/10.1177/0963721419827849
Hirschmüller, S., Egloff, B., Krause, S., Schmukle, S. C., Nestler, S., & Back, M. D. (2015). Accurate Judgments of Neuroticism at Zero Acquaintance: A Question of Relevance. Journal of Personality, 83(2), 221–228. https://doi.org/10.1111/jopy.12097
Hirschmüller, S., Schmukle, S. C., Krause, S., Back, M. D., & Egloff, B. (2018). Accuracy of self-esteem judgments at zero acquaintance. Journal of Personality, 86(2), 308–319. https://doi.org/10.1111/jopy.12316
Jacobs, I., & Scholl, W. (2005). Interpersonale Adjektivliste (IAL). Diagnostica, 51(3), 145–155. https://doi.org/10.1026/0012-1924.51.3.145
John, O. P., & Robins, R. W. (1993). Determinants of interjudge agreement on personality traits: The Big Five domains, observability, evaluativeness, and the unique perspective of the self. Journal of Personality, 61(4), 521–551. https://doi.org/10.1111/j.1467-6494.1993.tb00781.x
Kaurin, A., Schönfelder, S., & Wessa, M. (2018). Self-compassion buffers the link between self-criticism and depression in trauma-exposed firefighters. Journal of Counseling Psychology, 65(4), 453–462. https://doi.org/10.1037/cou0000275
Kende, M. (2014). Global Internet Report - Internet Society. http://www.internetsociety.org/map/global-internet-report/
Kenny, D. A. (2004). PERSON: A General Model of Interpersonal Perception. Personality and Social Psychology Review, 8(3). https://doi.org/10.1207/s15327957pspr0803_3
Kirschbaum, C., Pirke, K. M., & Hellhammer, D. H. (1993). The ‘Trier Social Stress Test’–A Tool for Investigating Psychobiological Stress Responses in a Laboratory Setting. Neuropsychobiology, 28(1–2), 76–81. https://doi.org/10.1159/000119004
Koo, T. K., & Li, M. Y. (2016). A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research. Journal of Chiropractic Medicine, 15(2), 155–163. https://doi.org/10.1016/j.jcm.2016.02.012
Krause, S., Back, M. D., Egloff, B., & Schmukle, S. C. (2016). Predicting Self–Confident Behaviour with Implicit and Explicit Self–Esteem Measures. European Journal of Personality, 30(6), 648–662. https://doi.org/10.1002/per.2076
Lehrl, S., Triebig, G., & Fischer, B. (1995). Multiple choice vocabulary test MWT as a valid and short test to estimate premorbid intelligence. Acta Neurologica Scandinavica, 91(5), 335–345. https://doi.org/10.1111/j.1600-0404.1995.tb07018.x
Leising, D., & Borgstede, M. (2020). Hypothetical Constructs. In V. Zeigler-Hill & T. K. Shackelford (Eds.), Encyclopedia of Personality and Individual Differences. Springer. https://doi.org/10.1007/978-3-319-24612-3_679
Letzring, T. D., Murphy, N. A., Allik, J., Beer, A., Zimmermann, J., & Leising, D. (2021). The Judgment of Personality: An Overview of Current Empirical Research Findings. Personality Science, 2(1). https://doi.org/10.5964/ps.6043
Letzring, T. D., & Spain, J. S. (Eds.). (2021). The Oxford handbook of accurate personality judgment. Oxford University Press. https://doi.org/10.1093/oxfordhb/9780190912529.001.0001
Liu, D., & Campbell, W. K. (2017). The Big Five personality traits, Big Two metatraits and social media: A meta-analysis. Journal of Research in Personality, 70, 229–240. https://doi.org/10.1016/j.jrp.2017.08.004
Lukowitsky, M. R., & Pincus, A. L. (2013). Interpersonal Perception of Pathological Narcissism: A Social Relations Analysis. Journal of Personality Assessment, 95(3), 261–273. https://doi.org/10.1080/00223891.2013.765881
McCullagh, P. J., McAllister, H. G., Armstrong, G. A., & Regan, M. F. (1994). PC software library for the production of auditory stimuli in cognitive event-related potential experiments. Computer Methods and Programs in Biomedicine, 45(4), 283–289. https://doi.org/10.1016/0169-2607(94)01588-7
Miller, J. D., Lynam, D. R., Hyatt, C. S., & Campbell, W. K. (2017). Controversies in Narcissism. Annual Review of Clinical Psychology, 13, 291–315. https://doi.org/10.1146/annurev-clinpsy-032816-045244
Morf, C. C., Schürch, E., Küfner, A., Siegrist, P., Vater, A., Back, M., … Schröder-Abé, M. (2017). Expanding the nomological net of the Pathological Narcissism Inventory: German validation and extension in a clinical inpatient sample. Assessment, 24(4), 419–443. https://doi.org/10.1177/1073191115627010
Moshontz, H., Campbell, L., Ebersole, C. R., IJzerman, H., Urry, H. L., Forscher, P. S., … Chartier, C. R. (2018). The Psychological Science Accelerator: Advancing psychology through a distributed collaborative network. Advances in Methods and Practices in Psychological Science, 1(4), 501–515. https://doi.org/10.1177/2515245918797607
Mukherjee, S., & Kumar, U. (2016). Ethical issues in personality assessment. In U. Kumar (Ed.), The Wiley handbook of personality assessment (pp. 415–426). John Wiley & Sons, Ltd. https://doi.org/10.1002/9781119173489.ch30
Murphy, N. A., Hall, J. A., & Colvin, C. R. (2003). Accurate Intelligence Assessments in Social Interactions: Mediators and Gender Effects. Journal of Personality, 71(3), 465–493. https://doi.org/10.1111/1467-6494.7103008
Naumann, L. P., Vazire, S., Rentfrow, P. J., & Gosling, S. D. (2009). Personality judgments based on physical appearance. Personality and Social Psychology Bulletin, 35(12). https://doi.org/10.1177/0146167209346309
Nguyen, M. H., Gruber, J., Fuchs, J., Marler, W., Hunsaker, A., & Hargittai, E. (2020). Changes in Digital Communication During the COVID-19 Global Pandemic: Implications for Digital Inequality and Future Research. Social Media + Society, 6(3). https://doi.org/10.1177/2056305120948255
Ostermann, T., Röer, J. P., & Tomasik, M. J. (2021). Digitalization in psychology: A bit of challenge and a byte of success. Patterns, 2(10), 100334. https://doi.org/10.1016/j.patter.2021.100334
Perry, N. S., Sullivan, T. J., Leo, K., Huebner, D. M., O’Leary, K. D., & Baucom, B. R. W. (2021). Using web-based technologies to increase reach, inclusion, and generalizability in behavioral observation research. Journal of Family Psychology, 35(7), 983–993. https://doi.org/10.1037/fam0000856
Pincus, A. L., Ansell, E. B., Pimentel, C. A., Cain, N. M., Wright, A. G. C., & Levy, K. N. (2009). Initial construction and validation of the Pathological Narcissism Inventory. Psychological Assessment, 21, 365–379. https://doi.org/10.1037/a0016530
Rammstedt, B., Danner, D., Soto, C. J., & John, O. P. (2018). Validation of the Short and Extra-Short Forms of the Big Five Inventory-2 (BFI-2) and Their German adaptations. European Journal of Psychological Assessment, 36(1), 149–161. https://doi.org/10.1027/1015-5759/a000481
Reynolds, D. A. J., Jr., & Gifford, R. (2001). The Sounds and Sights of Intelligence: A Lens Model Channel Analysis. Personality and Social Psychology Bulletin, 27(2), 187–200. https://doi.org/10.1177/0146167201272005
Rosenberg, M. (1965). Society and the Adolescent Self-Image. Princeton University Press. https://doi.org/10.1515/9781400876136
Schmaliy, A. (2020). Der Zusammenhang zwischen Narzissmus und agentischem und kommunalem Sozialverhalten am videographischen Beispiel [Unpublished bachelor’s thesis]. Medical School Berlin.
Schönbrodt, F. D., & Gerstenberg, F. X. R. (2012). An IRT analysis of motive questionnaires: The Unified Motive Scales. Journal of Research in Personality, 46(6), 725–742. https://doi.org/10.1016/j.jrp.2012.08.010
Schütz, A., & Sellin, I. (2006). Die multidimensionale Selbstwertskala (MSWS). Hogrefe.
Settanni, M., Azucar, D., & Marengo, D. (2018). Predicting individual characteristics from digital traces on social media: A meta-analysis. Cyberpsychology, Behavior, and Social Networking, 21(4), 217–228. https://doi.org/10.1089/cyber.2017.0384
Silver, L., Huang, C., & Taylor, K. (2019). In Emerging Economies, Smartphone and Social Media Users Have Broader Social Networks. Pew Research Center.
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2021). Pre-registration: Why and how. Journal of Consumer Psychology, 31(1), 151–162. https://doi.org/10.1002/jcpy.1208
Smartt, T., Talaifar, S., & Gosling, S. D. (2022). Dostoyevsky’s conjecture: Evaluating personality impressions based on laughter. Journal of Nonverbal Behavior, 46(4), 383–397. https://doi.org/10.1007/s10919-022-00408-3
Soto, C. J., & John, O. P. (2017). The next Big Five Inventory (BFI-2): Developing and assessing a hierarchical model with 15 facets to enhance bandwidth, fidelity, and predictive power. Journal of Personality and Social Psychology, 113(1), 117–143. https://doi.org/10.1037/pspp0000096
Tackett, J. L., Herzhoff, K., Kushner, S. C., & Rule, N. (2016). Thin slices of child personality: Perceptual, situational, and behavioral contributions. Journal of Personality and Social Psychology, 110(1), 150–166. https://doi.org/10.1037/pspp0000044
Tett, R. P., & Guterman, H. A. (2000). Situation Trait Relevance, Trait Expression, and Cross-Situational Consistency: Testing a Principle of Trait Activation. Journal of Research in Personality, 34(4), 397–423. https://doi.org/10.1006/jrpe.2000.2292
Tissera, H., Mignault, M.-C., & Human, L. J. (2023). “Zooming” in on positive and accurate metaperceptions in first impressions: Examining the links with social anxiety and liking in online video interactions. Journal of Personality and Social Psychology, 125(4), 852–873. https://doi.org/10.1037/pspp0000457
Todorov, A. (2017). Face Value: The Irresistible Influence of First Impressions. Princeton University Press. https://doi.org/10.1515/9781400885725
Tskhay, K. O., & Rule, N. O. (2014). Perceptions of personality in text-based media and OSN: A meta-analysis. Journal of Research in Personality, 49, 25–30. https://doi.org/10.1016/j.jrp.2013.12.004
Van der Linden, D., te Nijenhuis, J., & Bakker, A. B. (2010). The General Factor of Personality: A meta-analysis of Big Five intercorrelations and a criterion-related validity study. Journal of Research in Personality, 44(3), 315–327. https://doi.org/10.1016/j.jrp.2010.03.003
Vazire, S., & Carlson, E. N. (2010). Self-Knowledge of Personality: Do People Know Themselves? Social and Personality Psychology Compass, 4(8), 605–620. https://doi.org/10.1111/j.1751-9004.2010.00280.x
Vazire, S., & Gosling, S. D. (2004). e-Perceptions: Personality Impressions Based on Personal Websites. Journal of Personality and Social Psychology, 87(1), 123–132. https://doi.org/10.1037/0022-3514.87.1.123
von Collani, G., & Herzberg, P. Y. (2003). Zur internen Struktur des globalen Selbstwertgefühls nach Rosenberg. Zeitschrift für Differentielle und Diagnostische Psychologie, 24(1), 9–22. https://doi.org/10.1024//0170-1789.24.1.9
Wang, C., & Chanel, G. (2021). An Open Dataset for Impression Recognition From Multimodal Bodily Responses. In 2021 9th International Conference on Affective Computing and Intelligent Interaction (ACII) (pp. 1–8). IEEE. https://doi.org/10.1109/ACII52823.2021.9597421
Wani, T. M., Gunawan, T. S., Qadri, S. A. A., Kartiwi, M., & Ambikairajah, E. (2021). A Comprehensive Review of Speech Emotion Recognition Systems. IEEE Access, 9, 47795–47814. https://doi.org/10.1109/ACCESS.2021.3068045
Weidman, A. C., Steckler, C. M., & Tracy, J. L. (2017). The jingle and jangle of emotion assessment: Imprecise measurement, casual scale usage, and conceptual fuzziness in emotion research. Emotion, 17(2), 267–295. https://doi.org/10.1037/emo0000226
Williams, K. D., & Jarvis, B. (2006). Cyberball: A program for use in research on interpersonal ostracism and acceptance. Behavior Research Methods, 38, 174–180. https://doi.org/10.3758/BF03192765
Wu, W., Mitchell, P., & Lv, Y. (2023). Consistency in personality trait judgments across online chatting and offline conversation. Frontiers in Psychology, 14. https://doi.org/10.3389/fpsyg.2023.1077458
Young, S. R., & Keith, T. Z. (2020). An Examination of the Convergent Validity of the ICAR16 and WAIS-IV. Journal of Psychoeducational Assessment, 38(8), 1052–1059. https://doi.org/10.1177/0734282920943455
Zillig, L. M., Hemenover, S. H., & Dienstbier, R. A. (2002). What Do We Assess When We Assess a Big 5 Trait?: A Content Analysis of the Affective, Behavioral, and Cognitive Processes Represented in Big 5 Personality Inventories. Personality and Social Psychology Bulletin, 28(6), 847–858. https://doi.org/10.1177/0146167202289013
Zoom Video Communications, Inc. (2011). Zoom (Version 5.13.3 (11494)) [Computer Software]. https://zoom.us/
This is an open access article distributed under the terms of the Creative Commons Attribution License (4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
