Around the world, hundreds of millions of people have used social chatbots designed to provide companionship to their users. But can people reap genuine social benefits from interacting with chatbots? In Studies 1a&b (pre-registered; N=801), participants shared good news with a supportive or less supportive interaction partner whom they believed was either a chatbot or a human. Participants’ feelings following the interaction were influenced by their partner’s response style, but not by whether their partner was human. In Study 2 (pre-registered, N=201), participants derived more social connection from having a supportive conversation with ChatGPT than with a less supportive human. In a final pre-registered study (Study 3; N=401), we identified an important boundary condition, demonstrating that the relative benefits of interacting with chatbots (vs. humans) may be reduced when they claim too much humanity.
In November 2022, the world was introduced to ChatGPT, ushering in what has been called a Promethean moment for humanity, akin to the introduction of fire (Friedman, 2023). Although its capabilities are seemingly endless, ChatGPT is not programmed to develop a strong emotional bond with its users. Nevertheless, other popular “social chatbots” have been explicitly designed to emotionally connect with humans. For example, hundreds of millions of users have downloaded Replika and Xiaoice, social chatbots designed to offer long-term companionship (Pardes, 2018; Zhou et al., 2020). But is meaningful digital companionship simply a matter of overcoming technological barriers, or are there also fundamental psychological barriers that preclude meaningful relationships with chatbots?
It is possible that even the most sophisticated chatbot may fail to provide meaningful social benefits because chatbots lack a mind. People’s perceptions of both “experience”—the capacity to be conscious, experience emotions, and have a personality—and “agency”—the capacity to exert self-control, have memory, and recognize emotions—are key components of determining whether something has a mind (H. M. Gray et al., 2007). Chatbots lack both experience and agency (for brevity, we refer to the combination as “agency” throughout the paper). As such, chatbots’ inherent lack of agency may prevent their human conversation partners from ever feeling like their digital friend truly “gets them”, precluding the benefits of a social interaction between humans (Morelli et al., 2014; Pollmann & Finkenauer, 2009; Reis et al., 2017). However, according to the Computers Are Social Actors Framework (CASA; Nass et al., 1994; Nass & Moon, 2000; Reeves & Nass, 1996), individuals instinctively perceive and interact with computers as if they are people, perhaps because of our propensity to anthropomorphize (Epley et al., 2008; Waytz et al., 2010). This psychological tendency to “see human” (Epley et al., 2007) could enable chatbots to offer a viable source of social connection.
Of course, up until very recently, human-like artificial intelligence (AI) only existed in the realm of science fiction. To bypass technological constraints and increase experimental control, human-computer interaction researchers typically rely on “Wizard of Oz” methodology: participants are led to believe they are interacting with a digital agent, when in reality the digital agent is controlled behind the scenes by a member of the research team (Dahlbäck et al., 1993). In a study utilizing this methodology, participants shared a personal problem via text with a partner (a research assistant) who they believed was either a chatbot or a human (Ho et al., 2018). The results revealed no significant differences in the emotional benefits derived from the interactions with the “chatbot” vs. “human.” Because the study included fewer than 25 participants per cell, however, the null effect may reflect inadequate power. Alternatively, the null effect may hint towards a human capacity to connect with chatbots, despite their lack of a mind.
Although they lack agency, chatbots offer an important advantage over humans: chatbots can be programmed to respond in a highly supportive manner, whereas not all humans are reliable providers of support. For example, in a study of 59 dating couples (Gable et al., 2004), participants were asked to rate how their partners typically responded when they shared good news (i.e., capitalization; Langston, 1994). While some participants endorsed items such as “my partner usually reacts to my good fortune enthusiastically” (reflecting an active-constructive style), others endorsed statements such as “he/she points out the potential problems or downsides of the good event” (reflecting an active-destructive style). Having a partner who tended to respond in an active-constructive manner was associated with greater relationship satisfaction (Gable et al., 2004). Thus, it is possible that highly supportive chatbots could provide more emotional benefits than less supportive human counterparts.
The Present Research
The present research was motivated by the question of whether the core benefit of chatbots—their ability to respond optimally—could outweigh their core limitation: their lack of a mind. If a conversation partner’s objective level of agency is a pre-requisite for feeling socially connected, people should feel less social connection, rapport and positive emotions from an interaction when they think their partner is a chatbot. We propose, however, that people can derive genuine feelings of connection and happiness from engaging with a supportive partner—even if that partner is an AI. To test this, we led participants to believe they were interacting with either a chatbot or a human, who responded to them in a supportive or unsupportive manner (Studies 1a & 1b). This allowed us to test whether participants derived fewer benefits (e.g., feelings of social connection) from a conversation when they believed their partner was a chatbot rather than a human. Importantly, we were also able to test whether a highly supportive bot could be a more rewarding conversation partner than a less supportive human. In Study 2, we replicated this key test while having participants interact with ChatGPT, rather than simply leading them to believe they were talking to a chatbot. Finally, in Study 3 we examined a theoretically derived boundary condition for the potential relative benefits of chatbot (vs. human) conversation partners.
Studies 1a & 1b
Methods
Studies 1a and 1b were nearly identical in design. Below, we outline the overall design of the two studies while noting where they differ.
Experimental design
All participants shared positive news with a partner via text, and they were randomly assigned to a condition in a 2 (Perceived Conversation Partner) X 2 (Response Style) between-subjects design. They were led to believe their partner was either a human or a chatbot. In reality, the partner was always a research assistant, who was trained to behave in either a highly supportive (i.e., active constructive) or less supportive (i.e., active destructive) manner (see Figure 1 for examples of these response styles). This approach enabled us to ensure that conversation quality was equivalent in both the chatbot and human conditions. As such, we were able to test how believing one’s partner lacked a mind influenced the benefits people gleaned from an interaction. When discussing the chatbot and human conditions, we take the point of view of the participant and describe the conditions as either “interacting with a chatbot” or “interacting with a human”.
Note: The upper row (+) contains highly supportive conversations; the lower row (-) contains less supportive conversations. The left column includes conversations with research assistants, while the right column includes conversations with ChatGPT (Study 2 only). There were no conditions in our studies in which ChatGPT responded in a less supportive manner. As such, the lower right quadrant is empty.
Past research suggests people may interact differently with a chatbot compared to a human (e.g., disclosing more to a chatbot; Lucas et al., 2014). As such, in Study 1a, we only told participants they had talked with a chatbot after they had the interaction. In real life, however, people typically know when they are talking with a chatbot, and so in Study 1b we told participants they would be talking to a chatbot prior to the conversation.
Pre-registration
The pre-registrations for Studies 1a and 1b are available on the Open Science Framework (OSF) at https://tinyurl.com/y58v4hcn (Study 1a) and https://tinyurl.com/tfkhk3cs (Study 1b).
Procedure
The procedures of all studies presented here were approved by our institution’s review board. All participants first completed a consent form and answered demographic questions. Next, participants wrote about a positive event from the past three months that they would be comfortable sharing with a stranger. Participants were then told they would be interacting with a conversation partner in the study via text for 8 minutes. We instructed them to spend the entire interaction discussing their positive experience with their conversation partner. Participants interacted with their text partner using chatplat (www.chatplat.com), a text chat platform that integrates directly into Qualtrics surveys. After the interaction, the participants completed measures of our dependent variables of interest.
As noted, in Study 1a, every participant was told they would be interacting with a fellow participant in the study prior to the interaction. Immediately after the interaction (and prior to completing outcome measures), they were told they actually interacted with either a chatbot (chatbot condition) or a research assistant (human condition); as such, participants in both conditions experienced misdirection. In contrast, participants in Study 1b were told they would be interacting with either a fellow participant (human condition) or a chatbot (chatbot condition) prior to the text interaction and received no additional information about their partner after the interaction ended.
Sample
We pre-registered sample size targets of N = 400 (n = 100/cell) for both Studies 1a and 1b. With this sample size, we had 80% power to detect effect sizes of Cohen’s d = .35 or greater between individual conditions, and main effects of approximately Cohen’s d = .28 or greater. Study 1a was the first study we conducted on this topic, and its sample size was chosen based on resource constraints and what we believed was a reasonable effect size estimate. Because this sample was large enough to detect the effects of interest in Study 1a, we pre-registered the same sample size (i.e., n = 100/cell) for all subsequent studies. The datasets and R code for Studies 1a and 1b (as well as Studies 2-3) can be found at <tinyurl.com/15ycsfxv>.
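The following is an illustrative R sketch (not the code used for the studies) showing how sensitivity figures of this kind can be approximated with the pwr package; the alpha level and the sidedness of each test below are assumptions.

```r
library(pwr)  # assumes the 'pwr' package is installed

# Approximate power for a contrast between two individual cells (n = 100 per cell),
# assuming a one-tailed two-sample t-test at alpha = .05
pwr.t.test(n = 100, d = 0.35, sig.level = .05,
           type = "two.sample", alternative = "greater")

# Approximate power for a main effect, collapsing across the other factor
# (n = 200 per side of the comparison), assuming a two-tailed test at alpha = .05
pwr.t.test(n = 200, d = 0.28, sig.level = .05,
           type = "two.sample", alternative = "two.sided")
```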
Study 1a. We recruited participants from both our university human subjects pool (in exchange for bonus course credit) and adults living in the United States or Canada from Prolific Academic (in exchange for $2.17 USD). Our final sample consisted of 400 participants after exclusions (age: M = 24.47, SD = 8.86; 68% women).
Study 1b. Participants were recruited from Prolific Academic in return for $2.17 USD. Our final sample consisted of 401 participants after exclusions (age: M = 32.67, SD = 11.25; 48% women).
Exclusion criteria
We pre-registered the same exclusion criteria for Studies 1a and 1b. Table 1 contains the number of participants excluded for each pre-registered criterion in each study. First, we excluded participants who were unable to have a conversation due to technical difficulties with the chatplat text platform (e.g., incompatible internet browsers, disconnecting from the internet). Second, we excluded participants who failed our conversation partner manipulation check (e.g., participants in the chatbot condition who said they had interacted with a human). Finally, we excluded participants who failed our response style manipulation check. Specifically, participants answered three yes-or-no questions assessing whether they perceived their conversation partner as highly supportive (e.g., “Did your interaction partner share your excitement about the positive experience you shared?”); participants in the highly supportive condition who answered “no” to all three questions were excluded. Likewise, participants answered three questions assessing whether they perceived their conversation partner as low in supportiveness (e.g., “Did your interaction partner minimize the significance of the positive experience you shared?”); participants in the less supportive condition who answered “no” to all three questions were excluded. A considerable number of people in the less supportive condition were excluded because they did not view the interaction partner as unsupportive, perhaps because their conversation partners were engaged and asked many questions.
| | Technical difficulties preventing conversation | Failed the conversation partner manipulation check | Failed the response style manipulation check |
|---|---|---|---|
| Study 1a | n = 30 | Chatbot condition: n = 17; Human condition: n = 9 | Highly supportive chatbot condition: n = 1; Less supportive chatbot condition: n = 16; Highly supportive human condition: n = 0; Less supportive human condition: n = 10 |
| Study 1b | n = 47 | Chatbot condition: n = 24; Human condition: n = 14 | Highly supportive chatbot condition: n = 1; Less supportive chatbot condition: n = 8; Highly supportive human condition: n = 1; Less supportive human condition: n = 16 |
| Study 2 | n = 8 | Chatbot condition: n = 8; Human condition: n = 12 | Highly supportive chatbot condition: n = 8; Less supportive human condition: n = 28 |
| Study 3 | n = 54 | Chatbot condition: n = 27; Human condition: n = 9 | No exclusions |
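As an illustrative sketch of how exclusion rules like these could be applied in R (the data frame and indicator variables below are placeholders, not the studies' actual data columns):

```r
# Hypothetical indicators: 'technical_failure', 'partner_check_passed',
# 'style' (supportive vs. less_supportive), and counts of "yes" answers to the
# three supportive / less-supportive manipulation-check questions.
keep <- with(dat,
  !technical_failure &                                         # conversation took place
  partner_check_passed &                                       # partner manipulation check
  !(style == "supportive"      & n_supportive_yes   == 0) &    # answered "no" to all three
  !(style == "less_supportive" & n_unsupportive_yes == 0))     # answered "no" to all three
dat_final <- dat[keep, ]
```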
Measures
To assess the effects of each interaction, we measured participants’ feelings of rapport with their conversation partner, as well as their overall feelings of social connection. We also assessed participants’ positive feelings about the experience they shared, as well as their overall positive mood. In Studies 1a and 1b, we pre-registered the same measures of rapport, social connection, and feelings about the positive experience. However, in Study 1a we pre-registered the Positive and Negative Affect Schedule (PANAS; Watson et al., 1988) as our measure of positive mood. In Study 1b, we used the Scale of Positive and Negative Experiences (SPANE; Diener et al., 2009), because it consisted of more relevant positive emotions than the PANAS (see Table 2 for information on all the measures used in the present studies). Additionally, in Study 1b, we included a measure of perceived agency to assess whether participants’ perceptions of agency in their conversation partner mediated the benefits they felt from the interaction. While all participants presumably know that a chatbot objectively lacks agency, this measure captured how participants felt about their partner during the interaction. For example, participants indicated the extent to which they felt their partner “had a mind of their own” and “experienced emotions”. We created composite scores for each of the measures by averaging participants’ responses to the items in each measure, unless they answered fewer than 80% of the items (as pre-registered).
| Measure | α (Study 1a) | α (Study 1b) | α (Study 2) | α (Study 3) | # of items | Sample item | Source |
|---|---|---|---|---|---|---|---|
| Rapport (pre-registered: all studies) | .96 | .97 | .97 | .96 | 15 | I felt _____ toward my partner.^a | *Masked*, in prep |
| Social connection, single-item (pre-registered: Study 2) | | | | | 1 | Compared to how socially connected you normally feel, how socially connected do you feel right now? | Created for study |
| Social connection, multi-item (pre-registered: Studies 1a&b and 3) | .94 | .95 | | .94 | 11 | I had a sense of belonging.^b | Lok & Dunn, 2023 |
| Positive affect: SPANE (pre-registered: Studies 1b, 2, and 3) | | .95 | .94 | .94 | 6 | Joyful.^c | Diener et al., 2009 |
| Positive affect: PANAS (pre-registered: Study 1a) | .90 | | | | | Enthusiastic.^c | Watson et al., 1988 |
| Feelings about experience (pre-registered: Studies 1-2) | .89 | .90 | .89 | | 5 | How excited are you about the experience you identified?^d | Created for present study |
| Perceived agency (pre-registered: Study 1b; exploratory: Studies 2 & 3) | | .78 | .93 | .88 | 4 | I felt like my conversation partner had free will.^e | Epley et al., 2008 |
Note: ^a = “very out of sync” (1) to “very in sync” (6); ^b = “strongly disagree” (1) to “strongly agree” (7); ^c = “not at all” (1) to “extremely” (5); ^d = “not at all excited” (1) to “extremely excited” (10); ^e = “Strongly disagree” (1) to “Strongly agree” (6). Values reported in columns 2-5 are Cronbach’s alpha.
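A minimal R sketch of the composite-scoring rule described above (average the items only for participants who answered at least 80% of them); the function and column names are placeholders rather than the authors' code:

```r
# Average item responses, returning NA when fewer than 80% of items were answered
score_composite <- function(items, min_prop = 0.8) {
  items    <- as.data.frame(items)
  answered <- rowSums(!is.na(items)) / ncol(items)  # proportion of items answered
  out      <- rowMeans(items, na.rm = TRUE)
  out[answered < min_prop] <- NA                    # too much missingness: no composite
  out
}

# e.g., a 15-item rapport composite from hypothetical columns rapport_1 ... rapport_15
# dat$rapport <- score_composite(dat[, paste0("rapport_", 1:15)])
```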
Analytic Strategy
For both Studies 1a and 1b, we pre-registered four 2 (Response Style) X 2 (Conversation Partner) ANOVAs, one for each of our measures of rapport, social connection, positive mood, and feelings about the positive experience. In addition, we pre-registered comparisons between the highly supportive chatbot condition and the less supportive human condition on each of the four pre-registered outcome variables in each study. These comparisons allowed us to test our hypothesis that people could receive more benefits from a highly supportive bot than from a less supportive human. In Study 1b, we additionally pre-registered mediation analyses assessing whether perceived agency mediated/suppressed any of the benefits of interacting with a human vs. a chatbot.
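A minimal R sketch of this analytic strategy for one outcome (rapport); the data frame and variable names are placeholders, not the authors' code:

```r
dat$response_style <- factor(dat$response_style)  # supportive vs. less supportive
dat$partner        <- factor(dat$partner)         # chatbot vs. human

# 2 (Response Style) X 2 (Conversation Partner) between-subjects ANOVA
summary(aov(rapport ~ response_style * partner, data = dat))

# Planned comparison: highly supportive chatbot vs. less supportive human
dat$cell <- interaction(dat$response_style, dat$partner)   # e.g., "supportive.chatbot"
contrast_dat <- droplevels(subset(dat, cell %in% c("supportive.chatbot",
                                                   "less_supportive.human")))
t.test(rapport ~ cell, data = contrast_dat)
```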
Study 1a Results
Pre-registered Analyses
First, we examined the main effects of response style and conversation partner to test whether either of these factors affected how participants felt after the interactions. There was a main effect of response style for three of our four pre-registered outcome variables; participants who interacted with a highly supportive (vs. less supportive) partner reported higher levels of rapport and connection and more positive feelings about the experience they shared (see Table 3). However, there was no main effect of response style for positive mood, suggesting that the PANAS was not sensitive to our response style manipulation. There was no main effect of conversation partner for any of our outcome variables, suggesting that participants who believed they interacted with a chatbot (vs. a human) derived similar emotional benefits from the conversations (see Table 4).
Note: Error bars = standard error of the mean
| | More supportive condition (n = 211), Mean (SD) | Less supportive condition (n = 189), Mean (SD) | F | p | Cohen’s d | 95% CI |
|---|---|---|---|---|---|---|
| Rapport | 4.67 (0.86) | 3.91 (1.03) | F(1,396) = 64.056 | < .001 | 0.81 | [0.60, 1.02] |
| Social connection | 5.19 (1.00) | 4.67 (1.25) | F(1,396) = 21.646 | < .001 | 0.46 | [0.26, 0.66] |
| Positive affect | 2.96 (0.91) | 2.85 (0.89) | F(1,396) = 1.339 | .248 | 0.12 | [-0.08, 0.31] |
| Feelings about positive experience | 6.47 (2.03) | 6.02 (2.06) | F(1,396) = 4.813 | .029 | 0.22 | [0.03, 0.42] |
| | Chatbot condition (n = 198), Mean (SD) | Human condition (n = 202), Mean (SD) | F | p | Cohen’s d | 95% CI |
|---|---|---|---|---|---|---|
| Rapport | 4.22 (1.07) | 4.41 (0.94) | F(1,396) = 2.486 | .116 | 0.19 | [-0.01, 0.38] |
| Social connection | 4.97 (1.19) | 4.92 (1.12) | F(1,396) = 0.563 | .454 | 0.05 | [-0.15, 0.24] |
| Positive affect | 2.91 (0.91) | 2.90 (0.89) | F(1,396) = 0.025 | .875 | 0.01 | [-0.19, 0.21] |
| Feelings about positive experience | 6.19 (2.01) | 6.32 (2.10) | F(1,396) = 0.278 | .598 | 0.06 | [-0.13, 0.26] |
Next, we assessed whether the effects of response style depended on whether participants believed they were interacting with a chatbot or a human. As such, we examined the Conversation Partner X Response Style interaction effect in each of our four 2 X 2 ANOVAs. There was no significant Partner X Response Style interaction for rapport, F(1,396) = 0.002, p = .966, social connection, F(1,396) = 0.000, p = .987, feelings about the experience, F(1,396) = 0.100, p = .752, or positive mood, F(1,396) = 0.277, p = .599. These results suggest participants derived similar feelings of rapport, social connection, and positive feelings from a highly supportive or less supportive partner, regardless of whether that partner was a chatbot or a human.
Lastly, our planned contrasts showed that participants who interacted with a highly supportive chatbot reported significantly higher levels of rapport and social connection than those who interacted with a less supportive human (see Table 5). There were no significant differences in positive mood or feelings about the positive experience they shared, however.
| | More supportive chatbot condition (n = 99), Mean (SD) | Less supportive human condition (n = 90), Mean (SD) | F | p | Cohen’s d | 95% CI |
|---|---|---|---|---|---|---|
| Rapport | 4.60 (0.84) | 3.99 (0.87) | F(1,396) = 19.540 | < .001 | 0.71 | [0.41, 1.01] |
| Social connection | 5.24 (0.99) | 4.62 (1.18) | F(1,396) = 13.823 | < .001 | 0.56 | [0.27, 0.86] |
| Positive affect | 2.94 (0.93) | 2.82 (0.89) | F(1,396) = 0.818 | .366 | 0.13 | [-0.15, 0.42] |
| Feelings about positive experience | 6.38 (1.94) | 6.04 (2.05) | F(1,396) = 1.313 | .253 | 0.17 | [-0.11, 0.46] |
Exploratory Analyses
Due to unequal condition sizes and violations of the homogeneity of variance (HOV) assumption, we conducted exploratory robust ANOVAs on the rapport and social connection variables (the two variables that violated the HOV assumption). These analyses are presented in the supplemental online material (SOM) and were broadly consistent with the pre-registered analyses. We also found consistent results when we conducted intent-to-treat analyses that included the 27 participants who failed our response style manipulation check (see SOM).
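The exact robust procedure is described in the SOM; as one common option, a robust two-way ANOVA on trimmed means can be run in R with the WRS2 package, as sketched below (variable names are placeholders, and this may not be the specific method used):

```r
library(WRS2)

# Robust 2 x 2 ANOVA on 20% trimmed means (tr = 0.2 is the package default)
t2way(rapport ~ response_style * partner, data = dat, tr = 0.2)

# Robust contrasts between the individual cells
mcp2atm(rapport ~ response_style * partner, data = dat, tr = 0.2)
```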
Study 1b Results
Pre-registered Analyses
As in Study 1a, there were significant main effects of response style for three of our four outcome variables (see Table 6). Participants who interacted with a highly supportive partner felt more rapport, social connection, and positive mood than those who interacted with a less supportive partner. However, there was no difference between the two conditions in feelings about the positive experience. Similar to Study 1a, there was no main effect of conversation partner (see Table 7).
| | More supportive condition (n = 210), Mean (SD) | Less supportive condition (n = 191), Mean (SD) | F | p | Cohen’s d | 95% CI |
|---|---|---|---|---|---|---|
| Rapport | 4.92 (0.90) | 3.85 (1.27) | 95.98 | < .001 | 0.98 | [0.76, 1.19] |
| Social connection | 5.11 (1.18) | 4.71 (1.22) | 11.35 | < .001 | 0.34 | [0.14, 0.53] |
| Feelings about positive experience | 6.88 (1.99) | 6.60 (2.00) | 2.01 | .157 | 0.14 | [-0.06, 0.34] |
| Positive affect | 3.75 (0.98) | 3.27 (1.10) | 22.38 | < .001 | 0.47 | [0.27, 0.67] |
| | Chatbot condition (n = 205), Mean (SD) | Human condition (n = 196), Mean (SD) | F | p | Cohen’s d | 95% CI |
|---|---|---|---|---|---|---|
| Rapport | 4.38 (1.26) | 4.45 (1.16) | 0.65 | .419 | 0.06 | [-0.14, 0.25] |
| Social connection | 4.89 (1.22) | 4.95 (1.21) | 0.32 | .573 | 0.05 | [-0.15, 0.25] |
| Feelings about positive experience | 6.76 (1.98) | 6.72 (2.02) | 0.03 | .872 | 0.02 | [-0.18, 0.21] |
| Positive affect | 3.47 (1.10) | 3.57 (1.03) | 1.17 | .281 | 0.10 | [-0.10, 0.29] |
There was no significant Partner X Response Style interaction for feelings of rapport, F(1,397) = 1.85, p = .174, or positive mood, F(1,397) = 1.76, p = .185. However, there was a significant interaction for feelings about the positive experience, F(1,397) = 8.65, p = .003, and social connection, F(1,396) = 4.95, p = .027 (see Figure 3). To illuminate the nature of the two significant interaction effects, we conducted simple effects analyses comparing the chatbot and human partners within the highly supportive and less supportive conditions. Participants who interacted with a highly supportive human (vs. a highly supportive chatbot) felt marginally better about their positive experience (p = .056, d = 0.26) and had marginally higher levels of social connection (p = .052, d = 0.27). Participants who interacted with a less supportive human (vs. a less supportive bot) felt significantly worse about their positive experience (p = .026, d = 0.32), but not significantly less socially connected (p = .22, d = 0.17). Thus, interacting with a highly supportive partner was marginally more rewarding when the partner was a human (vs. a bot), but interacting with a less supportive partner was also significantly more detrimental when the partner was a human (vs. a bot). These results suggest that people’s feelings may be influenced more strongly—both for better and for worse—by a human than by a bot.
Note: Error bars = standard error of the mean.
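A sketch of how simple effects like those reported above can be obtained in R with the emmeans package (the model and variable names are placeholders, not the authors' code):

```r
library(emmeans)

fit <- lm(social_connection ~ response_style * partner, data = dat)

# Estimated marginal means for partner within each response-style condition
emm <- emmeans(fit, ~ partner | response_style)

pairs(emm)                                                # chatbot vs. human simple effects
eff_size(emm, sigma = sigma(fit), edf = df.residual(fit)) # standardized (d-type) effect sizes
```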
Based on the results of Study 1a, we pre-registered four one-tailed t-tests to assess whether participants benefited more from interacting with a highly supportive bot than from interacting with a less supportive human. One-tailed t-tests are an effective way to increase statistical power, and pre-registering one-tailed t-tests is recommended when researchers are making directional predictions (Lakens, 2016). Our results revealed that participants who interacted with a highly supportive chatbot reported significantly higher levels of rapport (p < .001), social connection (p = .025), and more positive mood (p = .005) than those who interacted with a less supportive human. There was no significant difference in participants’ feelings about their positive experience (p = .11; see Table 8).
| Outcome Variable | More supportive chatbot (n = 109), Mean (SD) | Less supportive human (n = 95), Mean (SD) | t | df | p | Cohen’s d | 95% CI |
|---|---|---|---|---|---|---|---|
| Rapport | 4.81 (0.96) | 3.82 (1.16) | 6.59 | 183.51 | < .001 | 0.94 | [0.63, 1.24] |
| Social connection | 4.96 (1.28) | 4.60 (1.27) | 1.97 | 198.01 | .025 | 0.28 | [-0.01, 0.56]^a |
| Positive affect | 3.64 (1.05) | 3.25 (1.07) | 2.62 | 197.31 | .005 | 0.37 | [0.08, 0.65] |
| Feelings about positive experience | 6.63 (2.05) | 6.27 (2.07) | 1.22 | 197.72 | .112 | 0.17 | [-0.11, 0.45] |
Note: ^a = t-test is one-tailed; as such, the 95% CI includes 0, even though the t-test is significant.
Perceived agency. We were interested in whether perceptions of agency mediated/suppressed any of the benefits participants derived from talking to a human vs. a chatbot. To this end, we first examined whether people perceived different levels of agency in the chatbot and human conditions. Notably, participants did not perceive significantly lower levels of agency in a chatbot (M = 4.46, SD = 1.23) than in a human conversation partner (M = 4.58, SD = 1.16), t(398.93) = 0.99, p = .322 (two-tailed). It appears that the human capacity for anthropomorphism enabled participants to perceive agency in a chatbot who responded in a human-like manner. Because there was no difference between conditions in perceived agency, this variable did not mediate the effects of conversation partner (chatbot vs. human) for any of our pre-registered outcome variables (ps > .26), even though perceived agency was positively correlated with all of our outcome variables (see Table 9). However, participants who interacted with a highly supportive partner did perceive marginally higher levels of agency (M = 4.62, SD = 1.26) in their conversation partner than those who interacted with a less supportive partner (M = 4.40, SD = 1.11), t(398.55) = 1.82, p = .069 (two-tailed). As such, perceptions of agency may be more influenced by response style than by the humanity of one’s partner.
| | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1. Perceived agency | 1 | | | | |
| 2. Rapport | .45^a | 1 | | | |
| 3. Social connection | .34^a | .61^a | 1 | | |
| 4. Positive affect | .26^a | .59^a | .62^a | 1 | |
| 5. Feelings about positive experience | .27^a | .32^a | .43^a | .52^a | 1 |
Note: ^a = p < .001
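A sketch of the pre-registered mediation analysis described above, using the mediation package in R (model specifications and variable names are placeholders rather than the authors' code):

```r
library(mediation)

# Path a: does conversation partner (chatbot vs. human) predict perceived agency?
med_fit <- lm(perceived_agency ~ partner, data = dat)

# Paths b and c': outcome regressed on partner and the proposed mediator
out_fit <- lm(social_connection ~ partner + perceived_agency, data = dat)

# Indirect (ACME), direct (ADE), and total effects via nonparametric bootstrap
med <- mediate(med_fit, out_fit, treat = "partner",
               mediator = "perceived_agency", boot = TRUE, sims = 5000)
summary(med)
```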
Exploratory Analyses
As in Study 1a, due to HOV violations for our rapport variable, we conducted exploratory robust analyses (see SOM), which were consistent with the pre-registered analyses presented above. We also found consistent results for all of our variables when conducting intent-to-treat analyses that included the 26 participants who were excluded from the pre-registered analyses for failing the response style manipulation check.
Studies 1a & 1b Discussion
Studies 1a and 1b suggest that the benefits of a conversation (e.g., feelings of rapport and social connection) depend more on the response style of one’s conversation partner than on their humanity. In both studies, participants who interacted with a highly supportive chatbot felt more rapport and social connection than those who interacted with a less supportive human. In Study 1a, we controlled for the possibility that people may interact differently with a chatbot by only telling participants they had interacted with a chatbot after the conversation had ended. Of course, participants may have been reluctant to update their impressions of the conversation given that they initially believed they were talking to a human. To eliminate this problem in Study 1b, we told participants they were talking to a chatbot (or human) prior to the conversation. Thus, taken together, the two studies provide complementary evidence that people can reap social connection, rapport, and positive feelings from interactions with chatbots. While Studies 1a and 1b were conducted prior to the release of ChatGPT, in Study 2 we harnessed this new AI to test whether having a supportive conversation with ChatGPT would be more rewarding than interacting with a less supportive human.
Study 2
Method
Experimental Design
The design of Study 2 was similar to that of Study 1, except we included only the highly supportive chatbot condition and the less supportive human condition (rather than the full 2 X 2 design). It was not possible to include a less supportive chatbot condition, because ChatGPT is pre-programmed to avoid unsupportive responses.
In the highly supportive chatbot condition, participants interacted with ChatGPT, but a research assistant mediated this interaction by copying the participants’ messages, pasting them into a conversation with ChatGPT, and then sending ChatGPT’s response. We did this to ensure that participants in both conditions used the same chat platform. Prior to every interaction with ChatGPT, we prompted ChatGPT to respond in a highly supportive manner similar to the response styles used in Studies 1a and 1b (see SOM for more details). Importantly, even though research assistants were mediating the interaction between the participants and ChatGPT, the research assistants did not alter ChatGPT’s responses. In the less supportive human condition, participants interacted with a research assistant trained to respond in a less supportive manner, as in Studies 1a and 1b.
Procedure
As in Study 1b, participants were told they would be interacting with a chatbot (chatbot condition) or a fellow participant (human condition) immediately prior to the text interaction. In both conditions, participants interacted through text with their conversation partner using the same chat platform as in Study 1. After the interaction, participants completed our outcome measures of interest.
Pre-registration
The pre-registered sample size target, analysis plan, and exclusion criteria for Study 2 are available on the OSF at https://tinyurl.com/mt4snpvt.
Sample
We recruited adults living in the United States or Canada from Prolific Academic (in exchange for $2.67 USD). Because Study 2 only contained two of the four conditions used in Studies 1a and 1b, we pre-registered a sample size half as large (N = 200). Our final sample consisted of N = 201 participants (Age: M = 37.71, SD = 11.68; 33% women). This gave us 80% power to detect an effect of d = 0.35.
Exclusion criteria
Consistent with Study 1, we excluded participants based on our pre-registered criteria. The most common reason for exclusion was that participants did not perceive the less supportive response style as unsupportive (see Table 1).
Measures
We pre-registered the same measures as Study 1b, except we used a single-item measure of social connection (see Table 2), and we did not pre-register any analyses involving perceived agency (see SOM for exploratory analyses examining agency).
Results
In contrast to the previous studies, the chatbot condition in Study 2 required research assistants to copy and paste messages between the conversation platform and ChatGPT. As such, we tested whether this caused a delay in how quickly research assistants were able to respond to participants in the chatbot condition relative to the human condition. Interestingly, research assistants responded to participants faster in the chatbot condition (Mean delay = 22.37 seconds; SD = 11.06 seconds) than in the human condition (Mean delay = 31.48 seconds; SD = 20.63 seconds). This difference represents a potential benefit of chatbots in that they can respond to messages more quickly than human counterparts. However, it is also worth noting that there were no significant differences in the number of messages participants sent in the chatbot (M = 8.10, SD = 3.39) and human conditions (M = 8.71, SD = 3.32), t(190.31) = 1.28, p = .20.
Pre-registered analyses
We conducted four pre-registered one-tailed independent-samples t-tests to assess whether participants who interacted with the highly supportive chatbot derived more rapport, social connection, and positive feelings than those who interacted with the less supportive human. Participants who interacted with the highly supportive chatbot reported significantly higher levels of rapport (p < .001) and social connection (p = .047; see Table 10). However, there were no significant differences between the two groups in positive mood (p = .140) or feelings about the positive experience (p = .200).
| Outcome Variable | More supportive chatbot (n = 111), Mean (SD) | Less supportive human (n = 90), Mean (SD) | t | df | p | Cohen’s d | 95% CI |
|---|---|---|---|---|---|---|---|
| Rapport | 4.86 (0.93) | 4.13 (1.17) | 4.86 | 167.76 | < .001 | 0.71 | [0.42, 0.99] |
| Social connection | 1.32 (2.11) | 0.84 (1.87) | 1.68 | 197.41 | .047 | 0.24 | [-0.04, 0.51]^a |
| Positive affect | 3.56 (1.12) | 3.39 (1.00) | 1.08 | 197.24 | .140 | 0.15 | [-0.13, 0.43] |
| Feelings about positive experience | 6.56 (2.12) | 6.32 (1.95) | 0.84 | 195.84 | .200 | 0.12 | [-0.16, 0.40] |
Note: ^a = t-test is one-tailed; as such, the 95% CI includes zero despite the effect being statistically significant.
Study 2 Discussion
The results of Study 2 suggest that existing chatbot technology can be harnessed to produce supportive responses, yielding greater feelings of rapport and social connection than people would get from interacting with less supportive humans.
Study 3
The conversations we examined in Studies 1-2 may have been well-suited for observing positive effects of interacting with a chatbot, in that only the participants engaged in self-disclosure. In contrast, the benefits of chatbots may be limited in other forms of conversation that require the chatbot to share information about itself. According to classic theorizing, robots that exhibit too much humanity may fall into an “uncanny valley,” generating unfavorable reactions compared to less humanlike robots (Mori, 1970; Mori et al., 2012; see Wang et al., 2015 for a review). According to Gray and Wegner’s (2012) mind perception hypothesis, humanlike robots provoke uncanny feelings and reduced affinity because people perceive subjective experience in the robots, while at the same time recognizing that such subjective experience is impossible. Thus, in Study 3, we tested whether the benefits of interacting with a chatbot (vs. human) would be reduced when the chatbot shared its own positive news.
Experimental Design
This study used a 2 (Conversation Type) X 2 (Conversation Partner) design. Participants either shared a positive experience from their life with a partner (sharer condition) or listened to their partner share positive news (listener condition). Additionally, participants were told they would be interacting with either a chatbot or a fellow participant. As in Studies 1a & 1b, however, participants always interacted with a trained research assistant.^1 Thus, we were able to control for conversation quality between conditions, allowing us to isolate the effect of believing one’s partner was a chatbot (vs. a human). When participants shared positive news, their partner always responded in a highly supportive manner (replicating the highly supportive condition in previous studies). In the listener condition, the research assistants shared a positive life event from a pre-specified list, based on real participants’ responses in the previous studies, ensuring the content of the conversations was similar across the sharing and listening conditions. Although it may seem unusual to ask an inanimate chatbot about its daily life, in a survey of existing chatbot users that we conducted, 20% of respondents reported that they “asked their chatbot about its day”, suggesting this is not an uncommon behavior among chatbot users (see Table S1 in the SOM).
Procedure
Similar to previous studies, all participants identified a positive experience from their life, and were told prior to the interaction that they would be interacting with a chatbot (chatbot condition) or a fellow participant (human condition). In addition, participants were instructed to either share and discuss their own positive experience (sharer condition), or to discuss their partner’s positive experience (listener condition). After the interaction, participants completed the outcome measures of interest.
Pre-registration
The pre-registered sample size target, analysis plan, and exclusion criteria for Study 3 are available on the OSF at https://tinyurl.com/53bpw85h.
Sample
We recruited participants from Prolific Academic for $2.17 USD. We pre-registered the same sample size target (N = 400) as in Study 1, which also used a 2 X 2 design. After exclusions, our final sample size consisted of 401 participants (Age: M = 31.59, SD = 11.78; 64% women). With this sample size, we again had 80% power to detect effect sizes of Cohen’s d = .35 or greater between the individual conditions, and main effects of approximately Cohen’s d = .28 or greater.
Exclusion Criteria
See Table 1 for exclusion details for Study 3. The most common reason participants were excluded stemmed from technical difficulties that prevented participants from having a conversation.
Measures
In contrast to previous studies, we did not pre-register analyses examining feelings about the positive experience, as this measure was not relevant for the listener conditions (because participants did not discuss their own positive news). The list of pre-registered measures for Study 3 can be found in Table 2.
Pre-registered Hypotheses and Analytic Plan
We had two pre-registered hypotheses. Consistent with our previous findings, we expected that participants who shared positive news with a chatbot would not report significantly different levels of rapport, social connection, or positive mood compared to those who shared with a human conversation partner. However, we hypothesized that participants who listened to a chatbot share news with them would reap significantly fewer benefits compared to participants who listened to a human share positive news.
To test these hypotheses, we conducted 2 (Conversation Type) X 2 (Conversation Partner) ANOVAs on our three pre-registered dependent variables. However, as pre-registered, the key tests of our two hypotheses were the comparisons between the human and chatbot conversation partners within each of the listener and sharer conditions. We focused on these two comparisons in our pre-registration, as opposed to the overall interaction, given the very large sample sizes required to detect attenuated interactions (often in the thousands; Blake & Gangestad, 2020). As in previous studies, we pre-registered one-tailed tests to maximize power for our directional predictions.
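A minimal R sketch of these two key within-condition comparisons (variable names are placeholders, and the sidedness shown for each test is an assumption):

```r
sharers   <- subset(dat, conversation_type == "sharer")
listeners <- subset(dat, conversation_type == "listener")

# Hypothesis 1: no predicted difference between partners among sharers (two-tailed here)
t.test(rapport ~ partner, data = sharers)

# Hypothesis 2: directional prediction that listening to a human feels better than
# listening to a chatbot; relevel so "human" is the first group, then test human > chatbot
listeners$partner <- relevel(factor(listeners$partner), ref = "human")
t.test(rapport ~ partner, data = listeners, alternative = "greater")
```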
Results
Pre-registered Analyses
There were no main effects of conversation type, meaning people felt about the same regardless of whether they shared with their partner or listened to their partner share. Similar to Studies 1a and 1b, there were no main effects of conversation partner, meaning people did not feel significantly more rapport, social connection, or positive mood after a conversation with a human vs. a chatbot (see Figure 4 and Tables 11 & 12). As we anticipated in our pre-registration, the Conversation Type X Conversation Partner interaction did not reach significance for any of our outcome variables (ps > .22; see Figure 4). Consistent with our pre-registered prediction, participants who shared a positive experience with a chatbot (vs. a human) did not significantly differ in their levels of rapport (p = .845), social connection (p = .560), or positive mood (p = .773; see Table 13). As pre-registered, the key test of our second hypothesis was whether participants listening to a human (vs. a chatbot) share an experience reported greater benefits from the conversation. In line with our prediction, participants who listened to a human share a positive experience felt more rapport (p = .032) and positive affect (p = .041) than those who listened to a chatbot. However, there was no significant difference in social connection (p = .169; see Table 14).
| | Listener condition (n = 195), Mean (SD) | Sharer condition (n = 206), Mean (SD) | F | p | Cohen’s d | 95% CI |
|---|---|---|---|---|---|---|
| Rapport | 4.76 (0.96) | 4.80 (0.93) | 0.22 | .637 | 0.04 | [-0.15, 0.24] |
| Social connection | 5.14 (1.07) | 5.06 (1.21) | 0.55 | .458 | 0.08 | [-0.12, 0.27] |
| Positive affect | 3.55 (0.94) | 3.66 (1.00) | 1.26 | .262 | 0.11 | [-0.09, 0.31] |
| | Chatbot condition (n = 199), Mean (SD) | Human condition (n = 202), Mean (SD) | F | p | Cohen’s d | 95% CI |
|---|---|---|---|---|---|---|
| Rapport | 4.72 (0.97) | 4.85 (0.92) | 2.13 | .146 | 0.14 | [-0.05, 0.34] |
| Social connection | 5.04 (1.15) | 5.16 (1.14) | 1.14 | .286 | 0.11 | [-0.09, 0.30] |
| Positive affect | 3.54 (0.95) | 3.67 (0.99) | 1.92 | .166 | 0.14 | [-0.06, 0.33] |
Note: Error bars = standard error of the mean
| Outcome Variable | Sharing with chatbot (n = 104), Mean (SD) | Sharing with human (n = 102), Mean (SD) | t | df | p | Cohen’s d | 95% CI |
|---|---|---|---|---|---|---|---|
| Rapport | 4.79 (0.95) | 4.82 (0.91) | 0.20 | 203.89 | .845 | 0.03 | [-0.25, 0.30] |
| Social connection | 5.01 (1.23) | 5.11 (1.19) | 0.58 | 203.98 | .560 | 0.08 | [-0.19, 0.35] |
| Positive affect | 3.64 (0.97) | 3.68 (1.04) | 0.29 | 202.36 | .773 | 0.04 | [-0.23, 0.31] |
| Outcome Variable | Listening to chatbot (n = 95), Mean (SD) | Listening to human (n = 100), Mean (SD) | t | df | p | Cohen’s d | 95% CI |
|---|---|---|---|---|---|---|---|
| Rapport | 4.63 (0.98) | 4.89 (0.93) | 1.87 | 191.08 | .032 | 0.27 | [-0.02, 0.55] |
| Social connection | 5.07 (1.07) | 5.22 (1.08) | 0.96 | 192.73 | .169 | 0.14 | [-0.14, 0.42] |
| Positive affect | 3.44 (0.92) | 3.67 (0.94) | 1.75 | 192.74 | .041 | 0.25 | [-0.03, 0.53] |
Note: ^a = t-test is one-tailed; as such, the 95% CI includes 0, even though the t-test is significant.
Study 3 Discussion
Study 3 highlighted an important boundary condition for the benefits of interacting with chatbots compared to human conversation partners. Although participants who shared positive news derived similar benefits from interacting with what they thought was a chatbot or a human (replicating our earlier studies), they derived significantly fewer benefits from listening to the chatbot (vs. human) share positive news.
General Discussion
The present research suggests that people can potentially derive benefits from interacting with a partner whom they know to be inanimate. In Studies 1a & 1b, participants who believed their conversation partner was a chatbot reported similar levels of social connection, rapport, and positive feelings as those who believed their partner was human. In Study 2, participants found a supportive interaction with ChatGPT more rewarding than a less supportive interaction with a human. At the same time, the results of Study 3 suggest that when chatbots “go too far” in their human-like behavior by sharing their own good news, the benefits of such interactions no longer match those of interactions with human counterparts.
With the rise of generative AI, scholars have debated whether chatbots can provide a viable source of social support. For example, Perry (2023) argued that “AI can learn to say the right words—but knowing that AI generated them demolishes any potential for sensing that joy or pain is genuinely being shared” (p. 1808). The findings here cast doubt on this assumption that simply knowing one is talking to an AI precludes any potential for connection and closeness. Across our studies, the response style of one’s conversation partner was far more impactful than whether people believed they were talking to a chatbot or a human. Indeed, people felt more socially connected to a supportive version of ChatGPT than a less supportive human partner.
Of course, it would not be surprising if a chatbot was a better conversation partner than a highly toxic human counterpart. Importantly, however, research assistants in our “less supportive” condition were not trained to behave in a toxic manner; even in this condition, research assistants responded to participants’ good news in a highly engaged manner. Indeed, across studies, participants in the “less supportive” conditions reported experiencing levels of rapport that were above the midpoint of the scale.
While we assume that participants objectively understood that chatbots lack conscious experience, they apparently still felt as though their chatbot had a mind. Given the variety of research documenting humans’ propensity to anthropomorphize (e.g., Epley et al., 2008; Heider & Simmel, 1944), it should perhaps be no surprise that human beings readily perceive agency in a chatbot who responds to their good news with interest and personalized engagement. Consistent with the mind perception hypothesis (K. Gray & Wegner, 2012; Wang et al., 2015), however, chatbots were less satisfying partners when they shared their own news, creating an obvious violation of their inherent lack of experience (Study 3). The findings from Study 3 are relevant for thinking about how design choices could impact the well-being of chatbot users. A growing body of research has examined how design choices for digital agents can influence users’ preferences for such technology, but this work has tended to focus on customer service contexts. Such research suggests customers prefer to use more human-like (vs. less human-like) customer service chatbots (see Blut et al., 2021 for a review). In contrast, the results of our Study 3 suggest that there may be a limit to how human-like users want their social chatbots to act.
Importantly, our findings do not suggest that people should seek out a supportive chatbot over a supportive human. Just as ChatGPT may be more helpful for a “C” student than an “A” student in writing an essay, digital companionship may have more to offer for those with unsupportive (vs. supportive) friends and family. Given that nearly 25% of American adults reported having no one in their lives to discuss important matters with (McPherson et al., 2006), chatbots could fill an important social niche. Indeed, when we surveyed 208 social chatbot users (see SOM), over 70% indicated that their chatbot was a meaningful source of social connection in their lives.
Interestingly, the present research points to the possibility that the emotional impact of a conversation—positive or negative—might be greater when one’s partner is a human compared to a chatbot. In Study 1b, participants felt marginally better about their positive experience (p = .056) when they shared it with a supportive human (vs. a supportive chatbot); conversely, they felt significantly worse about the experience when they shared it with a less supportive human (vs. a less supportive chatbot). This finding should be interpreted as preliminary, but it is ripe for future research.
The present research represents a starting point and is marked by several key limitations. In particular, because we recruited participants from Prolific Academic and our university’s subject pool, they might have been more comfortable with technology than the average person, enabling them to connect more easily with chatbots. Future work should assess whether the findings here generalize to a broader sample of participants with a variety of experiences with technology. Future research should also further explore the benefits of chatbots by utilizing more ecologically valid control conditions. In the present set of studies, participants in the human conditions had brief text interactions with complete strangers. Although this comparison provided tight experimental control—enabling us to vary only the humanity of participants’ conversation partners—text interactions with strangers may be rare in daily life. Interestingly, Drouin et al. (2022) recently compared the benefits of face-to-face interactions with strangers versus text conversations with the chatbot Replika. In line with our findings here, participants who interacted with a stranger face-to-face did not significantly differ in their level of positive mood compared to those who had a text interaction with Replika. Participants who had a face-to-face conversation did, however, feel significantly more negative emotions than those who interacted with the chatbot. Still, it is an open question how interactions with chatbots would compare to more enjoyable social interactions, such as conversations with close friends.
The long-term consequences of social chatbot use are also largely unknown, and our studies only speak to the immediate effects of interacting with chatbots. Past research suggests individuals can reap mental health benefits from repeated interactions with therapy chatbots (Fitzpatrick et al., 2017; Inkster et al., 2018; Oh et al., 2020), but these chatbots were designed to implement therapeutic interventions as opposed to simply providing social companionship. While Studies 1 and 2 suggest that chatbots could meaningfully satisfy the need to belong in the short-term (Baumeister & Leary, 1995), our findings in Study 3 show that chatbots may only be effective social companions when the conversations are relatively one-sided, precluding the kind of mutual self-disclosure that contributes to long-term relationship development (Greene et al., 2006). Nevertheless, social chatbots could still be a useful “social snack” (Gardner et al., 2005) that provides momentary hits of social connection in times of loneliness (Krämer et al., 2018). Alternatively, they may simply be “empty calories” that offer shallow social satiation while preventing people from developing human relationships that provide deeper satisfaction. Given the speed at which chatbots have become ubiquitous in our lives, there is an urgent need for more research on this new form of social relationship.
Contributions
Contributed to conception and design: DF, SY, EW
Contributed to acquisition of data: DF, SY, EW
Contributed to analysis and interpretation of data: DF, EW
Drafted and/or revised the article: DF, SY, EW
Approved the submitted version for publication: DF, SY, EW
Funding Information
This work was supported by a grant from the Social Sciences and Humanities Research Council of Canada (grant #: GR012572).
Competing Interests
The authors have no competing interests.
Supplemental Material
Additional analyses and data are included in the supplemental material PDF provided with the manuscript.
Data Availability Statement
The datasets and R code for Studies 1-3 are available at https://tinyurl.com/4rh6kcte. The pre-registrations are available at https://tinyurl.com/y58v4hcn (Study 1a), https://tinyurl.com/tfkhk3cs (Study 1b), https://tinyurl.com/mt4snpvt (Study 2), https://tinyurl.com/53bpw85h (Study 3).
Footnotes
1. This study was conducted from October to December 2021, prior to the advent of ChatGPT.