The following article gives a short overview of some functions of vocal sounds in video games. The sound of voices contributes to the realization of fictitious game worlds, since it makes the fictitious world appear significantly more real. I briefly cover the atmospheric and emotional function of the sound of voices as well as how they are utilized in video games for supporting characters. In lieu of discussing dialogues and linguistically conveyed information, I focus on the sound of voices and their influence in generating feelings and moods, and thus how they contribute to a deepening of the immersion of the player. These considerations are based on Michel Chion's concept that sound may have an added value—the recipient assigns a special meaning to a sound, which enriches the audiovisual experience. For this purpose, I analyze a number of games in short case studies in regard to their utilization of vocal sounds with added value. The research is further contextualized through Karen Collins's concept of embodied cognition, as discussed in her seminal work Playing with Sound: A Theory of Interacting with Sound and Music in Video Games and “Making Gamers Cry: Mirror Neurons and Embodied Interaction with Game Sound.”
Video games tend not to run linearly but rather are programmed to respond to the player as the action of the gameplay unfolds and locations are encountered within the storyline (among other factors). Sound has always played an important role in video games.1 It gives information and reveals emotions.2
In the audiovisual experience of a game, players are not necessarily focused on the perception of sound. They are playing, and due to the enormous number of stimuli and demands that they are exposed to during the gameplay, their attention is limited. Perception is thus always heuristic,3 given that players are not primarily actively listening but are busy with the gameplay. In spite of this, it has been shown that sound has an effect on passively listening audiences.4
Over time, the complexity of sound in games has increased, often driven by the aim of creating more effective immersion and ultimately promoting a flow state for the user.5 Sounds (music, sound effects, voices, and speech) support the narrative of the game and help to present the game world.6 Players are oriented through sound, and additionally their gameplay is supported by sonic features that can be important for events unfolding within the drama. Sound actively supplements players' actions using auditory feedback, and though what is heard may not always be consciously perceived, it is registered as part of the overall experience of the game. Sound completes the game world with the inclusion of sounds related to movement (steps on different surfaces, engine sounds), environmental sounds (human, animal, or monster sounds; water and weather; natural, artificial, or unidentifiable sounds), combat sounds (various firearms, explosions, swords swooshing through the air or clashing against each other), and much more.7 As American composer John Cage suggests: “I have the feeling that sound is acting, and I love the activity of sound.”8 Certainly games, then, seem filled with the activity of sound. In this article I focus specifically on the sound of voices within gameplay.
Many fictional game worlds are populated by various characters who make up and define a scene and universe. They can inform the atmosphere, support a couleur locale,9 and provide the gamers with pertinent hints relevant to the gameplay. Characters may enter into dialogue with the player's avatar or be heard without any direct contact. Off-screen voices may also be heard.10 Spoken language conveys information, but speech is not the only vehicle for information dissemination in games. Moreover, both tonal and structural features of spoken language possess the same relevance as linguistic semantics.11
When we hear a voice, we always have some idea about the state of the speaker.12 Therefore, in games, radio plays, films, etc., certain stereotypes are used that vocally correspond most closely to the character. In humans, auditory perception begins with the fetus in the womb. In addition to sounds produced by the mother's body such as digestion and heartbeat, voices are among the first sounds a fetus perceives. Even if the meaning of vocal statements cannot be understood, voices are a fundamental sound that surrounds the hearing person from the beginning. As babies and throughout childhood, we are surrounded by voices. But before we learn the meaning of something spoken, we learn to distinguish between different moods that a voice suggests,13 given that vocal sounds are sensorial and may invoke our imaginations.
Voices are information mediators; they allow for a semantic, content-related function—that which is said and conveyed. But they also have an expressive, meaningful function—how the conveyed information is expressed. There is then a play-immanent information level to vocal sound. By this I mean that the two primary questions of gameplay in the auditory domain are: 1) Whose voice can be heard, and from which direction relative to the player? and 2) What does the statement (inclusive of the semantics) mean for the players and their gameplay?
Before I examine particular games as case studies, I will first outline the concepts of added value and embodied cognition and empathy. My examination of voice sound in games/gaming situations is based on these aspects.
In video games, the correct classification or identification of a sound is essential, especially if it indicates an imminent danger. The player most often develops recognition and interpretation of noises and voices during play, since players may not anticipate the sound a given adversary makes before having played the game. During the game, players are presented with auditory hints and sounds that they can later rely on for helpful information. Critical events within the story of the game are often characterized by sounds or music suited to the nature of what is being conveyed. A representative background noise or cue can therefore help players better orientate themselves within the fictitious world. The sound depends on the game, its genre, and its contents, along with their respective acoustic counterparts.14 As Collins suggests, “Immersion can come as much from the physical effects of sound, as from their affects.”15 Sound therefore gives the game an added value,16 because it directly and immediately influences the perception of the fiction and makes it more credible.
Michel Chion suggests that sound provides an “added value” (valeur ajoutée). I want to transfer his idea to sound in games, particularly those in which “the image is the conscious focus of attention, but where sound at every moment brings a number of effects, sensations, and significations.”17 In such audiovisual experiences, one has the impression that the connection between sound and image is natural and plausible. But sound also adds something more: it can help create moods and emotions, and it can convey information about situations, characters, locations, and much more. As Kristine Jørgensen argues, context is important given that “a specific sound cannot be comprehended in isolation, but that the situation in which it is heard always decides the interpretation of the informative content of the sound signal.”18 Moreover, particular sounds may be charged with special meaning and point to something that is not yet visible. This phenomenon of added value allows gameplay to emerge from the necessary relationship between what a player sees and what they hear. Especially if players hear voices in an in-game situation, it is (in many situations) very important to listen to them, because voices can provide not only information relevant to the story but also additional information beyond what has been said, if players hear how something is said and which emotions are conveyed by this voice.
Because sound in a virtual environment may provide important information, special attention has been paid to acoustic stimuli in video game design, such as the use of sound as a warning device to alert the player to the location of adversaries. Kristine Jørgensen has also made a similar statement in her analysis of the battlefield playgrounds in World of Warcraft: “Two situations may be used as illustration: If the player suddenly hears a sound when he knows he is alone, the sound will certainly be alarming by signalling a potential, invisible danger. However, if the avatar is guarding the lumber mill together with a friendly rogue, the sound has a different meaning: It is the positive notification that he has backup in case of an attack.”19 Thus, the reception of the sound depends on the game-internal situation.
Chion's notion of added value can thus be understood as a dynamic form of meaning generation. A sound can induce a specific meaning: for example, an approaching moment of danger. This can be a crucial hint for the player since—as Jørgensen has put it—“game audio provides proactive and reactive information that the players utilize in order to orient themselves, and when evaluating what actions they should take.”20 As acoustic stimuli are received serially, the information and moods they may represent can be processed in real time.
EMBODIED COGNITION AND EMPATHY
The term embodied cognition (embodiment) describes a theory of mental representation that assumes that there is an interaction between cognition and sensory and motor skills, and that this is reflected in the representation of thought processes. Embodied cognition is based on psychological theory, cognitive science, and how our understanding of the (real) world is shaped by our ability to physically interact with it. As such, our perception is always associated with a mental re-enactment of embodied experience.21
But we transfer the rules of reality into virtual spaces too. Karen Collins has presented “two significant means through which scientific research … and philosophical theory … blend together to suggest ways in which the player may be more empathic towards their game character: bodily extension and gestural interaction.”22 In the technology of a game, the embodied cognitive connection to sound is crucial for expanding our self as an extension of our body. Hence, our body expands into the fiction not through the controller but through the game's character.
Perception is thus not a process of imaging sensory stimuli on an inner model of the world but of sensorimotor coordination, which always occurs in the overall concept of an acting being.23 Aki Järvinen has noted that “the magic circle produces a shift in the thematic field of the experience, which simultaneously both magnifies the emotional intensity … and its relation to the sense of reality variable. Therefore, variables concerning emotional intensities should be interpreted from this perspective.”24 Though I will not discuss in more detail here the concept of the “magic circle,”25 Michael Liebe's critique of the concept provides an excellent introduction: “In a computer game everything is programmed, every possible action, every physical simulation, even the boundaries of the virtual space itself. As a result, there is nothing magic about the circle.”26 Liebe's critique also brings to mind the notion of the willing suspension of disbelief27—that is, when players go on an adventure into virtual worlds knowing that these experiences are a game (fiction) but during game play decide to perceive this fiction as the current reality. They enter the fictional world for a while and identify with it so that a type of embodiment becomes possible. Regardless of which concepts we decide to assign to this phenomenon, it is clear that computer games offer the opportunity for immersive experiences.28 But such immersive experiences are not solely dependent on the player mastering the rules of the game or its hardware interface—embodiment is a universal phenomenon. Watching a movie does not presuppose special skills, but playing a video game demands particular skills, and each skill is defined by the game. At this extra-textual level, one can assume that players are used as co-producers.29 As the mastering of the hardware increases, so too does the involvement of the player and the corresponding feeling of immersion.30
The empathy with characters or actions in the virtual world is facilitated by various factors: interest, rules, possibilities, audiovisual representation, and so on. The characters and the gameplay become comprehensible through their audiovisual design and interactive possibilities so that the player can handle them empathetically. Empathy takes place when the viewer “experiences” an act performed by another person/character in some way and does not confine themselves to understanding the action in a purely intellectual way but assigns it to a certain conceptual category.31 Empathy through vocal sound especially arises when feelings are expressed vocally, such as crying or shouting, but can also be perceived in finer nuances of emotionality, given that every interpretation of any in-game situation is subjective. As Jørgensen suggests, one can assume that it is a “situation oriented perspective [that] emphasizes the idea that auditory comprehension is oriented towards interpreting sounds in terms of events instead of in terms of objects.”32
Sound designer Walter Murch distinguishes between “coded sound” such as speech that needs to be decoded and “embodied sound” such as music that is experienced directly.33 But it is also true that a voice not only carries information in terms of language but is able to be physically perceived by the recipient. “The voice … can already say much even when the speaker isn't trying to say anything at all.”34 This arises because the listener has empathy. But a voice also has an added value when the timbre expresses a feeling: “We experience vocal sound in terms of our own embodied experience of similar sounds: we mentally mimic the voice in our own body.”35 A voice is not only heard and cognitively understood, it may also be perceived as compassionate. It can be assumed that the added value of a vocal sound is perceived when the factors of embodiment, flow, and empathy coincide. And the audiovisual experience of a computer game allows the recipient to hear more than just voices—these voices are often accompanied by appropriate music fitting the situation.
WHAT NONVERBAL MEANING CAN VOICES CONVEY IN GAMES?
Listening to the voices of other human beings is an evolved anthropological trait. In early primeval times (that is, before humans possessed the ability to speak) it was possible to communicate through affect sounds and warning cries. This, too, was a necessary form of communication for survival.36 Vocal sounds are not only the sounds of a voice (pitch, timbre, etc.) but also a carrier of emotions, mirror of personality, and reflection of the state of mind of the speaker. The sound of the voice and the way a person/character speaks is something highly personal and individual. In forensic analyses, for example, speaker recognition is a method of identifying an individual. In addition to determining a person's identity, the sound of the voice and way of speaking may imply attributes such as gender, education, regional and social origin, age, and state of health, and therefore allow conclusions to be drawn about emotional states.37 Affects transported by the sound of a voice are recognized by game players through the auditory experience and their own perception and empathy. Duncan Williams has similarly noted that “the voice is a powerful tool for emotional communication, especially when paired with affective-congruent music and sound effects.”38 Moreover, language and music are closely linked in the brain. The processing of speech and music occurs on strongly overlapping (sometimes even identical) neural resources,39 though the voice is put into the foreground of our perception, presumably because of its structured expressions.
Our auditory perception is immediate and requires no additional cognitive effort as reading does,40 and it allows players to obtain a variety of information from the sound of a voice. They can grasp the context of actions, distinguish emotional states of the speaking character (this can be apprehended through the expression of a voice41), and also their membership of a specific group (for example, language and dialect denoting class).
What a character communicates depends on its meaning for the game, and quite possibly where and in which situation in the game it occurs. Based on the sound of a voice, the players can draw conclusions about the type of character confronting them—for example, an orc, a witch, or a human. Trained actors are employed to make the voice of computer game characters sound as realistic as possible, so as to transport believable emotions and fit the personality and character type.42 Duncan Williams shows that a well-implemented emotionally realistic voice sound has the power to support immersion and increase engagement of the player.43 The expression of the voice is important because it helps the listener characterize the speaker in the situation, their ethnic, social, or regional group affiliation, their emotional state, and their role within the game. The vocal-articulatory expression not only reflects the personality of an individual person/figure who arbitrarily or involuntarily expresses their feelings, but it also relates to the addressee who stimulates those feelings and associations, thereby activating an interactive process. The sound of a voice and the way it is spoken express a wealth of reliable information about the signal source.44 Thus, a voice sound not only influences the narrative structure of a game but also contributes to the atmosphere of the game and provides a link between the players and the drama. Voices can also personify emotional states and the nearness of adversaries or other characters/players. Though you may not be able to see a character, the quality of the voice can point to the nature of the character as well as their location. This is sometimes exploited through playing with the expectations of the player.
I would like to clarify this with an example. Consider the voices of children (Little Sisters) in the game BioShock (2007). These voices do not spawn mental associations with innocent, peaceful, or harmless people, because the children are monster-like creatures that are protected by lethal fighting machines called Big Daddies. If these child voices are close, they announce a threatening combat situation and thus danger. If the player knows that the child is being protected by a Big Daddy, the child's voice points to an imminent danger. The sound of the voice thus receives the added value: danger. The player knows that they have to prepare for a fight. If the fight starts, it is accompanied by music.45 There is a semantic connection between the narrative of the game and what is heard at a given moment, the affective result46 of which is an interpretation that encourages the player to react accordingly (or to its fictitious source/meaning). Since there are so many different genres and games, some statements can only be substantiated by concrete examples. In what follows, I examine a few possible functions of voices in four short case studies.
FIRST CASE STUDY: VOICE IN AN AUDIO-ONLY GAME
Sound is especially important in audio-only games, where the whole game (including the menu, story, characters, fictitious game world, and so on) is defined by sound. Players have to pay attention to every detail of sound. Der Tag wird zur Nacht (The Day Becomes the Night) was developed in 2003 at the Stuttgart College of Media. It is an audio-only game for (mainly but not only) blind children, aged 10–14; the main character is a child too. This child lives in ancient Pompeii at the time of the devastating eruption of the volcano Vesuvius. The game's main plot is defined by the darkness in the world caused by the eruption of Vesuvius; the screen remains unused, and the game prompts the players only through sound. After a short auditory introduction of the story, a voice explains the key assignments as well as the goal of the game. In order to bring the character to safety, the player must enter different rooms within a predetermined timeframe. Both the gameworld and the gameplay are based exclusively on auditory experiences. If there is an obstacle in the way, a corresponding sound is heard; if the player reaches a passage, they hear the sound of an opening door. The virtual sound spaces of this game are complex: it is possible to estimate the size of a room based on the acoustics of the space. Some items send their own acoustic signals, and they can be tracked because their volume changes depending on the distance to the sound source. The player uses the keyboard to control their character, and the direction they take is announced each time by a voice (“turn left”). This makes it absolutely necessary to pay close attention to every auditory event occurring during gameplay.47 The surrounding sound and the voice lead the player through the darkness, and are the only clues that enable the player to advance through the virtual world.
If the player relies solely on their sense of hearing in audio-only gameworlds in order to perceive the spatiality and atmosphere of the virtual world, the media experience becomes highly visceral.48 Without sound, the game world would neither function nor exist. For players who are not blind, it is generally a new experience to “see” and orientate themselves only with their ears, and the familiar sound of a human voice greatly assists players to get through the game.
In this case study, sound has no added value at first. If one applies Chion's idea, sound cannot add value because there is no picture that it can be associated with. Sound, in this example, represents the entire gameworld acoustically. Though there is nothing visual, this game shows some peculiarities of sound, and how it not only leads the player through the darkness but makes it possible for the player to recognize objects. Sound thus illuminates what the relative dimensions of the game space are. It also acts as a support in the darkness through the sound of a voice.
In Der Tag wird zur Nacht, the voice of the speaker forms the first and only understandable element. All other sounds must be discovered through trial and error, such that sound becomes an ephemeral material for the construction of the gameworld. But the sound of the voice can add value, too, by not only announcing directions but also working as something familiar in a world whose sonic elements have to be explored and recognized. Thus, the voice gives a sense of safety and orientation in the darkness. The added value of the vocal sound is used here as a kind of emotional support.
SECOND CASE STUDY: ATMOSPHERE AND EMOTION IN BIOSHOCK
On the one hand, voices can be staged in different emotional states, and on the other, voices themselves can create very special atmospheres.49 Particularly impressive are the voices or vocalizations in the already discussed game BioShock. It is a game with a bizarre dystopian setting. The entire virtual world contains violence; people have been completely disfigured by operations, genetic manipulation, and other experiments. Splicers are mutated adversaries who must be overcome during gameplay. They roam through the corridors of destroyed buildings like ghosts while talking to themselves, complaining, and crying. Voices often indicate the presence of hidden or encroaching enemies, for which we have the added value of danger.
The backdrop of BioShock is a gloomy, desolate, and destroyed world in which rooms are covered in blood. The player hears vocal sounds of different figures, including synthetic voices, advertising announcements, and many more. Nearly all the action of the gameplay is accompanied by sound, though the entire soundscape seems both deliberately irritating and irrational. The audible voices in BioShock are usually distorted and therefore appear non-real and unnatural.
Nevertheless, remnants of formerly prosperous times are still recognizable. All the inhabitants that still exist following some undisclosed catastrophe are desperate. These include the Splicers but also the few remaining people who actually do show feelings, though they are mainly depressed or scared. They give expression to their despair using their voices. Players also receive numerous hints about the game through vocalizations like diary records that they can listen to—here, voices have the task of passing on information. These voices enrich the visual experience with people who existed in the past or who simply do not appear in the gameplay as a visible figure.
Even in cutscenes information is communicated vocally, but the voices here do not just convey narrative information. Above all, the voice sound expresses affects. Here it is the added value of emotionality. There is speaking and screaming (in terror, in pain, in malice, and so on), as well as crying, singing, groaning, sobbing, breathing, and bellowing. The voices represent emotions and moods. Through empathy, players can experience the feelings of the virtual figures.
Overall, BioShock presents players with an auditory mélange consisting of different noises, screams, messages, machine sounds, voices, music, and more. Players have to listen attentively to distinguish those sounds in the auditory chaos that are important for gameplay. A voice can pass on information or emotion in this game, and its origin can also indicate a danger state. It is the same with the singing and moaning of the Splicers. The creatures are depicted as desperate, and their humming is not an expression of contentment but rather the singing of mentally ill people, and it serves as a warning signal to the player that an enemy is in the vicinity. Other utterances seem to be echoes from another idyllic world, such as the request of one Splicer to “get everything for a picnic,”50 which is more than strange in the completely blood-covered context. These voices seem trapped in eternity. The negation of “reality” presents an overwhelming feeling of an omnipresent madness.
The voices also serve as a warning. If they are heard, there is at least one enemy nearby. These include the booming roar of the Big Daddies, which are the relentless, extremely destructive battle robots that guard the little girls. If the player fights a Big Daddy, the machine roars and moans. At the same time, a protesting girl's voice is heard: “You hurt him!” As such, violent acts become the only way to progress in the game, and the act of violence itself is a means to an end. But it is the voice of the scared girl that makes the players complicit in this violence. When the robot is defeated, the little girl cries until she has been “saved.”
Listening is a highly complex form of perception,51 especially when many sounds are mixed up and heard at the same time. Listeners have to select and interpret in order to make sense of what has been heard and to decide what should be further processed or what can be disregarded. The players decide what is worth listening to and what is not. As Chion suggests, “besides the ‘cocktail party effect,’ so-called informed listening puts into play a group of behaviors and compensatory systems that together ‘help’ listening.”52 One might now assume that players are animated by the added value of a sound. This would make sense, especially in BioShock, because in the audiovisual chaos, cognitive game-relevant decisions must be made in addition to the haptic operation of the hardware.
THIRD CASE STUDY: VOICES AS CHARACTERS
In the game The Darkness (2007) players hear a voice that partially comments on the unfolding events, and its statements seem like a gloomy premonition. The voice, spoken by Mike Patton (singer of the band Faith No More), is both a character and sound at the same time. It accompanies the player through a bloodthirsty, dark, and dismal adventure. At first, the character of the Darkness appears only vocally, but later during a dialogue with a secondary character the player's avatar learns of an “undesirable element” that shortly afterward becomes his constant companion throughout the game. The voice hisses toward the avatar, “You are my puppet … your will is mine,” and becomes a kind of alter ego of the avatar. With the exception of combat situations, the music is melancholic and accompanies the dark events that are distinguished through the use of cruel and evil voice sounds. Especially in connection with the music, the voice seems to be particularly uncanny. The voice is not only heard; in parallel with the auditory experience, it takes on a visual form: the Darkness embraces the avatar with dangerous tentacles. Nor does the voice merely whisper comments on the game; it appears to contextualize feelings (perhaps of the player, perhaps of the avatar) by hissing while referring to emotional situations. When the Darkness does not speak, a kind of groaning is occasionally heard that underlines its otherness. The voice is a central element of the plot and assumes the function of an independent character. Its sound contributes decisively to the atmosphere of the game.
Similarly disturbing are the off-screen voices in Hellblade: Senua's Sacrifice (2017), which acousmatically depict the psychotic disturbance of the character. Senua suffers from a serious psychosis such that her perception of reality deviates from that of a healthy mind. Throughout the game Senua (and at the same time the player controlling her) hears voices that are not based in the reality of the fictional gameworld. As a result, a special sense of immersion builds up over the course of the game. The voices do not refer to the reality of Senua's fictional environment; they refer to Senua's entire perception of reality itself. The players themselves empathize with Senua's mental experience, which includes the difference between the outer (fictional) reality and the inner representation due to her mental disorder. Senua's psychosis is represented by the presence of visual and auditory hallucinations. In addition to optical features such as flickering image segments and suddenly appearing cryptic characters, the game producers focus on a specific compositional method: the acoustic design of the gameworld. This is especially true of the authentic reproduction of certain sound characteristics, such as the changing location of the voices Senua hears in her head.53
The voices, which often overlap, are sometimes clearly understandable and at other times are partially distorted given they come from different directions. This approach greatly influences the auditory environment of the player. The voices in the game also have the added value of depicting the psychosis of Senua, and thus the psychosis becomes perceptible and comprehensible. This means that at times the voices may give important hints, while at other times they can confuse the player.
The voices belong to Senua but are intradiegetic and therefore audible to both Senua and the player, which seems to intimately connect each to the other. At the same time, the voices seem like they are a character of the game, because they verbally interfere in the game.
In the last example, The Stanley Parable (2013), an off-screen narrative voice continuously accompanies the action. The narrator describes, reacts to, and comments on all the player's actions in a kind of ongoing sound and thus concretely shapes the sound of the game. At first, the voice sounds calm and inspiring. The player gets involved with the voice and receives hints on how the game works. If at some point they decide against a recommendation from the speaker and go their own way within the gameworld, the sound of the narrator's voice changes and takes on affective impulses. The narrator becomes angry, sounds frustrated, or reacts sarcastically. Narration and voice-sound equally influence the play. Through its constant presence and increasing vehemence, the sound exerts a certain pressure at the psychological level, which players either resist or give in to. Through the sound of the voice and its statements, it seems as if the narrator wants to regain control of the game.54
In all three case studies, voices have a special power. The voices are in dialogue with players and their game. This is also important as a method to convey emotion in games. This means that without suitably skilled voice actors in games there is a potential to lose valuable emotional inflections that may “significantly affect our ability to feel the emotion and thus develop empathy.”55
In all three examples, the function of the voice goes far beyond an auditory or narrative element. Each has the added value of its own character. The voices put pressure on the player. They act as characters, as other players, and as opponents. The auditory experience hugely influences the respective game. In none of these games does it make sense to play without sound (doing so would significantly diminish the player's understanding), which means the voice cannot be avoided. The voices are perceived physically and psychologically56 and cannot be ignored. They form an intrinsic part of the game, and players have to face them to progress in the game.
FINAL CASE STUDY: VOICES DO NOT ONLY HAVE TO SPEAK, THEY CAN ALSO SING
The previously discussed examples have dealt with spoken language, but there are also numerous examples of singing voices in computer games. In Deponia (2012), for example, a bard with a guitar sings. He creates a connection between the chapters of the game and comments on the events. It is also a mission of the player to retrieve tones for him. The bard is a singing narrator, and the vocal information in the form of his songs supports the narrative structure of the game and gives the player orientation and hints for the gameplay. Here his voice has a narrative function. He also seems humorous because of his audiovisual appearance. The fact that he sings and does not speak matches the atmosphere of the game, which seems humorous and self-deprecating.
An entirely different form of singing can be heard in The Path (2009). Here players experience a horror version of the fairy tale of Little Red Riding Hood and the Big Bad Wolf.57 In this game, singing has a warning function. It replaces the voice of Little Red Riding Hood's mother, which instructs her daughter to be careful in the forest. The song warns of the wolf and is heard quietly in the background. The dark music comes from the composers Jarboe and Kris Force. This musical backdrop seems epic. The timbres of the gloomy music match the acute mood of the visual presence of the gameworld. This too is dark, opaque, and partially clouded. The sounds are imaginative and thus construct an emotional environment. Their theme is kept simple through a catchy piano sequence that varies in its course: simple chord progressions and their constant repetition. Through this monotony, the listener is almost lulled. The instrumentation consists of string and keyboard instruments (mainly cellos, electronic keyboard, and piano). It is a mix of classical musical instruments and electronic sound, as often found in Dark Wave and other dark independent music. In addition, voices are heard that occasionally make statements. The song has a telling function and primarily includes the following warnings:
In The Path the player must listen very carefully to hear these texts. The voices are somewhat blurred within the overall audiovisual presentation. Together with the background music they form the atmosphere while also making a statement. This then provides a narrative added value, for which I would emphasize that the voice is primarily about the atmosphere and the statement is the added value. The sign systems of sound, image, and lyrics generate both autonomous and interdependent parts of meaning within a semantic frame of the recipient. Music has a concrete effect on the feelings of the recipient, or as Heiner Gembris notes, “In the music that is heard, in being touched by the music, something becomes tangible that is not of this world. … In other words: in music, the listener is presented with a form of transcendence.”58
In the “fairy-tale” game The Path, which is more an experience than a game, each episode ends with Little Red Riding Hood lying in the rain, being woken up in front of Grandma's house after her encounter with the evil wolf. The house also does not turn out to be a safe place. The melody of the warning song accompanies her foray through the forest, sometimes with singing, sometimes only as an instrumental piece. But in ignoring the warning she ends up hurt and broken in a kind of ghost-house. Her mother, whose warnings are heard in the song, seems very far away.
We can surmise then that singing has very different functions in games. It can be humorous, as in the first example, or it can be a “place-marker for something unarticulated or inarticulable.”59 When a voice sings instead of speaks, it has a different aesthetic. Singing fills the auditory atmosphere of a game world, shapes the couleur locale, or has a signaling function. Sometimes the text also fulfills central informative and/or narrative functions.60
THE PLAYER'S VOICE: VOICE AS CONTROLLER
The last example is the voice of the players themselves. This is totally different from the previously discussed functions of the vocal sound, because the voice here works as a tool and an instrument to actively influence the virtual world of a game. The peculiarity here is that the player's voice is a unique instrument and not a programmed element of the game. It has intention, but its sound is not predetermined by the game's designer; thus, the voice of the player becomes an individual element of the game.
Today, voice interaction is a tool that is used with Siri and Google Home in numerous households. But in games the voice can be used as a tool for operation as well. On the one hand the player's own voice contributes to the sound of a game, and on the other its sound has an impact on the virtual world and the figures acting within it. This is the case, for example, in Nintendogs (2005), where a virtual pet is trained and cared for using voice and speech. In Phoenix Wright: Ace Attorney (2006), players can influence the virtual court case by shouting “objection,” while The Legend of Zelda: Phantom Hourglass (2007) also uses voice interaction where players are asked to extinguish lamps or make windmills spin by blowing into the microphone.61 Through voice interaction, their own voice shapes the sound of the game and at the same time becomes the instrument with which the game is controlled.
In these case studies, I introduced the use of the voice in video game design and highlighted particular approaches to its integration in gameplay. In doing so, fewer questions were answered than raised, though I have sought to show how the overall sound of a game may be perceived differently, depending on whether a man, a woman, a young person, or an elderly person produces it. An interesting future direction to take from this overview would be an empirical study that examines the extent to which the sound of one's own voice influences immersion. This raises the question of whether players are more attracted to their own voice, or whether the “foreign” voice prompting of previously recorded voice events is sufficient for immersion.
On the other hand, one can also see the emergence of a special form of embodiment. The voice becomes part of the game and the voice belongs to the player. The voice overcomes the border between reality and the virtual world. It may have an impact on the virtual world without the need for a haptic-aware additional device (e.g., gamepad62). The player, whose presence in the game world is mostly possible through the avatar, sends a part of themselves into the virtual world, a condition that prompted Cheng's dialectic: “What's in a voice? … the body is in the voice.”63
Music, sounds, voices, languages—everything that is audible has a certain function within a game. The more sophisticated the sound design, the easier it seems to become immersed in the fictional computer gameworld. Since there is no such thing as “the” computer game but rather a large number of creative works in different genres and diverse versions, there is no single answer to the meaning of voice sounds in computer games. This article provided a brief overview of the function of voices and the connection between voice and character. From the case studies, I showed that vocal sounds encourage immersion, convey feelings, add value, characterize the speaking figure, convey narrative information, and shape an atmosphere, among other functions. Vocal sound also occurs in both spoken and musical (song) form. Sound is used as a means of representation to make an artificial world appear more alive or temporarily more authentic. To hear and react to the sound of something linguistically expressed is, as described, self-evident for people. Music, speech, and sounds support, supplement, and expand the visual component of video games. This is particularly important, for example, when something is heard that is not yet in the player's field of vision. The sound of voices supports the emotional atmosphere by directly addressing the empathic abilities of the players, and by clarifying the nature or appearance of the fictitious characters. Vocal sound contributes to the completion of fictional computer gameworlds, because it makes fiction seem more alive and a little more real.
While a voice can convey certain information, it is valuable to understand how this information is presented, and what the added value is, given that it represents an additional piece of information for the picture. It becomes more than simply what is illustrated and more than what is said, given that it is able to cause moods and feelings. Embodiment is a prerequisite for empathy, and when a player can intuitively use the gaming hardware, they have room for immersive experiences. In elaborately produced games nothing is left to chance.
As Karen Collins notes, “The use of voice, sound effects and music in coin-operated games served many purposes, from creating novelty, attracting attention, creating a sense of realism or immersion, overcoming legal boundaries, as reward, and to alert others not playing as to the state of the game (win, high score, etc.). These same uses of sound carried through the electro-mechanical era into the early days of video games and to today.” See Karen Collins, “Game Sound in the Mechanical Arcades: An Audio Archaeology,” Game Studies 16, no. 1 (2016), http://gamestudies.org/1601/articles/collins.
Collins, “Game Sound in Mechanical Arcades.” This approach has emerged not only from the utilization of new technologies in game design, but equally from a shift toward using strong narratives and story-driven frameworks.
Attention plays a big role in computer games. To be successful at a game requires a lot of concentration from the players. Furthermore, various in-game situations require different forms of attention. There is a difference between the attentive perception of expected impulses and the spontaneous perception of unforeseen events. This means that the anticipated event is attended to actively and the unforeseen event is perceived more passively. However, if the players know (through experiences gained during the course of the game) that a special sound may signal an extraordinary event, they will listen attentively.
Compare to Marilyn Boltz, “Musical Soundtracks as a Schematic Influence on the Cognitive Processing of Filmed Events,” Music Perception 18, no. 4 (2001), and Claudia Bullerjahn, “Grundlagen der Wirkung von Filmmusik” [Basics of the Effect of Film Music], Hannover, Hochsch. für Musik und Theater, Diss., 1997, Wißner, 2001.
Mihaly Csikszentmihalyi was the first to propose the concept of a “flow state.” Flow, in conjunction with recent considerations such as the “magic circle” and a “willing suspension of disbelief,” combines to produce a realistic experience of a fiction. Here, the normal rules and reality of the world are temporarily suspended and replaced by the artificial reality of a game world. See Mihaly Csikszentmihalyi, Beyond Boredom and Anxiety: The Experience of Play in Work and Games (San Francisco: Jossey-Bass, 2000).
Some parts of this article are based on and advanced ideas from my publication: Yvonne Stingel-Voigt, “‘Stay Low and Avoid Contact If Possible’: Stimmklang in Computerspielen” [The Sound of Voices in Computer Games], Paidia, February 20, 2019, http://www.paidia.de/stay-low-and-avoid-contact-if-possible/.
Yvonne Stingel-Voigt, “Soundtracks virtueller Welten” [Soundtracks of Virtual Worlds], PhD diss., Freie Universität Berlin, Hülsbusch, 2014.
Michel Chion, Sound: An Acoulogical Treatise (Durham, NC: Duke University Press, 2016), 48.
Couleur locale means that (in many games) a particular piece of music is always audible at the same location, so the in-game location where the action takes place can be recognized by listening to the soundtrack. It is similar to a leitmotif function. Stingel-Voigt, “Soundtracks,” 12.
There are numerous examples of the presence of voices in computer games, such as tape recordings (BioShock), radio messages, and others.
Compare to Vito Pinto, Stimmen auf der Spur [Voices on the Track] (Bielefeld, Germany: Transcript Verlag, 2014), 165.
Such perceptions though do not always correspond to reality. We know this from the experience of the (at first) disembodied voices of radio announcers or radio play actors. If at some point we see the person behind the voice, we may find that their appearance is quite different from our perception. It is not possible to capture a voice as a voice (dematerialization), but rather, each speaker creates a concrete, physical, and lived presence with their voice that evokes individual associations in the listener. Pinto, Stimmen auf der Spur, 163.
Compare to Erin E. Hannon and E. G. Schellenberg, “Frühe Entwicklung von Musik und Sprache” [Early Development of Music and Languages] in Bruhn, Musikpsychologie [Music Psychology], 131 ff.
For an in-depth discussion see Yvonne Stingel-Voigt, “Funktionen von Sound in Computerspielen” [Functions of Sound in Video Games] in Sound in den Medien: Sound Across Media, ed. Jan-Noël Thon and Thomas Wilke (Berlin: Peter Lang, 2018).
Karen Collins, Game Sound: An Introduction to the History, Theory, and Practice of Video Game Music and Sound Design (Cambridge, MA: MIT Press, 2008), 136.
Michel Chion, “Ton und Bild—eine Relation? Hypothesen über das Audio-Divisuelle,” in Bild und Stimme, ed. Maren Butte, Sabina Brandt, and Michel Chion (Paderborn, Germany: Fink, 2011), 52.
Chion, Sound, 151. Besides “added value,” Chion also introduced the concept of “synchresis” (synchrèse), or “a portmanteau word made up of ‘synchronism’ and ‘synthesis’ … a spontaneous and reflexive psychophysiological phenomenon” (154). This means an irresistible and spontaneous connection arises between an acoustic stimulus and a short optical phenomenon, particularly when both occur simultaneously and independent of any rational logic. Because of the synchresis, images, sounds, and voices are automatically brought into harmony; voices and bodies/characters become related to each other. Compare to Pinto, Stimmen auf der Spur [Voices on the Track], 285. Both added value and synchresis help examine the perception of audiovisual networks in a more differentiated way. They make it possible to understand the interaction of visual and auditory cues in multimodal arrangements in media. The culturally encoded forms of representation that it has produced contribute significantly to the viewer's or player's reception activity. Compare to Barbara Flückiger, Sound Design: Die virtuelle Klangwelt des Films [Sound Design: The Virtual Soundscape of Movies] (Marburg, Germany: Schüren, 2001), 161. She writes: “However, the analysis of this interaction can be further developed on the basis of perceptual psychological insights and newer semantic concepts. In addition, technical conditions have to be included, because the differentiation of the procedures and the dispositive with stereo surround sound have significantly changed the possibilities of interaction.”
Jørgensen, “Audio and Gameplay.”
Kristine Jørgensen, “‘What Are Those Grunts and Growls Over There?’ Computer Game Audio and Player Action,” PhD diss., Copenhagen University, January 2007, https://kristinejorgensen.w.uib.no/files/2018/12/jorgensen-thesis.pdf, p. 176.
Karen Collins, “Making Gamers Cry: Mirror Neurons and Embodied Interaction with Game Sound,” in Licinio Roque, ed., Proceedings of the 6th Audio Mostly: A Conference on Interaction with Sound (New York: ACM, 2011).
Collins, “Making Gamers Cry,” 6.
Aki Järvinen, “Understanding Video Games as Emotional Experiences,” in Video Game Theory Reader 2, ed. Bernard Perron and Mark J. P. Wolf (Routledge, 2008), 95.
Johan Huizinga, Homo Ludens: A Study of the Play-Element in Culture (Boston: Beacon, 1955); Roger Caillois and Meyer Barash, Man, Play, and Games (Urbana: University of Illinois Press, 2001); Katie Salen and Eric Zimmerman, Rules of Play: Game Design Fundamentals (Cambridge, MA: MIT Press, 2004).
Michael Liebe, “There Is No Magic Circle: On the Difference between Computer Games and Traditional Games,” in Conference Proceedings of the Philosophy of Computer Games 2008, ed. Stephan Günzel, Michael Liebe, and Dieter Mersch (Potsdam, Germany: Potsdam University Press, 2008), 338.
Samuel T. Coleridge, “Biographia Literaria, Or, Biographical Sketches of My Literary Life and Opinions,” https://archive.org/details/biographialitera00colerich/page/n8.
This is certainly also dependent on factors such as the game's narrative, its complexity and ambition, and the audiovisual representation of the respective game.
However, embodiment and empathy are not limited to playing within a set of rules. There are many other ways to act inside of the virtual worlds. Players of the Ultima series, for example, discovered an Easter egg that enables them to build stairs out of virtual bread, through which they could reach otherwise unattainable locations of the Ultima game world. After a time, these constructions became competitions facilitated through associated online communities. A behavior (born out of an activity in the field of paidia (trying)) gets new rules in the competitive moment and is thus again ludic. Compare to Stingel-Voigt, “Soundtracks,” 30. However, paidia- or ludus-intense engagement with the virtual world requires a kind of flow state. As Alison Harvey points out: “A flow of the mind, the emotions, and the physical self are required to fully enter the flow channel, the magic circle of the rules, and meaningful play.” See Alison Harvey, “Seeking the Embodied Mind in Video Game Theory: Embodiment in Cybernetics, Flow, and Rule Structures,” Loading … 3, no. 4 (2009), http://journals.sfu.ca/loading/index.php/loading/article/view/57/54.
See Albert Michotte van den Berck, “Die Emotionale Teilnahme Des Zuschauers Am Geschehen Auf Der Leinwand” [The Emotional Participation of the Viewer in the Action on the Screen], montage AV 12, no. 1 (2003): 126.
Jørgensen, “Audio and Gameplay.” She explains: “The situation oriented perspective emphasizes the idea that auditory comprehension is oriented towards interpreting sounds in terms of events instead of in terms of objects. This opens for contextual auditory comprehension since associating sounds with events means identifying the situation in which the sound occurs. This can be contrasted with an object oriented perspective that believes that our perceptual system associates a certain sound with a specific object, thereby suggesting a static and absolute relationship between the sound and the information it provides.”
Walter Murch, Dense Clarity—Clear Density, 2005, http://www.digitalchophouse.com/uploads/2/5/2/8/25282539/dense_clarity_by_walter_murch.pdf.
William Cheng, Sound Play: Video Games and the Musical Imagination (Oxford, UK: Oxford University Press, 2014), 151.
Collins, “Making Gamers Cry,” 7.
Walter Sendlmeier, “Die psychologische Wirkung von Stimme und Sprechweise—Geschlecht, Alter, Persönlichkeit, Emotion und audiovisuelle Interaktion” [The Psychological Effects of Voice and Speech—Gender, Age, Personality, Emotion and Audiovisual Interaction], in Resonanz-Räume: Die Stimme und die Medien, ed. Oksana Bulgakowa (Berlin: Bertz und Fischer, 2012), 100.
Sendlmeier, “Die psychologische Wirkung,” 99.
Duncan Williams and Newton Lee, eds., Emotion in Video Game Soundtracking (Cham, Switzerland: Springer, 2018), 23.
Stefan Koelsch and Erich Schröger, “Neurowissenschaftliche Grundlagen Der Musikwahrnehmung” [Neuroscience Basics of Music Perception], in Bruhn, Musikpsychologie, 409.
Brian Moore has suggested that speech possesses a special quality as auditory stimulus, and that “speech stimuli are perceived and processed in a different way from non-speech stimuli; there is a special ‘speech mode’ of perception.” See Brian C. J. Moore, An Introduction to the Psychology of Hearing, 6th ed. (Leiden, Netherlands: Brill, 2013), 348.
See Jens Asendorpf, “Lassen Sich Emotionale Qualitäten Im Verhalten Unterscheiden? Empirische Befunde Und Ein Dilemma” [Do Emotional Qualities Differ in Behavior? Empirical Findings and A Dilemma], Psychologische Rundschau 35, no. 3 (1984): 127–30.
On this aspect, Steve Horowitz and Scott Looney (2014) discuss the quality of voice sound in games and argue that “a good voice artist is a major talent, and like a virtuoso instrumentalist they use all the techniques they have learned over the years to pull off amazing performances under difficult and high pressure situations.” See Steve Horowitz and Scott Looney, The Essential Guide to Game Audio: The Theory and Practice of Sound for Games (New York: Focal, 2014), 111.
Williams and Lee, Emotion in Video Game Soundtracking, 25.
Sendlmeier, “Psychologische Wirkung von Stimme und Sprechweise” [Psychological Effects of Voice and Speech], 115.
William Gibbons comments that “at a certain point in the game, Jack must defeat one of the guardians that surround these Little Sisters and make a choice about rescuing the girl. The battle is a difficult one; players must make use of all the resources at their disposal to be successful, and it will likely take a significant amount of time. Underscoring this (rather epic) scene, sneaking quietly in amid the clamor of bullets and explosions, is the 1941 Billie Holiday song ‘God Bless the Child.’ The juxtaposition of the song's mellow style—particularly Holiday's silky smooth voice—with an extended firefight is striking enough, and fits the ‘ironic’ category.” While music in this game often ironizes violent situations, the little sister's voice has something sinister about it. It embodies the contrast between a child's innocence and its inherent danger. See Gibbons, “Wrap Your Troubles in Dreams: Popular Music, Narrative, and Dystopia in Bioshock,” Game Studies 11, no. 3 (2011), http://gamestudies.org/1103/articles/gibbons.
Karen Collins argues, “In relation to mirror neurons, we mentally re-create (visually and motorically) what we hear, and we hear in terms of intentionality and causality—including emotional intent—and thus we empathize with the originator behind the sound.” See Collins, “Making Gamers Cry,” 2011.
Compare to Stingel-Voigt, “Soundtracks,” 94f.
This corresponds to Collins's embodied cognition approach and the author's concept of how sound mediates our identification with, and empathy for, video game characters. See Collins, “Making Gamers Cry.”
Pinto, Stimmen auf der Spur [Voices on the Track], 187.
All game quotations in this article are transcribed by the author.
Kristine Jørgensen also affirms this statement: “Listening is a complex cognitive activity. A listener often needs to make sense of situations.” She also observes that “when the player makes meaning out of sound in context, the interpretation of what generates the sound is crucial for understanding what a specific sound communicates.” This can also be understood within the context of the sound of voices. Players decide whether it is important to listen to a particular character, and to what extent they allow themselves to be influenced by what has been said. For Jørgensen's discussion on listening see Jørgensen, “Audio and Gameplay.”
Chion, Sound, 27.
For further discussion, see Yvonne Stingel-Voigt, “Funktionen von Sound in Computerspielen,” in Sound in den Medien: Sound Across Media, ed. Jan-Noël Thon and Thomas Wilke (Berlin: Peter Lang, 2018), 176.
Collins, “Making Gamers Cry,” 7.
Collins, “Making Gamers Cry.”
Compare to Stingel-Voigt, “Soundtracks,” 183f.
Heiner Gembris, “Ich singe, was ich nicht sagen kann—Anmerkungen aus der Musikpsychologie” [I Sing What I Cannot Say—Notes from Music Psychology], https://kw.uni-paderborn.de/fileadmin/fakultaet/Institute/IBFM/Downloads/Text-Kirchentag-2003.pdf, p. 2.
Of course, there exist hardware parts that are necessary for sound recording and reproduction during gameplay, though the player neither sees nor feels them.
Cheng, Sound Play, 151.