Through the example of Nobody – Vis et ressens (Nice, France, 2021), this article sheds light on the musical creation process involved in the conception of an automatized escape room, where participants go through a multimodal experience (sound, light, scenery, video) driven by an original scenario. As the composer, sound designer, and computer music designer, I propose to study Nobody as a playful game. I show how current ludomusicology insights can be used and adapted to a game that is played in a physical space. After first exposing considerations of both technical and aesthetic aspects, I explain the artistic choices made for creating the soundtrack of six interactive puzzles. Through practice-led research of these different mini music games, I analyze how the influences of electronic dance music (EDM) and related club culture are melded into the development of the plot and its retro-futuristic theme. I show how constraints forced me to adapt the audio content yet also opened a wide range of musical possibilities. The analysis helps clarify how what I call “game music” differs from “background music”. Moreover, the discussion highlights compositional issues that are similar to video game music-making and shows that escape rooms need to be studied further within the field of ludomusicology. Although some of the analytical tools remain valid because of their technical aspect (transition types, typology of music games, transition speed quantization) or their theoretical aspects (ALI model, branching-layering concept, triple lock of synchronization), traditional concepts such as “immersion” or “game feel” need to be questioned for escape room games, as that type of game takes place in a physical space.
This article provides a case study of a practice-based research at the crossroads of game music and electronic dance music. Nobody is an artistic and lucrative project located in the French city of Nice that involves collaborative work. It is a real-world experience where players progress and perform through different rooms trying to solve puzzles that are related to a narrative. The first opus, entitled Vis et ressens (live and feel), was revealed to the public in 2021 and has been a great success since.1 Nobody belongs to the escape room family, a genre of entertainment developed in the 2010s where players were typically locked into a room and had one hour to find a way out.2 The project presented here is part of the third generation of escape rooms, involving extensive use of electronics, original scenarios, and tailor-made settings. The experience has been designed to ensure the players cannot be disturbed by any external events. Traditional concepts such as total countdown or game master are abandoned. The clues given on the fly are replaced by a smart and interactive help system, and the payment is integrated into the story.3 There is no face-to-face briefing to justify a specific environment and scenery. Instead, the experience starts as soon as the players press the intercom. This is made possible through the automatization and centralization of the various interactive dimensions. Players progress through a multimodal experience made of video mappings, dialogues, interactive music and lights. They are projected into a story that they discover as they go along: To free Daniel (the main character) from a mysterious loop that traps him, the players go through each emotion of his life. The key to understanding the whole plot lies in this aspect.
I worked as its sound director for two years, and beyond having participated in the sound system installation, I was in charge of creating all its sonic components (music, sound design, audio mixing, and dialogue edits) and programming all the automatizations (sound, video, lights, network devices).4 Nobody’s story takes place in a retro-futuristic world, which forced me to imagine the sonic identity in a contrasting manner. On the one hand, the retro side is based on sound techniques and timbres borrowed from the 1980s culture, whether in films or video games. On the other hand, the futuristic characteristics implied modern sound techniques and a certain freedom to break these rules. As a side note, my background is made up of fifteen years of club DJing and a passion for producing electronic music, especially electronic dance music (EDM).
The main goal of this article is to address a musical creation process that involves features of EDM, such as production techniques, tropes, and cultural references, in a real-world indoor game within the constraints of both its interactive aspect and the given scenario of the story. Unless otherwise specified, the term EDM is discussed here in a broad meaning, referring to electronic music that is composed to make people dance (typically in nightclub settings or at music festivals). This definition encompasses house and techno music genres as well as slower (dubstep) or faster (drum and bass) EDM subgenres. As we will see, similar problems arise between virtual games and real-world games because of the nonlinearity and nonpredictability of the media events. Therefore, the compositional work is investigated through the lens of research insights derived from the field of ludomusicology. It gives technical vocabulary and typologies that help answer the following questions: How does one create sonic coherences between the rooms? How does one keep the players’ attention within a music game? What kind of sound transitions are involved during a puzzle? What are the differences between creating an interactive audio track and a background track? Which characteristics can be borrowed from forms of EDM to make people dance in a reconstructed nightclub within the game? Finally, how can the study of the soundtrack of an escape room be integrated into current ludomusicology?5
These questions are addressed through practice-led research, which first involved setting up the technical framework of the adventure (devices and protocols used, loudspeaker location, organization of computer music design) as well as aesthetic prior considerations. Then, a selection of six puzzles that involve sono-gestural interactions in which influences of EDM are dominant will be analyzed. In doing so, ludomusicology approaches, such as Guy Michelmore, Leonard Paul, and Michael Austin’s typologies, are applied to elucidate artistic choices. Finally, a conclusion will open a discussion on relationships between video games and escape room music-making. To preserve the originality of the entire game that has not been copyrighted yet, I will only partially reveal the scenario as well as the puzzles that constitute it where they serve the practice-led research in the field of music in games. Any sound excerpts that follow will be numbered in the body text and listed at the end of the article (I warn the reader that the tracks have not been mixed and mastered yet—these final steps are reserved for the upcoming public release of the original score).6
The history of the Nobody project starts in 2019 in the French city of Nice.7 At that time, the rooms in the rented commercial premises were completely empty and needed to be entirely renovated to be transformed into an escape room game. My prior considerations were then quite simple: How can a reliable sound system and other technical tools that respond to sound interaction be installed, and how can I compose and design sounds to best match a storyline? The first point regards technical aspects, whereas the second one is an aesthetic consideration.
I decided that at least two loudspeakers were needed in each room to reproduce the stereophonic field and help the intelligibility of the spoken text. In addition, in some of the rooms, a third loudspeaker emphasizes a specific aspect of the dramaturgy, enhances the spatialization of audio elements and, if anything, attracts the players’ attention. Indeed, the location of the loudspeakers was chosen specifically for the multiphonic rooms such as A and E. For instance, the loudspeaker E.c was placed just above a metal curtain present in room E to picture the helplessness of a prisoner locked behind it. The distribution of the loudspeakers throughout all rooms is shown in Figure 1.
At the time of writing, in 2022, the escape room comprises a total of thirteen loudspeakers and one microphone.8 Since it was not possible to use audio cables to connect the loudspeakers to the technical room, I decided to use the available Ethernet cables and a Dante system. Using audio-over-IP technology, this network protocol first allows me to connect up to 512 bidirectional audio channels with a minimum latency of 1 ms. Secondly, the routing is easily manageable by matrix pin connectors accessible via any computer on the network. The Dante devices are recognizable by any audio software thanks to a driver that simulates a low-latency audio interface offering up to 64 × 64 channels with an audio quality up to 32 bits and 192 kHz. This was the most reliable and quickest way to provide a flexible sound system to Nobody.
As the computer designer of the game, I had one more step to take before diving into sonic creations: How can all the interactions be managed from a single computer? Since Nobody involves not only audio but also video, lights, and network devices, I had to find an overarching piece of software that would make it easy to program the different types of automatization in order to pre-wire the plot and react in real time to the players’ interactions.9 After judging that Max/MSP was overkill and too complex for Nobody, I turned to Usine Hollyhock10 (abbreviated to Usine), which was perfectly able to respond to all the needs of the game: various supported protocols (such as DMX, ArtNet, HTTP, OSC, MIDI, IP video, VST, serial devices), many built-in modules (especially conditional ones) and the ability to script new ones, usability, fully customizable interface, low CPU usage, highly responsive community and technical support. To identify which action triggers which type of event, a collaborative worksheet was set up as shown in Figure 2.
Before programming the interactions, I had to organize the workspace of Usine according to Nobody’s plot. Was it more suitable to work with chronological elements or with the type of media? I chose a mix of both, considering that the chronology would help in the designing and the maintenance, while not forgetting that non-audio devices worked differently and needed to be treated accordingly. The plot of Nobody is divided into acts and scenes. That segmentation differs from the three playable parts of the game (called chapters) that are presented as episodes of a modern TV series.11 To date, only the first two chapters are open to the public12 and correspond to three acts due to plot consistency.13 Each act is associated with a Usine rack, itself divided into several patches defining each scene. These are in turn sometimes divided into sub-patches to organize the groups of actions defining a puzzle. The other racks are specific and do not depend on the chronology of the plot (see Table 1).
|Usine’s Element||Assignment|
|Specific racks||DMX control, OSC control, HTTP requests, Gamemaster interface|
Concerning the different media involved in the programming, I used different colors to immediately identify which elements needed to be modified or tested, especially since the colors matched those of the collaborative worksheet, as in Table 2.
|Type of Media||Color|
Finally, here is the list of the current gear involved in the project:
– MIDI devices (connected to the network by BomeBoxes14): 1 Bird DP1 digital piano, 1 Akai MPD 232 controller
– DMX devices: 12 UV lights, 3 LED bars, 5 LED strips
– Network devices controlled in real-time by the players: 2 iPads (called iPad K7 and iPad Tel), 1 ghetto blaster15
– Other network devices: 15 electromagnetic doors, 1 LED matrix, 1 tablet for video playing, 2 video projectors for mapping projections
After these prior technical issues were fixed, I needed to focus on the artistic constraints and how I could compose and design the sonic parts respecting Nobody’s world.
Before composing the original soundtrack and other sound design elements, it was necessary to imagine a “Nobody sound,” which means an aural identity that would match all the components and the requirements of the story. At that time, I only had partial elements because the dialogues were not fully written, nor was the scenery fully set. I knew a couple of elements that needed to be taken into consideration. At first, the retro-futuristic theme gave me a main musical direction and tightened the field of sonic possibilities. It invited me to mostly use electronic instruments for composing the soundtrack. Secondly, the personality of each room, with its own layout, decoration, time, puzzles, and light environment, gave me more details regarding the possible sonic orientations. For instance, a narrow hallway is a nice place to play stereophonic effects like ping pong delay, whereas an arcade machine calls for 8-bit sounds, and a modern nightclub evokes certain types of drums and a BPM range. Finally, the storyline included key elements that I couldn’t miss. For example, the meaning of some dialogues had to match the musical content.
With all these elements in mind, I started to gather different kinds of timbres that would fall into the retro and/or futuristic categories. To bring sonic diversity and modernity, I used audio plugins of old synthesizer emulations (whether analog, digital, or hybrid), such as Arturia or u-he products,16 as well as a virtual wavetable synthesizer. Beyond economic and practical reasons, I’ve always been convinced that these modern digital tools are fully capable of reproducing the sound of hardware synthesizers. About thirty presets were classified into four common categories (pads, keys, bass, lead) and then allocated to each room and scene according to several characteristics, such as synthesis type, polyphony, and evolutiveness. Once the melodic timbres had been selected, I chose the percussive tones from dozens of soundbanks. Ready-made retro drum kits were selected, but I also made new ones from scratch, especially for the futuristic aspect of the project. To be honest, those decisions were more arbitrary and pragmatic than those made for the instruments. Similar work was accomplished for the sound design part. On the one hand, suggestive sounds (such as slamming doors) were easy to pick from soundbanks (they just needed to be very well organized17) or record in my home studio. On the other hand, rhetoric sounds were designed from scratch, using sampling and synthesis methods (such as layering, time stretching, filtering, fades, pitch envelopes, or LFOs18). After creating different sound palettes to ensure that each room had its own sound atmosphere according to the scenario and the scenery,19 I sketched different versions of the required tracks. Once a specific version of a track of a given room was validated, I edited, arranged, and premixed it according to its own constraints (such as duration). A few more tweaks were made after the game was beta-tested.
Interactive possibilities raise technical and artistic questions, and as a game soundtrack, the music of Nobody came with its own constraints. How can I compose an intro and an outro for a track that will be looped for a certain amount of time? How can coherent musical transitions be created between two adjacent rooms whose settings are very contrasting? As the discussion will show, this depends of course on the type of puzzle.
This section focuses on game portions in which the influence of EDM is dominant, following with reflections on insights that can be drawn from these. Giving elements of the plot enables me to contextualize the musical choices and thus be more in line with the experience lived by the players. Each part of the game that is chronologically focused on takes place in a specific room and is associated with a specific puzzle. In the manner of a video game like The Legend of Zelda: Ocarina of Time (Nintendo, 1998), each room could be considered as a place providing a mini music game defined by a puzzle. Once the music game is solved, players get clues and move forward to progressively achieve the main quest, a “meta puzzle” or “meta game.” As such, a significant portion of the existing ludomusicological analysis work for these mini games would directly apply. For instance, Austin offers a framework to study music games that still works on the following analyses.20 Table 3 is a categorization according to a slightly adapted version of the typology of genres proposed by the author.
|Puzzle||Room||Mini Music Game||Genre||Subgenre|
|1||B||Vive sentique||Mnemonic music game||Play a melody|
|2||A||Layering||Musical puzzle game||Name that tune|
|2||A||Beatboxing||Karaoke music game||No backtrack|
|3||C||Blinking Lights||Mnemonic music game||–|
|4||E||Nightclubbing||Dance-based rhythm game||Corporeal|
|5||H||Playing Its Own EDM Score||Sandbox game||–|
|6||J||Retrogaming||Dance-based rhythm game||Manual|
The analyzed puzzles are very different from each other. Several skills are involved at different times so that the players’ attention is constantly renewed. Even if some mini music games seem similar (1B and 3C, 4E and 6J), the required gestures and the ways of thinking required to solve them are always different.
Puzzle 1: “Vive sentique” (room B)
When the team enters the building, the front door is barely opened and a track entitled “Vibration” (Audio 1) accompanies an argument between two characters,21 all in the dark (room A). The game has already started, with no other direction than the experience itself. This first track addresses two requirements. First, the content has to be minimalist so that the players are not overwhelmed with too much information from the start. Second, it is also constrained to the scenario: I had to picture and create the inner vibration of Daniel. To respond to the first need, I decided to only choose one electronic instrument and play one note. To represent the idea of vibration and to fit the darkness of the room, a bass register was welcome, so that actual low vibrations could be physically felt by players through their bodies. The single instrument’s timbre had to fill up a large part of the audible frequency spectrum, so I employed usual EDM production techniques: layering and blending. Three instances of a virtual Minimoog (one for each of the three common registers) were put together to form the raw timbre, and a series of audio effects were added to complete the signal chain (see Figure 3).
While the low-end and the mid frequencies are at the center (upper part of the figure), the highs are not, thanks to a stereo imager. To create sonic liveness, I automated some of the usual parameters found in club music: cutoff frequency and resonance of an emulation of the Pioneer DJM-900 mixer low-pass filter, overdrive gain of a Neve 1057 channel strip emulation, and white noise oscillator volume. The automation of the cutoff frequency produces irregular variations that echo the inner vibration of Daniel. Subliminal pumped rhythms at around 120 BPM may be heard if you listen carefully. The overdrive, the noise oscillator, and the bit crusher also bring dirtiness and disorder to the sound to fit the actual room: a dark and abandoned entrance hallway. The phaser effect brings subtle movement to the upper part of the spectrum, while the delay expands the mid-part and the low-end. To add a little surprise and keep the sound from acting as an actual bourdon, the slowly fading note is retriggered about every ten seconds. The filter attack then clearly punctuates the passage of time and helps maintain attention.
After waking Daniel, the team enters his bedroom (room B). At the same time, the soundscape entitled “We Start Here” is played on the speakers. This room has a precise narrative function: by listening to fragments of memories of the main character, the players start to understand Nobody’s story. They discover that Daniel gradually recovers his memory thanks to a paper chase that he has created to fight against his amnesia. Unlike “Vibration,” “We Start Here” does not provide salient EDM characteristics. It is closer to drone music.22 The track is made of three layers: a bourdon texture, lush melodic keys, and a bass tone that comes in regular waves. The arrangement mainly consists of a low bourdon on which a few scattered, highly reverberated notes unfold, all in D aeolian (Audio 2). The bourdon helps avoid silence while remaining discreet. That way, the players can focus on understanding the dialogues and discovering their first puzzle. This timbre is formed by stacking two pad synthesizers, each one using an organ sample put into an emulation of the E-mu Emulator II synthesizer. The distorted bass part brings low-end frequencies and widens the stereophonic field. Since this timbre is the same as in “Vibration,” it offers a smooth transition between the entrance and the actual start of the game, but it also has a rhetoric function: Daniel is still present somewhere. The melodic part’s timbre is formed with the same technique as the drone one, blending this time two samples of mallets. As in “Vibration,” the sound organicity of “We Start Here” is provided by slowly modulating parameters over time: low-pass filter cutoff frequencies and white noise volume for the bass part, overdrive gain, volume, pan and morphing balance for the bourdon, release envelope for the keys part. 
The melodic patterns, the use of a bourdon, and the mode including mobile scale degrees give an impression of eternity, but also of mystery.23 This ethereal atmosphere fits the beginning of the game, where the scenario is only partially revealed. During this time, the team tries to succeed at their first puzzle, which is a mnemonic music game using the piano present in Daniel’s bedroom (see Figure 4).
Although the right melody (entitled “Vive Sentique”24) can be played directly on the piano thanks to the score, torn into four pieces scattered around the room, most players do not read music. Therefore, each key pressed in the right order plays a little bell, whereas a buzz sound signals a wrong key in the sequence.25 When the right melody is played, it triggers the event that completes the room. Indeed, each final action of a room permanently solves the current puzzle and instantly triggers a distorted and modulated version (called “upside-down”) of the current soundscape played loud on the speakers, which ends with a big reverberation tail before the next track fades in. This way of ending a room is effective because it creates a sonic climax based on a track that was playing for minutes. It makes a clear transition between two adjacent rooms. But that transition type is not directly referenced in Paul’s typology.26 It is a transition segment that appears suddenly, its content based upon the current playing track. Using the design and the names given by the author, it is possible to provide a new hybrid type that would be called “climax segment” (see Figure 5).
The point is that the middle segment is not a segment per se since its content belongs almost entirely to the previous segment. Otherwise, it would only be a butt segment (no fades between two adjacent musical segments). This raises the question of the identity of a musical segment that is beyond the scope of this article. When is an audio segment considered a “musical segment?” Does it depend on the differences between the two adjacent segments? If players cannot hear the transition, is this really a transition? Anyhow, one observes that practice-led research raises musical ontology questions that need to be developed within a more theoretical framework.
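The bell-and-buzz feedback of the piano puzzle described above can be illustrated with a minimal Python sketch. It is not the Usine patch used in the game: the class name, the MIDI note numbers, and especially the choice to restart the sequence after a wrong key are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class MelodyChecker:
    """Tracks progress through an expected sequence of piano keys."""

    target: list          # expected MIDI note numbers (hypothetical values)
    position: int = 0     # index of the next expected note

    def press(self, note: int) -> str:
        """Return the feedback event for one key press."""
        if note == self.target[self.position]:
            self.position += 1
            if self.position == len(self.target):
                self.position = 0
                return "solved"   # would trigger the end-of-room transition
            return "bell"         # correct key, played in the right order
        # Assumption: a wrong key restarts the sequence from the beginning.
        self.position = 0
        return "buzz"


# Hypothetical four-note melody; the real notes are not given in the text.
checker = MelodyChecker(target=[62, 65, 69, 67])
events = [checker.press(n) for n in [62, 65, 60, 62, 65, 69, 67]]
```

In this sketch the "solved" event is the point at which the “upside-down” version of the current soundscape would be triggered.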
Puzzle 2: Beatboxing in the Metro (room A)
Once out of Daniel’s bedroom, a floating and surprising track entitled “Passing By” (Audio 3a) is being played, while new reading elements are visible in the metro hallway thanks to black lights and invisible ink markers. Although black lights have many uses, from medical applications to chemical ones, they are well-known in the club culture, and people are usually aware of black light’s funny characteristic: everything that is white (teeth, eye contour, very light-colored clothes, or props) becomes completely highlighted. In this context, the team of players meets a new character in the game called MJ, who has returned from the dead.27 He briefs them on the puzzle to be solved. As the players arrange various combinations of music symbols on the iPad K7 from clues written on the walls (see Figure 6), six layers of a famous tune (entitled “TDC”28) are added in the following order: clap, snaps, kick, shaker, choir, electric guitar. With six players, each can perform one beatbox layer and no one is left out. The layers’ order is not arbitrary. First, the clap provides a tactus with little information. Then the snaps bring a rhythm, then the kick brings the low frequencies. This contributes to maintaining the players’ attention. After a shaker, the melodic elements come to color this musical phrase. This mini musical puzzle game is solved once the team guesses which piece of pop music is being rebuilt. Therefore, the solution relies on personal experience of pop music culture. In that way, it is similar to SongPop (FreshPlanet, 2012, iOS), where players have to name a tune that is playing. In contrast, in Nobody, the tune to be guessed lasts only one measure and is not given in one go. It must be reconstructed progressively.
When I had to conceptualize this puzzle, I instantly thought of the clip-launching style popularized in the 2000s by the software Ableton Live. This playful way of making music speaks to everyone: pushing a button triggers an audio clip that is quantized to other clips according to a master tempo. To make things clear, I programmed a mini Ableton session view in Usine (see Figure 7).
That way of making music is widely used by EDM producers for sketching ideas or for playing live. Here, it is used in a restricted way since players can neither stop audio clips nor decide the order of the overlapping layers. A simple question arose: Which quantization speed was the most suitable for this mini game? In an article concerning interactive music in video games, Paul proposes several classifications, including transition speed quantization.29 The audio content can be quantized to a beat, a measure, a phrase, or at specific markers (such as ends of verses). The layers of “TDC” are quantized to a different value. Since the audio clips last one measure, with a tempo of 90 BPM, the quantization must be at least that duration. Suppose it were exactly one measure: if the players typed the right combination of symbols at the end of the measure, the layer would play almost instantly, without giving the validation bell sample time to ring through the speakers. Therefore, the quantization speed was set to let the current measure end, plus one entire measure.30
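The timing rule stated above (finish the current measure, then wait one more full measure) can be worked through in a short Python sketch. The function names are hypothetical, and a 4/4 meter is assumed since the text does not state the time signature of “TDC”.

```python
import math


def measure_seconds(bpm: float, beats_per_measure: int = 4) -> float:
    """Duration of one measure in seconds (4/4 is an assumption)."""
    return beats_per_measure * 60.0 / bpm


def launch_time(validated_at: float, bpm: float = 90.0,
                beats_per_measure: int = 4, extra_measures: int = 1) -> float:
    """Time at which a newly validated layer starts playing.

    The layer waits for the end of the current measure, then for
    `extra_measures` additional full measures.
    """
    length = measure_seconds(bpm, beats_per_measure)
    # Next measure boundary strictly after the validation time.
    next_boundary = math.floor(validated_at / length + 1) * length
    return next_boundary + extra_measures * length


# At 90 BPM in 4/4, one measure lasts 8/3 s (about 2.67 s).
late = launch_time(2.6)    # validated just before the first boundary
early = launch_time(0.1)   # validated at the start of the same measure
```

Under these assumptions, a combination validated late in a measure still leaves one full measure (about 2.67 s) for the validation bell to ring before the new layer enters.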
Once the answer is validated, MJ invites them to perform that reconstructed sequence by beatboxing together in front of a microphone located in the middle of the hallway (see Figure 8).31
A pre-count then gives them the signal. For twelve measures,32 the kick layer of “TDC” acts like a metronome, and the signal coming into the microphone is amplified in the hallway speakers. At the end, MJ congratulates them and plays back their performance that was recorded.33 As we can see, the EDM-oriented characteristics of this puzzle rely more on the methodology itself (clip launching, quantization) than the sonic or structural aspects. It is important to note, however, that it is the first dancing beat the players hear in Nobody. In addition, the famous artist that would normally perform “TDC” is also a great dancer, and as a DJ, I can confirm that his songs are still largely played today at parties. The score of the portion analyzed is shown in Example 1.
The upper part of the score shows a syncopated rhythm (similar to a Jamaican dancehall groove) and regular hits of the bass drum on each beat. Although the score does not indicate the swing applied to both shaker and snaps, it is worth mentioning because it helps produce musical and physical movement. For the first time, players are implicitly invited to move their bodies.34 The more they progress in the game, the more the tempi of the tracks increase. It is a basic technique that DJs use in clubs to slowly raise the energy of their DJ sets. At a macroscopic level, I followed that rule because it was an easy way to bring power to the whole Nobody soundtrack.
Puzzle 3: Blinking Lights in the Tunnel (room C)
As players enter “Leo’s tunnel” (room C), where rails are laid out, a new track entitled “In-Between” accompanies them as they try to understand the context of a new puzzle. On the walls are engravings made by Leo, an outstanding artist and inventor (see Figure 9).
Leo has created a time machine that needs to be restored. To change the era, players must find the year of arrival of the trip they are going to make. Since the puzzles from each chapter must vary from one to another, and interactivity between players and media has to increase from chapter to chapter, I thought of a simple mini game based on synchronization between sound and light. That media association would open new game perspectives. The idea was to reproduce an effect similar to the auto-sound mode of nightclub lights. Thanks to a microphone placed inside the unit, this mode simply reacts to the dynamic of an audio signal that is being played. It is a common way to help create synergy on the dance floor, similar to the use of strobe lights during buildups of tension in the music. In Leo’s tunnel, the pulsation of “In-Between” is inseparable from the one given by the lights. As soon as the bassline marks every beat at 110 BPM, the glass bricks embedded in the walls blink according to different colors, as in Figure 10.
Let ABCD be the date to find. Since the sum of the four digits A + B + C + D equals 21, and the track changes occur at an eight-measure level, I initially thought that the simplest and most effective way to reveal ABCD was to let the glass bricks blink on each beat: A times in red, B times in green, C times in blue, and D times in white. That way, during the last 11 beats (32 − 21) of the pattern, the players would have time to discuss before another cycle started. Unfortunately, beta testers encountered difficulties in counting. Even at 110 BPM, light flashes on each beat were too short and it was too difficult to see the right color, so I decided to double the length of the blinking cycle. Henceforth, one “colored beat” alternates with a “no-colored beat.”35
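The arithmetic of the two blinking schemes can be checked with a small Python sketch. The digits used here are purely hypothetical (any four digits summing to 21 would do; the actual date is not revealed in the text), and the doubled cycle is assumed to span 64 beats, which is one reading of “double the length of the blinking cycle.”

```python
COLORS = ("red", "green", "blue", "white")


def blink_pattern(digits, cycle_beats=32, doubled=False):
    """Return one entry per beat: a color name, or None for a dark beat."""
    pattern = []
    for color, digit in zip(COLORS, digits):
        for _ in range(digit):
            pattern.append(color)
            if doubled:
                pattern.append(None)  # a "no-colored beat" after each flash
    total = cycle_beats * (2 if doubled else 1)
    if len(pattern) > total:
        raise ValueError("digits do not fit in the cycle")
    # Remaining beats stay dark, leaving the players time to discuss.
    return pattern + [None] * (total - len(pattern))


hypothetical_digits = (2, 5, 6, 8)  # placeholder values summing to 21
initial = blink_pattern(hypothetical_digits)
revised = blink_pattern(hypothetical_digits, doubled=True)
```

In the initial scheme the 21 flashes occupy consecutive beats and leave 11 dark beats; in the revised one, every flash is followed by a dark beat, so counting no longer depends on telling apart two adjacent flashes.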
“In-Between” consists of three consecutive sections that include sound techniques borrowed from the club culture. The first one sets the soundscape for Leo’s tunnel (Audio 4a), the second one provides a pulsation during the glass block puzzle, and the third one consists of the last eight looped measures of the track, during which the team validates the puzzle (Audio 4b). There is therefore no break between these three parts. The audio sample player is programmed in such a way that it can jump to the second part thanks to an action, then automatically start the loop of the third part.
Even if typical genres of EDM, such as house or techno, have a BPM range of around 120–140, other genres such as dubstep are slower.36 In addition, one of the dramaturgic purposes of Nobody is to gain intensity from one room to the next. As said before, a little contribution can be made at the tempo level. I knew that the BPM of “TDC” was 90 and that the main track of the next room had to match the steps of the choreography (created at 110 BPM). I then decided to select the maximum value: 110. Since the audio content of the next track is based on a four-measure cycle and is more impactful than “In-Between,” the feeling of “ramping up as you go along” is preserved.
The musical form of the track follows the eight-measure pattern principle common in many EDM genres, such as house music. It is used here to match the constraints of the puzzle. At each new eight-measure cycle, a new element is introduced and lasts until the end. It starts with a stubborn bassline ostinato, which sets a four-on-the-floor metric (see Example 2).37
To match the setting of the tunnel, its rails, and machines, I wanted a cold sound, digital and “harmonically mechanic.” FM synthesis met my expectations. The timbre was made with an emulation of the Yamaha DX7, typical of 1980s FM synthesis, with an automated macro controlling the low-pass filters present on the six operators. Since the drum part enters later, and the ostinato is very repetitive, I added a delayed reverberation to the bassline that automatically pumps the beat. It was as if the track now had a proper rhythm. The reverberated audio signal was placed in front of every upbeat. At the end of the chain, I put an automated ping-pong delay, a classic effect widely used in EDM (see Figure 11). Its role was to repeat the signal at a lower volume than the original and send it to a different speaker at each occurrence: left, right, left, right, and so on. To achieve this, I set a dotted eighth delay in one channel and a dotted sixteenth one in the other, so that it would respond to the upcoming arpeggio running in sixteenth notes. Since the room is very narrow (around one meter by six), with one speaker at either end of the room, the result of the ping-pong delay was very satisfying, especially for players located in the middle.
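For readers unfamiliar with tempo-synced delays, the two delay times can be derived from the tempo. The short Python sketch below assumes that one beat is a quarter note (consistent with the four-on-the-floor metric at 110 BPM). The names `note_duration_seconds`, `delay_a`, and `delay_b` are illustrative, and the article does not specify which channel receives which value.

```python
def note_duration_seconds(bpm, beats):
    """Duration in seconds of a note lasting `beats` quarter-note beats."""
    return 60.0 / bpm * beats


BPM = 110
DOTTED_EIGHTH = 0.75      # an eighth note (0.5 beat) plus half of its value
DOTTED_SIXTEENTH = 0.375  # a sixteenth note (0.25 beat) plus half of its value

delay_a = note_duration_seconds(BPM, DOTTED_EIGHTH)     # about 0.409 s
delay_b = note_duration_seconds(BPM, DOTTED_SIXTEENTH)  # about 0.205 s
```

Both values are multiples of a thirty-second note, so the echoes fall between the sixteenth notes of the arpeggio rather than on top of them.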
The harmonic progression is based on the tonic-dominant dualism to guarantee a functional and smooth loop. It is suggestive because not all the thirds are played, or they appear too quickly. To describe a mechanical behavior, I used an octave-based arpeggiator that is polarized to the four-on-the-floor metric (see Example 3).
The C minor key is established by a pad synthesizer at the start of each eight-measure cycle. That way, the second chord sounds like an F minor because the C minor key calls for the missed A-flat third. Either way, what makes the loop harmonically efficient is the movement of the leading tone to the tonic, played by the bass and the arpeggio sounds. As soon as the puzzle is solved, time travel begins and the upside-down version of the track plays (Audio 4c) and the lights flash very quickly, like a stroboscope.
During the discovery of the room, the characteristics of “In-Between” evolve. In the beginning, players can hear only background music. The sonic qualities seem to fit the room setting that the players have no control over. Even if there is no screen to watch, one would be tempted to say this is extra-diegetic content in the sense that what players see (such as drawings on the wall, or rails) does not correspond to what they hear. But that would consider the players to be characters, whereas there are already dialogues with officially created characters. While in film studies the term “diegetic” usually refers to content that is part of the plot scene, this concept becomes ambiguous in the case of an escape room. In the context of video games, Karen Collins points out that this concept must be refined, because of their interactive characteristics.38 She divides the diegetic and nondiegetic audio contents according to adaptive and interactive categories. In the first case, the player cannot directly affect the content. In the second one, they can. Contrary to Collins, Donnelly suggests that the diegesis concept does not suit video games and recommends instead to speak of “synchronization” between image, sound, and player.39 Following that principle, I propose to make a distinction between audio content that is linked to puzzles (or mini games) and audio content that is not. In the first category, players can control musical parameters (such as start, stop, filter cutoff, and reverberation balance), which corresponds to Collins’ notion of diegetic game music. In that case, the players and the sound are synchronized. In the second category, players do not influence the track that is being played, which Collins indicates as nondiegetic, and which in the game industry is known as background music. The players and the sound are not synchronized this time. Most of the time, the track starts as soon as the preceding puzzle is solved, so there is no audio gap between rooms. 
In a few cases, the two categories can overlap since a non-interactive track may give subliminal audio clues. In the case of “In-Between,” as soon as the players make a specific action, the second part of the track plays with a stinger transition, and the glass bricks rhythmically light up the tunnel. The background track thus becomes part of the puzzle itself (game music). The players’ actions start and stop the track, and there is a synchronization between music and lights. Finally, the example of “In-Between” helped build a simple distinction between game music and background music, showing that a given track can belong to both categories in the course of a mini game.
Puzzle 4: Nightclubbing (room E)
The next room is the portion of the game where EDM influence and club culture are present in the most explicit way. The scene rebuilds a nightclub environment, and to get out of the room, players need to perform choreography on an EDM track. When the team arrives, room E initially represents a street in the middle of the night, with scooter and ambient sounds (such as bats, crickets, drunk men, and car alarms) played through the speakers. Thanks to a taxiphone,40 MJ comes to help them in the resolution of the puzzle. When he mentions a nightclub called Transylvania, the sides of the door synchronously light up (see Figure 12).
To enter Transylvania, and thus hope to find the TV man, the players are told, one must be among the greatest. To evaluate their skills, MJ challenges the team to perform a special dance. At the end of the instruction, the club scene starts. Before analyzing the audio track made for the choreography, let’s focus on the scenery aspect. Smoke comes out while a video is being projected to the ceiling. This video mapping uses shapes commonly used in psytrance festivals, such as O.Z.O.R.A. (see Figure 13).41
To enhance the reproduction of an indoor nightclub environment before the choreographic part begins, a track is played through the speakers of the next room, behind the Transylvania door and its hallway. That way, players can hear bass drum hits that are naturally filtered, as if they were in the street, waiting to get into a nightclub. Since its role was to be heard “from outside,” this track entitled “Nightclub” (Audio 5c) was composed quickly. I chose a heavy bass drum to ensure the bass frequencies will be heard several meters away. Even if I still structured the track, the only effective element was bass drum hits, as well as quiet moments where players could hear a crowd shouting. The track lasts around two minutes and is looped until the choreography part begins.
After the players call the bouncer on the taxiphone to indicate they are ready to meet the challenge and enter the nightclub, neon lights illuminate the TV man statue42 and the audio track starts. In addition to having a story purpose, this sculpture displays words of encouragement and real-time dance moves via a tablet placed in its headset (see Figure 14).
My main constraint for creating this track was the already-made choreography. Fortunately, the video sections were still editable so that they could fit different musical arrangements. But the four-measure cycle and the 110 BPM were imposed. After several abandoned proposals,43 a new track judged more powerful was selected. The structure is shown in Figure 15.
The track is based on the typical EDM buildup-drop tandem, bounded by an intro and an outro. The unusual elements are the insertion of an orchestral section into the first bridge and a focus on the drum and bass parts for the last one. To avoid being too repetitive, the second drop brings more direct content and is shorter than the first one. For the purpose of the game, the track had to be efficient and functional. Therefore, it alternates energetic moments (drop 1, drop 2, bridge 2) with softer ones (intro, buildup 1, bridge 1, outro) and offers a minimalist set of instruments, which made it necessary to design thick-sounding instruments, especially for the drum and bass parts.
The track has an ethereal introduction to fit the mindset of the players reading words of encouragement. On these long synthwave-inspired sounds,44 a crystal plucked tone unfolds. The timbre was created from an emulation of the Sequential Circuits Prophet-5, the most affordable polyphonic synthesizer in the early 1980s. With a long attack on the amplitude envelope and the filter cutoff, and a long release time, it brings softness and balance. To create a striking contrast, the end of the introduction gives way to a stubborn bassline ostinato that sounds dark and anguished (Audio 5a), as shown in Example 4.
As for “In-Between,” I used the I-V polarity in C minor to ensure long repetition of the bassline. However, I opted this time for an octave-based pattern and an emphasis on the leading tone (third measure). The high-end frequencies are filtered to hold back energy for the drop. During the last two measures of the first buildup, a countdown is displayed on the screen of the TV man. To add more power, a stroboscope flashes the entire room during the last measure. Then the drop clearly announces the climax of the game, where players are invited to jump on a four-on-the-floor rhythm.
The bass timbre is made of three instances of a Minimoog emulation using pulse-wave oscillators. The amplitude spectrum of these specific waveforms can be modulated by modifying the duty cycle, which results in a rich tone with phase effects. To obtain a lot of harmonics and make the sound “thicker,” I used a classic technique popularized by trance and big room house music called unison.45 It consists of stacking slightly detuned waves of a given oscillator. I opted for sixteen voices of unison for two instances of the Minimoog. To make the voices more detailed, I put an offset on the respective detuned parameters. The other Minimoog is only used as a sub-oscillator tuned one octave below (see Figure 16).
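To make the unison principle concrete, the following Python sketch computes the frequencies of a stack of detuned voices. It is a generic illustration, not the algorithm of the Minimoog emulation used for the track: the even spread in cents, the function name `unison_frequencies`, the detune amounts, and the bass note are all assumptions.

```python
def unison_frequencies(base_hz, voices, detune_cents):
    """Spread `voices` copies of an oscillator evenly between
    -detune_cents and +detune_cents around base_hz (1200 cents = one octave)."""
    if voices == 1:
        return [base_hz]
    freqs = []
    for i in range(voices):
        offset = -detune_cents + 2 * detune_cents * i / (voices - 1)
        freqs.append(base_hz * 2 ** (offset / 1200))
    return freqs


C2 = 65.41  # Hz, hypothetical bass note
layer_a = unison_frequencies(C2, 16, 12)  # first instance, 16 voices
layer_b = unison_frequencies(C2, 16, 20)  # second instance, different detune amount
sub = C2 / 2                              # sub-oscillator one octave below
```

Using a different detune amount for each instance keeps the two sixteen-voice stacks from landing on identical frequencies, which is one way to read the offset mentioned above.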
As shown in the figure, the generated signal is then passed through a four-pole low-pass filter that attenuates the harmonics at appropriate moments in the track (introduction and bridge 1). A first triangle LFO acts on the phase of the main oscillators symmetrically, while a second changes the duty cycle of the waveforms. At the end of the signal chain, as is often the case in the EDM style, a compressor, an EQ, and a distortion ensure thickness and roundness in the global sound.
Although an arpeggio and a lead complete this instrumentation, the track was imagined according to the drum and bass archetype, which consists in focusing on the bass part and the drum part. To achieve this, all those sounds had to fill up a large part of the audio spectrum. In addition to the techniques mentioned to widen the sound, I used a stereo spread. I also used a bus compressor to glue that rhythmic section, and a sidechain compressor to add amplitude movement to the bass.46 Finally, a slight portamento effect was applied to this monophonic melody. Regarding the percussion, only five samples were chosen: kick, snare, closed and open hi-hat, and dropping noise. To thicken these sounds, a little distortion was added to the snare drum and the hi-hats, and a slight room reverberation was added to the snare.
Even if the bassline and the drums were sonically effective, I thought it lacked a theme. The first bridge was the right moment to use one. For the team to rest for a few moments, that orchestral section comes to temper the previous energetic moment. Staccato strings are added to a piano arpeggio, before the drums enter again, this time on a syncopated pattern. The bass phrase then sounds (Audio 5b), as shown in Example 5.
According to Caplin’s phraseology, that theme can be analyzed as a repeated sentence. Along with the period, it is the other fundamental theme type. It consists of a basic idea (measure 1) that is immediately repeated (measure 2), followed by a continuation phrase containing shorter fragments (first half of measure 3), and a cadential idea (second half of measure 3, and measure 4).47 On the same harmonic progression (I-IV-V-I), the sentence is repeated with a slight variation and ends one octave higher. I wanted a theme that was easy to memorize and that can be sung. The short and dotted rhythmic figures, as well as the polar notes of tonic and subdominant scale degrees marking the strong beats, ensure that effectiveness typical of EDM “anthems.” Even if anthems usually include soaring vocals with a memorable sing-along chorus line, the term is used here to describe a short and effective musical phrase, played by a synthesizer with a lot of harmonics, which can be sung simultaneously by a large audience. Usually played in nightclubs, stadiums, or at large festivals, anthems are unifying. Unlike anthems found in big room house music,48 the bassline anthem of “1980s” has fewer harmonics and no reverberation. That theme is also used at the end of the track because of its closure power.
Since this final chapter has not been released yet, the following puzzles are subject to change. Nonetheless, the creative process involved to date includes interesting content to analyze.
Puzzle 5: Playing Its Own EDM Score (room H)
This room represents an apartment located in a skyscraper (room H) in the near future. A projection mapping helps define the situation. For this new puzzle, the players have to reconstruct a track entitled “Videocœur” (Audio 6). To do this, they must use a MIDI controller stuck to the wall, which allows them to mix and modify the various sound layers of this interactive mini music game (see Figure 17).
This unit represents the kind of musical gear that is widely used in the EDM producers’ community. It gathers controls of different kinds: buttons, faders, pads, and rotary knobs. To make the interface clearer for the players, some parts were made inaccessible. For the gameplay to be effective, all the available buttons must serve the purpose. Here, the controller is programmed to simulate one of the possible uses: building an on-the-fly musical structure with eight sound layers. To achieve this, there are different actions depending on the type of control: start/stop a loop with one of the sixteen pads, create a drum pattern with the sequencer, turn up/down the volume of each layer with the eight faders, modulate the sound of each layer with the eight knobs. The behavior of the latter is determined by the bank selection:
– Bank A (default): panning
– Bank B: cutoff frequency of a four-pole filter (low-pass or high-pass depending on the layer)
– Bank C: extreme distortion (fuzz) send level
– Bank D: extreme reverberation (lushed) send level
These parameters were selected because of their ease of listening. Turning any of the knobs makes directly audible changes. Unless a player makes tiny finger moves, audio processing cannot be subtle.
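The bank logic can be summarized as a lookup from the selected bank to the parameter that a knob controls. The Python sketch below is a simplified model under stated assumptions: it supposes that each knob sends a value from 0 to 127 (the range of a MIDI control change message), and the names `BANK_PARAMETERS` and `knob_to_parameter` as well as the normalized output ranges are hypothetical.

```python
BANK_PARAMETERS = {
    "A": "pan",              # default bank
    "B": "filter_cutoff",    # low-pass or high-pass depending on the layer
    "C": "distortion_send",
    "D": "reverb_send",
}


def knob_to_parameter(bank, layer, cc_value):
    """Translate a knob position into (layer, parameter, normalized value)."""
    if bank not in BANK_PARAMETERS:
        raise ValueError("unknown bank")
    if not 0 <= layer < 8:
        raise ValueError("there are eight layers, numbered 0 to 7")
    if not 0 <= cc_value <= 127:
        raise ValueError("knob values range from 0 to 127")
    parameter = BANK_PARAMETERS[bank]
    value = cc_value / 127            # 0.0 to 1.0
    if parameter == "pan":
        value = value * 2 - 1         # -1.0 (left) to +1.0 (right)
    return layer, parameter, value
```

For example, `knob_to_parameter("B", 3, 127)` returns `(3, "filter_cutoff", 1.0)`. Because the full knob travel maps to the full parameter range, even a small turn produces an audible change, in line with the design goal stated above.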
Knowing the constraints imposed by the puzzle, I had to compose a modern electronic track that had to be different from the rest of the soundtrack. I immediately thought of the IDM (intelligent dance music) genre,49 which is a kind of pioneering EDM that particularly inspired me in the last few years, especially the French artists Julien Chastagnol (aka Ruby My Dear) and Franck Zaragoza (aka Ocœur). Eight layers would be divided into four pitched instruments plus four percussion parts. The constraint of minimalism would help identify the instruments. Therefore, each layer needed a distinct timbre so that the team would be able to instantly understand the impact of their actions in real time. I started by writing the drum part, as it is the most salient feature of the IDM genre. I chose four archetypal drums (kick, layered clap and snare, hi-hat, white noise) and then created a pattern that would be played by default (but modifiable by the players via the MIDI step sequencer), as shown in Example 6.
This breakbeat rhythm contrasts with the previous four-on-the-floor ones.50 It helps create variation with other chapters. To keep the logic of increasing tempi as the music games unfold, and to let the pattern be intelligible, I chose a BPM of 130.
Four additional layers complete the drum instrumentation. Aside from a dropping FX played as a one-shot, players have access to three melodic instruments. First, a grainy thick bass sets a drone on a D-flat note. It is made of three oscillators (two main and a sub, each one with its own octave tune) that pass through a low-pass filter whose cutoff frequency is rhythmically modulated by an LFO. This opens the filter on the first beat and the third upbeat. I used wavetable synthesis to get unique sounds obtained by modulating the wavetable position. Although that synthesis type was first made available in the 1980s by PPG and Ensoniq, there has been a huge resurgence of interest in the 2010s thanks to the dubstep genre and the company Xfer Records that created Serum, a virtual wavetable synthesizer that is still considered a must-have synthesizer for EDM producers today. Second, a pad synthesizer provides minor chord waves. Finally, a mallet synthesizer performs melodic loops. Example 7 corresponds with the audio excerpt.
There are specific musical gestures to make to succeed at it, but the purpose of this mini game is above all having fun. Non-musicians should understand and enjoy it as much as musicians. This is why an optional metronome is available and all actions are rhythmically quantized. This method is far from being new. Around thirty years ago, the players of the music-rail shooter video game Otocky (ASCII Corporation, 1987) could play notes when pressing a button. To ensure that the sonic output was always satisfying, the played notes were quantized to the beat playing in the background. For “Videocœur,” in addition to that technical rule, all combinations of layers had to work together, musically speaking. First of all, the loop repetition must run smoothly. According to Michelmore, there are five mandatory points the composer should not forget when creating looped cues in video games, because of the “indeterminacy of timing in this interactive medium.”51 In “Videocœur” each point was considered for each instrument:
– The harmonic structure is made to be looped. There cannot be unwanted dissonances due to undesirable adjacent chords or notes.
– The timbres and textures remain smooth. Each melodic part is ended when the loop starts again. The audio clips were also bounced with the “render as loop” feature of Ableton Live. That prevents the reverberation tail of the end of the clip from being cut off when the loops repeat.
– The automations made on melodic material keep the players from being bored by the repetition. The repetition mainly depends on the flow of interactive actions made by the players.
– The dynamic and rhythm progression is preserved since the framing parts are in the same energy momentum.
– The end of each audio clip is controlled by a butt transition but the whole loop itself is made of layered transition.
These rules work on all interactive media, video games and escape rooms in particular. But this only concerns the horizontal aspect. Since all the combinations of various layers must sound good, the vertical aspect is also constrained. Because of the design of the MIDI controller, I had to deal with four loops for each of the four pitched instruments. All instruments can sound together, but each instrument is monophonic, which means only one loop with the same color can be played at the same time (see Figure 18).
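The two constraints discussed in this section, rhythmic quantization of the players' actions and one active loop per instrument, can be modeled together. The Python sketch below is a toy model rather than the installation's actual program: the class `PadGrid`, the function `quantize_up`, the one-beat grid, and the instrument labels are assumptions, and stopping a loop or playing the FX as a one-shot is not modeled.

```python
import math


def quantize_up(position_beats, grid_beats=1.0):
    """Return the first grid point at or after position_beats."""
    return math.ceil(position_beats / grid_beats) * grid_beats


class PadGrid:
    """Four instruments with four loops each; one active loop per instrument."""

    INSTRUMENTS = ("bass", "pads", "keys", "fx")

    def __init__(self):
        self.active = {name: None for name in self.INSTRUMENTS}
        self.pending = []  # (start_beat, instrument, loop_index)

    def press(self, instrument, loop_index, position_beats):
        """Register a pad press; the loop starts on the next grid point."""
        if instrument not in self.active:
            raise ValueError("unknown instrument")
        if not 0 <= loop_index < 4:
            raise ValueError("each instrument has four loops, numbered 0 to 3")
        start = quantize_up(position_beats)
        self.pending.append((start, instrument, loop_index))
        return start

    def advance(self, position_beats):
        """Apply every pending change whose start time has been reached."""
        still_pending = []
        for start, instrument, loop_index in self.pending:
            if start <= position_beats:
                # Replacing the entry enforces one loop per instrument.
                self.active[instrument] = loop_index
            else:
                still_pending.append((start, instrument, loop_index))
        self.pending = still_pending
```

A press at beat 2.3 therefore takes effect at beat 3, and a new bass loop replaces the previous one instead of being stacked on it. At 130 BPM one beat lasts 60/130 s, roughly 0.46 s, so the wait introduced by quantization stays short.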
Depending on the instrument function, I used simple variations of a default musical content to create the other twelve loops:
– Bass: change the drone note to the dominant, solo the sub-oscillator, add noise to the timbre
– Pads (harmony): substitute chords to C minor (Csus2, G minor, Eb major)
– Keys (melody): use the fragmentation and permutation operations
– FX: employ other envelope and LFO settings
This way of composing substitute audio clips is common in making video game soundtracks. For instance, audio loops composed for the platform-racing game Harold (MoonSpider Studio, 2013) are triggered according to the rank of the player. If the player is last, the track is minimalistic, with only a few elements (guitar, bass, drums). As the player overtakes opponents, the music becomes more and more powerful, until revealing an apotheosis gospel part when the player reaches first.52 This simple example shows a resemblance to game music making, be it virtual or real, but also a huge difference. Harold is not a music game per se. The interactive music is not consciously triggered by the player but is played depending on the objectives of the race (finish first). In Nobody, it’s the opposite: the goal is reached when music is played in a certain way.
Puzzle 6: Retrogaming with the Arcade Machine (room J)
This new place represents an arcade room whose walls and ceiling are covered with mirrors and electric cables (room J). To welcome the players, I created a “futuristic” audio track whose codes were borrowed from the existing genres (Audio 7). I started composing an ostinato theme with a progressive drum pattern made of glitch sounds and played at 160 BPM. This basic syncopated rhythm is typical of the drum and bass genre, shown in Example 8.
Seven other layers made of glitch sounds complete that drum part. On top of that, I wanted a simple melody. Contrary to the bassline of “1980s,” which was a sentence, the “Arcadiac” theme follows the phraseological structure of a period. A first part AB ends with a weak cadence (on the D note, the dominant) and is followed by a slight transposed variation A’B’ that ends on a stronger cadential closure (on the G note, the tonic).53 To ensure intelligibility, some notes are played staccato and short silences are regularly placed (see Example 9).
Because of its ease to be sung, its shortness, and its structure that mainly uses repetition and transposition, that theme can be analyzed as an EDM anthem like the ones found in big room house music. But here it is not supposed to be played to a large audience and sung by thousands of people. Since the context of Nobody is different, there is no point in using the usual supersaw timbre that easily cuts through the mix.54 Instead, I chose a thinner 8-bit timbre, made of distorted square waves and noise combined with a high-pass filter. As discussed later, this would easily recall chiptune music, hence matching the scenario and the setting of this arcade room.
The buildup-drop tandem function of “Arcadiac” is different from the one used in “1980s.” Instead of adding power to the track by unfiltering the looped phrase and bringing a heavy four-on-the-floor metric with drums, I preferred here to make a sudden rhythmic change. For the buildup, the theme is repeated and slightly varied for nearly one minute. At the end, this melodic part slowly turns into a pad sound by increasing the release parameter of the amplitude envelope. Meanwhile, the drum pattern is punctuated by a heavy bass drum placed on the second and fourth beats. This has the effect of discreetly dividing the tempo by a factor of two. Massive dropping pitch envelopes bring additional low impacts to the buildup. Finally, after a few seconds of sonic resonance during the climax, a subliminal message is substituted for the expected drop.55 Then, a dubstep-inspired section begins. It alternates between heavy bass drum hits mixed with a grating growl bass and a relaxing synth pad. The growling sound, produced by wavetable synthesis, is used for its spectacular effect. Because of its specific modulation of harmonic content, it is, for me, synonymous with the most modern sound we can obtain to date. Through a simple but effective combination, the synthesized sounds and glitches ensure the “electric” aspect of the room.
Located in the middle, and highlighted by large, sheathed cables, the arcade machine is indeed the main center of attraction of the team of players. A custom-made video game features the two main characters, Daniel and Eric, shown in Figure 19.
In this hybrid game using codes borrowed from platform and fighting games, the characters cannot fight but dance. Through the prism of a video game, players find a summary of the choreography performed in the street. This scene is therefore a mise en abyme of the one played a few rooms before, including the music. Indeed, an 8-bit version of “1983” (Audio 8a) is played on the speakers of the arcade machine.56 I used some typical codes of chiptune music: minimalist instrumentation (bass, lead, melodic ostinato, arpeggio, drums), pulse-wave oscillators, precise quantization, low dynamic variation, octave intervals in the bass, and white noise samples.57 As such, it’s intentionally a wink to the NES 8-bit console, which was the most famous one in Europe during the eighties.58 Apart from the aural skeuomorph aspect, that type of sound also links with the five well-hidden NES cartridges located in the room. At some point during this game-within-a-game, players must type a phrase that is a word combination of the five titles.59 The cartridges are part of this retrogaming puzzle.
In this room, the music has two different functions. First, it serves a conceptual goal: the audio tracks are coherent with the setting and the objects present in the room. It also serves a game purpose, as players must reproduce the dance steps of the EDM track “1980s,” in a slower way, and through the prism of a custom-made video game played on an arcade machine. This mise en abyme is the key to the understanding of the plot. By giving the necessary clues to finally solve the “meta puzzle,” the music ends up becoming the foreground of the adventure.
The third-generation escape room Nobody offers various musical puzzles. Each of those can be considered a mini game requiring a different strategy to renew the player’s experience. Having exposed technical and aesthetic considerations, the analysis of six game portions first shows how EDM production techniques are used despite the constraints linked to the scenario of the escape room and the interactivity in the games, which have resulted in very diverse audio content. Some of the tracks provide a pulse, others do not. When they do, the tempi are different, and the percussion part is always created with different samples, playing a four-on-the-floor or a breakbeat pattern. Some tracks are built on four-measure loops, others on eight. One of the main musical themes is a sentence while the other is a period. Different types of synthesis are used (subtractive, FM, wavetable) to fit the declensions of the retrofuturistic universe through each room. Despite this, some characteristics allow finding a sonic identity as well as coherence in the development of the plot, such as the use of a recurrent tonic of C, minor modes (harmonic minor, phrygian, aeolian), specific intervals (many minor thirds, augmented seconds and semitones, some tritones), or techniques to increase intensity (acceleration of the BPM, progressive addition of elements of the setting). The analysis makes it possible to distinguish three levels of EDM influence in the creative process:
– Computer design methodology: the way of programming puzzles comes from my background in using EDM-oriented software (puzzles 2 and 5).
– Audio content: tracks include EDM tropes (puzzle 1) or production techniques (puzzles 3–6).
– Hyperrealism: the reproduction of a nightclub environment calls for the creation of an EDM track from scratch, as well as using iconic elements such as smoke or stroboscope (puzzle 4).
Beyond a pedagogical interest resulting from the explanation of production techniques (such as thickening a sound in different situations), the justification of aesthetic choices in the realization of certain tracks highlights compositional issues that also cover the field of video games. One then may wonder how escape room games compositionally differ from other ludic music.
First and foremost, one point should be made clear: Nobody is a game. Steve Swink defines the video game feel as a “real-time control of virtual objects in a simulated space, with interactions emphasized by polish.”60 We could extend the definition to other playful games. The game feel could then be defined as a “control of objects in a physical space, with interactions emphasized by polish.” In Nobody the objects can be as diverse as an iPad, NES cartridges, a MIDI controller, polaroid pictures, or a trunk. The simulated space is different in each room (metro hallway, tunnel, street, etc.), and the scenery matches the scenario as much as possible. Since Nobody involves observation skills, defined objectives according to a set of rules, rewards from solved puzzles, scores of achievement,61 and a story with characters, I argue that it is a game.62 In terms of sensation, this game offers all the most common experiences Swink notices,63 whether they are related to extending the senses (puzzle 3), mastering the skills via performances (puzzles 1, 2, and 4), or feeling an aesthetic sensation of control (puzzles 5 and 6).
With all these considerations in mind, it is possible to argue that composing for an escape room and composing for a video game are very similar. In both cases, it requires a consideration of two directions. First, the horizontal dimension gives constraints on the sequencing of the musical content. Although the triggering order of each audio element is mostly determined, the timing is not. This is what Paul calls branching.64 The vertical dimension, called layering, refers to the superimposed audio content, such as dialogues that are played over the current playing track, stacked loops in puzzles 2 and 5, or simply multiroom streaming (puzzle 4: a soundscape playing in room E while the track “Nightclub” is playing elsewhere). Another distinction that is common to both virtual games and escape rooms relates to the degree of interactivity of the musical content. In Nobody some tracks are linked to puzzles (“1980s,” “Videocœur”) and others are not (“Vibration,” “We Start Here”). This is what I respectively called “game music” and “background music.” Following a specific action, the same track can jump from one category to the other (“In-Between”). The background tracks call for long crossfades and non-metrical content (puzzle 1), while game tracks are subject to more constraints (puzzle 5). As is the case in video game music-making, there is content that is played as a one-shot, and other content that is looped. The techniques for creating loops and triggering them are therefore similar.
Although this study shows that the huge difference between a virtual environment and a physical one does not make much difference in terms of composition, in an escape room, the controllers can be of all kinds (iPad, MIDI controller, body), and there is no screen to permanently look at while playing. Therefore, we can no longer speak in terms of diegetic or incidental music, nor use the triple lock of synchronization between image, sound, and player proposed by Donnelly. That tripartition must be adapted to a physical environment, considering, for example, that the image parameter becomes the observation of the surrounding space. Nonetheless, whether for a video game or an escape room, the problem remains the same: to produce and renew interactive audio content for players without losing an artistic direction. For that reason, most of the ludomusicology theoretical concepts remain valid for analyzing Nobody, and more generally escape rooms.
The vocabulary and the typology used in the scope of this article provide a frame of reference for the analysis. Through the example of an escape room, the practice-led research elucidates artistic choices. It both illustrates and discusses ludomusicological theoretical concepts, and shows that some of them remain valid while others need to be adapted to the specifics of a game that is played in a physical space. For instance, Austin’s classification highlights the diversity of the analyzed mini games, while Michelmore and Paul’s analytical tools (clean loops, transition types, quantization transitions) elucidate compositional constraints. To extend this study, it would be worthwhile to focus on global theories of sound and music in video games, such as the ALI (affect, literacy, interaction) analytical model proposed by Isabella van Elferen.65 Based on the definition of media literacy given by Martina Roepke,66 van Elferen defines musical media literacy as “the fluency in hearing and interpreting film, television or advertising music through the fact of our frequent exposure to them and, subsequently, our ability to interpret their communications.”67 For example, the simple association I made between sonic features and eras (the past using subtractive synthesis and the future using wavetable synthesis) is shaped by mass culture, such as the TV series Stranger Things (2016)68 or the movie Blade Runner (1982).69 According to the author, literacy, affect, and interaction converge to explain musical immersion in a video game. We may wonder whether this is also the case in an escape room, and how these three components would help explain Nobody’s immersion.
This leads to a fundamental question that was intentionally avoided in the course of this study: Without a screen but instead with bodily movements and locomotion in a physical space, what does “immersion” mean? At first glance, escape rooms seem to offer deeper immersion than virtual games. In games that are played in a physical space, immersion is not only achieved through sound and music, but also through the presence of human beings, whether it be the players themselves or actors. Escape rooms provide an additional degree of freedom whose features are linked to physicality. Contrary to virtual games, most elements of the setting can be observed in 3D and touched. Would the immersion be greater with real tiles or wallpaper? In that sense, the definition of “immersion” needs to be refined to encompass real games. Based on practice-led research, the example of Nobody and its EDM-oriented soundtrack shows that studying escape room music belongs to the ludomusicology field, and it gives insights into building an adapted analysis methodology.
The list of project participants is available in the appendix. In less than one year, this escape room has become the best referenced in its category and is one of the most popular entertainments in the city of Nice. According to the players, the artistic approach for all the components makes Nobody unique. As of July 21, 2022, 1,050 games were played (with an average of 4.1 players per game), and 764 comments on Google and 94 on Trip Advisor were left, both with an average of 5/5 stars. The word that appears the most in Google analytics is “immersion” (72 occurrences). For more information, see indicators and comments provided by Google and Trip Advisor.
Inspired by haunted houses and scavenger hunts, but also by interactive theatre, the escape room was born in the 2000s in Japan and the United States. It is only in the 2010s that it became popular in Europe; in the first wave of games, the resolution of the puzzles remained rudimentary (unlocking padlocks and codelocks, finding keys, ciphers, and paper clues). After the genre came to Europe, the second generation of escape rooms used mechanical components (such as moving an object to a specific place in order to unlock a door, or smoke that gushes during the final puzzle). A third generation introduced a fair amount of electronics, sensors, and other new technologies. Thus, the scenarios and puzzles can be highly automatized. For a history of escape rooms, see Katriina Penttilä, “History of Escape Games Examined through Real-Life and Digital Precursors and the Production of Spygame” (Master’s thesis, University of Turku, 2018), accessed April 9, 2022, https://www.utupub.fi/bitstream/handle/10024/145879/History_of_Escape_Games_ProGradu_Katriina_Penttil%C3%A4.pdf?sequence=1&isAllowed=y.
The payment device is integrated into the game flow using an old Mentos dispenser. A picture is provided in the appendix.
For instance, validating a code must turn a light on, pressing a button must play a dialogue through the speakers, and answering a question must unlock a door.
In this article, only the musical parts of the escape room’s soundtrack are discussed. The sound design and dialogue are not addressed, even though they are integrated with the final soundtrack to be released.
For this reason, no music has yet been declared to a copyright organization. The soundtrack (consisting of music, sound design, and dialogue interludes) is scheduled for release in 2022, as is the franchise. It is therefore possible that the game will be internationalized, starting with a translation of all the recorded dialogues.
Official website, accessed April 9, 2022, https://nobody-escape.com/.
Other speakers are placed in rooms I through M during the testing phases of subsequent chapters.
The game master also has some manual commands to trigger certain actions from the control room (see Figure 20).
Usine is a visual programming software in development since 2006 by Olivier Sens (Brainmodular). Like Max/MSP, it allows you to wire different audio, video, MIDI, lights, or network modules. The interface of a workspace (name of a Usine project) is made up of racks (corresponding to mixing console channel strips) including inputs and outputs, in which are arranged patches containing modules (that may be scripted).
The programming of the game depends on the chapter(s) to be played by the team. When only chapter 2 is played, the sequence of actions is slightly different, as summary audio files re-explain the highlights of the plot to help players get back into the story.
The first chapter has been available since August 2021; the second one has been available since April 2022. The final chapter will be available in 2023.
The detailed outline of the story is provided in the appendix.
A BomeBox is a device allowing connection of MIDI devices on a local network.
To match Nobody’s world, each device has been embellished. The players do not manipulate a simple iPad but an interactive element that is part of the game. A picture of the ghettoblaster is provided in the appendix.
Arturia offers a collection of plugins that emulate about twenty famous hardware synthesizers (such as Moog Modular, Buchla Easel, Yamaha DX7, or Roland Juno-6), while u-he offers high-fidelity emulations of the Minimoog and both Sequential Circuits Pro-One and Prophet-5. Compared to the originals, all the emulated versions are enhanced thanks to modern computer capabilities (more voices of polyphony, routing flexibility, additional effects and modulations).
All my field recordings are classified and tagged according to the Universal Category System (UCS), as are the third-party soundbanks I own. When paired with a powerful audio explorer, UCS becomes very useful for searching for sounds. For that part, I use AI-based software called Sononym, which allows you to find sounds similar to a given sound. For more information about UCS, see https://universalcategorysystem.com.
In its simplest form, an envelope is an audio synthesis module that modulates the volume over time. An LFO is an oscillator that runs at low frequency. Commonly not audible, it is used for parameter modulation.
Additional photos of the rooms are available in the appendix.
The seven main genres of music games Austin notes are: rhythm games; sampling/sequencing and sandbox games; karaoke music games; mnemonic music games and musical puzzle games; musician video games; music industry games; and edutainment music games and musical gamification. The author also analyzes the different controllers involved in music games (peripheral controllers, motion controls, Wii nunchuks, smartphone and portable listening device touchscreens). Finally, he proposes to classify music games into two types based on the player’s musical engagement through the game. They are procedural (interacting with musical materials and procedures) and/or conceptual (explicitly themed around music-making contexts). For more information, see Michael L. Austin, “Music Games,” in The Cambridge Companion to Video Game Music, ed. Melanie Fritsch and Tim Summers (Cambridge: Cambridge University Press, 2021), 140–58.
A list of the roles of each of the characters is provided in the appendix.
Drone music is characterized by minimalist tracks that use very long sounds (pitched or textural). The musical movement comes from subtle sonic changes rather than harmonic ones.
The tonic D and the supertonic E are mobile scale degrees. After several repeated A notes, the fifth diminished chord (Am7b5, including an E-flat note) appears unexpectedly, as does the diminished fourth (G diminished, including a D-flat note).
“Vive sentique” is the Latin translation of vis et ressens (live and feel).
For technical insight into how the Usine patch was programmed, see Figure 21 available in the appendix.
Paul considers six transition types: fade out and fade in, butt edit (no fade), crossfade, transition segment, transition stinger, and layered transition. The names speak for themselves, except for the transition stinger, which means that a short stinger is played as soon as the next segment starts. For more information see Leonard J. Paul, “Droppin’ Science: Video Game Audio Breakdown,” in Music and Game: Perspectives on a Popular Alliance, ed. Peter Moormann (Wiesbaden, Germany: Springer VS, 2013), 63–80.
The initials stand for “Maître du Jeu” (Game Master), but they could very well be those of a well-known personality in the world of pop music…
In addition to being a track that was famous enough (but not too much!) and that could be easily deconstructed into stems, the title also has a rhetorical function regarding the plot.
Paul, “Droppin’ Science,” 67.
For technical insight into how the Usine patch was programmed, see Figure 22 available in the appendix.
The placement of the microphone in this location caused a feedback effect. This unwanted frequency was easily cut off using a notch filter. A simple spectrogram from a smartphone application was sufficient to identify the frequency.
That value corresponds to the closest multiple of four that would make a recording of around thirty seconds, which was judged to be an optimal duration for beatboxing.
Programming the recording, storage, and playback of the performance was complex. It was necessary to consider the number of measures, the location of storage with the date, then to read the file again, and then erase the temporary memory so that the performance of the next team could be recorded in its turn. For each of these parts, special modules are available in Usine (see Figure 23 available in the appendix).
One of the players is also invited to wear his clothes and one of his famous accessories. That way, this mini game acts like a musician video game (Austin’s typology), where the player is identified as the musician himself.
For technical insight into how the Usine patch was programmed, see Figure 24 available in the appendix.
Dubstep is a recent genre of EDM that features heavy bass lines, syncopated rhythms borrowed from two-step, a 4/4 signature, and a tempo of around 70 BPM. For a short history and a musical analysis of dubstep, see Rick Snoman, Dance Music Manual: Tools, Toys and Techniques (Abingdon, England: Routledge, 2019), 377–86.
A four-on-the-floor metric describes a measure that is filled with steady bass drum quarter notes. According to Mark J. Butler, the term comes from rock and refers to the drummer, who must depress the foot pedal to play the bass drum. For more information, see the chapter “Conceptualizing Rhythm and Meter in Electronic Dance Music” in Mark J. Butler, Unlocking the Groove: Rhythm, Meter, and Musical Design in Electronic Dance Music (Bloomington: Indiana University Press, 2006), 76–120.
Karen Collins, “An Introduction to the Participatory and Non-Linear Aspects of Video Games Audio,” in Essays on Sound and Vision, ed. Stan Hawkins and John Richardson (Helsinki: Helsinki University Press, 2007), 263–98.
K. J. Donnelly, “The Triple Lock of Synchronization,” in The Cambridge Companion, 94–109.
A picture of the taxiphone is provided in the appendix.
O.Z.O.R.A., from the Hungarian name of the city where it is located, is a psytrance festival that gathers around 60,000 people over five days each year. It is considered one of the most important psychedelic festivals in the world.
A picture of the statue is provided in the appendix.
Synthwave is an electronic music genre that emerged in the early 2010s. Largely influenced by the 1980s, it is characterized by an extensive use of synthesizers of the time, such as Roland Juno-6 or Yamaha DX7. Kavinsky is one of the most famous artists of the genre. At this stage of the game, the synthwave sounds were used to match the 1980s movie posters (such as Back to the Future, directed by Robert Zemeckis, 1985) stuck on both sides of the room.
Big room house is a subgenre of house music influenced by Goa trance and developed in the 2010s. At a tempo of around 130 BPM, it features heavy bass and drum sounds with lead synths full of harmonics. The tracks are made to be played on very large sound systems during EDM festivals. Martin Garrix is one of the most popular big room house artists.
A sidechain compressor is an audio processor that compresses the volume of a signal according to another incoming signal. Although it is a technique used by all audio engineers for creating homogeneity between different instruments, the most obvious example found in EDM is when the kick audibly pumps the bass.
William E. Caplin, Analyzing Classical Form: An Approach for the Classroom (New York: Oxford University Press, 2013), 33–72.
Here is a list of big room house tracks in which EDM anthems can be found: “Animals” by Martin Garrix (2013), “Tsunami” by DVBBS and Borgeous (2013), “Booyah” by Showtek (2013), “Bad” by David Guetta (2014), “Secrets” by Tiësto and KSHMR (2015), “Arcade” by Dimitri Vegas and Like Mike (2016), “Shangai” by Carta (2016), and “Byte” by Martin Garrix & Brooks (2017).
IDM stands for intelligent dance music. This term encompasses genres in which strict repetition is almost always avoided. Instead, tracks feature metric changes, exotic scales and temperaments, constantly renewed drum patterns, and a lot of parameter modulation. The most famous artists are probably Aphex Twin and Amon Tobin. See also Venetian Snares, Autechre, Fine Cut Bodies, and Igorrr.
A breakbeat rhythm “de-emphasize[s] strong beats placing considerable stress on metrically weak locations.” Butler, Unlocking the Groove, 78–79.
Guy Michelmore, “Building Relationships: The Process of Creating Game Music,” in The Cambridge Companion, 68.
For more information, see Olivier Derivière (interviewed by Valentin Ducloux), “La musique interactive,” September 16, 2019, https://www.youtube.com/watch?v=pTEdr7aTyOY.
For more information, see Caplin, Analyzing Classical Form, 73–98.
A supersaw timbre is obtained by stacking multiple detuned sawtooth oscillators using manifold unison voices. Popularized by Roland with their JP-8000 synthesizer (1997), this technique creates a sound full of harmonics.
The voice of Jiddu Krishnamurti asks “Qui êtes vous?” (who are you?). Someone hesitates, and then we hear the sage’s answer: “Rien!” (nothing!).
Before the 8-bit version of “1983” was composed, the arcade machine played the track entitled “2D game” (Audio 8b). During the design of this puzzle, the music of the choreography was “1983” (Audio 5d). Thanks to piezoelectric sensors, the steps on the floor of Room E triggered different layers of the music, and the musical structure was built on the fly and stored in a variable. This variable was then applied to the 8-bit version (Audio 8a) played on the arcade machine. Unfortunately, the idea of the dynamic floor was abandoned for technical and budgetary reasons, as was “1983.”
For more details on the composition and development of chiptune music, see James Newman, “Before Red Book: Early Video Game Music and Technology,” in The Cambridge Companion, 12–32.
The NES chip provides four synthesis channels (two pulses, one triangle, one noise). For more information on chips used in early video game consoles, see Newman, “Before Red Book,” 20.
The games are the following: Duck Hunt (Nintendo, 1984), RoboCop (Data East, 1988, NES version 1989), Super Mario Bros. (Nintendo, 1985), Teenage Mutant Hero Turtles (Konami, 1989), and Top Gun (Konami, 1987).
Steve Swink, Game Feel: A Game Designer’s Guide to Virtual Sensation (Burlington, MA: Elsevier, 2009), 6.
In Nobody the players automatically receive by e-mail a photo, their beatboxing recording, a short video of their choreography, and their score, which is calculated according to the time taken to solve the puzzles and the discovery of Easter eggs.
Escape rooms are also known as “escape games” and are often referred to as such in Europe.
Swink denotes five common experiences of game sensation: the aesthetic sensation of control, the pleasure of learning, practicing, and mastering a skill, extension of the senses, extension of identity, and interaction with a unique physical reality within the game. Swink, Game Feel, 10.
For more information, see Paul, “Droppin’ Science,” 64–66.
Isabella van Elferen, “Analysing Game Musical Immersion: The ALI Model,” in Ludomusicology: Approaches to Video Game Music, ed. Michiel Kamp, Tim Summers, and Mark Sweeney (Sheffield, England: Equinox, 2016), 32–52.
Media literacy is defined by Martina Roepke as “habituated practices of media engagement shaped by cultural practices and discourses.” Martina Roepke, “Changing Literacies: A Research Platform at Utrecht University,” Cultures and Identities, Working Paper no. 1 (2011), accessed August 13, 2022, https://view.officeapps.live.com/op/view.aspx?src=https%3A%2F%2Fmmroepke.files.wordpress.com%2F2010%2F03%2Fcl-working-paper-1-voor-boekje.docx&wdOrigin=BROWSELINK.
van Elferen, “Analysing Game Musical Immersion,” 36.
This science fiction TV series is set in the 1980s. The composers Michael Stein and Kyle Dixon made extensive use of vintage synthesizers (such as Sequential Circuits Prophet-5, ARP 2600, Roland SH-2) to recall the sound of artists of the time such as Jean-Michel Jarre, Vangelis, and Giorgio Moroder.
The soundtrack of this science fiction movie of the 1980s was composed by Vangelis, who used synthesizers of his time, such as Yamaha CS-80 and Roland VP-330.
Outline of the Story
– Act 1: Awakening (rooms A and B)
Scene 1: Entering the Dark Hallway
Scene 2: At the Core of Daniel’s Memory
Scene 3: Pop Music in the Metro
– Act 2: Rising (room C)
Scene 4: On the Rails of Truth
Scene 5: The Machine Room
Scene 6: The Voyage
– Act 3: Falling (room E)
Scene 7: In the Footsteps of Eric
Scene 8: At the Doors of Transylvania
Scene 9: Escaping the Horror
Role of the Characters
iPad K7: host, who welcomes the players and introduces and closes the scenes/chapters
Daniel: main character, who must be freed
Eric: historical figure, best friend of Daniel, with whom he had a fight
MJ (Maître du Jeu): historical figure, gamemaster who briefs the players
Leo: historical figure, artist, and creator of the time machine
The TV man: mysterious man, who has a primary function…and who may be close to you…
The Doctor: guide for finding the TV man
Hoax (removed): historical figure, old and wise man, friend of Eric
– MJ: Lucy
– iPad K7: Alice B.
– Leo: Stefano
– Eric: Yannick M.
– The Doctor (and two other famous characters): Sam
– Hoax: Léonard L.
– Daniel: Bruce B.
Work: Moussa, Emmanuel C.
Graffiti: Cédric (Kosh)
TV man statue and headset: Cédric P.
Scenario: Bruce B., Loïc A., Alexis
Digital design and/or development:
– iPad K7 app: Julien A.
– Ghettoblaster and iPad Tél apps: Loïc A.
– Microcontroller and original DMX lights: Stéphane P.
– Computer music design: Cyril D.
– LED matrix and taxiphone: Jacques-Alexandre
– Doors and network system: Stéphane P., Jacques-Alexandre
Projection mapping: Loïc A., Philippe M.
Choreography animation: Virginie Roges
Composition and sound design: Cyril D.