Mobilized by the new media formats of the past several decades, publications in music studies have steadily pushed toward the integration of printed text and audible sound. However, this push has not yet delivered the kinds of integrated analytical frameworks (printed text and sound speaking directly to one another) that it seems to render possible, especially in scholarship.1 Of course, there have been many enriching formats (blogs, podcasts, embedded sound clips in digital publications) in the twenty-first century that did not exist in the twentieth. But the mingling of sound or music with text has remained mostly out of sync with academic publishing, whose proprietary and logistical constraints require compromise. In this essay, I trace some routes through the contemporary technological thicket that might allow printed text and audio media to work together in intellectually productive ways, and which moreover the academic publishing industry can support. I outline one promising format in detail, although digital publishing in music and sound studies is an open field with many potential applications, so other formats are certainly possible. The challenge will be different for those working in, say, popular music studies compared with those making ethnographic field recordings. But these fields of research are similar enough that they can be discussed simultaneously.
To begin, a bit of history. Scholarly presses have in fact long sought to integrate music and other kinds of sound with print-based publications. In the 1990s, presses bound CDs inside the back covers of popular music and ethnomusicology books, as they had done earlier to a more limited extent with cassettes and LPs. In the early twenty-first century, publishers began to build password-protected websites where audio material relevant to monographs or textbooks could be archived. But neither solution—physical format or digital link—fulfilled the promise of the available media. In the CD era, too much equipment had to be simultaneously at hand for text and sound to truly work in tandem—how many readers carried portable CD players? And as one editor at a university press noted, CDs had a frustrating tendency to get lost, especially library copies. Even those discs that remained in the book would eventually degrade. Compounding the problem, not all printing houses were willing to bind CDs, so if a publisher switched printers, as became increasingly common with the advent of print-on-demand, the publisher often needed to eliminate the musical supplement entirely. Password-protected websites may make audio material more stable, but it is not clear that this improves access for all readers. Printed (or even digital) books and online archives remain awkward to use together. For many readers, sonic supplements to music and sound studies publications have thus been too beset by technical hurdles to be analytically essential to the books themselves. And perhaps because sound has been broadly regarded as secondary to text (equally so in the Derridean sense and in the publishing industry), and because presses lack resources to edit or review audio, books in music and sound studies often come with low-quality, poorly organized supplements.2 Meanwhile audio books, when created at all, are licensed to outside companies or contractors, making them in effect a separate business. 
Text and sound have grown closer over the decades, but in the domains of academic publishing and popular music studies trade books, not quite close enough.
In some sense this is an old problem. Music-critical adjectives are never equal to sound, as Roland Barthes noted. Many audiophiles spend their lives seeking a degree of transparent sound reproduction that would obviate this issue, alas always in vain. In the end it is death and taxes for music scholars: no language is so precise, no inscriptive system so thorough, no audio tech so advanced that it can “capture” music as a sonorous artifact without transforming it both sonically and ontologically.3 But if one dispenses with the fantasy of perfect representation and thinks of text and sound as dialogic partners, rather than distorted mirrors of each other, our contemporary time (I write this in early 2019) may be felicitous. Today podcasts and audio books are two formats that could be examined and emulated more widely by scholars and presses. For in these, language itself is presented as sonorous, just like the stuff it describes. The largely artificial distinction between analytical printed text and analyzed sonic material collapses. Audio material, be it music or field recording, is not relegated to being illuminated by the writing, but instead becomes an element of commentary in its own right, speaking directly back to the text. In podcast and audio book formats (and their kin), text and sound are no longer eternally imperfect transductions of each other, but coequal partners.
Within this frame, I next explore some possibilities and implications of publishing scholarship in sound in the early twenty-first century. By this I mean something different from a Feldian “doing ethnography in sound,” however. Feld once asked, “What about an anthropology of sound? What about ethnographies that are tape recordings?”4 Doing ethnography in sound, for Feld, begins with the recognition that sonic poetics are affective, and therefore irreducible to their interpretation. Sound ethnographies might serve in place of textual analysis, communicating affect as well as (if done thoughtfully) the structures of meaning that sound encodes. Simply put, Feldian sound is an alternative to text; it speaks to a discrete plane of knowing and being in the world. But certain limits remain. Recorded sound, like sound in any medium, cannot re-present the affect that it tends to activate without further elaboration. This circumstance is wholly familiar to anyone who has ever played a favorite song for someone else, only to find that person unmoved. To suggest something non-obvious about music or sound requires comment, explicit or otherwise. Music or sound need not be understood as alternatives to text; the two work best when they mingle and collaborate.
What is promising about the audiotechnical present, then, is not merely that field recordings can be made more accessible alongside explanatory text, nor (in a Feldian register) that sound can supplant text, but in fact that text itself can now be easily conveyed as sound. New media can usefully position text and audio material as inseparable. Deborah Kapchan incisively asks, in her introduction to a recent edited volume, “How might we not only write sound but sound theory differently?”5 Lingold, Mueller, and Trettien similarly advocate “giving voice to thought” in their preface to a new volume on digital sound studies.6 Heeding these imperatives—of sounding theory, of voicing thought—ensures that we do not play Haravagian god tricks. We ought not, in other words, confuse text for an objective, immaterial decoder of sound. Even on the page, text is not metaphysical or abstract. In a podcast or similar format, interpretive text is conveyed as a sonorous and aurally affective thing, experienced on the same plane as musical examples or field recordings. We hear and are thus continually reminded that the text, too, is material. In such formats, theory is sounded, thought is voiced, and much for the better.
During the past six months, I have adapted my own first ethnographic monograph, called Bangkok Is Ringing: Sound, Protest, and Constraint, into an audio edition that combines elements of podcast and audio book formats.7 This adaptation strives partly to represent sound and sonic experience, to enable readers to hear what it sounded like in the field. This has value. But more importantly, the combination enmeshes analytical text (that is, the book, in narrated form) with the audible material of fieldwork. It refuses the notion of the text as a bodiless entity, making the text sonorous and also subjecting it to the effects of other sonorities. To the extent that other authors might want to create similar adaptations with the technology now available, I offer this essay as a partial guide for considering ethical, economic, and production questions.
The structure of the audio edition of Bangkok Is Ringing mirrors that of the printed book, with sound threaded throughout. Each chapter is read by a different narrator and, once field recordings are added, is made available as its own downloadable file.8 Using basic audio editing software, the narration track for each chapter is interwoven with relevant field recordings and sounds.9 This might include, for example, a brief snippet of a song after it is mentioned by the narrator, an ambience that plays during the description of a scene of protest, or a few seconds of an instrument following a discussion of that instrument’s timbre. Certain field recordings, especially interviews, substitute directly for narrated quotations, augmenting the text of remarks with the inflection, pacing, and sonority of the speaker. The audio edition is still technically a monograph, but its singular voicing is fragmented. One sentence reads, for example, “[narrator’s voice] Mae, for instance, enjoying no legitimate claim to discursive public space, used the weak, oblique medium of the recorder to shout into that space, saying almost whatever she pleased because she was powerless, and experiencing no fear as a result: [Mae’s voice] Go ahead! I’m not afraid of you fuckers!”10
This format allows for the fluid integration of authorial text and sound recording, even within the space of a single sentence.
Such integration can enliven the text, give voice to more people, and do analytical work. But it also raises ethical questions. Namely, to reproduce someone’s recorded voice is quite different from quoting their words in print—the sound of speech can introduce vulnerability. Especially because Bangkok Is Ringing is an ethnography of sound in political conflict (Thailand’s recent, antigovernment “Red Shirt” movement), the ethics of reproducing speech and other identifiable sound became a concern. Since many of my interlocutors assumed grave political risk, not all spoken quotations could be included safely in the audio edition. Just as the fraught relationship between people and their words must be considered when writing a book or article, with anonymity carefully preserved where necessary, an entirely new round of scrutiny was needed for the audio edition. Vocal recordings add useful detail for scholars, but no less so for cops.
Meanwhile, the narration process subjected the text itself to new questions, in ways that were both useful and challenging. The following difficult issues arose, among others: Who would narrate each chapter? What kind of authority would be assumed through spoken narration, even beyond that of writing? And how would cited passages from external sources within the text be spoken, as other people’s words were not only printed but spoken? I do not have an easy solution to the last problem in particular. I invited all authors cited in the book to record their own quotations, and offered to use these recordings in lieu of the narrator’s voice, if the author wished. Many obliged, and the remote recording process was explained so that it would be free, fast, and easy. But some authors did not respond, did not want to participate, became busy after initially agreeing, or were deceased, to name just a few extenuating circumstances. This already partial solution also butted up against the fact that not every figure in the book was cited approvingly. Should an author be invited to record their own quotation if that author is critiqued by the book’s argument? For every way that the audio edition of the book brought text, voice, and sound into a richer dialogue, so too did it necessitate new ethical considerations around representation. The shift in medium required an attendant shift in habits of etiquette and professionalism.
Despite these challenges, the analytical moves that became possible by bringing sound and narrated text together were significant. Chapter 2 of the monograph describes an older woman, Mae (the same person quoted above), who has a long history of involvement in protest movements in Thailand. The chapter describes how Mae’s self-identification as a political subject resonates with the life story of a legendary pop singer named Pumpuang Duangjan. In the text, I relate an anecdote about Mae singing Pumpuang’s music after dinner one night, as she spoke to a group of friends about the experiences that had motivated her political engagements. In the audio edition, I was not only able to thread Mae’s impromptu performance into the narration, but at one point aligned some of Mae’s singing with Pumpuang’s original recording of the same song, so that they were in effect singing together.
This alignment underscores how Pumpuang served as both political and musical archetype for Mae, a point made at length in the text and then reinforced poetically in the production. There are further layers to this juxtaposition, one of which is that the narrator of the chapter is Dr. Deborah Wong, one of my own mentors, whose ethnography of Thai musical archetypes informed this section theoretically. Dr. Wong is cited directly in the text, but she is also cited tacitly by this layering. In such instances, it became possible to introduce meaningful analytical gestures that complemented or even surpassed the narrated text. W.F. Hsu, in their writing on digital ethnography, discusses the analytical value of using digital tools to reframe information in different modes—for instance, looking at sound recordings as waveforms or through spectral analysis.11 Hsu describes the patterns gleaned from old Taiwanese cassette recordings by observing those recordings through the visual tools of a digital audio workstation. Like Hsu, I discovered that narrating the written text of a monograph aloud, and mingling it with the sound under discussion, reframed the material multimodally, revealing relationships that were not possible through print alone.
There are compromises in narrating text, as well, although creative producers might find ways to recover some of what is lost. For example, the audio edition of Bangkok Is Ringing contains almost no footnotes, except for a few that were essential to understanding the argument or action, and which were therefore moved into the main body for the narrator to read. A podcast-like format, with its singular, contiguous narrative structure, offers no obvious outlet for footnotes, which in a printed book the reader can choose to consult or not, either while they read or later on, and which they can locate, navigate, or ignore easily. All that footnotes contain, by convention, from minor asides to respectful citations of influential people or works, is sacrificed. In the audio edition, I use two distinct bell tones to signal the beginning and ending of quotations. But no equivalent type of signaling would make the inclusion of footnotes less clunky. Format translations will not be a net positive in every respect, at least until a single audio marking and navigation system is broadly adopted.
Still, the benefits of audio editions of sound and music scholarship are clear. And vitally, traditional publishers can support them. Oxford University Press, for example, reverted all audio rights for Bangkok Is Ringing to me, and I agreed to make the narrated chapters freely available. A free audio edition is likely to help advertise a book, and in any case publishers do not regard audio editions as competitors of print editions. Authors can therefore approach audio versions of traditional monographs with a great deal of flexibility.
But should audio editions be separated from presses as a matter of course? After all, my university paid for the production of my audio book, in effect, through a small summer grant. But not all authors have access to institutional support. Moreover, as one editor at an academic press told me in conversation, licensing of copyrighted material has hardly been simplified by digital platforms, and questions of fair use remain unsettled. (In my audio edition, I did not seek permission when using snippets of copyrighted material, such as recorded songs, although I did endeavor to use short segments where possible, as this may improve the case for fair use.) If the production cost and legal obstacles of creating audio editions de facto fall on institutions rather than publishers, then independent and unstably employed scholars, as well as those on the job market who lack institutional affiliation, may be burdened with uncompensated additional labor as well as legal risk.12 The possibilities of audio monographs are great, but some consideration of the labor market is due before diving in too enthusiastically. We must be attentive to certain questions. Should authors be expected to shoulder the burden of recording narration and producing chapters on their own? What benefit would come to them for doing so, and how might they be equitably supported? What benefits might come to presses if scholars of music and sound were to create audio editions routinely, and what might they therefore offer authors in return?
Text and sound have now, perhaps, reached a technological proximity that will enable music scholars to integrate them in ways more satisfying and analytically useful than in the past. Here I have described one specific format, which brings text and sound onto the same medial plane so that scholarship can be “read” in a genuinely intermodal fashion, and in ways that may be sustainable for academic presses. But in considering the path that might lead toward this and other new audible formats, I am reminded of an anecdote about the radio production team the Kitchen Sisters (Nikki Silva and Davia Nelson), which is presented in the book Recording Culture.13 Silva and Nelson once wanted to record the ambiences of a pool hall. In their initial attempt to do so, they set up a mic and recorder and adopted a hands-off approach, as recordists often do. But the results they got were muddled, and did not express what they wanted when listening back. So the duo returned to the pool hall, this time isolating specific sounds they wished to highlight, then piecing it all back together artfully, in a “creative treatment of actuality.”14 The recognition that sound does not speak for itself freed them to produce a more effective narrative piece, guided more strongly by their own interpretive hand. This anecdote may be an object lesson both for the production of audio editions of music scholarship and for how we think about these works in the current publishing and labor market. Text and sound have gotten closer and might soon join with greater technical efficiency. But in order to work well, the process will require stewardship, interpretation, and ethical care.