Expressive timing in hip-hop flow concerns the practice whereby an MC (rapper) inflects their flow rhythms on a minuscule scale not easily representable with standard musical notation—how far “ahead” or “behind” the beat they rap. Mitchell Ohriner (2019) positions expressive timing as an integral part of hip-hop flow and discusses it in detail. This paper complements his work by surveying flow timing across the broader hip-hop genre.
Three broad practices of expressive timing in flow are identified. Swung timing subdivides the tactus unequally, similar to a common jazz drum timekeeping pattern. Lagging timing refers to the patterned delay of flow rhythm in relation to the underlying instrumental or sampled beat. And conversational timing pertains to flow performances that resemble rhythmic patterns idiomatic of spoken language. Theoretical and notational concepts developed by Fernando Benadon (2006, 2009) and Ohriner (2019) are used to illustrate the extent to which a flow performance involves these approaches to expressive timing, and analytical methods are proposed that highlight the functional and rhetorical appeal of each approach. Expressive timing is investigated in light of Signifyin(g) in African American music (Samuel Floyd Jr., 2002), groove-based expressive microtiming (Vijay Iyer, 2002), Afrocentric models of rhetoric (Ronald Jackson, 1995), and narrativity.
In his verse on the 2006 song “Long Time” by The Roots, Philadelphia musician Peedi Peedi (Pedro Luis Zayas) raps the following: “synthesizers tweet to improvise your feet / I calculated every lyric to arrive on beat” (1:49). Peedi Peedi appears to be telling the truth. Many of his lyrics do sound as if they arrive on beat—they are synchronized with the beat layer, the music that supports his rapping. But if we were to slow the track down and scrutinize Peedi Peedi’s timing, we could identify asynchronies between his syllables and the beat layer. These asynchronies are so minuscule that they are difficult to hear at performance tempo, and standard musical notation would fail to encapsulate their presence. But these asynchronies are there, if we look closely enough.
This paper focuses on performances of flow—the rhythmicized delivery of rap lyrics—where such asynchronies are audible at performance tempo.1 Expressive timing concerns the extent to which an MC (or rapper) inflects their flow rhythms on a minuscule scale—how far ahead or behind the beat they rap.2 Expressive timing is commonly used in hip-hop flow, but its prevalence, functionality, and rhetorical potential have only recently been scrutinized in detail. I build on Mitchell Ohriner’s work by proposing a tripartite categorization system that can help identify and analyze instances of expressive timing in flow.3 My system serves as an entry point for more detailed discussions of how flow rhythm (including expressive timing) interacts with the beat layer, how this interaction is perceived, and how it is harnessed for its narrative, functional, and rhetorical potential. After briefly summarizing the role of expressive timing in African American music in general and hip-hop music in particular, I unpack my categorization system. Swung timing subdivides the pulse unequally, similar to the common jazz drum timekeeping pattern. Lagging timing concerns the patterned delay of flow rhythm in relation to the beat layer. Finally, conversational timing pertains to flow performances that invoke rhythmic and stress patterns one might hear in spoken language. Using this categorization scheme to inform close readings of expressively timed performances, I aim to further our understanding of how African American vernacular musical traditions have shaped the texture, rhythm, and rhetoric of hip-hop music.
The utility of this research, beyond its grounding in music theory, is visible through several lenses. The first concerns the role of hip-hop music in the broader history of African American culture, namely its ongoing assimilation into mainstream society. As a musical genre, hip hop was born from a practice of live performance in 1970s New York City. DJs and hype persons would invigorate the crowd between cuts as a means to maintain a consistent energy level on the dance floor. This invigoration comprised spontaneous rhythmic vocalizing, which eventually evolved into what we today know as rapping. The spontaneity of hip-hop vocals has persevered through freestyling, battle rapping, and hip-hop cyphers: all sites influenced by centuries-old African American traditions of toasting, signifyin(g), and “doing the dozens.” But as hip-hop music has occupied an increasing share of the popular music mainstream, it has gradually drifted away from its cultural origins in a number of ways. Among these might be the diminishing importance of expressive timing in rap vocals. Though I do not possess corpus-driven statistics to support this, I suspect that an evisceration of expressive timing in recent hip-hop flow is not only arguable but also explicable through an aesthetic shift toward a more mechanized conception of the performer’s voice, in the wake of the technological advances in quantization and pitch correction that have been in widespread use since approximately 2000.4
Such shifts have occurred across popular music, as vocals become increasingly processed and manipulated after they are recorded. Expressive timing in flow, above all else, reflects something that is distinctly and innately “human” about hip-hop music. In the realm of hip-hop beat production, where expressive timing also exists in abundance, the intermediary of technology (be it the turntable, sampler, or computer) removes the human agent from the sound by one degree. In the realm of rapping, by contrast, the “human” is precisely what we hear in expressive timing. If Sylvia Wynter describes posthumanism as “completely outside our present conception of what it is to be human,” then expressive timing encapsulates aspects of what it means to be inside a conception of being human: nuance, non-isochrony, variability, and above all a resistance to systemization and metered measurement.5 This human/nonhuman balance is precisely what Chuck D is referring to when he tells J.D. Considine in an interview that, when he raps, “[I] just try to throw some style and flavor on [the track] without sounding so robotic.”6
A final underlying theme in this research pertains to the enormous rhetorical and expressive potential of lyrical utterances (and, in effect, performance decisions) that are aurally minute in scope. The slightest lag on a syllable, or a brief recourse to a rhythm that sounds conversational, can dramatically alter the rhetorical potential of lyrics. And while the methodology for identifying these expressive timings is grueling, the timings themselves are often quite easy to hear. Without the tedious work of measuring specific expressively timed passages, we cannot fully ascertain how they function, how they align with or diverge from other performances, or how an MC’s rhythmic delivery can be subtly used to amplify the meaning and resonance of their lyrics. In this sense, expressive timing serves both a musical and a sociological role in the hip-hop genre. While pitch and timbre are also fundamental aspects of the rapped voice, the fine-grained temporal aspects of flow (down to the number of milliseconds spent lingering on a consonant, or the barely perceptible lag on an important lyric) play an outsized role in how rap lyrics are imbued with character beyond their inherent meaning.
Expressive Timing in African American Music
Expressive timing figures prominently in nearly every genre rooted in the musical tradition of the Black Atlantic, Jeff Pressing’s term for the collective West African and African American communities.7 It plays a central role in West African musical cultures, and Olly Wilson has highlighted the relationship between these traditions and African American musical practice.8 Vijay Iyer proposes the idea of a “pan-African musical aesthetic,” which I interpret as involving expressive timing.9 This aesthetic foregrounds a connection between music and bodily movement and echoes Samuel Floyd Jr.’s discussion of swing as predicated on its symbolism as a “dance-related legacy of the ring shout.”10 The association of expressive timing with embodiment also manifests in Anne Danielsen’s concept of a song’s hidden impetus: “those features located in the song’s micro-rhythms that are crucial for inducing our movement to it.”11 And Joseph Schloss discusses the relationship between expressive timing and dance from the more practical perspective of hip-hop beat production. In his words, “The beat must neither be too mechanical nor too ‘sloppy.’”12
Expressive timing in hip-hop flow has only recently received detailed attention. Ohriner’s research has culminated in the monograph Flow: The Rhythmic Voice in Rap Music, which presents a rigorous methodology to encode and analyze expressive timing.13 Ohriner’s types of expressive timing are called phase shift, swing, tempo shift, and deceleration. While swing and deceleration are self-explanatory, phase shift refers to flow that falls consistently ahead of or behind the beat: a constant lag or anticipation. By contrast, tempo shift involves flow that proceeds in its own tempo, meaning it drifts out of sync with the beat. Ohriner’s analysis of expressive timing in Talib Kweli’s 2003 single “Get By” demonstrates the precision made possible through a software-aided timing analysis.
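To make the contrast between phase shift and tempo shift concrete, the following Python sketch fits a least-squares line to a series of syllable asynchronies: a flat line displaced from zero suggests a constant offset, while a sloped line suggests drift. This is a hypothetical heuristic for illustration only, not Ohriner’s method, and the thresholds `slope_tol` and `offset_tol` are arbitrary placeholders.

```python
from statistics import mean


def classify_offsets(times_s, offsets_ms, slope_tol=5.0, offset_tol=20.0):
    """Hypothetical heuristic for telling a constant offset from a drift.

    times_s:    syllable onset times in seconds (at least two distinct values)
    offsets_ms: signed asynchrony of each syllable from its beat-grid
                position, in ms (positive = behind the beat)
    slope_tol:  drift (ms per second) below which the offset counts as constant
    offset_tol: mean offset (ms) below which the flow counts as in sync
    """
    t_bar = mean(times_s)
    o_bar = mean(offsets_ms)
    # Ordinary least-squares slope of offset against time.
    sxx = sum((t - t_bar) ** 2 for t in times_s)
    sxy = sum((t - t_bar) * (o - o_bar) for t, o in zip(times_s, offsets_ms))
    slope = sxy / sxx
    if abs(slope) >= slope_tol:
        return "drifting (tempo-shift-like)"
    if abs(o_bar) >= offset_tol:
        return "constant offset (phase-shift-like)"
    return "in sync"


# Hypothetical data, not measurements of any recording.
print(classify_offsets([0, 1, 2, 3], [60, 62, 58, 61]))  # constant offset (phase-shift-like)
print(classify_offsets([0, 1, 2, 3], [0, 20, 40, 60]))   # drifting (tempo-shift-like)
```

A decelerating flow would also produce a rising slope, so a test this crude cannot separate deceleration from tempo shift; listening, and the phrase position of the lag, would still have to decide.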
Ohriner’s research on expressive flow timing implicitly raises two issues that I take as points of departure here. The first issue involves the temporal structure against which expressive flow timing is measured. If the beat layer is periodic and exhibits isochronous rhythmic patterns, then expressive flow timing can be identified and measured against that quantized structure.14 But if the beat is not perceived as isochronous (if it exhibits unequal or inconsistent attacks or durations), then evaluating expressive flow timing becomes more nuanced.15 Regardless of the beat’s isochrony, expressive flow timing is perceived as a function of the beat’s timing. Danielsen offers an abstraction of this relationship by using the terms “actual sounding events” and “virtual reference structures,” the latter of which she describes as “the non-sounding schemes that structure sounding events.”16 It thus becomes evident that in order for expressive flow timing to be perceived, rapped syllables must be reconciled against either actual sounding events or virtual reference structures, the entrained sense of meter that listeners develop and project forward during the listening process.
But how are virtual reference structures created when the beat is non-isochronous? Perception of isochrony in beats will differ from listener to listener: since I can only speak to my own entrainment of meter, I acknowledge that my perception of expressive timing will differ from that of other listeners. Danielsen has also derived a method of dealing with discrepancies of this nature, which she calls beat-bin meter.17 This term is predicated on her theory that, rather than perceiving non-isochronous groove-based events as an aberration of an isochronous temporal framework, listeners attend to these variations in timing as equally valid representations of the beat or pulse. Grooves thus do not exhibit discrete beats, but comprise beat-bins: small temporal windows wherein any sound event participates in expressing a particular beat in the metric structure. Applying Danielsen’s theories of beat-bin meter and sounding events/reference structures points to the conclusion that in order to fully understand the nature of expressive flow timing, the beat layer must also be carefully considered: both in its level of quantization and its effects on listener entrainment of meter.
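A beat-bin can be approximated, very roughly, as a window around each beat attack: an onset inside the window is heard as participating in that beat, while an onset outside it is heard as a distinct event. The Python sketch below assumes fixed window edges (`early_ms`, `late_ms`), which are hypothetical placeholders; Danielsen’s bins are not fixed-width, and their extent depends on the sounds and the listener.

```python
def bin_report(onsets_ms, beats_ms, early_ms=-20.0, late_ms=60.0):
    """Assign each onset to its nearest beat and test it against a beat bin.

    early_ms / late_ms are hypothetical bin edges relative to the beat
    attack (negative = ahead of the beat). They are placeholders, not
    measurements.
    """
    report = []
    for onset in onsets_ms:
        nearest = min(beats_ms, key=lambda b: abs(onset - b))
        offset = onset - nearest
        report.append((onset, nearest, offset, early_ms <= offset <= late_ms))
    return report


# Hypothetical snare attacks at 1000 and 1500 ms; two lagged syllables.
for row in bin_report([1040, 1580], [1000, 1500]):
    print(row)
# (1040, 1000, 40, True)   heard as part of the beat
# (1580, 1500, 80, False)  heard as a distinct, lagged event
```

Widening `late_ms` to 90 would absorb the second syllable into the bin, which is one way of modeling how two listeners (or one listener focusing differently) might hear the same lag either as part of the backbeat or as separate from it.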
The second issue involves perceptual thresholds of expressive timing: how much of the expressive timing capturable with computer-aided methods is actually perceivable by ear? And at what point do listeners interpret two asynchronous sound events as discrete?18 Consider the first verse of the 2018 song “Come Back Baby” by Pusha T (Terrence Thornton). While Pusha T’s flow audibly lags against the backbeat pattern of the beat layer (beginning at 0:14), depending on how I focus my listening, I can rationalize these lags as residing within a “beat-bin” characterization of the backbeat, or I can hear the lagged syllables as distinct from the backbeat. It follows then that even when expressive timing is eminently audible, we must acknowledge the possibility that it may not be perceived as such by some listeners.
In the context of expressive timing in jazz performances, Fernando Benadon discusses the issue of perceptual thresholds, addressing the danger of placing stock in expressive timings that are so small they are not perceivable.19 He settles on a rule of thumb for his own work that I adopt here: “measurement magnitudes are significant only when they serve to support and articulate a ‘by ear’ explanation of the passage…rest[ing] on the understanding that an analysis which purports to explain expressive effects is fundamentally suspect if its measurements do not reflect appreciable sonic manifestations.”20 Benadon concludes that timing discrepancies below 50 milliseconds (ms) are imperceptible for his purposes. Rather than applying a rigorous cutoff point in my own analytical work, I focus my discussions on timing discrepancies that a) I can identify aurally at performance tempo and b) usually measure greater than 50 ms.
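The 50 ms figure can serve as a screening step before listening. The Python sketch below flags syllable asynchronies at or above a threshold; the event labels and onset times are hypothetical, and, in keeping with the by-ear criterion above, the flagged list is only a guide to which passages to audition, not a substitute for hearing them.

```python
PERCEPTUAL_THRESHOLD_MS = 50.0  # Benadon's working figure, cited above


def audible_asynchronies(events, threshold_ms=PERCEPTUAL_THRESHOLD_MS):
    """Return (label, asynchrony_ms) for events at or above the threshold.

    events: iterable of (label, syllable_onset_s, reference_onset_s);
    a positive asynchrony means the syllable lags the reference.
    """
    flagged = []
    for label, syllable_s, reference_s in events:
        asynchrony_ms = (syllable_s - reference_s) * 1000.0
        if abs(asynchrony_ms) >= threshold_ms:
            flagged.append((label, round(asynchrony_ms, 1)))
    return flagged


# Hypothetical onsets in seconds.
events = [("a", 1.030, 1.000), ("b", 1.580, 1.500), ("c", 2.440, 2.500)]
print(audible_asynchronies(events))  # [('b', 80.0), ('c', -60.0)]
```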
Categorization of Expressive Flow Timing
My types of expressive flow timing closely parallel Ohriner’s and are primarily meant as an extension or modification of his system of categorization. His system is driven by his own measurements of expressive timing in “Get By” and other songs in a small corpus he developed. Mine is similarly rooted in my own measurements of expressive timing, but I offer no accompanying corpus, instead grounding my analyses in the musical relationship between flow and beat layers, the types of expressivity, rhetoric, and narrative afforded through timing, and the musical/linguistic relationship in flow.
My first category, swung timing, parallels Ohriner’s swing category, and I use Benadon’s concept of beat-upbeat ratios (BURs) to model swing in a flow performance. My second category, lagging timing, amalgamates Ohriner’s phase shift and deceleration categories. Ohriner’s phase shift category accounts for flow that consistently falls ahead of or behind the beat. Since, in my experience, examples of flow syllables arriving consistently ahead of the beat are rare, I do not account for them here. My lagging category also encompasses Ohriner’s deceleration because I interpret constant and decelerating lagging as capable of producing the same rhetorical and musical functions. Ohriner’s tempo shift category has no clear analogue in my system. My final category, conversational timing, can account for some instances of tempo shift, because certain flow performances in conversational timing sound as if they follow their own temporal/metric structure independent of the beat layer.
With his four types of expressive timing, Ohriner dissects Talib Kweli’s flow and compares it with several other performances by that MC, as well as a small, genre-wide corpus of flow performances by other artists. He concludes that Kweli’s performance in “Get By” is indeed notable in the quantity, diversity, and scale of expressive timing used. While not venturing into a similar statistical discussion here, I have noticed that instances of expressive timing are much easier to locate in 1980s and 1990s hip-hop music than in more recent years.21 The lineage from Ohriner’s work to my own embodies a fruitful relationship between corpus-driven analysis and subsequent closer readings of songs based on that analysis. Put another way, Ohriner grounds himself firmly in the “what” of expressive timing and begins to interrogate the “why.” I continue this discussion of the “why” with other repertoire, refining Ohriner’s categorization system in the process so that it better accommodates discussions related to performance rhetoric, musical and narrative functionality, and perceptual issues.22
Identifying the attack point of vocalized syllables is, as with any sound, a complicated and variable process that has been studied at length by Danielsen and Justin London, and also discussed by Ohriner.23 Attack-point identification involves locating the perceptual center (p-center) of the sound (in this case a syllable), that is, “the specific moment when it is perceived to occur.”24 My own identification of perceptual centers follows Ohriner’s method, which he describes as a “practical heuristic.”25 I begin by following Ohriner’s preference for equating p-center with vowel onset. In order to precisely determine where vowel onsets occur, I performed the following steps. I first used Spleeter, a source-separation library that can extract vocals from an audio track.26 I then used the University of Wisconsin’s online forced aligner tool to generate a Praat TextGrid file that aligned the sound file with the typed lyrics.27 After making manual corrections to this file (the automated process is almost never completely accurate), I was able to map specific phonemes, below the level of the syllable, to specific moments in the audio. This meant I could locate vowel onsets. In the rare case where a vowel onset was ambiguous, or was blended with a voiced consonant, I used Praat’s pitch analysis of the sound file to determine the precise point at which the voiced consonant began. This method produced temporal data on the order of milliseconds for all syllables and phonemes. In order to map the syllable timings against the beat layer, I used Sonic Visualiser to identify attack-point locations in the beat layer audio, and then combined my timing data for the flow and beat.
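The final combining step can be sketched in Python. The sketch assumes that the corrected phone tier has already been read into a list of `(label, start_s, end_s)` tuples, that its labels use ARPAbet symbols with stress digits (an assumption about the aligner’s output, worth checking against the actual file), and that the beat-layer attacks were exported from Sonic Visualiser as a plain list of times in seconds. It does not model source separation, TextGrid parsing, or the pitch-based fallback for ambiguous onsets.

```python
# Assumption: phone labels use ARPAbet with stress digits (e.g. "IY1").
ARPABET_VOWELS = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
    "EY", "IH", "IY", "OW", "OY", "UH", "UW",
}


def vowel_onsets(phones):
    """Return the start time of every vowel in an aligned phone tier.

    phones: list of (label, start_s, end_s) tuples, e.g. read from a
    manually corrected TextGrid phone tier. Vowel onset stands in for
    the p-center, following the heuristic described above.
    """
    return [start for label, start, _end in phones
            if label.upper().rstrip("012") in ARPABET_VOWELS]


def offsets_to_beat(onsets_s, beat_attacks_s):
    """Signed offset (ms) from each vowel onset to the nearest beat attack."""
    offsets = []
    for onset in onsets_s:
        nearest = min(beat_attacks_s, key=lambda b: abs(onset - b))
        offsets.append(round((onset - nearest) * 1000.0, 1))
    return offsets


# Hypothetical phone tier for "Peedi" and two hypothetical beat attacks.
phones = [("P", 0.98, 1.02), ("IY1", 1.02, 1.20), ("D", 1.20, 1.25), ("IY0", 1.25, 1.40)]
onsets = vowel_onsets(phones)
print(onsets)                                # [1.02, 1.25]
print(offsets_to_beat(onsets, [1.0, 1.25]))  # [20.0, 0.0]
```

Pairing each onset with its nearest attack is a simplification: where the beat layer is sparse, or a syllable lags by nearly half a beat, the intended reference point has to be chosen by ear.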
I used a consistent methodology to transcribe the examples in this paper. While the importance of this consistency is self-evident, the transcriptions represent my hearings only. I have chosen performances where the expressive timing is quite audible, where I could reasonably surmise that others hear it as I do. There is no way around this quandary: Peter Winkler reminds us that “transcription must be recognized as an intensely subjective act” and that the “act of transcription is inevitably suffused with the subjectivity of the transcriber.”28 As such, interpretive latitude must always be considered. As long as we remember that transcriptions serve as interpretive representations of recorded sound (an aid for better understanding what we hear in that recording), interpretive latitude does not cause irreparable damage to our understanding of the music. Despite the inherent subjectivity of transcribing and analyzing expressive timing, the undertaking serves as a point of departure for an abundance of musicological inquiries.
Transcription is an essential part of analyzing flow, since hip hop is largely a non-notated, oral and recorded musical tradition. Two broad practices of notating flow have emerged in recent scholarship: graph notation and staff notation.29 Graph notation schemes first appear in Adam Krims’s monograph Rap Music and the Poetics of Identity.30 Krims developed a notational scheme that divides a quarter-note tactus into four sixteenth notes, featuring an x on each beat subdivision where a syllable occurs. He notated each subsequent measure below the last, so that the graph is interpreted much like a (Western) musical score or a passage of prose: left-to-right and downward. Adams uses a modified version of Krims’s graph notation, replacing the x-marked syllables with the song lyrics (Krims included the lyrics beside the graph) and modifying the size of each cell in the graph as needed for rhythms that fall outside the sixteenth-note grid imposed by the graph.31
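Krims’s grid can be approximated in a few lines of Python, which also makes its limitation visible. The sketch snaps each syllable onset (measured in beats from the start of the passage) to the nearest sixteenth, marks occupied cells with x, and prints one measure per row. The layout details (dots for empty cells, bars between beats) are assumptions for illustration, not Krims’s typography.

```python
def krims_grid(onset_beats, beats_per_measure=4, subdivisions=4):
    """Render onsets as rows of x/. cells, one row per measure.

    onset_beats: syllable onsets in beats from the start of the passage.
    Each onset is snapped to the nearest sixteenth, so any expressive
    timing finer than that grid is discarded.
    """
    hits = {round(b * subdivisions) for b in onset_beats}
    if not hits:
        return []
    cells_per_measure = beats_per_measure * subdivisions
    rows = []
    for m in range(max(hits) // cells_per_measure + 1):
        beats = []
        for beat in range(beats_per_measure):
            first = m * cells_per_measure + beat * subdivisions
            beats.append("".join("x" if first + s in hits else "."
                                 for s in range(subdivisions)))
        rows.append(" | ".join(beats))
    return rows


for row in krims_grid([0, 0.5, 1, 1.75, 3.25, 4.0]):
    print(row)
# x.x. | x..x | .... | .x..
# x... | .... | .... | ....
```

Because every onset is forced onto the sixteenth grid, a syllable lagging by 60 ms prints exactly like one placed on the beat; this is the loss that Adams’s resized cells, and the staff notation adopted here, attempt to avoid.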
Staff-based notation schemes are also frequently used in hip-hop analysis scholarship.32 Depending on whether the notation seeks to represent pitch, single- or multi-lined staff systems may be used. Staff notation is useful for a music-educated readership (in that it poses little to no barrier to efficient comprehension) but can be alienating and prohibitive for readers who lack the necessary music-notational literacy.33 For publications aimed at a non-music-literate readership, such as How to Rap (Paul Edwards 2009), graph notation is normally preferable. Staff-based notation is also far removed from how flow rhythm might be designed by MCs themselves. Nevertheless, I have chosen to use staff notation here because I believe it is the most capable means of simultaneously expressing musical events in the beat layer, mapping subtle timing events in flow performances, and efficiently coupling lyrics to flow rhythms. With the methodology now described, I turn to the three types of expressive timing, offering examples that illustrate their nature and proposing some functional possibilities for each type.
Swung timing characterizes flow performances where MCs subdivide the basic beat in unequal durational proportions—normally long/short. In jazz orthography, swung rhythms are typically notated as straight (isochronous) rhythms but are interpreted as swung. I follow that convention here, notating swung rhythms as straight and augmenting them with two indicators of swing. The first indicator is known as the beat-upbeat ratio (BUR). The BUR is a concept developed by Benadon that measures the durational proportions of different subdivisions of the same beat.34 For example, two equally spaced eighth notes would have a BUR of 1, since they are the same duration. The triplet subdivision shown in example 1 would have a BUR of 2, since the first duration is twice as long as the second.35 The second indicator, also shown in example 1, uses percentages to measure the length of each beat and upbeat; the triplet rhythm shown here thus illustrates a swing percentage of 66.67%, because the longer subdivision occupies two-thirds of the full beat. This second swing measurement follows how swing is typically measured in the digital audio workstations (DAWs) commonly used in beat production. While BURs and swing percentages measure the same phenomenon, I include them both here to maximize the accessibility of my examples.
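Both swing indicators follow directly from three onset times: the beat, its upbeat, and the next beat. If the beat portion lasts a and the upbeat portion b, then BUR = a / b and the swing percentage is a / (a + b), which equals BUR / (1 + BUR). The Python sketch below computes both; the measured onsets in the last lines are hypothetical, and DAWs may label their swing settings differently from the percentage defined here.

```python
def beat_upbeat_ratio(beat_onset, upbeat_onset, next_beat_onset):
    """BUR: duration of the beat portion divided by the upbeat portion."""
    return (upbeat_onset - beat_onset) / (next_beat_onset - upbeat_onset)


def swing_percentage(bur):
    """Share of the whole beat taken by the beat portion, in percent.

    If the beat portion lasts a and the upbeat portion b, then
    BUR = a / b and swing % = a / (a + b) = BUR / (1 + BUR).
    """
    return 100.0 * bur / (1.0 + bur)


print(beat_upbeat_ratio(0.0, 0.5, 1.0))  # 1.0 (straight eighths)
print(round(swing_percentage(2.0), 2))   # 66.67 (triplet swing)
# Hypothetical measured onsets in seconds:
bur = beat_upbeat_ratio(10.000, 10.180, 10.300)
print(round(bur, 2), round(swing_percentage(bur), 1))  # 1.5 60.0
```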
Matthew Butterfield has asked: why do jazz musicians swing their eighth notes?36 To paraphrase Butterfield: why, then, do hip-hop MCs swing their flow rhythms? To be sure, swing is not as ubiquitously audible in hip-hop music as it is in jazz. But at its most elemental level, swung flow timing draws the listener’s attention to larger subdivisions, as Butterfield discusses for jazz. Furthermore, swung timing is frequently used to accumulate motional energy and defer closure in phrases of flow, two of Butterfield’s reasons for why jazz musicians swing their eighth notes. Another approach to the question of “why” can be found in Floyd’s assertion that “[swing] is an essential element and a most elusive quality of Black music.”37 But how can this essentiality be located? Floyd continues by stating that “swing is a natural and perfectly explicable product or by-product of the tropings of Black music.”38 One such trope is signifyin(g). Floyd defines the notion of the “time-line,” an underlying constant rhythmic presence (sometimes only an implied one) fundamental to African music, and writes that “when sound-events Signify on the time-line, against the flow of its pulse, making the pulse itself lilt freely—swing has been effected.”39 Important for understanding Floyd’s concept of swing, then, is the role of signifyin(g). Geneva Smitherman notes that signifyin(g) is a type of subversive wordplay that “can be used for playful commentary or serious social critique couched as play.”40 In the case of swing, this commentary is effectuated on the notion of steady, isochronous pulse.
Floyd’s discussion of swing is embedded in his greater project of situating Black musical expression (among other elements) as a legacy of the ring shout, a collective activity practiced by African slaves “in which music and dance…fused to become a single distinctive cultural ritual in which the slaves made music and derived their musical styles.”41 This cultural ritual involved a steady pulse, from the shuffling of feet, and frequent interjectional commentary on the pulse through bodily jerks and twitches, calls, hoots, cries, and other physical and vocal utterances.
Nina Eidsheim convincingly argues against essentializing the sound of race through particular qualities or characteristics, cautioning against the “erroneous logic that an essential black body gives rise to an essential black voice.”42 While Eidsheim’s work focuses mainly on racialized timbral aspects of vocal production, her argument is consequential here as well: the temptation to racially essentialize a musical quality risks obfuscating the relational network between performing and listening bodies. In an attempt to reconcile Floyd’s claims of essentiality and Eidsheim’s arguments against it, I interpret swing as the product of a dialogue between steady, pulse-based temporality (Floyd’s time-line) and the signifyin(g) trope—one that effects both subversive and playful qualities. These are both sites of focus (if not essentiality) in Black Atlantic musical tradition, and swing’s presence results from their intersection.
Subversive qualities of signifyin(g) can be identified in swing rhythm’s relationship to stresses in spoken language. In most dialects of spoken English, for example, lexical (within a single word) or prosodic (within a sentence or syntactic unit) stresses are normally uttered louder, higher, and for a comparatively longer duration than syllables that are unstressed.43 Beyond such generalities, however, patterns of lexical and prosodic stresses vary widely from speaker to speaker and from dialect to dialect. But pitch, volume, and duration are parameters we can identify in rapping; indeed, pitch and duration (rhythm) form a major portion of Robert Komaniecki’s work on flow analysis.44 Example 2 excerpts two passages in LL Cool J’s (James Smith) performance on Craig Mack’s 1994 single “Flava in Ya Ear.” Below the notation are BURs and swing percentages to measure the degree of swing, and above are frequencies (in Hz) to measure the highest pitch reached on selected syllables. These data were obtained from a sonogram and pitch analysis performed in Praat. In the first excerpt, the highest pitch occurs on the first syllable of “really,” suggesting an intended prosodic stress. While lexical stresses are inherent in multisyllabic lyrics, an MC may choose to amplify or subvert these stresses. We can see here that many of the lexical stresses (underlined in the example) fall on on-beat sixteenths, and since nearly all the BURs are higher than 1, we know that these on-beat sixteenths are longer than their following offbeat. While many of the BURs as I have calculated them are perhaps so small that, as swung rhythms, they border on imperceptibility, the on-beat lexical stresses amplify the sense of swung timing in this passage.
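Locating the highest pitch on each syllable amounts to taking the maximum voiced f0 within each syllable interval. The Python sketch below assumes the Praat pitch analysis has been exported as a list of `(time_s, f0_hz)` frames, with unvoiced frames marked as 0 or `None`, and that syllable boundaries come from the aligned TextGrid; the frame values shown are hypothetical, not measurements of “Flava in Ya Ear.”

```python
def syllable_peaks(pitch_track, syllables):
    """Highest voiced f0 (Hz) inside each syllable interval.

    pitch_track: list of (time_s, f0_hz) frames; 0 or None marks an
                 unvoiced frame and is skipped.
    syllables:   list of (label, start_s, end_s) tuples, in order.
    Returns a list of (label, peak_hz or None) in syllable order.
    """
    peaks = []
    for label, start, end in syllables:
        voiced = [hz for t, hz in pitch_track if start <= t < end and hz]
        peaks.append((label, max(voiced) if voiced else None))
    return peaks


# Hypothetical frames and intervals for the two syllables of "really".
track = [(0.10, 230.0), (0.15, 226.0), (0.20, 190.0), (0.25, 0.0)]
peaks = syllable_peaks(track, [("real", 0.10, 0.20), ("ly", 0.20, 0.30)])
print(peaks)  # [('real', 230.0), ('ly', 190.0)]
print(max((p for p in peaks if p[1] is not None), key=lambda p: p[1]))  # ('real', 230.0)
```

A list of pairs is used rather than a dictionary so that repeated words in a verse keep separate entries.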
Later in LL Cool J’s verse, however, the swing/stress relationship is subverted. In the second excerpt shown in example 2, after the lyric “piranha,” the lexical stresses in “electrocute” and “barracuda” fall on swung off-beat sixteenths. Furthermore, these are both four-syllable words that have multiple stressed syllables at varying levels: e-LEC-tro-cute and bar-ra-CU-da.45 (The capitalized syllables carry the primary lexical stress; “cute” and “bar” carry secondary stress.) As the transcription demonstrates, these stressed syllables all fall on off-beat sixteenths. In faithfully accenting these lexical stresses through pitch (the lexical stresses receive higher pitches relative to their surroundings), LL Cool J creates a profound sense of syncopation that is enhanced through the swung timing he uses. This swung syncopation subverts the “steady flow of the pulse” quite noticeably, and as such exemplifies the sense of rupture that Ohriner, Tricia Rose, and Jeff Chang have all characterized as a defining feature of flow.46 Ohriner writes that Talib Kweli invokes rupture by “abruptly changing the way his flow relates to the instrumental streams.”47 Here, then, LL Cool J invokes rupture by abruptly changing the way his flow rhythm relates to the lyrics themselves.
But LL Cool J’s flow rhythms also relate to the beat, where the rhythmic subdivisions are swung to an even greater extent. Many hip-hop beats contain swung subdivisions, which is unsurprising given that swung rhythms permeate nearly all genres of twentieth-century Black vernacular music.48 As detailed in example 2, the off-beat sixteenth hits in the kick drum exhibit large beat-upbeat ratios: recall that a BUR of 2 indicates a tripletized rhythm. These high BURs mean that the off-beat kick hits sound much later than they are notated here. The swung beat timing may also affect how LL Cool J’s swung timing is perceived. As example 2 demonstrates, some of the BURs in his flow are very close to 1 (an equally spaced note pair), yet the passage certainly sounds swung, at least to my ear. It is likely that the swung subdivisions in the beat are conditioning my listening habits to favor a perception of swing in the flow.49
Since hip-hop beats are built on musical borrowing, we can look to the genres that hip hop borrows from for more insight into swung timing. For example, a subtle sense of swung timing appears frequently in the production work of Dr. Dre (Andre Young). The Dre-produced “Gz and Hustlas,” a track from Snoop Dogg’s (Calvin Broadus, Jr.) 1993 album Doggystyle, presents one of the more salient manifestations of swing in Dre’s beats. Shown in example 3, the four-bar loop underpinning this song exhibits pronounced swung timing in the piano riff, as well as in the drum fill at the end of the excerpt. This beat was sampled from the 1981 song “Haboglabotribin’” by the funk/jazz keyboardist Bernard Wright, whose role in the swung timing of “Gz and Hustlas” can be described as intermediary. Dr. Dre’s production style is central to the “G-Funk” sound of West Coast hip hop that flourished in the early and mid-1990s. G-Funk takes stylistic cues (and samples) from 1970s funk, and Wright’s 1981 album ‘Nard arrived at the tail end of that era. In addition to his own recording career as a solo artist, Wright worked as a session keyboardist for many fusion artists in the 1970s, cutting his teeth on repertoire that blended jazz and funk. So while Dr. Dre’s G-Funk beats may not exhibit what Justin Williams calls “jazz codes—sounds, lyrical references, and imagery that can be defined as jazz,”50 the presence of swing in Dre’s beats can still be indirectly tied to jazz’s influence on Black popular music.51 Indeed, Alexander Stewart’s observation of a re-emergence of swing in “contemporary styles of funk, hip hop, and jazz fusions”52 is reflected in the swung beats and flow rhythms of hip-hop music, as well as in the related and short-lived genre of new jack swing.
Returning to “Gz and Hustlas,” flow performances such as the one excerpted in example 3 illustrate Snoop Dogg’s approach to swung timing. Snoop’s BURs are not excessively large; each of them falls below 2, which is notable given the high BURs identified above in the beat. But the swung timing in this passage of flow is nevertheless audibly present and contributes to a palpable sense of downbeat anticipation throughout the measure, accruing the type of motional energy that Butterfield describes. This motional energy is heightened by the syntactical organization of Snoop’s lyrics. Instead of first identifying himself as the “dopest mothafucka that you’re hearing on the record,” he waits until the end of the line to reveal that it “is me,” leading the listener through a series of energy-accruing swung rhythms to the eventual presentation of the subject of the sentence. And just as LL Cool J did in “Flava in Ya Ear,” Snoop Dogg’s positioning of lexical stresses on strong beats helps amplify the sense of swing beyond the subtle BUR values his timing creates.
Intergeneric influence also surfaces in Azealia Banks’s 2012 single “212.” A sleeper hit for Banks, “212” features a beat produced by the Belgian electro/house producer Basto, aka Lazy Jay (Jef Martens). Example 4 documents the one-measure loop used in “212,” in which substantial swung timing is present. As in funk, swung timing occasionally surfaces in mid-tempo genres of dance music (house, garage, and techno among them), but often in subtle quantities. Basto’s production on “212” takes the form of a 2-step garage beat, borrowed from a genre popularized in the UK in the 1990s. Banks’s vocal performance atop this beat engages with its swung timing. The passage transcribed in example 4 is emblematic of the rhythmic structure of her flow, two features of which inject her performance with both motional energy (after Butterfield) and rupture (after Ohriner). The subtle swing in her flow gives this passage (and others like it) a strong sense of forward momentum, one that is both amplified and ruptured by the syncopation of the persistent off-beat syllabic attacks toward the end of the passage. The moment-to-moment sense of rupture effected by this swung syncopation is subsumed on a larger temporal level by the anacrusis-driven motional energy of the swung timing itself.
Swung timing is neither ubiquitous nor rare in hip-hop flow. If it were ubiquitous, we might be forced to decide whether swing itself should be treated as the “norm” by which straight-timed “abnormalities” are measured.53 But in certain song-based contexts, such as the examples presented in this section, swing appears to be sufficiently prevalent that it functions as the primary marker of sub-tactus division of the pulse. A productive way of interpreting such contexts might involve Tellef Kvifte’s Common Slow Pulse (CSP) model.54 Kvifte argues that additive models of rhythm, those predicated on an isochronous Common Fast Pulse whose multiples characterize longer timespans, fail to account for the ubiquity and embodied nature of asymmetric rhythmic contexts, of which swung timing is one. Given the degree of swing present in both flow and beat across the three examples discussed above, the CSP model seems appropriate: a larger-level tactus operates as an isochronous reference, while any rhythmic level below this tactus is subdivided according to its own internal logic. In the case of swing, this logic is modelled with BUR values.
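The CSP reading can likewise be sketched in code. In the following hypothetical function, only the tactus is isochronous, and each upbeat is positioned within its beat according to a given BUR. This is an illustrative assumption about how the model might be operationalized, not Kvifte’s own formalization; the tactus period and BUR values are invented for demonstration.

```python
def swung_grid(tactus_period: float, bur: float, n_beats: int,
               start: float = 0.0) -> list[float]:
    """Onset times (s) for beats and their swung upbeats (hypothetical helper).

    The tactus is isochronous (the 'slow pulse'); within each beat the upbeat
    is placed so that (beat-to-upbeat) / (upbeat-to-next-beat) equals `bur`.
    """
    if tactus_period <= 0 or bur <= 0:
        raise ValueError("tactus_period and bur must be positive")
    upbeat_offset = tactus_period * bur / (1.0 + bur)
    onsets = []
    for i in range(n_beats):
        beat = start + i * tactus_period  # isochronous reference level
        onsets.extend([beat, beat + upbeat_offset])
    return onsets


print([round(t, 3) for t in swung_grid(0.6, 1.0, 2)])  # [0.0, 0.3, 0.6, 0.9]
print([round(t, 3) for t in swung_grid(0.6, 2.0, 2)])  # [0.0, 0.4, 0.6, 1.0]
```

The design keeps the two levels separate: changing the BUR moves only the upbeats, while the beats stay fixed, which is the asymmetry the CSP model is meant to capture.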
Unpacking swung timing in flow can involve a number of approaches: exploring the intergeneric relationships that characterize hip hop, examining the intersection of linguistics and musical rhythm, and situating swung timing in the realm of Signifyin(g), as Floyd has done. Examining swing from an intergeneric perspective invites discussion of how swing connects various genres that owe their genesis, in some part, to African American musical tradition. Whether interpreted as jarring or lilting, swing plays a primary role in vitalizing any music in which it is present.
Characterized by Shock G (Gregory Jacobs) of Digital Underground as the addition of “lazy tails” to his flow, lagging timing describes a patterned delay of vocal syllabic onsets in relation to the beat layer.55 As recent attention to behind-the-beat rapping by both Ohriner and Martin Connor attests, MCs’ propensity to lag is widespread.56 Connor writes that “one of the defining aspects of the rap genre [is] its vocalists’ expressive rhythmic delays.”57 My category of lagging timing recalls Ohriner’s categories of phase shift, tempo shift, and deceleration. While Ohriner’s phase shift, by definition, can involve flows that are either behind or ahead of the beat, in my experience ahead-of-the-beat flow examples are quite rare.58 A potential issue arises here: if a syllable arrives before or after a particular metric marker, there must be some limit beyond which it ceases to be heard in relation to that marker and is instead heard in relation to another. But this issue is highly contextual, relying on accent, lyrics, meter, tempo, and above all listener subjectivity, and the only listening context to which I can speak with certainty is my own. Consequently, because I tend to hear instances of systematic shift, or nonalignment, between flow and beat as involving lagged lyrics, I characterize this category as such. My lagging category incorporates some aspects of Ohriner’s deceleration and tempo-shift categories, because I interpret behind-the-beat rapping, whether at a constant rate or not, as producing a family of related rhetorical and musical effects (some of which Ohriner describes). Why, and how, do MCs achieve this sense of lagging?
The answer does not lie in a lack of skill; if anything, the opposite is true.59 The MC Evidence (Michael Perretta) of the group Dilated Peoples states, “It’s also a talent to know how to stay behind the rhythm—it’s something a lot of people don’t know how to do.”60 MCs who lag do so using personalized, idiosyncratic, and complex rapping techniques, and they often mix lagging timing with more “in time” or quantized flow rhythms. The answer to “why” and “how,” then, probably involves a number of calculated motivations, techniques, and desirable consequences, some rhetorical and some practical.
One such consequence may involve the perceptual streaming of dense musical textures. Iyer suggests that expressive timing helps listeners stream larger amounts of auditory information, writing that “timing variations can allow an instrument that is sonically buried to draw attention to itself in the auditory scene.”61 In a similar vein, Danielsen has described how local timing inflections affect the overall sense of groove in a musical passage: individual components of a composite texture can be adjusted to sound non-simultaneously, giving the ear an impression of increased textural thickness.62 Viewed in this way, the effect of expressive timing is a thickening of texture.
Lagging timing also appears to influence vocal salience. Paolo Ammirante and Fran Copelli found that MCs generally use vowels with higher formants over on-beat percussion sounds, and they hypothesize that this practice enables those lyrics to be better heard against the dominating percussion.63 MCs can also increase the salience of their lyrics by uttering them slightly after prominent musical events, such as a backbeat snare hit. Pusha T’s performance on “Come Back Baby” foregrounds the salience of lag-timed flow amid other musical events. His flow rhythms in the first verse (0:14) involve a repeating rhythmic motive that always begins on beat two or four of the loop, against the backbeat. Example 5 details the displacement between each backbeat and the beginning of Pusha T’s motives; I measured the lag of the first syllable of each motive from the backbeat snare hit that underpins it. As the data suggest, nearly all these lags are greater than 50 ms, with some as high as 100 ms. While lagging timing plays a role in many of Pusha T’s performances, his lagging in this excerpt results in more clearly articulated (and audible) lyrics, free from sonic interference from the backbeat, itself a dominating sound in the song’s texture.
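The lag measurement described for example 5 can be sketched as follows, assuming that snare onsets and syllable p-centers have already been annotated in seconds. Each syllable is measured against the latest snare onset at or before it, and lags over 50 ms are then filtered out for inspection. The function name and the onset values are hypothetical; they are not the measurements reported in example 5.

```python
from bisect import bisect_right


def lags_ms(reference_onsets: list[float], syllable_onsets: list[float]) -> list[float]:
    """Lag (ms) of each syllable p-center behind the latest reference onset
    at or before it (hypothetical helper; all inputs in seconds)."""
    refs = sorted(reference_onsets)
    lags = []
    for s in syllable_onsets:
        i = bisect_right(refs, s) - 1  # index of the last reference onset <= s
        if i < 0:
            raise ValueError(f"syllable at {s} s precedes every reference onset")
        lags.append((s - refs[i]) * 1000.0)
    return lags


# Hypothetical backbeat snares and motive-initial p-centers (seconds)
snares = [0.70, 2.10, 3.50, 4.90]
motive_starts = [0.78, 2.16, 3.59, 4.94]
lags = lags_ms(snares, motive_starts)
print([round(x) for x in lags])               # [80, 60, 90, 40]
print([round(x) for x in lags if x > 50.0])   # [80, 60, 90]
```

Measuring only against the preceding reference onset suits a study of lag, but a syllable that anticipated the snare would be paired with the previous snare instead; an analysis that admits ahead-of-the-beat timing would need a different pairing rule, such as the nearest onset.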
Lag timing is also useful for rhetorical ends. Keith Gilyard’s introduction to the book African American Rhetoric(s): Interdisciplinary Perspectives provides a comprehensive summary of scholarly work in this field.64 Notable in Gilyard’s survey is the work of Ronald Jackson II, who draws attention to Nommo, which he describes as “the generative, life-sustaining force of the word” and a central aspect of African rhetorical traditions.65 According to Jackson, the elements of his Afrocentric model of rhetoric include rhythm, soundin’, stylin’, improvisation, storytelling, lyrical code, image making, and call and response. In his model, Jackson describes rhythm as “similar to polyrhythm in that it suggests that the energy of the rhetor must be one with the energy of the audience.”66 Elsewhere he describes it as “involving pauses, intonations, pitch, rate, and speed.”67 Soundin’ is related to Signifyin(g), which, as discussed above, might manifest in the subtle subversion of steady pulse found in swing. And stylin’ is the notion that a speaker has combined rhythm, excitement, and enthusiasm, which propel a message and the audience; it is usually accomplished through vocal variety, resonance, percussion, anaphora, volume, rate, pitch, and tone.68 By coalescing around the notion of Nommo, the elements of Jackson’s Afrocentric model collectively form a framework that affords the rhetor space to cultivate a sense of authority with which to compel listeners. This framework treats “the word” as primal: rather than imbuing words with rhetorical weight, the rhetor activates a rhetoric already inherent in the words.
Few MCs have used expressive timing more consistently as a means of cultivating a particular identity than Snoop Dogg, whose drawl-infused signature flow has become one of the most identifiable personal styles in hip-hop music. Since his emergence onto the scene through collaborations with Dr. Dre, including his successful 1993 debut album Doggystyle, Snoop Dogg has cultivated a musical image centered on a laid-back flow style that is arguably equal parts Southern (owing to his Mississippi roots) and West Coast (owing to his Long Beach upbringing).69 In a 1993 New York Times interview, Snoop said, “I don’t rap, I just talk. I don’t like to get all pumped up and rap fast ‘cause that ain’t me. I want to be able to relax and conversate with my people.”70 His guest verse on De La Soul’s 2016 single “Pain,” excerpted in example 6, bears out this self-description. Below the notated example, a chart indicates the lag time of each syllable: the lag of the perceptual center of each vocal syllable is measured from either a sounding event in the beat layer (usually from the drums) or an interpolated timepoint when no sounding events are present. Some of these lag times are so small as to border on imperceptibility. Example 6 focuses on the ten syllables where Snoop Dogg’s lag is greater than 50 ms, graphically displacing them from the events in the beat layer they sound against.71 These lagged syllables are especially salient when sounding against a snare hit; note, for example, the lag on the syllables “shades,” “can’t,” “move,” and “do.” Snoop ends this passage with a touch of Signifyin(g) on the lyric “swiftly,” which he stretches into a trisyllabic word to rhyme with “differently” and delivers with rather obvious lag, even though the word’s meaning normally connotes anything but.
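Where no sounding event marks the metric position a syllable targets, the reference time has to be estimated. The sketch below assumes linear interpolation between the nearest sounding events on either side, which presumes a locally steady tempo; this is one plausible way of deriving the interpolated timepoints mentioned above, not a reconstruction of the procedure behind example 6. Metric positions are given in beats, times in seconds, and the function names and values are hypothetical.

```python
from bisect import bisect_left


def reference_time(metric_pos: float, anchors: list[tuple[float, float]]) -> float:
    """Time (s) of a metric position (in beats), hypothetical helper.

    `anchors` holds (metric position, onset time) pairs for sounding events.
    If an event sits at `metric_pos`, its onset is returned; otherwise the time
    is linearly interpolated between the nearest events on either side.
    """
    anchors = sorted(anchors)
    positions = [pos for pos, _ in anchors]
    i = bisect_left(positions, metric_pos)
    if i < len(anchors) and positions[i] == metric_pos:
        return anchors[i][1]  # a sounding event marks this position
    if i == 0 or i == len(anchors):
        raise ValueError("metric position lies outside the span of sounding events")
    (p0, t0), (p1, t1) = anchors[i - 1], anchors[i]
    return t0 + (metric_pos - p0) * (t1 - t0) / (p1 - p0)


def syllable_lag_ms(p_center: float, metric_pos: float,
                    anchors: list[tuple[float, float]]) -> float:
    """Lag (ms) of a syllable's p-center behind its metric reference point."""
    return (p_center - reference_time(metric_pos, anchors)) * 1000.0


# Hypothetical drum events as (beat, seconds); nothing sounds on beat 2
drums = [(0.0, 0.000), (1.0, 0.652), (3.0, 1.960)]
print(round(syllable_lag_ms(0.712, 1.0, drums)))  # vs. a sounding event: 60
print(round(syllable_lag_ms(1.370, 2.0, drums)))  # vs. interpolated beat 2: 64
```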
Queen Latifah’s (Dana Owens) performance on her 1989 hit “Ladies First” uses a technique related to gemination (the elongation of consonants), whereby she deliberately elongates the leading consonants of specific syllables, which in turn evokes a sense of lagging timing. This lag is perceivable because the vowel sounds in these lyrics arrive later than they would if Latifah had not elongated the consonant. (Recall that I interpret and measure vowel onset as the p-center for a syllable.) The excerpt shown in example 7 details the conclusion of Latifah’s second verse. For two lyrics in this passage, Latifah spends at least as much time enunciating the initial consonant as she does the following vowel.72 The L in little lasts twice as long as the vowel sound created by the I, while the T in touch is the same length as the sound created by the OU. These geminated consonants produce a lag, but one that is not fully salient until the beat kicks back in with the drum fill shown in the transcription. Because the beat has dropped out under the lyrics “with a little touch of,” the sense of lagging is suspended, deferred perhaps, until flow and beat are temporally reconciled. By the time Latifah raps the title lyrics “ladies first,” her lag timing has become quite noticeable; the measured lag on “first” places the p-center of this lyric nearly 200 ms behind the downbeat of the measure as expressed in the beat.
Latifah’s lag timing emphasizes a subtle distinction in the oft-repeated title lyric “ladies first.” The title lyrics are inflected slightly differently each time they are rapped; it seems evident that Latifah means to imbue them with a marked quality. Indeed, this song’s role as a flag-bearing Afrocentric feminist anthem is emphasized through the aggressive, rhetoric-laden flow style of Latifah and her guest MC Monie Love (Simone Johnson). Latifah is doing exactly as she says in the lyrics, flipping the script: she is reclaiming hip-hop music as a vehicle for promoting women, “demanding equal treatment for women,” as Robin Roberts writes.73 Latifah’s flipping of the script works on a more detailed scale as well. Her delivery of the song’s title lyrics repurposes the phrase “ladies first.” This phrase might traditionally be understood as chivalrous, spoken by a man acting out of a purported sense of gentility. But in such circumstances it is still the man who holds decision-making power, and in rapping these lyrics herself, Latifah flips the script regarding their meaning, from one of cloaked male courtesy to one of female empowerment and assertion: she is the one dictating that it’s ladies first. The emphasis she gives these lyrics through lag timing heightens her assumption of control.
In these examples from Pusha T, Snoop Dogg, and Queen Latifah, I have explored how lagging timing can be understood through the lenses of vocal salience, linguistics/phonation, and rhetoric. While my lagging category encompasses a larger assemblage of expressively timed performances than Ohriner’s more specific tempo-shift, phase-shift, and deceleration categories, my aim has been to draw parallels among his three categories to demonstrate how specific performance and perceptual issues can be contextualized across all of them through the common factor of lag.
In his monograph Book of Rhymes, Adam Bradley notes that Jay-Z (Shawn Carter) generally uses conversational flow, defining this style as “one that falls comfortably into conventional speech rhythms.”74 While Bradley does not further define “conventional speech rhythms,” it is clear what they are not: strict rhythmic, or even metric, adherence to the beat layer. Conventional speech rhythms are notoriously difficult to define, in any language and by any measure. But Bradley’s comments invite some important questions. How do MCs rap in a way that impresses the notion of conversational speech upon the listener? In the absence of a rigorous definition of “conventional speech rhythms,” can any telltale organizing or rhythmic structures in conversational flow be identified? And finally, how (again) does this type of expressive timing engage with the beat layer?
Returning momentarily to Jay-Z helps contextualize the first question. To some extent, Jay-Z must regulate his flow rhythms: if they were completely conversational, we might not interpret his performances as rapping. But his flow rhythms are not completely synchronized with the beat either, so his flow style must fall somewhere between these two extremes. Pressing situates speech- and groove-based rhythms as two main engagements with time in Black Atlantic music, suggesting that aspects of speech and groove quite often drive the rhythmic fabric of a Black Atlantic musical context simultaneously.75 Pressing uses hip hop as an example of this simultaneity, situating the flow and beat layers as embodying speech- and groove-based rhythms, respectively. I would further suggest that groove- and speech-based rhythms can each be identified in the flow layer alone; their interplay is what characterizes conversational timing.
I thus define my final category, conversational timing, in terms of the dialogue between rhythms of conversational speech (however underdefined) and groove-infused rhythms of rapping. Pressing writes that “speech-based rhythms superimpose timings organized by oral articulatory predilections on a foundation structure that has divisive and metrical properties.”76 In hip hop, this foundation structure is the beat layer. The influence of groove- and speech-based properties on flow rhythms can be construed as a spectrum between two poles (shown in example 8), neither of which is ever fully realized in a rapped performance. At one pole, a completely “on-beat” performance would feature syllabic attacks that are always synchronized with some sort of quantized (and sometimes isochronous) event: either something from the beat layer or an interpolated time marker. At the other pole, a completely “conversational” performance would entail total and salient asynchrony between syllabic attacks and beat events, suggesting complete rhythmic stratification of flow and beat. Establishing this spectrum enables critical readings of expressively timed flow performances that are neither swung nor lagging.
By characterizing conversational timing as above, I approach Krims’s notion of speech-effusive flow, which he describes as featuring “enunciation and delivery closer to those of spoken language, with little sense often projected of any underlying metric pulse.”77 For Krims, speech-effusive flow embodies a rhythmic stratification in which the consistency of the beat layer provides a template atop which the more variegated rhythmic structure of conversationally timed flow operates. With respect to timing, then, my category of conversational timing is identical to Krims’s speech-effusive flow. But Krims’s system is meant to encapsulate more than timing, and his other characterizations of speech-effusive flow (long strings of rhymed syllables, extremely complex rhythms, and notions of articulation and pitch) fall beyond the theoretical purview of conversational timing.
Chicago-based Noname (Fatimah Warner) tends to rap with conversational timing. Her performances frequently reveal a flow style that “often cuts across the phrase and accent structure of the underlying repeating instrumental groove,” to quote Pressing out of context.78 Noname’s approach to vocal rhythm often sounds speech-driven; her performance on “Self,” the opening track from her 2018 debut studio album Room 25, exemplifies conversational timing. The song’s beat features timing akin to a “Dilla feel”: the drums almost sound as though they stutter along in loose coordination with the rest of the beat layer. From the outset, when the drums and harmonic layer (supplied by backing vocals, guitar, bass, and keyboard) establish the song’s metric and hypermetric structure, Noname’s vocals do not consistently align with metric markers. Her nuanced rhyme scheme places rhymes on a variety of beat-classes, and even these rhymes rarely, if ever, line up with sounding events in the beat layer. Notable is her conclusion of the first verse of “Self”: instead of aligning her final lyrics with the conclusion of the harmonic loop that underpins her flow, Noname simply finishes when she finishes, mid-loop (0:55). While temporal coordination between flow and beat layers is by no means a necessity in hip-hop music, it is a convention, and Noname’s violation of that convention here draws attention to her conversational style of rapping.
Conversational timing raises questions of textural stratification and coordination, terms I borrow from John Covach’s recent work on texture in rock music.79 Though specifically designed to model pitch-based coordination in rock, Covach’s spectrum charting the space between totally coordinated and totally stratified textures can be used in a strictly temporal sense. Noname’s conversational performance in “Self” resides squarely in the stratified realm: the rhythmic pacing of her vocals seems to unfold quite independently of the rhythmic and metric structure of the beat. But not all conversational performances are so thoroughly stratified: 2 Chainz’s (Tauheed Epps) verse (1:55) on Chance the Rapper’s (Chancelor Jonathan Bennett) 2016 single “No Problem” embodies a conversational flow that more evenly straddles coordination and stratification, and thus musicalized and speech-infused rapping. Throughout this verse, 2 Chainz inserts lyrics that lend his flow a metric organization. On the lyrics “where the hell you get them from? Yeezy said he ain’t make them” (2:03), we hear tight coordination between the final syllables “them from” and “make them.” One way of characterizing 2 Chainz’s performance is to understand it as coordinated on a metric/hypermetric level but stratified on a rhythmic level. This duality characterizes the nature of conversational timing: at once beholden to the rhythmic inflections of speech and locating sufficient points of metric coordination with the beat layer.
Durational patterns of speech (those that give rise to its rhythm) vary greatly from speaker to speaker, from dialect to dialect, and from rhetoric to rhetoric, and as such, conversational timing is difficult to systematize. Since rhetoric, expression, and affect play a strong role in how we pace our speech, performances in which all of these communicative elements are in play invite a closer reading of conversational timing. Eminem’s (Marshall Mathers) hugely successful 2000 single “Stan” (ft. Dido) fits these criteria well. Eminem uses conversational timing to dramatize the title character Stan’s growing anger and frustration in the song’s narrative.80
Example 9 features charts that trace Eminem’s flow timing across the first four phrases of lyrics in the first verse of “Stan” (0:49). The syllable durations suggest that Eminem’s timing fits neither swung nor lagging profiles: swung timing would reveal alternating patterns of long and short syllable lengths, and lagging timing operates on the premise that, although lagging, the flow still exhibits a tight degree of rhythmic coordination with the beat layer. Neither scenario is present here. Instead, the diversity of syllable lengths in this example and their non-patterned ordering evince conversational timing. But Eminem’s flow does exhibit small pockets of patterned timing. In phrase 1, the syllables of “but you still ain’t” are all rapped at approximately the same length; their durations fall within around 40 ms of one another. A similar phenomenon occurs in phrase 3, where pairs of syllables are nearly identical in duration: “I sent,” “two let-,” “-ters back,” “aut-umn,” and “must not.” In fact, there almost appears to be a stratification of two different syllable durations here. And finally, in phrase 4, patterns of three shorter syllables seem to emerge on “-b’ly was a” and “-blem at the.”
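The search for such pockets can be approximated computationally. The following sketch, assuming syllable durations measured in milliseconds, reports every maximal run of consecutive syllables whose durations lie within a given tolerance of one another. The 40 ms default echoes the spread noted in phrase 1, but both the tolerance and the minimum run length are arbitrary analytical choices, the function name is hypothetical, and the durations in the usage example are invented rather than taken from example 9.

```python
def near_isochronous_runs(durations_ms: list[float], tolerance_ms: float = 40.0,
                          min_len: int = 3) -> list[tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, of maximal runs of
    consecutive syllables whose durations all lie within `tolerance_ms`
    of one another (longest minus shortest <= tolerance_ms)."""
    runs: list[tuple[int, int]] = []
    n = len(durations_ms)
    for start in range(n):
        lo = hi = durations_ms[start]
        end = start + 1
        # Extend the run while the spread of durations stays within tolerance.
        while end < n:
            d = durations_ms[end]
            if max(hi, d) - min(lo, d) > tolerance_ms:
                break
            lo, hi = min(lo, d), max(hi, d)
            end += 1
        # Keep runs that are long enough and not contained in the previous run.
        if end - start >= min_len and (not runs or end > runs[-1][1]):
            runs.append((start, end))
    return runs


# Hypothetical syllable durations (ms), not measurements from "Stan"
durations = [180, 150, 165, 170, 300, 95, 240, 120, 115, 130, 310]
print(near_isochronous_runs(durations))  # [(0, 4), (7, 10)]
```

Because each run is extended as far as the tolerance allows, overlapping runs may be reported; an analyst would still need to decide by ear which pockets are musically meaningful.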
It seems, then, that Eminem’s flow timing occasionally accords with a different tempo than the beat’s, invoking Ohriner’s tempo-shift category. This is perfectly appropriate for speech rhythms, which at times evoke a steady rhythmic and metric cadence and at others are subject to rhetoric-led fluctuations in pacing. Though space prevents me from exploring the subsequent verses of “Stan” in detail, my investigation of these verses suggests that patterned passages of syllable durations become rarer as the song progresses. As Stan becomes more irate, his vocal rhythms become increasingly beholden to his anger and frustration, leading to a more spontaneous, speech-sounding flow performance. When small pockets of near-isochronous syllable durations do exist in these later verses, they typically serve rhetorical ends, such as when Stan mockingly addresses Eminem as “Mr.-too-good-to-write-my-fans” at the opening of the third verse. Amid these speech-infused rhythms, Eminem still tethers his rhymes to the fourth beat of each measure (although he lags quite consistently in this regard, meaning the rhymes often arrive later than the fourth beat itself). Such anchoring enables him to drift progressively further from any rhythmic patterns amenable to notation-based transcription. This development, along with his constantly rising vocal tessitura, allows Eminem to vividly portray Stan’s increasing angst.
The foregoing discussion encapsulates the essence of conversational timing: a flow practice that closely mirrors rhythms of speech but musically anchors them against the beat, even when these speech-infused rhythms express an independent metric structure. I observed how Noname adopts a speech-driven approach to rhythm in her rapping, stratifying the rhythmic surfaces of flow and beat layers, while 2 Chainz coordinates these layers by remaining consistent with his rhyme placements. Similarly, even when the character Stan grows increasingly irate in the second and third verses, Eminem’s performance still respects a fairly consistent (but lagged) rhyme placement that anchors the increasingly out-of-control rhetoric put forward by the character. This is, after all, still rap. Eminem’s performance on “Stan” also reinforces the Black Atlantic notion of speech/musical rhythmic stratification; as a white hip-hop artist, he demonstrates how non-Black artists also perpetuate this tradition within the broader practice of rapping. The techniques employed by Noname, 2 Chainz, and Eminem here are certainly not the only means by which an MC can create a conversationally infused flow, but some mechanism that yokes together beat and flow is nonetheless required to distinguish rapping from speech.
My work throughout this paper has been in the service of Iyer’s claim that “in groove-based contexts…fine-scale rhythmic delivery becomes just as important a parameter as say, tone, pitch, or loudness.”81 The two textural layers of hip-hop music—flow and beat—express remarkable nuance and depth of interaction when examined on a temporal plane. Expressive flow timing occurs in a variety of contexts and in varying quantities. I began this discussion by summarizing expressive timing in Black Atlantic music in general before focusing on its presence in hip-hop flow. Ohriner’s lengthy exploration of this topic paved the way for my research, and his intricate, empirically grounded discussion and detailed analysis of Talib Kweli’s flow timing provided a number of productive jumping-off points for the research I have presented here.
Ohriner’s swung timing inspired my own category of the same name, in which I discussed swing’s importance in a number of Black Atlantic musical practices and its fluid relationship with isochronous pulse. On the one hand, swing can function as a subtle subversion of such isochrony, imbuing the flow of time with a lilting quality or mediating the flow of motional energy and anticipation. On the other hand, swing can amplify the sense of syncopation, especially when used in conjunction with another element, such as lexical stress. My category of lagging timing amalgamated aspects of Ohriner’s phase-shift, tempo-shift, and deceleration categories. Whereas Ohriner bases these categories on what actually occurs in the flow timing, I ground mine in how I perceive that timing, based both on practical matters, such as intelligibility, and on rhetorical function, recalling aspects of Jackson’s Afrocentric rhetorical model based on Nommo. Seen this way, MCs (the rhetors in this case) are activating a power already inherent in their lyrics; rhetoric here is a conduit of expression rather than an imposed element of communication.
My final category, conversational timing, presented the greatest obstacles to systematic annotation and analysis. Apart from a loosely theorized spectrum between the rhythms of music and those of speech, conversational timing can be analyzed only through its discrete manifestations in the hip-hop repertoire. I cannot conceive of an analytical method that attempts to precisely model “how conversational” a flow performance is, nor do I see the utility of one; there is simply no way of objectively determining what constitutes conversational timing. But conversational timing is nonetheless clearly audible, creating flows that proceed according to their own internal, conversational rhythmic logic while occasionally communicating with the metric structure of the beat layer.
Iyer’s words also suggest that research on flow timing has only begun to scratch the surface. Additional corpus studies following Ohriner’s would provide greater insight into the prevalence of expressive timing, and could evaluate the comprehensiveness of his classification system as well as my own. Perceptual studies would shed light on how audible expressive timing is to listeners as well as clarify the perceptual thresholds between types of timing. Such studies might also engage with Danielsen’s notions of actual sounding events/virtual reference structures and beat-bin meter—interrogating how listeners generate the framework within which they perceive and mentally measure timed events. And finally, I am hopeful this research will inspire further close readings of expressively timed flow performances, in the pursuit of strengthening our understanding of how African and African American vernacular musical traditions have shaped the texture, rhythm, and rhetoric of hip-hop music. These close readings would help illuminate the humanness in flow—production techniques notwithstanding—that is eloquently summed up by Jay-Z in his recent memoir: “Flow isn’t like time, it’s like life. It’s like a heartbeat or the way you breathe, it can jump, speed up, slow down, stop, or pound right through like a machine.”82
An earlier version of this paper was read at the 2020 Annual Meeting of the Society for Music Theory. I wish to thank the online conference audience for their insightful questions, and Nicole Biamonte, Anne Danielsen, Mitchell Ohriner, and the anonymous reviewers for sharing their commentary and ideas.
I use the term expressive timing, similar to Ohriner (Mitchell Ohriner, “Expressive Timing,” in The Oxford Handbook of Critical Concepts in Music Theory, ed. Alexander Rehding and Steven Rings (New York: Oxford University Press, 2019), 369–96), but my understanding of this term equates it with microrhythm, microtiming, or “off-beat rapping” (Ohriner, Flow: The Rhythmic Voice of Rap Music (New York: Oxford University Press)). Iyer uses expressive microtiming (Vijay Iyer, “Embodied Mind, Situated Cognition, and Expressive Microtiming in African-American Music,” Music Perception 19, no. 3 (2002): 387–414); Oddekalv uses microtiming (Kjell Andreas Oddekalv, “The Urge to ‘Clean Up’ the Rap,” paper presented at the Art of Record Production Conference, Boston MA, 17–19 May 2019); Benadon uses expressive microrhythm (Fernando Benadon, “Slicing the Beat: Jazz Eighth-Notes as Expressive Microrhythm,” Ethnomusicology 50, no. 1 (2006): 73–98); and the University of Oslo’s RITMO Centre for Interdisciplinary Studies in Rhythm, Time, and Motion (directed by Anne Danielsen) uses musical microrhythm. Adams does not provide a specific term for this phenomenon, but includes among his articulative techniques of flow “the extent to which the onset of any syllable is earlier or later than the beat.” Kyle Adams, “On the Metrical Techniques of Flow in Rap Music,” Music Theory Online 15, no. 5 (2009).
Ohriner, Flow: The Rhythmic Voice of Rap Music.
Provenzano explores the location of emotion in pitch-corrected voices following the popularization of Auto-Tune and related software in the early 2000s. See Catherine Provenzano, “Emotional Signals: Digital Tuning Software and the Meanings of Pop Music Voices” (Ph.D. diss., New York University, 2019).
David Scott, “The Re-Enchantment of Humanism: An Interview with Sylvia Wynter,” Small Axe: A Caribbean Journal of Criticism 4, no. 2 (2000): 136.
J.D. Considine, “Fear of a Rap Planet,” Musician 160 (1 February 1992): 92.
Jeff Pressing, “Black Atlantic Rhythm: Its Computation and Transcultural Foundations,” Music Perception 19, no. 3 (2002): 285–310.
Olly Wilson, “The Significance of the Relationship between Afro-American Music and West-African Music,” The Black Perspective in Music 2 (1974): 3–22.
Iyer, “Embodied Mind,” 387–414.
Samuel Floyd, Jr., “Ring Shout! Literary Studies, Historical Studies, and Black Music Inquiry,” Black Music Research Journal 22 (2002): 57.
Anne Danielsen, “The Sound of Crossover: Micro-rhythm and Sonic Pleasure in Michael Jackson’s ‘Don’t Stop ‘til You Get Enough,’” Popular Music and Society 35, no. 2 (2012): 155.
Joseph Schloss, Making Beats: The Art of Sample-Based Hip Hop (Middletown, CT: Wesleyan University Press, 2004), 144.
Ohriner, Flow: The Rhythmic Voice of Rap Music.
London’s definition of meter is relevant here: “Musical meter is the anticipatory schema that is the result of our inherent abilities to entrain to periodic stimuli in our environment.” Justin London, Hearing in Time: Psychological Aspects of Musical Meter (New York: Oxford University Press, 2012), 12.
Schloss (Making Beats, 140–44) provides an excellent summary of quantization in hip-hop beats and how it is valued by listeners.
Anne Danielsen, “Introduction: Rhythm in the Age of Digital Reproduction,” in Musical Rhythm in the Age of Digital Reproduction, ed. Anne Danielsen (Burlington, VT: Routledge, 2010), 6.
Anne Danielsen, “Pulse as Dynamic Attending: Analysing Beat Bin Metre in Neo Soul Grooves,” in The Routledge Companion to Popular Music Analysis: Expanding Approaches, ed. Ciro Scotto et al. (New York: Routledge, 2018), 179–89.
Ohriner describes this phenomenon—the just-noticeable difference—as “the smallest temporal interval that people can perceive as distinct onsets” (Ohriner, Flow: The Rhythmic Voice of Rap Music, 185), citing a study that puts this value in the range of 25–40 ms. See Daniel Levitin et al., “The Perception of Cross-Modal Simultaneity,” International Journal of Computing and Anticipatory Systems 5 (2000): 323–29.
Fernando Benadon, “Time Warps in Early Jazz,” Music Theory Spectrum 31, no. 1 (2009): 1–25.
My research on the rhythmic rigidity of triplet flow in recent hip-hop music supports the notion that expressive timing may play a smaller role in flow rhythm now than it once did. See Ben Duinker, “Good Things Come in Threes: Triplet Flow in Recent Hip-Hop Music,” Popular Music 38, no. 3 (2019): 423–56.
Indeed, Ohriner’s extensive research on expressive timing covers these aspects to varying extents as well.
Anne Danielsen et al., “Where is the Beat in That Note? Effects of Attack, Duration, and Frequency on the Perceived Timing of Musical and Quasi-Musical Sounds,” Journal of Experimental Psychology: Human Perception and Performance 45, no. 3 (2019): 402–18; Justin London et al., “A Comparison of Methods for Investigating the Perceptual Center of Musical Sounds,” Attention, Perception, and Psychophysics 81, no. 6 (2019): 2088–101; and Ohriner, Flow: The Rhythmic Voice of Rap Music.
Danielsen et al., “Where is the Beat?,” 402.
Ohriner, Flow: The Rhythmic Voice of Rap Music, notes to Chapter 8.
Manuel Moussallam, Spleeter by Deezer (2019), accessed 14 September 2020, https://deezer.io/releasing-spleeter-deezer-r-d-source-separation-engine-2b88985e797e.
Jiahong Yuan and Mark Liberman, Penn Phonetics Lab Forced Aligner for English (2009), accessed 14 September 2020, https://web.sas.upenn.edu/phonetics-lab/facilities/; Gersh Pevnick and Hanyong Park, UWM Forced Aligner (undated), accessed 14 September 2020, web.wm.edu/forced-aligner/.
Peter Winkler, “Writing Ghost Notes: The Poetics and Politics of Transcription,” in Keeping Score, ed. David Schwarz et al. (Charlottesville: University of Virginia Press, 1997), 199–200.
Ohriner uses a unique notation scheme that is neither graphic nor staff-based. Komaniecki and I each document the variegated notation practices for hip-hop transcription in greater detail. Robert Komaniecki, “Analyzing the Parameters of Flow in Rap Music” (Ph.D. diss., Indiana University, 2019), and Ben Duinker, “Diversification and Post-Regionalism in North American Hip-Hop Flow” (Ph.D. diss., McGill University, 2020).
Adam Krims, Rap Music and the Poetics of Identity (Cambridge: Cambridge University Press, 2000).
Adams, “Metrical Techniques.”
Robert Walser, “Rhythm, Rhyme, and Rhetoric in the Music of Public Enemy,” Ethnomusicology 39, no. 2 (1995): 193–217; Oliver Kautny, “Lyrics and Flow in Rap Music,” in The Cambridge Companion to Hip Hop, ed. Justin Williams (Cambridge: Cambridge University Press, 2015), 101–17; and Komaniecki, “Parameters of Flow.”
Komaniecki suggests that Krims’s notation scheme can be similarly difficult to understand. See Komaniecki, “Parameters of Flow,” 9. Other scholars have stressed the importance of considering readership, or “knowing your audience,” when using staff-based notation to transcribe non-notated musics, including Schloss, Making Beats, 12–15, and various panelists in the forum on transcription convened by Jason Stanyek. See Jason Stanyek, “Forum on Transcription,” Twentieth-Century Music 11, no. 1 (2014): 101–61.
Benadon, “Slicing the Beat.”
Benadon provides a more detailed description of beat-upbeat ratios (BURs), including a graphic example. See Benadon, “Slicing the Beat,” 75.
Matthew Butterfield, “Why Do Jazz Musicians Swing Their Eighth Notes?,” Music Theory Spectrum 33, no. 1 (2011): 3–26.
Floyd, “Ring Shout!,” 57.
Geneva Smitherman, Word from the Mother: Language and African Americans (New York: Routledge, 2006), 43.
Floyd, “Ring Shout!,” 50–51.
Nina Eidsheim, Measuring Race: The Micropolitics of Listening to Vocal Timbre and Vocality in African American Popular Music (Durham, NC: Duke University Press, 2019), 34.
Gillian Brown, Listening to Spoken English (New York: Routledge, 2013); Peter Ladefoged and Keith Johnson, A Course in Phonetics (Stamford, CT: Cengage Learning, 2011), 119; and Erik Thomas, “Prosodic Features of African American English,” in The Oxford Handbook of African American Language, ed. Sonja L. Lanehart (New York: Oxford University Press, 2015), 420–33. Prosodic and lexical stress patterns are difficult to define concretely and vary widely among the spoken dialects of a language.
Komaniecki, “Parameters of Flow.”
Wennerstrom states that spoken English contains three levels of stress: primary stress, secondary stress, and unstressed syllables. Ann K. Wennerstrom, The Music of Everyday Speech: Prosody and Discourse Analysis (New York: Oxford University Press, 2001), 47–48.
Ohriner, Flow: The Rhythmic Voice of Rap Music; Tricia Rose, Black Noise: Rap Music and Black Culture in Contemporary America (Middletown, CT: Wesleyan University Press, 1994); and Jeff Chang, Can’t Stop, Won’t Stop: A History of the Hip-Hop Generation (New York: Picador and St. Martin’s Press, 2006).
Ohriner, Flow: The Rhythmic Voice of Rap Music, 204.
Alexander Stewart, “‘Funky Drummer’: New Orleans, James Brown and the Rhythmic Transformation of American Popular Music,” Popular Music 19, no. 3 (2000): 293–318.
I have also encountered the opposite effect in my listening: where the beat “straightens” my perception of swung timing.
Justin Williams, Rhymin’ and Stealin’: Musical Borrowing in Hip-Hop (Ann Arbor: University of Michigan Press, 2013), 49.
Stewart, “‘Funky Drummer,’” 314.
For an example of this, see Mats Johansson, “Rhythm into Style: Studying Asymmetrical Grooves in Norwegian Folk Music” (Ph.D. diss., University of Oslo, 2010).
Tellef Kvifte, “Categories and Timing: On the Perception of Meter,” Ethnomusicology 51, no. 1 (2007): 64–84.
Paul Edwards, How to Rap II: Advanced Flow and Delivery (Chicago: Chicago Review Press, 2013), 35.
Ohriner, Flow: The Rhythmic Voice of Rap Music, and Martin Connor, The Musical Artistry of Rap (Jefferson, NC: McFarland, 2018).
Connor, The Musical Artistry, 17.
Most of Ohriner’s discussion of these techniques of expressive timing in Chapter 8 of Flow: The Rhythmic Voice of Rap Music also centers on syllabic attacks that arrive after the beat.
Ohriner attributes intentionality to Talib Kweli’s late syllables, noting that they are “not haphazardly late” (Flow: The Rhythmic Voice of Rap Music, 193).
Paul Edwards, How to Rap: The Art and Science of the MC (Chicago: Chicago Review Press, 2008), 257.
Iyer, “Embodied Mind,” 402.
Danielsen, “The Sound of Crossover.”
Paolo Ammirante and Fran Copelli, “Vowel Formant Structure Predicts Metric Position in Hip Hop Lyrics,” Music Perception 36, no. 5 (2019): 480–87.
Keith Gilyard, “Introduction: Aspects of African American Rhetoric as a Field,” in African American Rhetoric(s): Interdisciplinary Perspectives, ed. Elaine Richardson and Ronald Jackson II (Carbondale: Southern Illinois University Press, 2007), 1–18.
Ronald L. Jackson II, “Toward an Afrocentric Methodology for the Critical Assessment of Rhetoric,” in African American Rhetoric: A Reader, ed. L.A. Niles (Dubuque, IA: Kendall Hunt Publishing, 1995), 154; Gilyard, “Introduction,” 12. In particular, Gilyard draws the idea of Nommo from the work of Molefi Kete Asante (b. Arthur Lee Smith), whose writings positioned Nommo as an antipode to persuasion, the rhetorical act developed by Aristotle and cultivated in Western rhetorical practice. See Arthur Lee Smith, Language, Communication, and Rhetoric in Black America (New York: Harper and Row, 1972).
Jackson, “Toward an Afrocentric Methodology,” 154.
Ibid., 154. Many of these Afrocentric rhetorical strategies are discussed by Walser in some detail in the context of Public Enemy. See Walser, “Rhythm, Rhyme, and Rhetoric.”
Bradley and DuBois describe Snoop Dogg’s flow as “smoothed-out [and] nearly southern.” Adam Bradley and Andrew DuBois, eds., The Anthology of Rap (New Haven: Yale University Press, 2010), 505.
Touré, “Snoop Dogg’s Gentle Hip Hop Growl,” New York Times, 21 November 1993.
Note that some of these impulses involve a heard sound event in the beat layer, while others are interpolated based on the temporal spacing of the snare hits on beats 2 and 4. This analysis also does not take into account the subtle timing variations in various instruments in the beat layer.
I determined the duration of each phoneme in a syllable. For the lyric “touch,” for example, the leading consonant “t” and the perceptual center “ou” each sound for 0.19 seconds, or 43% of the duration of the whole sounding lyric. Because so much time is spent on the leading consonant, the vowel, which my method takes as the perceptual center of the syllable, arrives noticeably late.
Robin Roberts, “‘Ladies First’: Queen Latifah’s Afrocentric Feminist Music Video,” African American Review 28, no. 2 (1994): 245.
Adam Bradley, Book of Rhymes: The Poetics of Hip Hop (New York: Basic Civitas, 2009), 29.
Pressing, “Black Atlantic Rhythm,” 287.
Krims, Rap Music, 51.
Pressing, “Black Atlantic Rhythm,” 287.
John Covach, “Analyzing Texture in Rock Music: Stratification, Coordination, Position, and Perspective,” in Pop Weiter Denken: Neue Anstöße aus Jazz Studien, Philosophie, Musiktheorie und Geschichte, Beiträge zur Popularmusikforschung, Band 44, ed. Ralf von Appen and André Doehring (Transcript Verlag, 2018), 64–65.
Lacasse explores the role of narrativity in “Stan.” Serge Lacasse, “Stratégies narratives dans « Stan » d’Eminem. Le rôle de la voix et de la technologie dans l’articulation du récit phonographique,” Protée 34, no. 2–3 (2006): 11–26.
Iyer, “Embodied Mind,” 398.
Shawn Carter, Decoded (New York: Spiegel & Grau, 2010), 12.