I am indebted to Frank Russo and Jeremy Day-O’Connell for two thought-provoking commentaries on the original article. The text was intended to open up space for debate, and both commentaries introduce themes that usefully extend the conversation and warrant further discussion. Here I can do no more than add a few comments that encourage such a continuation.
Russo usefully positions the discussion of joint speech in the broad field of joint action, where a variety of theoretical frameworks co-exist, encompassing both representational and non-representational approaches. In Cummins (2012, 2013) I have sought to consider joint speech within broader embodied, enactive, and hence non-representational approaches to joint action, but I think the topic also holds promise for exegesis within representational frameworks, such as those of Garrod and Pickering (2009) or Sebanz, Bekkering, and Knoblich (2006).
The question of whether joint speech and associated behaviors bear comparison with animal models is a rich one indeed, and underexplored. The collective dynamics of flocking and shoaling provides one possible route in, and the extensive corpus of work by Vicsek and colleagues offers several points of comparison (e.g., Néda, Ravasz, Brechet, Vicsek, & Barabási, 2000, or Vicsek & Zafeiris, 2012). No clear analog among apes has been identified, but interesting comparisons have been made in Merker (1999, 2000). Synchronized chorusing in frogs and flashing in fireflies represent a rather different set of relevant behaviors, characterized by a rigidity of tempo quite unlike human vocal and gestural synchronization. Some relevant discussion is given in Patel (2010, Chapter 7).
Day-O’Connell opens up discussion of melody and harmony, which I chose to leave unaddressed, as they are the elements that most clearly differentiate music from speech generally, and from joint speech in particular. There are no clear lines of demarcation here, and Day-O’Connell correctly points out that novel considerations, such as affect and the coordination of difference, enter and enrich the debate, drawing our attention to new dimensions of the collective activity that brings forth music, not least to considerations of aesthetics and pleasure. I confess that my starting point as a phonetician leaves me less competent to comment here, but it is my hope that the nonce erasure of imagined boundaries between the domains of speech and song, and more broadly between music and language, will encourage others, more qualified than I, to enter the fray.
The suggestion that the working definition of uttering “the same thing at the same time” might usefully be extended to enlarge the set of relevant phenomena is entirely apt and brings many more potential issues to the fore. Beyond enlarging the debate to include more canonical forms of music making, I look forward to finding ways to introduce neglected topics, such as synchronized breathing in ritual or the manner in which mantras are employed, to the broad discussion. I thank both commentators and relish the extension of this debate in the future.