In his article “The Territory Between Speech and Song: A Joint Speech Perspective,” Cummins (2020) argues that research has failed to adequately recognize an important category of vocal activity that falls outside of the domains of language and music, at least as they are typically defined. This category, referred to by Cummins as joint speech, spans a range of vocal activity so broad that it is not possible to define it using musical or phonetic terms. Instead, the feature that draws the varied examples together is vocal activity that is coordinated across participants and embedded in a physical and social context. In this invited commentary, I argue that although joint speech adds an important thread to the discourse on the relations between speech and song by putting an emphasis on the collective, it is ultimately related to a wider class of joint action phenomena found in the animal kingdom.

Cummins (2020) makes no claims about the biological origins of joint speech and leaves open the possibility that it is an entirely human invention. However, in assessing its value as a new category of behavioral investigation, it seems prudent to consider its inclusion criteria and its connection to non-human behavior more closely. As a starting point, it is worth noting that all of the examples described by Cummins involve movements that are more or less coordinated in time across members of a group. One example he cites involves coordination in time without a discernible beat: an oath-swearing ceremony in which the timing of individual members is not highly coordinated. The remaining examples would appear to have a beat, with varying levels of beat salience and/or metrical structure: a clear beat without meter (Gregorian chant), a weak beat from which a metrical structure emerges over time (political song), and a strong beat with a metrical structure evident from the outset (choral singing). In addition, the vocal activity may occur with or without the assistance of written text and notation, but it is always emergent and dynamic, influenced by the embodiment of its participants and the environment in which they are situated. Accordingly, while beat, metrical structure, and notation are not inclusion criteria, coordinated vocal activity, dynamic sharing of information, and specificity of physical and social context are.

I would argue that, with the exception of coordinated vocal activity, these inclusion criteria strongly resemble joint action, defined here as movement that is shaped by the physical and social context and more or less coordinated in time and space. In the case of vocal movement, temporal coordination may be thought of as the pacing of utterances, while spatial coordination may be thought of as the shaping of the vocal tract, the tensing of the vocal folds, and the swaying of bodies. Collective vocal movement can be speech-like or song-like, but the mere act of producing these movements collectively pushes them closer to song by regularizing timing and stretching vowels. One unique and noteworthy aspect of vocally based joint action is that an actor embedded within the group can also observe coordination across it. This ability to participate in, as well as observe, coordination across the group is necessarily limited for joint movements that are primarily observable through vision, as is the case in dance. Vision provides a limited field of view and is further hampered by occlusion, whereas hearing allows us to “hear in all directions.”

Nevertheless, it is not difficult to conceive of joint speech as but one family of examples of a wider class of joint action phenomena found not only in humans but more broadly in the animal kingdom. For example, a flock of 400 birds moving at high speed can change its collective direction in as little as half a second without incident (Attanasi et al., 2014). This type of phenomenon depends on some form of sensorimotor information that is coordinated between co-actors. Although sensorimotor coordination can be observed at the level of the group (the flock), it always originates in local clusters, where an adjustment is made to the flight path (typically in response to a threat). These local changes then propagate in wave-like fashion until they envelop the entire flock. Similar forms of sensorimotor coordination can be observed in swarming ants, schooling fish, and herding sheep.
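The local-to-global propagation just described can be illustrated with a deliberately minimal toy model. The sketch below is an illustration under stated assumptions, not the model used by Attanasi et al. (2014): birds sit in a hypothetical one-dimensional chain, the first bird turns (standing in for the local cluster that reacts to a threat), and every other bird repeatedly averages its heading with those of its immediate neighbors. The function name `propagate_turn` and all parameter values are hypothetical. Because simple averaging spreads information diffusively, the sketch shows only that a change initiated locally can reach the whole group through neighbor-to-neighbor adjustments; it makes no claim about how fast turns travel through real flocks.

```python
import numpy as np


def propagate_turn(n_birds=20, turn=np.pi / 2, steps=2000, threshold=0.5):
    """Toy 1-D chain (hypothetical): bird 0 turns by `turn` radians and keeps
    that heading; every other bird repeatedly replaces its heading with the
    mean of its own and its immediate neighbors' headings.

    Returns, per bird, the first step at which its heading has changed by at
    least `threshold * turn`, or -1 if that never happens within `steps`.
    """
    heading = np.zeros(n_birds)
    heading[0] = turn                  # the individual that initiates the turn
    arrival = np.full(n_birds, -1)
    arrival[0] = 0
    for t in range(1, steps + 1):
        new = heading.copy()           # new[0] is left unchanged: bird 0 holds its turn
        # interior birds: mean of left neighbor, self, and right neighbor
        new[1:-1] = (heading[:-2] + heading[1:-1] + heading[2:]) / 3
        # the last bird in the chain has only one neighbor
        new[-1] = (heading[-2] + heading[-1]) / 2
        heading = new
        reached = (arrival == -1) & (heading >= threshold * turn)
        arrival[reached] = t
    return arrival


arrival = propagate_turn()
print(arrival)  # step at which each bird has turned at least halfway
```

The printed arrival times never decrease along the chain: birds farther from the initiator take longer to turn, even though none of them responds to anything but its immediate neighbors.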

Figure 1.

Cummins conceptualizes joint speech as vocal activity falling somewhere along the speech-song continuum. The activity is coordinated across participants and embedded in a physical and social context. Joint speech may also be conceptualized as a class of joint action phenomena. While all of the classes depend upon sensorimotor information that is shared between co-actors, they vary with respect to the specific sensory systems involved and the extent to which they involve neural entrainment. Joint speech is primarily auditory and tends to involve a high level of neural entrainment.

The ability of a flock to execute joint action has been well described by a limited set of mathematical rules that do not require a beat (Attanasi et al., 2014; Sumpter et al., 2018). Other collective behaviors in the animal kingdom fall closer to the rhythmic activity found in human music making. For example, coordinated oscillatory activity can be observed in some species of frogs (Jones, Jones, & Ratnam, 2014), katydids, or bush crickets (Greenfield & Roizen, 1993; Sismondo, 1990), and fireflies (Buck & Buck, 1968). Although this coordinated oscillatory activity is executed without a leader and is thought to depend on sensorimotor coordination, it does not appear to have the same flexibility as the beat-like phenomena observed in vocal learning species such as songbirds, parrots, and humans (Patel, 2006). The beat is ultimately a psychological construct (London, 2012) that depends upon the entrainment of internal neural oscillators (e.g., in the basal ganglia in humans; Grahn, 2009). Once neural entrainment has been established across the collective, the sharing of sensorimotor information is greatly facilitated, which in turn allows for some flexibility of expression at the individual level.
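Leaderless coordinated oscillation of the kind described above is often illustrated with the Kuramoto model of coupled phase oscillators, in which each oscillator i evolves as dθ_i/dt = ω_i + (K/N) Σ_j sin(θ_j − θ_i). This standard textbook model is swapped in here purely for illustration; it is not the model proposed in the frog, katydid, or firefly studies cited above, and it says nothing about beat perception or neural entrainment. The order parameter r summarizes group coherence (near 0 when phases are scattered, 1 when they coincide). The function name and all parameter values (50 oscillators, natural frequencies drawn around 1 rad per time unit, coupling strength K = 2.0) are hypothetical choices.

```python
import numpy as np


def kuramoto(n=50, coupling=2.0, dt=0.01, steps=5000, seed=0):
    """Euler integration of the Kuramoto model (illustrative sketch).

    Each oscillator i follows
        d(theta_i)/dt = omega_i + (K / N) * sum_j sin(theta_j - theta_i),
    so every oscillator is pulled toward all the others and none is a leader.
    Returns the order parameter r at the start and at the end of the run.
    """
    rng = np.random.default_rng(seed)
    omega = rng.normal(loc=1.0, scale=0.1, size=n)   # hypothetical natural frequencies
    theta = rng.uniform(0.0, 2 * np.pi, size=n)      # random initial phases

    def order_parameter(phases):
        # r = |mean of e^(i*theta)|: 0 for scattered phases, 1 for identical phases
        return np.abs(np.mean(np.exp(1j * phases)))

    r_start = order_parameter(theta)
    for _ in range(steps):
        # diff[i, j] = theta_j - theta_i
        diff = theta[None, :] - theta[:, None]
        theta = theta + dt * (omega + (coupling / n) * np.sin(diff).sum(axis=1))
    return r_start, order_parameter(theta)


r_start, r_end = kuramoto()
print(f"coherence before: {r_start:.2f}, after: {r_end:.2f}")
```

With the coupling set to 0, the phases drift independently and r stays low; with sufficiently strong coupling, r approaches 1 even though no single oscillator sets the pace.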

By considering joint speech in the context of joint action (Vesper et al., 2017), it may be possible to better encompass the varied mechanisms that underpin it. The most basic mechanism is a common goal (e.g., recitation of an oath; cf. flying in the same direction). In most cases there is some form of sensorimotor coordination (e.g., rate accommodation; cf. adjustments to flight path). In rarer instances, a beat is clearly present (e.g., choral singing; cf. dancing). The presence of a beat affords an additional level of flexibility in coordinated oscillatory activity, allowing it to incorporate information exchanged across the senses (Russo, 2019).

From an evolutionary standpoint, joint action has been interpreted with respect to survival. The term “selfish herd” was coined by the evolutionary biologist William D. Hamilton (1971) to describe the tendency of groups of conspecifics to clump together to avoid predation. Experimental evidence indicates that this behavioral tendency is effective against predator threats (Treherne & Foster, 1981) and that predators are more likely to target prey that are less well coordinated (Ioannou, Guttal, & Couzin, 2012). From this perspective, it is easy to think of joint speech as a means of building group resilience or warding off threats to the tribe. There is mounting evidence that joint action in the form of singing or drumming can enhance trust, feelings of social connectedness, and prosociality (Cirelli, Einarson, & Trainor, 2014; Cross, Turgeon, & Atherton, 2019; Good & Russo, 2016; Good, Choma, & Russo, 2017; Hove & Risen, 2009; Kirschner & Tomasello, 2010; Tunçgenç & Cohen, 2016; Valdesolo, Ouyang, & DeSteno, 2010; Wiltermuth & Heath, 2009).
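Hamilton's (1971) geometric argument can be made concrete with his own simplified scenario of frogs spaced around the rim of a circular pond, where a predator surfaces at a random point on the rim and takes the nearest frog. Each individual's “domain of danger” is the stretch of rim closer to it than to any other individual, that is, half the gap to each neighbor; as a fraction of the circumference, it equals that individual's chance of being taken. The sketch below assumes a rim of unit circumference and uses made-up positions; the function name `domains_of_danger` is hypothetical. It shows why moving into a small gap between neighbors (clumping) reduces an individual's own risk.

```python
import numpy as np


def domains_of_danger(positions, circumference=1.0):
    """Domain of danger for individuals on a circular rim (Hamilton-style sketch).

    Each individual owns the part of the rim that is closer to it than to any
    other individual: half the gap to its neighbor on each side. Divided by the
    circumference, this is the chance of being nearest to a predator that
    appears at a uniformly random point on the rim. Results follow input order.
    """
    pos = np.mod(np.asarray(positions, dtype=float), circumference)
    order = np.argsort(pos)
    ring = pos[order]
    # gap from each individual to the next one clockwise (wrapping around the rim)
    gap_after = np.diff(np.append(ring, ring[0] + circumference))
    # gap from each individual to the previous one
    gap_before = np.roll(gap_after, 1)
    domains = np.empty_like(pos)
    domains[order] = (gap_after + gap_before) / 2
    return domains


herd = [0.0, 0.1, 0.2, 0.5, 0.8]     # hypothetical positions on a unit rim
moved = [0.0, 0.1, 0.2, 0.15, 0.8]   # individual 3 jumps into a small gap
print(domains_of_danger(herd)[3])    # about 0.3
print(domains_of_danger(moved)[3])   # about 0.05
```

The domains always sum to the full circumference, so one individual's gain in safety comes at the expense of its neighbors, which is what makes the herd “selfish.”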

In summary, I have argued that joint speech may be better understood as but one family of examples of a wider class of joint action phenomena. Joint speech has elements of music making but dispenses with the notion of performer and audience, focusing instead on the social contexts that enable coordination between co-actors and the resulting grounding of collectives. This is a refreshing and important view of music for music science to embrace. I would encourage further elaboration of this theory, with particular attention to the relation of joint speech to other forms of joint action.

References

Attanasi, A., Cavagna, A., Del Castello, L., Giardina, I., Grigera, T. S., Jelić, A., et al. (2014). Information transfer and behavioural inertia in starling flocks. Nature Physics, 10, 691–696.
Buck, J., & Buck, E. (1968). Mechanism of rhythmic synchronous flashing of fireflies: Fireflies of Southeast Asia may use anticipatory time-measuring in synchronizing their flashing. Science, 159, 1319–1327.
Cirelli, L. K., Einarson, K. M., & Trainor, L. J. (2014). Interpersonal synchrony increases prosocial behavior in infants. Developmental Science, 17, 1003–1011.
Cross, L., Turgeon, M., & Atherton, G. (2019). How moving together binds us together: The social consequences of interpersonal entrainment and group processes. Open Psychology, 1, 273–302.
Cummins, F. (2020). The territory between speech and song: A joint speech perspective. Music Perception, 37, 347–358.
Good, A., & Russo, F. A. (2016). Singing promotes cooperation in a diverse group of children. Social Psychology, 47, 340–344.
Good, A., Choma, B., & Russo, F. A. (2017). Movement synchrony influences intergroup relations in a minimal groups paradigm. Basic and Applied Social Psychology, 39, 231–238.
Grahn, J. (2009). The role of the basal ganglia in beat perception. Annals of the New York Academy of Sciences, 1169, 35–45.
Greenfield, M. D., & Roizen, I. (1993). Katydid synchronous chorusing is an evolutionarily stable outcome of female choice. Nature, 364, 618–620.
Hamilton, W. D. (1971). Geometry for the selfish herd. Journal of Theoretical Biology, 31, 295–311.
Hove, M. J., & Risen, J. L. (2009). It’s all in the timing: Interpersonal synchrony increases affiliation. Social Cognition, 27, 949–960.
Ioannou, C. C., Guttal, V., & Couzin, I. D. (2012). Predatory fish select for coordinated collective motion in virtual prey. Science, 337, 1212–1215.
Jones, D. L., Jones, R. L., & Ratnam, R. (2014). Calling dynamics and call synchronization in a local group of unison bout callers. Journal of Comparative Physiology A, 200, 93–107.
Kirschner, S., & Tomasello, M. (2010). Joint music making promotes prosocial behavior in 4-year-old children. Evolution and Human Behavior, 31, 354–364.
London, J. (2012). Hearing in time: Psychological aspects of musical meter. Oxford, UK: Oxford University Press.
Patel, A. D. (2006). Musical rhythm, linguistic rhythm, and human evolution. Music Perception, 24, 99–104.
Russo, F. A. (2019). Multisensory processing in music. In M. Thaut & D. Hodges (Eds.), Handbook of music and brain research. Oxford, UK: Oxford University Press. DOI: 10.1093/oxfordhb/9780198804123.013.10
Sismondo, E. (1990). Synchronous, alternating, and phase-locked stridulation by a tropical katydid. Science, 249, 55–58.
Sumpter, D. J., Szorkovszky, A., Kotrschal, A., Kolm, N., & Herbert-Read, J. E. (2018). Using activity and sociability to characterize collective motion. Philosophical Transactions of the Royal Society B: Biological Sciences, 373, 20170015.
Treherne, J. E., & Foster, W. A. (1981). Group transmission of predator avoidance behaviour in a marine insect: The Trafalgar effect. Animal Behaviour, 29, 911–917.
Tunçgenç, B., & Cohen, E. (2016). Movement synchrony forges social bonds across group divides. Frontiers in Psychology, 7, 782.
Valdesolo, P., Ouyang, J., & DeSteno, D. (2010). The rhythm of joint action: Synchrony promotes cooperative ability. Journal of Experimental Social Psychology, 46, 693–695.
Vesper, C., Abramova, E., Bütepage, J., Ciardo, F., Crossey, B., Effenberg, A., et al. (2017). Joint action: Mental representations, shared information and general mechanisms for coordinating with others. Frontiers in Psychology, 7, 2039.
Wiltermuth, S. S., & Heath, C. (2009). Synchrony and cooperation. Psychological Science, 20, 1–5.