Joint Speech as a challenge to our understanding of language
Fred Cummins | UCD School of Computer Science

1 Introducing joint speech

You already know what joint speech is, and you have participated in joint speech throughout your life. Joint speech is found whenever multiple people utter the same sounds at the same time. That’s a definition, and it works: you can now identify joint speech with confidence (unlike most theoretical constructs in linguistics). In many situations, we might describe it as chant, a word that evokes the cries of protesters in the street, the revels of football fans in the stadium, or the plainsong of Benedictine monks. Perhaps you do none of these things, but you probably sing Happy Birthday to family members, or cry “yes” in unison when encouraged by an enthusiastic street performer or master of ceremonies. We find joint speech in the schoolroom, where the adults ensure that the children collectively recite socially respected texts (multiplication tables, pledges of allegiance), but also in the playground, where joint speech is the means by which a group of children turn on an outsider and bully them (nya-nya-nya-nya-nya). In all these cases, and many more, we are faced with something that is entirely familiar and yet, for a linguist, utterly perplexing.

A linguist studies language. But that assumes we know what language is. The term “language” serves at least two entirely distinct purposes, and these have been confused in the literature. Turning our attention to joint speech can make this clear.

On the one hand, “language” is used to describe systems of communication with names like “French” or “Yoruba.” These systems display properties that can be identified, and describing such properties is the daily toil of most linguists. Such systems are necessarily normative, in that we identify the system by distinguishing what its regularities are, what its boundaries are, which strings of words belong within the system and which are inadmissible, what its atomic units are, and so on. Every time a linguistics paper discusses an entity like “French” or “Yoruba,” the term “language” is being used in this sense. To most, this is what linguistics is concerned with.

But the term “language” is also used for an entirely different purpose. We use it to describe a property of our species that, to us at least, appears to set us apart from animals (or to make us a special kind of animal?). This innovation, unique to our species, allows forms of coordination and activity that have generated the human world, with all its variety, its glories and its problems, up to and including the potential destruction of the planet. So we need to understand the unique innovation that has singled us out like this. The first sense of “language” (“French!”) simply will not, and cannot, inform us about this unique form of behaviour. Yet we use the term “language” for this too.

Here is where joint speech can help us to keep things clear, and to separate these two importantly different senses of “language.” The view of language-as-system can only aspire to describe patterning in human activity that arises after the formation of complex, socially structured societies. Only in such societies does it make sense to speak of an utterance being grammatical, or a sequence well-formed, or to distinguish one set of rules from another. Complex human societies are a recent innovation in evolutionary terms. Even now, not everyone lives in such a society, and if we want to understand the trajectory of our species, and what so profoundly transformed our way of living on this planet, the last 5,000 years or so (far less in many areas of the world) is of marginal importance. Studying language-as-system has focussed the attention of linguists on regularities that are easily treated as abstract contrasts, and that can be represented more clearly in symbolic form, such as writing, than in the messy business of speech produced by specific people in specific contexts. A great deal of work that straddles the boundary between phonetics (physical, real) and phonology (symbolic, abstract) reinforces this view of language as something that can only be found by ignoring almost all of the context in which utterances occur. It also relies upon a problematic view of the person as the bearer of an individual, private mind, a view developed largely within Western European philosophy, politics, and culture. This is a nakedly colonial construction that we might learn to recognise and distrust, particularly when we attempt to understand the entirety of humanity. It is not a foundation upon which we can understand the emergence of Homo loquens. For almost every human who ever lived, speech was the sole mode of vocal coordination, writing did not exist, and the normative practices that allow a system like “French” to arise were not in place.

When we turn to joint speech, we are introduced to a form of languaging¹ that is found in every human society, that plays an integral role in bringing about the alliances and groupings that characterise the social order, that is older than writing, and that underlies ritual, prayer and assembly. Joint speech cannot be understood using the everyday terms of the linguist, for speakers and listeners are no longer distinguished, the words chanted are authored elsewhere rather than creatively composed, intelligibility is largely irrelevant (and the “language” employed may not be the one participants use for everyday purposes), and above all else, the context of uttering matters. Joint speech is found precisely where collective identities are forged and enacted. These are transient identities, but they frequently persist over centuries through their incorporation in rite, ritual and liturgy. Joint speech is at home in those locations where society is built: the temple, the school, the courtroom, the stadium. The Temple Hymn of Kesh (c. 2600 BCE), from ancient Sumer in present-day Iraq, is arguably the oldest piece of literature in the world. It is a liturgical text with a clear verse-chorus structure, typical of rites in which collective speaking arises. Since even the earliest surviving literature records collective speaking as an established liturgical form, joint speech is evidently considerably older than writing.

There has been little scientific work done so far on joint speech, yet when we address it with the curious eyes of the human scientist, there is much to be found. In Cummins (2018) I review work in phonetics, neuroscience, movement science and pragmatics, all of which shows joint speech to be fertile ground for exploration. Once we make joint speech our object of study, many commonalities with basic musical patternings become clear, and the ground is prepared to better understand where both music and language come from in the first place. The scarcity of work in this area stems from the use of the term “language” in two distinct senses, of which one, language-as-system, has come to dominate. If nothing else, work on joint speech can help to remove this obstacle, so that we might better understand communicative behaviour as it arose within our species. Below I list some publications and resources devoted to the study of joint speech; the most comprehensive overview is provided by Cummins (2018). Joint speech is still a niche topic, as it poses questions that are not easy to address given the conception of the person assumed by Western science. Understanding joint speech brings us a little closer to seeing ourselves as we jointly create our world through our participation in it.

¹ The verb “languaging” is used loosely to refer to the many forms of coordinative and affiliative behaviour that gave rise to modern humans, and it can help to distinguish these from language-as-system, or “language.” Languaging does not imply any system.

Related reading

The most comprehensive discussion of joint speech, introducing the topic, considering the scientific work done to date, and identifying the challenges the topic sets for scientists, is available in this book:

Cummins, F. (2018). The Ground From Which We Speak: Joint Speech and the Collective Subject. Cambridge Scholars.

A pdf of the book is available for free at

A collection of resources, including video examples and documentation, is available at

Here are some articles on specific topics related to joint speech:

Cummins, F. (2003). Practice and performance in speech produced synchronously. Journal of Phonetics, 31(2):139–148.

Cummins, F. (2009). Rhythm as entrainment: The case of synchronous speech. Journal of Phonetics, 37(1):16–28.

Cummins, F. (2013). Towards an enactive account of action: Speaking and joint speaking as exemplary domains. Adaptive Behavior, 21(3):178–186.

Cummins, F. (2014). The remarkable unremarkableness of joint speech. In Proceedings of the 10th International Seminar on Speech Production, pages 73–77, Cologne, DE.

Cummins, F. (2018b). Joint speech as an object of empirical inquiry. Material Religion, 14(3):417–419.

Cummins, F. (2019). Music, language and languaging. In Corrêa, A. F., editor, Music, Speech, and Mind. Curitiba: Brazilian Association of Cognition and Musical Arts - ABCM.

Cummins, F. (2020). The territory between speech and song: A joint speech perspective. Music Perception: An Interdisciplinary Journal, 37(4):347–358.

Jasmin, K. M., McGettigan, C., Agnew, Z. K., Lavan, N., Josephs, O., Cummins, F., and Scott, S. K. (2016). Cohesion and joint speech: Right hemisphere contributions to synchronized vocal production. The Journal of Neuroscience, 36(17):4669–4680.

von Zimmermann, J. and Richardson, D. C. (2016). Verbal synchrony and action dynamics in large groups. Frontiers in Psychology, 7:2034.