Voice and the laryngeal system
John Esling | University of Victoria

The word ‘voice’ derives from the Latin ‹vōx, vōcis›, which designates the sound produced in speaking or singing, that is, ‘a voice, a cry, a call’. In parallel, since ‹vōx› implies the sounds of speech that phoneticians and linguists would now call ‘voiced’ (with vibration of the ‘vocal folds’) as opposed to ‘voiceless’ (without vibration of the vocal folds), the Latin term for ‘a vowel’ is ‹littera vōcālis›, literally ‘a letter having voice’. We would now say that letters of the alphabet (including the phonetic alphabet) that represent vowel sounds refer to vocalizations that are almost always ‘voiced’, except in some contexts or languages where they become ‘devoiced’. Most of the sound that we produce in speech is made by generating an airstream from the lungs that passes through the larynx (in the throat) and that, unlike in quiet breathing, is modulated by the vibration of the vocal folds at the ‘glottis’ (the opening at the bottom of the laryngeal mechanism). In phonetics, this phenomenon is called ‘voicing’, and the sounds produced with vocal fold vibration are called ‘voiced’. The many sounds produced without voicing are primarily consonants (although vowels can also be devoiced). Over the duration of a spoken utterance, however, the great majority of the time will be filled by sounds that are voiced, especially the vowels that are at the core of the syllables.

Voicing occurs at the top of the trachea (the windpipe) at the entry to the laryngeal articulatory mechanism. To make the vocal folds vibrate, the primary intrinsic laryngeal muscles at the bottom of the epilaryngeal tube (see Figure 1) contract to bring the muscular and mucosal-tissue-covered folds in line at both ends of the glottis. This action does not start the vocal folds vibrating; it only approximates them to each other, with a gap at the midline, in a state called ‘prephonation’. When the airstream passes through this gap, between the folds, vibration occurs by the physical principle known as the ‘Bernoulli effect’. This action of the adductory muscles (those acting to approximate the folds) plus the aerodynamics of the airstream generates ‘phonation’. The neutral modality of phonation is usually called ‘modal voice’. But there is not just one kind of phonation; there are many types that are non-modal, and the ‘pitch’ of the voice (the auditory sensation caused by the changing frequencies of vibration of the vocal folds) also interacts to produce different qualities of sound, which are discussed in the entry on Voice Quality. This array of qualitative distinctions of the voice is primarily a function of how the laryngeal articulatory mechanism works.
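
As a rough illustration of the ‘Bernoulli effect’ mentioned above, the relation for steady, incompressible, frictionless airflow can be written as follows. This is a simplified sketch, not a full account of phonation. The symbols (p for air pressure, ρ for air density, v for airflow velocity, A for the cross-sectional area of the passage, and the subscripts ‘trachea’ and ‘glottis’) are introduced here for illustration only and do not come from the text.

```latex
% Bernoulli's relation along the airstream (steady, incompressible, frictionless flow)
p + \tfrac{1}{2}\rho v^{2} = \text{constant}

% Continuity: the same volume of air per unit time passes through
% the wide trachea and the narrow gap between the folds
A_{\text{trachea}}\, v_{\text{trachea}} = A_{\text{glottis}}\, v_{\text{glottis}}

% Hence, where the passage narrows, velocity rises and pressure falls
A_{\text{glottis}} < A_{\text{trachea}}
\;\Rightarrow\; v_{\text{glottis}} > v_{\text{trachea}}
\;\Rightarrow\; p_{\text{glottis}} < p_{\text{trachea}}
```

On this idealized view, the pressure drop in the narrowed gap helps to draw the approximated folds together, after which the air pressure building up below them pushes them apart again, and the cycle repeats. Fuller accounts of self-sustained vibration also take the elasticity of the fold tissue into account.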

The larynx (the laryngeal articulatory mechanism) in the lower vocal tract does not work like the tongue in the oral vocal tract, which is like a hydrostatic balloon that points at various places of articulation in the mouth. The laryngeal mechanism is like a fist that opens to enable the intake of air (a breath) and that closes or folds in on itself to gradually close off the airway, shaping the array of configurations responsible for variations in voice quality and ultimately shutting off the passageway to the lungs altogether, in order to protect the respiratory system (as depicted in Figure 1). Directly above the glottis, the epilaryngeal tube changes its degree of opening as a function of the tension applied by the aryepiglottic sphincter mechanism at the top of the tube. So the vocal folds are at the base of the tube, generating primary vibrations (or degrees of turbulent air, if there is no voicing). The ventricular folds (pads) are above the ventricle that separates them from the vocal folds, and they can cause the vocal folds to stop vibrating when sufficient inward and vertical pressure is applied. And the aryepiglottic folds (bands) are at the summit of the tube, adding multiple non-modal qualities to the speech signal and eventually sphinctering shut the airway for protective closure.

Figure 1. The laryngeal system at the bottom of the vocal tract. 1 = the epilaryngeal tube, with the glottis at its base and the aryepiglottic folds at its top (which is the level where laryngeal constriction is manipulated). 2 = the pharyngeal isthmus, which responds to laryngeal constriction and to vowel quality. 3 = the oropharyngeal isthmus or velic sphincter in the oral tract, which can serve as a measure of tongue raising and of the effects of laryngeal constriction in the context of different vowels. Drawing from Esling et al. (2019: 9). Reproduced with permission of CUP through PLSclear.

The core of the larynx consists of three cartilaginous structures made of hyaline cartilage (‘glassy’ from the Greek): the ‘thyroid’, ‘cricoid’ and ‘arytenoid’ cartilages. The larger thyroid cartilage (a double-door shaped structure, from the Greek) is at the front; its anterior prominence forms the ‘Adam’s Apple’ in the neck. The thyroid cartilage sits atop the cricoid cartilage (signet ring, from the Greek), the front of which has a low aspect and the rear of which has a high aspect; two joints at the sides attach to the thyroid so that the two cartilages tilt forwards or backwards relative to each other. At the top of the high end of the cricoid, at the back, sit the two arytenoid cartilages (ladle, from the Greek), each of which has a pitted shape at the bottom, back and sides for the crico-arytenoid joint and for muscular attachments, a forward-pointing ‘process’ (protrusion) at the front, to which each vocal fold is connected, running forwards to attach to the thyroid, and a vertical arm extending upwards topped by a small ‘corniculate’ (‘horned’/elastic) cartilage, which is where the aryepiglottic folds begin. The aryepiglottic folds extend forwards and upwards to the sides of the epiglottis, another, larger elastic cartilage arising from the thyroid and attaching to the base of the tongue. There is an ‘elbow’ in each aryepiglottic fold, not far from the arytenoids, consisting of the ‘cuneiform’ (wedge-shaped/elastic) cartilage, to give stability to the folds and to assist in their angular closure forwards and upwards towards the tubercle or base of the epiglottis in order to constrict and ultimately seal off the airway. When the aryepiglottic folds are not constricted, they lie apart in an ‘inverted-V’ configuration over the top of the epilaryngeal tube and above the glottis, which has a smaller ‘V’ shape when it is open (as seen in Figure 2).
When the aryepiglottic folds constrict, they begin by bending inwards slightly (as seen in Figure 2 for voicing); as constriction increases they bend in a right angle and push close to each other, near the midline, compressing forwards and upwards to approximate the epiglottis. The fact that the corniculate, cuneiform, and epiglottal cartilages are elastic and covered by mucous membrane, forming the upper ring of the epilaryngeal tube (or ‘sphincter’ of the larynx) means that sealing off the airway against the tubercle of the epiglottis is quite effective when constriction is complete. For a complete view of where these structures are located and how they interact with each other, download the Laryngonaut app developed by Scott Moisik (2023).

Figure 2. The laryngeal mechanism as viewed through a laryngoscope: on the left, in the state of ‘breath’ with the glottis slightly abducted; on the right, in the state of ‘voice’ with the vocal folds adducted and vibrating. tr=trachea, t=vocal (true) folds; the opening between the vocal folds above the trachea is the glottis; f=ventricular (false) folds, m=inner mucosa of the epilarynx, k=corniculate tubercles, c=cuneiform tubercles, ae=aryepiglottic folds, et=tubercle of epiglottis, e=epiglottis, ea=apex of epiglottis, pf=piriform fossae, ppw=posterior pharyngeal wall. Dashed lines = medial edge of ventricular folds; dotted lines = anterior edge of aryepiglottic folds (margin of upper epilarynx). Drawing from Esling et al. (2019: 39). Reproduced with permission of CUP through PLSclear.

Two fundamental actions characterize the movement of these assembled laryngeal structures. One is the control of pitch, to stretch, elongate, and tension the vocal folds themselves, in order to increase pitch (raising their frequency of vibration). The other is the action of constriction, engineered by the sphinctering of the aryepiglottic mechanism pulling forwards and upwards from the arytenoids at the back to reduce the volume of the epilaryngeal tube and which also has the effect of shortening and bunching the vocal folds, decreasing pitch (lowering their frequency of vibration). Contracting the crico-thyroid (CT) muscles, which attach from the front of the ring of the cricoid upwards to the anterior portion of the thyroid, shortens the distance between the two cartilages at the front, causing them to tilt and stretching the vocal folds, which are attached to the thyroid at the front (which rocks forwards) and to the arytenoids at the back on top of the cricoid (which rocks backwards). The external thyro-arytenoid (TA) muscles, above the vocal folds, within the walls of the epilaryngeal tube, connect upwards to the aryepiglottic folds; contracting them bends the folds medially towards each other, pulling them and the arytenoid complex towards the epiglottic tubercle and the thyroid at the front. The CT action increases pitch, and the TA action ostensibly decreases pitch while also tightening the airway and restricting the flow of air, resulting in perturbations to phonation and introducing further sources of vibration, making the ‘speech signal’ more complex.
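
The link between stretching the vocal folds and raising pitch can be sketched with an idealized vibrating-string approximation. This is a first-order simplification, assumed here for illustration rather than taken from the text, and it ignores the layered structure of the folds. The symbols are likewise illustrative: F_0 is the frequency of vibration, L the vibrating length of the folds, σ the longitudinal stress in the tissue (tension per unit cross-sectional area), and ρ the tissue density.

```latex
% Idealized string approximation for the frequency of vocal fold vibration
F_{0} \approx \frac{1}{2L}\sqrt{\frac{\sigma}{\rho}}
```

Under this approximation, lengthening the folds would on its own lower F_0, but stretching raises the stress σ steeply, so the net effect of the CT action is a higher pitch. Conversely, shortening and bunching the folds slackens them, lowering σ and with it the pitch.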

The main function of the antero-posterior, anchored rocking action of the laryngeal mechanism is, therefore, to create longitudinal tension in the vocal folds, for the control of pitch. The main function of the postero-anterior, compressive constricting action of the aryepiglottic sphincter mechanism is (a) physiologically, to restrict and then close off the airway to protect the lungs and (b) phonetically, to create the conditions for increasingly complex vibrations and cavity resonances that characterize non-modal phonation types. This second compression/constriction action can be called ‘the laryngeal articulator’, since it shapes sounds in the place of articulation known as ‘pharyngeal/epiglottal’ (in addition to ‘glottal’), just as the tongue is the main articulator shaping sounds in the oral cavity from uvular to dental. The folding of the laryngeal articulator is initiated even with simple adduction for modal voice. As the lateral crico-arytenoid muscles swivel the ‘vocal processes’ of the arytenoids medially towards each other, the vocal folds adduct; the interarytenoid muscles hold the arytenoid complex in place (both sides at the midline), vibration occurs with the controlled exhalation of air, and phonation results as a periodic signal (acoustic sound waves that are propagated into the surrounding air). The effects of cavity resonance begin in the epilaryngeal tube itself and continue to ‘filter’ the acoustic signal (to shape the patterns of energy concentration in the spectrum) through the pharynx, the oral and nasal cavities and the lips.

There are two significant articulatory actions that accompany laryngeal constriction per se: tongue retraction and larynx raising. They always occur together in swallowing, reflexively, but in speech they can be controlled to a somewhat greater degree. In the natural, reflexive engagement of the laryngeal constrictor, especially in cases such as choking, gagging, and swallowing, the tongue is inherently retracted towards the pharynx/larynx, while the larynx itself is raised in height. In cases where laryngeally produced sounds occur in the speech sound repertory of a language, these effects are also observed, although lingual postures may be somewhat altered, and larynx height may vary, as acquired patterns. This means that different vowels in different languages will have different postural synergies between the upper/oral vocal tract (the shaping of the tongue and position of the jaw) and the lower/laryngeal vocal tract (the degree of tightness of the laryngeal constrictor and the regulation of the height of the larynx). This implies that vowel configurations and the postures of the laryngeal articulator are related in any language but that the ‘choices’ of a given language will exploit different synergies from those of any other. Some languages will have more contrastive adjustments in the lower/laryngeal vocal tract than others in forming speech sounds.

Some linguistic systems, like English and French, have minimal speech-sound distinctions involving the laryngeal articulator, although vowel susceptibility can be found, especially in enhanced prosodic (conversational) situations. Such minimal effects might include slightly greater constriction (tightening of all three laryngeal components) accompanying open vowels, e.g. [a, ɑ], and the contrary, unconstricted posture (opening of the epilaryngeal tube and lowering of the larynx) accompanying close vowels, e.g. [i, u]. Some linguistic systems, like Tigrinya, Nlaka’pamuxcín, and Iraqi Arabic, have consonants in their inventories that depend on frequent, rapid laryngeal constriction, e.g. [ʡ, ʕ] or [ħ]. Other systems, such as !Xóõ, have vowel sequences that depend on ‘sphincteric’ aryepiglottic trilling along with open ‘breathy’ phonation in the same syllable, which are forms that might contrast in other languages. Danish has a prosodic effect where certain word forms acquire progressive constriction, realized as laryngealization (irregular vibrations, such as creaky voice) or glottalization (the presence of glottal stop). Many Tibeto-Burman languages characteristically contrast an open (unconstricted) series with a tight (constricted) series of syllables: Mpi realizes the constricted series with creaky phonation and ‘pharyngealized’ resonance, which is actually a quality that could be identified as an ‘aryepiglotto-epiglottal approximant’; Nuosu Yi realizes the constricted series as strong ‘pharyngealized’ resonance; Jianchuan Bai has a complex system of tonal registers where phonation type and resonance interact with pitch and enhanced laryngeal vibrations depending on tone height. ‘ATR’ languages, such as Akan or Kabiye, realize the ‘RTR’ series as ‘pharyngealized’ resonance with a concomitant effect on vowel quality. 
Some languages, like Somali or Dinka, have a register system where some syllables are constricted and some unconstricted or realized with lowered larynx. Still other languages, like Hanoi Vietnamese, have tonal systems where only two tones of the six might contrast in phonation and in glottalization, in addition to pitch elements.

The larynx is a complex articulatory system with multiple functions. Its fundamental physiological purposes are to sustain life by permitting breath (to the extent of full opening of the valve) and by enabling closure of the airway (when the laryngeal constrictor mechanism is activated to its fullest extent). From a species perspective, it is the mechanism that houses vocal fold vibration, which gives us ‘voice’ from cry to vowels to other contrastive speech sounds; every language in the world has a distinction between voiced and voiceless speech sounds. As infants, we learn phonetic contrast within the articulatory mechanism of the larynx during our first several months of life; manners of articulation such as stop, trill, fricative, and approximant are all present and illustrate a pattern that can be applied to sounds made with the tongue in the oral vocal tract in successive developmental stages (see the entry on L1 Prosodic Development). The larynx balances multiple sources of vibration with spectral modulation resulting from varying the resonating cavities of the lower vocal tract, and these configurations are complemented by parallel configurations of the articulators of the upper vocal tract. The laryngeal system provides secondary (background) vibrations and resonances for primary-stricture articulations, although secondary articulations can also become primary articulations. The articulatory facility of the larynx interacts with its pitch-regulation capacity to add register (quality) to tone or to prosodic elements of speech, e.g. within intonation. The laryngeal system moderates vowel quality interactively within the voice quality of the individual. And the larynx provides inherent compensatory mechanisms in the event of damage to another part of the mechanism, e.g. aryepiglottic trilling can substitute for glottal vocal fold vibration in cases of ‘loss of voice’ (see the entry on Vocal Disorders and Rehabilitation).


References

Calamai, Silvia & Chiara Celata. (In prep.). Manuale di scienze del parlato. Roma: Carocci.

Carlson, Barry F. & John H. Esling (2003). Phonetics and physiology of the historical shift of uvulars to pharyngeals in Nuuchahnulth (Nootka). Journal of the International Phonetic Association, 33, 183–193.

Catford, J. C. (1964). Phonation types: The classification of some laryngeal components of speech production. In D. Abercrombie, D. B. Fry, P. A. D. MacCarthy, N. C. Scott & J. L. M. Trim (Eds.), In Honour of Daniel Jones. London: Longmans, Green & Co. Ltd., pp. 26–37.

Coey, Christopher, John H. Esling & Scott R. Moisik. (2014). iPA Phonetics, Version 1.0 [2014]. [app on the Apple Store]

Denes, Peter B. & Elliot N. Pinson. (1963). The Speech Chain: The Physics and Biology of Spoken Language (2nd ed.). Long Grove, IL: Waveland Press.

Edmondson, Jerold A. & John H. Esling. (2006). The valves of the throat and their functioning in tone, vocal register, and stress: Laryngoscopic case studies. Phonology 23, 157–191.

Edmondson, Jerold A., John H. Esling & Li Shaoni (李绍尼). (2021). Jianchuan Bai. Journal of the International Phonetic Association, 51, 490–501. © First View, 27 April 2020. DOI: https://doi.org/10.1017/S0025100319000379

Edmondson, Jerold A., John H. Esling & Ziwo LAMA (拉玛兹偓). (2017). Nuosu Yi. Journal of the International Phonetic Association, 47, 87–97. © First View, 18 March 2016. http://dx.doi.org/10.1017/S0025100315000444

Edmondson, Jerold A., Cécile M. Padayodi, Zeki Majeed Hassan & John H. Esling. (2007). The laryngeal articulator: Source and resonator. In J. Trouvain & W. J. Barry (Eds.), Proceedings of the 16th International Congress of Phonetic Sciences, vol. 3 (pp. 2065–2068). Saarbrücken: Universität des Saarlandes.

Esling, John H. (1996). Pharyngeal consonants and the aryepiglottic sphincter. Journal of the International Phonetic Association 26, 65–88.

Esling, John H. (1999). The IPA categories “pharyngeal” and “epiglottal”: Laryngoscopic observations of pharyngeal articulations and larynx height. Language and Speech 42, 349–372.

Esling, John H. (2003). Glottal and epiglottal stop in Wakashan, Salish and Semitic. Proceedings of the 15th International Congress of Phonetic Sciences, vol. 2, pp. 1707–1710. Barcelona: UAB.

Esling, John H. (2005). There are no back vowels: The Laryngeal Articulator Model. The Canadian Journal of Linguistics / La revue canadienne de linguistique, 50, 13–44.

Esling, John H. (2010). Phonetic notation. In William J. Hardcastle, John Laver & Fiona E. Gibbon (Eds.), The Handbook of Phonetic Sciences, 2nd ed. (pp. 678–702). Oxford: Wiley-Blackwell.

Esling, John H. (2013). Voice and phonation. In Mark J. Jones & Rachael-Anne Knight (Eds.), The Bloomsbury Companion to Phonetics (pp. 110–125). London: Bloomsbury.

Esling, John H., Katherine E. Fraser & Jimmy G. Harris. (2005). Glottal stop, glottalized resonants, and pharyngeals: a reinterpretation with evidence from a laryngoscopic study of Nuuchahnulth (Nootka). Journal of Phonetics, 33, 383–410.

Esling, John H. & Scott R. Moisik (2022). Voice quality. In Rachael-Anne Knight & Jane Setter (Eds.), The Cambridge Handbook of Phonetics (pp. 237–257). Cambridge: Cambridge University Press.

Esling, John H., Scott R. Moisik, Allison Benner & Lise Crevier-Buchman. (2019). Voice Quality: The Laryngeal Articulator Model. Cambridge: Cambridge University Press.

Esling, John H., Scott R. Moisik & Christopher Coey. (2015). ‘iPA Phonetics: Multimodal iOS application for phonetics instruction and practice’. Proceedings of the 18th International Congress of Phonetic Sciences (paper 263). Glasgow.

Fink, B. Raymond. (1974). Folding mechanism of the human larynx. Acta Oto-laryngologica 78(1-6), 124–128.

Fink, B. Raymond. (1975). The Human Larynx: A Functional Study. New York: Raven Press.

Fuks, Leonardo, Britta Hammarberg & Johan Sundberg (1998). A self-sustained vocal-ventricular phonation mode: Acoustical, aerodynamic and glottographic evidences. KTH TMH-QPSR 3/1998, Stockholm, 49–59.

Gauffin, Jan (1977). Mechanisms of larynx tube constriction. Phonetica 34, 307–309.

Gerratt, Bruce R. & Jody Kreiman (2001). Toward a taxonomy of nonmodal phonation. Journal of Phonetics 29, 365–381.

Hillel, A. D. (2001). The study of laryngeal muscle activity in normal human subjects and in patients with laryngeal dystonia using multiple fine-wire electromyography. The Laryngoscope, 111, 1–47.

Ladefoged, Peter. (1962). Elements of Acoustic Phonetics. Chicago: University of Chicago Press.

Laitman, Jeffrey T. & Joy S. Reidenberg (2009). Evolution of the human larynx: Nature’s great experiment. In Marvin P. Fried & Alfio Ferlito (Eds.) The Larynx (vol. 1). San Diego: Plural Publishing Inc., pp. 19–38.

Lindblom, Björn (2009). Laryngeal mechanisms in speech: The contributions of Jan Gauffin. Logopedics Phoniatrics Vocology 34, 149–156.

Moisik, Scott R. (2023). Laryngonaut (Version 1.0.6) https://sites.google.com/view/laryngonaut/home [Mobile App]. Google Play Store.

Moisik, Scott R., Ewa Czaykowska-Higgins & John H. Esling. (2021). Phonological potentials and the lower vocal tract. Journal of the International Phonetic Association, 51, 1–35. © First View, 16 April 2019. doi.org/10.1017/S0025100318000403.

Moisik, Scott R. & John H. Esling. (2011). The ‘whole larynx’ approach to laryngeal features. Proceedings of the 17th International Congress of Phonetic Sciences (pp. 1406–1409). Hong Kong: CUHK.

Moisik, Scott R. & John H. Esling. (2014). Modeling the biomechanical influence of epilaryngeal stricture on the vocal folds: a low-dimensional model of vocal-ventricular fold coupling. Journal of Speech, Language, and Hearing Research, doi: 10.1044/2014_JSLHR-S-12-0279.

Moisik, Scott R., John H. Esling & Lise Crevier-Buchman. (2010). A high-speed laryngoscopic investigation of aryepiglottic trilling. Journal of the Acoustical Society of America, 127(3), 1548–1559.

Moisik, Scott R., John H. Esling, Lise Crevier-Buchman, Angélique Amelot & Philippe Halimi. (2015). Multimodal imaging of glottal stop and creaky voice: Evaluating the role of epilaryngeal constriction. Proceedings of the 18th International Congress of Phonetic Sciences (paper 247). Glasgow.

Moisik, Scott R., Hua Lin & John H. Esling. (2014). A study of laryngeal gestures in Mandarin citation tones using simultaneous laryngoscopy and laryngeal ultrasound (SLLUS). Journal of the International Phonetic Association, 44, 21–58.

Negus, Victor E. (1949). The Comparative Anatomy and Physiology of the Larynx. London: William Heinemann Medical Books Ltd. Reprinted (1962).

Padayodi, Cécile M. (2008). Kabiye. Journal of the International Phonetic Association 38, 215–221.

Reidenbach, Martina M. (1998). The muscular tissue of the vestibular folds of the larynx. European Archives of Oto-Rhino-Laryngology 255, 365–367.

Sakakibara, Ken-Ichi, Leonardo Fuks, Hiroshi Imagawa & Niro Tayama (2004). Growl voice in ethnic and pop styles. Proceedings of the International Symposium on Musical Acoustics (ISMA2004), Nara, Japan.

Titze, Ingo R. (2006). The Myoelastic Aerodynamic Theory of Phonation. Iowa City: National Center for Voice and Speech.

van den Berg, Janwillem, William Vennard, D. Burger & C. C. Shervanian (1960). Voice Production: The Vibrating Larynx. [Film]. Groningen University, Medical Physics Department. Utrecht: Stichting Film en Wetenschap, Universitaire Film.