Segmental Articulatory Phonetics
Malin Svensson Lundmark | Lund University, Sweden

Speech is often assumed to be made up of linear, contrastive and discrete building blocks (e.g. /m/), produced via posture-like positions of the articulators (e.g. lips closed for /m/). But articulatory movements have a much more complicated physical reality, arranged in an overlapping, multi-dimensional fashion. For example, /m/ is occlusive because the lips are pressed against each other, but it is also nasal because of a sudden shift in position far back in the vocal tract (lowering of the velum), and voiced because of tensed vocal cords: all separate orofacial movements that overlap in a complex interaction of muscle activities. The acoustic results of the overlapping movements are features that in a spectrogram are interpreted as one segment, which is consequently a linear outcome of a much more complex production. Speech segments do exist, and understanding their internal makeup and how they are produced can tell us about the part they play in speech. This chapter describes a new way of analyzing and understanding the internal makeup of segments. A further overview can be found in work on the Descriptive Approach to Segmental Articulations (DASA) (Svensson Lundmark & Erickson 2024; Svensson Lundmark 2024a).

Rapid movements at segment boundaries

It is generally assumed that the basic function of speech segments, i.e. consonants and vowels, is to differentiate sounds from one another and to create distinctiveness (Jakobson et al. 1969). However, the segments themselves are articulatory postures that display very little movement, and as John Ohala phrased it: Segments are dead intervals (Ohala 1992). As such they rarely present listeners with any new information. Instead, listeners pay much more attention to sudden acoustic changes in the transitions between sound segments, which seem to provide the information necessary to decode the message (Stevens & Blumstein 1978; Ohala 1992). Indeed, studies on vowel perception (the so-called silent-center paradigm, e.g., Strange 1987; Jenkins et al. 1999) have shown that in vowel-identification tasks listeners use information in the formant transitions, i.e., the resonance frequency transitions that arise as the mouth moves from a closed position for the consonant constriction to a more open position for the vowel.

Segment transitions consist of rapid articulatory movements to and from speech postures, and these movements result in the large acoustic changes that we refer to as segment boundaries (e.g. Fant and Lindblom 1961; Gårding 1967; Zsiga 1994). This relationship is built on laws of physics: muscle activations cause jerky, rapid movements, which in turn cause rapid changes in sound waves. In the case of /m/, this means jerky movements of three separate organs that are timed with one another: the lips, the velum and the vocal cords.

Figure 1. EMA (Carstens AG501) at the Lund University Humanities Laboratory (left) and a speaker with sensors glued on various articulators (right).

The forces that activate muscles of the different orofacial organs can be observed by closely monitoring movements in the oral cavity during speech. One way of doing this is by analyzing speech dynamics, i.e., when in time force is applied to movements. This can be recorded using an electromagnetic articulograph (EMA), in which an electromagnetic field picks up sensor movements in space and time (Fig 1). With EMA we can obtain the position and speed of many articulators as they move from speech posture to speech posture, and we can estimate the activation of movements, that is, when force is added to articulatory movements by muscle activity.

Timing of jerky movements: deceleration and acceleration

In physics, when an object is moving, the rate of change of its position is called velocity (i.e. how fast it is moving). When the velocity changes, we say that the object is accelerating, and when the acceleration changes, the movement is jerky (velocity is the first, acceleration the second, and jerk the third derivative of position). Rapid movement changes of any object can be described in terms of jerk and acceleration, which we find e.g. in a rollercoaster or in the sudden braking of a car, and in speech when an articulator changes speed and direction as it reaches or leaves a speech posture (Svensson Lundmark 2023). A change in acceleration of an object involves added force. In the case of an articulator, the direction of the force leads either to deceleration, as the articulator reaches a speech posture, or to acceleration, as it leaves that posture. When it comes to the physiological reality of speech, both kinds of articulatory speed change (deceleration and acceleration) involve underlying positive muscle forces, of which some are used for contraction (agonist muscles), while others exert control through opposing forces (antagonist muscles). The amount of force added determines the speed of the articulator, and depending on the type of force added, the maximum change in velocity may be a deceleration peak or an acceleration peak (Eager et al. 2016).
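The chain of derivatives described above can be written out compactly. The symbols are assumptions introduced here for illustration: x(t) is the position of a sensor along one dimension, m is a constant effective mass, and F is the net force. The last relation is Newton's second law differentiated once, which is one way to read the statement that a change in acceleration involves added force.

```latex
v(t) = \frac{dx}{dt}, \qquad
a(t) = \frac{dv}{dt} = \frac{d^{2}x}{dt^{2}}, \qquad
j(t) = \frac{da}{dt} = \frac{d^{3}x}{dt^{3}}

F(t) = m\,a(t) \quad\Longrightarrow\quad \frac{dF}{dt} = m\,j(t)
```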

In Fig 2 we see the position signal (bottom curve) of an EMA sensor on the tongue tip of a Swedish speaker. The signal shows the sensor moving up/down during the word <bilar> (cars). As the speaker shapes the tongue tip constriction to produce /l/, the tip moves fast (a velocity peak marks the fastest moment) and then slows down rapidly (a deceleration peak). The tongue tip stays in position while forming a speech posture, changes direction somewhere mid-posture, and then moves rapidly away again (an acceleration peak, followed by another velocity peak). The jerk signal, on top in Fig 2, shows jerky movements in connection with the shaping of the speech posture.

Figure 2. From bottom up: speech signal with segments of the Swedish word <bilar> (cars); EMA position signal of a sensor on the tongue tip moving vertically (up/down) over time (x-axis in seconds, y-axis in mm); velocity, acceleration and jerk, which are the first, second and third derivatives of position, respectively (y-axis values represent the rate of change across the input vector, essentially using a central difference method with a 'step size' of 2; they are not scaled by the sample interval). Circles represent the obtained EMA data; the red line is the smoothed signal, computed in R, that is used for measurement.
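The derivative signals in Figure 2 can be approximated along the following lines. This is a minimal Python sketch, not the R code behind the figure. It assumes that the central difference at each interior sample is (x[i+1] - x[i-1]) / 2 and that the two end points use one-sided differences. The function names and the position values are made up for illustration, and no smoothing is applied.

```python
def central_difference(values):
    """Rate of change per sample (not scaled by the sample interval).

    Assumption: interior points use (x[i+1] - x[i-1]) / 2, i.e. a span of
    two samples; the first and last points use one-sided differences.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    out = [0.0] * n
    out[0] = float(values[1] - values[0])
    out[-1] = float(values[-1] - values[-2])
    for i in range(1, n - 1):
        out[i] = (values[i + 1] - values[i - 1]) / 2.0
    return out


def to_physical_units(values, sample_interval_s, order):
    """Rescale an unscaled derivative of the given order to per-second units."""
    return [v / sample_interval_s ** order for v in values]


# Made-up vertical positions in mm, one value per sample.
position = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]

velocity = central_difference(position)          # mm per sample
acceleration = central_difference(velocity)      # mm per sample^2
jerk = central_difference(acceleration)          # mm per sample^3

# Hypothetical sample interval of 4 ms (250 Hz), used only for rescaling.
velocity_mm_per_s = to_physical_units(velocity, 0.004, order=1)
```

Because the unscaled values depend on the sampling rate, peak heights are only comparable between recordings made at the same rate; the timing of the peaks is unaffected by the scaling.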

Recent EMA findings show that in speech production an articulator constantly moves in and out of speech postures in just this way, with the postures delimited by fast intervals and jerky movements. Different articulators, e.g. the tip of the tongue, the lips or the jaw, make the same kind of jerky movements: a decelerating movement when the articulator approaches a speech posture and an accelerating movement when it moves away from that posture (Svensson Lundmark 2023; Svensson Lundmark & Erickson 2024). These rapid movement changes correlate with segment transitions, a relationship that depends on the active part that the articulator plays in the constriction. For example, as the tongue tip is crucial for the constriction in /l/, the segment transitions are the result of the timing of the deceleration peak and acceleration peak of the tongue tip, as visualized by the vertical dotted lines in Fig 2. The phenomenon has been further detailed in the Descriptive Approach to Segmental Articulations (DASA) (Svensson Lundmark & Erickson 2024; Svensson Lundmark 2024a), and appears to be robust across speakers, prosody and place of articulation, and relatively robust across manner of articulation (Svensson Lundmark 2022, 2023, 2024a; Svensson Lundmark & Frid 2023).
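One way to locate the two landmarks that delimit a speech posture is sketched below in Python. This is an illustration under stated assumptions, not the measurement procedure of the cited studies. It works on the unsigned speed of a single sensor dimension, so that slowing down is negative and speeding up is positive whatever the direction of movement. The trajectory is synthetic, the 250 Hz sampling rate is assumed, and all function names are hypothetical.

```python
import math

SAMPLE_RATE_HZ = 250.0          # assumed sampling rate
DT = 1.0 / SAMPLE_RATE_HZ       # sample interval in seconds


def central_difference(values, dt):
    """Central difference scaled by dt; one-sided at the two ends."""
    n = len(values)
    out = [0.0] * n
    out[0] = (values[1] - values[0]) / dt
    out[-1] = (values[-1] - values[-2]) / dt
    for i in range(1, n - 1):
        out[i] = (values[i + 1] - values[i - 1]) / (2.0 * dt)
    return out


def posture_landmarks(position, dt):
    """Return (deceleration_peak_s, acceleration_peak_s) for one posture."""
    velocity = central_difference(position, dt)
    speed = [abs(v) for v in velocity]
    speed_change = central_difference(speed, dt)   # < 0 slowing, > 0 speeding up

    # The two largest local speed maxima: movement into and out of the posture.
    maxima = [i for i in range(1, len(speed) - 1)
              if speed[i - 1] < speed[i] >= speed[i + 1]]
    if len(maxima) < 2:
        raise ValueError("expected a closing and an opening movement")
    closing, opening = sorted(sorted(maxima, key=lambda i: speed[i])[-2:])

    # Slowest moment in between, where the articulator turns around.
    turn = min(range(closing, opening + 1), key=lambda i: speed[i])

    dec_peak = min(range(closing, turn + 1), key=lambda i: speed_change[i])
    acc_peak = max(range(turn, opening + 1), key=lambda i: speed_change[i])
    return dec_peak * dt, acc_peak * dt


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


# Synthetic tongue-tip height in mm: rises around 0.10 s, holds, falls around 0.25 s.
times = [i * DT for i in range(88)]
position = [10.0 * (sigmoid((t - 0.10) / 0.01) - sigmoid((t - 0.25) / 0.01))
            for t in times]

dec_time, acc_time = posture_landmarks(position, DT)
print(f"posture from {dec_time:.3f} s to {acc_time:.3f} s")
```

Real EMA signals would need smoothing before differentiation, since each derivative amplifies measurement noise.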

Different manners of articulation will result in different articulatory-acoustic relationships between the rapid movements and the segment boundaries. For example, the acceleration peak of the lips and tongue tip aligns with segment offset in /m/ and /n/, while peak velocities occur well within the vowel segment. However, if we turn to constrictions with built-up intra-oral pressure, as in the plosives /b/ and /p/, the velocity peak at the release of the lips co-occurs with the vowel segment onset. The acceleration peak is instead aligned with the release burst, as this marks the time at which the speech posture ends (Svensson Lundmark 2024a).

Consonants, vowels and syllables

While the timing of deceleration and acceleration of the consonantal articulators determines the timing of segment boundaries, the same cannot be claimed for vowels. As proposed by Öhman in 1966, consonantal articulation is of an instantaneous nature, layered on top of the vowels. This has been further developed into various models, including the C/D model by Fujimura (2000), and Articulatory Phonology (AP) by Browman and Goldstein (1989) following work by Fowler (1986, 1996). In short, the idea behind this notion of overlapping gestures is that the consonantal gestures are separate in shape and structure from the vowels. Vowels consist of slower movements than the consonants, which are fast ballistic movements and as a result end up shorter than the vowels. Moreover, a consonant-vowel-consonant syllable (CVC) follows the timeline of the jaw cycle: a consonant at syllable onset takes place during the jaw opening, while a consonant in coda position takes place when the jaw closes. It is reasonable to assume very different types of movements of the active articulator in concordance with the jaw opening (syllable onset) as opposed to the jaw closing (syllable coda), as starting positions and distance to target position would vary, affecting the trajectory and speed of the articulator, which calls for different articulatory strategies in onset and coda. However, irrespective of the underlying articulatory strategies, the consonants in a CVC syllable occur on either side of the vowel; hence, the resulting acoustic vowel segment is limited by when each of its neighboring consonantal constrictions is made.

Indeed, EMA findings reveal that vowel segments are really the “leftover” interval between the acceleration peak of the consonantal articulator at syllable onset and the deceleration peak of the consonantal articulator at syllable coda (Svensson Lundmark 2023). In fact, if we view the vocal tract configuration of the tongue for a specific vowel as vowel constriction locations (Stevens & House 1955; Wood 1979), analogous to place of articulation, we can measure deceleration and acceleration of the tongue body. In doing so we find that the tongue body speech postures of open vowels are much shorter, by 40-100 ms, than the acoustic vowel segment (Fig 3). In addition, for Swedish, where tones may be used to contrast vowels, the effect of the falling and rising tones on vowel segment duration is well documented (e.g. Elert 1964). As vowel segments are results of the timing of the acceleration peaks of consonantal articulators, we can conclude that the laryngeal movements seem to affect the movements of the whole oral cavity, including the lips and tongue tip (Svensson Lundmark et al. 2021; Svensson Lundmark 2022).
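The "leftover" view of the vowel segment amounts to simple arithmetic on landmark times, as the following Python sketch shows. All landmark times are invented for illustration and are not measured data; the type and function names are hypothetical. Times are in whole milliseconds.

```python
from typing import NamedTuple


class Posture(NamedTuple):
    """A speech posture delimited by a deceleration and an acceleration peak."""
    deceleration_peak_ms: int
    acceleration_peak_ms: int

    @property
    def duration_ms(self) -> int:
        return self.acceleration_peak_ms - self.deceleration_peak_ms


def vowel_segment_ms(onset: Posture, coda: Posture) -> int:
    """Interval from the onset consonant's acceleration peak to the coda
    consonant's deceleration peak, i.e. what is left over for the vowel."""
    return coda.deceleration_peak_ms - onset.acceleration_peak_ms


# Invented landmark times for a CVC syllable (illustration only).
onset_consonant = Posture(deceleration_peak_ms=40, acceleration_peak_ms=120)
coda_consonant = Posture(deceleration_peak_ms=300, acceleration_peak_ms=380)
tongue_body_vowel = Posture(deceleration_peak_ms=160, acceleration_peak_ms=260)

vowel_ms = vowel_segment_ms(onset_consonant, coda_consonant)
difference_ms = vowel_ms - tongue_body_vowel.duration_ms

print(f"acoustic vowel segment: {vowel_ms} ms")
print(f"tongue body posture:    {tongue_body_vowel.duration_ms} ms")
print(f"difference:             {difference_ms} ms")
```

With these invented values the tongue body posture is shorter than the vowel segment by an amount inside the 40-100 ms range reported above. The vowel's own posture does not enter the calculation of the segment duration at all.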

Figure 3. A schematized figure on the DASA approach: the relationship between acoustic segment boundaries and timing of deceleration/acceleration of lower lip, tongue tip, tongue body and jaw movements. The grey and white areas marked CVC are the durations of the acoustic segments. The thick solid lines are the speech postures, which start with a deceleration peak and end with an acceleration peak. Between speech postures the articulators travel fast (thin solid lines with arrows). Still unresolved questions are marked with dotted lines. The schematized figure is based on findings of Svensson Lundmark (2023) and Svensson Lundmark & Erickson (2024). Reproduced with permission from: M. Svensson Lundmark and D. Erickson, JSLHR, https://doi.org/10.1044/2024_JSLHR-23-00092, accepted; licensed under a Creative Commons Attribution (CC BY) license.

Furthermore, the mandible has a specific purpose in speech as it shapes the syllable (MacNeilage & Davis 1990), and it is lowered more for some syllables than others, in connection with stress or emphasis (e.g. Erickson & Kawahara 2016; Erickson and Niebuhr 2023). We have found that the decelerating and accelerating movements of the mandible have a fixed timing relation to the other articulators (Fig 3): for a CV syllable, the deceleration peak of the jaw arrives after the deceleration peak of the segmental consonant constriction, while the jaw acceleration peak, as the jaw lowers for the nucleus of the syllable, occurs earlier than the acceleration peak of the segmental constriction (Svensson Lundmark & Erickson 2024). Furthermore, when more prominence is added to a syllable, we still see this same timing, with rapid movements associated with segment boundaries (Svensson Lundmark & Frid 2023); however, the force of the accelerations/decelerations increases with increased prominence (Svensson Lundmark 2024b). In other words, timing of jerk and acceleration, as an approach to analyzing articulatory dynamics, can account for elements of real speech such as prosodic features.
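The jaw timing relation for a CV syllable can be stated as an ordering of four landmarks: the jaw's deceleration and acceleration peaks both fall between those of the consonant constriction. The Python sketch below checks that ordering. The function name and the millisecond values are invented for illustration and are not measured data.

```python
def jaw_timing_holds(consonant_peaks_ms, jaw_peaks_ms):
    """Check the ordering described for a CV syllable.

    Each argument is a (deceleration_peak_ms, acceleration_peak_ms) pair.
    The relation holds when the jaw decelerates after the consonant
    constriction does and accelerates before it does.
    """
    consonant_dec, consonant_acc = consonant_peaks_ms
    jaw_dec, jaw_acc = jaw_peaks_ms
    return consonant_dec < jaw_dec and jaw_acc < consonant_acc


# Invented landmark times in milliseconds (illustration only).
tongue_tip = (40, 120)
jaw_as_described = (60, 100)     # decelerates later, accelerates earlier
jaw_counterexample = (30, 100)   # decelerates before the tongue tip

print(jaw_timing_holds(tongue_tip, jaw_as_described))
print(jaw_timing_holds(tongue_tip, jaw_counterexample))
```

A check of this kind only tests the order of the landmarks, so it is unaffected by the larger acceleration and deceleration magnitudes reported for prominent syllables.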

Towards new horizons in segmental articulation research

Over 100 muscles are active when we produce speech, and we have a unique ability to compensate with those muscles to get the message across. Whether we adjust for prosodic reasons, or because we’re smiling, laughing, shouting, or chewing on something, we can reformat articulation so that the intended speech sounds are still produced, even though the shape and base of the vocal tract have changed. The jerk and acceleration patterns of the crucial articulators shape the segment boundaries. But speech does not consist solely of constrictions by crucial articulators, i.e. the segments, and more research on the internal makeup of segments is needed. By using the inter-articulatory timing of jerky movements, we may be able to model the muscle activations of speech postures of different articulators and capture the variance (for example the prosodic variance) of the resulting acoustic segment durations.


References

Browman, C.P. & Goldstein, L. (1992). Articulatory phonology: an overview. Phonetica. 49 (3–4), 155–180.

Eager, D., Pendrill, A.-M., and Reistad, N. (2016). “Beyond velocity and acceleration: Jerk, snap and higher derivatives,” Eur. J. Phys. 37(6), 065008.

Elert, C.-C. (1964). Phonologic Studies of Quantity in Swedish. Almqvist & Wiksell, Uppsala.

Erickson, D. & Kawahara, S. (2016) Articulatory correlates of metrical structure: Studying jaw displacement patterns. Linguistic Vanguard 2, pp.102-110. De Gruyter Mouton. DOI 10.1515/lingvan-2015-0025.

Erickson, D. and Niebuhr, O. (2023). Articulation of prosody and rhythm: Some possible applications to language teaching, Studies in Laboratory Phonology. Language Science Press (langsci-press.org), pp.1-45. DOI: 10.2478/9788366675728-001.

Fant, G., & Lindblom, B. (1961). Studies of minimal speech sound units. Dept. for Speech, Music and Hearing Quarterly Progress and Status Report, 2(2), 15.

Fujimura, O. (2000). The C/D model and prosodic control of articulatory behavior. Phonetica 57, 128–138.

Gårding, E. (1967). Internal juncture in Swedish. Gleerup.

Jakobson, R., Fant, G. & Halle, M. (1969). Preliminaries to speech analysis. 8th printing. Cambridge: MIT Press.

Jenkins, J. J., Strange, W., and Trent, S. A. (1999). “Context-independent dynamic information for the perception of coarticulated vowels,” J. Acoust. Soc. Am. 106(1), 438–448.

MacNeilage, P. F. (2008). The origin of speech. Oxford, England: Oxford University Press.

Ohala, J. J. (1992). “The segment: Primitive or derived?,” in Papers in Laboratory Phonology II: Gesture, Segment, Prosody, edited by G. J. Docherty and D. Robert Ladd (Cambridge University Press, Cambridge, UK).

Öhman, S. (1966). Coarticulation in VCV utterances: Spectrographic measurements. Journal of the Acoustical Society of America, 39(1), 151–168.

Stevens, K. N., & House, A. S. (1955). Development of a quantitative description of vowel articulation. J. Acoust. Soc. Am. 27(3), 484–493.

Stevens, K. N., and Blumstein, S. E. (1978). “Invariant cues for place of articulation in stop consonants,” J. Acoust. Soc. Am. 64(5), 1358–1368.

Strange, W. (1987). “Information for vowels in formant transitions,” J. Mem. Lang. 26(5), 550–557.

Svensson Lundmark, M. (2022). Evidence of segmental articulations: Acceleration determines vowel segment duration in Swedish Word Accents. Proc 1st International Conference of Tone and Intonation (TAI 2021), SDU, Sønderborg.

Svensson Lundmark, M. (2023). Rapid movements at segment boundaries. Journal of the Acoustical Society of America, 153(3), 1452–1467.

Svensson Lundmark, M. (2024a). Timing of acceleration peaks and acceleration changes. 205-208. Proc. 13th International Seminar of Speech Production, Autrans, France.

Svensson Lundmark, M. (2024b). Magnitude and timing of acceleration peaks in stressed and unstressed syllables. Proc. Interspeech 2024, 2630-2634.

Svensson Lundmark, M., & Frid, J. (2023). Segmental articulations across prosodic levels. In O. Niebuhr and M. Svensson Lundmark (Eds.), Proceedings of the 13th International Conference Nordic Prosody Conference: Applied and Multimodal Prosody Research (pp. 255-261). Sciendo. https://doi.org/10.2478/9788366675728-023

Svensson Lundmark, M. and Erickson, D. (2024). Segmental and syllabic articulation. A descriptive approach. J Speech Lang Hear Res, https://doi.org/10.1044/2024_JSLHR-23-00092

Svensson Lundmark, M., Ambrazaitis, G., Frid, J., & Schötz, S. (2021). Word-initial consonant–vowel coordination in a lexical pitch-accent language. Phonetica, 78(5-6), 515–569. https://doi.org/10.1515/phon-2021-2014

Wood, S. (1979). A radiographic analysis of constriction locations for Vowels. J. Phon. 7(1), 25–43.

Zsiga, E. C. (1994). Acoustic evidence for gestural overlap in consonant sequences. Journal of Phonetics, 22(2), 121–140. https://doi.org/10.1016/S0095-4470(19)30189-5