Articulatory Models: The Fujimura C/D model
Donna Erickson | Haskins Laboratories, New Haven

A growing number of models have been proposed to describe articulation of spoken language. This dictionary gives a brief description of a handful of these models: (1) Browman and Goldstein’s Articulatory Phonology, (2) The Fujimura C/D model, (3) The Turk-Shattuck-Hufnagel XT/3C model, (4) The Erickson Articulatory Prosody Model, and (5) The Svensson Lundmark Segmental Articulatory Phonetics.

The Fujimura C/D model. This model was proposed by Osamu Fujimura (1927-2017) to explain the temporal organization of speech, with heavy emphasis on its rhythmic organization. It is based in large part on intensive examination of articulatory patterns observed in X-Ray Microbeam articulatory data. Essentially, the C/D model is “a physicist’s approach to tackle a complex system that has aspects of discrete - symbolic - information processing and physical movement as well as sound production at the end” (Reiner Wilhelms-Tricarico, p.c.). The model was called the Converter Distributor model (C/D model) because it took abstract prosodic and phonological information as its input, which was subsequently converted to strings of syllables; the prosodic and phonological information was then distributed to articulatory movements, which were implemented by control function/signal generators.

This brief account of the C/D model is divided into three sections: (1) An overview of the C/D model, highlighting some of the unique perspectives of the model, (2) a case study illustrating how the model accounts for utterance prominence and phrasing patterns, and (3) future work.

1. Overview of the C/D Model

Four key points of the C/D model’s approach to prosody are that (1) the syllable is a basic unit of speech, (2) the “syllable magnitude” (syllable prominence) is in large part a product of the metrical organization of the utterance, (3) increased prominence is implemented by increased articulatory strength, and (4) increased articulatory strength also yields larger phrase breaks within utterances. Thus, prosody (the arrangement of syllable magnitudes) provides the input specifications for the articulation of consonants, vowels, and phrasing.

The model’s novel approach to segments, e.g., consonants and vowels, is twofold. (1) Phonemic segments, as such, do not exist; there is no role for them in this theory. Instead, speech articulation is specified in terms of pulses: (a) syllable pulses, the height of which describes the syllable’s numeric prominence level and which also carry the vocalic features, and (b) Impulse Response Functions, pulses with feature specifications (manner, place, and voicing) that describe the syllable margins, i.e., onset and coda. (2) The strength of each set of syllable margin feature specifications is intrinsically connected to the strength/magnitude of the syllable pulse, which in turn determines the onset and offset of each syllable and results in various temporal spacings among the syllables, i.e., morpheme, word, foot, and phrase boundaries.

Figure 1 shows an early diagram of the C/D model describing the articulation of the utterance That’s wonderful (Fujimura et al. 1991). The prosodic phonological input is described in terms of a metrical tree plus utterance parameters with numeric controls, e.g., speed, formality, excitement, dialect, and speaker age, specified by the small letters on the left of the figure. The strong-weak branches of a metrical tree, along the lines of Liberman and Prince (1977), describe the arrangement of syllable magnitudes. The beginning and end of the utterance are marked by $, and the phrase break after that’s is marked with %.
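To make the idea of a strong-weak tree yielding numeric syllable magnitudes concrete, the following is a minimal sketch only. The conversion rule used here (a syllable's level is one plus the number of consecutive strong nodes directly above it) is an illustrative assumption, not the conversion specified by the C/D model, and the tree for That's wonderful is a hypothetical simplification rather than a reproduction of Figure 1. The names Node and syllable_levels are likewise invented for this example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """A node of a (hypothetical) metrical tree."""
    label: str                      # "s" (strong), "w" (weak), or "root"
    syllable: str = ""              # filled in for leaves only
    children: List["Node"] = field(default_factory=list)


def syllable_levels(node: Node, strong_run: int = 0) -> List[Tuple[str, int]]:
    """Return (syllable, level) pairs in left-to-right order.

    ASSUMPTION (illustrative only): level = 1 + number of consecutive
    "s" nodes ending at the leaf. A "w" node resets the count.
    """
    if node.label == "s":
        strong_run += 1
    elif node.label == "w":
        strong_run = 0
    if not node.children:           # leaf: one syllable
        return [(node.syllable, 1 + strong_run)]
    levels: List[Tuple[str, int]] = []
    for child in node.children:
        levels.extend(syllable_levels(child, strong_run))
    return levels


# Hypothetical simplified tree: [w that's] [s [s won] [w der] [w ful]]
tree = Node("root", children=[
    Node("w", "that's"),
    Node("s", children=[
        Node("s", "won"),
        Node("w", "der"),
        Node("w", "ful"),
    ]),
])

print(syllable_levels(tree))
# [("that's", 1), ('won', 3), ('der', 1), ('ful', 1)]
```

Under this assumed rule, the syllable dominated only by strong branches receives the largest value, which is the qualitative pattern the metrical tree is meant to encode.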

This phonological prosodic input is then converted, in the CONVERTER (next level), to a “base function”, with specifications of vowels, syllable magnitudes, boundary information, syllable margins, and also F0 information. Each of the syllables is represented by a syllable pulse, the height of which represents the syllable magnitude; the points where the sloping lines on each side of the syllable pulse touch the base time line indicate the beginning and ending of each syllable, and thus indicate the size of the boundaries between syllables. The feature sets (place, manner, and voicing) for the syllable margins are written below, as is the nucleus (vowel) information. (A note about feature sets: ‘place’ refers to where the constriction in the vocal tract occurs; ‘manner’ refers to the nature of the constriction, e.g., complete or partial; and ‘voicing’ refers to vocal fold adduction. For more details of the symbols for the features, see Fujimura 1994.) The DISTRIBUTOR selects which “elemental gestures” are to be enlisted to implement the feature sets, and then a multidimensional set of ACTUATORS assembles the stored feature sets of the Impulse Response Functions and sends these to CONTROL FUNCTION/SIGNAL GENERATORS (not shown in Figure 1).
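The geometry of the base function can be sketched in a few lines. This is a hedged illustration, not an implementation of the CONVERTER: it assumes that the two sloping lines of every syllable are symmetric and share one fixed slope (the hypothetical parameter SLOPE), so that a taller pulse has a wider base. The pulse times and magnitudes in the example are invented values.

```python
from typing import List, Tuple

# ASSUMPTION: one shared slope for all syllables, in magnitude units
# per second. The value is arbitrary and chosen for illustration.
SLOPE = 10.0


def syllable_edges(pulse_time: float, magnitude: float,
                   slope: float = SLOPE) -> Tuple[float, float]:
    """Times at which the two sloping lines of a pulse reach the base line."""
    half_base = magnitude / slope
    return pulse_time - half_base, pulse_time + half_base


def boundary_sizes(pulses: List[Tuple[float, float]],
                   slope: float = SLOPE) -> List[float]:
    """Gap (s) between the end of each syllable and the start of the next.

    `pulses` is a list of (pulse_time, magnitude) pairs in temporal order.
    """
    edges = [syllable_edges(t, m, slope) for t, m in pulses]
    return [nxt[0] - cur[1] for cur, nxt in zip(edges, edges[1:])]


# Invented example: three syllables, the middle one most prominent.
pulses = [(0.20, 1.0), (0.60, 2.0), (1.00, 1.0)]
for (t, m), (start, end) in zip(pulses, (syllable_edges(t, m) for t, m in pulses)):
    print(f"pulse at {t:.2f} s, magnitude {m:.1f}: syllable {start:.2f}-{end:.2f} s")
print("boundaries:", [round(g, 3) for g in boundary_sizes(pulses)])
```

Reading the boundary as the distance between neighboring triangle edges follows the description above; the fixed shared slope is the part that should be treated as an assumption.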

Figure 1. C/D Model diagram (Fujimura et al. 1991)

Figure 2 (from a later revision of the model, Fujimura 2002) provides a closer look at the CONVERTER, illustrated by the syllable /kit/ and displayed in four levels (rows). The top row shows syllable magnitude and vowel information, including the blue onset and purple coda pulses, which, notice, are the same height (i.e., magnitude) as the syllable pulse. The next row shows the syllable margin feature (manner and place) information encoded in the Impulse Response Functions (IRFs); the blue and purple arrows show how the pulses in the top row affect the magnitude of the IRFs. The next row shows the vocalic information, and the last row the voicing information.

Looking at the top row, the height of the syllable pulse represents the syllable magnitude for this syllable, which according to the model is commensurate with the amount the mandible lowers in making this syllable. The pulse also includes information about the vowel nucleus. Notice that the “syllable triangle” is an isosceles triangle, with the two edges of the triangle indicating the start in time of the (abstract) syllable onset and coda. (Note that the isosceles triangle is a modification of the original thinking shown in Figure 1.) The next layer shows the IRF feature set {K, τ}, indicating a velar place (K), stop manner (τ) syllable onset, and the IRF feature set {T, τ}, indicating an apical place (T), stop manner (τ) syllable coda (see Fujimura 1994 for a description of features and their symbols). The IRFs generate a response curve, shown by the dashed blue and purple curved lines; note that the peak of the curve does not align with the IRF pulse, and that the curve starts before, and ends after, the pulse. The strengths of the IRFs are dictated by the magnitude of the onset and coda pulses, which are the same magnitude as that of the syllable pulse. The bold blue and purple horizontal lines for the syllable onset and coda indicate the duration of the closure period of the articulators for producing velar K and apical T, respectively. Notice the closure for the onset starts before the onset pulse and ends right at the coda pulse.
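The relation between pulse magnitude, response curve, and closure duration can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the bell-shaped (Gaussian) response, the offset of its peak from the pulse (lag), its width, and the idea that closure corresponds to the stretch where the response exceeds a fixed threshold. None of these choices is taken from Fujimura (2002); they serve only to show how a stronger pulse could yield a longer closure.

```python
import math
from typing import Optional, Tuple


def irf_response(t: float, pulse_time: float, magnitude: float,
                 lag: float = 0.02, width: float = 0.05) -> float:
    """Response at time t to one margin pulse.

    ASSUMPTION: a Gaussian-shaped curve scaled by the pulse magnitude,
    peaking `lag` seconds away from the pulse (so peak and pulse do not align).
    """
    center = pulse_time + lag
    return magnitude * math.exp(-0.5 * ((t - center) / width) ** 2)


def closure_interval(pulse_time: float, magnitude: float,
                     threshold: float = 0.5, lag: float = 0.02,
                     width: float = 0.05) -> Optional[Tuple[float, float]]:
    """Interval where the assumed response is at or above `threshold`.

    Returns None when the response never rises above the threshold.
    """
    if magnitude <= threshold:
        return None
    # Solve magnitude * exp(-0.5 * (d / width)**2) == threshold for d.
    half = width * math.sqrt(2.0 * math.log(magnitude / threshold))
    center = pulse_time + lag
    return center - half, center + half


# Invented magnitudes: a weak and a strong syllable onset pulse at t = 0.
for m in (1.0, 2.0):
    start, end = closure_interval(0.0, m)
    print(f"magnitude {m:.1f}: closure lasts {end - start:.3f} s")
```

With these assumed settings the closure interval grows with the pulse magnitude and begins before the pulse, which matches the qualitative behavior described for Figure 2.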

The next level is the vocalic level. The underlying base of the syllable is from the blue onset pulse to the purple coda pulse, marked by dashed red horizontal lines. Since the vocalic syllable pulse (red upward arrow) generates a tongue advancement which starts before, and ends after, the onset and coda pulses, the surface duration extends beyond the base duration.

The last level is the voicing level. This level specifies laryngeal adduction for the voicing feature (which is not marked in the feature specifications if the syllable margin IRF is voiceless). It is triggered by the magnitude of the onset and coda pulses, and the IRF pulses. The horizontal dashed blue line indicates the (abstract) duration of syllable voicing; the green curve indicates the surface laryngeal adduction curve, which, again, starts before, and ends after, the onset and coda pulses. The voiced portion of the syllable is indicated by the solid green horizontal bar, which starts at the green dashed vertical line marked “on” and ends with the green vertical dashed line marked “off”. As for the closure part of the stop, it starts with the green laryngeal adduction curve, and ends at the blue vertical line. The aspiration period is the distance between the end of the stop and the beginning of the voicing for the vowel, that is, VOT is displayed as the discrepancy between articulatory release of the stop constriction and voice onset of the vowel. As the magnitude of the syllable pulse/onset pulse affects the strength of syllable margin features (e.g., voicing), it follows that syllable magnitude also affects VOT (see Matsui 2017).

Figure 2. C/D diagram of ‘kit’ (Fujimura 2002)

Not shown in Figure 2 is the voice quality component of the C/D model. F0 is described as part of voice quality, which, along with other types of voice quality, “may play crucial roles in prosodic control” (Fujimura 2008: 316). The concept of F0 as part of voice quality opens the door to thinking of F0 as more than just an F0 contour displayed in a spectrogram, but rather as part of the complicated source-filter interactions involved in producing different voice qualities (see, e.g., Obert et al. 2023). However, this part of the model has not yet been developed. As for Japanese pitch-accents, a Fujisaki-type model was proposed in Fujimura (2008).

As mentioned in the beginning paragraphs, the specifications in the CONVERTER are fed to the DISTRIBUTOR and the ACTUATORS (and SIGNAL GENERATORS) in order to implement a speech utterance. These parts of the model are yet to be implemented, and many questions remain about each of the details of the C/D model.

2. Experimental application of the C/D model

The C/D model is based on intensive examination of articulatory data, mostly X-Ray Microbeam (XRMB) data. However, no real attempt has been made to implement the model, except for EMA-based research by Erickson and colleagues exploring the C/D model tenets concerning the mandible as the syllable articulator; specifically, that the amount of mandible lowering (mandible displacement) is commensurate with syllable magnitude as well as with phrase break magnitude (see, e.g., Erickson et al. 2015). Figure 3 shows tracings of the “segmental articulators”, i.e., the Crucial Articulators (TD, TT, LL), and of the syllable articulator (mandible/jaw) for the utterance Pam said bat that fat cat at that mat, where bat is emphasized. The vowels in this utterance are all /æ/, except for /ɛ/ in said, yet each syllable shows a different amount of mandible lowering (i.e., jaw displacement). Based on the amount of mandible lowering (from the occlusal bite plane) for each syllable in the utterance, a string of syllable pulses is created. The placement of each syllable pulse in time is determined by an algorithm involving the timing of the maximum velocity of the onset and coda Crucial Articulators (CA); see Erickson et al. 2015 for the MATLAB algorithm. (Note that in the C/D model, “iceberg points” were used instead of maximum velocity, but see the discussion in Kim et al. 2015, Bonaventura and Fujimura 2007, and Fujimura 1986, 2000.)

Figure 3. Vertical position tracings of Crucial Articulators (TD, TT, LL) and mandible (jaw) for the utterance Pam said bat that fat cat at that mat.
Figure 4. Same as Figure 3, but with velocity tracings (v) as well as vertical position tracings of Crucial Articulators (TD, TT, LL) and mandible (jaw) for the utterance Pam said bat that fat cat at that mat.

Figure 4 is like Figure 3 but includes velocity tracings as well as vertical position tracings for the three CA and the mandible. The yellow vertical lines on each side of the syllable mark the points in time of the maximum velocities of the onset and coda CA; e.g., the red arrows mark the maximum velocity of the LL articulator for the onsets of bat and fat, and the blue arrows mark the maximum velocity of the TT articulator for the codas of bat and fat. The white lines in the center between the yellow maximum velocity lines mark the points in time where the syllable pulses occur. As for syllable boundaries/phrase breaks, these are commensurate with the distance between the contiguous yellow lines between syllables.
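The landmark procedure described for Figure 4 can be sketched as follows. This is not the MATLAB algorithm of Erickson et al. 2015, which should be consulted for the actual method; it is a simplified illustration under stated assumptions: velocity is estimated by finite differences, the pulse is placed at the midpoint between the onset and coda maximum-velocity landmarks (one reading of "in the center"), and jaw position is measured vertically relative to the occlusal bite plane with lowering as negative values. All function names and sample values are hypothetical.

```python
from typing import List, Sequence, Tuple


def velocity(position: Sequence[float], dt: float) -> List[float]:
    """Finite-difference velocity (central inside, one-sided at the ends).

    Requires at least two samples.
    """
    n = len(position)
    v = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        v.append((position[hi] - position[lo]) / ((hi - lo) * dt))
    return v


def time_of_peak_speed(position: Sequence[float], dt: float,
                       start: int, end: int) -> float:
    """Time (s) of maximum absolute velocity within samples [start, end)."""
    v = velocity(position, dt)
    idx = max(range(start, end), key=lambda i: abs(v[i]))
    return idx * dt


def place_pulse(onset_peak_time: float, coda_peak_time: float) -> float:
    """ASSUMPTION: the syllable pulse sits midway between the two landmarks."""
    return 0.5 * (onset_peak_time + coda_peak_time)


def boundary_durations(landmarks: List[Tuple[float, float]]) -> List[float]:
    """Distance between the coda landmark of one syllable and the onset
    landmark of the next; `landmarks` holds (onset_time, coda_time) pairs."""
    return [nxt[0] - cur[1] for cur, nxt in zip(landmarks, landmarks[1:])]


def jaw_displacement(jaw: Sequence[float], start: int, end: int,
                     bite_plane: float = 0.0) -> float:
    """Largest lowering of the jaw below the bite plane in [start, end)."""
    return max(bite_plane - y for y in jaw[start:end])


# Invented toy trajectory of one articulator, sampled every 10 ms.
trace = [0.0, 0.0, 2.0, 5.0, 6.0, 6.0]
print("peak speed at", time_of_peak_speed(trace, 0.01, 0, len(trace)), "s")
print("pulse at", place_pulse(0.10, 0.30), "s")
```

In practice the search windows for the onset and coda articulators would have to be chosen per syllable, and real EMA data would usually be smoothed before differentiation; both steps are omitted here.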

Figure 5 shows a string of syllable pulses and syllable boundaries for Pam said bat that fat cat at that mat, based on the articulatory data, i.e., patterns of jaw displacement and maximum velocity of the Crucial Articulators. Notice that bat has the largest syllable pulse, as expected since it was the word emphasized in the utterance; the largest break occurs between the two phrases in the utterance. Figure 6 shows a possible input metrical organization for this utterance that would generate the observed articulatory patterns.

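Figure 5. Syllable pulses and syllable boundaries for Pam said bat that fat cat at that mat, based on jaw displacement and maximum velocity of the Crucial Articulators.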
Figure 6. Possible input metrical organization for Pam said bat that fat cat at that mat, derived from observed articulatory patterns, and converted to syllable pulse triangles (top panel). The height of the pulses indicates syllable magnitude (prominence) and downward pointing arrows indicate morphological and syntactic breaks. The large arrow indicates the largest phrase break.

3. Future work

The C/D model proposes a number of novel approaches to understanding the temporal organization of spoken utterances, and, in large part, it seems to be intuitively going in the right direction: prosodic organization is the input to speech, the syllable is the unit of prominence, and prominence affects the articulation of syllable margin features and syllable boundaries. Work by Erickson and colleagues corroborates the prosodic input/syllable strength tenet of the model; the recent ongoing work by Svensson Lundmark, with its focus on patterns of acceleration of articulators, appears to be a parallel supporting approach to that of the C/D model (Svensson Lundmark and Erickson 2024). The enticing aspect of the C/D model is that each tenet of the model is testable, although this testing remains to be done. It is hoped that this brief simplified description of the C/D model will encourage researchers to continue testing and developing the model, fine-tuning the points that are useful, and discarding those that aren’t.


References

Bonaventura, P. and Fujimura, O. (2007). Articulatory movements and phrase boundaries. In: P. Beddor, J.J. Ohala, and J.M. Solé (eds), Experimental approaches to phonology, in honor of John Ohala (pp. 209–227). Oxford: Oxford University Press.

Fujimura, O. (1986). Relative invariance of articulatory movements. In: J. S. Perkell and D. H. Klatt (eds), Invariance and Variability in Speech Processes. Lawrence Erlbaum.

Fujimura, O. (1994). C/D model: A computational model of phonetic implementation. In E. Ristad (ed.), Language Computations. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Vol. 17, pp. 1-20. Providence, RI: American Mathematical Society.

Fujimura, O., Erickson, D., and Wilhelms, R. (1991). Prosodic effects on articulatory gestures--A model of temporal organization. Proceedings of the XIIth International Congress of Phonetic Sciences, 2, 26-29.

Fujimura, O. (2000). The C/D model and prosodic control of articulatory behavior. Phonetica 57(2-4), 128-138.

Fujimura, O. (2002). Temporal organization of speech utterance: A C/D model perspective. Cad. Est. Ling., Campinas, (43): 9-36, Jul./Dez. 2002.

Fujimura, O. (2008). Pitch Accent in Japanese: Implementation by the C/D Model. SP2008, pp. 313-316.

Erickson, D., Kim, J., Kawahara, S., Wilson, I., Menezes, C., Suemitsu, A., and Moore, J. (2015) Bridging articulation and perception: The C/D model and contrastive emphasis. International Congress of Phonetic Sciences 2015.

Kim, J., Erickson, D., and Lee, S. (2015). More about contrastive emphasis and the C/D model. Journal of the Phonetic Society of Japan, 19(2), 44-54 (Special Issue on the C/D model).

Matsui, F. M. (2017). On the Input Information of the C/D Model for Vowel Devoicing in Japanese. Journal of the Phonetic Society of Japan, Vol. 21, No. 1, April 2017, pp. 127–140.

Obert, K., Yun, J., Erickson, D., Reeve, M., Rowson, H., Møller, K. (2023) Voice quality: Interactions among F0, vowel quality, phonation mode and pharyngeal narrowing, Studies in Laboratory Phonology. Language Science Press (langsci-press.org), pp.190-199. DOI: 10.2478/9788366675728-001.

Svensson Lundmark, M., Erickson, D. (2024) Segmental and syllabic articulation. A descriptive approach. J Speech Lang Hear Res. https://doi.org/10.1044/2024_JSLHR-23-00092