Voice quality

Donna Erickson | Haskins Laboratories

How to cite:

Erickson, Donna (2021) Voice quality. In: Speech Sciences Entries. Speech Prosody Studies Group. Available at: https://gepf.falar.org/entries/15

What is “voice quality”? John Laver, a pioneer in the study of voice quality, describes it as the general sound of a person’s voice, the “overall, auditory coloring of the voice”, which can change without affecting the phonemic content of what is said (Laver 1980). That is, a person can say “hello” using a wide variety of voice qualities (e.g., angry, sad, tense, old, tired) and still be understood to be saying “hello”. A person’s voice can also be described in terms of such characteristics as normal/abnormal, good/bad, bright/dark, oral/nasal (Colton et al. 1981); there is probably no limit to the number of terms available for describing voices, which reflects how versatile the human vocal instrument is. Voice quality is, in musical terms, akin to timbre. Just as we know that a 440 Hz tone played by a violin comes from a violin and not from a trumpet playing the same 440 Hz tone, so we learn various things about the speaker who says “hello”. Some voice quality characteristics are due to what Laver refers to as “long-term habitual settings” of articulation of the vocal folds and the vocal tract, e.g., speaking with pursed lips, a high larynx, or a backed tongue. “Short-term changes” of articulation can convey more transient information to listeners, such as emotions and social affects. The study of voice quality in linguistics is a relatively new field. Part of this may be due to the difficulty of defining and measuring voice quality, and part to the fact that voice quality is not represented in the writing system of a language; it is not generally taught in school, nor is it something speakers/listeners tend to be consciously aware of (see e.g., Fujimura et al. 1990), although listeners most likely react to negative or positive voice qualities without necessarily knowing what triggered their reactions.
Writers of fiction use words to describe voice quality: for instance, he spoke with an icy voice, or she spoke with the voice of an angel. But what exactly IS an icy voice or an angelic voice? What are the acoustic characteristics of different voice qualities? How do we measure voice quality? How do we produce voice quality? How do we perceive voice quality? What is the importance of voice quality?

How do we produce voice qualities?

Speech is the result of air passing from the lungs through the vibrating vocal folds in the larynx, and then on through the vocal tract, which acts as a filter to the sound produced at the vocal folds. The rate of vocal fold vibration determines the fundamental frequency of the speech sound, and the vocal tract shape determines which harmonics of the fundamental frequency are amplified to produce which vowel sounds. This is the essence of the source-filter theory, outlined by Fant (1970), where the source is the vocal folds, and the filter is the vocal tract. The timbre of the voice is also produced by articulatory settings in both the voice source and the voice filter (Laver 1980). If we change how we move various supralaryngeal articulators when we speak, such as lips, jaw, tongue, velum, pharynx, then voice quality changes. One way, for example, to have a voice quality more like Marilyn Monroe is to advance the tongue and spread the lips. If we change laryngeal settings, we can also change voice quality, such as whispery, creaky, falsetto, etc. voice. One current characteristic of some American female speakers is a creaky voice, especially at the end of sentences, brought about by changing how the vocal folds come together when phonating.
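The source-filter idea described above can be illustrated with a minimal Python sketch. It assumes a very crude source (one impulse per fundamental period) and a filter made of two-pole digital resonators standing in for formants. The formant frequencies and bandwidths, and all function names, are illustrative assumptions rather than values or methods taken from Fant (1970).

```python
import math

def impulse_train(f0, sr, n):
    """Crude glottal source: one unit impulse per fundamental period."""
    period = round(sr / f0)
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def resonator(signal, freq, bandwidth, sr):
    """Two-pole resonator standing in for one vocal tract formant."""
    r = math.exp(-math.pi * bandwidth / sr)
    a1 = 2.0 * r * math.cos(2.0 * math.pi * freq / sr)
    a2 = -r * r
    out = []
    y1 = y2 = 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def harmonic_amplitude(signal, freq, sr):
    """Magnitude of the projection of the signal onto a sinusoid at `freq`."""
    re = sum(s * math.cos(2 * math.pi * freq * i / sr) for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq * i / sr) for i, s in enumerate(signal))
    return math.hypot(re, im) / len(signal)

def synthesize(f0, formants, sr=16000, n=3200):
    """Pass the same source through a cascade of formant resonators."""
    out = impulse_train(f0, sr, n)
    for freq, bandwidth in formants:
        out = resonator(out, freq, bandwidth, sr)
    return out

SR = 16000
F0 = 100
vowel_a = synthesize(F0, [(700, 80), (1200, 90)], SR)   # assumed /a/-like formants
vowel_i = synthesize(F0, [(300, 80), (2300, 100)], SR)  # assumed /i/-like formants

# Same source, different filters: different harmonics are emphasized.
for name, vowel in (("a", vowel_a), ("i", vowel_i)):
    print(name, harmonic_amplitude(vowel, 300, SR), harmonic_amplitude(vowel, 700, SR))
```

Both toy vowels share the same fundamental frequency; only the filter settings differ, and with them the harmonics that are amplified.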

What are some acoustic characteristics of different voice qualities, and how do we measure them?

Source-related voice quality changes

Our ear can hear timbre, but how do we quantitatively measure it? A big problem is that voice quality is indeed, as Laver has said, a product of both the source and filter. One approach to analyzing, say, only the glottal source, is by inverse filtering, a method for filtering out the vocal tract contribution to a speech sound, leaving only the glottal source information, which can then be acoustically analyzed in terms of various glottal source parameters (e.g., Gobl and Ní Chasaide 2010). We can also examine glottal source characteristics by comparing two sounds with the same fundamental frequency (F0) but with different voice qualities, similar to comparing a violin note with that of a trumpet on the same note. For example, take two /a/ vowels with the same F0, but one has a more breathy voice quality than the other. One way to measure the breathiness is by examining the pattern of the amplitudes of the harmonics of the fundamental frequency produced by the vocal folds. Vocal folds that close completely for each cycle produce a sound with a lot of energy, evidenced by strong harmonics in the spectrograms extending into the higher frequencies, and the sound is strong and not-breathy. In contrast, when the vocal folds close incompletely, allowing lots of air to pass through the glottis, not much energy is in the signal, as shown by fewer harmonics in the upper frequencies, and the sound has a very breathy quality. A large difference between the amplitude (measured in dB) of the fundamental frequency (H1) and the next harmonic (H2) indicates a breathy voice quality. Note, however, since position in the utterance, speaker, gender, F0, etc., affect vocal fold approximation, it is important to keep as many things as possible constant when measuring and comparing H1-H2 values. For more details about H1-H2, and how to compare breathiness of different vowels across different speakers and situations, please see Hanson (1997), but also see Iseli et al. (2007). 
OQ (open quotient), the open portion of the vibratory cycle as a proportion of the total cycle, can be estimated from H1-H2 values. A large H1-H2 indicates a large OQ and hence more breathiness. EGG (electroglottography, also referred to as laryngography) is a non-invasive way to measure OQ, by measuring the degree of contact between the vocal folds (see e.g., Rothenberg and Mahshie 1988). CQ (closed quotient) is the complementary measure: the proportion of the vibratory cycle during which the vocal folds are closed.
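The H1-H2 measure described above can be sketched in Python as follows. This is a minimal illustration on synthetic signals, assuming that F0 is already known, that the analysis window spans a whole number of periods, and that no correction for formant influence is applied (Hanson 1997 and Iseli et al. 2007 discuss such corrections). The function names and the harmonic amplitudes of the two toy voices are hypothetical.

```python
import math

def harmonic_db(signal, freq, sr):
    """Level (dB) of the sinusoidal component of `signal` at `freq`."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * freq * i / sr) for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq * i / sr) for i, s in enumerate(signal))
    amplitude = 2.0 * math.hypot(re, im) / n
    return 20.0 * math.log10(amplitude)

def h1_minus_h2(signal, f0, sr):
    """Difference (dB) between the first and second harmonic levels."""
    return harmonic_db(signal, f0, sr) - harmonic_db(signal, 2 * f0, sr)

def toy_voice(f0, harmonic_amps, sr=16000, n=1600):
    """Sum of harmonics of f0 with the given linear amplitudes."""
    return [sum(a * math.sin(2 * math.pi * (k + 1) * f0 * i / sr)
                for k, a in enumerate(harmonic_amps))
            for i in range(n)]

SR = 16000
F0 = 200
breathy = toy_voice(F0, [1.0, 0.25, 0.05, 0.01], SR)  # strong H1, fast roll-off
modal = toy_voice(F0, [1.0, 0.8, 0.6, 0.4], SR)       # energy sustained upward

print(round(h1_minus_h2(breathy, F0, SR), 1))  # 12.0
print(round(h1_minus_h2(modal, F0, SR), 1))    # 1.9
```

The breathy-like toy signal yields the larger H1-H2 value, in line with the relation between H1-H2, OQ, and breathiness described above. On real recordings, F0 would first have to be estimated, and the cautions about speaker, vowel, and position in the utterance apply.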

In addition to breathy voice quality, there are many other source-related voice qualities, such as modal voice quality. One approach to acoustically assessing breathy vs. modal quality is to measure the difference between the amplitude (dB) of the fundamental frequency (H1) and that of the third formant (A3). Small H1-A3 values are linked to a more tense, stronger modal voice (Hanson 1997) because they show that there are still high energy levels in the higher part of the spectrum. One approach to conceptually understanding harmonic amplitudes is via Hirano’s Body-cover Theory of Phonation (Hirano 1982). Modal voice involves vibration of both the cover and body of the vocal folds; falsetto, only the cover, or epithelium. When both cover and body vibrate, the vocal folds are thicker, the vocalis muscle is dominant, and more mass is vibrating, resulting in abrupt changes in the airflow, which produce a louder sound with more energy in the harmonics in the higher frequency range; when only the cover vibrates, the cricothyroid is dominant, and there is greater longitudinal tension of the ligament and faster vibration, resulting in less abrupt changes in airflow and producing a sound that is higher in pitch and softer, with less energy in the harmonics of the high frequency range. For modal (body-cover phonation) voice, H1-A3 values are smaller than for falsetto (cover-only phonation), as are H1-H2 values (see e.g., Keating and Esposito 2007; Henrich et al. 2005). Female voices tend to be more breathy than male voices, because women have thinner vocal folds and also, anatomically, often have a chink (incomplete closure) toward the posterior of the vocal folds (e.g., Schneider-Strickler and Bigenzahn 2003). Spectral tilt, the attenuation pattern of harmonic energy in the higher frequencies, can also be assessed by measuring H1-A3.
Falsetto, like breathy voice, shows a steep drop-off in the energy of the harmonics, reflected in large H1-A3 values in addition to large H1-H2 values; modal voice, like tense voice, shows a pattern of more sustained energy in the harmonics, reflected in small H1-A3 values in addition to small H1-H2 values. Long-term acoustic measurements of voice quality, such as the Hammarberg Index, may be less prone to measurement error linked to various factors, and so are useful for analyzing large databases of, e.g., emotional or dialectal speech (e.g., Eyben et al. 2015).
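As a rough illustration of a long-term spectral measure, the following Python sketch computes a Hammarberg-style index. It assumes the commonly given definition (the level of the strongest spectral peak in the 0-2 kHz band minus that of the strongest peak in the 2-5 kHz band, in dB) and takes an already computed spectrum as input. The toy spectra, with harmonic levels falling by a fixed number of dB per octave and no formants, are a deliberate simplification, and the function names are hypothetical.

```python
import math

def hammarberg_index(spectrum):
    """spectrum: list of (frequency_hz, level_db) pairs, e.g. a long-term average.

    Returns the strongest level below 2 kHz minus the strongest level
    between 2 and 5 kHz, in dB.
    """
    low = [db for f, db in spectrum if 0 <= f < 2000]
    high = [db for f, db in spectrum if 2000 <= f <= 5000]
    if not low or not high:
        raise ValueError("spectrum must cover both the 0-2 kHz and 2-5 kHz bands")
    return max(low) - max(high)

def toy_spectrum(f0, tilt_db_per_octave, max_freq=5000):
    """Harmonic levels falling by a fixed number of dB per octave above f0."""
    n_harmonics = int(max_freq // f0)
    return [(k * f0, -tilt_db_per_octave * math.log2(k))
            for k in range(1, n_harmonics + 1)]

tense_like = toy_spectrum(200, 6)     # shallow tilt: energy sustained upward
breathy_like = toy_spectrum(200, 12)  # steep tilt: energy drops off quickly

print(round(hammarberg_index(tense_like), 1))    # 19.9
print(round(hammarberg_index(breathy_like), 1))  # 39.9
```

The steeper the spectral tilt, the larger the index, which parallels the behavior of H1-A3 described above. Feature extractors such as openSMILE compute measures of this kind frame by frame on real speech; the sketch only shows the arithmetic.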

The story of vocal fold adduction/abduction is a complicated one, involving a number of laryngeal muscles together with air flow dynamics, etc. For a description of laryngeal muscles, see e.g., Hirose (2010) and Sawashima and Hirose (1983); for a description of air flow dynamics, see e.g., Stevens (2000; chapter 2). Also, for a good introductory book (in Portuguese) on acoustic phonetics, see Barbosa & Madureira (2015). For a comprehensive laryngoscopic description of voice quality changes such as creakiness, vocal fry, whisper, harshness, tenseness, etc., see Esling et al. (2020). For more detailed descriptions of the acoustic characteristics of various voice qualities and how to measure them, see Gobl and Ní Chasaide (2010), and also Kreiman and Gerratt (2000). However, the less technologically sophisticated researcher can take advantage of free and open source acoustic analysis tools such as Praat¹ (Boersma 2001) or openSMILE (www.audeering.com; Eyben & Schuller 2015).

Filter-related voice quality changes

Filter-related changes are acoustically evidenced as locations of increased energy of the harmonics in the spectrum. The shape of the vocal tract causes certain harmonics to be amplified, while others are reduced. Vowel formant frequencies, or resonant frequencies of the vocal tract, appear as amplified harmonics, resulting from the positioning of the jaw and tongue for producing specific vowels. Narrowing or expanding areas of the vocal tract, as well as lengthening or shortening the vocal tract, can change the amplification of spectral energy, resulting in changes in voice quality. The singing formant found in opera voice quality, in which singers narrow a region above the larynx to produce a boost of energy around 3 kHz (Sundberg 1974), is an example of this. Using singing protocols is one approach to better understanding the source-filter interactions that contribute to voice quality changes (e.g., Miller and Schutte 2005; Titze et al. 2003; Henrich et al. 2005; Henrich-Bernardoni et al. 2014), since singers can control their laryngeal and supralaryngeal articulations better than untrained subjects. An articulatory-acoustic study of twang (Erickson, Yun et al. 2020), using a subject trained in the Estill method of voice (Estill et al. 2017), reports that twang voice quality is associated with a lateral-to-medial narrowing in the pharynx, resulting in increased energy in the 3-4 kHz range, as well as in the range above 6 kHz. Story et al. (2001) reported that twang quality is associated with a shortened vocal tract and more widely opened lips, along with a narrowed oral cavity, contrasting with a yawny voice quality, produced with the oral cavity widened and the vocal tract lengthened. Another study on twang-type (metallic) voice quality, using fiberscopic video pharyngolaryngoscopy, describes adjustments of the velopharynx, pharynx, and larynx with concomitant acoustical changes in formant pattern (Hanayama et al. 2006).
Twang, a quality of singing associated with American and Brazilian country-style singing, is also found in various speaking styles and ethnophonetic voices, such as Japanese cake seller voices (Sadanobu et al. 2016).

Source-filter interactions

The acoustic analysis of voice qualities is complicated by the fact that the source and filter are not totally independent; they interact (e.g., Fant and Lin 1987). Modeling studies using inverse filtering, such as those based on the LF model (Fant 1995) or the ARX-LF model (Li et al. 2019), offer an approach to estimating source and filter contributions to voice quality. The LF model parameters describe the overall shape of the glottal pulse, and can characterize, for instance, tense, modal, creaky and whispery voice qualities (see e.g., Gobl and Ní Chasaide 2010). In order to reduce the source-tract interactions, the ARX-LF model first applies the LF model and then estimates the glottal source and vocal tract shape simultaneously; the resulting parameters are used to describe various voice qualities that arise from interactions between source and filter. An ARX-LF study in conjunction with MRI reported oral cavity differences in two /i/ vowels produced at 500 Hz at mid larynx position with two different vocal fold masses (Erickson, Takano et al. 2020). Specifically, for the thick-fold chest voice phonation, the vocal tract area was larger, with a more open mouth and a more raised, fronted, bunched tongue, resulting in a larger oropharynx and posterior oral cavity. Titze and Story (1997), using modeling to calculate source together with vocal tract filter shapes for a variety of geometries, report that narrowing of the epilaryngeal area interacts with the glottal source, showing an increase in the amplitude of the glottal signal. Understanding the interactive contributions of source and filter to voice quality changes continues to be a challenging field of research.

What is the importance of voice quality?

Voice quality is an intrinsic aspect of communication. Successful businessmen, politicians, and charismatic speakers employ voice quality to sell their products and ideas (e.g., Niebuhr 2020). Martin Luther King’s voice was “more harmonic” than that of the average male speaker in terms of harmonic-to-noise ratio (HNR) (https://www.quantifiedcommunications.com/blog/martin-luther-king-i-have-a-dream). Communication of emotions, both positive and negative, involves voice quality changes: hot angry voices are usually characterized as tense (no breathiness), with high energy in the upper frequencies (elevated spectral tilt), while sad voices have increased breathiness and little energy in the upper frequencies (steep drop-off of spectral tilt) (e.g., Banse and Scherer 1996; Gobl and Ní Chasaide 2003). Various types of social interactions, including politeness, surprise, sincerity, etc., are expressed with different types of voice qualities; for instance, politeness generally has a breathy quality, with a large OQ (e.g., Ito 2004 for Japanese politeness; Brown et al. 2014, Winter and Grawunder 2011 for Korean politeness). However, breathiness can be associated with a large number of affects, including happiness, surprise, etc. (e.g., Ishi et al. 2008). Even our personalities are expressed through our voice qualities; for instance, agreeable personalities were found to have clearer voices, neither breathy nor hoarse (Erickson, Rilliard et al. 2018). An interesting study with synthetic speech, varying spectral tilt and OQ, among other parameters, and working within the framework of Laver’s voice qualities, examined acoustic and perceptual characteristics of affective expressions. It found that Indignation and Fearlessness were associated with variations of tense voice, and Bored and Sad with variations of lax/creaky and whispery voice (Yanushevskaya et al. 2018).
Moreover, language background (i.e., English, Russian, Spanish, or Japanese) affected speakers’ sensitivities to voice quality parameters. With regard to cross-linguistic differences in the perception of voice quality, a cross-cultural study on Japanese “cake-seller” voices reported that Japanese listeners prefer a voice with a slight twang (pharyngeal narrowing), while listeners from India prefer a voice without any twang (Erickson, Sadanobu et al. 2018). As for “seductive, flirtatious” voices, Japanese listeners prefer a non-breathy voice with high F0, while American, French, and Brazilian Portuguese listeners prefer a lower, more breathy voice (Rilliard et al. 2013; Rilliard and de Moraes 2017; Erickson, Kawahara et al. 2020). See also https://www.youtube.com/watch?v=IcT29r33yB0 from 4’59’’ for the famous sensual voice of a well-known Brazilian female airport announcer making airport announcements; the same is observed at French airports (Léon 1993).
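The harmonic-to-noise ratio (HNR) mentioned above in connection with “more harmonic” voices can be sketched in Python as follows. The sketch assumes an autocorrelation-based estimate of the general kind used in tools such as Praat: the normalized autocorrelation r at a lag of one fundamental period is converted to 10·log10(r/(1-r)). It also assumes that the period is known in advance. The test signals (a sine wave plus Gaussian noise) and all names are hypothetical, and far simpler than real voices.

```python
import math
import random

def hnr_db(signal, period):
    """HNR estimate from the normalized autocorrelation at a lag of one period."""
    a = signal[:-period]
    b = signal[period:]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    r = num / den
    r = min(max(r, 1e-6), 1.0 - 1e-6)  # keep the logarithm finite
    return 10.0 * math.log10(r / (1.0 - r))

def noisy_tone(f0, noise_sd, sr=16000, n=4000, seed=0):
    """Sine wave at f0 plus Gaussian noise; a stand-in for a voiced sound."""
    rng = random.Random(seed)
    return [math.sin(2 * math.pi * f0 * i / sr) + rng.gauss(0.0, noise_sd)
            for i in range(n)]

SR = 16000
F0 = 200
PERIOD = SR // F0  # 80 samples per fundamental period

clear_voice = noisy_tone(F0, noise_sd=0.05)   # little noise: "more harmonic"
hoarse_voice = noisy_tone(F0, noise_sd=0.30)  # more noise: "less harmonic"
hnr_clear = hnr_db(clear_voice, PERIOD)
hnr_hoarse = hnr_db(hoarse_voice, PERIOD)

print(round(hnr_clear, 1), round(hnr_hoarse, 1))
```

With these assumed noise levels, the estimates should fall close to the ratios of signal power to noise power built into the toy signals (about 23 dB and 7 dB), with the less noisy signal giving the higher HNR.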

Voice quality analysis is useful for auditorily describing voice qualities in clinical, forensic, dialectal, and other settings, including describing the voice quality characteristics of anime villains and heroes (e.g., Teshigawara 2004). The Voice Profile Analysis (VPA), based on Laver’s work, is a method for auditorily describing voice qualities (see e.g., Segundo & Mompean 2017; Segundo et al. 2019; Camargo et al. 2019), with raters showing good agreement in judging voice qualities (e.g., Hughes and Kavanagh 2019).

Other Topics

Voice quality topics are myriad, ranging from linguistic (phonological, phonetic, prosodic) descriptions to clinical pathological uses to psychological, emotional and therapeutic applications (see e.g., Gobl and Ní Chasaide 2010). There is no way that a short summary of voice quality can do justice to this complicated topic. I have offered here some tidbits that I think are important, in the hope they might inspire readers to be intrigued enough to follow up with more in-depth research into voice quality. My sincere apologies to the many excellent researchers on voice quality not mentioned here.

Acknowledgments

I am grateful to Professor Tommaso Raso for asking me to write this article about voice quality; I also thank Professor Albert Rilliard for his valuable comments and suggestions.

Notes

¹For manuals, see e.g., those available on the Praat website: https://www.praat.org, or https://web.stanford.edu/dept/linguistics/corpora/material/PRAAT_workshop_manual_v421.pdf

Introductory Bibliography

Banse, R. & Scherer, K. (1996). Acoustic profiles in vocal emotion expression. Journal of Personality and Social Psychology, 70(3), 614-636.

Boersma, P. (2001). Praat, a system for doing phonetics by computer. Glot International 5:9/10, 341-345.

Brown, L., Winter, B., Idemaru, K., Grawunder, S. (2014). Phonetics and politeness: Perceiving Korean honorific and non-honorific speech through phonetic cues. Journal of Pragmatics 66, 45–60.

Camargo, Z., Madureira, S., dos Reis, N., and Rilliard, A. (2019). The phonetic approach of voice qualities: challenges in corresponding perceptual to acoustic descriptions. Tools and Resources for Speech Sciences Universidad de Málaga

Colton, R.H., Estill J. A. (1981). Elements of voice quality: perceptual, acoustic and physiologic aspects. Speech and Language: Advances in Basic Research and Practice. New York, NY: Academic Press Inc.; 311–403.

Esling, J. H., Moisik, S. R., Benner, A., Crevier-Buchman, L. (2020). Voice Quality: The Laryngeal Articulator Model Cambridge Studies in Linguistics, Series Number 162.

Estill, J., Steinhauer, K., & McDonald, M. (2017). The Estill Voice Model: Theory and Translation. Estill Voice International, LLC: Pittsburgh PA.

Erickson, D., Rilliard, A., de Moraes, J., Shochi, T. (2018). On the varying reception of speakers expressivity across gender and cultures, and inference in their personalities. In (eds: Qiang Fang, Jianwu Dang, Pascal Perrier, Jianguo Wei, Longbiao Wang, Nan Yan) Studies on Speech Production (pp. 3-13). Springer Publishers.

Erickson, D., Sadanobu, T., Zhu, C., Obert, K., Daikuhara, H. (2018). Exploratory study in ethnophonetics: Comparison of cross-cultural perceptions of Japanese cake seller voices among Japanese, Chinese and American English listeners. Speech Prosody 2018.

Erickson, D., Kawahara, S., Rilliard, A., Hayashi, R., Sadanobu, T., Li, Yongwei., Daikuhara, H., de Moraes, J., Obert, K. (2020). Cross cultural differences in arousal and valence perceptions of voice quality, Speech Prosody 2020.

Erickson, D., Yun, J., Gao, J., and Obert, K. (2020). Interaction between phonation mode and pharyngeal narrowing: A pilot EGG study. International Seminar of Speech Production 2020.

Erickson, D., Takano, S., Li, Y., Gao, J., Kawahara S., Obert K., Takahashi K., and Akagi M. (2020). Source and filter contributions to voice quality differences. International Seminar of Speech Production 2020.

Fujimura, O., Erickson, D., Bauer, H. (1990). Non-F0 correlates of prosody in free conversation. J. Acoust. Soc. Am., 88, S128.

Gobl, C. and Ní Chasaide, A. (2003). The role of voice quality in communicating emotion, mood and attitude. Speech Communication, 40, 189–212.

Hirose, H. (2010). Investigating the Physiology of Laryngeal Structures in William J. Hardcastle (ed.), John Laver (ed.), Fiona E. Gibbon (ed.) The Handbook of Phonetic Sciences, Wiley-Blackwell, pp. 130-152.

Ishi, C. T., Ishiguro, H., Hagita, N. (2008). The roles of breathy/whispery voice qualities in dialogue speech, Speech Prosody 2008, Campinas, Brazil.

Ito, M. (2004). Politeness and voice quality – The alternative method to measure aspiration noise, Speech Prosody 2004, 213-216.

Keating, P. A. and Esposito, C. (2007). Linguistic voice quality. UCLA Working Papers in Linguistics, No. 105, 85-91

Laver, J. (1980). The Phonetic Description of Voice Quality, Cambridge University Press. ISBN 0-521-23176-0.

Léon, P. (1993). Précis de phonostylistique, Paris, Nathan.

Niebuhr, O. (2020). "Space fighters" on stage - How the F1 and F2 vowel-space dimensions contribute to perceived speaker charisma. Proc. 31st Conference on Electronic Processing of Speech Signals, Magdeburg, Germany, 1-12.

Rilliard, A., Erickson, D., Shochi, T., de Moraes, J.A. (2013). Social face to face communication – American English attitudinal prosody. Proceedings of Interspeech 2013, Lyon, 1648-1652.

Rilliard, A. and de Moraes, J. A. (2017). Social affective variations in Brazilian Portuguese: a perceptual and acoustic analysis, Revista de Estudos da Linguagem, Belo Horizonte, vol. 25.3, pp. 1043-1074, DOI: 10.17851/2237-2083.25.3.1043-1074.

Sadanobu, T., Zhu, C., Erickson, D. & Obert, K. (2016). Japanese ‘street seller’s voice’. Proc. Mtgs. Acoust. 29, 060003, doi: 10.1121/2.0000404.

Schneider-Strickler, B. and Bigenzahn, W. (2003). Influence of Glottal Closure Configuration on Vocal Efficacy in Young Normal-speaking Women. J. of Voice, 17.4, pp. 468-80.

Sundberg, J. (1974). Articulatory interpretation of the ‘singing formant’. J. Acoust. Soc. Am. 55, 834-44.

Teshigawara, M. (2004). Vocally expressed emotions and stereotypes in Japanese animation: voice qualities of the bad guys compared to those of the good guys. J. Phonet Soc. Jpn, 8, 60-76.

Winter, B., Grawunder, S. (2011). The polite voice in Korean: searching for acoustic correlates of contaymal and panmal. In: Sohn, H., Cook, H.M., O’Grady, W., Serafim, L.A., Cheon, S. (Eds.), Japanese/Korean Linguistics 19. CSLI, Stanford, pp. 419-431.

Advanced Bibliography

Eyben, F., & Schuller, B. (2015). openSMILE:) The Munich open-source large-scale multimedia feature extractor. ACM SIGMultimedia Records, 6(4), 4-13.

Eyben, F., Scherer, K. R., Schuller, B. W., Sundberg, J., André, E., Busso, C., Devillers, L., Epps, J. Laukka, P., Narayanan, S., and Truong, K. P. (2015). The Geneva minimalistic acoustic parameter set (GeMAPS) for voice research and affective computing. IEEE transactions on affective computing, 7(2), 190-202.

Fant, G. (1970). Acoustic theory of speech production (2 ed.). Mouton: the Hague.

Gobl, C. and Ní Chasaide, A. (2010). Voice Source Variation and Its Communicative Functions in William J. Hardcastle (ed.), John Laver (ed.), Fiona E. Gibbon (ed.) The Handbook of Phonetic Sciences, Wiley-Blackwell, pp. 378-423.

Hanayama, E. M., Camargo, Z. A., Tsuji, D. H., and Pinho, S. M. R. (2006). Metallic voice: Physiological and acoustic features. J. of Voice, 23.1, pp 62-70.

Hanson, H. M. (1997). Glottal characteristics of female speakers: Acoustic correlates. J. Acoust. Soc. Am., 101.1, 466–481. doi: 10.1121/1.417991

Henrich, N., d’Alessandro, C., Castellengo, M., & Doval, B. (2005). Glottal open quotient in singing: Measurements and correlation with laryngeal mechanisms, vocal intensity, and fundamental frequency, J. Acoust. Soc. Am. 117, 1417–1430.

Henrich-Bernardoni, N., Smith, J., & Wolfe, J. (2014). Vocal Tract resonances in singing: variation with laryngeal mechanism for male operatic singers in chest and falsetto registers, J. Acoust. Soc. Am. 135, 491-501.

Hirano, M. (1982). The role of the layer structure of the vocal fold in register control, Vox Humana, University of Jyvaskyla, pp. 50–62.

Hughes, V., and Kavanagh, C. (2019). The use of the vocal profile analysis for speaker characterization: Methodological proposals. Journal of the International Phonetic Association, 49.3, pp. 353-380.

Iseli, M., Shue, Y.-L., Alwan, A. (2007). Age, sex, and vowel dependencies of acoustic measures related to the voice source. J. Acoust. Soc. Am. 121, 2283; doi: 10.1121/1.2697522

Kreiman, J., Gerratt B. (2000). Measuring vocal quality. In R. D. Kent & M. J. Ball (Eds.), Voice quality measurement. San Diego: Singular Publishing Group, pp. 73–101.

Miller, D. G. & Schutte, H. K. (2005). Mixing the registers: Glottal source or vocal tract? Folia Phoniatr. Logop. 57, 278–291.

Rothenberg, M and Mahshie, J.J. (1988). Monitoring vocal fold abduction through vocal fold contact area. J Speech Hear Res. 31 (3): 338–51. doi:10.1044/jshr.3103.338.

Sawashima, M. & Hirose, H. (1983). Laryngeal gestures in speech production. In P. F. MacNeilage (ed.), The Production of Speech (pp. 11–38). New York: Springer.

Segundo, E. S. and Mompean, J. A. (2017). A simplified vocal profile analysis protocol for the assessment of voice quality and speaker similarity. J of Voice, 31.5, 644-661.

Segundo, E. S., Foulkes, P., French, P. Harrison, P., Hughes, V. and Kavanagh, C. (2019). The use of the Vocal Profile Analysis for speaker characterization: Methodological proposals. Journal of the International Phonetic Association, 49.3, 353- https://doi.org/10.1017/S0025100318000130

Stevens, K. N. (2000). Acoustic Phonetics. Cambridge: MIT Press.

Story, B.H., Titze, I.R. & Hoffman, E. A. (2001). The relationship of vocal tract shape to three voice qualities, J. Acoust. Soc. Am. 109, 1651-1667.

Titze I. R. & Story B. (1997). Acoustic interaction of the voice source with the lower vocal tract. J Acoust Soc Am. 101.4, 2234–2243.

Titze, I. R., Bergan, C. C., Hunter, E.J. and Story, B. (2003). Source and filter adjustments affecting the perception of the vocal qualities twang and yawn, Logopedics Phoniatrics Vocology, 28:4, 147 – 155. DOI: 10.1080/14015430310018874.

Yanushevskaya, I., Gobl, C. and Ní Chasaide, A. (2018). Cross-language differences in how voice quality and f₀ contours map to affect. J. Acoust. Soc. Am., 144, 2730; https://doi.org/10.1121/1.5066448

Super Advanced Bibliography

Li, Y., Sakakibara, K. I. & Akagi, M. (2020). Simultaneous Estimation of Glottal Source Waveforms and Vocal Tract Shapes from Speech Signals Based on ARX-LF Model. J Sign Process Syst 92, 831–838. https://doi.org/10.1007/s11265-019-01510-4

Fant, G. (1995). The LF-model revisited. Transformations and frequency domain analysis. Speech Trans. Lab. Q. Rep., Royal Inst. of Tech. Stockholm, 2(3), 119–156.

Fant, G., & Lin, Q. G. (1987). Glottal source-vocal tract acoustic interaction, Speech Transmission Laboratory–Quarterly Progress and Status Report, 1/1987 (Royal Institute of Technology, Stockholm, Sweden), pp. 13–27.