Mathematical models in Phonetics
Michael O'Dell | University of Helsinki

Mathematical models represent simplifications of real world processes and phenomena for the purpose of increasing understanding. Simpler models are easier to understand, but more complex models give a better fit to the complex processes of the real world. Modeling is thus always a compromise between fidelity and understanding. Mathematical modeling can be seen as an additional step after delimiting and idealizing a real world phenomenon of interest into a conceptual model. Turning a conceptual model into a precise mathematical model means further abstraction and symbolic representation as mathematical objects for the purpose of applying established mathematical theory to obtain conclusions or predictions. These conclusions are often not immediately or intuitively obvious, which is why using mathematical models can amplify our understanding of the phenomena involved. The conclusions and predictions of the model can then be compared with fresh data, possibly leading to new observations which otherwise might have gone unnoticed. Often this comparison leads to a new round of modeling focused on locating and revising the aspects of the model responsible for observed discrepancies between predictions of the model and real world data.

The application of mathematics (as a general science of patterning) can be seen as looking for familiar patterns in novel instances or settings. Often the cyclical modeling process starts from familiarity with the patterns of various branches of mathematics in general and proceeds by considering whether there might be sensible applications in the specific empirical field of study. For example, in the history of phonetics, applying the mathematical theory used for mechanical springs in physics to thinking about speech production has led to characterizing articulatory movements using concepts such as stiffness and damping [cf. eg. 21, 4].
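
As a minimal sketch of the spring analogy, the following Python fragment integrates a damped mass-spring equation, m·x'' = -k·(x - target) - b·x', for a single articulator position. The function name, the parameter values, and the integration scheme are illustrative assumptions and are not taken from [21] or [4].

```python
import math

def simulate_gesture(target, stiffness, damping, mass=1.0, x0=0.0,
                     dt=0.001, duration=1.0):
    """Integrate m*x'' = -k*(x - target) - b*x' with semi-implicit Euler."""
    x, v = x0, 0.0
    trajectory = [x]
    for _ in range(int(round(duration / dt))):
        a = (-stiffness * (x - target) - damping * v) / mass
        v += a * dt   # update velocity first ...
        x += v * dt   # ... then position, using the new velocity
        trajectory.append(x)
    return trajectory

k = 400.0                                # hypothetical stiffness value
b_critical = 2.0 * math.sqrt(k * 1.0)    # critical damping for mass 1.0
critical = simulate_gesture(1.0, k, b_critical)
underdamped = simulate_gesture(1.0, k, 0.2 * b_critical)
```

In this sketch the stiffness controls how quickly the position approaches its target, while the damping decides whether it settles smoothly (critical damping) or overshoots and oscillates (weaker damping).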


Because the study of speech is primarily concerned with concrete events and processes occurring at many different time scales, models of speech generally have a dynamical character. The mathematical theory of dynamical systems is very well developed, so there are many advantages to be gained by applying the tools of dynamical systems analysis to modeling real world phenomena including speech. Dynamical systems theory includes many useful concepts such as phase portrait, trajectory, and bifurcation. It also provides a compatible language in which to describe both physical and mental phenomena and their interaction [cf. 11, especially Ch. 5, Intentional Dynamics]. A very accessible classic introduction to dynamical systems theory with a minimum of formal mathematics and emphasizing visualization techniques is [1]. In phonetics, dynamical systems theory is an integral part of the very successful theory of Task Dynamics and Articulatory Phonology [cf. eg. 5, 21, 7, 23].

A dynamical system can be thought of in terms of two components: a state space consisting of all the possible states the system could ever be in at any single time, combined with a rule associated with each possible state telling which state the system will be in next. One of the ways of characterizing such systems is to ask about long term behaviour. For instance there may be attracting states that the system tends to approach or states the system tends to avoid. A slightly more complex object is the so called saddle point, a state which is attracting from some directions, but repellent in other directions.
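
The two components can be sketched in a few lines of Python: a state is a pair of numbers, and the rule maps each state to a direction of change. The particular rule below (a saddle point at the origin), the step size, and the helper names are assumptions chosen for illustration only.

```python
def step(state, rule, dt=0.01):
    """Advance the state by one small time step (forward Euler)."""
    x, y = state
    dx, dy = rule(x, y)
    return (x + dt * dx, y + dt * dy)

def saddle_rule(x, y):
    """Rule of change with a saddle point at the origin."""
    return (-x, y)   # attracting along x, repelling along y

def classify_linear_fixed_point(a, b, c, d):
    """Classify the origin of x' = a*x + b*y, y' = c*x + d*y."""
    trace, det = a + d, a * d - b * c
    if det < 0:
        return "saddle"       # eigenvalues of opposite sign
    if det == 0:
        return "degenerate"
    if trace < 0:
        return "attractor"
    if trace > 0:
        return "repeller"
    return "center"

# Start close to the attracting direction: the state first approaches
# the saddle along x and is then carried away from it along y.
state = (1.0, 0.001)
for _ in range(500):
    state = step(state, saddle_rule)
```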

A concept from dynamical systems theory which has proved useful in phonetics is the so called limit cycle: a repetitious trajectory or state sequence that the system tends to settle into. Such oscillatory behavior is ubiquitous in speaking, from turn taking in conversation to the repetitious vibration of the vocal folds during voiced portions of speech. Quite complex behavior can arise in dynamical systems where several limit cycles interact (“coupled oscillators”), possibly leading to roughly synchronous behavior with complex rhythms [cf. eg. 19]. Coupled oscillator models have been used to model several aspects of speech timing [cf. eg. 3, 2, 15, 16].
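
A generic illustration of coupling is a pair of phase oscillators that each adjust their speed according to the phase difference between them (a Kuramoto-type model). This is only a sketch under assumed frequencies and coupling strengths; it is not the specific model of [3], [2], [15], or [16].

```python
import math

def phase_difference(freq1_hz, freq2_hz, coupling, dt=0.001, duration=20.0):
    """Return the final phase difference (radians) of two coupled oscillators."""
    w1 = 2.0 * math.pi * freq1_hz
    w2 = 2.0 * math.pi * freq2_hz
    th1, th2 = 0.0, 0.0
    for _ in range(int(round(duration / dt))):
        d1 = w1 + coupling * math.sin(th2 - th1)   # oscillator 1 pulled toward 2
        d2 = w2 + coupling * math.sin(th1 - th2)   # oscillator 2 pulled toward 1
        th1 += d1 * dt
        th2 += d2 * dt
    return th2 - th1

locked = phase_difference(2.0, 2.3, coupling=2.0)     # strong coupling
drifting = phase_difference(2.0, 2.3, coupling=0.5)   # weak coupling
```

With strong enough coupling the phase difference settles to a constant value, so the two oscillators run at a common frequency. With weak coupling the faster oscillator keeps gaining on the slower one.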

An area of dynamical systems theory which has received a great deal of attention recently (for instance in the fields of machine learning and cognitive processing) concerns systems with many saddle points chained together into so called heteroclinic networks (also known as winnerless competition models). The neighborhoods of saddle points in such a system act as temporarily stable (or metastable) macro states: behavior which is initially attracted towards such a state is eventually repelled towards the next saddle point and its corresponding region of state space. Interestingly, such a system is capable of (re)producing different sequences of behavior based on small changes in its parameters, leading to the marriage of reproducibility and flexibility of transient behavior [20]. Heteroclinic networks have not as yet received much attention in phonetics research, but perhaps the time is ripe. Applied to speech, it is easy to think of the saddle points as corresponding to sequential phonetic categories, such as consonants and vowels, syllables, articulatory gestures, etc. In a heteroclinic network the saddle point targets are approached (and corresponding categories realized) in sequence, but transitions between categories are continuous, just as in real world speech.
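
A toy version of such a network can be written as three competing units with asymmetric inhibition (a generalized Lotka-Volterra system). The sketch below is only illustrative: the parameter values, the small constant input `eps`, and the function name are assumptions and do not reproduce any particular model from [20].

```python
def winnerless_competition(alpha=2.0, beta=0.5, eps=1e-6,
                           dt=0.01, duration=300.0):
    """Return the order in which three competing units become dominant.

    Each unit follows
        dx_i/dt = x_i * (1 - x_i - alpha * x_next - beta * x_prev) + eps
    where eps is a small constant input that keeps activities above zero.
    """
    x = [0.6, 0.3, 0.1]
    winners = [0]                      # unit 0 is dominant at the start
    for _ in range(int(round(duration / dt))):
        new_x = []
        for i in range(3):
            nxt, prv = x[(i + 1) % 3], x[(i - 1) % 3]
            growth = 1.0 - x[i] - alpha * nxt - beta * prv
            new_x.append(x[i] + dt * (x[i] * growth + eps))
        x = new_x
        leader = x.index(max(x))
        if leader != winners[-1]:
            winners.append(leader)     # record each change of dominant unit
    return winners

sequence = winnerless_competition()
```

Each unit dominates for a while (the neighborhood of one saddle point) before activity passes continuously to the next unit, so the recorded sequence cycles through the units in a fixed order.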


General mathematical models include parameters which are unknown and need to be estimated from empirical data. One approach is to find values of the parameters which minimize an “error function” based on some measurable quantity (empirical data) compared with the corresponding values predicted by the model for given parameter values. This approach might be called a curve fitting approach, because empirical data can be seen as dictating certain points in a graph, and modeling as finding a curve to fit, or at least approximate the data points. Even though the model considered may be a deterministic model which does not formally incorporate randomness (i.e. it is not a stochastic model described using probability distributions), such a curve fitting approach implies at least some stochasticity (or randomness), at least if some (minimized) amount of error is tolerated without rejecting the model. At the very least it implies that the empirical measurements themselves are inexact and include some degree of uncertainty. A more explicit form of the model then would be a stochastic model in order to include this measurement uncertainty.
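
The curve fitting approach can be sketched for the simplest case, a straight line fitted by least squares. The data values and variable names below are invented purely for illustration (a hypothetical utterance duration as a function of the number of syllables).

```python
def sum_squared_error(intercept, slope, xs, ys):
    """Error function: squared distance between data and model predictions."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

def fit_line(xs, ys):
    """Closed-form least squares estimates for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Invented example data: number of syllables and total duration in ms.
syllables = [1, 2, 3, 4, 5]
durations_ms = [210.0, 390.0, 610.0, 790.0, 1000.0]

intercept, slope = fit_line(syllables, durations_ms)
best_error = sum_squared_error(intercept, slope, syllables, durations_ms)
```

The remaining error is not zero even at the best parameter values, which is exactly the tolerated discrepancy that implies a stochastic component.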

Another source of uncertainty is indeterminacy inherent in the process being modeled, in other words when the process itself is considered to be non-deterministic, so that future behavior cannot be exactly predicted in principle, even given exact knowledge of previous behavior. Of course, in practice, exact knowledge of previous behavior and all the myriad small influences of the environment is impossible, so there is also error (modeled as random) caused by the multitude of real world factors that have been excluded from the simplified model. This means that mathematical models need to have stochastic elements allowing the model to be fit to real data.

Bayesian analysis of models and data. If the error function used to find optimal values for model parameters is considered to represent a probability distribution, then curve fitting can be seen as an attempt to find the parameter values under which the observed data are most probable. In this setting the parameter values found are called maximum likelihood estimates (MLE).
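
One common way to make this connection concrete is to assume independent Gaussian measurement errors with a fixed variance. The symbols below (a model function f with parameters θ, data pairs (x_i, y_i), and error variance σ²) are notation introduced here for illustration, not taken from the text above.

```latex
\[
  y_i = f(x_i;\theta) + \varepsilon_i,
  \qquad \varepsilon_i \sim \mathcal{N}(0,\sigma^2)
\]
\[
  \log L(\theta)
  = -\frac{n}{2}\log\bigl(2\pi\sigma^2\bigr)
    - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(y_i - f(x_i;\theta)\bigr)^2
\]
\[
  \hat{\theta}_{\mathrm{MLE}}
  = \arg\max_{\theta}\,\log L(\theta)
  = \arg\min_{\theta}\,\sum_{i=1}^{n}\bigl(y_i - f(x_i;\theta)\bigr)^2
\]
```

Under this assumption, minimizing the sum of squared errors and maximizing the likelihood select the same parameter values.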

Rather than settling only for the most likely values, Bayesian analysis attempts to find a complete probability distribution characterizing the uncertainty after taking empirical data into account. This so called posterior distribution thus shows not only which parameter values are likely, but also the degree of uncertainty still remaining. In order to estimate this distribution, however, it is necessary to start with a prior distribution (characterizing the uncertainty before taking empirical data into account) in addition to the data. The resulting posterior distribution can be viewed as an updated version of this characterization taking the new data into consideration.
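
For one of the few cases where the posterior can be written down exactly, consider estimating a proportion from counts: with a Beta prior and binomial data, the posterior is again a Beta distribution. The counts and names below are invented for illustration.

```python
import math

def update_beta(prior_a, prior_b, successes, trials):
    """Posterior Beta parameters after observing binomial data."""
    return prior_a + successes, prior_b + (trials - successes)

def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(variance)

# Flat prior Beta(1, 1); invented data: 14 "long" responses in 20 trials.
prior = (1.0, 1.0)
posterior = update_beta(*prior, successes=14, trials=20)

prior_mean, prior_sd = beta_mean_sd(*prior)
post_mean, post_sd = beta_mean_sd(*posterior)
```

In this example the posterior Beta(15, 7) has its mean near the observed proportion and a much smaller standard deviation than the flat prior, reflecting the uncertainty that remains after seeing the data.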

In addition to measurement error, Bayesian analysis can be used to evaluate uncertainty associated with any unknown values in the model, including values of parameters at various hierarchical levels in the model, as well as missing data, once the prior distributions are specified. In practice, prior distributions are generally used which restrict values as little as possible (so called vague or uninformative prior distributions), so as to give the greatest weight possible to the data. For instance, Yoshida et al. [25] analyzed the results of perception experiments with Finnish and Japanese listeners using a Bayesian hierarchical logistic model with “vague priors that allowed a very broad range of values relative to the scale of the data, whereby the prior has minimal influence on the posterior distribution”. If the data being taken into account is at all relevant to the model in question, the posterior distribution is typically more exact (“narrower”, more informative) than the prior distribution. An excellent general reference for probability theory in a Bayesian setting is [9]. A recent exposition of Bayesian methods in phonetics can be found in [24].

Kruschke [12] is an excellent guide to carrying out Bayesian analysis in practice. Typically a mathematically precise determination of the required posterior distribution is intractable, except in very simple special cases. Instead, the posterior distribution is represented by a large sample of values from that distribution. Any information of interest about the posterior distribution (probability that some parameter is greater than zero, probability that one parameter is greater than another, etc.) can be estimated from this sample. Computationally there are efficient methods (notably so called Markov Chain Monte Carlo or MCMC) for generating the sample. The size of the generated sample, and thus the accuracy of the representation, is limited only by the computation time used and the amount of storage space available.
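
The idea of representing a posterior by a sample can be sketched with a random-walk Metropolis sampler, one of the simplest MCMC methods. This is a bare-bones illustration under assumed data (14 successes in 20 trials, flat prior) and assumed tuning values; practical analyses would normally rely on established software such as the tools described in [12].

```python
import math
import random

def log_posterior(theta, successes=14, trials=20):
    """Unnormalized log posterior of a proportion under a flat prior."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return (successes * math.log(theta)
            + (trials - successes) * math.log(1.0 - theta))

def metropolis(n_samples, step=0.2, burn_in=1000, seed=1):
    """Draw samples from the posterior with random-walk Metropolis."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta)
    samples = []
    for i in range(n_samples + burn_in):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio).
        if math.log(1.0 - rng.random()) < candidate - current:
            theta, current = proposal, candidate
        if i >= burn_in:
            samples.append(theta)
    return samples

samples = metropolis(20000)
posterior_mean = sum(samples) / len(samples)
prob_above_half = sum(s > 0.5 for s in samples) / len(samples)
```

Quantities such as the probability that the proportion exceeds 0.5 are then estimated simply as the fraction of sampled values satisfying the condition.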


In a deterministic dynamical system the rule associated with each state gives a unique result, which in turn means that if the system is in a particular state at a particular time, the entire future development will always be the same. From the point of view of the modeler, there may be uncertainty concerning the exact nature of the rules of change of a dynamical system, or concerning the exact state the system is in. Thus a statistical analysis is often appropriate, even in the case of a deterministic model.

Some models describe average behaviour in a system with so many interacting individuals that it is not feasible to follow the development of each individual. Instead the total population of individual behaviors is modeled using statistical tools. This strategy is used, for instance in physics to model the behavior of gases (thermodynamics), and in biology to model changes in animal populations. It is a possible strategy regardless of whether or not individual behaviour is considered to follow deterministic rules.

In phonetics, exemplar-theoretic models [cf. eg. 18, 10] apply this idea to explain how individual speakers can exhibit variable behavior apparently governed by probability distributions. The question has been raised, how can individuals store and manipulate such abstract distributions? Exemplar models are based on the idea that an individual accumulates a large collection of examples or exemplars, which can approximate a distribution (similar to the use of a large sample in Bayesian MCMC methods when exact probability distributions are intractable). Of course these exemplars (or traces of past experiences) are abstractions from the original concrete events, but it is assumed that they are rich in contextual information and associations. Therefore any statistical trend in the previous experience of the individual is automatically reflected in subsequent behavior, which is guided by the collection of previously accumulated exemplars. Among other effects, various phonetic categories emerge in the learner from relatively dense clouds of exemplars [for a recent example of this type of analysis, cf. 22]. Accumulated exemplars of past communicative experience are intimately connected to the communicative network an individual is immersed in, and in sociophonetics exemplar theory is readily extended from individuals to entire populations of speakers and their social interactions [cf. eg. the articles in 8].
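
A heavily simplified sketch of the exemplar idea follows: each category is just a list of remembered values, perception assigns a new token to the category whose exemplars are most similar and stores it, and production picks a remembered exemplar and adds noise. The durations, the similarity function, and all names are invented for illustration and do not reproduce the models of [18] or [10].

```python
import math
import random

# Invented exemplar clouds: remembered segment durations in ms.
exemplars = {
    "short": [55.0, 62.0, 70.0, 48.0, 66.0],
    "long": [140.0, 155.0, 128.0, 160.0, 147.0],
}

def categorize(token, memory, scale=20.0):
    """Pick the category whose exemplars are jointly most similar."""
    def activation(cloud):
        return sum(math.exp(-abs(token - e) / scale) for e in cloud)
    return max(memory, key=lambda label: activation(memory[label]))

def produce(label, memory, rng, noise_sd=5.0):
    """Produce a token by copying a random exemplar with some noise."""
    return rng.choice(memory[label]) + rng.gauss(0.0, noise_sd)

def perceive(token, memory):
    """Categorize a token and store it as a new exemplar."""
    label = categorize(token, memory)
    memory[label].append(token)
    return label

rng = random.Random(0)
produced = [produce("short", exemplars, rng) for _ in range(2000)]
produced_mean = sum(produced) / len(produced)
label_for_90 = perceive(90.0, exemplars)
```

Because production draws on the stored cloud, the distribution of produced tokens mirrors the distribution of remembered ones, and each newly stored token shifts that distribution slightly.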

A deeper form of stochasticity arises in dynamical models that contain inherent uncertainty regarding the future development from any state. In such a system the exact trajectory starting from a given state would vary from one instantiation to the next, even if all relevant parameters were known exactly. Whereas dynamical systems are often modeled with the help of differential equations, such stochastic systems may be modeled using stochastic differential equations. In some cases actual individual trajectories in such models can be thought of as belonging to an infinite-dimensional probability distribution, with each point in time representing a separate dimension (eg. in Gaussian process theory).

In the usual case of a continuous state space dynamical system, the number of states will be infinite, and the rule of change is a vector determining the direction and magnitude of change at each state. With a stochastic rule of change the system develops like a (drunken) walker that takes steps in random directions (and such systems are sometimes referred to as random walks), but more often in some directions (“downhill”) than others (“uphill”). Associated with state space then is a “landscape” of sorts that influences the direction of change without determining it completely.
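
The landscape picture can be sketched with a one-dimensional walker on a landscape with two valleys, simulated with the Euler-Maruyama scheme for a stochastic differential equation. The landscape V(x) = (x² - 1)² / 4, the noise level, and the function names are assumptions made for this illustration.

```python
import math
import random

def landscape_slope(x):
    """Derivative of the double-well landscape V(x) = (x**2 - 1)**2 / 4."""
    return x ** 3 - x

def random_walk(x0=0.0, noise=0.3, dt=0.01, duration=100.0, seed=0):
    """Simulate dx = -V'(x) dt + noise dW with the Euler-Maruyama scheme."""
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(int(round(duration / dt))):
        drift = -landscape_slope(x) * dt                    # downhill tendency
        kick = noise * math.sqrt(dt) * rng.gauss(0.0, 1.0)  # random step
        x += drift + kick
        path.append(x)
    return path

noisy = random_walk()
in_valleys = sum(abs(x) > 0.5 for x in noisy) / len(noisy)
deterministic = random_walk(x0=0.1, noise=0.0)
```

With the noise switched off, the walker simply slides to the bottom of the nearest valley. With noise, it spends most of its time near one of the valley floors without its path being fully determined.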

In more complex models it is even possible for the landscape itself to exhibit dynamical behaviour, in other words for past behavior to influence the form of the landscape in the future. An early example of such a model is Grassé’s theory of stigmergy [6], first used to model insect path building behavior. Lam has called such models active walker models [13, 14] in opposition to other random walk models with an immutable landscape. The essential idea is that although the landscape influences action, so that some actions (or routes) are more attractive than others, past actions also change the landscape (locally), increasing the attractiveness of those actions in the future. Applied to language typology, such models could offer an explanation as to how it is possible for some languages to exhibit relative stability of nonoptimal phonetic patterns (such as large V.O.T. values for stops in a language where voicing and aspiration do not distinguish meaning). In addition to ease of articulation and ease of perception, phonetic behavior can be seen as shaped by ease of repetition—just as a naturally evolved path from A to B does not necessarily represent the theoretically optimal route between them.
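
The essential feedback loop can be illustrated with a very small urn-style simulation: two routes, one intrinsically slightly less attractive, where every use of a route increases its attractiveness. This toy model, including its parameter values and names, is an assumption made for illustration; it is not Lam's formulation [13, 14] or Grassé's [6].

```python
import random

def active_walk(steps=500, initial=(1.0, 1.2), reinforcement=1.0, rng=None):
    """Return the share of walks that took route 0.

    Route 0 starts out less attractive than route 1, but each use of a
    route adds `reinforcement` to that route's attractiveness.
    """
    rng = rng or random.Random()
    attractiveness = list(initial)
    uses = [0, 0]
    for _ in range(steps):
        p_route0 = attractiveness[0] / sum(attractiveness)
        route = 0 if rng.random() < p_route0 else 1
        uses[route] += 1
        attractiveness[route] += reinforcement   # past action reshapes landscape
    return uses[0] / steps

rng = random.Random(42)
shares = [active_walk(rng=rng) for _ in range(200)]
runs_won_by_worse_route = sum(share > 0.5 for share in shares)
```

In a substantial fraction of runs the initially less attractive route ends up carrying most of the traffic, because early chance choices are amplified by reinforcement. This is the kind of stable but nonoptimal pattern described above.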


Obviously this is only the merest beginning of an exposition of types and features of mathematical models useful in phonetics, and an exhaustive list is certainly not even possible. It is hoped, however, that this small glimpse into mathematical descriptive tools will encourage the reader to think abstractly about concrete speech in order to discover new instances of well understood mathematical patterns, and perhaps also to discover new patterns in speech which call for new or refined mathematical models to aid in understanding.


[1]  Ralph Abraham and Christopher D. Shaw. Dynamics: The Geometry of Behavior. 2nd ed. Addison-Wesley, 1992.

[2]  Plínio A. Barbosa. “From Syntax to Acoustic Duration: A Dynamical Model of Speech Rhythm Production”. In: Speech Communication 49 (2007), pp. 725–742.

[3]  Plínio A. Barbosa and Sandra Madureira. “Toward a Hierarchical Model of Rhythm Production: Evidence from Phrase Stress Domains in Brazilian Portuguese”. In: [17], pp. 297–300.

[4]  Catherine P. Browman and Louis Goldstein. “Gestural Specification using Dynamically-Defined Articulatory Structures”. In: Journal of Phonetics 18 (1990), pp. 299–320.

[5]  Catherine P. Browman and Louis Goldstein. “Articulatory Phonology: An Overview”. In: Phonetica 3-4 (1992), pp. 155–180.

[6]  Pierre-Paul Grassé. “La Reconstruction du nid et les Coordinations Inter-Individuelles chez Bellicositermes Natalensis et Cubitermes sp. La théorie de la Stigmergie: Essai d’interprétation du Comportement des Termites Constructeurs”. In: Insectes Sociaux 6 (1959), pp. 41–84.

[7]  Sarah Hawkins. “An Introduction to Task Dynamics”. In: Papers in Laboratory Phonology II: Gesture, Segment, Prosody. Cambridge University Press, 1992, pp. 9–25.

[8]  Stefanie Jannedy and Jennifer Hay, eds. Journal of Phonetics 4 (2006): Modelling Sociophonetic Variation. (special issue).

[9]  Edwin T. Jaynes. Probability Theory: The Logic of Science. 2003.

[10]  Keith Johnson. “Resonance in an Exemplar-Based Lexicon: The Emergence of Social Identity and Phonology”. In: Journal of Phonetics 4 (2006), pp. 485–499.

[11]  J. A. Scott Kelso. Dynamic Patterns: The Self Organization of Brain and Behavior. MIT Press, 1995.

[12]  John K. Kruschke. Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan. 2nd ed. Waltham, MA: Academic Press, 2015.

[13]  Lui Lam. “Active Walks: The First Twelve Years (Part I)”. In: International Journal of Bifurcation and Chaos 8 (2005), pp. 2317–2348.

[14]  Lui Lam. “Active Walks: The First Twelve Years (Part II)”. In: International Journal of Bifurcation and Chaos 2 (2006), pp. 239–268.

[15]  Michael O’Dell and Tommi Nieminen. “Coupled Oscillator Model of Speech Rhythm”. In: [17], pp. 1075–1078.

[16]  Michael O’Dell and Tommi Nieminen. “Coupled Oscillator Model for Speech Timing: Overview and Examples”. In: Nordic Prosody: Proceedings of the Xth Conference, Helsinki 2008. Ed. by Martti Vainio, Reijo Aulanko, and Olli Aaltonen. Peter Lang, 2009, pp. 179–189.

[17]  J. Ohala, Y. Hasegawa, M. Ohala, D. Granville, and A. Bailey, eds. Proceedings of the XIVth International Congress of Phonetic Sciences. University of California, Berkeley. 1999.

[18]  Janet B. Pierrehumbert. “Exemplar Dynamics: Word Frequency, Lenition and Contrast”. In: Frequency and the Emergence of Linguistic Structure. Ed. by Joan Bybee and Paul Hopper. Amsterdam: John Benjamins, 2001, pp. 137–157.

[19]  Arkady Pikovsky, Michael Rosenblum, and Jürgen Kurths. Synchronization: A Universal Concept in Nonlinear Sciences. Cambridge University Press, 2001.

[20]  Mikhail I. Rabinovich, Ramón Huerta, Pablo Varona, and Valentin S. Afraimovich. “Transient Cognitive Dynamics, Metastability, and Decision Making”. In: PLoS Computational Biology 5 (2008), e1000072.

[21]  Elliot L. Saltzman and Kevin G. Munhall. “A Dynamical Approach to Gestural Patterning in Speech Production”. In: Ecological Psychology 4 (1989), pp. 333–382.

[22]  Antje Schweitzer. “Exemplar-Theoretic Integration of Phonetics and Phonology: Detecting Prominence Categories in Phonetic Space”. In: Journal of Phonetics 100915 (2019), pp. 1–20.

[23]  Juraj Šimko and Fred Cummins. “Sequencing and Optimization within an Embodied Task Dynamic Model”. In: Cognitive Science 3 (2011), pp. 527–562.

[24]  Shravan Vasishth, Bruno Nicenboim, Mary E. Beckman, Fangfang Li, and Eun Jong Kong. “Bayesian Data Analysis in the Phonetic Sciences: A Tutorial Introduction”. In: Journal of Phonetics 71 (2018), pp. 147–161.

[25]  Kenji Yoshida, Kenneth J. de Jong, John K. Kruschke, and Pia-Maria Päiviö. “Cross-language Similarity and Difference in Quantity Categorization of Finnish and Japanese”. In: Journal of Phonetics 50 (2015), pp. 81–98.